HOW TO READ OUTLOOK EXPRESS FILES (.DBX)

1) How these files are organized

The message headers are stored in an orderly way using tables. Each table is
divided into two parts, the header and the entries. The position (offset from
BOF) of the first table is stored at offset $30 from BOF (also, the number of
message headers is stored at offset $C4 from BOF).
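
The two file-level offsets above can be read with a few lines of code. This is
only a minimal sketch in Python, not the method used by the Delphi code in this
note. It assumes the values are 32-bit little-endian integers (the usual layout
of a Delphi longint on Windows); the names read_u32 and read_file_info are
made up for this example.

```python
import struct

FIRST_TABLE_OFFSET = 0x30   # from the text: position of the first table
HEADER_COUNT_OFFSET = 0xC4  # from the text: number of message headers


def read_u32(f, offset):
    """Read one 32-bit value at an absolute offset (little-endian assumed)."""
    f.seek(offset)
    raw = f.read(4)
    if len(raw) != 4:
        raise EOFError("file too short to hold a value at offset %#x" % offset)
    return struct.unpack("<I", raw)[0]


def read_file_info(f):
    """Return (position of first table, number of message headers)."""
    return read_u32(f, FIRST_TABLE_OFFSET), read_u32(f, HEADER_COUNT_OFFSET)
```

With a real file this would be called as read_file_info(open(path, "rb")),
where path is whatever .dbx file you are inspecting.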

The header contains the number of entries in the table and the positions of
the next and previous tables. I use this structure to read the table header:

Toe5_IndexHeader = record
    FilePos: longint;   {offset of this structure from BOF, use as a check}
    Unknown1: longint;  { ??? }
    PrevIndex: longint; {position or offset from BOF of previous table}
    NextIndex: longint; {position or offset from BOF of next table}
    Count: longint;     {number of entries in the table, decode with Count shr 8}
    Unknown2: longint;  { ??? }
end;

To get the real number of entries in the table once you have read the header,
convert the Count value using Count shr 8.
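
As an illustration, here is a small Python sketch that unpacks the 24 bytes of
Toe5_IndexHeader and applies the shr 8 conversion. The little-endian byte order
and the use of unsigned values are assumptions of this sketch, and
parse_index_header is a name invented for the example.

```python
import struct

# FilePos, Unknown1, PrevIndex, NextIndex, Count, Unknown2 (six 32-bit values)
INDEX_HEADER = struct.Struct("<6I")


def parse_index_header(raw):
    """Decode a table header from the first 24 bytes of raw."""
    file_pos, _unknown1, prev_index, next_index, count, _unknown2 = (
        INDEX_HEADER.unpack(raw[:INDEX_HEADER.size])
    )
    return {
        "file_pos": file_pos,
        "prev_index": prev_index,
        "next_index": next_index,
        "entries": count >> 8,  # the "count shr 8" conversion from the text
    }
```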

Each entry contains the position (offset, always from BOF) of a message
header and the position of another index table. That second table is used to
maintain the message threading, that is, any message header pointed to by it
is a child of this message. Here is the structure of an entry:

Toe5_IndexItem = record
    HeaderPos: longint;  {position or offset from BOF of the message header}
    ChildIndex: longint; {position or offset from BOF of a child index table}
    Unknown: longint;
end;

I've found that the best way (at least for me) to read the tables is to use a
recursive function. Here is how it works:

ReadTable(position)
Begin
     Add position to a visited list
     Read Toe5_IndexHeader

     If  Toe5_IndexHeader.NextIndex <> 0 then
        If not Toe5_IndexHeader.NextIndex in visited list then
           ReadTable(Toe5_IndexHeader.NextIndex)

     If  Toe5_IndexHeader.PrevIndex <> 0 then
        If not Toe5_IndexHeader.PrevIndex in visited list then
           ReadTable(Toe5_IndexHeader.PrevIndex)

     For each entry of the table do
          Read Toe5_IndexItem
          Get MessageHeader pointed by Toe5_IndexItem.HeaderPos
          If Toe5_IndexItem.ChildIndex <>0 then
             If not Toe5_IndexItem.ChildIndex in visited list then
                ReadTable(Toe5_IndexItem.ChildIndex)
End
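
The pseudocode above could be sketched in Python as follows. This is an
illustration under assumptions, not a tested reader for real files: it assumes
little-endian 32-bit fields and that the entries follow the 24-byte table
header immediately. The function name read_table and its return value (a list
of message header positions) are choices made for this example. The entries are
read before recursing because the recursive calls move the file position.

```python
import struct

INDEX_HEADER = struct.Struct("<6I")  # Toe5_IndexHeader
INDEX_ITEM = struct.Struct("<3I")    # Toe5_IndexItem


def read_table(f, position, visited=None, found=None):
    """Collect message header positions reachable from the table at position."""
    if visited is None:
        visited = set()
    if found is None:
        found = []
    if position == 0 or position in visited:
        return found
    visited.add(position)

    f.seek(position)
    _pos, _u1, prev_index, next_index, count, _u2 = INDEX_HEADER.unpack(
        f.read(INDEX_HEADER.size)
    )
    # Assumption: the entries are stored right after the header.
    entries = [
        INDEX_ITEM.unpack(f.read(INDEX_ITEM.size)) for _ in range(count >> 8)
    ]

    # Same order as the pseudocode: next table, previous table, then entries.
    read_table(f, next_index, visited, found)
    read_table(f, prev_index, visited, found)
    for header_pos, child_index, _unknown in entries:
        found.append(header_pos)
        read_table(f, child_index, visited, found)
    return found
```

For very deep table chains an explicit stack may be safer than recursion,
since Python limits the recursion depth.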


Message Header:
The message header contains the relevant information about the message;
Outlook Express uses it to avoid accessing the message itself until that is
necessary. This structure is divided into three parts: a) the header of the
structure, b) a table of DWORDs, and c) a data block.

a) The header of the structure is what you read first; it is needed to
determine the size of the other two parts. The header stores the size of the
whole structure (the three parts), the combined size of the table and data
parts, and the number of elements in the table (I call these elements FLAGS).

THeaderData = record
    position: longint;   {offset of this structure from BOF, use as a check}
    DataLength: longint; {size of the table and data}
    HeaderLength: WORD;  {size of the three parts}
    FlagCount: WORD;     {number of elements in the table}
end;

To get the size of the table use Flagcount * sizeof(DWORD), and to get the size
of the data use DataLength - size of the table.

b) Each element (flag) in the table needs to be decoded to obtain the Id and
the value of the flag: to get the Id use element and $FF, to get the value use
element shr 8.

c) The data block contains information such as the received date, sent date,
subject, recipient (To:), From:, references, account, etc. To read this
information use the flags. Here is an example of the flags and the data block:

Flags = 16

80: 00000074
81: 00000081
02: 00000000
84: 0002ECA0
05: 00000008
06: 00000025
07: 0000002D
08: 0000006E
0D: 0000008B
0E: 000000A5
90: 00000003
91: 0000376F
12: 000000D4
13: 000000DC
14: 00000102
1C: 0000012A

Data Block:

00 72 F3 E4 58 22 C0 01 41 63 74 69 76 65 57 65
 .  r  .  .  X  "  .  .  A  c  t  i  v  e  W  e

[skipped]
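
The decoding described in a) and b) can be put together in a short Python
sketch. It is only an illustration: it assumes little-endian values, and it
assumes the flag table starts right after the 12-byte THeaderData with the data
block right after the table. parse_message_header is a hypothetical name.

```python
import struct

# position, DataLength, HeaderLength (WORD), FlagCount (WORD)
HEADER_DATA = struct.Struct("<IIHH")


def parse_message_header(raw):
    """Return (flags, data) for a message header structure.

    raw must start at the first byte of the structure. flags maps each
    flag Id to its value; data is the data block as bytes.
    """
    _position, data_length, _header_length, flag_count = (
        HEADER_DATA.unpack_from(raw, 0)
    )
    table_size = flag_count * 4            # FlagCount * sizeof(DWORD)
    data_size = data_length - table_size   # DataLength - size of the table

    table_start = HEADER_DATA.size         # assumption: table follows header
    elements = struct.unpack_from("<%dI" % flag_count, raw, table_start)
    # Id = element and $FF, value = element shr 8
    flags = {element & 0xFF: element >> 8 for element in elements}

    data_start = table_start + table_size  # assumption: data follows table
    data = raw[data_start:data_start + data_size]
    return flags, data
```

In the example listing above, the line "84: 0002ECA0" corresponds to the
element $02ECA084: the low byte is the Id and the rest is the value.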

Now let's see what each flag means:
For "Folders.dbx":

$2: the value is the offset from the beginning of the data block to the name
of the folder (null terminated string)

$3: the value is the offset from the beginning of the data block to the name
of the dbx file that stores this folder (null terminated string)

$6: the value is not important; if a folder has this flag then it is what I
call a special folder. These folders don't have a corresponding file and are
used only for organization

$80: the value is the Id of the folder

$81: the value is the Id of the parent folder (the flag $80 of the parent
folder)

For the other files:

$2: the value is the offset from the beginning of the data block to the "Sent
Date"; the date is stored as a TFileTime type

$4: sometimes a DWORD is not sufficient to store both the flag Id and the
position of the message in the file; the position is then stored in the data
block, and the value of this flag is the offset from the beginning of the data
block to that position (a DWORD). Compare flag $84, which holds the position
directly

$7: the value is the offset from the beginning of the data block to the
MessageID of the message (null terminated string)

$8: the value is the offset from the beginning of the data block to the
Subject of the message (null terminated string)

$9: the value is the offset from the beginning of the data block to the "From
Reply" of the message (null terminated string)

$A: the value is the offset from the beginning of the data block to the
References of the message (null terminated string)

$B: the value is the offset from the beginning of the data block to the
NewsGroup of the message (null terminated string)

$D: the value is the offset from the beginning of the data block to the
"From:" data (null terminated string)

$E: the value is the offset from the beginning of the data block to the
"Reply to:" data (null terminated string)

$12: the value is the offset from the beginning of the data block to the
"Received Date"; the date is stored as a TFileTime type

$13: the value is the offset from the beginning of the data block to the
"Recipient (To:)" data (null terminated string)

$1A: the value is the offset from the beginning of the data block to the
"Account" data (null terminated string)

$1B: the value is the offset from the beginning of the data block to the
"AccountID" data (null terminated string)

$80: the value is the message number

$81: the value is used to store the status of the message

$84: the value is the position of the message in the file

$91: the value is the size of the message
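
To show how the flags are used, here is a Python sketch that pulls a string
and the message position out of a decoded header. It assumes the flags are
already available as a mapping from Id to value and the data block as bytes.
The function names are invented for this example, and the latin-1 text
encoding is a guess (the note does not say which encoding the strings use).

```python
import struct

FLAG_SUBJECT = 0x08        # offset of the Subject string in the data block
FLAG_POSITION = 0x84       # message position stored directly in the flag
FLAG_POSITION_DATA = 0x04  # offset of a DWORD position in the data block


def read_cstring(data, offset, encoding="latin-1"):
    """Read a null terminated string starting at offset in the data block."""
    end = data.find(b"\x00", offset)
    if end == -1:          # no terminator found: take the rest of the block
        end = len(data)
    return data[offset:end].decode(encoding)


def message_position(flags, data):
    """Return the message position in the file, or None if no flag has it."""
    if FLAG_POSITION in flags:
        return flags[FLAG_POSITION]
    if FLAG_POSITION_DATA in flags:
        # Little-endian DWORD assumed.
        return struct.unpack_from("<I", data, flags[FLAG_POSITION_DATA])[0]
    return None
```

The subject would then be read with
read_cstring(data, flags[FLAG_SUBJECT]) when flag $8 is present.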

Messages:

Each mail or news message is stored in blocks of 512 bytes, each with a
header. That is, the message is divided into several blocks, and a header is
added to each block containing the following information: the size of the data
block, the size of the used part of the data block, and the position of the
next block. I use the following structure to read the blocks including the
header:

Toe5_MsgItem = record
    FilePos: longint;   {offset of this structure from BOF, use as a check}
    Unknown: longint;   {size of the data block, I think}
    ItemSize: longint;  {used part of the data block}
    NextItem: longint;  {offset of the next block from BOF, 0 if this is the
                         final block}
    MsgContent: array[0..511] of Char;  {data block that contains the message}
end;
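
Reassembling a message means following the NextItem chain and keeping only the
used part of each block. A minimal Python sketch of that idea follows. It
assumes little-endian fields and that the content starts right after the
16-byte block header; read_message is a name chosen for the example, and the
cycle check is an extra safety measure not mentioned in the text.

```python
import struct

BLOCK_HEADER = struct.Struct("<4I")  # FilePos, Unknown, ItemSize, NextItem


def read_message(f, position):
    """Return the message bytes starting at the block stored at position."""
    chunks = []
    visited = set()
    while position != 0:
        if position in visited:
            raise ValueError("block chain loops back to %#x" % position)
        visited.add(position)

        f.seek(position)
        _pos, _unknown, item_size, next_item = BLOCK_HEADER.unpack(
            f.read(BLOCK_HEADER.size)
        )
        chunks.append(f.read(item_size))  # only the used part of the block
        position = next_item              # 0 marks the final block
    return b"".join(chunks)
```

The starting position would come from flag $84 (or $4) of the message header.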

2) Deleted messages

When a message is deleted, it is first added to the "Deleted elements" folder,
and therefore to the corresponding dbx file for deleted elements; then the
space used by the message is added to a list of free space, so that when a new
message is added, Outlook Express uses this space first. The position of the
first element in the list is stored at offset $48 of the dbx file.
Each element of this list is divided into two parts: a header and the block of
free space. Here is the structure of the header:

Toe5_FreeSpace = record
    FilePos: longint;         {offset of this structure from BOF, use as a
                               check}
    ElementSize: longint;     {size of the structure, header and free space}
    FreeSpaceSize: longint;   {size of the free space}
    PreviousElement: longint; {offset of the previous element from BOF}
    NextElement: longint;     {offset of the next element from BOF}
end;
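
The free space list can be walked in the same way as the message blocks. The
Python sketch below is hedged: it assumes little-endian fields and, more
importantly, that a NextElement of 0 marks the end of the list, which this note
does not state. walk_free_list is a hypothetical name.

```python
import struct

FREE_LIST_OFFSET = 0x48  # from the text: position of the first list element
# FilePos, ElementSize, FreeSpaceSize, PreviousElement, NextElement
FREE_SPACE = struct.Struct("<5I")


def walk_free_list(f):
    """Return a list of (position, element size, free space size) tuples."""
    f.seek(FREE_LIST_OFFSET)
    (position,) = struct.unpack("<I", f.read(4))

    elements = []
    visited = set()
    # Assumption: 0 ends the list; the visited set guards against loops.
    while position != 0 and position not in visited:
        visited.add(position)
        f.seek(position)
        _pos, element_size, free_size, _previous, next_element = (
            FREE_SPACE.unpack(f.read(FREE_SPACE.size))
        )
        elements.append((position, element_size, free_size))
        position = next_element
    return elements
```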

3) Dates

All the dates in the message header are of TFileTime type and are based on
Coordinated Universal Time (UTC), so you need to convert them to your local
time. Here is an example of how to do this:

function FiletimeToDatetime(const date: TFileTime): TDateTime;
var
  st: TSystemTime;
  localft: TFileTime;
begin
  FileTimeToLocalFileTime(date, localft);
  FileTimeToSystemTime(localft, st);
  Result:=SystemTimeToDateTime(st);
end;
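
Outside Delphi the same conversion can be done by hand, because a TFileTime
(Windows FILETIME) is a 64-bit count of 100-nanosecond intervals since
January 1, 1601 (UTC). The Python sketch below shows the idea;
filetime_to_datetime is a name chosen for this example, and the 8 bytes are
assumed to be stored little-endian, as they appear in the data block.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(raw, offset=0):
    """Convert the 8-byte FILETIME at raw[offset:] to a UTC datetime."""
    (ticks,) = struct.unpack_from("<Q", raw, offset)
    # One tick is 100 ns, so ten ticks make one microsecond.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# The first 8 bytes of the example data block shown earlier in this note.
sample = bytes.fromhex("0072F3E45822C001")
sent_utc = filetime_to_datetime(sample)
sent_local = sent_utc.astimezone()  # convert to the local time zone
```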

4) Get the status of a message

To get the status of a message, test the value of the flag $81 against a
status constant, like this:

x := value of flag $81
If (x AND constant) <> 0 then ...

and here are some constants:
DOWNLOADED = $1
READED = $80
MARKED = $20   (small flag icon)
ATTACHMENTS = $4000
REPLY = $20000
INSPECT_CONVERSATION = $400000  (small glasses icon)
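
As a small illustration, the test above can be applied to every constant at
once. The Python sketch below uses only the constants listed in this note;
status_names is a name made up for the example.

```python
# Status constants as listed above (names kept as in the note).
STATUS_BITS = {
    "DOWNLOADED": 0x1,
    "READED": 0x80,
    "MARKED": 0x20,
    "ATTACHMENTS": 0x4000,
    "REPLY": 0x20000,
    "INSPECT_CONVERSATION": 0x400000,
}


def status_names(value):
    """Return the names of all known status bits set in a flag $81 value."""
    return [name for name, bit in STATUS_BITS.items() if value & bit]
```

For the example header shown earlier, flag $81 has the value $81, which gives
DOWNLOADED and READED.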

Finally, sorry for my English, and please send modifications, suggestions,
comments, criticism, etc. to walther_e@yahoo.com

Walther Estergaard
walther_e@yahoo.com
