by Lluis Sanchez Gual (lluis@ideary.com)
This document describes the format used by the class BinaryFormatter to serialize object graphs. The document is based on the analysis of the output of the BinaryFormatter of the Microsoft .NET runtime, so it is probably not complete, since I cannot be sure that I have tested all cases. In fact, there are some gaps in some tables of codes, so if you find a meaning for the missing codes, please contact me and I'll update the document.
An object serialization is a sequence of binary elements. A binary element could be, for example, a description of an object, an array, an assembly, etc. Each binary element has a specific format, which is described in the following sections.
This table shows the available binary elements:
Code | Label | Description |
0 | Header | Always written at the beginning of a serialization |
1 | RefTypeObject | Object with no type metadata |
4 | RuntimeObject | Corlib object |
5 | ExternalObject | Object |
6 | String | String |
7 | GenericArray | Array |
8 | BoxedPrimitiveTypeValue | Primitive type value |
9 | ObjectReference | Object reference |
10 | NullValue | Null value |
11 | End | End of stream |
12 | Assembly | Assembly declaration |
13 | ArrayFiller8b | Null filler (8 bit length) |
14 | ArrayFiller32b | Null filler (32 bit length) |
15 | ArrayOfPrimitiveType | Array of primitive type |
16 | ArrayOfObject | Array of Object |
17 | ArrayOfString | Array of string |
21 | MethodCall | Method call |
22 | MethodResponse | Method response |
All elements begin with a byte that identifies the type of element. It is shown in the "Code" column. In the implementation of the formatter I use an enum to represent those codes. The "Label" column is the name of the corresponding enum element.
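As a minimal sketch of the enum mentioned above, the codes from the table could be declared as follows in Python. The names `BinaryElement` and `element_at` are illustrative and are not taken from the original implementation; only the numeric codes and labels come from the table.

```python
from enum import IntEnum


class BinaryElement(IntEnum):
    """Element codes from the table; member names follow the Label column."""
    Header = 0
    RefTypeObject = 1
    RuntimeObject = 4
    ExternalObject = 5
    String = 6
    GenericArray = 7
    BoxedPrimitiveTypeValue = 8
    ObjectReference = 9
    NullValue = 10
    End = 11
    Assembly = 12
    ArrayFiller8b = 13
    ArrayFiller32b = 14
    ArrayOfPrimitiveType = 15
    ArrayOfObject = 16
    ArrayOfString = 17
    MethodCall = 21
    MethodResponse = 22


def element_at(data: bytes, offset: int = 0) -> BinaryElement:
    """Return the element kind identified by the byte at `offset`.

    Raises ValueError for codes that are missing from the table (such as 2 or 3).
    """
    return BinaryElement(data[offset])
```

A reader built this way fails loudly on the undocumented codes instead of silently misparsing the rest of the stream.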
The best way to understand the format is to look at an example. Let's see how the following structure of classes would be serialized:
class A
{
    B bval = new B();
    C cval = new C();
    string msg = "hello";
}
class B
{
    string str = "bye";
}
struct C
{
    string[] info = new string[] {"hello","world"};
}
The serialization of an instance of class A would result in a sequence of binary elements like the following:
Element | Bytes | Data | Comments |
Header | 0; 1,0,0,0, 255,255,255,255, 1,0,0,0,0,0,0,0 | Element code; ? | This sequence of bytes is serialized at the beginning. I'm sure it has a meaning, but I don't know it. |
Assembly | 12; 1,0,0,0; "MyAssembly" | Element code; ID of the assembly (1); Full name of the assembly | Before serializing an object, the assembly where the object is implemented has to be serialized. The formatter assigns an ID to the assembly (ID 1 in this case). This ID will be used to refer to this assembly. |
ExternalObject | 5; 2,0,0,0; "A"; 3,0,0,0; "bval","cval","msg"; 4,4,1; "B"; 1,0,0,0; "C"; 1,0,0,0; 1,0,0,0 | Element code; Object ID (2); Class name; Field count; Field names; Field type tags; Class name of field "bval"; Assembly ID of field "bval"; Class name of field "cval"; Assembly ID of field "cval"; Assembly ID of this object | Serialization of the root object. Each object has an ID that is used, for example, to specify object relations. The object binary element has two parts. The first one is type metadata: the name and type of serialized fields. The second part is the object data: field values. The data part is shown in the following nested elements. |
ObjectReference | 9; 5,0,0,0 | Element code; ID of the referred object (5) | Reference objects are not serialized inside the container element. Instead, an ObjectReference is serialized, and the object itself is queued for later serialization. |
ExternalObject | 5; 3,0,0,0; "C"; 1,0,0,0; "info"; 6; 1,0,0,0 | Element code; Object ID (3); Class name; Field count; Field name; Field type tag; Assembly ID of this object | On the other hand, value type objects are serialized inside the container element. |
ObjectReference | 9; 7,0,0,0 | Element code; ID of the referred object (7) | This is again a reference object, so it is serialized later. |
String | 6; 4,0,0,0; "hello" | Element code; Object ID (4); String value | Strings are serialized like value objects. |
ExternalObject | 5; 5,0,0,0; "B"; 1,0,0,0; "str"; 1; 1,0,0,0 | Element code; Object ID (5); Class name; Field count; Field name; Field type tag; Assembly ID of this object | Reference objects queued for serialization are serialized after the root object. |
String | 6; 6,0,0,0; "bye" | Element code; Object ID (6); String value | A string. |
ArrayOfString | 17; 7,0,0,0; 2,0,0,0 | Element code; Object ID (7); Element count | This could also be encoded using the binary element GenericArray (7), but ArrayOfString is more specific and saves bytes. |
ObjectReference | 9; 4,0,0,0 | Element code; ID of the referred object (4) | This string was already serialized, so a backwards reference is used. |
String | 6; 8,0,0,0; "world" | Element code; Object ID (8); String value | Another string. |
The following sections show the format of each binary element. The format is presented in a table with two columns. The first one shows the sequence of bytes and the second one a description of each element in the sequence.
A special notation is used to represent the bytes. Here are some examples:
Example of element | Description |
(byte) 7 | A single byte |
uint | Any uint value (4 bytes) |
type-tag | Names in italic are described in the section "Other elements" |
string * | * represents a sequence of elements |
object | Full serialization of an object |
1 - RefTypeObject
An object is serialized in two parts. The first one is type metadata, and the second one is the object data. When several objects of the same type are serialized, only the first one has the metadata part. The other objects are serialized using the RefTypeObject element, which, instead of the metadata, includes the ID of a previously serialized object of the same type.
Element | Description |
(byte) 1 | Element code |
uint | Object ID |
uint | ID of a previously serialized object from which to take type metadata. |
value * | Values of the fields of the object |
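The metadata reuse described above suggests that a reader has to remember the metadata of every object that carried it. The following is a minimal sketch of such a lookup in Python; `TypeMetadata` and `MetadataCache` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TypeMetadata:
    """The part of an object element that RefTypeObject omits."""
    class_name: str
    field_names: List[str]


class MetadataCache:
    """Maps the ID of an object serialized with metadata to that metadata."""

    def __init__(self) -> None:
        self._by_object_id: Dict[int, TypeMetadata] = {}

    def register(self, object_id: int, metadata: TypeMetadata) -> None:
        # Called after reading an element that carries metadata (codes 4 and 5).
        self._by_object_id[object_id] = metadata

    def metadata_for(self, metadata_object_id: int) -> TypeMetadata:
        # Called for a RefTypeObject (code 1): its second uint names an earlier
        # object whose metadata tells how many field values follow.
        try:
            return self._by_object_id[metadata_object_id]
        except KeyError:
            raise ValueError(
                f"object {metadata_object_id} has no known type metadata"
            ) from None
```

With the metadata in hand, the reader knows how many `value` entries to consume for the RefTypeObject and which field each one belongs to.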
4 - RuntimeObject
This element is used to serialize objects of types that are implemented in the core library of the framework. The only difference from the format for other objects is that it does not include assembly information, which is not needed since the assembly will always be mscorlib.
Element | Description |
(byte) 4 | Element code |
uint | Object ID |
string | Class name, including namespace |
uint | Number of serialized fields |
string * | Names of the fields |
type-tag * | type-tag of each field |
type-spec * | type-spec of each field |
value * | Values of the fields of the object |
5 - ExternalObject
This element can be used to serialize any object from any assembly.
Element | Description |
(byte) 5 | Element code |
uint | Object ID |
string | Class name, including namespace |
uint | Number of serialized fields |
string * | Names of the fields |
type-tag * | type-tag of each field |
type-spec * | type-spec of each field |
uint | ID of the assembly where the class is defined (the assembly must have been serialized before the class using the binary element 12) |
value * | Values of the fields of the object |
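To make the layout above concrete, here is a rough Python sketch that reads the metadata part of an ExternalObject. It rests on several assumptions that are not stated in the table: uints are 4 bytes in little-endian order (as the example bytes suggest), every string is shorter than 128 bytes so that its length prefix is a single byte, and a primitive-type-code occupies one byte. All function names are illustrative.

```python
import io
import struct

# type-tag values that are followed by a type-spec, per the type-tag table.
PRIMITIVE, RUNTIME, GENERIC, ARRAY_OF_PRIMITIVE = 0, 3, 4, 7


def read_uint(stream):
    return struct.unpack("<I", stream.read(4))[0]


def read_short_string(stream):
    # Simplification: a single length byte, which only holds below 128 bytes.
    length = stream.read(1)[0]
    if length >= 0x80:
        raise ValueError("multi-byte length prefix is not handled in this sketch")
    return stream.read(length).decode("utf-8")


def read_type_spec(stream, tag):
    if tag in (PRIMITIVE, ARRAY_OF_PRIMITIVE):
        return stream.read(1)[0]                  # primitive-type-code (assumed 1 byte)
    if tag == RUNTIME:
        return read_short_string(stream)          # class name
    if tag == GENERIC:
        return (read_short_string(stream), read_uint(stream))  # class name, assembly ID
    return None                                   # this tag needs no type-spec


def read_external_object_metadata(stream):
    code = stream.read(1)[0]
    if code != 5:
        raise ValueError(f"expected element code 5, found {code}")
    object_id = read_uint(stream)
    class_name = read_short_string(stream)
    field_count = read_uint(stream)
    field_names = [read_short_string(stream) for _ in range(field_count)]
    type_tags = [stream.read(1)[0] for _ in range(field_count)]
    type_specs = [read_type_spec(stream, tag) for tag in type_tags]
    assembly_id = read_uint(stream)
    return {
        "object_id": object_id,
        "class_name": class_name,
        "field_names": field_names,
        "type_tags": type_tags,
        "type_specs": type_specs,
        "assembly_id": assembly_id,
    }


# The metadata of class B from the example: 5 5,0,0,0 "B" 1,0,0,0 "str" 1 1,0,0,0
example = bytes([5, 5, 0, 0, 0]) + b"\x01B" + bytes([1, 0, 0, 0]) + b"\x03str" + bytes([1, 1, 0, 0, 0])
metadata_b = read_external_object_metadata(io.BytesIO(example))
```

Note that all type-tags are read before any type-spec, which is why the two are collected in separate passes. The field values that follow the assembly ID are left unread here.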
6 - String
A string value.
Element | Description |
(byte) 6 | Element code |
uint | Object ID |
string | Value of the string |
7 - GenericArray
This element can be used to represent any array.
Element | Description |
(byte) 7 | Element code |
uint | Object ID |
byte | Array type: 0:single dimension, 1: jagged, 2: multi-dimensional |
uint | Number of dimensions (rank) |
uint * | Number of elements for each dimension |
type-tag | type-tag of array's element type |
type-spec | type-spec of array's element type |
value * | Values of the elements, row by row |
8 - BoxedPrimitiveTypeValue
This element represents a primitive type value boxed as an object.
Element | Description |
(byte) 8 | Element code |
type-spec | type-spec of the primitive type |
primitive-value | Raw value |
9 - ObjectReference
This element represents a reference to an object that was already serialized (backwards reference) or that will be serialized later (forward reference).
Element | Description |
(byte) 9 | Element code |
uint | ID of the referred object |
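Because a reference may point forward, a reader cannot always resolve it immediately. One possible way to handle this, sketched below in Python, is to keep a table of known objects plus a list of pending assignments per object ID. The class `ObjectTable` and its methods are hypothetical and only illustrate the idea.

```python
from typing import Any, Callable, Dict, List


class ObjectTable:
    """Tracks deserialized objects by ID and patches forward references."""

    def __init__(self) -> None:
        self._objects: Dict[int, Any] = {}
        self._pending: Dict[int, List[Callable[[Any], None]]] = {}

    def resolve(self, object_id: int, assign: Callable[[Any], None]) -> None:
        """Handle an ObjectReference: assign now if known, otherwise defer."""
        if object_id in self._objects:
            assign(self._objects[object_id])                        # backwards reference
        else:
            self._pending.setdefault(object_id, []).append(assign)  # forward reference

    def register(self, object_id: int, obj: Any) -> None:
        """Record a newly read object and run the assignments waiting for it."""
        self._objects[object_id] = obj
        for assign in self._pending.pop(object_id, []):
            assign(obj)

    def unresolved(self) -> List[int]:
        """IDs that were referenced but never serialized (should be empty at End)."""
        return sorted(self._pending)


# In the example, field "bval" of object 2 refers to object 5 before it is read.
table = ObjectTable()
a_fields: Dict[str, Any] = {}
table.resolve(5, lambda obj: a_fields.__setitem__("bval", obj))
table.register(5, {"str": "bye"})
```

Checking `unresolved()` when the End element is reached is a cheap way to detect a truncated or corrupt stream.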
10 - NullValue
A null value.
Element | Description |
(byte) 10 | Element code |
11 - End
This element marks the end of the serialized object graph.
Element | Description |
(byte) 11 | Element code |
12 - Assembly
Defines an assembly. Each assembly is defined only once and has an ID. This ID is used when serializing an object (element 5) to specify the assembly where the object's type is implemented.
Element | Description |
(byte) 12 | Element code |
uint | Assembly ID |
string | Full name of the assembly |
13 - ArrayFiller8b
This element can be used when serializing array data to specify multiple consecutive null values. It is only used in single dimension arrays of reference objects (not valid for value-type objects).
Element | Description |
(byte) 13 | Element code |
byte | Number of consecutive null values |
14 - ArrayFiller32b
The same as ArrayFiller8b, but it uses a uint to specify the length.
Element | Description |
(byte) 14 | Element code |
uint | Number of consecutive null values |
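The two filler elements can be illustrated with a small Python sketch that expands them while reading the slots of an array of reference objects. It is deliberately limited: it only understands ObjectReference, NullValue, and the two fillers, and it assumes 4-byte little-endian uints. The function name `read_reference_slots` is an illustrative choice.

```python
import io
import struct

OBJECT_REFERENCE, NULL_VALUE, ARRAY_FILLER_8B, ARRAY_FILLER_32B = 9, 10, 13, 14


def read_reference_slots(stream, count):
    """Read `count` array slots; each slot is None or ("ref", object_id)."""
    slots = []
    while len(slots) < count:
        code = stream.read(1)[0]
        if code == OBJECT_REFERENCE:
            slots.append(("ref", struct.unpack("<I", stream.read(4))[0]))
        elif code == NULL_VALUE:
            slots.append(None)
        elif code == ARRAY_FILLER_8B:
            # One byte holds the number of consecutive nulls.
            slots.extend([None] * stream.read(1)[0])
        elif code == ARRAY_FILLER_32B:
            # Same idea, with a uint count.
            slots.extend([None] * struct.unpack("<I", stream.read(4))[0])
        else:
            raise ValueError(f"element code {code} is not handled in this sketch")
    if len(slots) != count:
        raise ValueError("null filler runs past the end of the array")
    return slots


# One reference, a run of three nulls, a single null, and a run of two nulls.
body = bytes([9, 4, 0, 0, 0, 13, 3, 10, 14, 2, 0, 0, 0])
slots = read_reference_slots(io.BytesIO(body), 7)
```

The key point is that a filler consumes several slots at once, so the loop counts slots rather than elements.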
15 - ArrayOfPrimitiveType
This element can be used to represent a single dimension array of primitive type values.
Element | Description |
(byte) 15 | Element code |
uint | Object ID |
uint | Number of elements |
type-spec | type-spec of array's element type |
primitive-value * | Values of the elements |
16 - ArrayOfObject
This element can be used to represent a single dimension array of Object (i.e. an object[]).
Element | Description |
(byte) 16 | Element code |
uint | Object ID |
uint | Number of elements |
object * | Values of the elements |
17 - ArrayOfString
This element can be used to represent a single dimension array of String (i.e. a string[]).
Element | Description |
(byte) 17 | Element code |
uint | Object ID |
uint | Number of elements |
object * | Values of the elements |
21 - MethodCall
Represents a method call. The format of a method call can vary depending on the type of the parameters. The following table shows the common format:
Element | Description |
(byte) 21 | Element code |
method-call-flags | Describes which information the method call includes |
(bytes) 0, 0, 0 | ??? |
type-spec primitive-value | Method name |
type-spec primitive-value | Class name (including namespace and assembly) |
The following tables describe the format of the message content depending on the value of method-call-flags:
NoArguments
Used for calls to methods without parameters.
Element | Description |
Header[] | Only if there are Headers and method-call-flags has the flag IncludeLogicalCallContext. Headers are serialized only if there is context info. This must be a bug in MS.NET. |
object[] | Array with the following values: If the array is empty, it is not serialized. |
PrimitiveArguments
Used for calls to methods in which all parameters are primitive types.
Element | Description |
uint | Number of parameters |
( type-spec primitive-value ) * | One value for each parameter |
Header[] | Only if there are Headers and method-call-flags has the flag IncludeLogicalCallContext. Headers are serialized only if there is context info. This must be a bug in MS.NET. |
object[] | Array with the following values: If the array is empty, it is not serialized. |
ArgumentsInSimpleArray
Used for calls to methods in which at least one parameter is not a primitive type, and when no other info needs to be serialized (i.e. context or signature).
Element | Description |
Header[] | Only if there are Headers. |
object[] | Array of parameters. |
ArgumentsInMultiArray
Used for calls to methods in which at least one parameter is not a primitive type, and when other info needs to be serialized (i.e. context or signature).
Element | Description |
Header[] | Only if there are Headers. |
object[] | Array with the following values: If the array is empty, it is not serialized. |
22 - MethodResponse
Represents a method response. The format of a method response can vary depending on the type of the return value and parameters. The following table shows the common format:
Element | Description |
(byte) 22 | Element code |
method-response-flags | Describes which information the method response includes |
return-type-tag | Describes which kind of value is returned |
(bytes) 0, 0 | ??? |
The following tables describe the format of the message content depending on the value of method-response-flags:
NoArguments
Used when the method has no out arguments.
Element | Description |
type-spec primitive-value | Only if return-type-tag was PrimitiveType. |
Header[] | Only if there are Headers. |
object[] | Array with the following values: If the array is empty, it is not serialized. |
PrimitiveArguments
Used when all out arguments are primitive types.
Element | Description |
type-spec primitive-value | Only if return-type-tag was PrimitiveType. |
uint | Number of out arguments |
( type-spec primitive-value ) * | One value for each argument |
Header[] | Only if there are Headers. Empty otherwise. |
object[] | Array with the following values: If the array is empty, it is not serialized. |
ArgumentsInSimpleArray
Used when at least one out argument is not a primitive type, the return type is primitive, and no other info needs to be serialized.
Element | Description |
type-spec primitive-value | Only if return-type-tag was PrimitiveType. |
Header[] | Only if there are Headers. |
object[] | Array that contains the out arguments |
ArgumentsInMultiArray
Used when at least one out argument is not a primitive type, the return type is not primitive, and no other info needs to be serialized.
Element | Description |
type-spec primitive-value | Only if return-type-tag was PrimitiveType. |
Header[] | Only if there are Headers. |
object[] | Array with the following values: |
Other elements
string
A string value, serialized using BinaryWriter. It serializes the length of the string, using a 7-bit encoded int, and then the string chars.
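The 7-bit encoded length can be sketched in Python as follows. Each byte carries seven bits of the value, least significant group first, and the high bit signals that another byte follows. The sketch assumes the default BinaryWriter behavior as I understand it, namely that the characters are written as UTF-8 and that the prefix counts bytes rather than characters. The function names are illustrative.

```python
import io


def write_7bit_int(value: int) -> bytes:
    """Encode a non-negative int, 7 bits per byte, low-order group first."""
    if value < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)   # high bit set: more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def read_7bit_int(stream) -> int:
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a 7-bit encoded int")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return result
        shift += 7


def write_string(text: str) -> bytes:
    raw = text.encode("utf-8")              # assumption: default UTF-8 encoding
    return write_7bit_int(len(raw)) + raw


def read_string(stream) -> str:
    length = read_7bit_int(stream)
    raw = stream.read(length)
    if len(raw) != length:
        raise EOFError("stream ended inside a string")
    return raw.decode("utf-8")
```

Strings shorter than 128 bytes therefore cost a single prefix byte, which is the case for every string in the example above.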
primitive-value
A primitive value. It can be serialized using BinaryWriter and deserialized using BinaryReader. DateTime is serialized as a long (using the Ticks property).
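As an illustration of the DateTime case, the following Python sketch converts between a `datetime` and the 8-byte value. It assumes the long is written in little-endian order and holds plain ticks, where one tick is 100 nanoseconds counted from midnight of 1 January of year 1. Sub-microsecond precision is lost because Python's `datetime` stops at microseconds. The function names are illustrative.

```python
import struct
from datetime import datetime, timedelta

TICKS_PER_MICROSECOND = 10            # one tick is 100 nanoseconds
DOTNET_EPOCH = datetime(1, 1, 1)      # the instant at which Ticks is 0


def datetime_to_bytes(value: datetime) -> bytes:
    """Serialize a naive datetime as a little-endian signed 64-bit tick count."""
    delta = value - DOTNET_EPOCH
    microseconds = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return struct.pack("<q", microseconds * TICKS_PER_MICROSECOND)


def datetime_from_bytes(raw: bytes) -> datetime:
    """Inverse of datetime_to_bytes; ticks below one microsecond are dropped."""
    (ticks,) = struct.unpack("<q", raw)
    return DOTNET_EPOCH + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)
```

The arithmetic is done in integers throughout, which avoids the rounding errors that floating-point seconds would introduce at this magnitude.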
value
It can be a primitive-value or any of the following binary elements:
type-tag
Together with a type-spec value, it identifies a type. Some types can be represented using several type-tags. In this case, the most specific type-tag is always used (it takes fewer bytes).
type-tag can be one of the following:
Value | Label | Description | type-spec needed |
0 | PrimitiveType | A primitive type | The code of the primitive type |
1 | String | String class. type-spec is not needed. | Not needed |
2 | ObjectType | Object class. type-spec is not needed. | Not needed |
3 | RuntimeType | A type from the .NET runtime (including arrays of .NET types) | The name of the class |
4 | GenericType | Any other type (including arrays) | The name of the class and the id of the assembly |
5 | ArrayOfObject | Array of class Object | Not needed |
6 | ArrayOfString | Array of class String | Not needed |
7 | ArrayOfPrimitiveType | Array of primitive type | The code of the primitive type |
type-spec
It is the name or the code of a type. To decode it, a type-tag value is needed. The following tables show the format of type-spec for each type-tag value:
PrimitiveType and ArrayOfPrimitiveType:
Element | Description |
primitive-type-code | The code of the primitive type |
RuntimeType:
Element | Description |
string | The name of the class, including the namespace |
GenericType:
Element | Description |
string | The name of the class, including the namespace |
uint | ID of the assembly where the class is defined |
For other type-tag values, no type-spec is needed.
method-call-flags
Value | Label | Description |
1 | NoArguments | No arguments included |
2 | PrimitiveArguments | Primitive type arguments |
4 | ArgumentsInSimpleArray | At least one argument is not a primitive type |
8 | ArgumentsInMultiArray | At least one argument is not a primitive type and other info is included in the message (context or signature) |
16 | ExcludeLogicalCallContext | LogicalContext not included |
32 | ??? | |
64 | IncludesLogicalCallContext | LogicalContext included |
128 | IncludesSignature | Signature is included in the message. It is only included when calling an overloaded method. |
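Since these values are powers of two, they can be combined in a single flags value. A minimal Python sketch of decoding such a value is shown below. The helper `argument_layout` relies on an assumption that the table does not state explicitly: exactly one of the four argument flags (values 1, 2, 4, 8) is set in any method call. The unknown flag 32 is left out.

```python
from enum import IntFlag


class MethodCallFlags(IntFlag):
    NoArguments = 1
    PrimitiveArguments = 2
    ArgumentsInSimpleArray = 4
    ArgumentsInMultiArray = 8
    ExcludeLogicalCallContext = 16
    IncludesLogicalCallContext = 64
    IncludesSignature = 128


ARGUMENT_FLAGS = (
    MethodCallFlags.NoArguments,
    MethodCallFlags.PrimitiveArguments,
    MethodCallFlags.ArgumentsInSimpleArray,
    MethodCallFlags.ArgumentsInMultiArray,
)


def argument_layout(flags: MethodCallFlags) -> MethodCallFlags:
    """Return the one argument flag that selects the message content format."""
    present = [flag for flag in ARGUMENT_FLAGS if flags & flag]
    if len(present) != 1:
        # Assumption of this sketch: the argument flags are mutually exclusive.
        raise ValueError(f"expected exactly one argument flag, found {len(present)}")
    return present[0]


# A call with primitive arguments and no logical call context: 2 + 16.
flags = MethodCallFlags(2 | 16)
layout = argument_layout(flags)
```

The returned flag tells the reader which of the message content tables applies to the rest of the MethodCall element.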
method-response-flags
Value | Label | Description |
1 | NoArguments | Response with no out arguments |
2 | PrimitiveArguments | Response with primitive type out arguments |
4 | ArgumentsInSimpleArray | Response with primitive type return value, and with at least one out argument that is not a primitive type. |
8 | ArgumentsInMultiArray | Response with at least one out argument that is not a primitive type, and other info is included in the message (context or signature) |
16 | ExcludeLogicalCallContext | LogicalContext not included |
32 | ??? | |
64 | IncludesLogicalCallContext | LogicalContext included |
return-type-tag
Value | Label | Description |
2 | Null | Null return value |
8 | PrimitiveType | Primitive type return value |
16 | ObjectType | Object instance return value |
32 | Exception | Method response is an exception |
primitive-type-code
Value | Label |
1 | Boolean |
2 | Byte |
3 | Char |
5 | Decimal |
6 | Double |
7 | Int16 |
8 | Int32 |
9 | Int64 |
10 | SByte |
11 | Single |
13 | DateTime |
14 | UInt16 |
15 | UInt32 |
16 | UInt64 |
18 | String |
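For the fixed-width types in this table, reading a primitive-value reduces to a lookup from the code to a binary layout. The Python sketch below shows one way to do it with the standard `struct` module, assuming little-endian byte order as written by BinaryWriter. Char, Decimal, and String are left out on purpose because their encodings are not fixed-width, and DateTime is returned as a raw tick count. The names `FIXED_WIDTH_FORMATS` and `read_primitive` are illustrative.

```python
import io
import struct

# primitive-type-code -> struct format (little-endian, fixed width only).
FIXED_WIDTH_FORMATS = {
    1: "<?",    # Boolean
    2: "<B",    # Byte
    6: "<d",    # Double
    7: "<h",    # Int16
    8: "<i",    # Int32
    9: "<q",    # Int64
    10: "<b",   # SByte
    11: "<f",   # Single
    13: "<q",   # DateTime, returned as raw ticks
    14: "<H",   # UInt16
    15: "<I",   # UInt32
    16: "<Q",   # UInt64
}


def read_primitive(stream, type_code: int):
    """Read one primitive-value of the given primitive-type-code."""
    fmt = FIXED_WIDTH_FORMATS.get(type_code)
    if fmt is None:
        raise NotImplementedError(f"type code {type_code} is not handled in this sketch")
    size = struct.calcsize(fmt)
    raw = stream.read(size)
    if len(raw) != size:
        raise EOFError("stream ended inside a primitive value")
    return struct.unpack(fmt, raw)[0]


# An Int32 with the value 42 is stored as 42,0,0,0.
answer = read_primitive(io.BytesIO(bytes([42, 0, 0, 0])), 8)
```

The same lookup can serve ArrayOfPrimitiveType by calling `read_primitive` once per element with the array's type code.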