Binary Serialization Format

by Lluis Sanchez Gual (lluis@ideary.com)

Introduction

This document describes the format used by the class BinaryFormatter to serialize object graphs. The document is based on the analysis of the output of the BinaryFormatter of the Microsoft .NET runtime, so it is probably not complete, since I cannot be sure that I have tested all cases. In fact, there are some gaps in some tables of codes, so if you find a meaning for the missing codes, please contact me and I'll update the document.

Format description

An object serialization is a sequence of binary elements. A binary element could be, for example, a description of an object, an array, an assembly, etc. Each binary element has a specific format, which is described in the following sections.

This table shows the available binary elements:

Code Label Description
0 Header Always written at the beginning of a serialization
1 RefTypeObject Object with no type metadata
4 RuntimeObject Corlib object
5 ExternalObject Object
6 String String
7 GenericArray Array
8 BoxedPrimitiveTypeValue Primitive type value
9 ObjectReference Object reference
10 NullValue Null value
11 End End of stream
12 Assembly Assembly declaration
13 ArrayFiller8b Null filler (8 bit length)
14 ArrayFiller32b Null filler (32 bit length)
15 ArrayOfPrimitiveType Array of primitive type
16 ArrayOfObject Array of Object
17 ArrayOfString Array of string
21 MethodCall Method call
22 MethodResponse Method response

All elements begin with a byte that identifies the type of element. It is shown in the "Code" column. In the implementation of the formatter I use an enum to represent those codes. The "Label" column is the name of the corresponding enum element.
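
The enum itself is not reproduced in this document. As an illustration only, a minimal Python sketch of such an enum, built from the table above, could look like this. The names BinaryElement and element_at are hypothetical and are not taken from any real implementation.

```python
from enum import IntEnum


class BinaryElement(IntEnum):
    """Element codes copied from the table of binary elements."""
    Header = 0
    RefTypeObject = 1
    RuntimeObject = 4
    ExternalObject = 5
    String = 6
    GenericArray = 7
    BoxedPrimitiveTypeValue = 8
    ObjectReference = 9
    NullValue = 10
    End = 11
    Assembly = 12
    ArrayFiller8b = 13
    ArrayFiller32b = 14
    ArrayOfPrimitiveType = 15
    ArrayOfObject = 16
    ArrayOfString = 17
    MethodCall = 21
    MethodResponse = 22


def element_at(data: bytes, offset: int = 0) -> BinaryElement:
    """Return the kind of element that starts at the given offset.

    Codes missing from the table (2, 3, 18, 19, 20) raise ValueError.
    """
    return BinaryElement(data[offset])
```

A reader would typically call element_at on the first byte of each element and dispatch on the result.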

An example

The best way to understand the format is to look at an example. Let's see how the following structure of classes would be serialized:

class A
{
     B bval = new B();
     C cval = new C();
     string msg = "hello";
}

class B
{
     string str = "bye";
}

struct C
{
     string[] info = new string[] {"hello","world"};
}

The serialization of an instance of class A would result in a sequence of binary elements like the following:

Element
    Bytes                         Data
    Comments

Header
    0                             Element code
    1,0,0,0,                      ?
    255,255,255,255,
    1,0,0,0,0,0,0,0
    This sequence of bytes is serialized at the beginning. I'm sure it has a meaning, but I don't know it.

Assembly
    12                            Element code
    1,0,0,0                       ID of the assembly (1)
    "MyAssembly"                  Full name of the assembly
    Before serializing an object, the assembly where the object is implemented has to be serialized. The formatter assigns an ID to the assembly (ID 1 in this case). This ID will be used to refer to this assembly.

ExternalObject
    5                             Element code
    2,0,0,0                       Object ID (2)
    "A"                           Class name
    3,0,0,0                       Field count
    "bval","cval","msg"           Field names
    4,4,1                         Field type tags
    "B"                           Class name of field "bval"
    1,0,0,0                       Assembly ID of field "bval"
    "C"                           Class name of field "cval"
    1,0,0,0                       Assembly ID of field "cval"
    1,0,0,0                       Assembly ID of this object
    Serialization of the root object. Each object has an ID that is used, for example, to specify object relations. The object binary element has two parts. The first one is type metadata: the name and type of serialized fields. The second part is the object data: field values. The data part is shown in the following nested elements.

    ObjectReference
        9                         Element code
        5,0,0,0                   ID of the referred object (5)
        Reference objects are not serialized inside the container element. Instead, an ObjectReference is serialized, and the object itself is queued for later serialization.

    ExternalObject
        5                         Element code
        3,0,0,0                   Object ID (3)
        "C"                       Class name
        1,0,0,0                   Field count
        "info"                    Field name
        6                         Field type tag
        1,0,0,0                   Assembly ID of this object
        On the other hand, value type objects are serialized inside the container element.

        ObjectReference
            9                     Element code
            7,0,0,0               ID of the referred object (7)
            This is again a reference object, so it is serialized later.

    String
        6                         Element code
        4,0,0,0                   Object ID (4)
        "hello"                   String value
        Strings are serialized like value objects.

ExternalObject
    5                             Element code
    5,0,0,0                       Object ID (5)
    "B"                           Class name
    1,0,0,0                       Field count
    "str"                         Field name
    1                             Field type tag
    1,0,0,0                       Assembly ID of this object
    Reference objects queued for serialization are serialized after the root object.

    String
        6                         Element code
        6,0,0,0                   Object ID (6)
        "bye"                     String value
        A string.

ArrayOfString
    17                            Element code
    7,0,0,0                       Object ID (7)
    2,0,0,0                       Element count
    This could also be encoded using the binary element GenericArray (7), but ArrayOfString is more specific and saves bytes.

    ObjectReference
        9                         Element code
        4,0,0,0                   ID of the referred object (4)
        This string was already serialized. Use a backwards reference.

    String
        6                         Element code
        8,0,0,0                   Object ID (8)
        "world"                   String value
        Another string.
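
To make the byte layout of the example more concrete, here is a small Python sketch that builds the Header, the Assembly element and the "hello" String element from the values listed in the example. The helper names are hypothetical. The sketch assumes little-endian uints (as the 1,0,0,0 byte groups suggest), UTF-8 text, and strings shorter than 128 bytes so that the length prefix fits in one byte.

```python
import struct

# The 17 header bytes exactly as listed in the example; their meaning is not decoded here.
HEADER = bytes([0, 1, 0, 0, 0, 255, 255, 255, 255, 1, 0, 0, 0, 0, 0, 0, 0])


def write_short_string(text: str) -> bytes:
    """Length-prefixed string; only the single-byte length case is handled."""
    raw = text.encode("utf-8")  # assumption: UTF-8 encoded characters
    if len(raw) > 127:
        raise ValueError("longer strings need a multi-byte length prefix")
    return bytes([len(raw)]) + raw


def assembly_element(assembly_id: int, full_name: str) -> bytes:
    """Element 12: code, assembly ID, full name."""
    return bytes([12]) + struct.pack("<I", assembly_id) + write_short_string(full_name)


def string_element(object_id: int, value: str) -> bytes:
    """Element 6: code, object ID, string value."""
    return bytes([6]) + struct.pack("<I", object_id) + write_short_string(value)


stream = HEADER + assembly_element(1, "MyAssembly") + string_element(4, "hello")
print(list(assembly_element(1, "MyAssembly")[:6]))  # [12, 1, 0, 0, 0, 10]
```

This is not a complete stream: in the example the String element is nested inside the root object rather than following the assembly directly.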

Binary elements

The following sections show the format of each binary element. The format is presented in a table with two columns. The first one shows the sequence of bytes and the second one a description of each element in the sequence.

A special notation is used to represent the bytes. Here are some examples:

Example of element Description
(byte) 7 A single byte
uint Any uint value (4 bytes)
type-tag Names such as type-tag, type-spec, string or value are described in the section "Other elements"
string * * represents a sequence of elements
object Full serialization of an object

1 - RefTypeObject

An object is serialized in two parts. The first one is type metadata, and the second one is the object data. When several objects of the same type are serialized, only the first one has the metadata part. The other objects are serialized using the RefTypeObject element, which, instead of the metadata, includes the ID of a previously serialized object of the same type.

Element Description
(byte) 1 Element code
uint Object ID
uint ID of a previously serialized object from which to take type metadata.
value * Values of the fields of the object
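
A reader therefore has to remember the metadata of every object it has decoded. The following Python sketch shows one possible way to do that with a dictionary keyed by object ID. The names metadata_by_object_id, remember_metadata and read_ref_type_object_header are hypothetical, and the field values that follow the two uints are not decoded.

```python
import struct

# Object ID -> (class name, field names), filled in while reading elements 4 and 5.
metadata_by_object_id = {}


def remember_metadata(object_id, class_name, field_names):
    """Record the metadata of an object that was serialized with full metadata."""
    metadata_by_object_id[object_id] = (class_name, field_names)


def read_ref_type_object_header(data, pos=0):
    """Read code 1, the object ID and the ID of the object that carries the metadata.

    Returns (object_id, class_name, field_names, position_of_first_field_value).
    """
    if data[pos] != 1:
        raise ValueError("not a RefTypeObject element")
    object_id, metadata_id = struct.unpack_from("<II", data, pos + 1)
    class_name, field_names = metadata_by_object_id[metadata_id]
    return object_id, class_name, field_names, pos + 9


# Object 5 of the example (class B) was serialized with metadata;
# a second B instance with ID 9 could then reuse it.
remember_metadata(5, "B", ["str"])
print(read_ref_type_object_header(bytes([1, 9, 0, 0, 0, 5, 0, 0, 0])))
```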

4 - RuntimeObject

This element is used to serialize objects of types that are implemented in the core library of the framework. The only difference from the format for other objects is that it does not include assembly information, which is not needed since the assembly will always be mscorlib.

Element Description
(byte) 4 Element code
uint Object ID
string Class name, including namespace
uint Number of serialized fields
string * Names of the fields
type-tag * type-tag of each field
type-spec * type-spec of each field
value * Values of the fields of the object

5 - ExternalObject

This element can be used to serialize any object from any assembly.

Element Description
(byte) 5 Element code
uint Object ID
string Class name, including namespace
uint Number of serialized fields
string * Names of the fields
type-tag * type-tag of each field
type-spec * type-spec of each field
uint ID of the assembly where the class is defined (the assembly must have been serialized before the class using the binary element 12)
value * Values of the fields of the object
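
As an illustration, the metadata part of this element could be decoded along the following lines. This Python sketch stops before the field values. It assumes that each type-tag and each primitive type code occupies one byte (the example shows the tags as single bytes; the size of the primitive type code is an assumption), that uints are little-endian, and that strings are shorter than 128 bytes. All function names are hypothetical.

```python
import struct


def read_uint(data, pos):
    return struct.unpack_from("<I", data, pos)[0], pos + 4


def read_short_string(data, pos):
    length = data[pos]
    if length > 127:
        raise ValueError("multi-byte length prefixes are not handled in this sketch")
    end = pos + 1 + length
    return data[pos + 1:end].decode("utf-8"), end


def read_type_spec(tag, data, pos):
    if tag in (0, 7):            # PrimitiveType, ArrayOfPrimitiveType: primitive type code
        return data[pos], pos + 1
    if tag == 3:                 # RuntimeType: class name
        return read_short_string(data, pos)
    if tag == 4:                 # GenericType: class name and assembly ID
        name, pos = read_short_string(data, pos)
        assembly_id, pos = read_uint(data, pos)
        return (name, assembly_id), pos
    return None, pos             # other tags need no type-spec


def read_external_object_metadata(data, pos=0):
    if data[pos] != 5:
        raise ValueError("not an ExternalObject element")
    object_id, pos = read_uint(data, pos + 1)
    class_name, pos = read_short_string(data, pos)
    count, pos = read_uint(data, pos)
    names = []
    for _ in range(count):
        name, pos = read_short_string(data, pos)
        names.append(name)
    tags = list(data[pos:pos + count])
    pos += count
    specs = []
    for tag in tags:
        spec, pos = read_type_spec(tag, data, pos)
        specs.append(spec)
    assembly_id, pos = read_uint(data, pos)
    return object_id, class_name, list(zip(names, tags, specs)), assembly_id, pos


# Metadata of the root object "A" from the example section.
sample = (b"\x05" + b"\x02\x00\x00\x00" + b"\x01A" + b"\x03\x00\x00\x00"
          + b"\x04bval\x04cval\x03msg" + b"\x04\x04\x01"
          + b"\x01B\x01\x00\x00\x00" + b"\x01C\x01\x00\x00\x00"
          + b"\x01\x00\x00\x00")
print(read_external_object_metadata(sample))
```

The returned position is where the field values start; decoding them requires dispatching on each field's type-tag.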

6 - String

A string value.

Element Description
(byte) 6 Element code
uint Object ID
string Value of the string

7 - GenericArray

This element can be used to represent any array.

Element Description
(byte) 7 Element code
uint Object ID
byte Array type: 0:single dimension, 1: jagged, 2: multi-dimensional
uint Number of dimensions (rank)
uint * Number of elements for each dimension
type-tag type-tag of array's element type
type-spec type-spec of array's element type
value * Values of the elements, row by row
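
Reading "row by row" as row-major order (the last dimension varies fastest), a flat list of decoded values can be folded back into nested lists as in the following Python sketch. The function name reshape_row_major is hypothetical, and the row-major reading is an interpretation of the table above.

```python
def reshape_row_major(values, lengths):
    """Fold a flat list into nested lists, one nesting level per dimension."""
    if len(lengths) == 1:
        return list(values)
    if lengths[0] == 0:
        return []
    # Number of flat values that belong to each slice of the first dimension.
    step = len(values) // lengths[0]
    return [
        reshape_row_major(values[i * step:(i + 1) * step], lengths[1:])
        for i in range(lengths[0])
    ]


print(reshape_row_major([1, 2, 3, 4, 5, 6], [2, 3]))  # [[1, 2, 3], [4, 5, 6]]
```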

8 - BoxedPrimitiveTypeValue

This element represents a primitive type value boxed as an object.

Element Description
(byte) 8 Element code
type-spec type-spec of the primitive type
primitive-value Raw value

9 - ObjectReference

This element represents a reference to an object already serialized (backwards reference) or that will be serialized later (forward reference).

Element Description
(byte) 9 Element code
uint ID of the referred object
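
Because a reference can point forward, a reader cannot always resolve it at the moment it is read. One possible approach, sketched below in Python, is to store a placeholder and patch it once all objects are known. The Ref class and patch_references function are hypothetical; the sample data is a simplified model of the example section (only the objects that are referred to are listed by ID).

```python
class Ref:
    """Placeholder left where an ObjectReference element was read."""

    def __init__(self, object_id):
        self.object_id = object_id


def patch_references(container, objects, seen=None):
    """Replace every Ref inside a dict or list by the object it refers to."""
    seen = set() if seen is None else seen
    if id(container) in seen:        # guards against cyclic graphs
        return
    seen.add(id(container))
    keys = container.keys() if isinstance(container, dict) else range(len(container))
    for key in list(keys):
        value = container[key]
        if isinstance(value, Ref):
            value = container[key] = objects[value.object_id]
        if isinstance(value, (dict, list)):
            patch_references(value, objects, seen)


objects = {
    2: {"bval": Ref(5), "cval": {"info": Ref(7)}, "msg": "hello"},
    4: "hello",
    5: {"str": "bye"},
    7: [Ref(4), "world"],
}
patch_references(objects[2], objects)
print(objects[2]["cval"]["info"])  # ['hello', 'world']
```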

10 - NullValue

A null value.

Element Description
(byte) 10 Element code

11 - End

This element marks the end of the serialized object graph.

Element Description
(byte) 11 Element code

12 - Assembly

Defines an assembly. Each assembly is defined only once and has an ID. This ID is used when serializing an object (element 5) to specify the assembly where object's type is implemented.

Element Description
(byte) 12 Element code
uint Assembly ID
string Full name of the assembly

13 - ArrayFiller8b

This element can be used when serializing array data to specify multiple consecutive null values. It is only used in single dimension arrays of reference objects (not valid for value-type objects).

Element Description
(byte) 13 Element code
byte Number of consecutive null values

14 - ArrayFiller32b

The same as ArrayFiller8b, but it uses a uint to specify the length.

Element Description
(byte) 14 Element code
uint Number of consecutive null values
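
As a sketch of how a reader might expand both fillers while decoding array data, the following Python function handles only NullValue, ObjectReference and the two filler elements; a real array can contain other elements as well. The function name and the ("ref", id) tuples are hypothetical.

```python
import struct


def read_nullable_elements(data, pos, count):
    """Read `count` array slots, expanding null fillers into None values."""
    values = []
    while len(values) < count:
        code = data[pos]
        if code == 10:                                   # NullValue
            values.append(None)
            pos += 1
        elif code == 13:                                 # ArrayFiller8b
            values.extend([None] * data[pos + 1])
            pos += 2
        elif code == 14:                                 # ArrayFiller32b
            run = struct.unpack_from("<I", data, pos + 1)[0]
            values.extend([None] * run)
            pos += 5
        elif code == 9:                                  # ObjectReference
            values.append(("ref", struct.unpack_from("<I", data, pos + 1)[0]))
            pos += 5
        else:
            raise ValueError("element code %d is not handled in this sketch" % code)
    return values, pos


data = bytes([9, 4, 0, 0, 0, 13, 3, 10, 14, 2, 0, 0, 0])
print(read_nullable_elements(data, 0, 7))
```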

15 - ArrayOfPrimitiveType

This element can be used to represent a single dimension array of primitive type values.

Element Description
(byte) 15 Element code
uint Object ID
uint Number of elements
type-spec type-spec of array's element type
primitive-value * Values of the elements
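
For fixed-size numeric types the whole element can be decoded with a few struct calls. The Python sketch below assumes that the type-spec is a one-byte primitive type code and that values are little-endian; it covers only the codes listed in STRUCT_FORMATS (Char, Decimal, DateTime and String are left out). The names are hypothetical.

```python
import struct

# primitive-type-code -> struct format character (fixed-size types only).
STRUCT_FORMATS = {
    1: "?", 2: "B", 6: "d", 7: "h", 8: "i", 9: "q",
    10: "b", 11: "f", 14: "H", 15: "I", 16: "Q",
}


def read_primitive_array(data, pos=0):
    """Return (object_id, values, position_after_element)."""
    if data[pos] != 15:
        raise ValueError("not an ArrayOfPrimitiveType element")
    object_id, count = struct.unpack_from("<II", data, pos + 1)
    type_code = data[pos + 9]
    fmt = "<%d%s" % (count, STRUCT_FORMATS[type_code])
    values = list(struct.unpack_from(fmt, data, pos + 10))
    return object_id, values, pos + 10 + struct.calcsize(fmt)


# An int[] {7, -1} with object ID 3.
sample = bytes([15, 3, 0, 0, 0, 2, 0, 0, 0, 8]) + struct.pack("<ii", 7, -1)
print(read_primitive_array(sample))  # (3, [7, -1], 18)
```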

16 - ArrayOfObject

This element can be used to represent a single dimension array of Object (i.e. an object[] ).

Element Description
(byte) 16 Element code
uint Object ID
uint Number of elements
object * Values of the elements

17 - ArrayOfString

This element can be used to represent a single dimension array of String (i.e. a string[] ).

Element Description
(byte) 17 Element code
uint Object ID
uint Number of elements
object * Values of the elements

21 - MethodCall

Represents a method call. The format of a method call can vary depending on the type of the parameters. The following table shows the common format:

Element Description
(byte) 21 Element code
method-call-flags Describes which information the method call includes
(byte) 0, 0, 0 ???
type-spec primitive-value Method name
type-spec primitive-value Class name (including namespace and assembly)

The following tables describe the format of the message content depending on the value of method-call-flags:

method-call-flags & NoArguments

Used for calls to methods without parameters.

Element Description
Header[] Only if there are Headers and method-call-flags has the flag IncludesLogicalCallContext. Headers are serialized only if there is context info. This must be a bug in MS.NET.
object[] Array with the following values:

  • Method signature, only if method-call-flags has the flag IncludesSignature. It is an array of Type.
  • LogicalCallContext instance, only if method-call-flags has the flag IncludesLogicalCallContext.

If the array is empty, it is not serialized.

method-call-flags & PrimitiveArguments

Used for calls to methods in which all parameters are primitive types.

Element Description
uint Number of parameters
( type-spec primitive-value ) * One value for each parameter
Header[] Only if there are Headers and method-call-flags has the flag IncludesLogicalCallContext. Headers are serialized only if there is context info. This must be a bug in MS.NET.
object[] Array with the following values:

  • Method signature, only if method-call-flags has the flag IncludesSignature. It is an array of Type.
  • LogicalCallContext instance, only if method-call-flags has the flag IncludesLogicalCallContext.

If the array is empty, it is not serialized.

method-call-flags & ArgumentsInSimpleArray

Used for calls to methods in which at least one parameter is not a primitive type, and when no other info needs to be serialized (i.e. context or signature).

Element Description
Header[] Only if there are Headers.
object[] Array of parameters.

method-call-flags & ArgumentsInMultiArray

Used for calls to methods in which at least one parameter is not a primitive type, and when other info needs to be serialized (i.e. context or signature).

Element Description
Header[] Only if there are Headers.
object[] Array with the following values:

  • Array of parameters.
  • Method signature, only if method-call-flags has the flag IncludesSignature. It is an array of Type.
  • LogicalCallContext instance, only if method-call-flags has the flag IncludesLogicalCallContext.

If the array is empty, it is not serialized.

22 - MethodResponse

Represents a method response. The format of a method response can vary depending on the type of the return value and parameters. The following table shows the common format:

Element Description
(byte) 22 Element code
method-response-flags Describes which information the method response includes
return-type-tag Describes which kind of value is returned
(bytes) 0, 0 ???

The following tables describe the format of the message content depending on the value of method-response-flags:

method-response-flags & NoArguments

Used when the method has no out arguments.

Element Description
type-spec primitive-value Return value. Only if return-type-tag was PrimitiveType.
Header[] Only if there are Headers.
object[] Array with the following values:

  • Return value, only if return-type-tag was ObjectType
  • LogicalCallContext instance, only if method-response-flags has the flag IncludesLogicalCallContext

If the array is empty, it is not serialized.

method-response-flags & PrimitiveArguments

Used when all out arguments are primitive types.

Element Description
type-spec primitive-value Return value. Only if return-type-tag was PrimitiveType.
uint Number of out arguments
( type-spec primitive-value ) * One value for each argument
Header[] Only if there are Headers. Empty otherwise.
object[] Array with the following values:

  • Return value, only if return-type-tag was ObjectType
  • LogicalCallContext instance, only if method-response-flags has the flag IncludesLogicalCallContext

If the array is empty, it is not serialized.

method-response-flags & ArgumentsInSimpleArray

Used when at least one out argument is not a primitive type, return type is primitive, and no other info needs to be serialized.

Element Description
type-spec primitive-value Return value. Only if return-type-tag was PrimitiveType.
Header[] Only if there are Headers.
object[] Array that contains the out arguments

method-response-flags & ArgumentsInMultiArray

Used when at least one out argument is not a primitive type, and other info needs to be serialized in the array (an object return value or context).

Element Description
type-spec primitive-value Return value. Only if return-type-tag was PrimitiveType.
Header[] Only if there are Headers.
object[] Array with the following values:

  • Array of out arguments.
  • Return value, only if return-type-tag was ObjectType
  • LogicalCallContext instance, only if method-response-flags has the flag IncludesLogicalCallContext

Other elements

string

A string value, serialized using BinaryWriter. It serializes the length of the string, using a 7-bit encoded int, and then the string chars.
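
The 7-bit encoding stores the length seven bits at a time, least significant group first, and sets the high bit of every byte except the last one. A minimal Python sketch of both directions follows. It assumes that the length counts the encoded bytes and that the characters are UTF-8 encoded (the default encoding of BinaryWriter); the function names are hypothetical.

```python
def write_7bit_encoded_int(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low group first."""
    if value < 0:
        raise ValueError("negative lengths are not supported in this sketch")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)   # high bit set: more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def read_7bit_encoded_int(data, pos=0):
    """Return (value, position_after_the_encoded_int)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def write_string(text: str) -> bytes:
    raw = text.encode("utf-8")
    return write_7bit_encoded_int(len(raw)) + raw


def read_string(data, pos=0):
    """Return (text, position_after_the_string)."""
    length, pos = read_7bit_encoded_int(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


print(write_7bit_encoded_int(300).hex())  # ac02
print(read_string(write_string("hello")))  # ('hello', 6)
```

Strings shorter than 128 bytes therefore have a single length byte, which is why short class and field names look like simple length-prefixed strings in a hex dump.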

primitive-value

A primitive value. It can be serialized using BinaryWriter and deserialized using BinaryReader. DateTime is serialized as a long (using the Ticks property).
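
For example, an Int32 is four little-endian bytes and a DateTime is an eight-byte count of ticks, where a tick is 100 nanoseconds counted from 0001-01-01 00:00:00. The Python sketch below decodes both. The function names are hypothetical, and the tick value is treated as a plain signed 64-bit integer, as described above, without any extra flag bits.

```python
import struct
from datetime import datetime, timedelta


def read_int32(data, pos=0):
    """Return (value, position_after_the_value)."""
    return struct.unpack_from("<i", data, pos)[0], pos + 4


def read_datetime(data, pos=0):
    """Decode a DateTime stored as a 64-bit tick count.

    Python datetimes have microsecond resolution, so the last decimal
    digit of the tick count (100 ns units) is dropped.
    """
    ticks = struct.unpack_from("<q", data, pos)[0]
    return datetime(1, 1, 1) + timedelta(microseconds=ticks // 10), pos + 8


# 621355968000000000 ticks correspond to 1970-01-01 00:00:00.
print(read_datetime(struct.pack("<q", 621355968000000000))[0])
```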

value

It can be a primitive-value or any of the following binary elements:

type-tag

Together with a type-spec value, identifies a type. Some types can be represented using several type-tags. In this case, the most specific type-tag is always used (it takes fewer bytes).

type-tag can be one of the following:

Value Label Description type-spec needed
0 PrimitiveType A primitive type The code of the primitive type
1 String String class. type-spec is not needed. Not needed
2 ObjectType Object class. type-spec is not needed. Not needed
3 RuntimeType A type from the .NET runtime (including arrays of .NET types) The name of the class
4 GenericType Any other type (including arrays) The name of the class and the id of the assembly
5 ArrayOfObject Array of class Object Not needed
6 ArrayOfString Array of class String Not needed
7 ArrayOfPrimitiveType Array of primitive type The code of the primitive type

type-spec

It is the name or the code of a type. To decode it, a type-tag value is needed. The following tables show the format of type-spec for each type-tag value:

type-tag = PrimitiveType or ArrayOfPrimitiveType

Element Description
primitive-type-code The code of the primitive type

type-tag = RuntimeType

Element Description
string The name of the class, including the namespace

type-tag = GenericType

Element Description
string The name of the class, including the namespace
uint Id of the assembly where the class is defined

Other type-tag

For other type-tag values, no type-spec is needed.

method-call-flags

Value Label Description
1 NoArguments No arguments included
2 PrimitiveArguments Primitive type arguments
4 ArgumentsInSimpleArray At least one argument is not of a primitive type
8 ArgumentsInMultiArray At least one argument is not of a primitive type and other info is included in the message (context or signature)
16 ExcludeLogicalCallContext LogicalContext not included
32 ???
64 IncludesLogicalCallContext LogicalContext included
128 IncludesSignature Signature is included in the message. It is only included when calling an overloaded method.
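
Since the values are powers of two, they can be combined and tested as bit flags. The following Python sketch models the table above with an IntFlag and lists the flags set in a given byte. The class and function names are hypothetical, the unknown value 32 is left out, and the flags are treated as a single byte as in the MethodCall table.

```python
from enum import IntFlag


class MethodCallFlags(IntFlag):
    NoArguments = 1
    PrimitiveArguments = 2
    ArgumentsInSimpleArray = 4
    ArgumentsInMultiArray = 8
    ExcludeLogicalCallContext = 16
    IncludesLogicalCallContext = 64
    IncludesSignature = 128


def flag_names(flags_byte: int):
    """Names of the known flags that are set in the given byte."""
    return [flag.name for flag in MethodCallFlags if flags_byte & flag]


print(flag_names(2 | 16))  # ['PrimitiveArguments', 'ExcludeLogicalCallContext']
```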

method-response-flags

Value Label Description
1 NoArguments Response with no out arguments
2 PrimitiveArguments Response with primitive type out arguments
4 ArgumentsInSimpleArray Response with primitive type return value, and with at least one out argument that is not a primitive type.
8 ArgumentsInMultiArray Response with at least one out argument that is not a primitive type, and other info is included in the message (context or signature)
16 ExcludeLogicalCallContext LogicalContext not included
32 ???
64 IncludesLogicalCallContext LogicalContext included

return-type-tag

Value Label Description
2 Null Null return value
8 PrimitiveType Primitive type return value
16 ObjectType Object instance return value
32 Exception Method response is an exception

primitive-type-code

Value Label
1 Boolean
2 Byte
3 Char
5 Decimal
6 Double
7 Int16
8 Int32
9 Int64
10 SByte
11 Single
13 DateTime
14 UInt16
15 UInt32
16 UInt64
18 String


2003 (C) Lluis Sanchez Gual  ( lluis@ideary.com)