Site Change

So, after this site sat untouched for a very long time, a copious amount of spam had accumulated in both the wiki and the phpBB forums. Given that the wiki held a grand total of 3 articles anyway, I’ve decided the best option is to just post stuff here instead. I’ve also removed the old blog posts, which were just me mumbling about what to do with the website.

In this incredibly long post I reproduce the only article from the wiki that I wanted to keep visible to the public.

View State
ViewState in .NET 2.0 is encoded using the ObjectStateFormatter Class, which optimises the encoding for common cases. Knowing these common cases can help when deciding what to store in viewstate.

The persisted viewstate object is itself a complex object built from the page content.

At the top level it is a Pair class with:

  • A HybridDictionary of control states, indexed by control ID (each control state recursively contains the control states of its subcontrols)
  • A pair containing:
    • Page.GetTypeHashCode().ToString(NumberFormatInfo.InvariantInfo)
    • Constructed ViewState

Constructed ViewState is either:

A Pair:

  • Page.SaveViewState() output (or Control.SaveViewState() if it is the Constructed ViewState for a control)
  • ArrayList containing alternating entries of ControlID and Constructed ViewState for the control.

Or a Triplet:

  • Page.Adapter.SaveAdapterViewState() output (or Control.Adapter.SaveAdapterViewState()…)
  • Page.SaveViewState() output (or Control.SaveViewState() …)
  • ArrayList containing alternating entries of ControlID and Constructed ViewState for the control.

ObjectStateFormatter

Assembly: System.Web
Namespace: System.Web.UI
New in .NET 2.0

ObjectStateFormatter Format

The MSDN docs describe the format only in basic terms; what follows is a more detailed description which may assist in optimising the viewstate for a page. For instance, the MSDN docs state that Enum is optimised, but as the tag list shows, a typed enum (tag 0x0B) is stored together with its type, so storing the enum's integer value in the viewstate directly is much more efficient.

This format may change in future versions of .NET, so keep that in mind.

The overall format consists of two leading bytes 0xFF 0x01 followed by encoded data as follows.

The basic format consists of a ‘tag’ which is 1 byte long, followed by data, which is handled according to the tag. In some circumstances the data following a tag itself contains nested content in the same format.

* Tag: 0x01
* Data: A short encoded using BinaryWriter Class
* Represents: a short.

* Tag: 0x02
* Data: Encoded integer. 7 bits are stored in each byte; the top bit is 0 when there are no more bytes. The first byte contains the least significant 7 bits, the next byte contains the next least significant 7 bits, and so on.
* Represents: a non-zero integer (zero has its own tag, 0x66).
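
The 7 bit per byte scheme described for tag 0x02 can be sketched in a few lines of Python. This is only an illustration of the byte layout described above, not the framework's code; the function names are made up for the example, and the sketch assumes non-negative values only.

```python
from typing import Tuple


def encode_7bit(value: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # top bit set: more bytes follow
        value >>= 7
    out.append(value)  # top bit clear: this is the last byte
    return bytes(out)


def decode_7bit(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, next_position) for the integer starting at data[pos]."""
    value = 0
    shift = 0
    while True:
        b = data[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b & 0x80 == 0:  # top bit clear: no more bytes
            return value, pos
        shift += 7
```

For example, 300 takes two bytes (0xAC 0x02), while any value below 128 fits in a single byte.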

* Tag: 0x03
* Data: A byte encoded using BinaryWriter Class
* Represents: a byte.

* Tag: 0x04
* Data: A char encoded using BinaryWriter Class
* Represents: a char.

* Tag: 0x05
* Data: A string encoded using BinaryWriter Class (This is UTF8)
* Represents: a non-empty string.

* Tag: 0x06
* Data: A long encoded using BinaryWriter Class
* Represents: A DateTime instance, where the long is the output from DateTime.ToBinary()

* Tag: 0x07
* Data: A double encoded using BinaryWriter Class
* Represents: a double.

* Tag: 0x08
* Data: A float encoded using BinaryWriter Class
* Represents: a float.

* Tag: 0x09
* Data: An int encoded using BinaryWriter Class.
* Represents: An instance of the Color struct. The encoded int is an ARGB value from Color.ToArgb().

* Tag: 0x0A
* Data: A 7 bit per byte encoded value (see above).
* Represents: A Color corresponding to a member of the KnownColor enumeration. Value indicates the value of the KnownColor to which the stored color corresponds.

* Tag: 0x0B
* Data: An encoded type (see Data definition for Tag 0x19) followed by a 7 bit per byte encoded value (see above).
* Represents: An instance of a typed enum which has a base type of int. The encoded type is the type of the enum, and the value is the integer value of the enum instance.

* Tag: 0x0C
* Data: Empty (0 bytes)
* Represents: Color.Empty

* Tag: 0x0F
* Data: Two consecutive blocks encoded using this format.
* Represents: an instance of the Pair Class – first encoded block is the First property, second is the Second property

* Tag: 0x10
* Data: Three consecutive blocks encoded using this format.
* Represents: an instance of the Triplet Class – first is the First property, second is the Second property, third is the Third property

* Tag: 0x14
* Data: An encoded type (see Data definition for Tag 0x19) followed by a 7 bit per byte encoded count (see above) followed by ‘count’ blocks encoded using this format.
* Represents: A one-dimensional typed array which contains more than 25% non-null values. The encoded type represents the type of the array. The encoded count represents the length. The blocks represent the contents of the array encoded in order from 0th element to the length-1 element.

* Tag: 0x15
* Data: 7bit per byte encoded count (see above) followed by ‘count’ strings encoded with BinaryWriter Class (This uses UTF8).
* Represents: an instance of string[] which contains no null values.

* Tag: 0x16
* Data: 7bit per byte encoded count (see above) followed by ‘count’ encoded blocks.
* Represents: an instance of the ArrayList Class – not a class which inherits from ArrayList, each block represents a value stored in the ArrayList in the order they are stored.

* Tag: 0x17
* Data: 7bit per byte encoded count (see above) followed by 2*’count’ encoded blocks.
* Represents: an instance of the Hashtable Class – not a class which inherits from Hashtable, each pair of blocks represents a key value pair from the Hashtable.

* Tag: 0x18
* Data: 7bit per byte encoded count (see above) followed by 2*’count’ encoded blocks.
* Represents: an instance of the HybridDictionary Class – not a class which inherits from HybridDictionary, each pair of blocks represents a key value pair from the HybridDictionary

* Tag: 0x19
* Data: a tag byte followed by tag specific data.
* Represents: a type.
o If tag byte is 0x29 the following data is the full name string of a type in the System.Web Assembly encoded using BinaryWriter Class.
o If tag byte is 0x2A the following data is the assembly qualified name string of a type encoded using BinaryWriter Class.
o If tag byte is 0x2B the following data is a 7bit per byte encoded integer index into the order of arrival of types to the encoder. Unlike IndexedString encoding, the counter is a full int and not just a byte, so there is no elaborate resetting system.
o Subtags 0x29 and 0x2A are only used for the first occurrence of each specific type in the output stream. All subsequent references use 0x2B subtags.
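
A decoder for these type references might look roughly like the following Python sketch. It assumes the 0x2B index is zero-based (the description above does not say), and the helper names and the type name used in the usage note are hypothetical.

```python
def read_7bit(data, pos):
    """Read a 7 bit per byte encoded integer; return (value, next_position)."""
    value, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b & 0x80 == 0:
            return value, pos
        shift += 7


def read_string(data, pos):
    """BinaryWriter strings: 7-bit encoded byte length, then UTF-8 bytes."""
    length, pos = read_7bit(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


def read_type_ref(data, pos, seen_types):
    """Read one encoded type; seen_types lists type names in order of arrival."""
    subtag = data[pos]
    pos += 1
    if subtag in (0x29, 0x2A):
        # First occurrence: the name is written out and remembered.
        name, pos = read_string(data, pos)
        seen_types.append(name)
        return name, pos
    if subtag == 0x2B:
        # Later occurrence: only an index into the remembered types
        # (assumed zero-based in this sketch).
        index, pos = read_7bit(data, pos)
        return seen_types[index], pos
    raise ValueError("unknown type subtag 0x%02X" % subtag)
```

Reading the bytes 0x29, 0x0E, "Example.Widget", 0x2B, 0x00 with an empty seen_types list yields the same type name twice: 16 bytes for the first reference, 2 bytes for the second.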

* Tag: 0x1A (not used by encoder??)
* Data: An encoded type (see Data definition for Tag 0x19) followed by a block encoded using this format.
* Represents: An instance of a nullable type. Implementation appears broken, as it uses reflection to access a FromObject method on the Nullable`1 Class which does not seem to exist. The encoded type is supposed to be the underlying type of the nullable instance, and the encoded block the current value of the nullable instance.

* Tag: 0x1B
* Data: A double encoded using BinaryWriter Class followed by an int encoded using BinaryWriter Class
* Represents: An instance of the Unit Class. The double corresponds to the value of the Unit, and the int corresponds to the enumeration value of the UnitType property of the Unit.

* Tag: 0x1C
* Data: Empty (0 bytes)
* Represents: Unit.Empty

* Tag: 0x1E
* Data: string encoded using BinaryWriter Class (This is UTF8)
* Represents: An occurrence of an IndexedString. Guaranteed to be different from the previous 256 occurrences encoded using this tag.

* Tag: 0x1F
* Data: a single byte.
* Represents: An IndexedString. Which IndexedString is determined by using the byte as a lookup into the current IndexedStrings array.

* Note: the above two tags are encoded as follows. A lookup is used which contains up to 255 entries. As each IndexedString arrives at the encoder, it is checked to see if it is in the lookup. If so, the index stored in the lookup is written using the 0x1F tag. Otherwise it is stored in the lookup as the next number, unless the next number is 255. If the next number is 255 it replaces the entry mapped to 0 in the lookup, and the next number is set to 1. The IndexedString is then written using the 0x1E tag. This can be decoded by reading each 0x1E tag encoding into a lookup and, when the lookup index reaches 255, replacing 0 then 1 then 2 … etc in turn. 0x1F is decoded by indexing into the lookup array.
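
The lookup scheme in the note above can be modelled with a small Python sketch. It follows the note's description rather than the framework source, works on (tag, payload) tuples instead of raw bytes, and the class names are invented for the example.

```python
TABLE_SIZE = 255  # slots 0..254, as described in the note above


class IndexedStringEncoder:
    def __init__(self):
        self.slots = [None] * TABLE_SIZE  # slot index -> string
        self.lookup = {}                  # string -> slot index
        self.next_slot = 0

    def encode(self, s):
        """Return (0x1F, index) if s is in the lookup, else (0x1E, s)."""
        if s in self.lookup:
            return (0x1F, self.lookup[s])
        if self.next_slot == TABLE_SIZE:
            self.next_slot = 0  # wrap around and overwrite from slot 0
        evicted = self.slots[self.next_slot]
        if evicted is not None:
            del self.lookup[evicted]  # the old string can no longer be referenced
        self.slots[self.next_slot] = s
        self.lookup[s] = self.next_slot
        self.next_slot += 1
        return (0x1E, s)


class IndexedStringDecoder:
    def __init__(self):
        self.slots = [None] * TABLE_SIZE
        self.next_slot = 0

    def decode(self, tag, payload):
        """Mirror the encoder: 0x1E stores a new string, 0x1F looks one up."""
        if tag == 0x1F:
            return self.slots[payload]
        if tag != 0x1E:
            raise ValueError("not an IndexedString tag: 0x%02X" % tag)
        if self.next_slot == TABLE_SIZE:
            self.next_slot = 0
        self.slots[self.next_slot] = payload
        self.next_slot += 1
        return payload
```

Because both sides advance the same counter only on 0x1E tokens, the decoder's table stays in step with the encoder's without any index ever being written alongside the string.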

* Tag: 0x28
* Data: An encoded type (see Data definition for Tag 0x19) followed by a string encoded using BinaryWriter Class (This is UTF8)
* Represents: An instance of an object which has a type converter which can convert to/from string. The encoded type represents the type of the object, and the string represents the encoding using TypeConverter.ConvertToInvariantString(Object).

* Tag: 0x32
* Data: A 7bit per byte encoded length (see above) followed by a byte array stored using BinaryWriter Class.
* Represents: An instance of an object which can be stored using BinaryFormatter. The length indicates the length of the byte array which follows, and the byte array itself represents the output of BinaryFormatter.Serialize(Stream,Object). A memory stream to hold this output during encoding is by default initialized to 256 bytes in length. Decoding uses a pool of 4k memory streams.

* Tag: 0x3C
* Data: An encoded type (see Data definition for Tag 0x19) followed by a 7 bit per byte encoded length followed by a 7 bit per byte encoded count followed by ‘count’ pairs of 7 bit per byte encoded indexes and data encoded using this format.
* Represents: a typed one dimensional array with a large number of null values (>75%). The encoded length represents the length of the array, and the encoded count represents the number of non-null values. The following pairs represent indexes into the array and the non-null values stored in them.
* Note: the encoder uses recursion at this point, so theoretically an excessively deeply nested sparse array could cause a stack overflow. Extremely unlikely though.
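
As a rough Python illustration of the sparse array layout for tag 0x3C, the sketch below converts between a full list and the (length, count, index/value pairs) form. It works on already-decoded values, with None standing in for null, and the function names are made up for the example.

```python
def to_sparse(values):
    """Return (length, count, [(index, value), ...]) for the non-null entries."""
    entries = [(i, v) for i, v in enumerate(values) if v is not None]
    return len(values), len(entries), entries


def expand_sparse(length, entries):
    """Rebuild the full list; positions without an entry stay None."""
    result = [None] * length
    for index, value in entries:
        if not 0 <= index < length:
            raise ValueError("index %d outside array of length %d" % (index, length))
        result[index] = value
    return result
```

An 8 element array holding a single value at index 7 becomes length 8, count 1 and one index/value pair, instead of eight encoded blocks.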

* Tag: 0x64
* Data: Empty (0 bytes)
* Represents: null reference

* Tag: 0x65
* Data: Empty (0 bytes)
* Represents: empty string

* Tag: 0x66
* Data: Empty (0 bytes)
* Represents: The integer 0

* Tag: 0x67
* Data: Empty (0 bytes)
* Represents: boolean true

* Tag: 0x68
* Data: Empty (0 bytes)
* Represents: boolean false
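
To make the tag table concrete, here is a minimal Python sketch of a decoder for a small subset of the tags above (0x02, 0x05, 0x0F, 0x10, 0x16 and the empty-data constants). It assumes the viewstate is base64 encoded and not encrypted, ignores any trailing bytes such as a MAC, and represents Pair and Triplet as plain tuples; all names are invented for the example.

```python
import base64

# Tags with no data, mapped to the value they represent.
CONSTANTS = {0x64: None, 0x65: "", 0x66: 0, 0x67: True, 0x68: False}


def _read_7bit(data, pos):
    value, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b & 0x80 == 0:
            return value, pos
        shift += 7


def _read_string(data, pos):
    # BinaryWriter strings: 7-bit encoded byte length, then UTF-8 bytes.
    length, pos = _read_7bit(data, pos)
    return data[pos:pos + length].decode("utf-8"), pos + length


def _read_value(data, pos):
    """Read one tagged block; return (value, next_position)."""
    tag = data[pos]
    pos += 1
    if tag in CONSTANTS:
        return CONSTANTS[tag], pos
    if tag == 0x02:  # non-zero integer
        return _read_7bit(data, pos)
    if tag == 0x05:  # non-empty string
        return _read_string(data, pos)
    if tag in (0x0F, 0x10):  # Pair (2 blocks) or Triplet (3 blocks)
        name, arity = ("Pair", 2) if tag == 0x0F else ("Triplet", 3)
        parts = []
        for _ in range(arity):
            part, pos = _read_value(data, pos)
            parts.append(part)
        return (name,) + tuple(parts), pos
    if tag == 0x16:  # ArrayList: count, then 'count' blocks
        count, pos = _read_7bit(data, pos)
        items = []
        for _ in range(count):
            item, pos = _read_value(data, pos)
            items.append(item)
        return items, pos
    raise NotImplementedError("tag 0x%02X is not handled by this sketch" % tag)


def decode_viewstate(b64_text):
    raw = base64.b64decode(b64_text)
    if raw[:2] != b"\xff\x01":
        raise ValueError("missing 0xFF 0x01 header (encrypted, or a different format?)")
    value, _ = _read_value(raw, 2)  # anything after the value is ignored
    return value
```

Feeding it the base64 of the bytes FF 01 0F 05 02 'h' 'i' 16 03 66 67 02 AC 02 gives a Pair whose First is the string "hi" and whose Second is an ArrayList containing 0, true and 300. Like the real deserializer, the sketch is recursive.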

Notes:
* Deserialization uses recursion, so a particularly deeply nested data structure of supported types could cause a stack overflow (Extremely unlikely)
* ArrayList takes less space to encode than int[], double[] or any basic type array other than string[], because the assembly qualified name for int, double, whatever has to be encoded at least once in the output, plus a type lookup index for every other use.
* No support for generic collections. (Other than binary serialization/type converter)
* No support for multi-dimensional arrays. (Other than binary serialization/type converter)
* No support for nullable type instances.
* If you have an instance of SortedList, the best conversion is not to DictionaryEntry[]; an ArrayList with even indexes for the keys and odd indexes for the values will take much less space and perform better.
* IDictionary is not optimised despite being listed as such in the MSDN docs; only Hashtable and HybridDictionary instances of IDictionary are.
* The above Overall Format is potentially encrypted, or has a keyed MAC appended, depending on Page settings. Finally it is base64 encoded. It is not compressed.
* Guid is not a fundamental type in the encoding. Oddly enough Guid.ToString("N") and Guid.ToByteArray() will encode in virtually the same space, but ToByteArray will have to store a type identifier at least once, and is longer even as the number of GUIDs approaches infinity. Therefore Guid.ToString("N") is the best simple encoding method. new ArrayList(Guid.ToByteArray()) is an equivalent-sized alternative, but harder to reconstruct in load view state. (Another more complex alternative is to convert the ToByteArray output into a string using a better encoding than hex.)
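
The Guid comparison can be checked with some back-of-the-envelope arithmetic. The Python sketch below only counts bytes according to the tag descriptions above. It assumes each byte element costs a 0x03 tag plus one data byte, that a repeated type reference costs 2 bytes (0x2B subtag plus a one-byte index), and it uses base64 as one example of a denser string encoding than hex; the function names are invented for the example.

```python
import base64
import uuid


def size_7bit(n):
    """Number of bytes a 7 bit per byte encoded integer occupies."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


def size_string(text):
    """Tag 0x05 + length prefix + UTF-8 bytes."""
    body = len(text.encode("utf-8"))
    return 1 + size_7bit(body) + body


def size_byte_arraylist(raw):
    """Tag 0x16 + count + one (0x03, byte) block per element."""
    return 1 + size_7bit(len(raw)) + 2 * len(raw)


def size_byte_array(raw, type_ref_size=2):
    """Tag 0x14 + type reference + count + one (0x03, byte) block per element."""
    return 1 + type_ref_size + size_7bit(len(raw)) + 2 * len(raw)


guid = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")
hex_n = guid.hex   # 32 hex digits, the same shape as Guid.ToString("N")
raw = guid.bytes   # 16 bytes; byte order differs from Guid.ToByteArray(), size does not

sizes = {
    "hex string": size_string(hex_n),
    "ArrayList of bytes": size_byte_arraylist(raw),
    "byte[] (type already seen)": size_byte_array(raw),
    "base64 string": size_string(base64.b64encode(raw).decode("ascii")),
}
```

Under these assumptions the hex string and the ArrayList both come to 34 bytes, the byte[] to 36 bytes even once its type has been seen, and a base64 string to 26 bytes.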