The MSpec format

The MSpec format (Message Specification) was a result of a brainstorming session after evaluating a lot of other options.

We simply sat down and started to write an imaginary format ("imaginary" was even the initial name; we then used Machine-Readable SPEC = mspec). Once we had an initial format that seemed to do the trick, we started creating parsers for it and iteratively fine-tuned both the spec and the parsers while implementing new protocols and language templates.

It’s a text-based format.

At the root level of these specs is a set of type, discriminatedType, dataIo and enum blocks.

type elements are objects whose content and structure are independent of the input.

An example would be the TPKTPacket of the S7 format:

[type TPKTPacket
    [const    uint 8                 protocolId 0x03]
    [reserved uint 8                 '0x00']
    [implicit uint 16                len       'payload.lengthInBytes + 4']
    [simple   COTPPacket('len - 4') payload]
]
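To illustrate how the const, reserved and implicit fields of TPKTPacket behave, here is a minimal Python sketch. It is not the code PLC4X generates: the function names serialize_tpkt and parse_tpkt are hypothetical, the payload is treated as opaque bytes rather than a COTPPacket, and big-endian byte order is assumed.

```python
import struct


def serialize_tpkt(payload: bytes) -> bytes:
    """Serialize a TPKT-like packet; 'len' is computed, not stored."""
    length = len(payload) + 4  # implicit: 'payload.lengthInBytes + 4'
    # const protocolId 0x03, reserved 0x00, then the 16-bit length
    return struct.pack(">BBH", 0x03, 0x00, length) + payload


def parse_tpkt(data: bytes) -> bytes:
    """Parse a TPKT-like packet and return the payload bytes."""
    protocol_id, reserved, length = struct.unpack(">BBH", data[:4])
    if protocol_id != 0x03:  # const field: hard error on mismatch
        raise ValueError(f"expected protocolId 0x03, got {protocol_id:#04x}")
    if reserved != 0x00:  # reserved field: only warn on mismatch
        print(f"warning: unexpected reserved value {reserved:#04x}")
    # 'len' is only a local variable used to size the payload ('len - 4')
    return data[4:length]
```

Note that the length never appears as an argument of serialize_tpkt: it is derived from the payload, which is what makes the field implicit.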

A discriminatedType, in contrast, is an object whose content and structure are influenced by the input.

Every discriminated type can contain an arbitrary number of normal fields but must contain exactly one typeSwitch element.

For example, part of the spec for the S7 format looks like this:

[discriminatedType S7Message
    [const         uint 8  protocolId      0x32]
    [discriminator uint 8  messageType]
    [reserved      uint 16 '0x0000']
    [simple        uint 16 tpduReference]
    [implicit      uint 16 parameterLength 'parameter != null ? parameter.lengthInBytes : 0']
    [implicit      uint 16 payloadLength   'payload != null ? payload.lengthInBytes : 0']
    [typeSwitch messageType
        ['0x01' S7MessageRequest
        ]
        ['0x02' S7MessageResponse
            [simple uint 8 errorClass]
            [simple uint 8 errorCode]
        ]
        ['0x03' S7MessageResponseData
            [simple uint 8 errorClass]
            [simple uint 8 errorCode]
        ]
        ['0x07' S7MessageUserData
        ]
    ]
    [optional S7Parameter ('messageType')              parameter 'parameterLength > 0']
    [optional S7Payload   ('messageType', 'parameter') payload   'payloadLength > 0'  ]
]

A type’s start is declared by an opening square bracket [ followed by the type or discriminatedType keyword, which is directly followed by a name. A type definition ends with a closing square bracket ].

Every type definition contains a list of so-called fields.

The available field types are:

  • abstract: used in the parent type declaration to declare a field that has to be defined with the identical type in all subtypes (reserved for discriminatedType).

  • array: array of simple or complex typed objects.

  • assert: generally similar to const fields; however, they throw AssertionExceptions instead of hard ParseExceptions. They are used in combination with optional fields.

  • checksum: used for calculating and verifying checksum values.

  • const: expects a given value and causes a hard exception if the value doesn’t match.

  • discriminator: special type of simple typed field which is used to determine the concrete type of object (reserved for discriminatedType).

  • enum: special form of field, used if an enum type’s property is to be used instead of its primary value.

  • implicit: a field that is required for parsing but is usually defined through other data, so it’s not stored in the object but calculated on serialization.

  • manualArray: like an array field; however, the logic for serializing, parsing, number of elements and size has to be provided manually.

  • manual: simple field, where the logic for parsing, serializing and size has to be provided manually.

  • optional: simple or complex typed object, that is only present if an optional condition expression evaluates to true and no AssertionException is thrown when parsing the referenced type.

  • padding: field used to add padding data to align data structures.

  • peek: field that tries to parse a given structure without actually consuming the bytes.

  • reserved: expects a given value, but only warns if the condition is not met.

  • simple: simple or complex typed object.

  • typeSwitch: not a real field, but indicates the existence of subtypes, which are declared inline (reserved for discriminatedType).

  • unknown: field used to declare parts of a message that still have to be defined. Generally used when reverse-engineering a protocol. Messages with unknown fields can only be parsed and not serialized.

  • validation: not actually a real field, but a condition that is checked during parsing. If the check fails, it throws a validation exception, which is handled by the first optional field up the stack, if there is one.

  • virtual: generates a field in the message, that is generally only used for simplification. It’s not used for parsing or serializing.

The full syntax and explanations of these field types follow in the sections below.

Another thing we have to explain is how types are specified.

In general, we distinguish between two types of types used in field definitions:

  • simple types

  • complex types

Simple Types

Simple types are usually raw data. The format is:

{base-type} {size}

The base types available are currently:

  • bit: Simple boolean value or bit.

  • byte: Special value fixed to 8 bit, which defaults to either signed or unsigned depending on the programming language (in Java it defaults to signed integer values, and in C and Go it defaults to unsigned integers).

  • int: The input is treated as signed integer value.

  • uint: The input is treated as unsigned integer value.

  • float: The input is treated as floating point number.

  • string: The input is treated as string.

Then for dataIo types we have some additional types:

  • time: The input is treated as a time representation.

  • date: The input is treated as a date representation.

  • dateTime: The input is treated as a date with time.

All except the bit and byte types take a size value which specifies how many bits should be read. For the bit type the size is implicitly 1, and for the byte type it is implicitly 8.

So reading an unsigned 8-bit integer would be: uint 8.
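The following Python sketch shows one way bit-sized simple types such as uint 8, int 12 or bit could be read from a byte buffer. The BitReader class and its method names are hypothetical, and it assumes that bits are consumed most-significant bit first and that signed values use two's complement.

```python
class BitReader:
    """Reads big-endian, bit-aligned values from a byte string."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position in bits

    def read_uint(self, size: int) -> int:
        """Read 'size' bits as an unsigned integer (mspec: uint {size})."""
        value = 0
        for _ in range(size):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

    def read_int(self, size: int) -> int:
        """Read 'size' bits as a two's complement signed integer (mspec: int {size})."""
        value = self.read_uint(size)
        if value >= 1 << (size - 1):
            value -= 1 << size
        return value

    def read_bit(self) -> bool:
        """Read a single bit as a boolean (mspec: bit)."""
        return self.read_uint(1) == 1
```

Because the position is tracked in bits, fields do not need to start on byte boundaries: a bit followed by a uint 3 and a uint 4 together consume exactly one byte.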

There is currently one special type, reserved for string values, whose length is determined by an expression instead of a fixed number of bits. It is considered a variable length string:

  • vstring: The input is treated as a variable length string and requires an expression to provide the number of bits to read.

Complex Types

In contrast to simple types, complex types reference other complex types (Root elements of the spec document).

How the parser should interpret them is defined in the referenced type’s definition.

In the S7Message example above, for instance, S7Parameter is defined in another part of the spec.

Field Types and their Syntax

array Field

An array field is exactly what you expect. It generates a field which is not a single-value element but an array or list of elements.

[array {bit|byte}           {name} {count|length|terminated} '{expression}']
[array {simple-type} {size} {name} {count|length|terminated} '{expression}']
[array {complex-type}       {name} {count|length|terminated} '{expression}']

Array fields can be both simple and complex typed and have a name. An array field must specify the way its length is determined as well as an expression defining its length. Possible values are:

  • count: Exactly as many elements are parsed as the expression specifies.

  • length: A given number of bytes is read. So if an element has been parsed and there are still bytes left, another element is parsed.

  • terminated: The parser continues reading elements until it encounters a termination sequence.
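The difference between count, length and terminated can be sketched in Python as follows. The function read_array is hypothetical and simplified: elements are always uint 8, the expression is passed in already evaluated, the termination sequence is modelled as a single byte value, and the terminator is left unread (all of these are assumptions, not documented behaviour).

```python
from typing import List, Tuple


def read_array(data: bytes, pos: int, loop_type: str, expr: int) -> Tuple[List[int], int]:
    """Read 'uint 8' elements from data starting at pos.

    loop_type is 'count', 'length' or 'terminated'; expr is the already
    evaluated expression (for 'terminated', the terminating byte value).
    Returns the parsed elements and the new position.
    """
    items: List[int] = []
    if loop_type == "count":
        for _ in range(expr):  # exactly expr elements
            items.append(data[pos])
            pos += 1
    elif loop_type == "length":
        end = pos + expr  # keep parsing while bytes are left
        while pos < end:
            items.append(data[pos])
            pos += 1
    elif loop_type == "terminated":
        while data[pos] != expr:  # stop at the termination value
            items.append(data[pos])
            pos += 1
    else:
        raise ValueError(f"unknown loop type: {loop_type}")
    return items, pos
```

With one-byte elements, count and length behave identically; they only differ once elements are larger than a byte or vary in size.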

assert Field

An assert field is pretty much identical to a const field. The main difference, however, is how the case is handled in which the parsed value does not match the expected value.

[assert         {bit|byte}            {name}          '{assert-value}']
[assert         {simple-type} {size}  {name}          '{assert-value}']

While a const field would abort parsing in total with an error, an assert field will also abort parsing, but the error will only bubble up the stack until the first optional field is found.

In this case the parser will be rewound to the position before starting to parse the optional field and continue parsing with the next field, skipping the optional field.

If there is no upstream optional field, then parsing of the message terminates with an error.
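The rewind behaviour can be sketched in Python. The Marker sub-type, the parse_marker and parse_message functions and the message layout are hypothetical; only the exception name AssertionException is taken from the text above.

```python
from typing import Optional, Tuple


class AssertionException(Exception):
    """Soft parse error raised by assert fields."""


def parse_marker(data: bytes, pos: int) -> Tuple[int, int]:
    """Hypothetical sub-type: [assert uint 8 marker '0xAA'] [simple uint 8 value]."""
    if data[pos] != 0xAA:
        raise AssertionException(f"expected 0xAA at offset {pos}")
    return data[pos + 1], pos + 2


def parse_message(data: bytes) -> Tuple[Optional[int], int]:
    """Hypothetical type: [optional Marker marker] [simple uint 8 trailer]."""
    pos = 0
    start = pos  # remember where the optional field begins
    try:
        marker, pos = parse_marker(data, pos)
    except AssertionException:
        marker, pos = None, start  # rewind and skip the optional field
    trailer = data[pos]
    return marker, trailer
```

If parse_marker were called outside of any optional field, the AssertionException would propagate to the caller and parsing of the message would fail.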

See also:

  • validation field: Similar to an assert field, however no parsing is done, and instead simply a condition is checked.

  • optional field: optional fields are aware of the types of parser errors produced by assert and validation fields.

checksum Field

A checksum field can only operate on simple types.

[checksum {bit|byte}           {name} '{checksum-expression}']
[checksum {simple-type} {size} {name} '{checksum-expression}']

When parsing, the given simple type is parsed and the result is compared to the value the checksum-expression provides. If they don’t match, an exception is thrown.

When serializing, the checksum-expression is evaluated and the result is then output.

Note: As a checksum is quite often calculated based on the byte data of a message read up to the checksum, an artificial variable called checksumRawData of type byte[] is available in expressions. It contains all the byte data read in the current message element and, in the case of a discriminated type, its subtypes.

This field doesn’t keep any data in memory.
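As a rough Python illustration of the parse and serialize behaviour of a checksum field: the checksum algorithm used here (sum of all bytes modulo 256) and all function names are assumptions made for this sketch, and the parameter checksum_raw_data only mimics the checksumRawData variable mentioned above.

```python
def checksum_expression(checksum_raw_data: bytes) -> int:
    """Hypothetical checksum-expression: sum of all bytes modulo 256."""
    return sum(checksum_raw_data) % 256


def parse_with_checksum(data: bytes) -> bytes:
    """Parse body bytes followed by a 'uint 8' checksum field."""
    body, received = data[:-1], data[-1]
    expected = checksum_expression(body)  # body plays the role of checksumRawData
    if received != expected:
        raise ValueError(f"checksum mismatch: got {received}, expected {expected}")
    return body


def serialize_with_checksum(body: bytes) -> bytes:
    """Serialize the body and append the evaluated checksum."""
    return body + bytes([checksum_expression(body)])
```

The checksum value is never stored in the parsed result; it is verified while parsing and recomputed while serializing.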

See also:

  • implicit field: A checksum field is similar to an implicit field, however the checksum-expression is evaluated at parsing time and throws an exception if the values don’t match.

const Field

A const field simply reads a given simple type and compares it to a given reference value.

[const {bit|byte}           {name} {reference}]
[const {simple-type} {size} {name} {reference}]

When parsing, the parser throws an exception if the parsed value does not match the expected one.

When serializing, it simply outputs the expected constant.

This field doesn’t keep any data in memory.

See also:

  • implicit field: A const field is similar to an implicit field, however it compares the parsed input to the reference value and throws an exception if the values don’t match.

discriminator Field

Discriminator fields are only used in discriminatedType elements.

[discriminator {simple-type} {size} {name}]

They are used in cases where the value of a field determines the concrete type of a discriminated type. In this case we don’t have to waste memory on storing the discriminator value, as it can be statically assigned to the type.

When parsing, a discriminator field just results in a locally available variable.

When serializing, it accesses the discriminated type’s constants and uses these as output.
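A small Python sketch of this idea: the discriminator value lives on the subtype as a class-level constant rather than on each instance. The classes are heavily simplified versions of the S7Message example (only messageType and tpduReference are kept), and the names MESSAGE_TYPE, SUBTYPES and parse are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import ClassVar, Dict, Type


@dataclass
class S7Message:
    MESSAGE_TYPE: ClassVar[int]  # discriminator constant, not an instance field
    tpdu_reference: int

    def serialize(self) -> bytes:
        # The discriminator is written from the subtype's constant.
        return struct.pack(">BH", self.MESSAGE_TYPE, self.tpdu_reference)


@dataclass
class S7MessageRequest(S7Message):
    MESSAGE_TYPE: ClassVar[int] = 0x01


@dataclass
class S7MessageUserData(S7Message):
    MESSAGE_TYPE: ClassVar[int] = 0x07


SUBTYPES: Dict[int, Type[S7Message]] = {
    cls.MESSAGE_TYPE: cls for cls in (S7MessageRequest, S7MessageUserData)
}


def parse(data: bytes) -> S7Message:
    message_type, tpdu_reference = struct.unpack(">BH", data[:3])
    # message_type is only a local variable used to pick the subtype.
    return SUBTYPES[message_type](tpdu_reference)
```

An instance of S7MessageRequest only stores tpdu_reference; the value 0x01 is shared by all instances of that subtype.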

See also:

  • implicit field: A discriminator field is similar to an implicit field, however it doesn’t provide a serialization expression, as it uses the discrimination constants of its type.

  • discriminated types

implicit Field

Implicit fields are fields that get their value implicitly from the data they contain.

[implicit {bit|byte}           {name} '{serialization-expression}']
[implicit {simple-type} {size} {name} '{serialization-expression}']

When parsing, an implicit field is available as a local variable and can be used by other expressions.

When serializing, the serialization-expression is executed and the resulting value is output.

This type of field is generally used for fields that handle numbers of elements or length values as these can be implicitly calculated at serialization time.

This field doesn’t keep any data in memory.

manualArray Field

[manualArray {bit|byte}           {name} {count|length|terminated} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
[manualArray {simple-type} {size} {name} {count|length|terminated} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
[manualArray {complex-type}       {name} {count|length|terminated} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']

manual Field

[manual {bit|byte}           {name} '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
[manual {simple-type} {size} {name} '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
[manual {complex-type}       {name} '{serialization-expression}' '{deserialization-expression}' '{length-expression}']

optional Field

An optional field is a type of field that can also be null.

[optional {bit|byte}           {name} ('{optional-expression}')?]
[optional {simple-type} {size} {name} ('{optional-expression}')?]
[optional {complex-type}       {name} ('{optional-expression}')?]

The optional-expression attribute is optional. If it is provided, it is evaluated: if it results in false, nothing is parsed; if it evaluates to true, the field is parsed.

In any case, if an assert or validation field fails while parsing the content of an optional field, the parser is rewound to the position before starting to parse the optional field, the optional field is then skipped and the parser continues with the next field.

When serializing, if the field is null nothing is output, if it is not null it is serialized normally.

See also:

  • simple field: In general optional fields are identical to simple fields except for the ability to be null or be skipped.

  • assert: Assert fields are similar to const fields, but can abort parsing of an optional field.

  • validation: If a validation field in any of the subtypes fails, this aborts parsing of the optional field.

padding Field

A padding field allows aligning data blocks. It outputs additional padding data the number of times specified by the padding expression. Padding is only added if the result of the expression is greater than zero.

[padding {bit|byte}            {name} '{padding-value}' '{times-padding}']
[padding {simple-type} {size}  {name} '{padding-value}' '{times-padding}']

When parsing a padding field, the times-padding expression determines how often the padding-value should be read. The parser doesn’t check whether the read values match the padding-value; it just ensures the same number of bits is read. The read values are simply discarded.

When serializing, the times-padding defines how often the padding-value should be written.

This field doesn’t keep any data in memory.
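A minimal Python sketch of a padding field follows. The times-padding expression used here (pad the payload to an even number of bytes) is only an example, and the function names are hypothetical.

```python
def padding_times(length_in_bytes: int) -> int:
    """Hypothetical times-padding expression: pad to an even byte count."""
    return length_in_bytes % 2


def serialize_padded(payload: bytes, padding_value: int = 0x00) -> bytes:
    """Write the payload, then padding_value as often as times-padding says."""
    times = padding_times(len(payload))
    return payload + bytes([padding_value]) * max(times, 0)


def parse_padded(data: bytes, payload_length: int) -> bytes:
    """Read the payload, then read and discard the padding without checking it."""
    payload = data[:payload_length]
    times = padding_times(payload_length)
    _discarded = data[payload_length:payload_length + max(times, 0)]
    return payload
```

In parse_padded the padding bytes are read but never compared to the padding value, so a payload padded with 0xFF parses just as well as one padded with 0x00.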

peek Field

reserved Field

Reserved fields are very similar to const fields, however they don’t throw exceptions, but instead log messages if the values don’t match.

The reason for this is that in general reserved fields have the given value until they start being used.

If the field starts to be used this shouldn’t break existing applications, but it should raise a flag as it might make sense to update the drivers.

[reserved {bit|byte}           {name} '{reference}']
[reserved {simple-type} {size} {name} '{reference}']

When parsing, the value of a reserved field is parsed, compared to the reference value and then discarded.

If the values don’t match, a log message is written.

This field doesn’t keep any data in memory.

See also:

  • const field

simple Field

Simple fields are the most common types of fields.

A simple field is directly mapped to a normally typed field of a message type.

[simple {bit|byte}           {name}]
[simple {simple-type} {size} {name}]
[simple {complex-type}       {name}]

When parsing, the given type is parsed (can’t be null) and saved in the corresponding model instance’s property field.

When serializing, it is serialized normally using either a simple type serializer or by delegating serialization to a complex type.

typeSwitch Field

These types of fields can only occur in discriminated types.

A discriminatedType must contain exactly one typeSwitch field, as it defines the sub-types.

[typeSwitch {field-or-attribute-1}(,{field-or-attribute-2}, ...)
    ['{field-1-value-1}' {subtype-1-name}
        ... Fields ...
    ]
    ['{field-1-value-2}', '{field-2-value-1}' {subtype-2-name}
        ... Fields ...
    ]
    ['{field-1-value-3}', '{field-2-value-2}' {subtype-3-name} [uint 8 'existing-attribute-1', uint 16 'existing-attribute-2']
        ... Fields ...
    ]
]

A type switch element must contain a list of at least one argument expression. Only the last option can stay empty, which results in a default type.

Each subtype declares a comma-separated list of concrete values.

It must contain at most as many elements as arguments are declared for the type switch.

The matching type is found during parsing by starting with the first argument.

If it matches and there are no more values, the type is found, if more values are provided, they are compared to the other argument values.

If no type is found, an exception is thrown.
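The matching rules can be sketched in Python as follows. The subtype names are borrowed from the S7Payload example in the Parameters section of this document; the CASES table and the select_subtype function are hypothetical, and picking the first matching case in declaration order is an assumption of this sketch.

```python
from typing import Sequence, Tuple

# Subtype cases in declaration order: (values, subtype name).
# An empty tuple of values would act as the default case.
CASES: Sequence[Tuple[Tuple[int, ...], str]] = (
    ((0xF0,), "S7PayloadSetupCommunication"),
    ((0x04, 0x01), "S7PayloadReadVarRequest"),
    ((0x04, 0x03), "S7PayloadReadVarResponse"),
)


def select_subtype(arguments: Sequence[int]) -> str:
    """Return the first subtype whose values match the leading arguments."""
    for values, name in CASES:
        if len(values) > len(arguments):
            continue  # a case may not declare more values than arguments
        if all(v == a for v, a in zip(values, arguments)):
            return name
    raise ValueError(f"no subtype matches {list(arguments)}")
```

The first case declares only one value, so it matches any second argument; the other two cases must match both arguments.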

Each subtype can declare fields using a subset of the field types (discriminator and typeSwitch can’t be used here).

The third case in the code snippet above also passes named attributes to the subtype. Each name must be identical to that of an argument or named field parsed before the typeSwitch. These arguments are then available for expressions or for passing on in the subtypes.

See also:

  • discriminatedType

unknown Field

This type of field is mainly used when working on reverse-engineering a new protocol. It allows parsing any type of information, storing and using it and serializing it back.

In general, it’s similar to a simple field; it just explicitly states that we don’t yet quite know how to handle the content.

validation Field

As mentioned before, a validation field is not really a field, it’s a check that is added to the type parser.

If the expression provided in the validation field fails, the parser aborts parsing and goes up the stack until it finds the first optional field. If it finds one, it rewinds the parser to the position just before starting to parse the optional field, then skips the optional field and continues with the next field.

If there is no optional field up the stack, then parsing fails.

virtual Field

Virtual fields have no impact on the input or output. They simply result in creating artificial get-methods in the generated model classes.

[virtual {bit|byte}           {name} '{value-expression}']
[virtual {simple-type} {size} {name} '{value-expression}']
[virtual {complex-type}       {name} '{value-expression}']

Instead of being bound to a property, the return value of a virtual property is created by evaluating the value-expression.
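In Python terms, a virtual field resembles a read-only property. In the sketch below, the model class is a simplified stand-in for S7MessageResponse, and the is_error property with its value-expression is purely hypothetical.

```python
from dataclasses import dataclass


@dataclass
class S7MessageResponse:
    """Simplified model class with two simple fields."""

    error_class: int
    error_code: int

    @property
    def is_error(self) -> bool:
        """Virtual field: computed from a value-expression, never parsed or serialized."""
        return self.error_class != 0 or self.error_code != 0
```

The property occupies no space in the message; it only offers a convenient accessor on the parsed object.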

Parameters

Sometimes it is necessary to pass along additional parameters.

If a complex type requires parameters, these are declared in the header of that type.

[discriminatedType S7Payload(uint 8 'messageType', S7Parameter 'parameter')
    [typeSwitch 'parameter.discriminatorValues[0]', 'messageType'
        ['0xF0' S7PayloadSetupCommunication]
        ['0x04','0x01' S7PayloadReadVarRequest]
        ['0x04','0x03' S7PayloadReadVarResponse
            [array S7VarPayloadDataItem items count 'CAST(parameter, S7ParameterReadVarResponse).numItems']
        ]
        ['0x05','0x01' S7PayloadWriteVarRequest
            [array S7VarPayloadDataItem items count 'COUNT(CAST(parameter, S7ParameterWriteVarRequest).items)']
        ]
        ['0x05','0x03' S7PayloadWriteVarResponse
            [array S7VarPayloadStatusItem items count 'CAST(parameter, S7ParameterWriteVarResponse).numItems']
        ]
        ['0x00','0x07' S7PayloadUserData
        ]
    ]
]

Therefore, wherever a complex type is referenced an additional list of parameters can be passed to the next type.

An example of this can be found in the S7Message snippet above:

[optional S7Payload   ('messageType', 'parameter') payload   'payloadLength > 0'  ]

Serializer and Parser-Arguments

Arguments influence the way the parser or serializer operates.

Wherever a parser argument is used, it should also be valid in all subtypes the parser processes.

byteOrder

A byteOrder argument can set or change the byte-order used by the parser.

We currently support two variants:

  • BIG_ENDIAN

  • LITTLE_ENDIAN
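The effect of the two byte orders on a uint 16 can be shown with Python's built-in int.from_bytes. This only illustrates the concept and is unrelated to how PLC4X applies the byteOrder argument internally.

```python
raw = bytes([0x12, 0x34])

big = int.from_bytes(raw, byteorder="big")        # BIG_ENDIAN
little = int.from_bytes(raw, byteorder="little")  # LITTLE_ENDIAN

print(hex(big))     # 0x1234
print(hex(little))  # 0x3412
```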

encoding

Each simple type has a default encoding, which is fine for the vast majority of cases.

Signed integers, for example, use two’s complement notation, and floating point values are encoded in IEEE 754 single- or double-precision encoding. Strings are encoded as UTF-8 by default.

However, in some cases an alternate encoding needs to be used. Especially when dealing with strings, different encodings, such as ASCII, UTF-16 and many more, can be used. But different encodings might also be used for numeric values. For example, KNX uses a 16-bit floating point encoding which is not standard, and in S7 drivers a special encoding is used to encode numeric values so that they represent the number in hex format.

An encoding attribute can be used to select a non-default encoding.
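Why the encoding matters for strings can be seen in a few lines of Python: the same text occupies a different number of bytes depending on the encoding, which also affects the number of bits a string or vstring field has to read. The sample text is arbitrary.

```python
text = "Größe"

raw_utf8 = text.encode("utf-8")
raw_utf16 = text.encode("utf-16-be")

print(len(raw_utf8))   # 7
print(len(raw_utf16))  # 10

# Decoding only round-trips with the encoding that was used for writing.
assert raw_utf8.decode("utf-8") == text
assert raw_utf16.decode("utf-16-be") == text
```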