The MSpec format
The MSpec
format (Message Specification) was a result of a brainstorming session after evaluating a lot of other options.
We simply sat down and started to write some imaginary format (imaginary
was even the initial Name we used) and created parses for this afterwards and fine-tuned spec and parsers as part of the process of implementing first protocols and language templates.
It’s a text-based format.
At the root level of these specs are a set of type
, discriminatedType
, dataIo
and enum
blocks.
type
elements are objects who’s content is independent of the input.
An example would be the TPKTPacket
of the S7 format:
[type TPKTPacket [const uint 8 protocolId 0x03] [reserved uint 8 '0x00'] [implicit uint 16 len 'payload.lengthInBytes + 4'] [field COTPPacket 'payload'] ]
A discriminatedType
type, in contrast, is an object who’s content and structure is influenced by the input.
Every discriminated type can contain an arbitrary number of discriminator
fields and exactly one typeSwitch
element.
For example part of the spec for the S7 format looks like this:
[discriminatedType S7Message [const uint 8 protocolId 0x32] [discriminator uint 8 messageType] [reserved uint 16 '0x0000'] [simple uint 16 tpduReference] [implicit uint 16 parameterLength 'parameter.lengthInBytes'] [implicit uint 16 payloadLength 'payload.lengthInBytes'] [typeSwitch 'messageType' ['0x01' S7MessageRequest ] ['0x03' S7MessageResponse [simple uint 8 errorClass] [simple uint 8 errorCode ] ] ['0x07' S7MessageUserData ] ] [simple S7Parameter('messageType') parameter] [simple S7Payload('messageType', 'parameter') payload ] ]
A types start is declared by an opening square bracket [
and ended with a closing one ]
.
Also, to both provide a name as first argument.
Every type definition contains a list of fields that can have different types.
The list of available types are:
-
abstract: used in the parent type declaration do declare a field that has to be defined with the identical type in all sub-types (reserved for
discriminatedType
). -
array: array of simple or complex typed objects.
-
checksum: used for calculating and verifying checksum values.
-
const: expects a given value and causes a hard exception if the value doesn’t match.
-
discriminator: special type of simple typed field which is used to determine the concrete type of object (reserved for
discriminatedType
). -
enum: special form of field, used if an enum types property is to be used instead of it’s primary value.
-
implicit: a field required for parsing, but is usually defined though other data, so it’s not stored in the object, but calculated on serialization.
-
assert: generally similar to
constant
fields, however do they throwAssertionExceptions
instead of hardParseExceptions
. They are used in combination with optional fields. -
manualArray: like an array field, however the logic for serializing, parsing, number of elements and size have to be provided manually.
-
manual: simple field, where the logic for parsing, serializing and size have to be provided manually.
-
optional: simple or complex typed object, that is only present if an optional condition expression evaluates to
true
and noAssertionException
is thrown when parsing the referenced type. -
padding: field used to add padding data to make datastructures aligned.
-
reserved: expects a given value, but only warns if condition is not meet.
-
simple: simple or complex typed object.
-
typeSwitch: not a real field, but indicates the existence of sub-types, which are declared inline (reserved for
discriminatedType
). -
unknown: field used to declare parts of a message that still has to be defined. Generally used when reverse-engineering a protocol. Messages with
unknown
fields can only be parsed and not serialized. -
virtual: generates a field in the message, that is generally only used for simplification. It’s not used for parsing or serializing.
The full syntax and explanations of these type follow in the following chapters.
Another thing we have to explain are how types are specified.
In general, we distinguish between two types of types used in field definitions:
-
simple types
-
complex types
Simple Types
Simple types are usually raw data the format is:
{base-type} {size}
The base types available are currently:
-
bit: Simple boolean value or bit.
-
byte: Special value fixed to 8 bit, which defaults to either signed or unsigned depending on the programming language (Java it defaults to signed integer values and in C and Go it defaults to unsigned integers).
-
uint: The input is treated as unsigned integer value.
-
int: The input is treated as signed integer value.
-
float: The input is treated as floating point number.
-
string: The input is treated as string.
All above types take a size
value which provides how many bits
should be read.
All except the bit
type, which is fixed to one single bit.
So reading an unsigned byte would be: uint 8
.
There is currently one special type, reserved for string values, whose length is determined by an expression instead of a fixed number of bits. It is considered a variable length string:
-
vstring: The input is treated as a variable length string and requires an expression tp provide the number of bits to read.
Complex Types
In contrast to simple types, complex type reference other complex types (Root elements of the spec document).
How the parser should interpret them is defined in the referenced types definition.
In the example above, for example the S7Parameter
is defined in another part of the spec.
Field Types and their Syntax
array Field
An array
field is exactly what you expect.
It generates an field which is not a single-value element but an array or list of elements.
[array {simple-type} {size} '{name}' {'count', 'length', 'terminated'} '{expression}']
[array {complex-type} '{name}' {'count', 'length', 'terminated'} '{expression}']
Array types can be both simple and complex typed and have a name.
An array field must specify the way it’s length is determined as well as an expression defining it’s length.
Possible values are:
- count
: This means that exactly the number of elements are parsed as the expression
specifies.
- length
: In this case a given number of bytes are being read. So if an element has been parsed and there are still bytes left, another element is parsed.
- terminated
: In this case the parser will continue reading elements until it encounters a termination sequence.
checksum Field
A checksum field can only operate on simple types.
[checksum {simple-type} {size} '{name}' '{checksum-expression}']
When parsing a given simple type is parsed and then the result is compared to the value the checksum-expression
provides.
If they don’t match an exception is thrown.
When serializing, the checksum-expression
is evaluated and the result is then output.
Note: As quite often a checksum is calculated based on the byte data of a message read up to the checksum, an artificial variable is available in expressions called checksumRawData
of type byte[]
which contains an array of all the byte data read in the current message element and it’s sub types in case of a discriminated type.
This field doesn’t keep any data in memory.
See also:
- implicit field: A checksum field is similar to an implicit field, however the checksum-expression
is evaluated are parsing time and throws an exception if the values don’t match.
const Field
A const field simply reads a given simple type and compares to a given reference value.
[const {simple-type} {size} '{name}' {reference}]
When parsing it makes the parser throw an Exception if the parsed value does not match.
When serializing is simply outputs the expected constant.
This field doesn’t keep any data in memory.
See also: - implicit field: A const field is similar to an implicit field, however it compares the parsed input to the reference value and throws an exception if the values don’t match.
discriminator Field
Discriminator fields are only used in `discriminatedType`s.
[discriminator {simple-type} {size} '{name}']
When parsing a discriminator fields result just in being a locally available variable.
When serializing is accesses the discriminated types constants and uses these as output.
See also: - implicit field: A discriminator field is similar to an implicit field, however doesn’t provide a serialization expression as it uses the discrimination constants of the type it is. - discriminated types
implicit Field
Implicit types are fields that get their value implicitly from the data they contain.
[implicit {simple-type} {size} '{name}' '{serialization-expression}']
When parsing an implicit type is available as a local variable and can be used by other expressions.
When serializing the serialization-expression is executed and the resulting value is output.
This type of field is generally used for fields that handle numbers of elements or length values as these can be implicitly calculated at serialization time.
This field doesn’t keep any data in memory.
manualArray Field
[manualArray {simple-type} {size} '{name}' {'count', 'length', 'terminated'} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
[manualArray {complex-type} '{name}' {'count', 'length', 'terminated'} '{loop-expression}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
manual Field
[manual {simple-type} {size} '{name}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
[manual {complex-type} '{name}' '{serialization-expression}' '{deserialization-expression}' '{length-expression}']
optional Field
An optional field is a type of field that can also be null
.
[optional {simple-type} {size} '{name}' '{optional-expression}']
[optional {complex-type} '{name}' '{optional-expression}']
When parsing the optional-expression
is evaluated. If this results in`false` nothing is output, if it evaluates to true
it is serialized as a simple
field.
When serializing, if the field is null
nothing is output, if it is not null
it is serialized normally.
See also:
- simple field: In general optional
fields are identical to simple
fields except the ability to be null
or be skipped.
padding Field
A padding field allows aligning of data blocks. It outputs additional padding data, given amount of times specified by padding expression. Padding is added only when result of expression is bigger than zero.
[padding {simple-type} {size} '{pading-value}' '{padding-expression}']
When parsing a padding
field is just consumed without being made available as property or local variable if the padding-expression
evaluates to value greater than zero.
If it doesn’t, it is just skipped.
This field doesn’t keep any data in memory.
reserved Field
Reserved fields are very similar to const
fields, however they don’t throw exceptions, but instead log messages if the values don’t match.
The reason for this is that in general reserved fields have the given value until they start to be used.
If the field starts to be used this shouldn’t break existing applications, but it should raise a flag as it might make sense to update the drivers.
[reserved {simple-type} {size} '{name}' '{reference}']
When parsing the values is parsed and the result is compared to the reference value. If the values don’t match, a log message is sent.
This field doesn’t keep any data in memory.
See also:
- const
field
simple Field
Simple fields are the most common types of fields.
A simple
field directly mapped to a normally typed field.
[simple {simple-type} {size} '{name}']
[simple {complex-type} '{name}']
When parsing, the given type is parsed (can’t be null
) and saved in the corresponding model instance’s property field.
When serializing it is serialized normally.
virtual Field
Virtual fields have no impact on the input or output. They simply result in creating artificial get-methods in the generated model classes.
[virtual {simple-type} {size} '{name}' '{value-expression}']
[virtual {complex-type} '{name}' '{value-expression}']
Instead of being bound to a property, the return value of a virtual
property is created by evaluating the value-expression
.
typeSwitch Field
These types of fields can only occur in discriminated types.
A discriminatedType
must contain exactly one typeSwitch
field, as it defines the sub-types.
[typeSwitch '{arument-1}', '{arument-2}', ... ['{argument-1-value-1}' {subtype-1-name} ... Fields ... ] ['{vargument-1-value-2}', '{argument-2-value-1}' {subtype-2-name} ... Fields ... ] ['{vargument-1-value-3}', '{argument-2-value-2}' {subtype-2-name} [uint 8 'existing-attribute-1', uint 16 'existing-attribute-2'] ... Fields ... ]
A type switch element must contain a list of at least one argument expression. Only the last option can stay empty, which results in a default type.
Each sub-type declares a comma-separated list of concrete values.
It must contain at most as many elements as arguments are declared for the type switch.
The matching type is found during parsing by starting with the first argument.
If it matches and there are no more values, the type is found, if more values are provided, they are compared to the other argument values.
If no type is found, an exception is thrown.
Inside each sub-type can declare fields using a subset of the types (discriminator
and typeSwitch
can’t be used here)
The third case in above code-snippet also passes a named attribute to the sub-type. The name must be identical to any argument or named field parsed before the switchType. These arguments are then available for expressions or passing on in the subtypes.
See also:
- discriminatedType
Parameters
Some times it is necessary to pass along additional parameters.
If a complex type requires parameters, these are declared in the header of that type.
[discriminatedType S7Payload(uint 8 'messageType', S7Parameter 'parameter') [typeSwitch 'parameter.discriminatorValues[0]', 'messageType' ['0xF0' S7PayloadSetupCommunication] ['0x04','0x01' S7PayloadReadVarRequest] ['0x04','0x03' S7PayloadReadVarResponse [arrayField S7VarPayloadDataItem 'items' count 'CAST(parameter, S7ParameterReadVarResponse).numItems'] ] ['0x05','0x01' S7PayloadWriteVarRequest [arrayField S7VarPayloadDataItem 'items' count 'COUNT(CAST(parameter, S7ParameterWriteVarRequest).items)'] ] ['0x05','0x03' S7PayloadWriteVarResponse [arrayField S7VarPayloadStatusItem 'items' count 'CAST(parameter, S7ParameterWriteVarResponse).numItems'] ] ['0x00','0x07' S7PayloadUserData ] ] ]
Therefore wherever a complex type is referenced an additional list of parameters can be passed to the next type.
Here comes an example of this in above snippet:
[field S7Payload 'payload' ['messageType', 'parameter']]