Common Language Infrastructure (CLI)

Partition II:
Metadata Definition and Semantics

 

Table of contents

 

1  Scope
2  Overview
3  Validation and Verification
4  Introductory Examples
4.1  Hello World Example
4.2  Examples
5  General Syntax
5.1  General Syntax Notation
5.2  Terminals
5.3  Identifiers
5.4  Labels and Lists of Labels
5.5  Lists of Hex Bytes
5.6  Floating point numbers
5.7  Source Line Information
5.8  File Names
5.9  Attributes and Metadata
5.10  ilasm Source Files
6  Assemblies, Manifests and Modules
6.1  Overview of Modules, Assemblies, and Files
6.2  Defining an Assembly
6.2.1  Information about the Assembly (<asmDecl>)
6.2.2  Manifest Resources
6.2.3  Files in the Assembly
6.3  Referencing Assemblies
6.4  Declaring Modules
6.5  Referencing Modules
6.6  Declarations inside a Module or Assembly
6.7  Exported Type Definitions
7  Types and Signatures
7.1  Types
7.1.1  modreq and modopt
7.1.2  pinned
7.2  Built-in Types
7.3  References to User-defined Types (<typeReference>)
7.4  Native Data Types
8  Visibility, Accessibility and Hiding
8.1  Visibility of Top-Level Types and Accessibility of Nested Types
8.2  Accessibility
8.3  Hiding
9  Defining Types
9.1  Type Header (<classHead>)
9.1.1  Visibility and Accessibility Attributes
9.1.2  Type Layout Attributes
9.1.3  Type Semantics Attributes
9.1.4  Inheritance Attributes
9.1.5  Interoperation Attributes
9.1.6  Special Handling Attributes
9.2  Body of a Type Definition
9.3  Introducing and Overriding Virtual Methods
9.3.1  Introducing a Virtual Method
9.3.2  The .override Directive
9.3.3  Accessibility and Overriding
9.4  Method Implementation Requirements
9.5  Special Members
9.5.1  Instance constructors
9.5.2  Instance Finalizer
9.5.3  Type Initializer
9.6  Nested Types
9.7  Controlling Instance Layout
9.8  Global Fields and Methods
10  Semantics of Classes
11  Semantics of Interfaces
11.1  Implementing Interfaces
11.2  Implementing Virtual Methods on Interfaces
12  Semantics of Value Types
12.1  Referencing Value Types
12.2  Initializing Value Types
12.3  Methods of Value Types
13  Semantics of Special Types
13.1  Vectors
13.2  Arrays
13.3  Enums
13.4  Pointer Types
13.4.1  Unmanaged Pointers
13.4.2  Managed Pointers
13.5  Method Pointers
13.6  Delegates
13.6.1  Synchronous Calls to Delegates
13.6.2  Asynchronous Calls to Delegates
14  Defining, Referencing, and Calling Methods
14.1  Method Descriptors
14.1.1  Method Declarations
14.1.2  Method Definitions
14.1.3  Method References
14.1.4  Method Implementations
14.2  Static, Instance, and Virtual Methods
14.3  Calling Convention
14.4  Defining Methods
14.4.1  Method Body
14.4.2  Predefined Attributes on Methods
14.4.3  Implementation Attributes of Methods
14.4.4  Scope Blocks
14.4.5  vararg Methods
14.5  Unmanaged Methods
14.5.1  Method Transition Thunks
14.5.2  Platform Invoke
14.5.3  Via Function Pointers
14.5.4  COM Interop
14.5.5  Data Type Marshaling
14.5.6  Managed Native Calling Conventions (x86)
15  Defining and Referencing Fields
15.1  Attributes of Fields
15.1.1  Accessibility Information
15.1.2  Field Contract Attributes
15.1.3  Interoperation Attributes
15.1.4  Other Attributes
15.2  Field Init Metadata
15.3  Embedding Data in a PE File
15.3.1  Data Declaration
15.3.2  Accessing Data from the PE File
15.3.3  Unmanaged Thread-local Storage
15.4  Initialization of Non-Literal Static Data
15.4.1  Data Known at Link Time
15.5  Data Known at Load Time
15.5.1  Data Known at Run Time
16  Defining Properties
17  Defining Events
18  Exception Handling
18.1  Protected Blocks
18.2  Handler Blocks
18.3  Catch
18.4  Filter
18.5  Finally
18.6  Fault Handler
19  Declarative Security
20  Custom Attributes
20.1  CLS Conventions: Custom Attribute Usage
20.2  Attributes Used by the CLI
20.2.1  Pseudo Custom Attributes
20.2.2  Custom Attributes Defined by the CLS
20.2.3  Custom Attributes for CIL-to-Native-Code Compiler and Debugger
20.2.4  Custom Attributes for Remoting
20.2.5  Custom Attributes for Security
20.2.6  Custom Attributes for TLS
20.2.7  Pseudo Custom Attributes for the Assembly Linker
20.2.8  Custom Attributes Provided for Interoperation with Unmanaged Code
20.2.9  Custom Attributes, Various
21  Metadata Logical Format: Tables
21.1  Metadata Validation Rules
21.2  Assembly : 0x20
21.3  AssemblyOS : 0x22
21.4  AssemblyProcessor : 0x21
21.5  AssemblyRef : 0x23
21.6  AssemblyRefOS : 0x25
21.7  AssemblyRefProcessor : 0x24
21.8  ClassLayout : 0x0F
21.9  Constant : 0x0B
21.10  CustomAttribute : 0x0C
21.11  DeclSecurity : 0x0E
21.12  EventMap : 0x12
21.13  Event : 0x14
21.14  ExportedType : 0x27
21.15  Field : 0x04
21.16  FieldLayout : 0x10
21.17  FieldMarshal : 0x0D
21.18  FieldRVA : 0x1D
21.19  File : 0x26
21.20  ImplMap : 0x1C
21.21  InterfaceImpl : 0x09
21.22  ManifestResource : 0x28
21.23  MemberRef : 0x0A
21.24  Method : 0x06
21.25  MethodImpl : 0x19
21.26  MethodSemantics : 0x18
21.27  Module : 0x00
21.28  ModuleRef : 0x1A
21.29  NestedClass : 0x29
21.30  Param : 0x08
21.31  Property : 0x17
21.32  PropertyMap : 0x15
21.33  StandAloneSig : 0x11
21.34  TypeDef : 0x02
21.35  TypeRef : 0x01
21.36  TypeSpec : 0x1B
22  Metadata Logical Format: Other Structures
22.1  Bitmasks and Flags
22.1.1  Values for AssemblyHashAlgorithm
22.1.2  Values for AssemblyFlags
22.1.3  Values for Culture
22.1.4  Flags for Events [EventAttributes]
22.1.5  Flags for Fields [FieldAttributes]
22.1.6  Flags for Files [FileAttributes]
22.1.7  Flags for ImplMap [PInvokeAttributes]
22.1.8  Flags for ManifestResource [ManifestResourceAttributes]
22.1.9  Flags for Methods [MethodAttributes]
22.1.10  Flags for Methods [MethodImplAttributes]
22.1.11  Flags for MethodSemantics [MethodSemanticsAttributes]
22.1.12  Flags for Params [ParamAttributes]
22.1.13  Flags for Properties [PropertyAttributes]
22.1.14  Flags for Types [TypeAttributes]
22.1.15  Element Types used in Signatures
22.2  Blobs and Signatures
22.2.1  MethodDefSig
22.2.2  MethodRefSig
22.2.3  StandAloneMethodSig
22.2.4  FieldSig
22.2.5  PropertySig
22.2.6  LocalVarSig
22.2.7  CustomMod
22.2.8  TypeDefOrRefEncoded
22.2.9  Constraint
22.2.10  Param
22.2.11  RetType
22.2.12  Type
22.2.13  ArrayShape
22.2.14  TypeSpec
22.2.15  Short Form Signatures
22.3  Custom Attributes
22.4  Marshalling Descriptors
23  Metadata Physical Layout
23.1  Fixed Fields
23.2  File Headers
23.2.1  Metadata root
23.2.2  Stream Header
23.2.3  #Strings heap
23.2.4  #US and #Blob heaps
23.2.5  #GUID heap
23.2.6  #~ stream
23.2.7  Coded Indexes
24  File Format Extensions to PE
24.1  Structure of the Runtime File Format
24.2  PE Headers
24.2.1  MS-DOS Header
24.2.2  PE File Header
24.2.3  PE Optional Header
24.3  Section Headers
24.3.1  Import Table and Import Address Table (IAT)
24.3.2  Relocations
24.3.3  CLI Header
24.4  Common Intermediate Language Physical Layout
24.4.1  Method Header Type Values
24.4.2  Tiny Format
24.4.3  Fat Format
24.4.4  Flags for Method Headers
24.4.5  Method Data Section
24.4.6  Exception Handling Clauses

 


1         Scope

Partition I of the Common Language Infrastructure (CLI) describes the overall architecture of the CLI, and provides the normative description of the Common Type System (CTS), the Virtual Execution System (VES), and the Common Language Specification (CLS).  It also provides a non-normative description of the metadata and a comprehensive set of abbreviations, acronyms, and definitions, which are included by reference from all other Partitions.

Partition II (this specification) provides the normative description of the metadata: its physical layout (as a file format), its logical contents (as a set of tables and their relationships), and its semantics (as seen from a hypothetical assembler, ilasm).

2         Overview

This document focuses on the structure and semantics of metadata. The semantics of metadata, which dictate much of the operation of the VES, are described using the syntax of ilasm, an assembler language for CIL.  The ilasm syntax itself is considered a normative part of this ECMA standard.  This description constitutes Chapters 5 through 20. A complete syntax for ilasm is included in Partition V. The structure (both logical and physical) is covered in Chapters 21 through 24.

Rationale: An assembly language is really just syntax for specifying the metadata in a file and the CIL instructions in that file.   Specifying ilasm provides a means of interchanging programs written directly for the CLI without the use of a higher-level language and also provides a convenient way to express examples.

The semantics of the metadata can also be described independently of the actual format in which the metadata is stored.  This point is important because the storage format, as specified in Chapters 21 through 24, is engineered to be efficient for both storage space and access time, but this efficiency comes at the cost of the simplicity desirable for describing its semantics.

3         Validation and Verification

Validation refers to a set of tests that can be performed on any file to check that the file format, metadata, and CIL are self-consistent. These tests are intended to ensure that the file conforms to the mandatory requirements of this specification.  The behavior of conforming implementations of the CLI when presented with non-conforming files is unspecified.

Verification refers to a check of both CIL and its related metadata to ensure that the CIL code sequences do not permit any access to memory outside the program’s logical address space. In conjunction with the validation tests, verification ensures that the program cannot access memory or other resources to which it is not granted access.

Partition III specifies the rules for both valid and verifiable use of CIL instructions.  Partition III also provides an informative description of rules for validating the internal consistency of metadata (the rules follow, albeit indirectly, from the specification in this Partition) and contains a normative description of the verification algorithm.  A mathematical proof of soundness of the underlying type system is possible, and provides the basis for the verification requirements.  Aside from these rules this standard does not specify:

·              at what time (if ever) such an algorithm should be performed

·              what a conforming implementation should do in case of failure of verification. 

The following graph makes this relationship clearer (see next paragraph for a description):

Figure 1: Relationship between valid and verifiable CIL

In the above figure, the outer circle contains all code permitted by the ilasm syntax. The next circle represents all code that is valid CIL. The dotted inner circle represents all typesafe code.  Finally, the black innermost circle contains all code that is verifiable.  (The difference between typesafe code and verifiable code is one of provability: code which passes the VES verification algorithm is, by definition, verifiable; but that simple algorithm rejects certain code, even though a deeper analysis would reveal it as genuinely typesafe.)  Note that even if a program follows the syntax described in Partition V, the code may still not be valid, because valid code shall adhere to restrictions presented in this document and in Partition III.

Verification is a very stringent test. There are many programs that will pass validation but will fail verification. The VES cannot guarantee that these programs do not access memory or resources to which they are not granted access. Nonetheless, they may have been correctly constructed so that they do not access these resources. It is thus a matter of trust, rather than mathematical proof, whether it is safe to run these programs. A conforming implementation of the CLI may allow unverifiable code (valid code that does not pass verification) to be executed, although this may be subject to administrative trust controls that are not part of this standard.  A conforming implementation of the CLI shall allow the execution of verifiable code, although this may be subject to additional implementation-specified trust controls.


4         Introductory Examples

This section and its subsections contain only informative text.

Before diving into the details, it is useful to see an introductory sample program to get a feeling for the ilasm assembly language. The next section shows the famous Hello World program, this time in the ilasm assembly language.

4.1         Hello World Example

This section gives a simple example to illustrate the general feel of ilasm. Below is code that prints the well-known “Hello world!” salutation. The salutation is written by calling WriteLine, a static method found in the class System.Console that is part of the assembly mscorlib (see Partition IV).

Example (informative):

.assembly extern mscorlib {}
.assembly hello {}
.method static public void main() cil managed
{ .entrypoint
  .maxstack 1
  ldstr "Hello world!"
  call void [mscorlib]System.Console::WriteLine(class System.String)
  ret
}

The .assembly extern declaration references an external assembly, mscorlib, which defines System.Console. The .assembly declaration in the second line declares the name of the assembly for this program.  (Assemblies are the deployment unit for executable content for the CLI.)  The .method declaration defines the global method main.   The body of the method is enclosed in braces.  The first line in the body indicates that this method is the entry point for the assembly (.entrypoint), and the second line in the body specifies that it requires at most one stack slot (.maxstack).

The method contains only three instructions. The ldstr instruction pushes the string constant "Hello world!" onto the stack and the call instruction invokes System.Console::WriteLine, passing the string as its only argument (note that string literals in CIL are instances of the standard class System.String). As shown, call instructions shall include the full signature of the called method. Finally, the last instruction returns (ret) from main.
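Example (informative):

The relationship between the three instructions and the .maxstack 1 directive can be checked with a small Python sketch. It is not part of ilasm or the VES; the per-instruction pop and push counts are written by hand for this one method body and are an assumption of the sketch, and the function name max_stack_depth is hypothetical. It only handles straight-line code without branches.

```python
# Hand-written stack effects for the Hello World body: (mnemonic, pops, pushes).
# These counts are assumptions for this example only:
#   ldstr pushes one string reference,
#   call to a void method taking one argument pops that argument,
#   ret from a void method neither pops nor pushes.
HELLO_BODY = [
    ("ldstr", 0, 1),
    ("call", 1, 0),
    ("ret", 0, 0),
]


def max_stack_depth(instructions):
    """Return the peak evaluation-stack depth of straight-line code."""
    depth = 0
    peak = 0
    for name, pops, pushes in instructions:
        if pops > depth:
            raise ValueError(f"{name} needs {pops} stack slot(s), only {depth} available")
        depth = depth - pops + pushes
        peak = max(peak, depth)
    return peak


print(max_stack_depth(HELLO_BODY))
```

The peak depth computed for this body is one slot, which matches the value declared by .maxstack in the example.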

4.2         Examples

This document contains integrated examples for most features of the CLI metadata. Many sections conclude with an example showing a typical use of the feature. All these examples are written using the ilasm assembly language.  In addition, Partition V contains a longer example of a program written in the ilasm assembly language.  All examples are, of course, informative only.

End informative text


5         General Syntax

This section describes aspects of the ilasm syntax that are common to many parts of the grammar.  The term “ASCII” refers to the American Standard Code for Information Interchange, a standard seven-bit code that was proposed by ANSI in 1963, and finalized in 1968.  The ASCII repertoire of Unicode is the set of 128 Unicode characters from U+0000 to U+007F.

5.1         General Syntax Notation

This document uses a modified form of the BNF syntax notation. The following is a brief summary of this notation.

Bold items are terminals. Items placed in angle brackets (e.g. <int64>) are names of syntax classes and shall be replaced by actual instances of the class. Items placed in square brackets (e.g. [<float>]) are optional, and any item followed by * can appear zero or more times. The character “|” means that the items on either side of it are acceptable. The options are sorted in alphabetical order (to be more specific: in ASCII order, ignoring “<” for syntax classes, and case-insensitive). If a rule starts with an optional term, the optional term is not considered for sorting purposes.

ilasm is a case-sensitive language. All terminals shall be used with the same case as specified in this reference.

Example (informative):

A grammar such as

<top> ::= <int32> | float <float> |
          floats [<float> [, <float>]*] | else <QSTRING>

would consider the following all to be legal:

     12
     float 3
     float -4.3e7
     floats
     floats 2.4
     floats 2.4, 3.7
     else "Something \t weird"

but all of the following to be illegal:

     else 3
     3, 4
     float 4.3, 2.4
     float else
     stuff
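The notation can also be read operationally. The following Python sketch is a hand-written recognizer for the example <top> rule only. The token patterns (for numbers, quoted strings, and words) are simplified assumptions made for this illustration and are not the normative ilasm lexical rules; the names tokenize and is_top are hypothetical, and no 32-bit range check is performed.

```python
import re

# Simplified token patterns (assumptions for this sketch only).
TOKEN = re.compile(r'''
    \s*(?:
        (?P<qstring>"(?:[^"\\]|\\.)*")
      | (?P<number>0[xX][0-9A-Fa-f]+|-?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)
      | (?P<word>[A-Za-z_]+)
      | (?P<comma>,)
    )
''', re.VERBOSE)


def tokenize(text):
    """Split text into (kind, lexeme) pairs; raise ValueError on junk."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def is_int(token):
    kind, lexeme = token
    return kind == "number" and re.fullmatch(r"0[xX][0-9A-Fa-f]+|-?[0-9]+", lexeme) is not None


def is_float(token):
    kind, lexeme = token
    # An integer lexeme such as "3" is accepted where a <float> is expected.
    return kind == "number" and not lexeme.lower().startswith("0x")


def is_top(text):
    """Return True if text matches the example <top> rule."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    if not tokens:
        return False
    if len(tokens) == 1 and is_int(tokens[0]):
        return True                                   # <int32>
    (kind, word), rest = tokens[0], tokens[1:]
    if kind != "word":
        return False
    if word == "float":                               # float <float>
        return len(rest) == 1 and is_float(rest[0])
    if word == "else":                                # else <QSTRING>
        return len(rest) == 1 and rest[0][0] == "qstring"
    if word == "floats":                              # floats [<float> [, <float>]*]
        if not rest:
            return True
        if len(rest) % 2 == 0:
            return False
        floats_ok = all(is_float(t) for t in rest[0::2])
        commas_ok = all(t[0] == "comma" for t in rest[1::2])
        return floats_ok and commas_ok
    return False
```

Under these assumptions, is_top accepts each of the legal inputs listed above and rejects each of the illegal ones; for instance, "float 4.3, 2.4" fails because the float alternative takes exactly one <float>.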

5.2         Terminals

The basic syntax classes used in the grammar describe syntactic constraints on the input. These constraints are intended to convey logical restrictions on the information encoded in the metadata.

The syntactic constraints described in this clause are informative only.  The semantic constraints (e.g. “shall be represented in 32 bits”) are normative.

<int32> is either a decimal number or “0x” followed by a hexadecimal number, and shall be represented in 32 bits.

<int64> is either a decimal number or “0x” followed by a hexadecimal number, and shall be represented in 64 bits.

<hexbyte> is a 2-digit hexadecimal number that fits into one byte.
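Example (informative):

The following Python sketch shows one way the <int32>, <int64>, and <hexbyte> classes could be checked. The helper names parse_int and parse_hexbyte are hypothetical. Two points are assumptions of the sketch rather than statements of this specification: a leading minus sign is accepted on decimal numbers, and a value is treated as "represented in N bits" if it fits either the signed or the unsigned N-bit range.

```python
import re


def parse_int(text, bits):
    """Parse a decimal or 0x-prefixed hexadecimal literal that fits in `bits` bits."""
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", text):
        value = int(text, 16)
    elif re.fullmatch(r"-?[0-9]+", text):      # assumption: sign allowed on decimals
        value = int(text, 10)
    else:
        raise ValueError(f"not an integer literal: {text!r}")
    # Assumption: accept anything representable as a signed or unsigned
    # two's-complement bit pattern of the requested width.
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError(f"{text!r} does not fit in {bits} bits")
    return value


def parse_hexbyte(text):
    """Parse exactly two hexadecimal digits into a value in the range 0..255."""
    if not re.fullmatch(r"[0-9A-Fa-f]{2}", text):
        raise ValueError(f"not a <hexbyte>: {text!r}")
    return int(text, 16)


print(parse_int("0x7FFFFFFF", 32))
print(parse_int("0x100000000", 64))
print(parse_hexbyte("FF"))
```

With these rules, 0x100000000 is rejected as an <int32> but accepted as an <int64>, and a single digit such as F is not a <hexbyte>.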

<realnumber> is any syntactic representation for a floating point number that is distinct from that for all other terminal nodes.  In this document, a period (.) is used to separate the integer and fractional parts, and “e” or “E” separates the mantissa from the exponent.  Either (but not both) may be omitted.

Note: A complete assembler may also provide syntax for infinities and NaNs.

<QSTRING> is a string surrounded by double quote (") marks. Within the quoted string the character “\” can be used as an escape character, with “\t” for a tab character, “\n” for a new line character, or followed by three octal digits in order to insert an arbitrary byte into the string. The “+” operator can be used to concatenate string literals. This way, a long string can be broken across multiple lines by using “+” and a new string on each line. An alternative is using “\” as the last character in a line, in which case the line break is not entered into the generated string. Any white space characters (space, line feed, carriage return, and tab) between the “\” and the first character on the next line are ignored. See also examples below.

Note: A complete assembler will need to deal with the full set of issues required to support Unicode encodings, see Partition I (especially CLS Rule 4).

<SQSTRING> is similar to <QSTRING> with the difference that it is surrounded by single quote (') marks instead of double quote marks.

<ID> is a contiguous string of characters which starts with either an alphabetic character or one of “_”, “$”, “@” or “?” and is followed by any number of alphanumeric characters or any of “_”, “$”,  “@”, or “?”. An <ID> is used in only two ways:

·              As a label of a CIL instruction

·              As an <id> which can either be an <ID> or an <SQSTRING>, so that special characters can be included.
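Example (informative):

The <ID> rule and its use within <id> can be sketched as regular expressions in Python. The sketch assumes that "alphabetic" and "alphanumeric" mean the ASCII letters and digits, and it uses a simplified pattern for <SQSTRING> that only recognizes the surrounding single quotes and backslash escapes. The names is_ID and is_id are hypothetical.

```python
import re

# Assumption: alphabetic/alphanumeric are restricted to ASCII letters and digits.
ID_PATTERN = re.compile(r"[A-Za-z_$@?][A-Za-z0-9_$@?]*")

# Simplified <SQSTRING>: single quotes around any characters, with backslash escapes.
SQSTRING_PATTERN = re.compile(r"'(?:[^'\\]|\\.)*'")


def is_ID(text):
    """True if text is a plain <ID>."""
    return ID_PATTERN.fullmatch(text) is not None


def is_id(text):
    """True if text is an <id>: either an <ID> or an <SQSTRING>."""
    return is_ID(text) or SQSTRING_PATTERN.fullmatch(text) is not None


for candidate in ["_start$1", "?x@y", "1abc", "a-b", "'a-b'"]:
    print(candidate, is_ID(candidate), is_id(candidate))
```

The name a-b is not an <ID> because of the hyphen, but it becomes an acceptable <id> once placed in single quotes.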

Example (informative):

The following examples show how a string can be broken across lines:

    ldstr "Hello " + "World " +
    "from CIL!"

and

    ldstr "Hello World\
       \040from CIL!"

both become "Hello World from CIL!".
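The escape and line-continuation rules for <QSTRING> can be made concrete with a short Python sketch. The function names decode_qstring and concat_qstrings are hypothetical. The sketch assumes that an octal escape maps to the character with that code, and that any other escaped character (for example an escaped double quote) stands for itself; neither choice is stated by this specification.

```python
import re


def decode_qstring(literal):
    """Decode one double-quoted literal, including its surrounding quotes."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("not a double-quoted string")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1:i + 2]          # empty if the backslash is the last character
        octal = body[i + 1:i + 4]
        if nxt == "t":
            out.append("\t")
            i += 2
        elif nxt == "n":
            out.append("\n")
            i += 2
        elif nxt in ("\n", "\r"):
            # Backslash at end of line: drop the line break and any
            # white space before the first character on the next line.
            i += 1
            while i < len(body) and body[i] in " \t\r\n":
                i += 1
        elif re.fullmatch(r"[0-7]{3}", octal):
            out.append(chr(int(octal, 8)))   # assumption: byte value used as code point
            i += 4
        else:
            out.append(nxt)                  # assumption: other escapes stand for themselves
            i += 2
    return "".join(out)


def concat_qstrings(literals):
    """Model the + operator: decode each literal and join the results."""
    return "".join(decode_qstring(lit) for lit in literals)


first = concat_qstrings(['"Hello "', '"World "', '"from CIL!"'])
second = decode_qstring('"Hello World\\\n       \\040from CIL!"')
print(first == second == "Hello World from CIL!")
```

In the second literal the octal escape \040 supplies the space before "from", because the white space at the start of the continuation line is discarded.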

5.3         Identifiers

Identifiers are used to name entities. Simple identifiers are equivalent to an <ID>. However, the ilasm syntax allows the use of any identifier that can be formed using the Unicode character set (see Partition I). To achieve this, an identifier is placed within single quotation marks. This is summarized in the following grammar.

<id> ::=
  <ID>
   | <SQSTRING>

 

Keywords may only be used as identifiers if they appear in single quotes (see