Common Language Infrastructure (CLI)
Partition II:
Metadata Definition and Semantics
Table of contents
5.4 Labels and Lists of Labels
6 Assemblies, Manifests and Modules
6.1 Overview of Modules, Assemblies, and Files
6.2.1 Information about the Assembly (<asmDecl>)
6.6 Declarations inside a Module or Assembly
7.3 References to User-defined Types (<typeReference>)
8 Visibility, Accessibility and Hiding
8.1 Visibility of Top-Level Types and Accessibility of Nested Types
9.1.1 Visibility and Accessibility Attributes
9.1.3 Type Semantics Attributes
9.1.5 Interoperation Attributes
9.1.6 Special Handling Attributes
9.3 Introducing and Overriding Virtual Methods
9.3.1 Introducing a Virtual Method
9.3.3 Accessibility and Overriding
9.4 Method Implementation Requirements
9.7 Controlling Instance Layout
11.2 Implementing Virtual Methods on Interfaces
13.6.1 Synchronous Calls to Delegates
13.6.2 Asynchronous Calls to Delegates
14 Defining, Referencing, and Calling Methods
14.2 Static, Instance, and Virtual Methods
14.4.2 Predefined Attributes on Methods
14.4.3 Implementation Attributes of Methods
14.5.1 Method Transition Thunks
14.5.6 Managed Native Calling Conventions (x86)
15 Defining and Referencing Fields
15.1.1 Accessibility Information
15.1.2 Field Contract Attributes
15.1.3 Interoperation Attributes
15.3 Embedding Data in a PE File
15.3.2 Accessing Data from the PE File
15.3.3 Unmanaged Thread-local Storage
15.4 Initialization of Non-Literal Static Data
15.4.1 Data Known at Link Time
20.1 CLS Conventions: Custom Attribute Usage
20.2 Attributes Used by the CLI
20.2.1 Pseudo Custom Attributes
20.2.2 Custom Attributes Defined by the CLS
20.2.3 Custom Attributes for CIL-to-Native-Code Compiler and Debugger
20.2.4 Custom Attributes for Remoting
20.2.5 Custom Attributes for Security
20.2.6 Custom Attributes for TLS
20.2.7 Pseudo Custom Attributes for the Assembly Linker
20.2.8 Custom Attributes Provided for Interoperation with Unmanaged Code
20.2.9 Custom Attributes, Various
21 Metadata Logical Format: Tables
21.1 Metadata Validation Rules
21.7 AssemblyRefProcessor : 0x24
22 Metadata Logical Format: Other Structures
22.1.1 Values for AssemblyHashAlgorithm
22.1.2 Values for AssemblyFlags
22.1.4 Flags for Events [EventAttributes]
22.1.5 Flags for Fields [FieldAttributes]
22.1.6 Flags for Files [FileAttributes]
22.1.7 Flags for ImplMap [PInvokeAttributes]
22.1.8 Flags for ManifestResource [ManifestResourceAttributes]
22.1.9 Flags for Methods [MethodAttributes]
22.1.10 Flags for Methods [MethodImplAttributes]
22.1.11 Flags for MethodSemantics [MethodSemanticsAttributes]
22.1.12 Flags for Params [ParamAttributes]
22.1.13 Flags for Properties [PropertyAttributes]
22.1.14 Flags for Types [TypeAttributes]
22.1.15 Element Types used in Signatures
24 File Format Extensions to PE
24.1 Structure of the Runtime File Format
24.3.1 Import Table and Import Address Table (IAT)
24.4 Common Intermediate Language Physical Layout
24.4.1 Method Header Type Values
24.4.4 Flags for Method Headers
24.4.6 Exception Handling Clauses
Partition I of the Common Language Infrastructure (CLI) describes the overall architecture of the CLI, and provides the normative description of the Common Type System (CTS), the Virtual Execution System (VES), and the Common Language Specification (CLS). It also provides a non-normative description of the metadata and a comprehensive set of abbreviations, acronyms, and definitions, which are included by reference from all other Partitions.
Partition II (this specification) provides the normative description of the metadata: its physical layout (as a file format), its logical contents (as a set of tables and their relationships), and its semantics (as seen from a hypothetical assembler, ilasm).
This document focuses on the structure and semantics of metadata. The semantics of metadata, which dictate much of the operation of the VES, are described using the syntax of ilasm, an assembler language for CIL. The ilasm syntax itself is considered a normative part of this ECMA standard. This constitutes Chapters 5 through 20. A complete syntax for ilasm is included in Partition V. The structure (both logical and physical) is covered in Chapters 21 through 24.
Rationale: An assembly language is really just syntax for specifying the metadata in a file and the CIL instructions in that file. Specifying ilasm provides a means of interchanging programs written directly for the CLI without the use of a higher-level language and also provides a convenient way to express examples.
The semantics of the metadata can also be described independently of the actual format in which the metadata is stored. This point is important because the storage format, as specified in Chapters 21 through 24, is engineered to be efficient for both storage space and access time, but this comes at the cost of the simplicity desirable for describing its semantics.
Validation refers to a set of tests that can be performed on any file to check that the file format, metadata, and CIL are self-consistent. These tests are intended to ensure that the file conforms to the mandatory requirements of this specification. The behavior of conforming implementations of the CLI when presented with non-conforming files is unspecified.
Verification refers to a check of both CIL and its related metadata to ensure that the CIL code sequences do not permit any access to memory outside the program's logical address space. In conjunction with the validation tests, verification ensures that the program cannot access memory or other resources to which it is not granted access.
Partition III specifies the rules for both valid and verifiable use of CIL instructions. Partition III also provides an informative description of rules for validating the internal consistency of metadata (the rules follow, albeit indirectly, from the specification in this Partition) as well as a normative description of the verification algorithm. A mathematical proof of soundness of the underlying type system is possible, and provides the basis for the verification requirements. Aside from these rules, this standard does not specify:
· at what time (if ever) such an algorithm should be performed
· what a conforming implementation should do in case of failure of verification.
The following graph makes this relationship clearer (see next paragraph for a description):

Figure 1: Relationship between valid and verifiable CIL
In the above figure, the outer circle contains all code permitted by the ilasm syntax. The next circle represents all code that is valid CIL. The dotted inner circle represents all type-safe code. Finally, the black innermost circle contains all code that is verifiable. (The difference between type-safe code and verifiable code is one of provability: code which passes the VES verification algorithm is, by definition, verifiable; but that simple algorithm rejects certain code, even though a deeper analysis would reveal it as genuinely type-safe.) Note that even if a program follows the syntax described in Partition V, the code may still not be valid, because valid code shall adhere to restrictions presented in this document and in Partition III.
Verification is a very stringent test. There are many programs that will pass validation but will fail verification. The VES cannot guarantee that these programs do not access memory or resources to which they are not granted access. Nonetheless, they may have been correctly constructed so that they do not access these resources. It is thus a matter of trust, rather than mathematical proof, whether it is safe to run these programs. A conforming implementation of the CLI may allow unverifiable code (valid code that does not pass verification) to be executed, although this may be subject to administrative trust controls that are not part of this standard. A conforming implementation of the CLI shall allow the execution of verifiable code, although this may be subject to additional implementation-specified trust controls.
This section and its subsections contain only informative text.
Before diving into the details, it is useful to see an introductory sample program to get a feeling for the ilasm assembly language. The next section shows the famous Hello World program, this time in the ilasm assembly language.
This section gives a simple example to illustrate the general feel of ilasm. Below is code that prints the well-known "Hello world!" salutation. The salutation is written by calling WriteLine, a static method found in the class System.Console that is part of the assembly mscorlib (see Partition IV).
Example (informative):
.assembly extern mscorlib {}
.assembly hello {}
.method static public void main() cil managed
{ .entrypoint
.maxstack 1
ldstr "Hello world!"
call void [mscorlib]System.Console::WriteLine(class System.String)
ret
}
The .assembly extern declaration references an external assembly, mscorlib, which defines System.Console. The .assembly declaration in the second line declares the name of the assembly for this program. (Assemblies are the deployment unit for executable content for the CLI.) The .method declaration defines the global method main. The body of the method is enclosed in braces. The first line in the body indicates that this method is the entry point for the assembly (.entrypoint), and the second line in the body specifies that it requires at most one stack slot (.maxstack).
The method contains only three instructions. The ldstr instruction pushes the string constant "Hello world!" onto the stack and the call instruction invokes System.Console::WriteLine, passing the string as its only argument (note that string literals in CIL are instances of the standard class System.String). As shown, call instructions shall include the full signature of the called method. Finally, the last instruction returns (ret) from main.
This document contains integrated examples for most features of the CLI metadata. Many sections conclude with an example showing a typical use of the feature. All these examples are written using the ilasm assembly language. In addition, Partition V contains a longer example of a program written in the ilasm assembly language. All examples are, of course, informative only.
End informative text
This section describes aspects of the ilasm syntax that are common to many parts of the grammar. The term ASCII refers to the American Standard Code for Information Interchange, a standard seven-bit code that was proposed by ANSI in 1963, and finalized in 1968. The ASCII repertoire of Unicode is the set of 128 Unicode characters from U+0000 to U+007F.
This document uses a modified form of the BNF syntax notation. The following is a brief summary of this notation.
Bold items are terminals. Items placed in angle brackets (e.g. <int64>) are names of syntax classes and shall be replaced by actual instances of the class. Items placed in square brackets (e.g. [<float>]) are optional, and any item followed by * can appear zero or more times. The character | means that the items on either side of it are acceptable. The options are sorted in alphabetical order (to be more specific: in ASCII order, ignoring < for syntax classes, and case-insensitive). If a rule starts with an optional term, the optional term is not considered for sorting purposes.
ilasm is a case-sensitive language. All terminals shall be used with the same case as specified in this reference.
Example (informative):
A grammar such as
<top> ::= <int32> | float <float> |
floats [<float> [, <float>]*] | else <QSTRING>
would consider the following all to be legal:
12
float 3
float 4.3e7
floats
floats 2.4
floats 2.4, 3.7
else "Something \t weird"
but all of the following to be illegal:
else 3
3, 4
float 4.3, 2.4
float else
stuff
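The example above can also be checked mechanically. The following Python sketch is informative only and is not part of any ilasm implementation: it encodes the sample <top> grammar as a regular expression and can be applied to the inputs listed above. The token patterns INT32, FLOAT, and QSTRING, and the helper is_top, are simplified assumptions made for this illustration; they do not reproduce the full definitions of the corresponding syntax classes.

```python
import re

# Simplified token shapes, assumed for this sketch only.
INT32 = r"(?:0x[0-9A-Fa-f]+|[0-9]+)"
FLOAT = r"[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"
QSTRING = r'"(?:[^"\\]|\\.)*"'

# One alternative per option of the sample <top> rule.
TOP = re.compile(
    rf"""
      {INT32}                                     # <int32>
    | float\s+{FLOAT}                             # float <float>
    | floats(?:\s+{FLOAT}(?:\s*,\s*{FLOAT})*)?    # floats [<float> [, <float>]*]
    | else\s+{QSTRING}                            # else <QSTRING>
    """,
    re.VERBOSE,
)


def is_top(text: str) -> bool:
    """Return True if the whole input matches the sample <top> rule."""
    return TOP.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    for sample in ["12", "float 3", "floats 2.4, 3.7", "else 3", "3, 4", "stuff"]:
        print(f"{sample!r}: {is_top(sample)}")
```

Because terminals are case-sensitive, the sketch matches float, floats, and else only in lower case.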
The basic syntax classes used in the grammar describe syntactic constraints on the input; these constraints are intended to convey logical restrictions on the information encoded in the metadata.
The syntactic constraints described in this clause are informative only. The semantic constraints (e.g. shall be represented in 32 bits) are normative.
<int32> is either a decimal number or 0x followed by a hexadecimal number, and shall be represented in 32 bits.
<int64> is either a decimal number or 0x followed by a hexadecimal number, and shall be represented in 64 bits.
<hexbyte> is a 2-digit hexadecimal number that fits into one byte.
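As an informative illustration of the three integer classes above, the following Python sketch parses a decimal or 0x-prefixed literal and checks that it fits in the required number of bits. The function names parse_int and is_hexbyte are hypothetical. Two points are assumptions of the sketch rather than statements of this specification: an optional leading minus sign is accepted, and a value is treated as fitting in N bits if it lies in the range from -2^(N-1) to 2^N - 1.

```python
import re

# Optional sign, then either 0x followed by hex digits or plain decimal digits.
_INT_LITERAL = re.compile(r"(-?)(?:0x([0-9A-Fa-f]+)|([0-9]+))")
_HEXBYTE = re.compile(r"[0-9A-Fa-f]{2}")


def parse_int(text: str, bits: int) -> int:
    """Parse an <int32>-style (bits=32) or <int64>-style (bits=64) literal."""
    match = _INT_LITERAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not a decimal or 0x-prefixed literal: {text!r}")
    sign, hex_digits, dec_digits = match.groups()
    value = int(hex_digits, 16) if hex_digits is not None else int(dec_digits, 10)
    if sign:
        value = -value
    # Assumption: any value whose signed or unsigned bit pattern fits is accepted.
    low, high = -(1 << (bits - 1)), (1 << bits) - 1
    if not (low <= value <= high):
        raise ValueError(f"{text!r} cannot be represented in {bits} bits")
    return value


def is_hexbyte(text: str) -> bool:
    """Return True for exactly two hexadecimal digits, as in <hexbyte>."""
    return _HEXBYTE.fullmatch(text) is not None
```

For instance, under these assumptions parse_int("0x100000000", 64) succeeds, whereas the same literal with bits=32 raises ValueError.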
<realnumber> is any syntactic representation for a floating point number that is distinct from that for all other terminal nodes. In this document, a period (.) is used to separate the integer and fractional parts, and e or E separates the mantissa from the exponent. Either (but not both) may be omitted.
Note: A complete assembler may also provide syntax for infinities and NaNs.
<QSTRING> is a string surrounded by double quote (") marks. Within the quoted string the character \ can be used as an escape character, with \t for a tab character, \n for a new line character, or followed by three octal digits in order to insert an arbitrary byte into the string. The + operator can be used to concatenate string literals. This way, a long string can be broken across multiple lines by using + and a new string on each line. An alternative is using \ as the last character in a line, in which case the line break is not entered into the generated string. Any white space characters (space, line feed, carriage return, and tab) between the \ and the first character on the next line are ignored. See also examples below.
Note: A complete assembler will need to deal with the full set of issues required to support Unicode encodings; see Partition I (especially CLS Rule 4).
<SQSTRING> is similar to <QSTRING> with the difference that it is surrounded by single quote (') marks instead of double quote marks.
<ID> is a contiguous string of characters which starts with either an alphabetic character or one of _, $, @ or ? and is followed by any number of alphanumeric characters or any of _, $, @, or ?. An <ID> is used in only two ways:
· As a label of a CIL instruction
· As an <id> which can either be an <ID> or an <SQSTRING>, so that special characters can be included.
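The lexical rule for <ID> given above can be sketched as a regular expression. The following Python fragment is informative only; the name is_simple_id is hypothetical, and the sketch assumes that "alphabetic" and "alphanumeric" refer to the ASCII letters and digits, which the text above does not state explicitly.

```python
import re

# First character: a letter or one of _ $ @ ?
# Remaining characters: letters, digits, or one of _ $ @ ?
_SIMPLE_ID = re.compile(r"[A-Za-z_$@?][A-Za-z0-9_$@?]*")


def is_simple_id(text: str) -> bool:
    """Return True if text has the shape of an <ID> under the stated assumption."""
    return _SIMPLE_ID.fullmatch(text) is not None


if __name__ == "__main__":
    for name in ["main", "_count", "$tmp?", "9lives", "my-name"]:
        print(f"{name!r}: {is_simple_id(name)}")
```

A name that fails this check, such as one containing a hyphen, would have to be written as an <SQSTRING> in order to be used as an <id>.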
Example (informative):
The following examples show how strings can be broken across lines:
ldstr "Hello " + "World " +
"from CIL!"
and
ldstr "Hello World\
\040from CIL!"
both become "Hello World from CIL!".
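The escape and line-continuation rules for <QSTRING> can be traced with a short Python sketch, which is informative only. The function names decode_qstring_body and concat_qstrings are hypothetical. The sketch handles only the escapes named in the text above (\t, \n, three octal digits, and \ at the end of a line) and rejects any other escape. As a simplification, it maps an octal escape to the character with that code rather than to a raw byte.

```python
def decode_qstring_body(body: str) -> str:
    """Decode the text between the quote marks of a <QSTRING>."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        octal = body[i + 1:i + 4]
        if nxt == "t":
            out.append("\t")
            i += 2
        elif nxt == "n":
            out.append("\n")
            i += 2
        elif nxt in ("\n", "\r"):
            # Line continuation: drop the line break and any white space
            # up to the first character on the next line.
            i += 1
            while i < len(body) and body[i] in " \t\r\n":
                i += 1
        elif len(octal) == 3 and all(c in "01234567" for c in octal):
            out.append(chr(int(octal, 8)))
            i += 4
        else:
            raise ValueError(f"escape not handled by this sketch at offset {i}")
    return "".join(out)


def concat_qstrings(*bodies: str) -> str:
    """Model the + operator: decode each literal, then concatenate."""
    return "".join(decode_qstring_body(b) for b in bodies)
```

Applied to the two forms shown above, concat_qstrings("Hello ", "World ", "from CIL!") and decode_qstring_body("Hello World\\\n    \\040from CIL!") yield the same string.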
Identifiers are used to name entities. Simple identifiers are equivalent to an <ID>. However, the ilasm syntax allows the use of any identifier that can be formed using the Unicode character set (see Partition I). To achieve this, an identifier is placed within single quotation marks. This is summarized in the following grammar.
<id> ::= <ID> | <SQSTRING>
Keywords may only be used as identifiers if they appear in single quotes (see