Common Language Infrastructure (CLI)

Partition II:
Metadata Definition and Semantics

 

Table of contents

 

1    Scope

2    Overview

3    Validation and Verification

4    Introductory Examples

4.1    Hello World Example

4.2    Examples

5    General Syntax

5.1    General Syntax Notation

5.2    Terminals

5.3    Identifiers

5.4    Labels and Lists of Labels

5.5    Lists of Hex Bytes

5.6    Floating point numbers

5.7    Source Line Information

5.8    File Names

5.9    Attributes and Metadata

5.10    ilasm Source Files

6    Assemblies, Manifests and Modules

6.1    Overview of Modules, Assemblies, and Files

6.2    Defining an Assembly

6.2.1    Information about the Assembly (<asmDecl>)

6.2.2    Manifest Resources

6.2.3    Files in the Assembly

6.3    Referencing Assemblies

6.4    Declaring Modules

6.5    Referencing Modules

6.6    Declarations inside a Module or Assembly

6.7    Exported Type Definitions

7    Types and Signatures

7.1    Types

7.1.1    modreq and modopt

7.1.2    pinned

7.2    Built-in Types

7.3    References to User-defined Types (<typeReference>)

7.4    Native Data Types

8    Visibility, Accessibility and Hiding

8.1    Visibility of Top-Level Types and Accessibility of Nested Types

8.2    Accessibility

8.3    Hiding

9    Defining Types

9.1    Type Header (<classHead>)

9.1.1    Visibility and Accessibility Attributes

9.1.2    Type Layout Attributes

9.1.3    Type Semantics Attributes

9.1.4    Inheritance Attributes

9.1.5    Interoperation Attributes

9.1.6    Special Handling Attributes

9.2    Body of a Type Definition

9.3    Introducing and Overriding Virtual Methods

9.3.1    Introducing a Virtual Method

9.3.2    The .override Directive

9.3.3    Accessibility and Overriding

9.4    Method Implementation Requirements

9.5    Special Members

9.5.1    Instance constructors

9.5.2    Instance Finalizer

9.5.3    Type Initializer

9.6    Nested Types

9.7    Controlling Instance Layout

9.8    Global Fields and Methods

10    Semantics of Classes

11    Semantics of Interfaces

11.1    Implementing Interfaces

11.2    Implementing Virtual Methods on Interfaces

12    Semantics of Value Types

12.1    Referencing Value Types

12.2    Initializing Value Types

12.3    Methods of Value Types

13    Semantics of Special Types

13.1    Vectors

13.2    Arrays

13.3    Enums

13.4    Pointer Types

13.4.1    Unmanaged Pointers

13.4.2    Managed Pointers

13.5    Method Pointers

13.6    Delegates

13.6.1    Synchronous Calls to Delegates

13.6.2    Asynchronous Calls to Delegates

14    Defining, Referencing, and Calling Methods

14.1    Method Descriptors

14.1.1    Method Declarations

14.1.2    Method Definitions

14.1.3    Method References

14.1.4    Method Implementations

14.2    Static, Instance, and Virtual Methods

14.3    Calling Convention

14.4    Defining Methods

14.4.1    Method Body

14.4.2    Predefined Attributes on Methods

14.4.3    Implementation Attributes of Methods

14.4.4    Scope Blocks

14.4.5    vararg Methods

14.5    Unmanaged Methods

14.5.1    Method Transition Thunks

14.5.2    Platform Invoke

14.5.3    Via Function Pointers

14.5.4    COM Interop

14.5.5    Data Type Marshaling

14.5.6    Managed Native Calling Conventions (x86)

15    Defining and Referencing Fields

15.1    Attributes of Fields

15.1.1    Accessibility Information

15.1.2    Field Contract Attributes

15.1.3    Interoperation Attributes

15.1.4    Other Attributes

15.2    Field Init Metadata

15.3    Embedding Data in a PE File

15.3.1    Data Declaration

15.3.2    Accessing Data from the PE File

15.3.3    Unmanaged Thread-local Storage

15.4    Initialization of Non-Literal Static Data

15.4.1    Data Known at Link Time

15.5    Data Known at Load Time

15.5.1    Data Known at Run Time

16    Defining Properties

17    Defining Events

18    Exception Handling

18.1    Protected Blocks

18.2    Handler Blocks

18.3    Catch

18.4    Filter

18.5    Finally

18.6    Fault Handler

19    Declarative Security

20    Custom Attributes

20.1    CLS Conventions: Custom Attribute Usage

20.2    Attributes Used by the CLI

20.2.1    Pseudo Custom Attributes

20.2.2    Custom Attributes Defined by the CLS

20.2.3    Custom Attributes for CIL-to-Native-Code Compiler and Debugger

20.2.4    Custom Attributes for Remoting

20.2.5    Custom Attributes for Security

20.2.6    Custom Attributes for TLS

20.2.7    Pseudo Custom Attributes for the Assembly Linker

20.2.8    Custom Attributes Provided for Interoperation with Unmanaged Code

20.2.9    Custom Attributes, Various

21    Metadata Logical Format: Tables

21.1    Metadata Validation Rules

21.2    Assembly : 0x20

21.3    AssemblyOS : 0x22

21.4    AssemblyProcessor : 0x21

21.5    AssemblyRef : 0x23

21.6    AssemblyRefOS : 0x25

21.7    AssemblyRefProcessor : 0x24

21.8    ClassLayout : 0x0F

21.9    Constant : 0x0B

21.10    CustomAttribute : 0x0C

21.11    DeclSecurity : 0x0E

21.12    EventMap : 0x12

21.13    Event : 0x14

21.14    ExportedType : 0x27

21.15    Field : 0x04

21.16    FieldLayout : 0x10

21.17    FieldMarshal : 0x0D

21.18    FieldRVA : 0x1D

21.19    File : 0x26

21.20    ImplMap : 0x1C

21.21    InterfaceImpl : 0x09

21.22    ManifestResource : 0x28

21.23    MemberRef : 0x0A

21.24    Method : 0x06

21.25    MethodImpl : 0x19

21.26    MethodSemantics : 0x18

21.27    Module : 0x00

21.28    ModuleRef : 0x1A

21.29    NestedClass : 0x29

21.30    Param : 0x08

21.31    Property : 0x17

21.32    PropertyMap : 0x15

21.33    StandAloneSig : 0x11

21.34    TypeDef : 0x02

21.35    TypeRef : 0x01

21.36    TypeSpec : 0x1B

22    Metadata Logical Format: Other Structures

22.1    Bitmasks and Flags

22.1.1    Values for AssemblyHashAlgorithm

22.1.2    Values for AssemblyFlags

22.1.3    Values for Culture

22.1.4    Flags for Events [EventAttributes]

22.1.5    Flags for Fields [FieldAttributes]

22.1.6    Flags for Files [FileAttributes]

22.1.7    Flags for ImplMap [PInvokeAttributes]

22.1.8    Flags for ManifestResource [ManifestResourceAttributes]

22.1.9    Flags for Methods [MethodAttributes]

22.1.10    Flags for Methods [MethodImplAttributes]

22.1.11    Flags for MethodSemantics [MethodSemanticsAttributes]

22.1.12    Flags for Params [ParamAttributes]

22.1.13    Flags for Properties [PropertyAttributes]

22.1.14    Flags for Types [TypeAttributes]

22.1.15    Element Types used in Signatures

22.2    Blobs and Signatures

22.2.1    MethodDefSig

22.2.2    MethodRefSig

22.2.3    StandAloneMethodSig

22.2.4    FieldSig

22.2.5    PropertySig

22.2.6    LocalVarSig

22.2.7    CustomMod

22.2.8    TypeDefOrRefEncoded

22.2.9    Constraint

22.2.10    Param

22.2.11    RetType

22.2.12    Type

22.2.13    ArrayShape

22.2.14    TypeSpec

22.2.15    Short Form Signatures

22.3    Custom Attributes

22.4    Marshalling Descriptors

23    Metadata Physical Layout

23.1    Fixed Fields

23.2    File Headers

23.2.1    Metadata root

23.2.2    Stream Header

23.2.3    #Strings heap

23.2.4    #US and #Blob heaps

23.2.5    #GUID heap

23.2.6    #~ stream

23.2.7    Coded Indexes

24    File Format Extensions to PE

24.1    Structure of the Runtime File Format

24.2    PE Headers

24.2.1    MS-DOS Header

24.2.2    PE File Header

24.2.3    PE Optional Header

24.3    Section Headers

24.3.1    Import Table and Import Address Table (IAT)

24.3.2    Relocations

24.3.3    CLI Header

24.4    Common Intermediate Language Physical Layout

24.4.1    Method Header Type Values

24.4.2    Tiny Format

24.4.3    Fat Format

24.4.4    Flags for Method Headers

24.4.5    Method Data Section

24.4.6    Exception Handling Clauses

 


1         Scope

Partition I of the Common Language Infrastructure (CLI) describes the overall architecture of the CLI, and provides the normative description of the Common Type System (CTS), the Virtual Execution System (VES), and the Common Language Specification (CLS).  It also provides a non-normative description of the metadata and a comprehensive set of abbreviations, acronyms, and definitions, which are included by reference from all other Partitions.

Partition II (this specification) provides the normative description of the metadata: its physical layout (as a file format), its logical contents (as a set of tables and their relationships), and its semantics (as seen from a hypothetical assembler, ilasm).

2         Overview

This document focuses on the structure and semantics of metadata. The semantics of metadata, which dictate much of the operation of the VES, are described using the syntax of ilasm, an assembler language for CIL.  The ilasm syntax itself is considered a normative part of this ECMA standard.  This constitutes Chapters 5 through 20. A complete syntax for ilasm is included in Partition V. The structure (both logical and physical) is covered in Chapters 21 through 24.

Rationale: An assembly language is really just syntax for specifying the metadata in a file and the CIL instructions in that file.   Specifying ilasm provides a means of interchanging programs written directly for the CLI without the use of a higher-level language and also provides a convenient way to express examples.

The semantics of the metadata can also be described independently of the actual format in which the metadata is stored.  This point is important because the storage format, as specified in Chapters 21 through 24, is engineered to be efficient for both storage space and access time, but this comes at the cost of the simplicity desirable for describing its semantics.

3         Validation and Verification

Validation refers to a set of tests that can be performed on any file to check that the file format, metadata, and CIL are self-consistent. These tests are intended to ensure that the file conforms to the mandatory requirements of this specification.  The behavior of conforming implementations of the CLI when presented with non-conforming files is unspecified.

Verification refers to a check of both CIL and its related metadata to ensure that the CIL code sequences do not permit any access to memory outside the program’s logical address space. In conjunction with the validation tests, verification ensures that the program cannot access memory or other resources to which it is not granted access.

Partition III specifies the rules for both valid and verifiable use of CIL instructions.  Partition III also provides an informative description of rules for validating the internal consistency of metadata (the rules follow, albeit indirectly, from the specification in this Partition) as well as containing a normative description of the verification algorithm.  A mathematical proof of soundness of the underlying type system is possible, and provides the basis for the verification requirements.  Aside from these rules, this standard does not specify:

·              at what time (if ever) such an algorithm should be performed

·              what a conforming implementation should do in case of failure of verification. 

The following graph makes this relationship clearer (see next paragraph for a description):

Figure 1: Relationship between valid and verifiable CIL

In the above figure, the outer circle contains all code permitted by the ilasm syntax. The next circle represents all code that is valid CIL. The dotted inner circle represents all typesafe code.  Finally, the black innermost circle contains all code that is verifiable.  (The difference between typesafe code and verifiable code is one of provability: code that passes the VES verification algorithm is, by definition, verifiable; but that simple algorithm rejects certain code, even though a deeper analysis would reveal it as genuinely typesafe.)  Note that even if a program follows the syntax described in Partition V, the code may still not be valid, because valid code shall adhere to restrictions presented in this document and in Partition III.

Verification is a very stringent test. There are many programs that will pass validation but will fail verification. The VES cannot guarantee that these programs do not access memory or resources to which they are not granted access. Nonetheless, they may have been correctly constructed so that they do not access these resources. It is thus a matter of trust, rather than mathematical proof, whether it is safe to run these programs. A conforming implementation of the CLI may allow unverifiable code (valid code that does not pass verification) to be executed, although this may be subject to administrative trust controls that are not part of this standard.  A conforming implementation of the CLI shall allow the execution of verifiable code, although this may be subject to additional implementation-specified trust controls.


4         Introductory Examples

This section and its subsections contain only informative text.

Before diving into the details, it is useful to see an introductory sample program to get a feeling for the ilasm assembly language. The next section shows the famous Hello World program, this time in the ilasm assembly language.

4.1         Hello World Example

This section gives a simple example to illustrate the general feel of ilasm. Below is code that prints the well-known “Hello world!” salutation. The salutation is written by calling WriteLine, a static method found in the class System.Console that is part of the assembly mscorlib (see Partition IV).

Example (informative):

.assembly extern mscorlib {}
.assembly hello {}
.method static public void main() cil managed
{ .entrypoint
  .maxstack 1
  ldstr "Hello world!"
  call void [mscorlib]System.Console::WriteLine(class System.String)
  ret
}

The .assembly extern declaration references an external assembly, mscorlib, which defines System.Console. The .assembly declaration in the second line declares the name of the assembly for this program.  (Assemblies are the deployment unit for executable content for the CLI.)  The .method declaration defines the global method main.   The body of the method is enclosed in braces.  The first line in the body indicates that this method is the entry point for the assembly (.entrypoint), and the second line in the body specifies that it requires at most one stack slot (.maxstack).

The method contains only three instructions. The ldstr instruction pushes the string constant "Hello world!" onto the stack and the call instruction invokes System.Console::WriteLine, passing the string as its only argument (note that string literals in CIL are instances of the standard class System.String). As shown, call instructions shall include the full signature of the called method. Finally, the last instruction returns (ret) from main.

4.2         Examples

This document contains integrated examples for most features of the CLI metadata. Many sections conclude with an example showing a typical use of the feature. All these examples are written using the ilasm assembly language.  In addition, Partition V contains a longer example of a program written in the ilasm assembly language.  All examples are, of course, informative only.

End informative text


5         General Syntax

This section describes aspects of the ilasm syntax that are common to many parts of the grammar.  The term “ASCII” refers to the American Standard Code for Information Interchange, a standard seven-bit code that was proposed by ANSI in 1963, and finalized in 1968.  The ASCII repertoire of Unicode is the set of 128 Unicode characters from U+0000 to U+007F.

5.1         General Syntax Notation

This document uses a modified form of the BNF syntax notation. The following is a brief summary of this notation.

Bold items are terminals. Items placed in angle brackets (e.g. <int64>) are names of syntax classes and shall be replaced by actual instances of the class. Items placed in square brackets (e.g. [<float>]) are optional, and any item followed by * can appear zero or more times. The character “|” means that the items on either side of it are acceptable. The options are sorted in alphabetical order (to be more specific: in ASCII order, ignoring “<” for syntax classes, and case-insensitive). If a rule starts with an optional term, the optional term is not considered for sorting purposes.

ilasm is a case-sensitive language. All terminals shall be used with the same case as specified in this reference.

Example (informative):

A grammar such as

<top> ::= <int32> | float <float> |

          floats [<float> [, <float>]*] | else <QSTRING>

would consider the following all to be legal:

     12

     float 3

     float -4.3e7

     floats

     floats 2.4

     floats 2.4, 3.7

     else "Something \t weird"

but all of the following to be illegal:

     else 3

     3, 4

     float 4.3, 2.4

     float else

     stuff
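The notation can be checked mechanically. The following is a minimal Python sketch, not part of the standard, that encodes the <top> rule above as a regular expression. The token patterns for <int32>, <float>, and <QSTRING> are simplified assumptions made for this illustration (for instance, <float> is assumed to accept an optional leading minus sign), and the name is_top is hypothetical.

```python
import re

# Simplified token shapes, assumed for this illustration only.
INT32 = r"(?:0x[0-9A-Fa-f]+|[0-9]+)"
FLOAT = r"-?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?"
QSTRING = r'"(?:[^"\\]|\\.)*"'

# One alternative per "|" in the rule; "[...]" becomes "(...)?", "*" stays "*".
TOP = re.compile(
    rf"""\s*(?:
        {INT32}                                    # <int32>
      | float\s+{FLOAT}                            # float <float>
      | floats(?:\s+{FLOAT}(?:\s*,\s*{FLOAT})*)?   # floats [<float> [, <float>]*]
      | else\s+{QSTRING}                           # else <QSTRING>
    )\s*""",
    re.VERBOSE,
)


def is_top(text: str) -> bool:
    """Return True if the whole input is derivable from <top>."""
    return TOP.fullmatch(text) is not None
```

With these assumptions, is_top accepts each of the legal inputs listed above and rejects each of the illegal ones.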

5.2         Terminals

The basic syntax classes used in the grammar describe syntactic constraints on the input; these constraints are intended to convey logical restrictions on the information encoded in the metadata.

The syntactic constraints described in this clause are informative only.  The semantic constraints (e.g. “shall be represented in 32 bits”) are normative.

<int32> is either a decimal number or “0x” followed by a hexadecimal number, and shall be represented in 32 bits.

<int64> is either a decimal number or “0x” followed by a hexadecimal number, and shall be represented in 64 bits.

<hexbyte> is a 2-digit hexadecimal number that fits into one byte.

<realnumber> is any syntactic representation for a floating point number that is distinct from that for all other terminal nodes.  In this document, a period (.) is used to separate the integer and fractional parts, and “e” or “E” separates the mantissa from the exponent.  Either (but not both) may be omitted.

Note: A complete assembler may also provide syntax for infinities and NaNs.
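The numeric terminals above can be sketched in Python as follows. This is an informative sketch under stated assumptions, not a normative definition: the function names are hypothetical, an optional leading minus sign is assumed for integers, "represented in N bits" is assumed to admit both the signed and the unsigned N-bit range, and a <realnumber> is assumed to need an integer part.

```python
import re

_INT = re.compile(r"-?(?:0x[0-9A-Fa-f]+|[0-9]+)")
_HEXBYTE = re.compile(r"[0-9A-Fa-f]{2}")
_REAL = re.compile(r"[+-]?[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def parse_int(text: str, bits: int) -> int:
    """Parse <int32> (bits=32) or <int64> (bits=64): decimal or 0x-prefixed hex."""
    if _INT.fullmatch(text) is None:
        raise ValueError(f"not an integer literal: {text!r}")
    negative = text.startswith("-")
    digits = text[1:] if negative else text
    magnitude = int(digits[2:], 16) if digits.startswith("0x") else int(digits, 10)
    value = -magnitude if negative else magnitude
    # Assumption: any value that fits in `bits` bits, signed or unsigned, is accepted.
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError(f"{text!r} cannot be represented in {bits} bits")
    return value


def is_hexbyte(text: str) -> bool:
    """<hexbyte>: exactly two hexadecimal digits."""
    return _HEXBYTE.fullmatch(text) is not None


def is_realnumber(text: str) -> bool:
    """<realnumber>: the fraction or the exponent may be omitted, but not both."""
    return _REAL.fullmatch(text) is not None and any(c in text for c in ".eE")
```

Requiring a period or an exponent is what keeps <realnumber> distinct from the integer terminals: "12" is an <int32>, while "12.0" and "12e0" are real numbers.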

<QSTRING> is a string surrounded by double quote (") marks. Within the quoted string the character “\” can be used as an escape character, with “\t” for a tab character, “\n” for a new line character, or followed by three octal digits in order to insert an arbitrary byte into the string. The “+” operator can be used to concatenate string literals. This way, a long string can be broken across multiple lines by using “+” and a new string on each line. An alternative is using “\” as the last character in a line, in which case the line break is not entered into the generated string. Any white space characters (space, line feed, carriage return, and tab) between the “\” and the first character on the next line are ignored. See also the examples below.

Note: A complete assembler will need to deal with the full set of issues required to support Unicode encodings; see Partition I (especially CLS Rule 4).

<SQSTRING> is similar to <QSTRING>, with the difference that it is surrounded by single quote (') marks instead of double quote marks.

<ID> is a contiguous string of characters which starts with either an alphabetic character or one of “_”, “$”, “@” or “?” and is followed by any number of alphanumeric characters or any of “_”, “$”,  “@”, or “?”. An <ID> is used in only two ways:

·              As a label of a CIL instruction

·              As an <id> which can either be an <ID> or an <SQSTRING>, so that special characters can be included.

Example (informative):

The following examples show the breaking of strings:

    ldstr "Hello " + "World " +

    "from CIL!"

and

    ldstr "Hello World\

       \040from CIL!"

both become "Hello World from CIL!".
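The escape rules for <QSTRING> can be sketched in Python as follows. This is an informative sketch, not a definitive implementation: the function name decode_qstring_body is hypothetical, the input is the text between the quote marks, an octal escape is mapped to the character with that code (the text speaks of inserting a byte), and any other escaped character is assumed to stand for itself.

```python
import re

# Order matters: octal digits, then \t and \n, then a line continuation
# (backslash, line break, and any following white space), then anything else.
_ESCAPE = re.compile(r"\\(?:([0-7]{3})|([tn])|(\r?\n[ \t\r\n]*)|(.))", re.DOTALL)


def decode_qstring_body(body: str) -> str:
    """Decode the escapes in the text between the quotes of a <QSTRING>."""

    def replace(match):
        octal, simple, continuation, other = match.groups()
        if octal is not None:
            return chr(int(octal, 8))
        if simple is not None:
            return "\t" if simple == "t" else "\n"
        if continuation is not None:
            return ""  # the line break and leading white space are dropped
        return other  # assumption: e.g. \" and \\ stand for themselves

    return _ESCAPE.sub(replace, body)


def concat_qstrings(bodies) -> str:
    """Model the "+" operator: decode each literal, then join the results."""
    return "".join(decode_qstring_body(b) for b in bodies)
```

Applied to the two forms in the example above, both the "+" form and the line-continuation form decode to the same string, since \040 is the octal code for a space.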

5.3         Identifiers

Identifiers are used to name entities. Simple identifiers are equivalent to an <ID>. However, the ilasm syntax allows the use of any identifier that can be formed using the Unicode character set (see Partition I). To achieve this, an identifier is placed within single quotation marks. This is summarized in the following grammar.

<id> ::=

  <ID>

   | <SQSTRING>

 

Keywords may only be used as identifiers if they appear in single quotes (see Partition V for a list of all keywords).

Several <id>’s may be combined to form a larger <id>. The <id>’s are separated by a dot (.). An <id> formed in this way is called a <dottedname>.

<dottedname> ::= <id> [. <id>]*

 

Rationale: <dottedname> is provided for convenience, since “.” can be included in an <id> using the <SQSTRING> syntax.  <dottedname> is used in the grammar where “.” is considered a common character (e.g. fully qualified type names).

Implementation Specific (Microsoft)

Names that end with $PST followed by a hexadecimal number have a special meaning. The assembler will automatically truncate the part starting with the $PST. This is in support of compiler-controlled accessibility; see Partition I.  Also, the first release of the CLI limits the length of identifiers; see Chapter 21 for details.

 Examples (informative):

The following shows some simple identifiers:

     A         

     Test      

     $Test     

     @Foo?     

     ?_X_

The following shows identifiers in single quotes:

     'Weird Identifier'

     'Odd\102Char'

     'Embedded\nReturn'

The following shows dotted names:

     System.Console

     A.B.C

     'My Project'.'My Component'.'My Name'
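The identifier grammar can be sketched in Python as follows. This is an informative sketch under stated assumptions: "alphabetic" and "alphanumeric" are taken to mean ASCII letters and digits, quoted identifiers use the ASCII apostrophe, the rule that keywords must be quoted is not enforced, and the name split_dottedname is hypothetical.

```python
import re
from typing import List

ID = r"[A-Za-z_$@?][A-Za-z0-9_$@?]*"      # <ID>
SQSTRING = r"'(?:[^'\\]|\\.)*'"           # <SQSTRING>, escapes left undecoded
_ID_TOKEN = re.compile(rf"{ID}|{SQSTRING}")  # <id> ::= <ID> | <SQSTRING>


def split_dottedname(text: str) -> List[str]:
    """Split a <dottedname> into its <id> parts; a dot inside quotes does not split."""
    parts, pos = [], 0
    while True:
        match = _ID_TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"expected an <id> at offset {pos} in {text!r}")
        parts.append(match.group())
        pos = match.end()
        if pos == len(text):
            return parts
        if text[pos] != ".":
            raise ValueError(f"expected '.' at offset {pos} in {text!r}")
        pos += 1
```

For example, split_dottedname("System.Console") yields the two parts System and Console, while a quoted part such as 'My Project' is returned as a single <id> with its quotes.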

5.4         Labels and Lists of Labels

Labels are provided as a programming convenience; they represent a number that is encoded in the metadata.  The value represented by a label is typically an offset in bytes from the beginning of the current method, although the precise encoding differs depending on where in the logical metadata structure or CIL stream the label occurs.  For details of how labels are encoded in the metadata, see Chapters 21 through 24; for their encoding in CIL instructions see Partition III.

A simple label is a special name that represents an address. Syntactically, a label is equivalent to an <id>. Thus, labels may also be single quoted and may contain Unicode characters.

A list of labels is comma separated, and can be any combination of these simple labels.

<labeloroffset> ::= <id>

<labels> ::= <labeloroffset> [, <labeloroffset>]*

 

Rationale: In a real assembler the syntax for <labeloroffset> might allow the direct specification of a number rather than requiring symbolic labels.

Implementation Specific (Microsoft)

The following syntax is also supported, for round-tripping purposes:

<labeloroffset> ::= <int32> | <label>

ilasm distinguishes between two kinds of labels: code labels and data labels. Code labels are followed by a colon (“:”) and represent the address of an instruction to be executed. Code labels appear before an instruction and they represent the address of the instruction that immediately follows the label. A particular code label name may not be declared more than once in a method.

In contrast to code labels, data labels specify the location of a piece of data and do not include the colon character. The data label may not be used as a code label, and a code label may not be used as a data label. A particular data label name may not be declared more than once in a module.

<codeLabel> ::= <id> :

<dataLabel> ::= <id>

 

Example (informative):

The following defines a code label, ldstr_label, that represents the address of the ldstr instruction:

ldstr_label:    ldstr  "A label"
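The uniqueness rule for code labels can be sketched in Python as follows. This is an informative sketch only: it assumes one instruction per line, recognizes only unquoted ASCII label names, records the line index rather than a byte offset, and the name collect_code_labels is hypothetical.

```python
import re
from typing import Dict, Iterable

# <codeLabel> ::= <id> :   (the lookahead keeps "Type::Member" from matching)
_CODE_LABEL = re.compile(r"\s*([A-Za-z_$@?][A-Za-z0-9_$@?]*)\s*:(?!:)")


def collect_code_labels(method_lines: Iterable[str]) -> Dict[str, int]:
    """Map each code label in one method body to the index of the line it labels."""
    labels: Dict[str, int] = {}
    for index, line in enumerate(method_lines):
        match = _CODE_LABEL.match(line)
        if match is None:
            continue
        name = match.group(1)
        if name in labels:
            raise ValueError(f"code label {name!r} declared more than once in a method")
        labels[name] = index
    return labels
```

A real assembler would resolve each label to an offset within the method; the line index stands in for that offset here.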

5.5         Lists of Hex Bytes

A list of bytes consists simply of one or more hex bytes. Hex bytes are pairs of characters 0 – 9, a – f, and A – F.

<bytes> ::= <hexbyte> [<hexbyte>*]
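A list of hex bytes can be converted to binary data as sketched below in Python. This is an informative sketch: the name parse_bytes is hypothetical, and the hex bytes are assumed to be separated by white space.

```python
import re

_HEXBYTE = re.compile(r"[0-9A-Fa-f]{2}")


def parse_bytes(text: str) -> bytes:
    """Parse <bytes>: one or more two-digit hex bytes (white space separated, by assumption)."""
    tokens = text.split()
    if not tokens:
        raise ValueError("<bytes> requires at least one <hexbyte>")
    for token in tokens:
        if _HEXBYTE.fullmatch(token) is None:
            raise ValueError(f"not a <hexbyte>: {token!r}")
    return bytes(int(token, 16) for token in tokens)
```

For example, parse_bytes("01 AB ff") produces the three bytes 0x01, 0xAB, and 0xFF; upper and lower case digits are both accepted.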

 

5.6         Floating point numbers

There are two different ways to specify a floating-point number:

1.             Use the dot (“.”) for the decimal point and “e” or “E” in front of the exponent. Both the decimal point and the exponent are optional.

2.             Indicate that the floating-point value is derived from an integer using the keyword float32 or float64 and indicating the integer in parentheses.

<float64> ::=

  float32 ( <int32> )

| float64 ( <int64> )

| <realnumber>

 

Example (informative):

5.5

1.1e10

float64(128)    // note: this converts the integer 128 to its fp value

5.7         Source Line Information

The metadata does not encode information about the lexical scope of variables or the mapping from source line numbers to CIL instructions. Nonetheless, it is useful to specify an assembler syntax for providing this information for use in creating alternate encodings of the information.

Implementation Specific (Microsoft)

Source line information is stored in the PDB (Portable Debug) file associated with each module.

.line takes a line number, an optional column number (preceded by a colon), and an optional single-quoted string that specifies the name of the file to which the line number refers.

<externSourceDecl> ::= .line <int32> [ : <int32> ] [<SQSTRING>]

 

Implementation Specific (Microsoft)

For compatibility reasons, ilasm allows the following:

<externSourceDecl> ::= … | #line <int32> <QSTRING>

Note that this form requires the file name, which shall be double quoted, not single quoted as with .line.
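Both forms of <externSourceDecl> can be recognized as sketched below in Python. This is an informative sketch under stated assumptions: line and column numbers are decimal only, file names contain no escape sequences, and the names SourceLine and parse_extern_source are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SourceLine(NamedTuple):
    line: int
    column: Optional[int]
    file: Optional[str]


# .line <int32> [ : <int32> ] [<SQSTRING>]
_DOT_LINE = re.compile(r"\.line\s+([0-9]+)(?:\s*:\s*([0-9]+))?(?:\s+'([^']*)')?\s*")
# #line <int32> <QSTRING>   (the file name is required and double quoted)
_HASH_LINE = re.compile(r'#line\s+([0-9]+)\s+"([^"]*)"\s*')


def parse_extern_source(text: str) -> SourceLine:
    """Parse a .line or #line declaration into (line, column, file)."""
    match = _DOT_LINE.fullmatch(text)
    if match is not None:
        line, column, file = match.groups()
        return SourceLine(int(line), int(column) if column is not None else None, file)
    match = _HASH_LINE.fullmatch(text)
    if match is not None:
        return SourceLine(int(match.group(1)), None, match.group(2))
    raise ValueError(f"not an <externSourceDecl>: {text!r}")
```

The two patterns mirror the difference noted above: .line makes the column and file name optional, while #line carries no column and always names a file.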

5.8         File Names

Some grammar elements require that a file name be supplied. A file name is like any other name where “.” is considered a normal constituent character. The specific syntax for file names follows the specifications of the underlying operating system.

<filename> ::=                Section
    <dottedname>              5.3

 

5.9         Attributes and Metadata

Attributes of types and their members attach descriptive information to their definition. The most common attributes are predefined and have a specific encoding in the metadata associated with them (see Chapter 22).  In addition, the metadata provides a way of attaching user-defined attributes to metadata, using several different encodings.

From a syntactic point of view, there are several ways for specifying attributes in ilasm:

·              Using special syntax built into ilasm. For example the keyword private in a <classAttr> specifies that the visibility attribute on a type should be set to allow access only within the defining assembly.

·              Using a general-purpose syntax in ilasm.  The non-terminal <customDecl> describes this grammar (see Chapter 20). For some attributes, called pseudo-custom attributes, this grammar actually results in setting special encodings within the metadata (see clause 20.2.1).

·              Some attributes are required to be set based on the settings of other attributes or information within the metadata and are not visible from the syntax of ilasm at all.  These attributes are called hidden attributes.

·              Security attributes are treated specially.  There is special syntax in ilasm that allows the XML representing security attributes to be described directly (see Chapter 19).  While all other attributes defined either in the standard library or by user-provided extension are encoded in the metadata using one common mechanism described in Section 21.10, security attributes (distinguished by the fact that they inherit, directly or indirectly, from System.Security.Permissions.SecurityAttribute, see Partition IV) shall be encoded as described in Section 21.11.

5.10      ilasm Source Files

An input to ilasm is a sequence of declarations, defined as follows:

<ILFile> ::=                  Reference
    <decl>*                   5.10

 

The complete grammar for a top level declaration is shown below. The following sections will concentrate on the various parts of this grammar.

<decl> ::=                                                            Reference
    .assembly <dottedname> { <asmDecl>* }                             6.2
  | .assembly extern <dottedname> { <asmRefDecl>* }                   6.3
  | .class <classHead> { <classMember>* }                             9
  | .class extern <exportAttr> <dottedname> { <externClassDecl>* }    6.7
  | .corflags <int32>                                                 6.1
  | .custom <customDecl>                                              20
  | .data <datadecl>                                                  15.3.1
  | .field <fieldDecl>                                                15
  | .file [nometadata] <filename> [.hash = ( <bytes> )]
          [.entrypoint ]                                              6.2.3
  | .mresource [public | private] <dottedname>
               [( <QSTRING> )] { <manResDecl>* }                      6.2.2
  | .method <methodHead> { <methodBodyItem>* }                        14
  | .module [<filename>]                                              6.4
  | .module extern <filename>                                         6.5
  | .subsystem <int32>                                                6.2
  | .vtfixup <vtfixupDecl>                                            14.5.1
  | <externSourceDecl>                                                5.7
  | <securityDecl>                                                    19
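The first step in processing a top-level declaration is to decide which alternative of <decl> applies. The following Python sketch, which is informative only, does this by matching the leading keyword or keywords. The name decl_kind is hypothetical, only the .line form of <externSourceDecl> is recognized, and the keywords that introduce <securityDecl> are omitted because they are not given in this section.

```python
from typing import List

# Leading keywords of <decl>. Two-word forms come first so that
# ".assembly extern" is not mistaken for ".assembly".
DECL_KEYWORDS: List[str] = [
    ".assembly extern", ".class extern", ".module extern",
    ".assembly", ".class", ".corflags", ".custom", ".data", ".field",
    ".file", ".mresource", ".method", ".module", ".subsystem", ".vtfixup",
    ".line",  # <externSourceDecl>
]


def decl_kind(text: str) -> str:
    """Return the leading keyword(s) that select an alternative of <decl>."""
    words = text.split()
    for keyword in DECL_KEYWORDS:
        parts = keyword.split()
        if words[:len(parts)] == parts:
            return keyword
    raise ValueError(f"not a recognized top-level declaration: {text!r}")
```

Applied to the Hello World example in Section 4.1, the three top-level declarations are classified as ".assembly extern", ".assembly", and ".method" respectively.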

 

Implementation Specific (Microsoft)

The grammar for declarations also includes the following.  These are described in a separate product specification.

<decl> ::=
    .file alignment <int32>
  | .imagebase <int64>
  | .language <languageDecl>
  | .namespace <id>

 

 

6         Assemblies, Manifests and Modules

Assemblies and modules are grouping constructs, each playing a different role in the CLI.

An assembly is a set of one or more files deployed as a unit.  An assembly always contains a manifest that specifies (see Section 6.1):

·              Version, name, culture, and security requirements for the assembly.

·              Which other files, if any, belong to the assembly along with a cryptographic hash of each file.  The manifest itself resides in the metadata part of a file and that file is always part of the assembly.

·              Which of the types defined in other files of the assembly are to be exported from the assembly.  Types defined in the same file as the manifest are exported based on attributes of the type itself.

·              Optionally, a digital signature for the manifest itself and the public key used to compute it.

A module is a single file containing executable content in the format specified here.  If the module contains a manifest then it also specifies the modules (including itself) that constitute the assembly.  An assembly shall contain only one manifest amongst all its constituent files. For an assembly to be executed (rather than dynamically loaded) the manifest shall reside in the module that contains the entry point.

While some programming languages introduce the concept of a namespace, there is no support in the CLI for this concept.  Type names are always specified by their full name relative to the assembly in which they are defined.


6.1         Overview of Modules, Assemblies, and Files

This section contains informative text only.

The following picture should clarify the various forms of references:

Figure 2: References

Eight files are shown in the picture. The name of each file is shown below the file. Files that declare a module have an additional border around them and have names beginning with M. The other two files have a name beginning with F. These files may be resource files, like bitmaps, or other files that do not contain CIL code.

Files M1 and M4 declare an assembly in addition to the module declaration, namely assemblies A and B, respectively. The assembly declaration in M1 and M4 references other modules, shown with straight lines. Assembly A references M2 and M3. Assembly B references M3 and M5. Thus, both assemblies reference M3.

Usually, a module belongs only to one assembly, but it is possible to share it across assemblies. When Assembly A is loaded at runtime, an instance of M3 will be loaded for it. When Assembly B is loaded into the same application domain, possibly simultaneously with Assembly A, M3 will be shared for both assemblies. Both assemblies also reference F2, for which similar rules apply.

The module M2 references F1, shown by dotted lines. As a consequence, F1 will be loaded as part of Assembly A when A is executed. Thus, the file reference shall also appear with the assembly declaration. Similarly, M5 references another module, M6, which becomes part of B when B is executed. It follows that assembly B shall also have a module reference to M6.

End informative text


6.2         Defining an Assembly

An assembly is specified as a module that contains a manifest in the metadata; see Section 21.2.  The information for the manifest is created from the following portions of the grammar: 

<decl> ::=                                                  Section
  .assembly <dottedname> { <asmDecl>* }                     6.2
| .assembly extern <dottedname> { <asmRefDecl>* }           6.3
| .corflags <int32>                                         6.2
| .file [nometadata] <filename> .hash = ( <bytes> )         6.2.3
        [.entrypoint ]
| .module extern <filename>                                 6.5
| .mresource [public | private] <dottedname>                6.2.2
             [( <QSTRING> )] { <manResDecl>* }
| .subsystem <int32>                                        6.2
| …

 

 

The .assembly directive declares the manifest and specifies to which assembly the current module belongs. A module shall contain at most one .assembly directive. The <dottedname> specifies the name of the assembly.

Note: Since some platforms treat names in a case insensitive manner, two assemblies that have names that differ only in case should not be declared.

The .corflags directive sets a field in the CLI header of the output PE file (see clause 24.3.3.1).  A conforming  implementation of the CLI shall expect it to be 1.  For backwards compatibility, the three least significant bits are reserved.  Future versions of this standard may provide definitions for values between 8 and 65,535. Experimental and non-standard uses should thus use values greater than 65,535.

The .subsystem directive is used only when the assembly is directly executed (as opposed to used as a library for another program).  It specifies the kind of application environment required for the program, by storing the specified value in the PE file header (see clause 24.2.2). While a full 32 bit integer may be supplied, a conforming implementation of the CLI need only respect two possible values:

If the value is 2, the program should be run using whatever conventions are appropriate for an application that has a graphical user interface.

If the value is 3, the program should be run using whatever conventions are appropriate for an application that has a direct console attached.

Implementation Specific (Microsoft)

<decl> ::= … | .file alignment <int32> | .imagebase <int64>

The .file alignment directive sets the file alignment field in the PE header of the output file.  Legal values are multiples of 512.  (Different sections of the PE file are aligned, on disk, at the specified value, in bytes.)

The .imagebase directive sets the imagebase field in the PE header of the output file.  This value specifies the virtual address at which this PE file will be loaded into the process.

See clause 24.2.3.2

 

Example (informative):

.assembly CountDown
{ .hash algorithm 32772
  .ver 1:0:0:0
}
.file Counter.dll .hash = (BA D9 7D 77 31 1C 85 4C 26 9C 49 E7 02 BE E7 52 3A CB 17 AF)
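
The .corflags and .subsystem directives described in this section can be sketched in the same style. The fragment below is illustrative only: it reuses the assembly name CountDown from the example above and assumes, for the sake of the example, that the program is a console application.

```il
// Illustrative manifest module for a directly executed console program.
.assembly CountDown
{ .hash algorithm 32772   // SHA1 (0x8004)
  .ver 1:0:0:0
}
.corflags 1               // the value a conforming implementation shall expect
.subsystem 3              // 3 = console application; 2 would request a GUI environment
```

A library that is never executed directly would normally have no use for .subsystem, since the directive matters only when the assembly is run as a program.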

6.2.1         Information about the Assembly (<asmDecl>)

The following grammar shows the information that can be specified about an assembly. 

<asmDecl> ::=                                   Description                                          Section
  .custom <customDecl>                          Custom attributes                                    20
| .hash algorithm <int32>                       Hash algorithm used in the .file directive           6.2.1.1
| .culture <QSTRING>                            Culture for which this assembly is built             6.2.1.2
| .publickey = ( <bytes> )                      The originator's public key.                         6.2.1.3
| .ver <int32> : <int32> : <int32> : <int32>    Major version, minor version, revision, and build    6.2.1.4
| <securityDecl>                                Permissions needed, desired, or prohibited           19

 

6.2.1.1          Hash Algorithm

<asmDecl> ::= .hash algorithm <int32> | …

 

When an assembly consists of more than one file (see clause 6.2.3), the manifest for the assembly specifies both the name of the file and the cryptographic hash of the contents of the file.  The algorithm used to compute the hash can be specified, and shall be the same for all files included in the assembly.  All values are reserved for future use, and conforming implementations of the CLI shall use the SHA1 hash function (see Partition I) and shall specify this algorithm by using a value of 32772 (0x8004).

Rationale: SHA1 was chosen as the best widely available technology at the time of standardization (see Partition I).  A single algorithm was chosen because, otherwise, every conforming implementation of the CLI would be required to implement all permitted algorithms to ensure portability of executable images.

6.2.1.2          Culture

<asmDecl> ::= .culture <QSTRING> | …

 

When present, this indicates that the assembly has been customized for a specific culture.  The strings that shall be used here are those specified in Partition IV as acceptable with the class System.Globalization.CultureInfo. When used for comparison between an assembly reference and an assembly definition these strings shall be compared in a case-insensitive manner.

Implementation Specific (Microsoft)

The product versions of ilasm and ildasm use .locale rather than .culture.

Note: The culture names follow the IETF RFC1766 names. The format is “<language>-<country/region>”, where <language> is a lowercase two-letter code in ISO 639-1 and <country/region> is an uppercase two-letter code in ISO 3166.

6.2.1.3          Originator’s Public Key

<asmDecl> ::= .publickey = ( <bytes> ) | …

 

The CLI metadata allows the producer of an assembly to compute a cryptographic hash of the assembly (using the SHA1 hash function) and then encrypt it using the RSA algorithm (see Partition I) and a public/private key pair of the producer’s choosing.  The results of this (an “SHA1/RSA digital signature”) can then be stored in the metadata along with the public part of the key pair required by the RSA algorithm.  The .publickey directive is used to specify the public key that was used to compute the signature.  To calculate the hash, the signature is zeroed, the hash calculated, then the result stored into the signature.

A reference to an assembly (see Section 6.3) captures some of this information at compile time.  At runtime, the information contained in the assembly reference can be combined with the information from the manifest of the assembly located at runtime to ensure that the same private key was used to create both the assembly seen when the reference was created (compile time) and when it is resolved (runtime).

6.2.1.4          Version Numbers

<asmDecl> ::= .ver <int32> : <int32> : <int32> : <int32> | …

 

The version number of the assembly, specified as four 32-bit integers.  This version number shall be captured at compile time and used as part of all references to the assembly within the compiled module.  This standard places no other requirement on the use of the version numbers.

Note: A conforming implementation may ignore version numbers entirely, or it may require that they match precisely when binding a reference, or any other behavior deemed appropriate.  By convention:

·              the first of these is considered the major version number and assemblies with the same name but different major versions are not interchangeable.  This would be appropriate, for example, for a major rewrite of a product where backwards compatibility cannot be assumed.

·              the second of these is considered the minor version number and assemblies with the same name and major version but different minor versions indicate significant enhancements but with intention to be backward compatible.  This would be appropriate, for example, on a “point release” of a product or a fully backward compatible new version of a product.

·              the third of these is considered the revision number and assemblies with the same name, major and minor version number but different revisions are intended to be fully interchangeable. This would be appropriate, for example, to fix a security hole in a previously released assembly.

·              the fourth of these is considered the build number and assemblies that differ only by build number are intended to represent a recompilation from the same source. This would be appropriate, for example, because of processor, platform, or compiler changes.
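
The assembly-level declarations of clause 6.2.1 can be combined in a single .assembly block. The following is a minimal sketch, not a normative example: the assembly name, culture and version are invented for illustration, and the public key bytes are a shortened placeholder rather than a usable RSA key.

```il
.assembly MyComponents
{ .hash algorithm 32772                    // 0x8004, SHA1
  .culture "en-US"                         // compared case-insensitively against references
  .publickey = (BB AA BB EE 11 22 33 00)   // placeholder bytes, not a real public key
  .ver 2:10:2002:0                         // major : minor : revision : build
}
```

Under the convention described above, a later assembly versioned 2:10:2003:0 would be intended as a fully interchangeable replacement, whereas 3:0:0:0 would not.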

6.2.2         Manifest Resources

A manifest resource is simply a named item of data associated with an assembly. A manifest resource is introduced using the .mresource directive, which adds the manifest resource to the assembly manifest begun by a preceding .assembly declaration.

<decl> ::=                                          Section
  .mresource [public | private] <dottedname>
             [( <QSTRING> )] { <manResDecl>* }
| …                                                 5.10

 

If the manifest resource is declared public it is exported from the assembly. If it is declared private it is not exported and hence only available from within the assembly. The <dottedname> is the name of the resource, and the optional quoted string is a description of the resource.

<manResDecl> ::=                     Description                                                           Section
  .assembly extern <dottedname>      Manifest resource is in external assembly with name <dottedname>.     6.3
| .custom <customDecl>               Custom attribute.                                                     20
| .file <dottedname> at <int32>      Manifest resource is in file <dottedname> at byte offset <int32>.

 

 

For a resource stored in a file that is not a module (for example, an attached text file), the file shall be declared in the manifest using a separate (top-level) .file declaration (see clause 6.2.3) and the byte offset shall be zero.  Similarly, a resource that is defined in another assembly is referenced using .assembly extern, which requires that the assembly has been defined in a separate (top-level) .assembly extern directive (see Section 6.3).
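
The two cases described above can be sketched as follows. All names (Strings.txt, Shared, Messages, Icons) are hypothetical, the hash bytes are placeholders, and the fragment assumes it follows the .assembly declaration of the manifest module.

```il
// Top-level declarations that the resources below rely on.
.file nometadata Strings.txt
      .hash = (2A 71 E9 47 F5 15 E6 07 35 E4 CB E3 B4 A1 D3 7F 7F A0 9C 24)   // placeholder hash
.assembly extern Shared { }

// Exported resource stored in a non-module file: the byte offset is zero.
.mresource public Messages
{ .file Strings.txt at 0 }

// Resource visible only inside this assembly, with its data held in another assembly.
.mresource private Icons
{ .assembly extern Shared }
```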

6.2.3         Files in the Assembly

Assemblies may be associated with other files, e.g. documentation and other files that are used during execution. The .file declaration is used to add a reference to such a file to the manifest of the assembly (see Section 21.19):

<decl> ::=                                                           Section
  .file [nometadata] <filename> .hash = ( <bytes> ) [.entrypoint]
| …                                                                  5.10

 

The attribute nometadata is specified if the file is not a module according to this specification.  Files that are marked as nometadata may have any format; they are considered pure data files.

The <bytes> after the .hash specify a hash value computed for the file. The VES shall recompute this hash value prior to accessing this file and shall generate an exception if it does not match. The algorithm used to calculate this hash value is specified with .hash algorithm (see clause 6.2.1.1).

If specified, the .entrypoint directive indicates that the entrypoint of a multi-module assembly is contained in this file.
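
A minimal sketch of the three forms of .file declaration follows. The file names are hypothetical and the hash values are placeholders copied from other examples in this chapter; real values would be the SHA1 hashes of the files concerned.

```il
.assembly CountDown
{ .hash algorithm 32772 }     // algorithm used for every .hash below

// An ordinary module belonging to the assembly.
.file Counter.dll
      .hash = (BA D9 7D 77 31 1C 85 4C 26 9C 49 E7 02 BE E7 52 3A CB 17 AF)

// A module that, per the grammar, is marked as containing the entry point.
.file Start.dll
      .hash = (2A 71 E9 47 F5 15 E6 07 35 E4 CB E3 B4 A1 D3 7F 7F A0 9C 24)
      .entrypoint

// A pure data file: nometadata marks it as not being a module.
.file nometadata Readme.txt
      .hash = (2A 71 E9 47 F5 15 E6 07 35 E4 CB E3 B4 A1 D3 7F 7F A0 9C 24)
```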

Implementation Specific (Microsoft)

If the hash value is not specified, it will be automatically computed by the assembly linker al when an assembly file is created using al. Even though the hash value is optional in the grammar for ilasm, it is required at runtime.

6.3         Referencing Assemblies

<decl> ::= .assembly extern <dottedname> [ as <dottedname> ]
                 { <asmRefDecl>* } | …

 

An assembly mediates all accesses from the files that it contains to other assemblies.  This is done through the metadata by requiring that the manifest for the executing assembly contain a declaration for any assembly referenced by the executing code.  The syntax .assembly extern as a top-level declaration is used for this purpose.  The optional as clause provides an alias which allows ilasm  to address external assemblies that have the same name, but differing in version, culture, etc.

The dotted name used in .assembly extern shall exactly match the name of the assembly as declared with .assembly directive in a case sensitive manner.  (So, even though an assembly might be stored within a file, within a filesystem that is case-blind, the names stored internally within metadata are case-sensitive, and shall match exactly.)

Implementation Specific (Microsoft)

The assembly mscorlib contains many of the types and methods in the Base Class Library.  For convenience, ilasm automatically inserts a .assembly extern mscorlib declaration if required.

<asmRefDecl> ::=                                Description                                                         Section
  .hash = ( <bytes> )                           Hash of referenced assembly                                         6.2.3
| .custom <customDecl>                          Custom attributes                                                   20
| .culture <QSTRING>                            Culture of the referenced assembly                                  6.2.1.2
| .publickeytoken = ( <bytes> )                 The low 8 bytes of the SHA1 hash of the originator's public key.    6.3
| .publickey = ( <bytes> )                      The originator’s full public key                                    6.2.1.3
| .ver <int32> : <int32> : <int32> : <int32>    Major version, minor version, revision, and build                   6.2.1.4

 

These declarations are the same as those for .assembly declarations (clause 6.2.1), except for the addition of .publickeytoken.  This declaration is used to store the low 8 bytes of the SHA1 hash of the originator’s public key in the assembly reference, rather than the full public key.

An assembly reference can store either a full public key or an 8 byte “publickeytoken.” Either can be used to validate that the same private key used to sign the assembly at compile time signed the assembly used at runtime. Neither is required to be present, and while both can be stored this is not useful.

A conforming implementation of the CLI need not perform this validation, but it is permitted to do so, and it may refuse to load an assembly for which the validation fails.  A conforming implementation of the CLI may also refuse to permit access to an assembly unless the assembly reference contains either the public key or the public key token.  A conforming implementation of the CLI shall make the same access decision independent of whether a public key or a token is used.

Rationale: The full public key is cryptographically safer, but requires more storage space in the assembly reference.

Example (informative):

.assembly extern MyComponents
{ .publickey = (BB AA BB EE 11 22 33 00)
  .hash = (2A 71 E9 47 F5 15 E6 07 35 E4 CB E3 B4 A1 D3 7F 7F A0 9C 24)
  .ver 2:10:2002:0
}
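
The optional as clause and the .publickeytoken declaration can be sketched in the same way. The aliases and the token bytes below are invented for illustration; a real token would be the low 8 bytes of the SHA1 hash of the originator's public key.

```il
// Two references to assemblies that share a name but differ in version.
// The aliases let ilasm tell them apart.
.assembly extern MyComponents as MyComponentsV1
{ .publickeytoken = (BB AA BB EE 11 22 33 00)   // placeholder 8-byte token
  .ver 1:0:0:0
}

.assembly extern MyComponents as MyComponentsV2
{ .publickeytoken = (BB AA BB EE 11 22 33 00)   // same originator, same placeholder
  .ver 2:10:2002:0
}
```

Each reference stores only the token, which keeps the reference small at the cost noted in the rationale above.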

6.4         Declaring Modules

All CIL files are modules and are referenced by a logical name carried in the metadata rather than their file name.  See Section 21.16.

<decl> ::=                Section
| .module <filename>
| …                       5.10

 

Example (informative):

.module CountDown.exe

Implementation Specific (Microsoft)

If the .module directive is missing, ilasm will automatically add a .module directive and set the module name to be the file name, including its extension in capital letters; e.g., if the file is called foo and compiled into an exe, the module name will become “Foo.EXE”.

Note that ilasm also generates a required GUID to uniquely identify this instance of the module and emits that into the Mvid metadata field: see clause 21.27.

6.5         Referencing Modules

When an item is in the current assembly but part of a different module than the one containing the manifest, the defining module shall be declared in the manifest of the assembly using the .module extern directive.  The name used in the .module extern directive of the referencing assembly shall exactly match the name used in the .module directive (see Section 6.4) of the defining module.  See Section 21.28.

<decl> ::=                       Section
| .module extern <filename>
| …                              5.10

 

Example (informative):

.module extern Counter.dll

6.6         Declarations inside a Module or Assembly

Declarations inside a module or assembly are specified by the following grammar. More information on each option can be found in the corresponding section.

<decl> ::=                                       Section
| .class <classHead> { <classMember>* }          9
| .custom <customDecl>                           20
| .data <datadecl>                               15.3.1
| .field <fieldDecl>                             15
| .method <methodHead> { <methodBodyItem>* }     14
| <externSourceDecl>                             5.7
| <securityDecl>                                 19
| …

 

 

6.7         Exported Type Definitions

The manifest module, of which there can only be one per assembly, includes the .assembly statement.  To export a type defined in any other module of an assembly requires an entry in the assembly’s manifest.  The following grammar is used to construct such an entry in the manifest:

<decl> ::=                                                          Section
  .class extern <exportAttr> <dottedname> { <externClassDecl>* }

<externClassDecl> ::=             Section
  .file <dottedname>
| .class extern <dottedname>
| .custom <customDecl>            20

 

The <exportAttr> value shall be either public or nested public and shall match the visibility of the type.

For example, suppose an assembly consists of two modules A.EXE and B.DLL.  A.EXE contains the manifest.  A public class “Foo” is defined in B.DLL.  In order to export it (that is, to make it visible to, and usable from, other assemblies), a .class extern statement shall be included in A.EXE.

Conversely, a public class “Bar” defined in A.EXE does not need any .class extern statement.

Rationale: Tools should be able to retrieve a single module, the manifest module, to determine the complete set of types defined by the assembly.  Therefore, information from other modules within the assembly is replicated in the manifest module.  By convention, the manifest module is also known as the assembly.
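
The A.EXE and B.DLL scenario described above might be written as follows in the source for A.EXE. This is a sketch only: the assembly name Sample and the nested type Inner are hypothetical, the hash is a placeholder, and the use of .class extern inside the second declaration to name the enclosing exported type is this example's reading of the <externClassDecl> grammar rather than something the text above spells out.

```il
// Source for A.EXE, the manifest module.
.assembly Sample { }

.file B.DLL
      .hash = (BA D9 7D 77 31 1C 85 4C 26 9C 49 E7 02 BE E7 52 3A CB 17 AF)   // placeholder hash

// Export the public class Foo, which is defined in B.DLL.
.class extern public Foo
{ .file B.DLL }

// Assumed form for a public type nested inside Foo.
.class extern nested public Inner
{ .class extern Foo }
```

The public class Bar defined in A.EXE itself needs no corresponding entry.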

7         Types and Signatures

The metadata provides mechanisms to both define types and reference types. Chapter 9 describes the metadata associated with a type definition, regardless of whether the type is an interface, class or a value type.

The mechanism used to reference types is divided into two parts. The first is the creation of a logical description of user-defined types that are referenced but (typically) not defined in the current module.  These are stored in a logical table in the metadata (see Section 21.35).

The second is a signature that encodes one or more type references, along with a variety of modifiers.  The grammar non-terminal <type> describes an individual entry in a signature.  The encoding of a signature is specified in Section 22.1.15.

7.1         Types

The following grammar completely specifies all built-in types including pointer types of the CLI system. It also shows the syntax for user defined types that can be defined in the CLI system:

<type> ::=                            Description                                                     Section
  bool                                Boolean                                                         7.2
| boxed <typeReference>               Boxed user-defined value type
| char                                16-bit Unicode code point                                       7.2
| class <typeReference>               User defined reference type.                                    7.3
| float32                             32-bit floating point number                                    7.2
| float64                             64-bit floating point number                                    7.2
| int8                                Signed 8-bit integer                                            7.2
| int16                               Signed 16-bit integer                                           7.2
| int32                               Signed 32-bit integer                                           7.2
| int64                               Signed 64-bit integer                                           7.2
| method <callConv> <type> *          Method pointer                                                  13.5
         ( <parameters> )
| native int                          Signed integer whose size varies depending on platform          7.2
                                      (32- or 64-bit)
| native unsigned int                 Unsigned integer whose size varies depending on platform        7.2
                                      (32- or 64-bit)
| object                              See System.Object in Partition IV
| string                              See System.String in Partition IV
| <type> &                            Managed pointer to <type>. <type> shall not be a managed        13.4
                                      pointer type or typedref
| <type> *                            Unmanaged pointer to <type>                                     13.4
| <type> [ [<bound> [,<bound>]*] ]    Array of <type> with optional rank (number of dimensions)       13.1 and 13.2
                                      and bounds.
| <type> modopt ( <typeReference> )   Custom modifier that may be ignored by the caller.              7.1.1
| <type> modreq ( <typeReference> )   Custom modifier that the caller shall understand.               7.1.1
| <type> pinned                       For local variables only. The garbage collector shall not       7.1.2
                                      move the referenced value.
| typedref                            Typed reference, created by mkrefany and used by refanytype     7.2
                                      or refanyval.
| valuetype <typeReference>           User defined value type (unboxed)                               12
| unsigned int8                       Unsigned 8-bit integers                                         7.2
| unsigned int16                      Unsigned 16-bit integers                                        7.2
| unsigned int32                      Unsigned 32-bit integers                                        7.2
| unsigned int64                      Unsigned 64-bit integers                                        7.2
| void                                No type.  Only allowed as a return type or as part of void *    7.2

 

In several situations the grammar permits the use of a slightly simpler mechanism for specifying types, by just allowing type names (e.g. “System.GC”) to be used instead of the full algebra (e.g. “class System.GC”).  These are called type specifications:

<typeSpec> ::=                   Section
  [ [.module] <dottedname> ]     7.3
| <typeReference>                7.2
| <type>                         7.1

 

7.1.1         modreq and modopt

Custom modifiers, defined using modreq (“required modifier”) and modopt (“optional modifier”), are similar to custom attributes (see Chapter 20) except that modifiers are part of a signature rather than attached to a declaration.  Each modifier associates a type reference with an item in the signature.

The CLI itself shall treat required and optional modifiers in the same manner. Two signatures that differ only by the addition of a custom modifier (required or optional) shall not be considered to match.  Custom modifiers have no other effect on the operation of the VES.

Rationale: The distinction between required and optional modifiers is important to tools other than the CLI that deal with the metadata, typically compilers and program analysers.  A required modifier indicates that there is a special semantics to the modified item that should not be ignored, while an optional modifier can simply be ignored. 

For example, the concept of const in the C programming language can be modelled with an optional modifier since the caller of a method that has a constant parameter need not treat it in any special way.  On the other hand, a parameter that shall be copy constructed in C++ shall be marked with a required custom modifier since it is the caller who makes the copy.
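
The two cases in the rationale can be sketched in ilasm as follows. The modifier types IsConst and NeedsCopy, the value type Big, and the assembly Lang that is assumed to define them are all hypothetical names chosen for this illustration; they are not part of the standard library.

```il
// Hypothetical assembly assumed to define the modifier types used below.
.assembly extern Lang { }

// A "const"-like parameter: a caller may ignore the optional modifier.
.method static void Print(int32 modopt([Lang]Lang.IsConst) x)
{
  ret
}

// A copy-constructed parameter: a caller shall understand the required modifier.
.method static void Take(valuetype [Lang]Lang.Big modreq([Lang]Lang.NeedsCopy) v)
{
  ret
}
```

Because signatures that differ only by a custom modifier do not match, a reference to Print that omits the modopt would not resolve to the method declared here.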

7.1.2         pinned

The signature encoding for pinned shall appear only in signatures that describe local variables (see clause 14.4.1.3). While a method with a pinned local variable is executing the VES shall not relocate the object to which the local refers.  That is, if the implementation of the CLI uses a garbage collector that moves objects, the collector shall not move objects that are referenced by an active pinned local variable.

Rationale: If unmanaged pointers are used to dereference managed objects, these objects shall be pinned.  This happens, for example, when a managed object is passed to a method designed to operate with unmanaged data.
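
A minimal sketch of a pinned local variable follows. The method name Fill and the local name pinnedData are hypothetical, and the code that would actually use an unmanaged pointer is left as a comment.

```il
.method static void Fill(int32[] data)
{
  .maxstack 1
  .locals (int32[] pinned pinnedData)   // pinned is permitted only in local variable signatures

  ldarg.0
  stloc.0          // while the local refers to the array, the array shall not be relocated

  // ... pass the array to code that operates on unmanaged data ...

  ldnull
  stloc.0          // the local no longer refers to the array, so the pin no longer applies to it
  ret
}
```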

7.2         Built-in Types

The CLI built-in types have corresponding value types defined in the Base Class Library. They shall be referenced in signatures only using their special encodings (i.e. not using the general purpose valuetype <typeReference> syntax).  Partition I specifies the built-in types.

7.3         References to User-defined Types (<typeReference>)

User-defined types are referenced either using their full name and a resolution scope or (if one is available in the same module) a type definition (see Chapter 9).

A <typeReference> is used to capture the full name and resolution scope. 

<typeReference> ::=
  [<resolutionScope>] <dottedname> [/ <dottedname>]*

<resolutionScope> ::=
  [ .module <filename> ]
| [ <assemblyRefName> ]

<assemblyRefName> ::=     Section
  <dottedname>            5.1

 

The following resolution scopes are specified for un-nested types:

·              Current module (and, hence, assembly).  This is the most common case and is the default if no resolution scope is specified.  The type shall be resolved to a definition only if the definition occurs in the same module as the reference. 

Note: A type reference that refers to a type in the same module and assembly is better represented using a type definition.  Where this is not possible (for example, when referencing a nested type that has compilercontrolled accessibility) or convenient (for example, in some one-pass compilers) a type reference is equivalent and may be used.

·              Different module, current assembly.  The resolution scope shall be a module reference syntactically represented using the notation [.module <filename>]. The type shall be resolved to a definition only if the referenced module (see Section 6.4) and type (see Section 6.7) have been declared by the current assembly and hence have entries in the assembly’s manifest.  Note that in this case the manifest is not physically stored with the referencing module.

·              Different assembly.  The resolution scope shall be an assembly reference syntactically represented using the notation [<assemblyRefName>]. The referenced assembly shall be declared in the manifest for the current assembly (see Section 6.3), the type shall be declared in the referenced assembly’s manifest, and the type shall be marked as exported from that assembly (see section 6.7 and clause 9.1.1).

·              For nested types, the resolution scope is always the enclosing type (see Section 9.6).  This is indicated syntactically by using a slash (“/”) to separate the enclosing type name from the nested type’s name.

Example (informative):

The proper way to refer to a type defined in the base class library. The name of the type is System.Console and it is found in the assembly named mscorlib.

     .assembly extern mscorlib { }

     .class [mscorlib]System.Console

 

A reference to the type named C.D in the module named x in the current assembly.

     .module extern x

     .class [.module x]C.D

 

A reference to the type named C nested inside of the type named Foo.Bar in another assembly, named MyAssembly.

     .assembly extern MyAssembly { }

     .class [MyAssembly]Foo.Bar/C

7.4         Native Data Types

Some implementations of the CLI will be hosted on top of existing operating systems or runtime platforms that specify data types required to perform certain functions.  The metadata allows interaction with these native data types by specifying how the built-in and user-defined types of the CLI are to be marshalled to and from native data types.  This marshalling information can be specified (using the keyword marshal) for

·              the return type of a method, indicating that a native data type is actually returned and shall be marshalled back into the specified CLI data type

·              a parameter to a method, indicating that the CLI data type provided by the caller shall be marshalled into the specified native data type (if the parameter is passed by reference the updated value shall be marshalled back from the native data type into the CLI data type when the call is completed)

·              a field of a user-defined type, indicating that any attempt to pass the object in which it occurs to platform methods shall make a copy of the object, replacing the field by the specified native data type (if the object is passed by reference then the updated value shall be marshalled back when the call is completed)

The following table lists all native types supported by the CLI and provides a description for each of them.  A more complete description can be found in Partition IV in the definition of the enum System.Runtime.InteropServices.UnmanagedType, which provides the actual values used to encode the types.  All encoding values from 0 through 63 are reserved for backward compatibility with existing implementations of the CLI.  Values 64 through 127 are reserved for future use in this and related Standards.

<nativeType> ::=                        Description                                                   Name in class library
  [ ]                                   Native array. Type and size are determined at runtime         LPArray
                                        from the actual marshaled array.
| bool                                  Boolean. 4-byte integer value where a non-zero value          Bool
                                        represents TRUE and 0 represents FALSE.
| float32                               32-bit floating point number.                                 FLOAT32
| float64                               64-bit floating point number.                                 FLOAT64
| [unsigned] int                        Signed or unsigned integer, sized to hold a pointer on        SysUInt or SysInt
                                        the platform
| [unsigned] int8                       Signed or unsigned 8-bit integer                              unsigned int8 or int8
| [unsigned] int16                      Signed or unsigned 16-bit integer                             unsigned int16 or int16
| [unsigned] int32                      Signed or unsigned 32-bit integer                             unsigned int32 or int32
| [unsigned] int64                      Signed or unsigned 64-bit integer                             unsigned int64 or int64
| lpstr                                 A pointer to a null terminated array of ANSI characters.      LPStr
                                        Code page is implementation specific.
| lptstr                                A pointer to a null terminated array of platform              LPTStr
                                        characters (ANSI or Unicode).  Code page and character
                                        encoding are implementation specific.
| lpvoid                                An untyped pointer, platform specifies size.                  LPVoid
| lpwstr                                A pointer to a null terminated array of Unicode               LPWStr
                                        characters.  Character encoding is implementation
                                        specific.
| method                                A function pointer.                                           FunctionPtr
| <nativeType> [ ]                      Array of <nativeType>. The length is determined at            LPArray
                                        runtime by the size of the actual marshaled array.
| <nativeType> [ <int32> ]              Array of <nativeType> of length <int32>.                      LPArray
| <nativeType> [ + <int32> ]            Array of <nativeType> with runtime supplied element           LPArray
                                        size. The int32 specifies a parameter to the current
                                        method (counting from parameter number 0) that, at
                                        runtime, will contain the size of an element of the
                                        array in bytes.  Can only be applied to methods, not
                                        fields.
| <nativeType> [ <int32> + <int32> ]    Array of <nativeType> with runtime supplied element           LPArray
                                        size. The first int32 specifies the number of elements
                                        in the array.  The second int32 specifies which
                                        parameter to the current method (counting from
                                        parameter number 1) will specify the additional number
                                        of elements in the array.  Can only be applied to
                                        methods, not fields.

 

Implementation Specific (Microsoft)

The Microsoft implementation supports a richer set of types to describe marshalling between Windows native types and COM.  These additional options are listed in the following table:

<nativeType> ::=

Description

Name in
class library

| as any

Determines the type of an object at runtime and marshals the Object as that type.

AsAny

| byvalstr

A string in a fixed length buffer.

VBByRefStr

| custom ( <QSTRING>,
  <QSTRING> )

Custom marshaler.  The first string is the name of the marshaling class, using the string conventions of Reflection.Emit to specify the assembly and/or module.  The second is an arbitrary string passed to the marshaler at runtime to identify the form of marshaling required.

CustomMarshaler

| fixed array [ <int32> ]

A fixed size array of length <int32> bytes

ByValArray

| fixed sysstring
[ <int32> ]

A fixed size system string of length <int32>.  This can only be applied to fields, and a separate attribute specifies the encoding of the string.

ByValTStr

| lpstruct

A pointer to a C-style structure. Used to marshal managed formatted types.

LPStruct

| struct

A C-style structure, used to marshal managed formatted types.

Struct

 

Example (informative):

.method int32 M1( int32 marshal(int32), bool[] marshal(bool[5]) )

 

Method M1 takes two arguments: an int32 and an array of 5 bools.

 

++++++++++

 

.method int32 M2( int32 marshal(int32), bool[] marshal(bool[+1]) )

 

Method M2 takes two arguments: an int32 and an array of bools. The number of elements in that array is given by the value of the first parameter.

 

++++++++++

 

.method int32 M3( int32 marshal(int32), bool[] marshal(bool[7+1]) )

 

Method M3 takes two arguments: an int32 and an array of bools. The number of elements in that array is given as 7 plus the value of the first parameter.

 

8         Visibility, Accessibility and Hiding

Partition I specifies visibility and accessibility. In addition to these attributes, the metadata stores information about method name hiding. Hiding controls which method names inherited from a base type are available for compile-time name binding.

8.1         Visibility of Top-Level Types and Accessibility of Nested Types

Visibility is attached only to top-level types, and there are only two possibilities: visible to types within the same assembly, or visible to types regardless of assembly. For nested types (i.e. types that are members of another type) the nested type has an accessibility that further refines the set of methods that can reference the type. A nested type may have any of the seven accessibility modes (see Partition I), but has no direct visibility attribute of its own, using the visibility of its enclosing type instead.

Because the visibility of a top-level type controls the visibility of the names of all of its members, a nested type cannot be more visible than the type in which it is nested. That is, if the enclosing type is visible only within an assembly then a nested type with public accessibility is still only available within the assembly. By contrast, a nested type that has assembly accessibility is restricted to use within the assembly even if the enclosing type is visible outside the assembly.

To make the encoding of all types consistent and compact, the visibility of a top-level type and the accessibility of a nested type are encoded using the same mechanism in the logical model of clause 22.1.14.
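The interaction between top-level visibility and nested accessibility described in this section can be sketched in Python. This is only an illustrative model: the function name usable_outside_assembly and the list-of-attributes representation are assumptions made for this sketch, and access gained through derivation (family, famorassem) is deliberately ignored.

```python
# Hypothetical helper, not part of the CLI: decides whether a type can be
# named from another assembly, ignoring family-based access through derivation.
def usable_outside_assembly(chain):
    """chain lists the attributes from the top-level type inward,
    for example ["public", "nested public"]."""
    top_level, nested = chain[0], chain[1:]
    if top_level != "public":
        # The enclosing type is visible only within its assembly, so nothing
        # nested inside it can be more visible than that.
        return False
    # Every nesting level must itself grant public accessibility.
    return all(attr == "nested public" for attr in nested)


print(usable_outside_assembly(["private", "nested public"]))   # False
print(usable_outside_assembly(["public", "nested assembly"]))  # False
print(usable_outside_assembly(["public", "nested public"]))    # True
```

The first two calls correspond to the two cases in the text: a public nested type inside an assembly-only type, and an assembly-accessible nested type inside a public type.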

8.2         Accessibility

Accessibility is encoded directly in the metadata.  See, for example, clause 21.24.

8.3         Hiding

Hiding is a compile-time concept that applies to individual methods of a type. The CTS specifies two mechanisms for hiding, specified by a single bit:

·              hide-by-name, meaning that the introduction of a name in a given type hides all inherited members of the same kind (method or field) with the same name.

·              hide-by-name-and-sig, meaning that the introduction of a name in a given type hides any inherited member of the same kind that has precisely the same type (for fields) or signature (for methods, properties, and events).

There is no runtime support for hiding.  A conforming implementation of the CLI treats all references as though the names were marked hide-by-name-and-sig.  Compilers that desire the effect of hide-by-name can achieve it by marking method definitions with the newslot attribute (see clause 14.4.2.3) and correctly choosing the type used to resolve a method reference (see clause 14.1.3).
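The difference between the two hiding mechanisms can be illustrated with a small Python model of compile-time name binding. It is a sketch only: members are represented as hypothetical (kind, name, signature) tuples, and the function visible_inherited is an assumption of this example, not something defined by the CLI.

```python
def visible_inherited(inherited, introduced, mode):
    """Return the inherited members still available for name binding.

    Members are (kind, name, signature) tuples; mode is either
    "hide-by-name" or "hide-by-name-and-sig".
    """
    result = []
    for kind, name, sig in inherited:
        if mode == "hide-by-name":
            # Any introduced member of the same kind and name hides it.
            hidden = any(k == kind and n == name for k, n, _ in introduced)
        else:
            # Only an exact match on kind, name and signature hides it.
            hidden = (kind, name, sig) in introduced
        if not hidden:
            result.append((kind, name, sig))
    return result


base = [("method", "F", "(int32)"), ("method", "F", "(string)"), ("method", "G", "()")]
derived = [("method", "F", "(int32)")]
print(visible_inherited(base, derived, "hide-by-name"))
print(visible_inherited(base, derived, "hide-by-name-and-sig"))
```

Under hide-by-name both inherited overloads of F disappear from view; under hide-by-name-and-sig only F(int32) does.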

9         Defining Types

Types (i.e., classes, value types, and interfaces) may be defined at the top-level of a module:

<decl> ::=

Section

  .class <classHead> { <classMember>* }

9

| …

 

 

The logical metadata table created by this declaration is specified in Section 21.34.

Rationale: For historical reasons, many of the syntactic classes used for defining types incorrectly use “class” instead of “type” in their name.  All classes are types, but “types” is a broader term encompassing value types, and interfaces.

9.1         Type Header (<classHead>)

A type header consists of

·              any number of type attributes

·              a name (an <id>)

·              a base type (or parent type), which defaults to [mscorlib]System.Object

·              an optional list of interfaces whose contract this type and all its descendant types shall satisfy

<classHead> ::=

  <classAttr>* <id> [extends <typeReference>] [implements <typeReference> [, <typeReference>]*]

 

The extends keyword defines the base type of a type. A type shall extend from exactly one other type. If no type is specified, ilasm will add an extends clause to make the type inherit from System.Object.

The implements keyword defines the interfaces of a type. By listing an interface here, a type declares that all of its concrete implementations will support the contract of that interface, including providing implementations of any virtual methods the interface declares.  See also Chapter 10 and Chapter 11.

Example (informative):

.class private auto autochar CounterTextBox

   extends [System.Windows.Forms]System.Windows.Forms.TextBox

   implements [.module Counter]CountDisplay

{ // body of the class

}

This code declares the class CounterTextBox, which extends the class System.Windows.Forms.TextBox in the assembly System.Windows.Forms and implements the interface CountDisplay in the module Counter of the current assembly. The attributes private, auto and autochar are described in the following sections.

A type can have any number of custom attributes attached.  Custom attributes are attached as described in Chapter 20. The other (predefined) attributes of a type may be grouped into attributes that specify visibility, type layout information, type semantics information, inheritance rules, interoperation information, and information on special handling. The following subsections provide additional information on each group of predefined attributes.

<classAttr> ::=

Description

Section

  abstract

Type is abstract.

9.1.4

| ansi

Marshal strings to platform as ANSI.

9.1.5

| auto

Auto layout of type.

9.1.2

| autochar

Marshal strings to platform based on platform.

9.1.5

| beforefieldinit

Calling static methods  does not initialize type.

9.1.6

| explicit

Layout of fields is provided explicitly.

9.1.2

| interface

Interface declaration.

9.1.3

| nested assembly

Assembly accessibility for nested type.

9.1.1

| nested famandassem

Family and Assembly accessibility for nested type.

9.1.1

| nested family

Family accessibility for nested type.

9.1.1

| nested famorassem

Family or Assembly accessibility for nested type.

9.1.1

| nested private

Private accessibility for nested type.

9.1.1

| nested public

Public accessibility for nested type.

9.1.1

| private

Private visibility of top-level type.

9.1.1

| public

Public visibility of top-level type.

9.1.1

| rtspecialname

Special treatment by runtime.

9.1.6

| sealed

The type cannot be subclassed.

9.1.4

| sequential

The type is laid out sequentially.

9.1.2

| serializable

Type may be serialized.

9.1.6

| specialname

Special treatment by tools.

9.1.6

| unicode

Marshal strings to platform as Unicode.

9.1.5

 

Implementation Specific (Microsoft)

The above grammar also includes

<classAttr> ::= import

to indicate that the type is imported from a COM type library

9.1.1         Visibility and Accessibility Attributes

<classAttr> ::= …

| nested assembly

| nested famandassem

| nested family

| nested famorassem

| nested private

| nested public

| private

| public

 

See Partition I.  A type that is not nested inside another shall have exactly one visibility (private or public) and shall not have an accessibility.  Nested types shall have no visibility, but instead shall have exactly one of the accessibility attributes (nested assembly, nested famandassem, nested family, nested famorassem, nested private, or nested public). The default visibility for top-level types is private. The default accessibility for nested types is nested private.
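A tool that reads type headers might check these rules roughly as follows. This Python sketch is an assumption-laden illustration: resolve_access and its list-of-strings input are hypothetical, and the defaults applied are the ones stated in the paragraph above.

```python
VISIBILITY = {"private", "public"}
ACCESSIBILITY = {
    "nested assembly", "nested famandassem", "nested family",
    "nested famorassem", "nested private", "nested public",
}


def resolve_access(attrs, is_nested):
    """Return the single visibility or accessibility attribute that applies."""
    visibility = [a for a in attrs if a in VISIBILITY]
    accessibility = [a for a in attrs if a in ACCESSIBILITY]
    if is_nested:
        allowed, forbidden, default = accessibility, visibility, "nested private"
    else:
        allowed, forbidden, default = visibility, accessibility, "private"
    if forbidden:
        raise ValueError("attribute not permitted here: " + forbidden[0])
    if len(allowed) > 1:
        raise ValueError("more than one attribute given: " + ", ".join(allowed))
    # Fall back to the default when the source gives no attribute.
    return allowed[0] if allowed else default


print(resolve_access(["auto", "autochar"], is_nested=False))
print(resolve_access(["nested family", "sealed"], is_nested=True))
```

Other attributes such as auto or sealed are passed through untouched because they belong to different attribute groups.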

9.1.2         Type Layout Attributes

<classAttr> ::= …

| auto

| explicit

| sequential

 

The type layout specifies how the fields of an instance of a type are arranged. A given type shall have only one layout attribute specified.  By convention, ilasm supplies auto if no layout attribute is specified.

auto: the layout shall be done by the CLI, with no user-supplied constraints

explicit: the layout of the fields is explicitly provided (see Section 9.7).

sequential: the CLI shall lay out the fields in sequential order, based on the order of the fields in the logical metadata table (see Section 21.15).

Rationale: The default auto layout should provide the best layout for the platform on which the code is executing.  sequential layout is intended to instruct the CLI to match layout rules commonly followed by languages like C and C++ on an individual platform, where this is possible while still guaranteeing verifiable layout.  explicit layout allows the CIL generator to specify the precise layout semantics; specific rules govern which explicit layouts are verifiable.

9.1.3         Type Semantics Attributes

<classAttr> ::= …

| interface

 

The type semantics attributes specify whether an interface, class, or value type shall be defined.  The interface attribute specifies an interface.  If this attribute is not present and the definition extends (directly or indirectly) System.ValueType, a value type shall be defined (see Chapter 12).  Otherwise, a class shall be defined (see Chapter 10).

Note that the runtime size of a value type shall not exceed 1 MByte (0x100000 bytes).

Implementation Specific (Microsoft)

The current implementation allows 0x3F0000 bytes, but this limit may be reduced in the future.

9.1.4         Inheritance Attributes

<classAttr> ::= …

| abstract

| sealed

 

Attributes that specify special semantics are abstract and sealed. These attributes may be used together.

abstract specifies that this type shall not be instantiated.  If a type contains abstract methods, the type shall be declared as an abstract type.

sealed specifies that a type shall not have subclasses.  All value types shall be sealed.

Rationale: Virtual methods of sealed types are effectively instance methods, since they cannot be overridden. Framework authors should use sealed classes sparingly since they do not provide a convenient building block for user extensibility.  Sealed classes may be necessary when the implementation of a set of virtual methods for a single class (typically inherited from different interfaces) becomes interdependent or depends critically on implementation details not visible to potential subclasses.

A type that is both abstract and sealed should have only static members, and serves as what some languages call a namespace.

9.1.5         Interoperation Attributes

<classAttr> ::= …

| ansi

| autochar

| unicode

 

These attributes are for interoperation with unmanaged code.  They specify the default behavior to be used when calling a method (static, instance, or virtual) on the class that has an argument or return type of System.String and does not itself specify marshalling behavior.  Only one value shall be specified for any type, and the default value is ansi.

ansi specifies that marshalling shall be to and from ANSI strings

unicode specifies that marshalling shall be to and from Unicode strings

autochar specifies either ANSI or Unicode behavior, depending on the platform on which the CLI is running.

9.1.6         Special Handling Attributes

<classAttr> ::= …

| beforefieldinit

| serializable

| specialname

| rtspecialname

 

These attributes may be combined in any way.

beforefieldinit instructs the CLI that it need not initialize the type before a static method is called.  See clause 9.5.3.

Implementation Specific (Microsoft)

serializable indicates that the fields of the type may be serialized into a data stream by the CLI serializer.  See Partition IV.

specialname indicates that the name of this item may have special significance to tools other than the CLI.  See, for example, Partition I.

rtspecialname indicates that the name of this item has special significance to the CLI.  There are no currently defined special type names; this is for future use.  Any item marked rtspecialname shall also be marked specialname.

Rationale: If an item is treated specially by the CLI, then tools should also be made aware of that. The converse is not true.

9.2         Body of a Type Definition

A type may contain any number of further declarations. The directives .event, .field, .method, and .property are used to declare members of a type. The directive .class inside a type declaration is used to create a nested type, which is discussed in further detail in Section 9.6.

<classMember> ::=

Description

Section

  .class <classHead> { <classMember>* }

Defines a nested type.

9.6

| .custom <customDecl>

Custom attribute.

20

| .data <datadecl>

Defines static data associated with the type.

15.3

| .event <eventHead> { <eventMember>* }

Declares an event.

17

| .field <fieldDecl>

Declares a field belonging to the type.

15

| .method <methodHead> { <methodBodyItem>* }

Declares a method of the type.

14

| .override <typeSpec> :: <methodName> with <callConv> <type> <typeSpec> :: <methodName> ( <parameters> )

Specifies that the first method is overridden by the definition of the second method.

9.3.2

| .pack <int32>

Used for explicit layout of fields.

9.7

| .property <propHead> { <propMember>* }

Declares a property of the type.

16

| .size <int32>

Used for explicit layout of fields.

9.7

| <externSourceDecl>

.line

5.7

| <securityDecl>

.permission or .capability

19

 

9.3         Introducing and Overriding Virtual Methods

A virtual method of a base type is overridden by providing a direct implementation of the method (using a method definition, see Section 14.4) and not specifying it to be newslot (see clause 14.4.2.3).  An existing method body may also be used to implement a given virtual declaration using the .override directive (see clause 9.3.2).

9.3.1         Introducing a Virtual Method

A virtual method is introduced in the inheritance hierarchy by defining a virtual method (see Section 14.4). The versioning semantics differ depending on whether or not the definition is marked as newslot (see clause 14.4.2.3):

If the definition is marked newslot then the definition always creates a new virtual method, even if a base class provides a matching virtual method.  Any reference to the virtual method created before the new virtual method was defined will continue to refer to the original definition.

If the definition is not marked newslot then it creates a new virtual method only if there is no virtual method of the same name and signature inherited from a base class.  If the inheritance hierarchy changes so that the definition matches an inherited virtual method, the definition will be treated as a new implementation of the inherited method.
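The two cases above can be pictured as operations on a table of virtual method slots. The Python sketch below is a simplified model under stated assumptions: the slot table is a plain list of (name, signature) pairs and assign_slot is a hypothetical helper, not an actual VES data structure or algorithm.

```python
def assign_slot(base_slots, name, sig, newslot):
    """Place a virtual method definition into a copy of the inherited slots.

    base_slots is a list of (name, signature) pairs, one per inherited
    virtual method.  Returns the derived type's slots and the index used.
    """
    slots = list(base_slots)
    if not newslot:
        # Without newslot, a matching inherited method is overridden in place.
        for index in range(len(slots) - 1, -1, -1):
            if slots[index] == (name, sig):
                return slots, index
    # With newslot, or when nothing matches, a new virtual method is created.
    slots.append((name, sig))
    return slots, len(slots) - 1


base = [("ToString", "()"), ("M", "(int32)")]
print(assign_slot(base, "M", "(int32)", newslot=False))  # reuses index 1
print(assign_slot(base, "M", "(int32)", newslot=True))   # appends index 2
```

References compiled against the base type keep using the original slot, which is why a newslot definition does not disturb them.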

9.3.2         The .override Directive

The .override directive specifies that a virtual method should be implemented (overridden), in this type, by a virtual method with a different name but with the same signature. It can be used to provide an implementation for a virtual method inherited from a base class or a virtual method specified in an interface implemented by this type.  The .override directive specifies a Method Implementation (MethodImpl) in the metadata (see clause 14.1.4).

<classMember> ::=

Section

  .override <typeSpec> :: <methodName> with <callConv> <type> <typeSpec> :: <methodName> ( <parameters> )

 

| …

9.2

 

The first <typeSpec> :: <methodName> pair specifies the virtual method that is being overridden.  It shall reference either an inherited virtual method or a virtual method on an interface that the current type implements.  The remaining information specifies the virtual method that provides the implementation. 

While the syntax specified here and the actual metadata format (see Section 21.25) allow any virtual method to be used to provide an implementation, a conforming program shall provide a virtual method actually implemented directly on the type containing the .override directive.

Rationale: The metadata is designed to be more expressive than can be expected of all implementations of the VES.

Example (informative):

The following example shows a typical use of the .override directive. A method implementation is provided for a method declared in an interface (see Chapter 11).

.class interface I

{ .method public virtual abstract void m() cil managed {}

}

.class C implements I

{ .method virtual public void m2()

  { // body of m2

  }

  .override I::m with instance void C::m2()

}

The .override directive specifies that the C::m2 body shall be used to implement I::m on objects of class C.

9.3.3         Accessibility and Overriding

If a type overrides an inherited method, it may widen, but it shall not narrow, the accessibility of that method.  As a principle, if a client of a type is allowed to access a method of that type, then it should also be able to access that method (identified by name and signature) in any derived type.  Table 7.1 specifies narrow and widen in this context: a “Yes” denotes that the subclass can apply that accessibility, and a “No” denotes that it is illegal.

Table 7.1: Legal Widening of Access to a Virtual Method

               Base type accessibility
Subclass       private   family   assembly        famandassem     famorassem                public
private        Yes       No       No              No              No                        No
family         Yes       Yes      No              No              If not in same assembly   No
assembly       Yes       No       Same assembly   No              No                        No
famandassem    Yes       No       No              Same assembly   No                        No
famorassem     Yes       Yes      Same assembly   Yes             Same assembly             No
public         Yes       Yes      Yes             Yes             Yes                       Yes

 

Note: A method may be overridden even if it may not be accessed by the subclass.

If a method has assembly accessibility, then it shall have public accessibility if it is being overridden by a method in a different assembly. A similar rule applies to famandassem, where famorassem is also allowed outside the assembly. In both cases assembly or famandassem, respectively, may be used inside the same assembly.

A special rule applies to famorassem, as shown in the table. This is the only case where the accessibility is apparently narrowed by the subclass. A famorassem method may be overridden with family accessibility by a type in another assembly.

Rationale: Because there is no way to specify “family or specific other assembly” it is not possible to specify that the accessibility should be unchanged.  To avoid narrowing access, it would be necessary to specify an accessibility of public, which would force widening of access even when it is not desired.  As a compromise, the minor narrowing of “family” alone is permitted.
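Table 7.1 can be transcribed into a lookup structure. The Python sketch below is one possible encoding, offered as an illustration: the names TABLE, ORDER and override_allowed are assumptions of this example, and the conditional cells of the table are abbreviated as "same" (same assembly only) and "other" (only if not in the same assembly).

```python
ORDER = ["private", "family", "assembly", "famandassem", "famorassem", "public"]

# Rows: accessibility chosen by the subclass.
# Columns: accessibility of the method in the base type, in ORDER.
TABLE = {
    "private":     ["Yes", "No",  "No",   "No",   "No",    "No"],
    "family":      ["Yes", "Yes", "No",   "No",   "other", "No"],
    "assembly":    ["Yes", "No",  "same", "No",   "No",    "No"],
    "famandassem": ["Yes", "No",  "No",   "same", "No",    "No"],
    "famorassem":  ["Yes", "Yes", "same", "Yes",  "same",  "No"],
    "public":      ["Yes", "Yes", "Yes",  "Yes",  "Yes",   "Yes"],
}


def override_allowed(base, subclass, same_assembly):
    """Check whether an override may declare the given accessibility."""
    cell = TABLE[subclass][ORDER.index(base)]
    if cell == "Yes":
        return True
    if cell == "No":
        return False
    if cell == "same":
        return same_assembly
    return not same_assembly  # "other": allowed only across assemblies


print(override_allowed("famorassem", "family", same_assembly=False))  # True
print(override_allowed("assembly", "assembly", same_assembly=False))  # False
```

The first call is the special famorassem case discussed above; the second shows why an assembly method overridden from another assembly cannot stay at assembly accessibility.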

9.4         Method Implementation Requirements

A type (concrete or abstract) may provide

·              implementations for instance, static, and virtual methods that it introduces

·              implementations for methods declared in interfaces that it has specified it will implement, or that its base type  has specified it will implement

·              alternative implementations for virtual methods inherited from its parent

·              implementations for virtual methods inherited from an abstract base type that did not provide an implementation

A concrete (i.e. non-abstract) type shall provide either directly or by inheritance an implementation for

·              all methods declared by the type itself

·              all virtual methods of interfaces implemented by the type

·              all virtual methods that the type inherits from its base type

9.5         Special Members

There are three special members, all methods, that can be defined as part of a type: instance constructors, instance finalizers, and type initializers.

9.5.1         Instance constructors

Instance constructors initialize an instance of a type. An instance constructor is called when an instance of a type is created by the newobj instruction (see Partition III).  Instance constructors shall be instance (not static or virtual) methods; they shall be named .ctor and marked both rtspecialname and specialname (see clause 14.4.2.6). Instance constructors may take parameters, but shall not return a value. Instance constructors may be overloaded (i.e. a type may have several instance constructors). Each instance constructor shall have a unique signature. Unlike other methods, instance constructors may write into fields of the type that are marked with the initonly attribute (see clause 15.1.2).

Example (informative):

The following shows the definition of an instance constructor that does not take any parameters:

.class X {

.method public rtspecialname specialname instance void .ctor() cil managed

{ .maxstack 1

// call super constructor

ldarg.0              // load this pointer

call instance void [mscorlib]System.Object::.ctor()

// do other initialization work

ret

}

}

9.5.2         Instance Finalizer

The behavior of finalizers is specified in Partition I. The finalize method for a particular type is specified by overriding the virtual method Finalize in System.Object.

9.5.3         Type Initializer

Types may contain special methods called type initializers to initialize the type itself.

All types (classes, interfaces, and value types) may have a type initializer.  This method shall be static, take no parameters, return no value, be marked with rtspecialname and specialname (see clause 14.4.2.6), and be named .cctor.

Like instance constructors, type initializers may write into static fields of their type that are marked with the initonly attribute (see clause 15.1.2).

Note: Type initializers are often simple methods that initialize the type’s static fields from stored constants or via simple computations. There are, however, no limitations on what code is permitted in a type initializer.

9.5.3.1          Type Initialization Guarantees

The CLI shall provide the following guarantees regarding type initialization (but see also clause 9.5.3.2 and clause 9.5.3.3):

1.             When type initializers are executed is specified in Partition I.

2.             A type initializer shall run exactly once for any given type, unless explicitly called by user code.

3.             No method other than those called directly or indirectly from the type initializer will be able to access members of a type before its initializer completes execution.

9.5.3.2          Relaxed Guarantees

A type can be marked with the attribute beforefieldinit (see clause 9.1.6) to indicate that not all of the guarantees specified in clause 9.5.3.1 are required.  In particular, the final requirement of guarantee 1 need not be provided: the type initializer need not run before a static method is called or referenced.

Rationale: When code can be executed in multiple application domains it becomes particularly expensive to ensure this final guarantee.  At the same time, examination of large bodies of managed code has shown that this final guarantee is rarely required, since type initializers are almost always simple methods for initializing static fields.  Leaving it up to the CIL generator (and hence, possibly, to the programmer) to decide whether this guarantee is required therefore provides efficiency when it is desired at the cost of consistency guarantees.

9.5.3.3          Races and Deadlocks

In addition to the type initialization guarantees specified in clause 9.5.3.1 the CLI shall ensure two further guarantees for code that is called from a type initializer:

1.             Static variables of a type are in a known state prior to any access whatsoever.

2.             Type initialization alone shall not create a deadlock unless some code called from a type initializer (directly or indirectly) explicitly invokes blocking operations.

Rationale:

Consider the following two class definitions:

.class public A extends [mscorlib]System.Object

{ .field static public class A a

  .field static public class B b

 

  .method public static rtspecialname specialname void .cctor ()

  { ldnull                    // b=null

    stsfld class B A::b

    ldsfld class A B::a       // a=B.a

    stsfld class A A::a

    ret

  }

}

 

.class public B extends [mscorlib]System.Object

{ .field static public class A a

  .field static public class B b

 

  .method public static rtspecialname specialname void .cctor ()

  { ldnull                    // a=null

    stsfld class A B::a

    ldsfld class B A::b       // b=A.b

    stsfld class B B::b

    ret

  }

}

After loading these two classes, an attempt to reference any of the static fields causes a problem, since the type initializer for each of A and B requires that the type initializer of the other be invoked first. Requiring that no access to a type be permitted until its initializer has  completed would create a deadlock situation. Instead, the CLI provides a weaker guarantee: the initializer will have started to run, but it need not have completed. But this alone would allow the full uninitialized state of a type to be visible, which would make it difficult to guarantee repeatable results.

There are similar, but more complex, problems when type initialization takes place in a multi-threaded system. In these cases, for example, two separate threads might start attempting to access static variables of separate types (A and B) and then each would have to wait for the other to complete initialization.

A rough outline of the algorithm is as follows:

1. At class load time (hence prior to initialization time) store zero or null into all static fields of the type.

2. If the type is initialized you are done.

2.1. If the type is not yet initialized, try to take an initialization lock. 

2.2. If successful, record this thread as responsible for initializing the type and proceed to step 2.3.

2.2.1. If not, see whether this thread or any thread waiting for this thread to complete already holds the lock.

2.2.2. If so, return since blocking would create a deadlock.  This thread will now see an incompletely initialized state for the type, but no deadlock will arise.

2.2.3. If not, block until the type is initialized then return.

2.3. Initialize the parent type and then all interfaces implemented by this type.

2.4. Execute the type initialization code for this type.

2.5. Mark the type as initialized, release the initialization lock, awaken any threads waiting for this type to be initialized, and return.
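The outline above can be traced on a single thread with a small Python model of the mutually dependent classes A and B from the rationale. This is a sketch under explicit assumptions: TypeState, ensure_initialized and the trace list are hypothetical names, step 2.3 (parent and interface initialization) is not modeled, and blocking on another thread (step 2.2.3) is left unimplemented.

```python
trace = []  # records the order of events, for illustration only


class TypeState:
    def __init__(self, name, static_names):
        self.name = name
        # Step 1: at load time every static field holds zero or null.
        self.statics = {field: None for field in static_names}
        self.initialized = False
        self.owner = None        # thread holding the initialization lock
        self.initializer = None  # stands in for the .cctor body


def ensure_initialized(t, thread="main"):
    if t.initialized:                      # step 2
        return
    if t.owner is None:                    # steps 2.1 and 2.2: lock taken
        t.owner = thread
        trace.append("start " + t.name)
        t.initializer()                    # step 2.4
        t.initialized = True               # step 2.5
        t.owner = None
        trace.append("done " + t.name)
    elif t.owner == thread:                # steps 2.2.1 and 2.2.2
        trace.append(t.name + " in progress")
    else:                                  # step 2.2.3 needs real threads
        raise NotImplementedError("blocking is not modeled")


A = TypeState("A", ["a", "b"])
B = TypeState("B", ["a", "b"])


def cctor_a():
    A.statics["b"] = None
    ensure_initialized(B)                  # the access to B.a triggers B's initializer
    A.statics["a"] = B.statics["a"]


def cctor_b():
    B.statics["a"] = None
    ensure_initialized(A)                  # returns at once: A is in progress
    B.statics["b"] = A.statics["b"]


A.initializer, B.initializer = cctor_a, cctor_b
ensure_initialized(A)
print(trace)
```

The trace shows that B's initializer observes A while A is only partly initialized, which is the weaker guarantee the rationale describes, and that the cycle terminates instead of deadlocking.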

9.6         Nested Types

Nested types are specified in Partition I. Interfaces may be nested inside of classes and value types, but classes and value types shall not be nested inside of interfaces. For information about the logical tables associated with nested types, see Section 21.29.

Note: A nested type is not associated with an instance of its enclosing type. The nested type has its own base type and may be instantiated independent of the enclosing type. This means that the instance members of the enclosing type are not accessible using the this pointer of the nested type.

A nested type may access any members of its enclosing type, including private members, as long as the member is static or the nested type has a reference to an instance of the enclosing type. Thus, by using nested types a type may give access to its private members to another type.

Conversely, the enclosing type may not access any private or family members of the nested type. Only members with assembly, famorassem, or public accessibility can be accessed by the enclosing type.

Example (informative):

The following example shows a class declared inside another class. Both classes declare a field. The nested class may access both fields, while the enclosing class does not have access to the field b.

.class private auto autochar CounterTextBox

   extends [System.Windows.Forms]System.Windows.Forms.TextBox

   implements [.module Counter]CountDisplay

{ .field static private int32 a

  /* Nested class. Declares the NonPositiveNumberException */

  .class nested assembly NonPositiveNumberException extends [mscorlib]System.Exception

  { .field static private int32 b

    // body of nested class

  } // end of nested class NonPositiveNumberException

}

9.7         Controlling Instance Layout

The CLI supports both sequential and explicit layout control (see clause 9.1.2). For explicit layout it is also necessary to specify the precise layout of an instance (see also Section 21.18 and Section 21.16).

<fieldDecl> ::=

  [[ <int32> ]] <fieldAttr>* <type> <id>

 

The optional int32 specified in brackets at the beginning of the declaration specifies the byte offset from the beginning of the instance of the type.  This form of explicit layout control shall not be used with global fields specified using the at notation (see clause 15.3.2).

Offset values shall be 0 or greater; they cannot be negative. It is possible to overlap fields in this way, even though it is not recommended. The field may be accessed using pointer arithmetic and ldind to load the field indirectly or stind to store the field indirectly (see Partition III).  See Section 21.18 and Section 21.16 for encoding of this information. For explicit layout, every field shall be assigned an offset.

The .pack directive specifies that fields should be placed within the runtime object at addresses that are a multiple of the specified number, or at natural alignment for that field type, whichever is smaller.  For example, .pack 2 would allow 32-bit-wide fields to start on even addresses, whereas without any .pack directive they would be naturally aligned, that is, placed on addresses that are a multiple of 4.  The integer following .pack shall be one of 0, 1, 2, 4, 8, 16, 32, 64 or 128.  (A value of zero indicates that the pack size used should match the default for the current platform.)  The .pack directive shall not be supplied for any type with explicit layout control.

The .size directive specifies that a memory block of the specified number of bytes shall be allocated for an instance of the type.  For example, .size 32 would create a block of 32 bytes for the instance.  The value specified shall be greater than or equal to the calculated size of the class, based upon its field sizes and any .pack directive.  Note that if this directive applies to a value type, then the size shall be less than 1 MByte.

Note:  Metadata that controls instance layout is not a “hint,” it is an integral part of the VES that shall be supported by all conforming implementations of the CLI.

Example (informative):

The following class uses sequential layout of its fields:

.class sequential public SequentialClass

{ .field public int32 a             // store at offset 0 bytes

  .field public int32 b             // store at offset 4 bytes

}

The following class uses explicit layout of its fields:

.class explicit public ExplicitClass

{ .field [0] public int32 a   // store at offset 0 bytes

  .field [6] public int32 b   // store at offset 6 bytes

}

The following value type uses .pack to pack its fields together:

.class value sealed public MyClass extends [mscorlib]System.ValueType

{ .pack 2

  .field  public int8  a      // store at offset 0 bytes

  .field  public int32 b      // store at offset 2 bytes (not 4)

}

The following class specifies a contiguous block of 16 bytes:

.class public BlobClass

{ .size  16

}
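
Example (informative):

The offsets quoted in the comments of the .pack example above can be reproduced with a small calculation. The following Python sketch is not part of the CLI; it is a minimal model that assumes each primitive field's natural alignment equals its size, that fields are placed in declaration order, and that a .pack value of 0 (platform default) behaves like natural alignment. The names sequential_offsets and size_is_acceptable are hypothetical.

```python
# Informative sketch only. The type table and the helper names below are
# assumptions made for illustration; they are not defined by the CLI.
NATURAL_ALIGNMENT = {"int8": 1, "int16": 2, "int32": 4, "int64": 8}
ALLOWED_PACK = {0, 1, 2, 4, 8, 16, 32, 64, 128}
ONE_MBYTE = 1024 * 1024


def sequential_offsets(field_types, pack=None):
    """Return (offsets, end) for fields placed in declaration order.

    pack=None models a type with no .pack directive. pack=0 requests the
    platform default, which this sketch assumes to be natural alignment.
    """
    if pack is not None and pack not in ALLOWED_PACK:
        raise ValueError(f"invalid .pack value: {pack}")
    offsets = []
    position = 0
    for field_type in field_types:
        natural = NATURAL_ALIGNMENT[field_type]  # size equals alignment here
        # Use the smaller of the pack size and the natural alignment.
        alignment = min(pack, natural) if pack else natural
        # Round the running position up to the next multiple of alignment.
        position = (position + alignment - 1) // alignment * alignment
        offsets.append(position)
        position += natural
    return offsets, position


def size_is_acceptable(declared_size, calculated_size, is_value_type):
    """Check a .size value against the two constraints stated above."""
    if declared_size < calculated_size:
        return False
    if is_value_type and declared_size >= ONE_MBYTE:
        return False
    return True


# The value type from the .pack example: int8 a, int32 b.
packed_offsets, packed_end = sequential_offsets(["int8", "int32"], pack=2)
natural_offsets, natural_end = sequential_offsets(["int8", "int32"])
```

With .pack 2 the model places b at offset 2; without a .pack directive it places b at offset 4. The second helper only restates the two .size constraints; how an implementation arrives at the calculated size (for instance, whether trailing padding is counted) is outside the scope of this sketch.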

9.8         Global Fields and Methods

In addition to types with static members, many languages have the notion of data and methods that are not part of a type at all. These are referred to as global fields and methods.

It is simplest to understand global fields and methods in the CLI by imagining that they are simply members of an invisible abstract public class. In fact, the CLI defines such a special class, named ′<Module>′, that does not have a base type and does not implement any interfaces. The only noticeable difference is in how definitions of this special class are treated when multiple modules are combined together, as is done by a class loader. This process is known as metadata merging.

For an ordinary type, if the metadata merges two definitions of the same type, it simply discards one definition on the assumption they are equivalent and that any anomaly will be discovered when the type is used.  For the special class that holds global members, however, members are unioned across all modules at merge time. If the same name appears to be defined for cross-module use in multiple modules then there is an error.  In detail:

·              If no member of the same kind (field or method), name, and signature exists, then add this member to the output class.

·              If there are duplicates and no more than one has an accessibility other than compilercontrolled, then add them all in the output class.

·              If there are duplicates and two or more have an accessibility other than compilercontrolled, then an error has occurred.
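
Example (informative):

The three merging rules above can be sketched as follows. This Python model is an illustration under stated assumptions, not a description of any particular class loader: a member is represented by a hypothetical Member tuple, duplicates are identified by kind, name, and signature, and an error is reported by raising ValueError.

```python
from collections import defaultdict, namedtuple

# Hypothetical record for one global member; not a CLI metadata structure.
Member = namedtuple("Member", "kind name signature accessibility")


def merge_global_members(modules):
    """Union the global members of several modules, following the rules above."""
    groups = defaultdict(list)
    for members in modules:
        for member in members:
            groups[(member.kind, member.name, member.signature)].append(member)

    merged = []
    for (kind, name, signature), duplicates in groups.items():
        visible = [m for m in duplicates
                   if m.accessibility != "compilercontrolled"]
        if len(visible) >= 2:
            # Two or more definitions are intended for cross-module use.
            raise ValueError(f"global {kind} {name!r} {signature} is defined "
                             "for cross-module use in more than one module")
        # Zero or one visible definition: keep every duplicate.
        merged.extend(duplicates)
    return merged


module_a = [Member("method", "Main", "void()", "public"),
            Member("field", "tmp", "int32", "compilercontrolled")]
module_b = [Member("field", "tmp", "int32", "compilercontrolled"),
            Member("field", "tmp", "int32", "public")]
module_c = [Member("method", "Main", "void()", "assembly")]

merged = merge_global_members([module_a, module_b])
```

Merging module_a with module_b keeps all four members, because only one definition of tmp has an accessibility other than compilercontrolled. Merging module_a with module_c raises an error, because Main would then be defined for cross-module use twice.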

10      Semantics of Classes

Classes, as specified in Partition I, define types in an inheritance hierarchy.  A class (except for the built-in class System.Object) shall declare exactly one parent class.  A class shall declare zero or more interfaces that it implements (see Chapter 11).  A concrete class may be instantiated to create an object, but an abstract class (see clause 9.1.4) shall not be instantiated.  A class may define fields (static or instance), methods (static, instance, or virtual), events, properties, and nested types (classes, value types, or interfaces).

Instances of a class (objects) are created only by explicitly using the newobj instruction (see Partition III).  When a variable or field that has a class as its type is created (for example, by calling a method that has a local variable of a class type) the value shall initially be null, a special value that is assignment compatible with all class types even though it is not an instance of any particular class.

11      Semantics of Interfaces

Interfaces, as specified in Partition I, define a contract that other types may implement. Interfaces may have static fields and methods, but they shall not have instance fields or methods.  Interfaces may define virtual methods, but only if they are abstract (see Partition I and clause 14.4.2.4).

Rationale: Interfaces cannot define instance fields for the same reason that the CLI does not support multiple inheritance of base types: in the presence of dynamic loading of data types there is no known implementation technique that is both efficient when used and has no cost when not used.  By contrast, providing static fields and methods need not affect the layout of instances and therefore does not raise these issues.

Interfaces may be nested inside any type (interface, class, or value type).  Classes and value types shall not be nested inside of interfaces.

11.1      Implementing Interfaces

Classes and value types shall implement zero or more interfaces.  Implementing an interface implies that all concrete instances of the class or value type shall provide an implementation for each abstract virtual method declared in the interface.   In order to implement an interface, a class or value type shall either explicitly declare that it does so (using the implements attribute in its type definition, see Section 9.1) or shall be derived from a base class that implements the interface.

Note: An abstract class (since it cannot be instantiated) need not provide implementations of the virtual methods of interfaces it implements, but any concrete class derived from it shall provide the implementation.

Merely providing implementations for all of the abstract methods of an interface is not sufficient to have a type implement that interface.  Conceptually, this reflects the fact that an interface represents a contract that may have more requirements than are captured in the set of abstract methods.  From an implementation point of view, this allows the layout of types to be constrained only by those interfaces that are explicitly declared.

Interfaces shall declare that they require the implementation of zero or more other interfaces. If one interface, A, declares that it requires the implementation of another interface, B, then A implicitly declares that it requires the implementation of all interfaces required by B. If a class or value type declares that it implements A, then all concrete instances shall provide implementations of the virtual methods declared in A and all of the interfaces A requires.

Example (informative):

The following class implements the interface IStartStopEventSource defined in the module Counter.

.class private auto autochar StartStopButton
       extends [System.Windows.Forms]System.Windows.Forms.Button
       implements [.module Counter]IStartStopEventSource
{ // body of class
}
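
Example (informative):

The rule stated earlier in this section, that an interface implicitly requires every interface required by the interfaces it names, amounts to a transitive closure. The Python sketch below illustrates that closure under simple assumptions: interfaces are identified by name, and the hypothetical requires mapping lists the interfaces each one directly requires.

```python
def required_interfaces(interface, requires):
    """Return every interface that `interface` requires, directly or indirectly."""
    seen = set()
    pending = list(requires.get(interface, ()))
    while pending:
        current = pending.pop()
        if current not in seen:
            seen.add(current)
            # Anything required by `current` is implicitly required as well.
            pending.extend(requires.get(current, ()))
    return seen


# Hypothetical declarations: A requires B; B requires C and D.
requires = {"A": ["B"], "B": ["C", "D"]}
closure_of_a = required_interfaces("A", requires)
```

Here a concrete class that declares that it implements A would have to provide implementations of the virtual methods declared in A, B, C, and D.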

11.2      Implementing Virtual Methods on Interfaces

Classes that implement an interface (see Section 11.1) are required to provide implementations for the abstract virtual methods defined by the interface.  There are three mechanisms for providing this implementation:

·              directly specifying an implementation, using the same name and signature as appears in the interface

·              inheritance of an existing implementation from the base type

·              use of an explicit MethodImpl (see clause 14.1.4).

The Virtual Execution System shall determine the appropriate implementation of a virtual method to be used for an interface abstract method using the following algorithm. 

·              If the parent class implements the interface, start with the same virtual methods that it provides, otherwise create an interface that has empty slots for all virtual functions.

·              If this class explicitly specifies that it implements the interface

o             if the class defines any public virtual newslot functions whose name and signature match a virtual method on the  interface, then use these new virtual methods to implement the corresponding interface method.

·              If there are any virtual methods in the interface that still have empty slots, see if there are any public virtual methods available on this class (directly or inherited) and use these to implement the corresponding methods on the interface.

·              Apply all MethodImpls that are specified for this class, thereby placing explicitly specified virtual methods into the interface in preference to those inherited or chosen by name matching.

·              If the current class is not abstract and there are any interface methods that still have empty slots, then the program is not valid.

Rationale: Interfaces can be thought of as specifying, primarily, a set of virtual methods that shall be implemented by any class that implements the interface.  The class specifies a mapping from its own virtual methods to those of the interface.  Thus it is virtual methods, not specific implementations of those methods, that are associated with interfaces.  Overriding a virtual method on a class with a specific implementation will thus affect not only the virtual method named in the class but also any interface virtual methods to which that same virtual method has been mapped.
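
Example (informative):

The five steps of the algorithm above can be sketched as a small model. The Python code below is an illustration only, under several assumptions: the Method, Interface, and Class records and the function names are hypothetical, methods are matched on a (name, signature) pair, interfaces required by other interfaces are ignored, and the result records which virtual method fills each interface slot rather than which override body eventually runs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Method:
    owner: str
    name: str
    signature: str
    public: bool = True
    virtual: bool = True
    newslot: bool = False

    @property
    def key(self):
        return (self.name, self.signature)


@dataclass
class Interface:
    name: str
    methods: list                      # list of (name, signature) pairs


@dataclass
class Class:
    name: str
    parent: "Class | None" = None
    implements: list = field(default_factory=list)    # declared explicitly
    methods: list = field(default_factory=list)       # defined on this class
    method_impls: dict = field(default_factory=dict)  # (iface, key) -> Method
    abstract: bool = False


def class_implements(cls, iface):
    while cls is not None:
        if iface.name in cls.implements:
            return True
        cls = cls.parent
    return False


def find_public_virtual(cls, key):
    """Search this class, then its base classes, for a public virtual method."""
    while cls is not None:
        for method in cls.methods:
            if method.public and method.virtual and method.key == key:
                return method
        cls = cls.parent
    return None


def interface_slots(cls, iface):
    # Step 1: start from the parent's mapping, or from empty slots.
    if cls.parent is not None and class_implements(cls.parent, iface):
        slots = dict(interface_slots(cls.parent, iface))
    else:
        slots = {key: None for key in iface.methods}
    # Step 2: public virtual newslot methods of an explicitly implementing class.
    if iface.name in cls.implements:
        for method in cls.methods:
            if (method.public and method.virtual and method.newslot
                    and method.key in slots):
                slots[method.key] = method
    # Step 3: fill remaining empty slots by name and signature matching.
    for key in list(slots):
        if slots[key] is None:
            slots[key] = find_public_virtual(cls, key)
    # Step 4: explicit MethodImpls take preference over everything above.
    for (iface_name, key), method in cls.method_impls.items():
        if iface_name == iface.name:
            slots[key] = method
    # Step 5: a concrete class shall not leave any slot empty.
    if not cls.abstract and any(m is None for m in slots.values()):
        raise TypeError(f"{cls.name} does not implement all of {iface.name}")
    return slots


start_stop = Interface("IStartStop", [("Start", "void()"), ("Stop", "void()")])
base = Class("Base", methods=[Method("Base", "Stop", "void()")])
halt = Method("Derived", "Halt", "void()")
derived = Class("Derived", parent=base, implements=["IStartStop"],
                methods=[Method("Derived", "Start", "void()", newslot=True), halt])
slots = interface_slots(derived, start_stop)
```

In this usage, Start is taken from Derived in step 2 and Stop is found on Base in step 3. Adding a MethodImpl that maps the interface's Stop to Halt would replace the inherited choice in step 4.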

12      Semantics of Value Types

In contrast to classes, value types (see Partition I) are not accessed by using a reference but are stored directly in the location of that type.

Rationale: Value types are used to describe the type of small data items. They can be compared to struct (as opposed to pointers to struct) types in C++. Compared to reference types, value types are accessed faster since there is no additional indirection involved. As elements of arrays they do not require allocating memory for the pointers as well as for the data itself.  Typical value types are complex numbers, geometric points, or dates.

Like other types, value types may have fields (static or instance), methods (static, instance, or virtual), properties, events, and nested types.  A value type may be converted into a corresponding reference type (its boxed form, a class automatically created for this purpose by the VES when a value type is defined) by a process called boxing. A boxed value type may be converted back into its value type representation, the unboxed form, by a process called unboxing.  Value types shall be sealed, and they shall have a base type of either System.ValueType or System.Enum (see Partition IV).  Value types shall implement zero or more interfaces, but this has meaning only in their boxed form (see Section 12.3).

Unboxed value types are not considered subtypes of another type and it is not valid to use the isinst instruction (see Partition III) on unboxed value types. The isinst instruction may be used for boxed value types.  Unboxed value types shall not be assigned the value null and they shall not be compared to null.

Value types support layout control in the same way as reference types do (see Section 9.7). This is especially important when values are imported from native code.

12.1      Referencing Value Types

The unboxed form of a value type shall be referred to by using the valuetype keyword followed by a type reference.   The boxed form of a value type shall be referred to by using the boxed keyword followed by a type reference.

<valueTypeReference> ::=    

     boxed <typeReference> |

  valuetype <typeReference>

 

Implementation Specific (Microsoft)

For historical reasons “value class” may be used instead of “valuetype” although the latter is preferred.  V1 of the CLI does not support direct references to boxed value types; they should be treated as object instead.

12.2      Initializing Value Types

Like classes, value types may have both instance constructors (see clause 9.5.1) and type initializers (see clause 9.5.3). Unlike classes, which are automatically initialized to null, however, the following rules constitute the only guarantee about the initialization of (unboxed) value types:

·              Static variables shall be initialized to zero when a type is loaded (see clause 9.5.3.3), hence statics whose type is a value type are zero-initialized when the type is loaded.

·              Local variables shall be initialized to zero if the appropriate bit in the method header (see clause 24.4.4) is set.

·              Arrays shall be zero initialized.

·              Instances of classes (i.e. objects) shall be zero initialized prior to calling their instance constructor.

Rationale: Guaranteeing automatic initialization of unboxed value types is both difficult and expensive, especially on platforms that support thread-local storage and allow threads to be created outside of the CLI and then passed to the CLI for management.

 

Note: Boxed value types are classes and follow the rules for classes.

The instruction initobj (see Partition III) performs zero-initialization under program control.  If a value type has a constructor, an instance of its unboxed type can be created as is done with classes. The newobj instruction (see Partition III) is used along with the initializer and its parameters to allocate and initialize the instance. The instance of the value type will be allocated on the stack. The Base Class Library provides the method System.Array.Initialize (see Partition IV) to zero all instances in an array of unboxed value types.

Example (informative):

The following code declares and initializes three value type variables.  The first variable is zero-initialized, the second is initialized by calling an instance constructor, and the third by creating the object on the stack and storing it into the local.

.assembly Test { }

.assembly extern System.Drawing {

  .ver 1:0:3102:0

  .publickeytoken = (b03f5f7f11d50a3a)

}

.method public static void Start()

{ .maxstack 3

  .entrypoint

  .locals init (valuetype [System.Drawing]System.Drawing.Size Zero,

                valuetype [System.Drawing]System.Drawing.Size Init,

                valuetype [System.Drawing]System.Drawing.Size Store)

 

  // Zero initialize the local named Zero

  ldloca Zero        // load address of local variable

  initobj valuetype [System.Drawing]System.Drawing.Size

 

  // Call the initializer on the local named Init

  ldloca Init // load address of local variable

  ldc.i4 425 // load argument 1 (width)

  ldc.i4 300 // load argument 2 (height)

  call instance void [System.Drawing]System.Drawing.Size::.ctor(int32, int32)

 

  // Create a new instance on the stack and store into Store.  Note that

  // stobj is used here – but one could equally well  use stloc, stfld, etc.

  ldloca Store

  ldc.i4 425 // load argument 1 (width)

  ldc.i4 300 // load argument 2 (height)

  newobj instance void [System.Drawing]System.Drawing.Size::.ctor(int32, int32)

  stobj valuetype [System.Drawing]System.Drawing.Size

  ret

}

12.3      Methods of Value Types

Value types may have static, instance and virtual methods. Static methods of value types are defined and called the same way as static methods of class types.  As with classes, both instance and virtual methods of a boxed or unboxed value type may be called using the call instruction. The callvirt instruction shall not be used with unboxed value types, but it may be used on boxed value types.

Instance and virtual methods of classes shall be coded to expect a reference to an instance of the class as the this pointer.  By contrast, instance and virtual methods of value types shall be coded to expect a managed pointer (see Partition I) to an unboxed instance of the value type.  The CLI shall convert a boxed value type into a managed pointer to the unboxed value type when a boxed value type is passed as the this pointer to a virtual method whose implementation is provided by the unboxed value type.

Note: This operation is the same as unboxing the instance, since the unbox instruction (see Partition III) is defined to return a managed pointer to the value type that shares memory with the original boxed instance.

The following diagrams may help understand the relationship between the boxed and unboxed representations of a value type.

 

 

Rationale: An important use of instance methods on value types is to change internal state of the instance.  This cannot be done if an instance of the unboxed value type is used for the this pointer, since it would be operating on a copy of the value, not the original value: unboxed value types are copied when they are passed as arguments.

Virtual methods are used to allow multiple types to share implementation code, and this requires that all classes that implement the virtual method share a common representation defined by the class that first introduces the method.  Since value types can (and in the Base Class Library do) implement interfaces and virtual methods defined on System.Object, it is important that the virtual method be callable using a boxed value type so it can be manipulated as would any other type that implements the interface.  This leads to the requirement that the VES automatically unbox value types on virtual calls.

Table 1: Type of this given CIL instruction and declaring type of instance method.

 

            Value Type (Boxed or Unboxed)    Interface           Class Type
call        managed pointer to value type    illegal             object reference
callvirt    managed pointer to value type    object reference    object reference

 

Example (informative):

The following converts an integer of the value type int32 into a string. Recall that int32 corresponds to the unboxed value type System.Int32 defined in the Base Class Library.  Suppose the integer is declared as:

     .locals init (int32 x)

Then the call is made as shown below:

     ldloca x          // load managed pointer to local variable

     call instance string
          valuetype [mscorlib]System.Int32::ToString()

However, if System.Object (a class) is used as the type reference rather than System.Int32 (a value type), the value of x shall be boxed before the call is made and the code becomes:

     ldloc x

     box valuetype [mscorlib]System.Int32

     callvirt instance string [mscorlib]System.Object::ToString()

13      Semantics of Special Types

Special Types are those that are referenced from CIL, but for which no definition is supplied: the VES supplies the definitions automatically based on information available from the reference.

13.1      Vectors

<type> ::= …

     | <type> [ ]

 

Vectors are single-dimension arrays with a zero lower bound.  They have direct support in CIL instructions (newarr, ldelem, stelem, and ldelema, see Partition III).  The CIL Framework also provides methods that deal with multidimensional arrays, or single-dimension arrays with a non-zero lower bound (see Section 13.2). Two vectors are the same type if their element types are the same, regardless of their actual upper bounds.

Vectors have a fixed size and element type, determined when they are created.  All CIL instructions shall respect these values.  That is, they shall reliably detect attempts to index beyond the end of the vector, attempts to store the incorrect type of data into an element of a vector, and attempts to take addresses of elements of a vector with an incorrect data type.  See Partition III.

Example (informative):

Declaring a vector of Strings:

     .field string[] errorStrings

Declaring a vector of function pointers:

     .field method instance void*(int32) [] myVec

Create a vector of 4 strings, and store it into the field errorStrings.  The four strings lie at errorStrings[0] through errorStrings[3]:

      ldc.i4.4

      newarr string

      stfld     string[] CountDownForm::errorStrings

Store the string "First" into errorStrings[0]:

     ldfld string[] CountDownForm::errorStrings

     ldc.i4.0

     ldstr "First"

     stelem.ref

Vectors are subtypes of System.Array, an abstract class pre-defined by the CLI.  It provides several methods that can be applied to all vectors. See Partition IV.

13.2      Arrays

While vectors (see Section 13.1) have direct support through CIL instructions, all other arrays are supported by the VES by creating subtypes of the abstract class System.Array (see Partition IV).

<type> ::= …

   | <type> [ [<bound> [,<bound>]*] ]

 

The rank of an array is the number of dimensions.  The CLI does not support arrays with rank 0.  The type of an array (other than a vector) shall be determined by the type of its elements and the number of dimensions.

<bound> ::=                 Description
    ...                     lower and upper bounds unspecified.  In the case of multi-dimensional arrays, the ellipsis may be omitted
  | <int32>                 zero lower bound, <int32> upper bound
  | <int32> ...             lower bound only specified
  | <int32> ... <int32>     both bounds specified

 

The fundamental operations provided by the CIL instruction set for vectors are provided by methods on the class created by the VES.

The VES shall provide two constructors for arrays.  One takes a sequence of numbers giving the number of elements in each dimension (a lower bound of zero is assumed).  The second takes twice as many arguments: a sequence of lower bounds, one for each dimension; followed by a sequence of lengths, one for each dimension (where length is the number of elements required). 

In addition to array constructors, the VES shall provide the instance methods Get, Set, and Address to access specific elements and compute their addresses. These methods take a number for each dimension, to specify the target element.  In addition, Set takes an additional final argument specifying the value to store into the target element.

Example (informative):

Creates an array, MyArray, of strings with two dimensions, with indexes 5..10 and 3..7.  Stores the string "One" into MyArray[5, 3], retrieves it and prints it out. Then computes the address of MyArray[5, 4], stores "Test" into it, retrieves it, and prints it out.

.assembly Test { }

.assembly extern mscorlib { }

 

.method public static void Start()

{ .maxstack 5

  .entrypoint

  .locals (class [mscorlib]System.String[,] myArray)

 

  ldc.i4.5      // load lower bound for dim 1

  ldc.i4.6      // load (upper bound - lower bound + 1) for dim 1

  ldc.i4.3      // load lower bound for dim 2

  ldc.i4.5      // load (upper bound - lower bound + 1) for dim 2

  newobj instance void string[,]::.ctor(int32,

         int32, int32, int32)

  stloc  myArray

 

  ldloc myArray

  ldc.i4.5

  ldc.i4.3

  ldstr "One"

  call instance void string[,]::Set(int32, int32, string)

 

  ldloc myArray

  ldc.i4.5

  ldc.i4.3

  call instance string string[,]::Get(int32, int32)

  call void [mscorlib]System.Console::WriteLine(string)

 

  ldloc myArray

  ldc.i4.5

  ldc.i4.4

  call instance string & string[,]::Address(int32, int32)

  ldstr "Test"

  stind.ref

 

  ldloc myArray

  ldc.i4.5

  ldc.i4.4

  call instance string string[,]::Get(int32, int32)

  call void [mscorlib]System.Console::WriteLine(string)

 

  ret

}
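
Example (informative):

The bookkeeping behind the example above (lower bounds, lengths, and bounds-checked element access) can be modelled in a few lines. The Python sketch below is an analogy, not a description of how the VES implements arrays: the class name MultiDimArray is hypothetical, a row-major flat store is assumed, bounds and lengths are passed as named sequences so that no argument order is implied, and the Address method is not modelled.

```python
class MultiDimArray:
    """Toy model of a CLI array with per-dimension lower bounds and lengths."""

    def __init__(self, lower_bounds, lengths):
        if len(lower_bounds) != len(lengths) or len(lengths) == 0:
            raise ValueError("rank shall be at least 1 and consistent")
        self.lower_bounds = tuple(lower_bounds)
        self.lengths = tuple(lengths)
        total = 1
        for length in self.lengths:
            total *= length
        self._items = [None] * total      # elements start out as null

    @classmethod
    def from_lengths(cls, *lengths):
        """Model of the first constructor: a lower bound of zero is assumed."""
        return cls([0] * len(lengths), lengths)

    def _flat_index(self, indexes):
        if len(indexes) != len(self.lengths):
            raise IndexError("one index is required per dimension")
        flat = 0
        for index, low, length in zip(indexes, self.lower_bounds, self.lengths):
            if not low <= index < low + length:
                raise IndexError(f"index {index} outside {low}..{low + length - 1}")
            flat = flat * length + (index - low)   # row-major (assumption)
        return flat

    def get(self, *indexes):
        return self._items[self._flat_index(indexes)]

    def set(self, *indexes_and_value):
        *indexes, value = indexes_and_value        # final argument is the value
        self._items[self._flat_index(indexes)] = value


# Indexes 5..10 and 3..7, i.e. lengths 10 - 5 + 1 = 6 and 7 - 3 + 1 = 5.
my_array = MultiDimArray(lower_bounds=[5, 3], lengths=[6, 5])
my_array.set(5, 3, "One")
my_array.set(10, 7, "Last")
first = my_array.get(5, 3)
```

The model holds 6 x 5 = 30 elements, and any index outside 5..10 or 3..7 is rejected, which mirrors the requirement that element access be reliably bounds-checked.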


The following text is informative

Whilst the elements of multi-dimensional arrays can be thought of as laid out in contiguous memory, arrays of arrays are different – each dimension (except the last) holds an array reference.  The following picture illustrates the difference:

       

On the left is a [6, 10] rectangular array.  On the right is not one, but a total of five arrays.  The vertical array is an array of arrays, and references the four horizontal arrays.  Note how the first and second elements of the vertical array both reference the same horizontal array.

Note that all dimensions of a multi-dimensional array shall be of the same size.  But in an array of arrays, it is possible to reference arrays of different sizes.  For example, the figure on the right shows the vertical array referencing arrays of lengths 8, 8, 3, null, 6 and 1.

There is no special support for these so-called jagged arrays in either the CIL instruction set or the VES.  They are simply vectors whose elements are themselves either the base elements or (recursively) jagged arrays.
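
The array of arrays described above can be mimicked with nested Python lists. This is only an analogy, assuming that a Python list stands in for a vector and None for a null reference; the variable names are hypothetical.

```python
# One horizontal array is shared by the first two elements of the vertical array.
shared_row = [0] * 8
vertical = [shared_row, shared_row, [0] * 3, None, [0] * 6, [0] * 1]

# Length of each referenced array (None where the element is null).
lengths = [None if row is None else len(row) for row in vertical]

# Count the distinct horizontal arrays, then add the vertical array itself.
distinct_rows = {id(row) for row in vertical if row is not None}
total_arrays = 1 + len(distinct_rows)

# A store through one reference is visible through the other.
vertical[0][0] = 42
seen_through_second = vertical[1][0]
```

The lengths come out as 8, 8, 3, null, 6 and 1, and the total is five arrays: the vertical array plus four distinct horizontal ones.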

End of informative text


13.3      Enums

An enum, short for enumeration, defines a set of symbols that all have the same type.  A type shall be an enum if and only if it has an immediate base type of System.Enum.  Since System.Enum itself has an immediate base type of System.ValueType (see Partition IV), enums are value types (see Chapter 12).  The symbols of an enum are represented by an underlying type: one of { bool, char, int8, unsigned int8, int16, unsigned int16, int32, unsigned int32, int64, unsigned int64, float32, float64, native int, unsigned native int }.

Note: The CLI does not provide a guarantee that values of the enum type are integers corresponding to one of the symbols (unlike Pascal).  In fact, the CLS (see Partition I, CLS) defines a convention for using enums to represent bit flags which can be combined to form integral values that are not named by the enum type itself.

Enums obey additional restrictions beyond those on other value types.  Enums shall contain only fields as members (they shall not even define type initializers or instance constructors); they shall not implement any interfaces; they shall have auto field layout (see clause 9.1.2); they shall have exactly one instance field and it shall be of the underlying type of the enum; all other fields shall be static and literal (see Section 15.1); and they shall not be initialized with the initobj instruction.

Rationale: These restrictions allow a very efficient implementation of enums.

The single, required, instance field stores the value of an instance of the enum. The static literal fields of an enum declare the mapping of the symbols of the enum to the underlying values.  All of these fields shall have the type of the enum and shall have field init metadata that assigns them a value (see Section 15.2).

For binding purposes (e.g. for locating a method definition from the method reference used to call it) enums shall be distinct from their underlying type.  For all other purposes, including verification and execution of code, an unboxed enum freely interconverts with its underlying type.  Enums can be boxed (see Chapter 12) to a corresponding boxed instance type, but this type is not the same as the boxed type of the underlying type, so boxing does not lose the original type of the enum.

Example (informative):

Declare an enum type, then create a local variable of that type.  Store a constant of the underlying type into the enum (showing automatic coercion from the underlying type to the enum type).  Load the enum back and print it as the underlying type (showing automatic coercion back).  Finally, load the address of the enum and extract the contents of the instance field and print that out as well.

.assembly Test { }

.assembly extern mscorlib { }

 

.class sealed public ErrorCodes extends [mscorlib]System.Enum

{ .field public unsigned int8 MyValue

  .field public static literal valuetype ErrorCodes no_error = int8(0)

  .field public static literal valuetype ErrorCodes format_error =

         int8(1)

  .field public static literal valuetype ErrorCodes overflow_error =

         int8(2)

  .field public static literal valuetype ErrorCodes nonpositive_error =

         int8(3)

}

 

.method public static void Start()

{ .maxstack 5

  .entrypoint

  .locals init (valuetype ErrorCodes errorCode)

 

  ldc.i4.1         // load 1 (= format_error)

  stloc errorCode // store in local, note conversion to enum

  ldloc errorCode

  call void [mscorlib]System.Console::WriteLine(int32)

  ldloca errorCode // address of enum

  ldfld unsigned int8 valuetype ErrorCodes::MyValue

  call void [mscorlib]System.Console::WriteLine(int32)

  ret

}

13.4      Pointer Types

<type> ::= …          Section
   | <type> &         13.4.2
   | <type> *         13.4.1

 

A pointer type shall be defined by specifying a signature that includes the type for the location it points at.  A pointer may be managed (reported to the CLI garbage collector, denoted by &, see clause 13.4.2) or unmanaged (not reported, denoted by *, see clause 13.4.1).

Pointers may contain the address of a field (of an object or value type) or an element of an array.  Pointers differ from object references in that they do not point to an entire type instance, but rather to the interior of an instance.  The CLI provides two type-safe operations on pointers:

·              loading the value from the location referenced by the pointer

·              storing an assignment-compatible value into the location referenced  by the pointer

For pointers into the same array or object (see Partition I) the following arithmetic operations are supported:

·              Adding an integer value to a pointer, where that value is interpreted as a number of bytes, results in a pointer of the same kind

·              Subtracting an integer value (number of bytes) from a pointer results in a pointer of the same kind. Note that subtracting a pointer from an integer value is not permitted. 

·              Two pointers, regardless of kind, can be subtracted from one another, producing an integer value that specifies the number of bytes between the addresses they reference.
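
Example (informative):

Because the integer operand is a number of bytes rather than a number of elements, stepping through a vector requires scaling by the element size. The following Python sketch works this arithmetic through with plain integers standing in for addresses; the base address, the 4-byte element size, and the helper name are assumptions chosen for illustration.

```python
ELEMENT_SIZE = 4          # bytes per int32 element (assumption)
BASE_ADDRESS = 0x1000     # hypothetical address of element 0


def element_address(base, index, element_size=ELEMENT_SIZE):
    """Add a byte offset to a pointer: the result is a pointer of the same kind."""
    return base + index * element_size


p2 = element_address(BASE_ADDRESS, 2)     # address of element 2
p5 = element_address(BASE_ADDRESS, 5)     # address of element 5

# Subtracting two pointers yields the number of bytes between them.
byte_distance = p5 - p2
element_distance = byte_distance // ELEMENT_SIZE

# Subtracting a byte count from a pointer steps backwards.
p4 = p5 - ELEMENT_SIZE
```

Under these assumptions the two pointers are 12 bytes apart, which corresponds to three elements.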


The following is informative text

Pointers are compatible with unsigned int32 on 32-bit architectures, and with unsigned int64 on 64-bit architectures.  They are best considered as unsigned int, whose size varies depending upon the runtime machine architecture.

The CIL instruction set (see Partition III) contains instructions to compute addresses of fields, local variables, arguments, and elements of vectors:

Instruction    Description
ldarga         Load address of argument
ldelema        Load address of vector element
ldflda         Load address of field
ldloca         Load address of local variable
ldsflda        Load address of static field

 

Once a pointer is loaded onto the stack, the ldind class of instructions may be used to load the data item to which it points. Similarly, the stind class of instructions can be used to store data into the location.

Note that the CLI will throw an InvalidOperationException for an ldflda instruction if the address is not within the current application domain. This situation arises typically only from the use of objects with a base type of System.MarshalByRefObject (see Partition IV).

13.4.1      Unmanaged Pointers

Unmanaged pointers (*) are the traditional pointers used in languages like C and C++. There are no restrictions on their use, although for the most part they result in code that cannot be verified. While it is perfectly legal to mark locations that contain unmanaged pointers as though they were unsigned integers (and this is, in fact, how they are treated by the VES), it is often better to mark them as unmanaged pointers to a specific type of data. This is done by using * in a signature for a return value, local variable or an argument or by using a pointer type for a field or array element.

·              Unmanaged pointers are not reported to the garbage collector and can be used in any way that an integer can be used.

·              Verifiable code cannot dereference unmanaged pointers.

·              Unverified code can pass an unmanaged pointer to a method that expects a managed pointer. This is safe only if one of the following is true:

a.            The unmanaged pointer refers to memory that is not in memory used by the CLI for storing instances of objects (“garbage collected memory” or “managed memory”).

b.            The unmanaged pointer contains the address of a field within an object.

c.            The unmanaged pointer contains the address of an element within an array.

d.            The unmanaged pointer contains the address where the element following the last element in an array would be located

13.4.2      Managed Pointers

Managed pointers (&) may point to an instance of a value type, a field of an object, a field of a value type, an element of an array, or the address where an element just past the end of an array would be stored (for pointer indexes into managed arrays). Managed pointers cannot be null, and they shall be reported to the garbage collector even if they do not point to managed memory. 

Managed pointers are specified by using & in a signature for a return value, local variable or an argument or by using a by-ref type for a field or array element.

·              Managed pointers can be passed as arguments, stored in local variables, and returned as values.

·              If a parameter is passed by reference, the corresponding argument is a managed pointer.

·              Managed pointers cannot be stored in static variables, array elements, or fields of objects or value types.

·              Managed pointers are not interchangeable with object references. 

·              A managed pointer cannot point to another managed pointer, but it can point to an object reference or a value type.

·              A managed pointer can point to a local variable or a method argument.

·              Managed pointers that do not point to managed memory can be converted (using conv.u or conv.ovf.u) into unmanaged pointers, but this is not verifiable. 

·              Unverified code that erroneously converts a managed pointer into an unmanaged pointer can seriously compromise the integrity of the CLI. See Partition III (Managed Pointers) for more details.
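
Example (informative):

A minimal sketch of passing a parameter by reference, assuming hypothetical names (Increment, Target, Counter, and the assembly name are illustrative only).  The parameter is declared with & in the signature, and the caller supplies a managed pointer to a local variable by using ldloca.

```il
.assembly ManagedPointerSketch { }
.assembly extern mscorlib { }

// Hypothetical helper: adds one to the int32 that Target points to.
.method public static void Increment(int32& Target)
{ .maxstack 3
  ldarg Target      // address to store the result to
  ldarg Target
  ldind.i4          // current value at that address
  ldc.i4.1
  add
  stind.i4          // write the incremented value back
  ret
}

.method public static void Start()
{ .maxstack 1
  .entrypoint
  .locals (int32 Counter)
  ldc.i4 41
  stloc Counter
  ldloca Counter    // managed pointer to the local variable
  call void Increment(int32&)
  ldloc Counter     // the local now holds the updated value
  call void [mscorlib]System.Console::WriteLine(int32)
  ret
}
```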

End informative text


13.5      Method Pointers

<type> ::= …

   | method <callConv> <type> * ( <parameters> )

 

Variables of type method pointer shall store the address of the entry point to a method with compatible signature.  A pointer to a static or instance method is obtained with the ldftn instruction, while a pointer to a virtual method is obtained with the ldvirtftn instruction.  A method may be called by using a method pointer with the calli instruction.  See Partition III for the specification of these instructions.

Note: Like other pointers, method pointers are compatible with unsigned int64 on 64-bit architectures and with unsigned int32 on 32-bit architectures.  The preferred usage, however, is unsigned native int, which works on both 32- and 64-bit architectures.

Example (informative):

Call a method using a pointer.  The method MakeDecision::Decide returns a method pointer to either AddOne or Negate, alternating on each call.  The main program calls MakeDecision::Decide three times and after each call uses a calli instruction to call the method specified.  The output printed is "-1 2 -1", indicating successful alternating calls.

.assembly Test { }

.assembly extern mscorlib { }

 

.method public static int32 AddOne(int32 Input)

{ .maxstack 5

  ldarg Input

  ldc.i4.1

  add

  ret

}

 

.method public static int32 Negate(int32 Input)

{ .maxstack 5

  ldarg Input

  neg

  ret

}

 

.class value sealed public MakeDecision extends

         [mscorlib]System.ValueType

{ .field static bool Oscillate

  .method public static method int32 *(int32) Decide()

  { ldsfld bool valuetype MakeDecision::Oscillate

    dup

    not

    stsfld bool valuetype MakeDecision::Oscillate

    brfalse NegateIt

    ldftn int32 AddOne(int32)

    ret

NegateIt:

    ldftn int32 Negate(int32)

    ret

  }

}

 

.method public static void Start()

{ .maxstack 2

  .entrypoint

 

  ldc.i4.1

  call method int32 *(int32) valuetype MakeDecision::Decide()

  calli int32(int32)

  call  void [mscorlib]System.Console::WriteLine(int32)

 

  ldc.i4.1

  call method int32 *(int32) valuetype MakeDecision::Decide()

  calli int32(int32)

  call  void [mscorlib]System.Console::WriteLine(int32)

 

  ldc.i4.1

  call method int32 *(int32) valuetype MakeDecision::Decide()

  calli int32(int32)

  call  void [mscorlib]System.Console::WriteLine(int32)

 

  ret

}

13.6      Delegates

Delegates (see Partition I) are the object-oriented equivalent of function pointers. Unlike function pointers, delegates are object-oriented, type-safe, and secure.  Delegates are reference types, and are declared in the form of classes.  Delegates shall have an immediate base type of System.MulticastDelegate, which in turn has an immediate base type of System.Delegate (see Partition IV).

Delegates shall be declared sealed, and the only members a Delegate shall have are either two or four methods as specified here. These methods shall be declared runtime and managed (see clause 14.4.3). They shall not have a body, since the VES shall create one automatically. Other methods available on delegates are inherited from the classes System.Delegate and System.MulticastDelegate in the Base Class Library (see Partition IV).

Rationale: A better design would be to simply have delegate classes derive directly from System.Delegate.  Unfortunately, backward compatibility with an existing CLI does not permit this design.

The instance constructor (named .ctor and marked specialname and rtspecialname, see clause 9.5.1) shall take exactly two parameters. The first parameter shall be of type System.Object and the second parameter shall be of type System.IntPtr.  When actually called (via a newobj instruction, see Partition III), the first argument shall be an instance of the class (or one of its subclasses) that defines the target method and the second argument shall be a method pointer to the method to be called.

The Invoke method shall be virtual and have the same signature (return type, parameter types, calling convention, and modifiers, see Section 7.1) as the target method. When actually called the arguments passed shall match the types specified in this signature.

The BeginInvoke method (see clause 13.6.2.1), if present, shall be virtual and have a signature related to, but not the same as, that of the Invoke method.  There are two differences in the signature.  First, the return type shall be System.IAsyncResult (see Partition IV). Second, there shall be two additional parameters that follow those of Invoke: the first of type System.AsyncCallback and the second of type System.Object.

The EndInvoke method (see clause 13.6.2.2), if present, shall be virtual and have the same return type as the Invoke method. It shall take as parameters exactly those parameters of Invoke that are managed pointers, in the same order they occur in the signature for Invoke.  In addition, there shall be an additional parameter of type System.IAsyncResult.

Example (informative):

The following example declares a Delegate used to call functions that take a single integer and return void.  It provides all four methods so it can be called either synchronously or asynchronously.  Because there are no parameters that are passed by reference (i.e. as managed pointers) there are no additional arguments to EndInvoke.

.assembly Test { }

.assembly extern mscorlib { }

 

.class private sealed StartStopEventHandler

       extends [mscorlib]System.MulticastDelegate

 { .method public specialname rtspecialname instance

           void .ctor(object Instance, native int Method)

           runtime managed {}

   .method public virtual void Invoke(int32 action) runtime managed {}

   .method public virtual

      class [mscorlib]System.IAsyncResult

        BeginInvoke(int32 action,

                    class [mscorlib]System.AsyncCallback callback,

                    object Instance) runtime managed {}

   .method public virtual

      void EndInvoke(class [mscorlib]System.IAsyncResult result)

      runtime managed {}

}

As with any class, an instance is created using the newobj instruction in conjunction with the instance constructor.  The first argument to the constructor shall be the object on which the method is to be called, or it shall be null if the method is a static method.  The second argument shall be a method pointer to a method on the corresponding class and with a signature that matches that of the delegate class being instantiated.

Implementation-Specific (Microsoft)

The Microsoft implementation of the CLI allows the programmer to add more methods to a delegate, on the condition that they provide an implementation for those methods (i.e., they cannot be marked runtime).  Note that such use makes the resulting assembly non-portable.

13.6.1      Synchronous Calls to Delegates

The synchronous mode of calling delegates corresponds to regular method calls and is performed by calling the virtual method named Invoke on the delegate. The delegate itself is the first argument to this call (it serves as the this pointer), followed by the other arguments as specified in the signature.  When this call is made, the caller shall block until the called method returns. The called method shall be executed on the same thread as the caller.

Example (informative):

Continuing the previous example, define a class Test that declares a method, onStartStop, appropriate for use as the target for the delegate.

 

.class public Test

{ .field public int32 MyData

  .method public void onStartStop(int32 action)

  { ret        // put your code here

  }

  .method public specialname rtspecialname
          instance void .ctor(int32 Data)

  { ret        // call parent constructor, store state, etc.

  }

}

 

Then define a main program. This one constructs an instance of Test and then a delegate that targets the onStartStop method of that instance.  Finally, call the delegate.

 

.method public static void Start()

{ .maxstack 3

  .entrypoint

  .locals (class StartStopEventHandler DelegateOne,

           class Test InstanceOne)

  // Create instance of Test class

  ldc.i4.1

  newobj instance void Test::.ctor(int32)

  stloc InstanceOne

  // Create delegate to onStartStop method of that class

  ldloc InstanceOne

  ldftn instance void Test::onStartStop(int32)

  newobj instance void StartStopEventHandler::.ctor(object, native int)

  stloc DelegateOne

  // Invoke the delegate, passing 100 as an argument

  ldloc DelegateOne

  ldc.i4 100

  callvirt instance void StartStopEventHandler::Invoke(int32)

  ret

}

  // Note that the example above creates a delegate to a non-virtual

  // function.  If onStartStop had instead been a virtual function, use

  // the following code sequence instead:

 

  ldloc InstanceOne

  dup

  ldvirtftn instance void Test::onStartStop(int32)

  newobj instance void StartStopEventHandler::.ctor(object, native int)

  stloc DelegateOne

  // Invoke the delegate, passing 100 as an argument

  ldloc DelegateOne

Note: The code sequence above shall use dup, not ldloc InstanceOne twice.  The dup code sequence is easily recognized as typesafe, whereas alternatives would require more complex analysis.  Verifiability of code is discussed in Partition III.

13.6.2      Asynchronous Calls to Delegates

In the asynchronous mode, the call is dispatched, and the caller shall continue execution without waiting for the method to return. The called method shall be executed on a separate thread.

To call delegates asynchronously, the BeginInvoke and EndInvoke methods are used.

Note: if the caller thread terminates before the callee completes, the callee thread is unaffected.  The callee thread continues execution and terminates silently.

Note: the callee may throw exceptions.  Any unhandled exception propagates to the caller via the EndInvoke method.

13.6.2.1       The BeginInvoke Method

An asynchronous call to a delegate shall begin by making a virtual call to the BeginInvoke method.  BeginInvoke is similar to the Invoke method (see clause 13.6.1), but has two differences:

·              It has two additional parameters, appended to the list, of type System.AsyncCallback and System.Object.

·              The return type of the method is System.IAsyncResult.

Although the BeginInvoke method therefore includes parameters that represent return values, these values are not updated by this method.  The results instead are obtained from the EndInvoke method (see below).

Unlike a synchronous call, an asynchronous call shall provide a way for the caller to determine when the call has been completed.  The CLI provides two such mechanisms.  The first is through the result returned from the call.  This object, an instance of the interface System.IAsyncResult, can be used to wait for the result to be computed, it can be queried for the current status of the method call, and it contains the System.Object value that was passed to the call to BeginInvoke.  See Partition IV.

The second mechanism is through the System.AsyncCallback delegate passed to BeginInvoke. The VES shall call this delegate when the value is computed or an exception has been raised indicating that the result will not be available.  The value passed to this callback is the same value passed to the call to BeginInvoke.  A value of null may be passed for System.AsyncCallback to indicate that the VES need not provide the callback.

Rationale: This model supports both a polling approach (by checking the status of the returned System.IAsyncResult) and an event-driven approach (by supplying a System.AsyncCallback) to asynchronous calls.

A synchronous call returns information both through its return value and through output parameters.  Output parameters are represented in the CLI as parameters with managed pointer type.  For an asynchronous call, neither the returned value nor the values of the output parameters are available until the VES signals that the call has completed successfully.  They are retrieved by calling the EndInvoke method on the delegate that began the asynchronous call.

13.6.2.2       The EndInvoke Method

The EndInvoke method can be called at any time after BeginInvoke.   It shall suspend the thread that calls it until the asynchronous call completes.  If the call completes successfully, EndInvoke will return the value that would have been returned had the call been made synchronously, and its managed pointer arguments will point to values that would have been returned to the out parameters of the synchronous call.

EndInvoke requires as parameters the value returned by the originating call to BeginInvoke (so that different calls to the same delegate can be distinguished, since they may execute concurrently) as well as any managed pointers that were passed as arguments (so their return values can be provided).
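
Example (informative):

A minimal sketch of an asynchronous call, modeled on the StartStopEventHandler and Test declarations used earlier in this chapter (repeated here, slightly simplified, so the sketch stands alone).  The assembly name, the parameter name state, and the local names are illustrative assumptions.  The sketch passes null for both the callback and the state object and then calls EndInvoke, which suspends the caller until the call completes.  Whether the calls succeed at run time depends on the implementation supporting asynchronous delegate invocation.

```il
.assembly AsyncSketch { }
.assembly extern mscorlib { }

// Delegate type providing all four methods
.class private sealed StartStopEventHandler
       extends [mscorlib]System.MulticastDelegate
{ .method public specialname rtspecialname instance
          void .ctor(object Instance, native int Method)
          runtime managed {}
  .method public virtual void Invoke(int32 action) runtime managed {}
  .method public virtual class [mscorlib]System.IAsyncResult
          BeginInvoke(int32 action,
                      class [mscorlib]System.AsyncCallback callback,
                      object state) runtime managed {}
  .method public virtual
          void EndInvoke(class [mscorlib]System.IAsyncResult result)
          runtime managed {}
}

.class public Test
{ .method public void onStartStop(int32 action)
  { ret        // target method; real work would go here
  }
  .method public specialname rtspecialname instance void .ctor()
  { .maxstack 1
    ldarg.0
    call instance void [mscorlib]System.Object::.ctor()
    ret
  }
}

.method public static void Start()
{ .maxstack 4
  .entrypoint
  .locals (class StartStopEventHandler DelegateOne,
           class [mscorlib]System.IAsyncResult Pending)
  // Create the target object and a delegate to its onStartStop method
  newobj instance void Test::.ctor()
  ldftn instance void Test::onStartStop(int32)
  newobj instance void StartStopEventHandler::.ctor(object, native int)
  stloc DelegateOne
  // Begin the call: the argument of Invoke, then a null callback
  // and a null state object
  ldloc DelegateOne
  ldc.i4 100
  ldnull
  ldnull
  callvirt instance class [mscorlib]System.IAsyncResult
           StartStopEventHandler::BeginInvoke(int32,
             class [mscorlib]System.AsyncCallback, object)
  stloc Pending
  // The caller is free to do other work here.
  // EndInvoke takes the IAsyncResult returned by BeginInvoke and
  // blocks until the asynchronous call has completed
  ldloc DelegateOne
  ldloc Pending
  callvirt instance void StartStopEventHandler::EndInvoke(
             class [mscorlib]System.IAsyncResult)
  ret
}
```

Because onStartStop has no parameters that are managed pointers, EndInvoke takes only the System.IAsyncResult value.  Passing a System.AsyncCallback delegate instead of null would select the event-driven approach rather than blocking in EndInvoke.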

14      Defining, Referencing, and Calling Methods

Methods may be defined at the global level (outside of any type):

<decl> ::= …

   | .method <methodHead> { <methodBodyItem>* }

 

as well as inside a type:

<classMember> ::= …

   | .method <methodHead> { <methodBodyItem>* }

 

14.1      Method Descriptors

There are four constructs in ilasm connected with methods.  These correspond with different metadata constructs, as described in Chapter 21.

14.1.1      Method Declarations

A MethodDecl, or method declaration, supplies the method name and signature (parameter and return types), but not its body.  That is, a method declaration provides a <methodHead> but no <methodBodyItem>s.  These are used at callsites to specify the call target (call or callvirt instructions, see Partition III) or to declare an abstract method.  A MethodDecl has no direct logical counterpart in the metadata; it can be either a Method or a MethodRef.

14.1.2      Method Definitions

A Method, or method definition, supplies the method name, attributes, signature and body.  That is, a method definition provides a <methodHead> as well as one or more <methodBodyItem>s.  The body includes the method's CIL instructions, exception handlers, local variable information, and additional runtime or custom metadata about the method.  See Chapter 11.

14.1.3      Method References

A MethodRef, or method reference, is a reference to a method. It is used when a method is called whose definition lies in another module or assembly.  A MethodRef shall be resolved by the VES into a Method before the method is called at runtime.  If a matching Method cannot be found, the VES shall throw a System.MissingMethodException.  See Chapter 21.23.

14.1.4      Method Implementations

A MethodImpl, or method implementation, supplies the executable body for an existing virtual method.  It associates a Method (representing the body) with a MethodDecl or Method (representing the virtual method).  A MethodImpl is used to provide an implementation for an inherited virtual method.
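
Example (informative):

A minimal sketch showing where each of the four constructs might appear in ilasm source.  All type, method, and assembly names (Shape, Triangle, Sides, CountSides) are hypothetical, and the sketch assumes the .override directive as the way to express a method implementation in a method body.

```il
.assembly MethodConstructsSketch { }
.assembly extern mscorlib { }

.class public abstract Shape extends [mscorlib]System.Object
{ // Method declaration: an abstract method has a head but no body items
  .method public virtual abstract instance int32 Sides() { }

  .method family specialname rtspecialname instance void .ctor()
  { .maxstack 1
    ldarg.0
    call instance void [mscorlib]System.Object::.ctor()
    ret
  }
}

.class public Triangle extends Shape
{ .method public specialname rtspecialname instance void .ctor()
  { .maxstack 1
    ldarg.0
    call instance void Shape::.ctor()
    ret
  }

  // Method definition: head plus body items.  The .override directive
  // records a method implementation that associates this body with the
  // inherited virtual method Shape::Sides.
  .method public virtual instance int32 CountSides()
  { .override Shape::Sides
    .maxstack 1
    ldc.i4.3
    ret
  }
}

.method public static void Start()
{ .maxstack 1
  .entrypoint
  newobj instance void Triangle::.ctor()
  // Call site naming the virtual method declared in this module
  callvirt instance int32 Shape::Sides()
  // Method reference: WriteLine is defined in another assembly
  call void [mscorlib]System.Console::WriteLine(int32)
  ret
}
```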