"If you open an empty cupboard and don't find an elephant inside, are the contents of the cupboard different from opening it and not finding a bicycle?" Andrew Layman, commenting on reification of NULLs.
Web services are all about data exchange between heterogeneous applications. This data exchange cannot be accomplished without a common, agreed-upon type system that provides standard types as well as the ability to define your own types. This chapter first gets you up and running with XSD, the Web services type system, and then shows you how XSD is used to specify message formats and to validate data exchanged between client and service. This is not intended as a rigorous explanation of XSD; that would require an entire book. Rather, this chapter explains those aspects of XSD that are commonly used in Web services.
Imagine a scenario where a VB 6 application needs to invoke a Java application. The Java application is exposed as a Web service with one method called getData. You are writing the VB application and you are told that the Java method getData returns an integer. So you write your VB code to invoke the Web service using SOAP, which we'll cover in the next chapter. When you run your application, the call usually works, but sometimes you get run-time error 6, "Overflow". You get this error because a Java int is a 32-bit number while a VB 6 Integer is a 16-bit number: whenever getData returns a value outside the range -32768 to 32767, the result does not fit in a VB 6 Integer. The problem is a mismatch between the service's type system and the client's type system. To solve it, the client and service developers must agree on a common type system. Such a type system must provide a set of built-in types similar to the intrinsic types you get with any development language, like integer and double.
In addition, the type system must provide a means for creating your own data types. This capability is analogous to creating your own classes or user-defined types (UDTs, called structures in VB .NET) in your favorite programming language. This is an extremely important requirement because in most cases applications will want to exchange structured data rather than simple scalar values. For example, your VB .NET application might contain a method called SubmitInvoice that looks like this:
Public Function SubmitInvoice(ByVal theInvoice As Invoice) As Integer
...
End Function
Here, Invoice is a class that you created. As you learned in chapter 1, you can easily expose this method as a Web method by adding the WebMethod attribute to it. Before a client can invoke this method, it needs to know exactly what the Invoice class looks like. What properties does an Invoice have? Because the client can be written in any language, it needs a language-independent definition of the Invoice class. This is why the Web services type system must let you extend it by creating your own types. In this case, you would define a new type that represents your Invoice class, and Web service clients would read and understand this type definition in order to invoke your Web service.
On May 2, 2001, the W3C finalized a standard for an XML-based type system known as XML Schema. The language used to define a schema is an XML grammar known as XML Schema Definition language (XSD). Because Web services use XML as the underlying format for representing messages and data, XSD was a natural choice for the Web services type system.
The W3C XML Schema standard is logically two things: a set of predefined, or built-in, types such as int and string, and an XML language for defining new types and for describing the structure of a class of XML documents, such as an invoice or a purchase order. To help you understand the role of XSD in the world of XML, consider this analogy to your favorite programming language (figure 2-1). The programming language you use to define application classes (e.g. VB 6 or VB .NET) is analogous to XSD. When you define a class named CCustomer, this corresponds to a type definition for a CCustomer type in XSD. Finally, when you instantiate an object from the CCustomer class, this object instance is analogous to an XML document that represents a customer, i.e. an XML instance document.

Figure 2-1
Analogy between XSD and a programming language.
Just to show you a quick example, assume we have the following Customer class:
'In VB
Public Class Customer
Public CustomerID As Integer
Public ContactName As String
Public Telephone As String
End Class
The corresponding XSD type definition is:
<complexType name="CustomerType">
<sequence>
<element name="CustomerID" type="int" />
<element name="ContactName" type="string" />
<element name="Telephone" type="string" />
</sequence>
</complexType>
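To show how this type definition fits into a complete schema, the sketch below wraps CustomerType in a <schema> element, declares a global Customer element of that type, and then shows an instance document that should validate against it. The element name Customer and the sample values are made up for illustration. Unlike the fragments in this chapter, the listing binds the XSD namespace to the conventional xs: prefix and declares no target namespace. With that arrangement, xs:int and xs:string resolve to the built-in types while the unprefixed CustomerType resolves to the type defined in this schema.

```xml
<!-- customer.xsd: a complete schema with no target namespace (sketch) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- global element whose content is described by CustomerType -->
  <xs:element name="Customer" type="CustomerType"/>

  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="CustomerID" type="xs:int"/>
      <xs:element name="ContactName" type="xs:string"/>
      <xs:element name="Telephone" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

<!-- customer.xml: an instance document that is valid against customer.xsd -->
<Customer>
  <CustomerID>1234</CustomerID>
  <ContactName>John Smith</ContactName>
  <Telephone>123-123-7183</Telephone>
</Customer>
```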
This schema defines a new type called CustomerType. This type contains three XML elements: CustomerID, ContactName, and Telephone in that sequence. Notice the use of the XSD element <complexType> to define the CustomerType type. Also note that int and string are both XSD built-in types.
If you have an XML document that claims to adhere to a specific schema, you can use a validating parser to validate the document against that schema. If the document really does adhere to the schema, it is said to be a valid XML document. This ability to validate a document against a schema means that schemas are not only for specifying the structure of an XML document or a message, but also for enforcing that structure through validation. Later in this chapter you will see an example of validating an invoice document against the invoice schema.
In this section, I'll explain the essential concepts of the XSD type system to prepare you for creating your own schemas. XSD's type system is made up of simple and complex types. Simple types represent non-structured, or scalar, values. For example, int, date, and string are all simple types. Complex types represent structured data, as you saw earlier in the Customer class example. A VB class or UDT (structure) can be represented as an element that contains child elements or attributes. Such an element is of a complex type because it represents a non-scalar value. In general, XML elements may have attributes and may contain simple text and/or child elements. If an element has attributes and/or contains child elements, it is of a complex type. However, an element that has no attributes and contains only text is of a simple type because it represents a scalar value. XML attributes, on the other hand, always contain just text, so they always hold scalar values and are always of simple types. To help clarify the difference between simple and complex types, consider the following example XML document:
<examples>
<!-- this element is of a complex type because it has attributes-->
<example anAttrib="7">some text</example>
<!-- this element is of a complex type because it has child elements -->
<example>
<elem1>some text</elem1>
<elem2>more text</elem2>
</example>
<!-- this element is of a simple type -->
<example>some text content</example>
</examples>
The XSD type system is similar to the .NET type system in that types derive from other base types. In .NET, System.Object is at the root of the type system. Similarly, the XSD type system is a hierarchy of types with the built-in type called anyType at its root. Every type other than anyType, whether built-in or user-defined, derives from some other type. Therefore the hierarchy of all built-in types forms a tree, as shown in figure 2-2. This hierarchy diagram is taken from the W3C Recommendation XML Schema Part 2: Datatypes (http://www.w3.org/TR/xmlschema-2/).

Figure 2-2
The type hierarchy for built-in types. Note that anyType is at the root of this type hierarchy. All other built-in types are simple types based (directly or indirectly) on anySimpleType.
Copyright © May 2, 2001, World Wide Web Consortium, (Massachusetts Institute of Technology, Institut National de Recherche en Informatique et en Automatique, Keio University). All Rights Reserved. http://www.w3.org/Consortium/Legal/.
In figure 2-2, types that derive directly from anySimpleType are known as primitive types. All other types are called derived types. There are three ways to derive from a base type: restriction, extension, and list. Derivation by restriction is when a type derives from a base type to restrict the base type's definition. For example, the type int derives from long by restricting permissible values to the range -2147483648 to 2147483647. The NMTOKENS type derives from NMTOKEN by list; this means that a value of type NMTOKENS is simply a space-delimited list of values of type NMTOKEN. None of the built-in types is derived by extension. However, in practice, you might need to define a new type that extends a base type, for example by adding elements or attributes to it. You will see examples of type derivation later in this chapter.
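For a quick side-by-side look at the three derivation methods, here is a small sketch of a schema (the XSD namespace is bound to the xs: prefix). Every type name in it (Percentage, PercentageList, Address, USAddress) is hypothetical. Percentage restricts int, PercentageList is a list of Percentage values, and USAddress extends the complex type Address. Extending a complex type uses <complexContent>, and the elements added by the extension are appended after the base type's content.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- derivation by restriction: an int limited to the range 0..100 -->
  <xs:simpleType name="Percentage">
    <xs:restriction base="xs:int">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- derivation by list: a space-delimited list of Percentage values,
       for example <scores>10 55 100</scores> -->
  <xs:simpleType name="PercentageList">
    <xs:list itemType="Percentage"/>
  </xs:simpleType>

  <!-- a base complex type -->
  <xs:complexType name="Address">
    <xs:sequence>
      <xs:element name="Street" type="xs:string"/>
      <xs:element name="City" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- derivation by extension: Street and City from Address,
       followed by an additional State element -->
  <xs:complexType name="USAddress">
    <xs:complexContent>
      <xs:extension base="Address">
        <xs:sequence>
          <xs:element name="State" type="xs:string"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```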
When building and using Web services, you will likely need to create new XSD schemas and/or read and understand existing ones. You might also occasionally need to tweak an XSD schema that was auto-generated by your development tool. This section builds on the concepts of the XSD type system to show you how to create XSD schemas.
There are typically two steps to creating XSD schemas: defining new data types, and declaring elements and attributes using the types you defined as well as the built-in types. The order of these steps doesn't matter; you could declare elements first, define types first, or mix the two.
When you declare an element you specify the element’s name and its type. For example, if your XML document contains an element called quantity that is an integer, you would declare it like this:
<element name="quantity" type="int"/>
Used within a type definition, this declaration says that exactly one element called quantity must appear. You can use the minOccurs and maxOccurs attributes to change the number of occurrences of the <quantity> element. The default is minOccurs=maxOccurs=1. For example, to indicate that <quantity> is optional (it may or may not appear in the XML document):
<element name="quantity" type="int" minOccurs="0"/>
Or, if the <quantity> element has to appear between 1 and 5 times:
<element name="quantity" type="int" minOccurs="1" maxOccurs="5"/>
In many cases you don't want to put an upper bound on the number of occurrences of an element; in this case you can use maxOccurs="unbounded":
<element name="quantity" type="int" minOccurs="1" maxOccurs="unbounded"/>
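Occurrence constraints take effect inside a type's content model, so the sketch below places several declarations in a <sequence> (the XSD namespace is bound to xs:). The names OrderType, order, customerName, and comment are hypothetical. Under this schema, an <order> must contain exactly one <customerName>, may contain one <comment>, and must contain one or more <quantity> elements.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="OrderType">
    <xs:sequence>
      <!-- no minOccurs/maxOccurs: exactly one customerName -->
      <xs:element name="customerName" type="xs:string"/>
      <!-- minOccurs="0": comment is optional -->
      <xs:element name="comment" type="xs:string" minOccurs="0"/>
      <!-- one or more quantity elements, with no upper bound -->
      <xs:element name="quantity" type="xs:int" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="order" type="OrderType"/>
</xs:schema>

<!-- valid: comment omitted, quantity repeated -->
<order>
  <customerName>John Smith</customerName>
  <quantity>3</quantity>
  <quantity>12</quantity>
</order>

<!-- invalid: at least one quantity is required -->
<order>
  <customerName>John Smith</customerName>
</order>
```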
The element’s type can be one of the built-in types or a type that you define. In many cases, the built-in types will not meet your needs and you will need to define new simple or complex types.
Although there are many built-in simple types, you will often need to define new simple types. For example, if you have the method:
Public Sub GetCustomerByName (ByVal custName As String)
Suppose that, based on the application's database schema, custName is a string with a maximum length of 50 characters. In this case it is not sufficient to use the built-in XSD string as the data type for custName, because string says nothing about maximum length. Instead, you define a new type that derives from string and restricts its maximum length to 50:
<simpleType name="LimitedLenString">
<restriction base="string">
<maxLength value="50"/>
</restriction>
</simpleType>
The <restriction> element indicates that this simple type derives from string and restricts it. The <restriction> element contains one or more child elements called facets, which are used to restrict the base type. In this case, we use the maxLength facet to specify a maximum length of 50. You could also indicate the minimum length:
<simpleType name="LimitedLenString">
<restriction base="string">
<maxLength value="50"/>
<minLength value="2"/>
</restriction>
</simpleType>
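To use the restricted type, you reference it by name from an element declaration. In the sketch below (XSD namespace bound to xs:), the custName element is a hypothetical example, and the comments show which values a validating parser should accept or reject. The order of the facets inside <restriction> does not matter.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="LimitedLenString">
    <xs:restriction base="xs:string">
      <xs:maxLength value="50"/>
      <xs:minLength value="2"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- custName must be a string of 2 to 50 characters -->
  <xs:element name="custName" type="LimitedLenString"/>
</xs:schema>

<!-- valid:   <custName>John Smith</custName>  (10 characters) -->
<!-- invalid: <custName>J</custName>           (shorter than minLength) -->
```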
Another useful facet is the enumeration facet. Consider the case where you want to represent the following enumerated type in XSD:
Public Enum OrderStatus
Pending = 3
Processed = 4
Shipped = 5
End Enum
The XSD equivalent would be:
<simpleType name="OrderStatus">
<restriction base="int">
<enumeration value="3"/>
<enumeration value="4"/>
<enumeration value="5"/>
</restriction>
</simpleType>
But the problem with this type definition is that it exposes the underlying enumerated values rather than their names. To expose the names instead of the actual values, you can create a new simple type that derives from string instead of int and use the enum names Pending, Processed, and Shipped as the values of the enumeration facets:
<simpleType name="OrderStatus">
<restriction base="string">
<enumeration value="Pending"/>
<enumeration value="Processed"/>
<enumeration value="Shipped"/>
</restriction>
</simpleType>
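A sketch of the string-based OrderStatus type in use, with the XSD namespace bound to xs: and a hypothetical status element. Enumeration values are compared exactly, so for a string-based enumeration the case of each name matters.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="OrderStatus">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Pending"/>
      <xs:enumeration value="Processed"/>
      <xs:enumeration value="Shipped"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- hypothetical element that carries an order's status -->
  <xs:element name="status" type="OrderStatus"/>
</xs:schema>

<!-- valid:   <status>Shipped</status> -->
<!-- invalid: <status>shipped</status>  (enumeration values are case-sensitive) -->
<!-- invalid: <status>5</status>        (only the names are allowed, not the numbers) -->
```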
An extremely useful facet is the pattern facet, which lets you use a regular expression to specify a pattern that values must follow. For example, if you want to create a type that limits values to the format of a US ZIP code, you can use the pattern facet with this pattern:
<simpleType name="zipType">
<restriction base="string">
<pattern value="\d{5}(-\d{4})?"/>
</restriction>
</simpleType>
Here we use the pattern "\d{5}(-\d{4})?" to indicate that the string must contain exactly 5 digits, optionally followed by a dash and 4 more digits.
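Two details of XSD regular expressions matter here. First, a pattern must match the entire value, so the ^ and $ anchors common in other regex dialects are not needed. Second, \d matches any Unicode decimal digit, not only 0 through 9, so a character class such as [0-9] is the safer choice if you want ASCII digits only. The sketch below applies both points; the type name asciiZipType is hypothetical, and the XSD namespace is bound to xs:.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- like zipType, but [0-9] limits the digits to ASCII 0-9;
       the pattern is matched against the whole value -->
  <xs:simpleType name="asciiZipType">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}(-[0-9]{4})?"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="zipCode" type="asciiZipType"/>
</xs:schema>

<!-- valid:   <zipCode>98052</zipCode>, <zipCode>98052-6399</zipCode> -->
<!-- invalid: <zipCode>9805</zipCode>, <zipCode>98052-63</zipCode>,
              <zipCode>ZIP 98052</zipCode> (the whole value must match) -->
```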
Table 2-1 shows a list of available facets and a brief description of each facet. You can find more information on facets in the .NET documentation under the topic Data Type Facets (http://msdn.microsoft.com/library/en-us/cpgenref/html/xsdrefdatatypefacets.asp) or in the W3C XML Schema Part 2: Datatypes (http://www.w3.org/TR/xmlschema-2/).
Table 2-1
List of XSD facets that may be applied to simple types.

| Facet | Description |
|---|---|
| enumeration | Specifies a set of allowable values. |
| fractionDigits | The maximum number of digits in the fractional part. Applies to types derived from decimal (and therefore not to double or float). |
| length | The fixed length of values of this type. The unit of length depends on the base type: characters for strings, octets (bytes) for hexBinary and base64Binary, and list items for types derived by list. |
| maxExclusive | The exclusive upper bound; values must be less than this bound. |
| maxInclusive | The inclusive upper bound; values must be less than or equal to this bound. |
| maxLength | The maximum allowable length. The unit depends on the base type (see length). |
| minExclusive | The exclusive lower bound; values must be greater than this bound. |
| minInclusive | The inclusive lower bound; values must be greater than or equal to this bound. |
| minLength | The minimum allowable length. The unit depends on the base type (see length). |
| pattern | A regular expression that restricts values to the specified pattern. |
| totalDigits | The maximum total number of digits. Applies to types derived from decimal. |
| whiteSpace | One of preserve, replace, or collapse. Specifies how white space (tab, carriage return, line feed, and space) in string types is treated. preserve leaves all white space as is. replace replaces each tab, carriage return, and line feed with a space. collapse first performs replace, then collapses each run of spaces into a single space and removes leading and trailing spaces. |
Complex types are used to represent classes, structures, arrays, and other data structures. To define a complex type, you use the <complexType> XSD element. For example, assume you want to create a type definition for the following XML fragment:
<!-- an example XML fragment -->
<example>
<elem1>some text</elem1>
<elem2>more text</elem2>
</example>
The corresponding type definition would begin with the <complexType> XSD element which has a name attribute that indicates the type’s name. You then specify the content model for this new type. For example, the type definition below uses the XSD <sequence> element to indicate that elements of this type must contain a sequence of <elem1> followed by <elem2>. All elements declared within the <sequence> element must appear in the same order in which they are declared. In this example, if the order of <elem1> and <elem2> is reversed, the XML fragment would be considered invalid.
<!-- the corresponding type definition -->
<complexType name="exampleType">
<sequence>
<element name="elem1" type="string"/>
<element name="elem2" type="string"/>
</sequence>
</complexType>
The XSD <sequence> element defines what is known as an element model group. Basically, you are saying “the following group of elements must appear in this sequence”. Besides <sequence>, you can also use <choice> and <all> to define element model groups. <choice> indicates that only one of the elements in the group may appear in the XML document. <all> indicates that the elements may appear in any order, with each element appearing at most once; each element must appear exactly once unless its declaration sets minOccurs="0", in which case it may be omitted.
If an element declaration appears in more than one place in your schema, you can declare the element as a global element, that is, make its declaration a direct child of the <schema> element. You can then reference this declaration by name within a type definition:
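A sketch contrasting <choice> and <all>, with the XSD namespace bound to xs:. The types ContactType and unorderedType and their elements are hypothetical; the comments after the schema show which instances should validate.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- <choice>: a contact contains exactly one of phone or email -->
  <xs:complexType name="ContactType">
    <xs:choice>
      <xs:element name="phone" type="xs:string"/>
      <xs:element name="email" type="xs:string"/>
    </xs:choice>
  </xs:complexType>

  <!-- <all>: elem1 and elem2 may appear in either order, each at most once;
       elem1 is required, elem2 is optional because of minOccurs="0" -->
  <xs:complexType name="unorderedType">
    <xs:all>
      <xs:element name="elem1" type="xs:string"/>
      <xs:element name="elem2" type="xs:string" minOccurs="0"/>
    </xs:all>
  </xs:complexType>

  <xs:element name="contact" type="ContactType"/>
  <xs:element name="unordered" type="unorderedType"/>
</xs:schema>

<!-- valid:   <contact><email>js@example.com</email></contact> -->
<!-- invalid: <contact><phone>555-0100</phone><email>js@example.com</email></contact> -->
<!-- valid:   <unordered><elem2>b</elem2><elem1>a</elem1></unordered> -->
<!-- invalid: <unordered><elem2>b</elem2></unordered>  (elem1 is required) -->
```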
<schema ...>
<!-- global declaration -->
<element name="AnElement" type="string"/>
<complexType name="someType">
<sequence>
<element name="AnotherElement" type="int"/>
<!-- local declaration references
the global declaration by name -->
<element ref="AnElement" maxOccurs="4"/>
</sequence>
</complexType>
</schema>
Note that you cannot use minOccurs or maxOccurs on a global element declaration; instead, you use them on the local declaration that references the global declaration.
Attributes are declared as part of the complex type definition after the model group. For example, assume we modify the previous XML fragment by adding two attributes to the example element:
<!-- an example XML fragment -->
<example quant="3" size="big" >
<elem1>some text</elem1>
<elem2>more text</elem2>
</example>
We would need to modify the XSD type definition to declare the two attributes:
<!-- the corresponding type definition -->
<complexType name="exampleType">
<sequence>
<element name="elem1" type="string"/>
<element name="elem2" type="string"/>
</sequence>
<attribute name="quant" type="int" use="required"/>
<attribute name="size" type="string" use="optional"/>
</complexType>
For each attribute, we add an <attribute> element and specify the attribute's name, data type, and use. The allowable values for use are optional, required, and prohibited. By default, attributes are optional. Specifying prohibited means the attribute must not appear in an XML document. This is useful if you are creating a new version of the schema in which some older attributes are no longer supported, and you want older documents that still use them to be invalid under the new schema. You can also specify a default value using the default attribute; a default is allowed only on optional attributes. For example, the following attribute declaration states that if the size attribute is missing, its value defaults to “small”:
<attribute name="size" type="string" use="optional" default="small"/>
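A sketch of the example type with a required attribute and an attribute that has a default value, written as a complete schema with the XSD namespace bound to xs:. When an attribute with a default is omitted, a validating processor typically supplies the default value to the application. The comments describe the expected outcome for each instance.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="exampleType">
    <xs:sequence>
      <xs:element name="elem1" type="xs:string"/>
      <xs:element name="elem2" type="xs:string"/>
    </xs:sequence>
    <xs:attribute name="quant" type="xs:int" use="required"/>
    <!-- a default value is only allowed when use is optional -->
    <xs:attribute name="size" type="xs:string" use="optional" default="small"/>
  </xs:complexType>

  <xs:element name="example" type="exampleType"/>
</xs:schema>

<!-- valid: size is omitted, so its value defaults to "small" -->
<example quant="3"><elem1>a</elem1><elem2>b</elem2></example>

<!-- invalid: the required quant attribute is missing -->
<example size="big"><elem1>a</elem1><elem2>b</elem2></example>
```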
Up to this point, I have focused on type definitions for complex types that contain only child elements. Remember that a type is complex if it has attributes and/or contains child elements. If a complex type contains child elements, it is said to have complex content. However, if a complex type has attributes but contains only text, it is said to be a complex type with simple content. For example, the following element <example> is of a complex type with simple content:
<example size="Large">test</example>
The corresponding complex type definition would use the <simpleContent> child of <complexType> like this:
<complexType name="exampleType">
<simpleContent>
<extension base="string">
<attribute name="size" type="string" use="required"/>
</extension>
</simpleContent>
</complexType>
This type definition says that elements of type exampleType contain only string values (no child elements) and have an attribute called size of type string.
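The base type of <extension> need not be string. In the sketch below (XSD namespace bound to xs:), a hypothetical priceType extends decimal with a required currency attribute, so the element's text is validated as a decimal while the element may still carry the attribute.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- simple content: the text must be a decimal; currency is a required attribute -->
  <xs:complexType name="priceType">
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="currency" type="xs:string" use="required"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

  <xs:element name="price" type="priceType"/>
</xs:schema>

<!-- valid:   <price currency="USD">19.99</price> -->
<!-- invalid: <price currency="USD">cheap</price>                   (text is not a decimal) -->
<!-- invalid: <price>19.99</price>                                  (currency is required) -->
<!-- invalid: <price currency="USD"><amount>19.99</amount></price>  (no child elements) -->
```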
An XML element is said to have mixed content if it contains text as well as child elements. The <examples> element in the following XML fragment has mixed content:
<examples>
This is text content
<elem1>some text</elem1>
<elem2>some text</elem2>
</examples>
To represent mixed content in an XSD complex type definition, you set the mixed attribute to true (it is false by default):
<!-- the corresponding type definition -->
<complexType name="exampleType" mixed="true">
<sequence>
<element name="elem1" type="string"/>
<element name="elem2" type="string"/>
</sequence>
</complexType>
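A sketch of what mixed="true" does and does not allow, with the XSD namespace bound to xs: and the mixed exampleType attached to an examples element. Text may appear before, between, and after the child elements, but the child elements are still checked against the <sequence>.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="exampleType" mixed="true">
    <xs:sequence>
      <xs:element name="elem1" type="xs:string"/>
      <xs:element name="elem2" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="examples" type="exampleType"/>
</xs:schema>

<!-- valid: text appears before and between the child elements -->
<examples>
  This is text content
  <elem1>some text</elem1>
  more text between the children
  <elem2>some text</elem2>
</examples>

<!-- invalid: text is allowed, but elem1 must still come before elem2 -->
<examples>text <elem2>b</elem2> <elem1>a</elem1></examples>
```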
Mixed content is commonly used for document publishing applications but has little use in data processing applications. In a document publishing markup language, such as HTML, element tags are used to format text. For example, HTML defines the paragraph tag (element) <p> which may contain text mixed with other markup tags such as the bold tag <b>:
<p>This is an HTML paragraph with some <b>BOLD</b> text</p>
However, in data processing applications, each element represents a data field or a member of a class, so it makes no sense for such an element to have a value (text content) and at the same time contain child elements. To illustrate this, consider again the Customer class, repeated here with an added member called Numbers:
'the Customer class
Public Class Customer
Public CustomerID As Integer
Public ContactName As String
Public Telephone As String
Public Numbers() As Integer
End Class
Assuming you have an object instance of the above class and you want to serialize it to XML, you might end up with an XML fragment like this:
<ACustomer>
<CustomerID>1234</CustomerID>
<ContactName>John Smith</ContactName>
<Telephone>123-123-7183</Telephone>
<Numbers>
<item>4</item>
<item>7</item>
<item>2</item>
</Numbers>
</ACustomer>
Each of CustomerID, ContactName and Telephone is represented by an XML element that contains only text: These elements are all of simple types. The Numbers array is represented by a <Numbers> element that contains only child elements, i.e. the <Numbers> element does not itself contain text. Each item in the array is represented by an <item> element that has only text content. Therefore, members that contain simple values are represented by elements that contain only text, while members that contain arrays (or collections) are represented by elements that contain only child elements. As you can see, there is no place for mixed content in this scenario. While this is a simple example, the point it makes holds true for more complex scenarios: In general, mixed content models are not (or at least should not be) used in data processing applications.
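One way to describe the serialized Customer in XSD is sketched below (XSD namespace bound to xs:). The array becomes a hypothetical ArrayOfInt type containing zero or more <item> elements, and no type uses mixed="true". minOccurs="0" on item is an assumption that lets an empty array serialize as an empty <Numbers/> element.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- the Numbers array: zero or more item elements, each holding an int -->
  <xs:complexType name="ArrayOfInt">
    <xs:sequence>
      <xs:element name="item" type="xs:int" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- text-only members use simple types; the array member uses ArrayOfInt -->
  <xs:complexType name="CustomerType">
    <xs:sequence>
      <xs:element name="CustomerID" type="xs:int"/>
      <xs:element name="ContactName" type="xs:string"/>
      <xs:element name="Telephone" type="xs:string"/>
      <xs:element name="Numbers" type="ArrayOfInt"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="ACustomer" type="CustomerType"/>
</xs:schema>
```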
An element doesn't have to have content; it may be empty. An empty element is an element that has no text content and no child elements. Would such an element be of complex or simple type? It depends: if the element has attributes, it is of a complex type. Note that attributes are not considered part of an element's content; therefore, whether an element has attributes has nothing to do with whether the element is empty. An empty element may or may not have attributes. Here's an example empty element and the corresponding type definition:
<!-- This is an empty element -->
<emptyElem attrib1="some string value" attrib2="50.02"/>
<!-- This is the corresponding type definition -->
<complexType name="emptyType">
<attribute name="attrib1" type="string"/>
<attribute name="attrib2" type="float"/>
</complexType>
Note that this complex type definition has no model group; therefore, elements of this type must be empty.
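A sketch of emptyType attached to an element, with the XSD namespace bound to xs:. Because the attributes are optional and the type has no model group, the element may carry attributes but must not contain text or child elements.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="emptyType">
    <xs:attribute name="attrib1" type="xs:string"/>
    <xs:attribute name="attrib2" type="xs:float"/>
  </xs:complexType>

  <xs:element name="emptyElem" type="emptyType"/>
</xs:schema>

<!-- valid:   <emptyElem attrib1="some string value" attrib2="50.02"/> -->
<!-- valid:   <emptyElem/>  (both attributes are optional) -->
<!-- invalid: <emptyElem attrib2="50.02">text</emptyElem>  (no text content allowed) -->
<!-- invalid: <emptyElem><child/></emptyElem>              (no child elements allowed) -->
```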
In all the previous examples, each defined type had its own unique name. However, it is not always necessary to name types. If you include the type definition as part of an element declaration, then you do not need to name that type. For example, if you have an element called <zipCode> that contains a U.S. zip code, you could combine the element declaration and the corresponding type definition:
<element name="zipCode">
<simpleType>
<restriction base="string">
<pattern value="\d{5}(-\d{4})?"/>
</restriction>
</simpleType>
</element>
Note that the element declaration does not have a type attribute (i.e. no type="" in <element>) and the type definition does not have a name attribute. This is an anonymous type definition and is quite common when there’s only one element declaration that needs to use that type.
Complex types may also be defined anonymously in the same way: