Components of an XML Document
XML enables you to store structured data in a form that different types of devices can recognize. Before storing data in an XML document, you need to organize it, which involves arranging the data in a hierarchy. The components of an XML document that can be used to represent data in a hierarchical order are processing instructions, tags, elements, content, attributes, entities, and comments.
- Processing Instruction (PI)
An XML document usually begins with the XML declaration. It is written in the same form as a processing instruction (PI) and is therefore often called the PI statement. The PI statement provides information on how the XML file should be processed. It can be written as:
<?xml version="1.0" encoding="UTF-8"?>
The preceding statement indicates that version 1.0 of XML is used. The PI statement is optional. However, as new versions of XML are released, it becomes more important, because it explicitly declares the version of XML used in the file. This supports forward compatibility. If present, the PI statement must appear at the very start of the document, with nothing before it, and its keywords (xml, version, encoding) must be written in lowercase letters.
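The placement and lowercase rules can be checked with Python's standard xml.etree.ElementTree parser. The following is only a small illustration: the helper name parses() and the sample documents are our own, and the sketch relies on the underlying expat parser rejecting a declaration that is not at the start of the document or that spells a keyword with a capital letter.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if text is well-formed XML, False if the parser rejects it."""
    try:
        ET.fromstring(text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False

body = "<BOOKNAME>The Painted House</BOOKNAME>"

with_decl = '<?xml version="1.0" encoding="UTF-8"?>' + body
without_decl = body                                             # declaration omitted
misplaced = '\n<?xml version="1.0" encoding="UTF-8"?>' + body   # a newline precedes it
uppercase = '<?xml Version="1.0" encoding="UTF-8"?>' + body     # keyword not lowercase

for name, text in [("with_decl", with_decl), ("without_decl", without_decl),
                   ("misplaced", misplaced), ("uppercase", uppercase)]:
    print(name, parses(text))
```

Only the first two documents are accepted: omitting the declaration is allowed, but a misplaced or mis-cased one makes the document malformed.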
The PI statement uses the encoding property to specify the encoding scheme used to create the XML file. UTF-8 is the default encoding for XML documents and is widely used for pages in English. UTF stands for UCS (Universal Character Set) Transformation Format. UTF-8 encodes each character as a sequence of one to four 8-bit bytes.
UTF-8 stores characters 0-127 as single bytes identical to ASCII, so it is compatible with ASCII-based computing systems. It can also represent characters from other languages, such as Japanese Katakana and Cyrillic, by using several bytes per character. Alternatively, you can set the encoding property to UTF-16, which stores each character in one or two 16-bit units and can be more compact for some East Asian text. A browser uses the encoding information to interpret the data in an XML document. The data may not be displayed properly if this information is missing or does not match the actual encoding of the file.
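To make the size differences concrete, this Python sketch (standard library only) measures how many bytes a few characters need in UTF-8 and in UTF-16, and then parses a UTF-8 document containing Katakana. The NAME element and the sample characters are illustrative choices, not part of the original text.

```python
import xml.etree.ElementTree as ET

def byte_lengths(ch):
    """Return (UTF-8 bytes, UTF-16 bytes) needed for one character, without a BOM."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

# An ASCII letter, a Katakana letter, a Cyrillic letter, and a character outside
# the Basic Multilingual Plane (U+1D11E, musical symbol G clef).
# Expected: A (1, 2), カ (3, 2), Ж (2, 2), G clef (4, 4)
for ch in ["A", "カ", "Ж", "\U0001D11E"]:
    print(repr(ch), byte_lengths(ch))

# Katakana text in a UTF-8 document parses without switching to UTF-16
doc = '<?xml version="1.0" encoding="UTF-8"?><NAME>カタカナ</NAME>'.encode("utf-8")
root = ET.fromstring(doc)
print(root.tag, root.text)
```

The G clef shows UTF-16 needing two 16-bit units (4 bytes) for characters beyond U+FFFF, just as UTF-8 needs four bytes.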
Tags are used to specify a name for a given piece of information; they are a means of identifying data. Data is marked up using tags. A tag consists of opening and closing angle brackets (<>) that enclose the name of the tag. Tags usually occur in pairs, each consisting of a start tag and an end tag. In XML, every start tag must have a matching end tag; the exception is an empty-element tag, such as <BR/>, which acts as both.
The start tag contains the name of the tag and the end tag includes a forward slash (/) before the name of the tag.
<p> David Shaw </p>
In the preceding code snippet, <p> and </p> are predefined HTML tags that enclose the name. Here, <p> is the start tag and </p> is the end tag. Because XML allows you to create your own tags, the same information can be stored as:
<EMP_NAME> David Shaw </EMP_NAME>
In this code snippet, <EMP_NAME> is a new tag created in XML to store the name of the employee.
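A parser reads a custom tag like EMP_NAME the same way as any other name. The sketch below uses Python's xml.etree.ElementTree to show that, and that mismatched or missing end tags are rejected. The BR element is a hypothetical empty element used only for illustration.

```python
import xml.etree.ElementTree as ET

emp = ET.fromstring("<EMP_NAME>David Shaw</EMP_NAME>")
print(emp.tag, "->", emp.text)  # EMP_NAME -> David Shaw

# Tag names are case-sensitive, and every start tag needs a matching end tag
broken_docs = [
    "<EMP_NAME>David Shaw</emp_name>",  # end tag differs in case
    "<EMP_NAME>David Shaw",             # end tag missing
]
errors = []
for doc in broken_docs:
    try:
        ET.fromstring(doc)
    except ET.ParseError as err:
        errors.append(str(err))
print(len(errors), "documents rejected")  # 2 documents rejected

# An empty-element tag is both the start tag and the end tag
br = ET.fromstring("<BR/>")
print(br.tag, len(br))  # BR 0
```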
Elements are the basic units used to identify and describe data in XML. They are the building blocks of an XML document. Elements are represented using tags. XML allows you to give meaningful names to elements, which improves the readability of the code and makes the element content easy to identify. For example, an element named Authorname describes the content within its tags: it is represented by the <Authorname> and </Authorname> tags, and the name of the author is enclosed within these tags.
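An XML document must have exactly one root element, which contains all the other elements. Consider the following example:
<AUTHORS>
<FIRSTNAME> John </FIRSTNAME>
</AUTHORS>
In the preceding example, the AUTHORS element contains all other elements in the XML document and is therefore the root element. All other elements must be nested within the start and end tags of the root element.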
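The single-root rule can be seen in Python's xml.etree.ElementTree, where parsing returns the root element and every other element is reachable from it. The AUTHOR wrapper and the second author below are hypothetical additions for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: AUTHORS is the single root element
doc = """<AUTHORS>
  <AUTHOR><FIRSTNAME>John</FIRSTNAME></AUTHOR>
  <AUTHOR><FIRSTNAME>Mary</FIRSTNAME></AUTHOR>
</AUTHORS>"""
root = ET.fromstring(doc)
print(root.tag)                                    # AUTHORS
print([el.tag for el in root.iter()])              # every element, in document order
print([fn.text for fn in root.iter("FIRSTNAME")])  # ['John', 'Mary']

# Two top-level elements with no enclosing root are not well-formed
try:
    ET.fromstring("<FIRSTNAME>John</FIRSTNAME><FIRSTNAME>Mary</FIRSTNAME>")
    root_required = False
except ET.ParseError:
    root_required = True
print(root_required)  # True
```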
Content refers to the information represented by the elements of an XML document.
Consider the following example:
<BOOKNAME> The Painted House</BOOKNAME>
In the preceding example, the name of the book, The Painted House, is the content of the BOOKNAME element.
XML enables you to declare and use elements that can contain different types of information. An element can contain:
- Character or data content
- Element content
- Combination or mixed content
Character or data content
An element can contain only textual information. Consider the following example:
<BOOKNAME>The Painted House</BOOKNAME>
In the preceding example, the BOOKNAME element contains only textual information and is, therefore, said to have character or data content.
Element content
An element can contain other elements. Elements contained in another element are called child elements, and the containing element is called the parent element. A parent element can contain many child elements. Child elements of the same parent are called siblings. Consider the following example:
<AUTHOR>
<FNAME> JOHN </FNAME>
<LNAME> SMITH </LNAME>
</AUTHOR>
In the preceding example, the AUTHOR element has two child elements, FNAME and LNAME.
Therefore, it is said to have element content.
Combination or mixed content
Elements can contain textual information as well as other elements, as illustrated in the following example:
<PRODUCTDESCRIPTION>
The product is available in four colors.
<COLOR> RED </COLOR>
<COLOR> BLUE </COLOR>
</PRODUCTDESCRIPTION>
In the preceding example, the PRODUCTDESCRIPTION element contains textual information as well as the COLOR element. Therefore, it is said to have combination or mixed content.
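The three kinds of content described above can also be told apart programmatically. This sketch uses Python's xml.etree.ElementTree, which stores the text before an element's first child in text and the text following each child in that child's tail. The content_type() helper is hypothetical, and treating whitespace-only text as insignificant is a simplifying assumption, not an XML rule.

```python
import xml.etree.ElementTree as ET

def content_type(elem):
    """Classify an element as 'character', 'element', 'mixed', or 'empty' content."""
    # Text can appear before the first child (elem.text) and after each child (tail);
    # whitespace-only text is ignored here for simplicity
    texts = [elem.text] + [child.tail for child in elem]
    has_text = any(t and t.strip() for t in texts)
    has_children = len(elem) > 0
    if has_text and has_children:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "character"
    return "empty"

examples = [
    "<BOOKNAME>The Painted House</BOOKNAME>",
    "<AUTHOR><FNAME>JOHN</FNAME><LNAME>SMITH</LNAME></AUTHOR>",
    "<PRODUCTDESCRIPTION>The product is available in four colors."
    "<COLOR>RED</COLOR><COLOR>BLUE</COLOR></PRODUCTDESCRIPTION>",
]
for xml_text in examples:
    elem = ET.fromstring(xml_text)
    print(elem.tag, content_type(elem))
```

Run as written, this reports BOOKNAME as character content, AUTHOR as element content, and PRODUCTDESCRIPTION as mixed content.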
Attributes provide additional information about the elements for which they are declared. An attribute consists of a name-value pair. Consider the following example:
<PRODUCTNAME PRODID="P001">Barbie Doll</PRODUCTNAME>
In the preceding example, the PRODUCTNAME element has an attribute called PRODID, whose value is set to P001. The attribute name and value are specified within the opening tag of the PRODUCTNAME element.
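In Python's xml.etree.ElementTree, an element's attributes are exposed as a dictionary (attrib) and can be read individually with get(). The sketch also shows two well-formedness rules for attributes: values must be quoted, and the same attribute name may not appear twice on one element. The second PRODID value, P002, is made up for the illustration.

```python
import xml.etree.ElementTree as ET

product = ET.fromstring('<PRODUCTNAME PRODID="P001">Barbie Doll</PRODUCTNAME>')
print(product.get("PRODID"))  # P001
print(product.attrib)         # {'PRODID': 'P001'}
print(product.text)           # Barbie Doll

# Attribute values must be quoted, and an attribute may appear only once per element
bad = [
    "<PRODUCTNAME PRODID=P001>Barbie Doll</PRODUCTNAME>",
    '<PRODUCTNAME PRODID="P001" PRODID="P002">Barbie Doll</PRODUCTNAME>',
]
rejected = 0
for doc in bad:
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        rejected += 1
print(rejected)  # 2
```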
Elements can have one or more attributes. When you create the structure of an XML document, you must decide whether a specific piece of information should be represented as an element or as an attribute. In general, an element is used to represent a definable unit, and an attribute is used to represent data that further qualifies the element. For example, an element called font can have an attribute called color to specify the font color. In this case, the color attribute further qualifies the font element. Therefore, color is represented as an attribute and not as an element.
There are no fixed rules governing whether to represent information by using elements or attributes. However, you can consider the following guidelines while making this decision:
- If the data must be displayed, you can represent it as an element. In general, attributes are used for intangible, abstract properties, such as an ID.
- If the data must be frequently updated, it is better represented as an element, because elements are easier to edit than attributes with XML editing tools. For example, if the quantity on hand of a product needs to be frequently updated, you can represent it by using an element.
- If the value of a piece of information must be frequently checked, it can be represented as an attribute, because an XML processor can check attribute values (for example, against a list of allowed values declared in a DTD) more easily than element content. For example, you can represent product ID and category as attributes if you often retrieve details about products based on their IDs and categories.
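The guidelines above can be applied to a small product catalog. In this sketch, which uses a hypothetical PRODUCTDATA document, PRODID and CATEGORY are attributes because they are used for lookups, while PRODUCTNAME and QTYONHAND are elements because they are displayed or updated. It relies on the limited XPath support in Python's xml.etree.ElementTree, which accepts predicates of the form [@name='value'].

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: lookup keys as attributes, displayed/updated data as elements
catalog = ET.fromstring("""
<PRODUCTDATA>
  <PRODUCT PRODID="P001" CATEGORY="Toys">
    <PRODUCTNAME>Barbie Doll</PRODUCTNAME>
    <QTYONHAND>12</QTYONHAND>
  </PRODUCT>
  <PRODUCT PRODID="P002" CATEGORY="Books">
    <PRODUCTNAME>The Painted House</PRODUCTNAME>
    <QTYONHAND>5</QTYONHAND>
  </PRODUCT>
</PRODUCTDATA>
""")

# Retrieve products by checking attribute values
toys = catalog.findall("PRODUCT[@CATEGORY='Toys']")
print([p.get("PRODID") for p in toys])  # ['P001']

# Update the quantity on hand by editing the element's text
qty = catalog.find("PRODUCT[@PRODID='P001']/QTYONHAND")
qty.text = str(int(qty.text) - 1)
print(qty.text)  # 11
```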
An entity is a name that is associated with a block of data, such as a chunk of text or a reference to an external file that contains textual or binary information. It is a set of information that can be used by specifying a single name.
Certain characters, such as < and &, cannot be used directly in the text of an XML document because they have a special meaning. For example, the < symbol is used as a delimiter for tags. XML provides predefined internal entities to enable you to express such characters in an XML document.
An internal entity consists of a name that is associated with a block of information. To use it, you write the name of the internal entity preceded by an ampersand (&) and terminated with a semicolon (;).
Some of the predefined internal entities that form part of the XML specification are listed in the following table.
| Entity | Description |
|---|---|
| &lt; | Used to display the less-than (<) symbol. |
| &gt; | Used to display the greater-than (>) symbol. |
| &amp; | Used to display the ampersand (&) symbol. |
| &quot; | Used to display the double quotation mark (") symbol. |
When used in an XML document, internal entities are replaced by the symbols that they represent. Consider the following code snippet:
<DISPLAY> The price of this toy is &lt; 200 </DISPLAY>
In the preceding code snippet, when the XML file is opened in the browser, the internal entity is replaced with the less than (<) symbol.
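The replacement is performed by the XML parser, not only by browsers. This Python sketch uses xml.sax.saxutils.escape() to turn the raw text into its entity form, then shows xml.etree.ElementTree replacing the entities from the table with their symbols and rejecting a bare < symbol. The T element is a hypothetical container.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "The price of this toy is < 200"
escaped = escape(raw)  # escapes &, < and > by default
print(escaped)         # The price of this toy is &lt; 200

display = ET.fromstring("<DISPLAY>" + escaped + "</DISPLAY>")
print(display.text)    # The price of this toy is < 200

# The four entities from the table, replaced by the parser
sample = ET.fromstring("<T>&lt; &gt; &amp; &quot;</T>")
print(sample.text)     # < > & "

# Writing the < symbol directly makes the document malformed
try:
    ET.fromstring("<DISPLAY>" + raw + "</DISPLAY>")
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)        # True
```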
Comments are statements used to explain the XML code. They provide documentation about the XML file or the application to which the file belongs. Comments are not part of the document's data, and many parsers discard them when reading the file.
Comments are not essential in an XML file. However, it is good programming practice to include comments along with the code. This helps you easily understand the code.
Comments are created using an opening angle bracket followed by an exclamation mark and two hyphens (<!--). This is followed by the text that comprises the comment.
Comments are closed using two hyphens followed by a closing angle bracket (-->). The following example illustrates the use of a comment in an XML document.
<!-- PRODUCTDATA is the root element -->
The text contained within a comment cannot include two consecutive hyphens. For example, the following comment is invalid:
<!-- PRODUCTDATA is the -- root element -->
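The behavior of comments can be observed with Python's xml.etree.ElementTree. By default it drops comments while building the tree; since Python 3.8, a TreeBuilder created with insert_comments=True keeps them as Comment nodes. The sketch also confirms that a comment containing two consecutive hyphens is rejected. The PRODUCT child element is a hypothetical addition.

```python
import xml.etree.ElementTree as ET

doc = "<PRODUCTDATA><!-- PRODUCTDATA is the root element --><PRODUCT/></PRODUCTDATA>"

# By default, ElementTree discards comments while building the tree
plain = ET.fromstring(doc)
print([child.tag for child in plain])  # ['PRODUCT']

# Python 3.8+ can keep comments as Comment nodes when asked
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(doc, parser=parser)
print(kept[0].tag is ET.Comment, repr(kept[0].text))

# Two consecutive hyphens inside a comment are not allowed
try:
    ET.fromstring("<PRODUCTDATA><!-- PRODUCTDATA is the -- root element --></PRODUCTDATA>")
    rejected = False
except ET.ParseError:
    rejected = True
print(rejected)  # True
```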