BEGINNING XML PART 5 – PROCESSING INSTRUCTIONS (Page 1)

Processing instructions (PIs) allow you to enter instructions into your XML which are not part of the actual document, but which are passed up to the application.

<?xml version='1.0' encoding='UTF-16' standalone='yes'?>
<name nickname='Shiny John'>
 <first>John</first>
 <!--John lost his middle name in a fire-->
 <middle/>
 <?nameprocessor SELECT * FROM blah?>
 <last>Doe</last>
</name>

There aren’t really a lot of rules on PIs. A PI is basically just a “<?”, the name of the application that is supposed to receive it (the PITarget), and then, up until the closing “?>”, whatever you want the instruction to be. The only restriction on that text is that it can’t itself contain “?>”. The PITarget is bound by the same naming rules as elements and attributes, and it can’t be xml in any combination of upper and lower case, because that name is reserved. So, in this example, the PITarget is nameprocessor, and the actual text of the PI is SELECT * FROM blah.
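To see how the target and the text come apart on the application side, here is a minimal sketch using Python's standard xml.dom.minidom module. The choice of parser is an assumption made for illustration, the XML declaration is left off so the string can be parsed directly, and find_pis is a made-up helper name.

```python
from xml.dom import minidom

# The sample document from above, minus the XML declaration.
NAME_XML = """<name nickname='Shiny John'>
 <first>John</first>
 <!--John lost his middle name in a fire-->
 <middle/>
 <?nameprocessor SELECT * FROM blah?>
 <last>Doe</last>
</name>"""


def find_pis(node):
    """Return (target, data) pairs for every PI below node."""
    found = []
    for child in node.childNodes:
        if child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
            found.append((child.target, child.data))
        else:
            # Elements are searched recursively; text and comments
            # have no children, so the recursion stops there.
            found.extend(find_pis(child))
    return found


pis = find_pis(minidom.parseString(NAME_XML))
print(pis)  # [('nameprocessor', 'SELECT * FROM blah')]
```

The parser hands over the PITarget and the instruction text as two separate strings; what to do with them is entirely up to the application.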

PIs are pretty rare, and are often frowned upon in the XML community, especially when used frivolously. But if you have a valid reason to use them, go for it. For example, PIs can be an excellent place for putting the kind of information (such as scripting code) that gets put in comments in HTML. While you can’t assume that comments will be passed on to the application, PIs always are.
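The difference between comments and PIs shows up in parser APIs too. As a rough illustration, the sketch below uses Python's standard xml.sax module, whose ContentHandler has a processingInstruction callback but no callback for comments. The EventLogger class and the trimmed-down document are assumptions made for this example.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record the events a plain SAX ContentHandler is told about."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("element", name))

    def processingInstruction(self, target, data):
        self.events.append(("pi", target, data))


# A shortened version of the name document, as bytes.
DOC = b"""<name>
 <!--John lost his middle name in a fire-->
 <?nameprocessor SELECT * FROM blah?>
 <last>Doe</last>
</name>"""

logger = EventLogger()
xml.sax.parseString(DOC, logger)
print(logger.events)
```

The PI arrives between the two element events, while the comment never reaches this handler at all.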

Is the XML Declaration a Processing Instruction?

At first glance, you might think that the XML declaration is a PI that starts with xml. It uses the same “<? ?>” notation, and provides instructions to the parser (but not the application). So is it a PI?

Actually, no: the XML declaration isn’t a PI. But in most cases it really doesn’t make any difference whether it is or not, so feel free to look at it as one if you wish. The only places where you’ll get into trouble are the following:

  • Trying to get the text of the XML declaration from an XML parser. Some parsers erroneously treat the XML declaration as a PI, and will pass it on as if it were, but many will not. The truth is, in most cases your application will never need the information in the XML declaration; that information is only for the parser. One notable exception might be an application that wants to display an XML document to a user, in the way that we’re using IE5 to display the documents in this book.
  • Including an XML declaration somewhere other than at the beginning of an XML document. Although you can put a PI almost anywhere you want (anywhere outside of other markup), an XML declaration must come at the very beginning of a file, before anything else, even whitespace.
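Both points can be checked with a parser. The sketch below again assumes Python's xml.dom.minidom, which is only one of the many parsers the first point refers to; the app PI and the a element are made up for the example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# A declaration at the very start is accepted, and it does not show up
# among the document's child nodes; the PI that follows it does.
doc = minidom.parseString("<?xml version='1.0'?><?app hello?><a/>")
node_names = [child.nodeName for child in doc.childNodes]
print(node_names)  # ['app', 'a']

# The same declaration after a leading newline is rejected.
try:
    minidom.parseString("\n<?xml version='1.0'?><a/>")
    misplaced_ok = True
except ExpatError as err:
    misplaced_ok = False
    print("rejected:", err)
```

This particular parser keeps the declaration to itself rather than passing it on as a PI, and it refuses a declaration that isn't the first thing in the document.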

Try It Out – Dare to be Processed

Just to see what it looks like, let’s add a processing instruction to our Weird Al XML:

1. Make the following changes to cd5.xml and save the file as cd6.xml:

<?xml version='1.0' encoding='windows-1252' standalone='yes'?>
<CD serial='B6B41B'
  disc-length='36:55'>
 <artist>"Weird Al" Yankovic</artist>
 <title>Dare to be Stupid</title>
 <genre>parody</genre>
 <date-released>1990</date-released>
 <!--date-released is the date released to CD, not to record-->
 <song>
  <title>Like A Surgeon</title>
  <length>
   <minutes>3</minutes>
   <seconds>33</seconds>
  </length>
  <parody>
   <title>Like A Virgin</title>
   <artist>Madonna</artist>
  </parody>
 </song>
 <song>
  <title>Dare to be Stupid</title>
  <length>
   <minutes>3</minutes>
   <seconds>25</seconds>
  </length>
  <parody/>
 </song>
 <?CDParser MessageBox("There are songs missing!")?>
</CD>

In IE5, it looks like this:

[Screenshot: cd6.xml displayed in IE5]

How It Works

For our example, we are targeting a fictional application called CDParser, and giving it the instruction MessageBox("There are songs missing!"). The instruction we gave it has no meaning in the context of XML itself, but only to our CDParser application, so it’s up to CDParser to do something meaningful with it.
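Since CDParser is fictional, what follows is only a sketch of how such an application might pick out its own instructions. It assumes Python's xml.dom.minidom, a made-up MessageBox("...") instruction syntax, and a shortened CD document with an extra OtherApp PI added to show that instructions for other targets are skipped.

```python
import re
from xml.dom import minidom

CD_XML = """<CD serial='B6B41B' disc-length='36:55'>
 <title>Dare to be Stupid</title>
 <?OtherApp ignore-me?>
 <?CDParser MessageBox("There are songs missing!")?>
</CD>"""

# Hypothetical instruction syntax: MessageBox("some text")
MESSAGE_BOX = re.compile(r'MessageBox\("(.*)"\)\s*$')


def collect_messages(node, target="CDParser"):
    """Gather MessageBox texts from PIs addressed to target."""
    messages = []
    for child in node.childNodes:
        if child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
            match = MESSAGE_BOX.match(child.data)
            # Only act on PIs meant for us that we can understand.
            if child.target == target and match:
                messages.append(match.group(1))
        else:
            messages.extend(collect_messages(child, target))
    return messages


messages = collect_messages(minidom.parseString(CD_XML))
print(messages)  # ['There are songs missing!']
```

A real application would presumably show the message to the user instead of collecting it in a list; the point is that the PITarget tells each application which instructions to act on.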

Illegal PCDATA Characters

There are some reserved characters that you can’t include in your PCDATA because they are used in XML syntax.

For example, the “<” and “&” characters:

<!--This is not well-formed XML!-->
<comparison>6 is < 7 & 7 > 6</comparison>

Viewing the above XML in IE5 would give the following error:

[Screenshot: IE5 error message for the comparison document]

This means that the XML parser came across the “<” character and expected a tag name to follow it, but found a space instead. (Even if it had got past this, it would have raised a similar error at the “&” character, which it would take to be the start of an entity reference.)
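Other parsers reject the document in much the same way. As a quick check, the sketch below feeds the same element to Python's xml.dom.minidom (an assumption; the wording of the error differs from parser to parser) and reports where parsing stopped.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

BAD_XML = "<comparison>6 is < 7 & 7 > 6</comparison>"

try:
    minidom.parseString(BAD_XML)
    error = None
except ExpatError as err:
    # lineno and offset say where the parser gave up.
    error = err
    print(f"not well-formed: line {err.lineno}, column {err.offset}")
```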

There are two ways you can get around this: escaping characters, or enclosing text in a CDATA section.
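As a brief preview of both approaches, here is a sketch that assumes Python's xml.sax.saxutils.escape for the escaping and xml.dom.minidom for reading the result back; element_text is a made-up helper.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

TEXT = "6 is < 7 & 7 > 6"


def element_text(xml_string):
    """Return the character data inside the root element."""
    root = minidom.parseString(xml_string).documentElement
    return "".join(child.data for child in root.childNodes)


# Option 1: escape the reserved characters.
escaped_xml = "<comparison>" + escape(TEXT) + "</comparison>"
# Option 2: wrap the text in a CDATA section.
cdata_xml = "<comparison><![CDATA[" + TEXT + "]]></comparison>"

print(escaped_xml)
print(element_text(escaped_xml) == TEXT)
print(element_text(cdata_xml) == TEXT)
```

Either way, the application gets the original text back, reserved characters included.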
