XML


Tjobbe
December 10th, 2004, 14:55
Should this go here? I'm not even 100% sure what XML is. I've started the w3schools XML tutorials just now, but wondered if anyone here can point me in the right direction to some more tutorials/resources that would be a good starting point!

Thanks in advance..

the_pm
December 10th, 2004, 15:37
Yeah, considering the nature of XML takes it well beyond Web applications, I'd say it's perfectly suited for programming.

But I don't know any resources offhand. I'll bet BigBison does... :)

BigBison
December 11th, 2004, 02:19
I'm a little misleading with all my "I was gone for years" comments. I've actually been learning this XML stuff (off and on) since it first cropped up back in the late '90s. In fact, I just dug up some of my notes from 3 years ago, and they cover exactly what I'm doing now. Unfortunately, this means I don't know what a good resource for beginners would be, or how to put it in beginners' terms. Here goes nothing!

It's also misleading to say "XML", which is really a group of interrelated technologies -- which may be scripting, query or definition languages. All a document has to be to qualify as XML is "well-formed". All these related technologies do is define, access or manipulate well-formed XML documents. Sometimes, a document must also be "valid" according to a particular definition before it can be manipulated or accessed.
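
To make "well-formed" concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module (Python is my choice for illustration, it isn't mentioned in this thread). The helper name is_well_formed and the two sample strings are made up. The parser only checks syntax here; it does not validate against any definition.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples: the second one never closes <apples>.
good = "<fruits><apples>5</apples></fruits>"
bad = "<fruits><apples>5</fruits>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Any tagset at all passes this check as long as the tags nest and close properly, which is the whole point: the parser needs no knowledge of what "fruits" means.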

Books, I always recommend books. Normally I wouldn't recommend the "nutshell" series for beginners, as they're meant for reference, but the most concise book I have on this stuff (it always seems to be on top of the mess on my desktop) is O'Reilly's "XML in a Nutshell". Aside from that, I think it's best to start with a specific purpose in mind. None of this stuff has really started to sink in until lately, when I started seriously writing my own tagset.

If you haven't seen me say this elsewhere, here it is again:

"Learn XQUERY".

You may also have to learn XPATH, which is probably as good a place as any to start for an experienced programmer. Programmers will have familiarity with a data structure known as a "tree" and the various uses to which a tree may be put. The problem with many XML tutorials and implementations is the tendency to merely use XML to serialize objects, but that isn't really the right model, and is a pedestrian use of technology which is capable of so much more.

Which is not to disparage using XML for object serialization, I'm just noticing that too many authors on the subject of using XML in programming can't seem to see beyond such a use. Point: an XML document is not synonymous with an object! It is synonymous with a tree. If you have ever used tree structures in your code, you'll grasp this concept easily.
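
A rough sketch of the "document is a tree" idea, again in Python's standard library as a stand-in. The sample document and the depth helper are assumptions for illustration. Note that ElementTree only understands a limited subset of XPATH, so this is not a full XPATH engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: one root node with two child nodes.
doc = ET.fromstring("<fruits><apples>5</apples><oranges>4</oranges></fruits>")

def depth(node):
    """Height of the tree below (and including) this node."""
    return 1 + max((depth(child) for child in node), default=0)

# Walking children is ordinary tree traversal.
child_tags = [child.tag for child in doc]

# A simple path expression selects a node by its position in the tree.
apples = doc.find("./apples")

print(doc.tag, child_tags, depth(doc), apples.text)
```

Nothing here maps the document onto an object with fields; the code just navigates parent and child nodes, the way you would with any tree.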

In my current project, XML has four uses. First, my data storage format is defined using RELAX NG -- Regular Language for XML, Next Generation. Second, the data is displayed using XHTML, which brings us to use three: transforming the document from the data storage format used internally, into XHTML, using XSLT. Finally, XQUERY is used to search the documents. If you're a coder you'll agree that it's more efficient to search a tree structure than an unstructured text file (HTML).

Non-coding web developers should probably think about learning XHTML, XQUERY, and XSLT now, in order to stay ahead of the game.

BigBison
December 11th, 2004, 07:33
As far as tools go, my decision to use RELAX NG in my project has led to a couple of changes. This stuff is new enough that despite the improved XML functionality in PHP 5, I decided I would be better off (albeit less comfortable) switching to Java for my current project. I use IBM hardware, so I installed the IBM version of the JDK/JRE for better performance. I'm not sure if the IBM version of the Eclipse Platform works without it or not, check it out:

http://www-106.ibm.com/developerworks/java/jdk/

The IBM edition (free, not WebSphere-related) is more fleshed-out than the basic Eclipse 3.0 setup, as it's set up for Java project development. Download from:

http://www.eclipse.org/

Once you've obtained Eclipse, you need a plugin that knows its way around XML. Unfortunately, neither the standalone nor the plugin version of Altova XMLSpy understands RELAX NG. There are two tools worth looking at. I will end up purchasing the Oxygen XML editor:

http://www.oxygenxml.com/

There's another plugin (both are commercial) which does what I need, called XML Buddy:

http://www.xmlbuddy.com/

I liked the frameworks available with Oxygen better. Since I'm really a beginner, I need XML development to be as easy as possible. I didn't think XML Buddy offered as much, but it was close.

the_pm
December 11th, 2004, 18:23
That would be the first reference I recall seeing from you regarding XQUERY. Good info as usual, my friend.

Tjobbe
December 14th, 2004, 01:41
thanks, looks like xml is going to be a big step up from beginners php!

inimino
December 18th, 2004, 07:55
"thanks, looks like xml is going to be a big step up from beginners php!"

Rather than a step up, one might say it is a rather large step to the side.

While PHP is a programming language, XML is a data format. Thus there is very little overlap in the scope of these two technologies.

While PHP is concerned with manipulating data, XML is a way to store data.

XML can be thought of as a way to use the simple nature of a text file (which is simply a string) to store structured data.

An interesting thing about XML is that it is possible to parse the structure of an XML document, without knowing anything specific about what the data represents.
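
As a small illustration of that point, here is a sketch in Python (standard library only; the outline function and the recipe sample are invented for this example). The code prints the structure of a document without knowing what any of the element names mean.

```python
import xml.etree.ElementTree as ET

def outline(node, level=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * level + node.tag]
    for child in node:
        lines.extend(outline(child, level + 1))
    return lines

# Hypothetical document; the code above has no idea what a "recipe" is.
recipe = (
    "<recipe><title>Pie</title>"
    "<ingredients><apples>6</apples></ingredients></recipe>"
)
structure = outline(ET.fromstring(recipe))
print("\n".join(structure))
```

The same function would work unchanged on an invoice, an address book, or an XHTML page.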

XML is also designed to be human-readable, in that it is stored as simple text with an understandable and intuitive syntax, and can therefore be read and edited using ordinary textual tools.

Perhaps more interesting than XML itself are its many applications. Generic, well-specified access to structured data types has proved to have many powerful applications in the real world, and people often use "XML" to refer to this wide array of related technologies.

So the short answer to the question "what is XML" is that it is in fact a data format, and, informally, a set of related technologies.

The answer already given above by BigBison is excellent. I was hoping this would be a more accessible, beginner-level answer, but in fact it's probably just as opaque, though perhaps from a different angle.

Perhaps the best place to start in understanding XML would be a reading of the spec itself:
http://www.w3.org/TR/REC-xml/

Tjobbe
December 18th, 2004, 11:03
welcome, and thanks for clarifying that for me!

BigBison
December 29th, 2004, 01:11
Assume we're developing a web-enabled recipe, inventory and grocery-shopping application for meal planning. This would be a type of Content Management System, or CMS. Using XML and XQUERY instead of SQL and HTML gives some serious advantages in both development and usability.

Consider this well-formed XML document:

<?xml version="1.0"?>
<fruits>
<apples>5</apples>
<oranges>4</oranges>
</fruits>


Using XSLT, the above XML may be easily transformed into the following HTML:


<html>
<head>
<title>Fruit Bowl</title>
</head>
<body>
<table title="Fruits">
<tr><td>Apples</td><td>5</td></tr>
<tr><td>Oranges</td><td>4</td></tr>
</table>
</body>
</html>
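
For comparison, the same reshaping can be sketched without XSLT. This is not an XSLT stylesheet; it is a plain Python stand-in (standard library only, function name to_html_table is made up) that builds just the table portion of the HTML above from the source XML, to show how mechanical the mapping is.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<fruits><apples>5</apples><oranges>4</oranges></fruits>"
)

def to_html_table(fruits):
    """Build a <table> element with one row per child of the source root."""
    table = ET.Element("table", {"title": fruits.tag.capitalize()})
    for item in fruits:
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = item.tag.capitalize()
        ET.SubElement(row, "td").text = item.text
    return table

html_table = ET.tostring(to_html_table(source), encoding="unicode")
print(html_table)
```

A real XSLT stylesheet would express the same mapping declaratively, as templates matched against the source tree.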


How does one search for all documents containing "fruits"? The table's title could very well be "Ingredients" instead of "Fruits", in which case "apples" and "oranges" wouldn't be considered relevant in a search of the HTML. XQUERY, combined with an XML schema describing "apples" and "oranges" as valid "fruits", overcomes this limitation by searching the source XML tree.

What if one has a recipe for fruit salad and another for apple pie, and needs to know how many apples to purchase at the store? If the recipes are stored in XML with a clever DTD (or schema) it's trivial to write an XQUERY (as part of a recipe/inventory app) which returns the desired number.
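
A rough sketch of that query, in Python rather than XQUERY. The recipe tagset (recipe, ingredients, the name attribute) and the quantities are assumptions made up for this example, not a published DTD or schema.

```python
import xml.etree.ElementTree as ET

# Two hypothetical recipe documents using an invented tagset.
recipes = [
    '<recipe name="fruit salad"><ingredients>'
    "<apples>3</apples><oranges>4</oranges>"
    "</ingredients></recipe>",
    '<recipe name="apple pie"><ingredients>'
    "<apples>6</apples>"
    "</ingredients></recipe>",
]

def total_needed(documents, ingredient):
    """Sum the quantity of one ingredient across all recipe documents."""
    total = 0
    for text in documents:
        root = ET.fromstring(text)
        for node in root.iter(ingredient):
            total += int(node.text)
    return total

apples_to_buy = total_needed(recipes, "apples")
print(apples_to_buy)
```

Because the quantity lives in a named element, the lookup is a tree search rather than text scraping.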

Using HTML only, it would be much more difficult to parse out the desired information, compared to using a validating XML parser. One could always make an SQL database of recipes, with each ingredient and amount itemized and related to the directions.

But what if the fruit salad calls for Gala apples and the pie, Golden Delicious? Better extend the contract of whoever coded the app, to reprogram all that proprietary SQL code and middleware, and maybe the PHP interfaces as well. Unless one has used XML, in which case just extend the definition of "apples" to account for variety, by choosing a syntax:


<?xml version="1.0"?>
<fruits>
  <apples>
    <gala>3</gala>
    <golden_delicious>2</golden_delicious>
  </apples>
  <oranges>4</oranges>
</fruits>


Alternately:


<?xml version="1.0"?>
<fruits>
<apples type="gala">3</apples>
<apples type="golden_delicious">2</apples>
<oranges>4</oranges>
</fruits>
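
Either layout can be read back into the same per-variety counts. The sketch below uses Python's standard library in place of XQUERY; the two function names are invented, and the documents are the two examples above with the XML declaration left off.

```python
import xml.etree.ElementTree as ET

# Varieties as child elements of <apples>.
nested = ET.fromstring(
    "<fruits><apples><gala>3</gala>"
    "<golden_delicious>2</golden_delicious></apples>"
    "<oranges>4</oranges></fruits>"
)

# Varieties as a type attribute on repeated <apples> elements.
attributed = ET.fromstring(
    '<fruits><apples type="gala">3</apples>'
    '<apples type="golden_delicious">2</apples>'
    "<oranges>4</oranges></fruits>"
)

def apples_from_nested(root):
    """Map variety name to count, reading child elements of <apples>."""
    return {variety.tag: int(variety.text) for variety in root.find("apples")}

def apples_from_attributes(root):
    """Map variety name to count, reading the type attribute."""
    return {node.get("type"): int(node.text) for node in root.findall("apples")}

print(apples_from_nested(nested))
print(apples_from_attributes(attributed))
```

Only the small query function differs between the two layouts; whatever consumes the resulting counts is untouched.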


XML is open and flexible, so which syntax to use is a matter of personal preference. Either way, the app doesn't need to be reprogrammed, just the XQUERY. Now a shopping list could be generated from a comparison between inventory and ingredients needed:


milk 2 gallons
stick cinnamon 1
ground nutmeg 1
gala apples 3
golden delicious apples 2
flour 5 lbs.
eggs 1 dozen


The units can be intelligently handled, without forcing the user to input all recipes strictly, for instance requiring a recipe to call for 12 eggs instead of a dozen. It's a little more work, but the conversions can be done, and in my opinion it's easier than it would be to implement the same thing using SQL.
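
One possible way to handle the "dozen" case, sketched in Python. The unit attribute, the UNIT_FACTORS table and the egg_count function are all assumptions for illustration; nothing in this thread fixes how units would actually be marked up.

```python
import xml.etree.ElementTree as ET

# Hypothetical conversion table: how many single items each unit stands for.
UNIT_FACTORS = {"each": 1, "dozen": 12}

def egg_count(xml_text):
    """Return the number of individual eggs an <eggs> element asks for."""
    node = ET.fromstring(xml_text)
    unit = node.get("unit", "each")  # assume plain counts when no unit is given
    return int(node.text) * UNIT_FACTORS[unit]

print(egg_count('<eggs unit="dozen">1</eggs>'))
print(egg_count("<eggs>12</eggs>"))
```

Both ways of writing the recipe come out as the same quantity, so the person entering recipes isn't forced into one style.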

The time saved can be used to add features, for example: if we're out of flour but only need 3 pounds, round up to 5 if one 5-pound sack is cheaper than three 1-pound sacks. If there's ever such a thing as an online grocer, perform a remote query even.
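
The flour rounding rule is simple arithmetic once the quantities are known. Here is a sketch in Python; the prices (in cents) and the function name cheapest_purchase are invented, and it only considers buying a single sack size rather than mixing sizes.

```python
import math

def cheapest_purchase(pounds_needed, sack_prices):
    """Pick the sack size that covers the need at the lowest total cost.

    sack_prices maps sack size in pounds to the price per sack in cents.
    Returns (sack_size, sack_count, total_cost_in_cents).
    """
    options = []
    for size, price in sack_prices.items():
        count = math.ceil(pounds_needed / size)  # round up to whole sacks
        options.append((count * price, size, count))
    cost, size, count = min(options)
    return size, count, cost

# Hypothetical prices: 1-pound sack for 150 cents, 5-pound sack for 399 cents.
prices = {1: 150, 5: 399}
print(cheapest_purchase(3, prices))
```

With these made-up prices, three 1-pound sacks cost 450 cents and one 5-pound sack costs 399, so the function picks the larger sack.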

Deciding at a later date to allow for brown eggs vs. white eggs, etc. is just as straightforward. In fact, if one has allowed for this possibility during the design phase, it may not even require rewriting any XQUERYs. I'm ahead of myself now, some day soon I'll be able to flesh this out with more example code.

This stuff is too new to reliably compare performance between an XML-based CMS, and a traditional SQL-based CMS. In terms of designing a CMS, I have the experience to say that these new technologies show great promise in terms of efficiency of code. Beyond the hype, there is a real future for these technologies.

The key to the power of XML is its simplicity. All that's required of any document is to be well-formed. Beyond that, a document may or may not need to be valid. There's no such thing as "valid XML", though. An XML document may be validated against a DOCTYPE or a schema, either of which may be expressed in a variety of ways.

For instance, an XML document may be considered "valid XHTML", but only if it declares and validates against an XHTML DOCTYPE. Validation is an optional function of XML parsers. Different parsers understand different schema languages. All XML parsers require well-formed documents.

We discuss "valid HTML" in web design all the time these days. The current generation of web browsers mostly includes XML parsers. These are not validating parsers, however. The reason we stress valid code is that it is the first step in making a website look the same across browsers and operating systems.