"A superb summary of the main Web technologies. It is broad and deep, giving you enough detail to get real work done. Eminently readable with excellent examples and touches of humour." An Introduction to XML and Web Technologies, by Anders Møller and Michael I. Schwartzbach.

An Introduction to XML and Web Technologies Ebook



An attribute is a name-value pair inside the start tag of an element.

How XML is changing the Web

Now that you've seen how developers can use XML to create documents with self-describing data, let's look at how people are using those documents to improve the Web. Here are a few key areas:

XML simplifies data interchange. Because different organizations (or even different parts of the same organization) rarely standardize on a single set of tools, it can take a significant amount of work for applications to communicate.

Best of all, there's a good chance that their software vendors already provide tools to transform their database records (or LDAP directories, or purchase orders, and so forth) to and from XML.

XML enables smart code. Because XML documents can be structured to identify every important piece of information (as well as the relationships between the pieces), it's possible to write code that can process those XML documents without human intervention.

The fact that software vendors have spent massive amounts of time and money building XML development tools means writing that code is a relatively simple process.

XML enables smart searches. Although search engines have improved steadily over the years, it's still quite common to get erroneous results from a search. If you're searching HTML pages for someone named "Chip," you might also find pages on chocolate chips, computer chips, wood chips, and lots of other useless matches.

I'll also discuss real-world uses of XML in Case studies.

This section goes over the basic rules of XML documents and discusses the terminology used to describe them. Most HTML parsers will accept sloppy markup, making a guess as to what the writer of the document intended. To avoid the loosely structured mess found in the average HTML document, the creators of XML decided to enforce document structure from the beginning.

By the way, if you're not familiar with the term, a parser is a piece of code that attempts to read a document and interpret its contents.

Invalid, valid, and well-formed documents

There are three kinds of XML documents. Invalid documents don't follow the syntax rules defined by the XML specification. If a developer has defined rules for what the document can contain in a DTD or schema, and the document doesn't follow those rules, that document is invalid as well.

The root element

An XML document must be contained in a single element. That single element is called the root element, and it contains all the text and any other elements in the document. Notice that the document has a comment that's outside the root element; that's perfectly legal.

Elements can't overlap

XML elements can't overlap.

End tags are required

You can't leave out any end tags. In empty elements in XML documents, you can put the closing slash in the start tag.
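The rules above can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the sample strings and the helper name `is_well_formed` are made up for illustration and do not come from the tutorial.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule discussed above.
samples = {
    "single root, comment outside it": "<!-- a comment --><greeting>Hello</greeting>",
    "two root elements": "<a>one</a><b>two</b>",
    "overlapping elements": "<p><b>bold <i>both</b> italic</i></p>",
    "missing end tag": "<p>first<p>second</p>",
    "empty element with closing slash": "<p>line<br/>break</p>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first and last samples should be accepted; the other three break the root-element, overlap, and end-tag rules respectively.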

In the example below, the heading at the top is illegal, while the one at the bottom is fine. To do the equivalent in XML, you have to give the attribute a value, and you have to enclose it in quotes.

An XML declaration is recommended, but not required. If there is one, it must be the first thing in the document. The declaration can contain up to three name-value pairs (many people call them attributes, although technically they're not). The version is the version of XML used; currently this value must be 1.0.

The encoding is the character set used in this document. The ISO character set referenced in this declaration includes all of the characters used by most Western European languages. If no encoding is specified, the XML parser assumes that the characters are in the UTF-8 set, a Unicode standard that supports virtually every character and ideograph from the world's languages.
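To make the role of the declaration concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The `<city>` document and the choice of `ISO-8859-1` as the declared encoding are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# A document whose declaration names its encoding (assumed here: ISO-8859-1).
declared = '<?xml version="1.0" encoding="ISO-8859-1"?><city>Málaga</city>'
city_latin1 = ET.fromstring(declared.encode("iso-8859-1")).text

# With no declaration at all, the parser treats the bytes as UTF-8.
city_utf8 = ET.fromstring("<city>Málaga</city>".encode("utf-8")).text

# If a declaration is present, it must be the first thing in the document;
# even a leading newline in front of it is rejected.
try:
    ET.fromstring(b'\n<?xml version="1.0"?><city/>')
    declaration_must_be_first = False
except ET.ParseError:
    declaration_must_be_first = True

print(city_latin1, city_utf8, declaration_must_be_first)
```

Both parses yield the same text because the parser decodes the bytes according to the declared (or default) encoding before handing characters to the application.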

Other things in XML documents

There are a few other things you might find in an XML document.

Comments: Comments can appear anywhere in the document; they can even appear before or after the root element. A comment can't contain a double hyphen (--) except at the end; with that exception, a comment can contain anything.

Most importantly, any markup inside a comment is ignored; if you want to remove a large section of an XML document, simply wrap that section in a comment. To restore the commented-out section, simply remove the comment tags.
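A quick sketch of that behavior, using Python's standard `xml.etree.ElementTree` (whose default parser drops comments). The `<order>` document is invented for the example.

```python
import xml.etree.ElementTree as ET

# The second item is "commented out"; the stray <broken> tag inside the
# comment does no harm because markup in comments is ignored.
doc = """<order>
  <item>widget</item>
  <!-- <item>gadget</item> <broken> markup is fine in here -->
  <item>sprocket</item>
</order>"""

items = [element.text for element in ET.fromstring(doc).iter("item")]
print(items)

# A double hyphen inside a comment is the one thing that is not allowed.
try:
    ET.fromstring("<order><!-- not -- allowed --></order>")
    double_hyphen_rejected = False
except ET.ParseError:
    double_hyphen_rejected = True
print(double_hyphen_rejected)
```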

When Cocoon is processing an XML document, it looks for processing instructions that begin with cocoon-process, then processes the XML document accordingly.

The XML spec also defines five entities you can use in place of various special characters.

Namespaces

XML's power comes from its flexibility, the fact that you and I and millions of other people can define our own tags to describe our data.
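As a brief aside on the five predefined entities mentioned above (`&lt;`, `&gt;`, `&amp;`, `&quot;`, and `&apos;`), the sketch below shows a parser turning them back into characters, and a standard-library helper producing them when writing text. The `<note>` content is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# All five predefined entities in one hypothetical element.
doc = "<note>&lt;b&gt; Tom &amp; Jerry say &quot;hi&quot; &apos;now&apos;</note>"
text = ET.fromstring(doc).text
print(text)

# Going the other way: escape() replaces &, < and > so that arbitrary
# text can be placed safely inside an element.
escaped = escape("a < b & c > d")
print(escaped)
```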

Remember the sample XML document for a person's name and address?


All of those are reasonable choices, but all of them create elements with the same name. Namespaces resolve that ambiguity. To use a namespace, you define a namespace prefix and map it to a particular string. In this example, the three namespace prefixes are addr, books, and mortgage.

Notice that defining a namespace for a particular element means that all of its child elements belong to the same namespace. One final point: The string in a namespace definition is just a string. Yes, these strings look like URLs, but they're not. The only thing that's important about the namespace string is that it's unique; that's why most namespace definitions look like URLs. It's confusing, but that's how namespaces work.

Defining document content

So far in this tutorial you've learned about the basic rules of XML documents; that's all well and good, but you need to define the elements you're going to use to represent data.
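To illustrate the namespace discussion above, here is a sketch that reuses the prefixes addr, books, and mortgage. The namespace strings, element names, and values are assumptions invented for this example; note that nothing is ever fetched from those URL-like strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace strings; they only need to be unique.
ADDR = "http://www.example.com/addresses/"
BOOKS = "http://www.example.com/books/"
MORTGAGE = "http://www.example.com/titles/"

doc = f"""<customer xmlns:addr="{ADDR}"
          xmlns:books="{BOOKS}"
          xmlns:mortgage="{MORTGAGE}">
  <addr:name><addr:title>Mrs.</addr:title></addr:name>
  <books:title>Lord of the Rings</books:title>
  <mortgage:title>NC2948-388-1983</mortgage:title>
</customer>"""

root = ET.fromstring(doc)

# The prefixes used for lookup are local to this code; only the strings
# they map to must match the document.
ns = {"a": ADDR, "b": BOOKS, "m": MORTGAGE}
person_title = root.find("a:name/a:title", ns).text
book_title = root.find("b:title", ns).text
property_title = root.find("m:title", ns).text

# Internally each element name is the pair (namespace string, local name),
# so the three <title> elements are three different names.
title_tags = [el.tag for el in root.iter() if el.tag.endswith("}title")]
print(person_title, book_title, property_title)
print(title_tags)
```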

You'll learn about two ways of doing that in this section. One method is to use a Document Type Definition (DTD). A DTD defines the elements that can appear in an XML document, the order in which they can appear, how they can be nested inside each other, and other basic details of XML document structure. The other method is to use an XML Schema. A schema can define all of the document structures that you can put in a DTD, and it can also define data types and more complicated rules than a DTD can.

The next couple of sections look at fragments of DTDs. All of those elements must appear, and they must appear in that order. All of the other elements contain text. Although the DTD is pretty simple, it makes it clear what combinations of elements are legal. All of the elements are required. The comma indicates a list of items. The question mark indicates that an item is optional; it can appear once or not at all.

The plus sign indicates that an item must appear at least once, but can appear any number of times. The asterisk indicates that an item can appear any number of times, including zero. Vertical bars indicate a list of choices; you can choose only one item from the list.

Also notice that this example uses parentheses to group certain elements, and it uses a question mark against the group.
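The occurrence symbols described above behave much like regular-expression operators. Python's standard library does not validate against DTDs, so the sketch below is not a DTD validator; it hand-translates one hypothetical content model into a regular expression to show what the comma, `?`, `+`, `*`, vertical bar, and parentheses mean. The element names and the model itself are assumptions, not the tutorial's DTD.

```python
import re

# Hypothetical content model, written as it might appear in a DTD:
#   <!ELEMENT name (title?, first-name, (middle-initial | middle-name)?,
#                   last-name, suffix*, phone+)>
NAME_MODEL = re.compile(
    r"(title,)?"                        # ?  optional: once or not at all
    r"first-name,"                      # ,  items must appear in this order
    r"(middle-initial,|middle-name,)?"  # |  one choice; ? applies to the group
    r"last-name,"
    r"(suffix,)*"                       # *  any number of times, including zero
    r"(phone,)+"                        # +  at least once
)


def children_match(child_names):
    """Check a sequence of child element names against the model."""
    sequence = "".join(name + "," for name in child_names)
    return NAME_MODEL.fullmatch(sequence) is not None


minimal = children_match(["first-name", "last-name", "phone"])
full = children_match(["title", "first-name", "middle-name", "last-name",
                       "suffix", "suffix", "phone", "phone"])
wrong_order = children_match(["last-name", "first-name", "phone"])
both_choices = children_match(["first-name", "middle-initial", "middle-name",
                               "last-name", "phone"])
missing_required = children_match(["first-name", "last-name"])
print(minimal, full, wrong_order, both_choices, missing_required)
```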

A word about flexibility

Before going on, a quick note about designing XML document types for flexibility. Consider the sample name and address document type; I clearly wrote it with U.S. addresses in mind.

If you want a DTD or schema that defines rules for other types of addresses, you would have to add a lot more complexity to it. Finally, be aware that in many parts of the world, concepts like title, first name, and last name don't make sense. The bottom line: If you're going to define the structure of an XML document, you should put as much forethought into your DTD or schema as you would if you were designing a database schema or a data structure in an application.

The more future requirements you can foresee, the easier and cheaper it will be for you to implement them later.

Defining attributes

This introductory tutorial doesn't go into great detail about how DTDs work, but there's one more basic topic to cover here: defining attributes.

You can define attributes for the elements that will appear in your XML document. Thus, you can do a very limited form of data validation.

XML schemas are themselves XML documents. That means you can process a schema just like any other document. For example, you can write an XSLT style sheet that converts an XML schema into a Web form complete with automatically generated JavaScript code that validates the data as you enter it.

XML schemas support datatypes.


While DTDs do support datatypes, it's clear those datatypes were developed from a publishing perspective. XML schemas also support integers, floating point numbers, dates, times, strings, URLs, and other datatypes useful for data processing and validation.

XML schemas are extensible. In addition to the datatypes defined in the XML schema specification, you can also create your own, and you can derive new datatypes based on other datatypes.
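Because a schema is written in XML, ordinary XML tools can read it. The sketch below parses a tiny schema fragment with Python's standard `xml.etree.ElementTree` and lists the declared elements with their datatypes. The element names (`postal-code`, `quantity`, `ship-date`) are invented; the namespace string and the built-in types are those of W3C XML Schema.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment using three built-in datatypes.
schema = f"""<xsd:schema xmlns:xsd="{XSD}">
  <xsd:element name="postal-code" type="xsd:string"/>
  <xsd:element name="quantity" type="xsd:positiveInteger"/>
  <xsd:element name="ship-date" type="xsd:date"/>
</xsd:schema>"""

root = ET.fromstring(schema)

# Treat the schema like any other document: walk it and collect data.
declared = {
    element.get("name"): element.get("type")
    for element in root.iter(f"{{{XSD}}}element")
}
print(declared)
```

This only reads the schema; actually validating documents against it would need a schema-aware validator, which is outside the standard library.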


XML schemas have more expressive power. You can't do either of those things with DTDs. Although the schema is much longer than the DTD, it expresses more clearly what a valid document looks like. Most of the elements contain text; defining them is simple. This summary only scratches the surface of what XML schemas can do; there are entire books written on the subject. For the purpose of this introduction, suffice to say that XML schemas are a very powerful and flexible way to describe what a valid XML document looks like.

These interfaces give developers a consistent interface for working with XML documents. With the Document Object Model (DOM), the parser reads in the entire document and builds an in-memory tree, so your code can then use the DOM interfaces to manipulate the tree. You can move through the tree to see what the original document contained, you can delete sections of the tree, you can rearrange the tree, add new branches, and so on. If the document is very large, this requires a significant amount of memory.

The DOM creates objects that represent everything in the original document, including elements, text, attributes, and whitespace.

If you only care about a small portion of the original document, it's extremely wasteful to create all those objects that will never be used. A DOM parser has to read the entire document before your code gets control. For very large documents, this could cause a significant delay.
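The tree operations described above can be sketched with `xml.dom.minidom`, the DOM implementation in Python's standard library. The `<address>` document and its values are assumptions for the example.

```python
from xml.dom.minidom import parseString

# The whole document is parsed into a tree before this call returns.
doc = parseString(
    "<address><name>Mary</name><city>Raleigh</city><state>NC</state></address>"
)
root = doc.documentElement

# Delete a section of the tree.
state = root.getElementsByTagName("state")[0]
root.removeChild(state)

# Rearrange the tree: move <city> in front of <name>.
city = root.getElementsByTagName("city")[0]
root.insertBefore(city, root.firstChild)

# Add a new branch.
country = doc.createElement("country")
country.appendChild(doc.createTextNode("USA"))
root.appendChild(country)

result = root.toxml()
print(result)

# Whitespace between elements becomes text nodes in the tree as well:
# here, text + <b/> + text gives three child nodes.
indented = parseString("<a>\n  <b/>\n</a>").documentElement
child_count = len(indented.childNodes)
print(child_count)
```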

A SAX parser tells you when it finds the start of an element, the end of an element, text, the start or end of the document, and so on.

You decide which events are important to you, and you decide what kind of data structures you want to create to hold the data from those events.


If you don't explicitly save the data from an event, it's discarded. A SAX parser doesn't create any objects at all, it simply delivers events to your application. If you want to create objects based on those events, that's up to you. A SAX parser starts delivering events to you as soon as the parse begins. Your code will get an event when the parser finds the start of the document, when it finds the start of an element, when it finds text, and so on.

Your application starts generating results right away; you don't have to wait until the entire document has been parsed. Even better, if you're only looking for certain things in the document, your code can throw an exception once it's found what it's looking for. The exception stops the SAX parser, and your code can do whatever it needs to do with the data it has found.
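Here is a minimal sketch of that event-driven style using Python's standard `xml.sax` module. The handler keeps only what it needs and raises an exception to stop the parse as soon as the first `<city>` has been read. The class names `Found` and `FirstCityHandler` and the sample document are hypothetical.

```python
import xml.sax


class Found(Exception):
    """Raised to stop the parse once the wanted data is in hand."""


class FirstCityHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []      # a record of the events seen, for illustration
        self.in_city = False
        self.city = ""

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        self.in_city = name == "city"

    def characters(self, content):
        # Text is discarded unless we explicitly save it.
        if self.in_city:
            self.city += content

    def endElement(self, name):
        self.events.append(("end", name))
        self.in_city = False
        if name == "city":
            raise Found(self.city)


doc = (
    b"<addresses>"
    b"<address><name>Mary</name><city>Raleigh</city></address>"
    b"<address><name>Bob</name><city>Austin</city></address>"
    b"</addresses>"
)

handler = FirstCityHandler()
try:
    xml.sax.parseString(doc, handler)
    first_city = None
except Found as stop:
    first_city = str(stop)

print(first_city)
print(handler.events)
```

The second `<address>` is never reported, because the exception ends the parse before the parser reaches it.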

The remainder of this section discusses why you might want to use one interface or the other.

When a SAX parser finds text in a document, the event it sends simply gives you the text that was found; it does not tell you what element contains that text. If you want to know that, you have to write the state management code yourself.

SAX events are not permanent. If your application needs a data structure that models the XML document, you have to write that code yourself.

If you need to access data from a SAX event, and you didn't store that data in your code, you have to parse the document again.

SAX is not controlled by a centrally managed organization. Although this has not caused a problem to date, some developers would feel more comfortable if SAX were controlled by an organization such as the W3C.
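The state management mentioned earlier, where text events do not say which element they belong to, is often handled with a stack of open element names. The following is one possible sketch with Python's standard `xml.sax`; the handler name `PathTrackingHandler` and the sample document are assumptions.

```python
import xml.sax


class PathTrackingHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.stack = []         # names of the currently open elements
        self.text_by_path = {}  # the data this application chooses to keep

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        # The event carries only the text; the stack supplies the context.
        if content.strip():
            path = "/".join(self.stack)
            self.text_by_path[path] = self.text_by_path.get(path, "") + content


doc = (
    b"<address>"
    b"<name><first>Mary</first><last>McGoon</last></name>"
    b"<city>Sheboygan</city>"
    b"</address>"
)

handler = PathTrackingHandler()
xml.sax.parseString(doc, handler)
print(handler.text_by_path)
```

Anything not stored in `text_by_path` is gone once the parse finishes, which is the sense in which SAX events are not permanent.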

The main feature of JDOM is that it greatly reduces the amount of code you have to write. Although this introductory tutorial doesn't discuss programming topics in depth, JDOM applications are typically one-third as long as DOM applications, and about half as long as SAX applications.

DOM purists, of course, suggest that learning and using the DOM is good discipline that will pay off in the long run. JDOM doesn't do everything, but for most of the parsing you want to do, it's probably just the thing. There are also methods that allow you to control whether the underlying parser is namespace-aware and whether it uses a DTD or schema to validate the XML document.

Which interface is right for you?

To determine which programming interface is right for you, you need to understand the design points of all of the interfaces, and you need to understand what your application needs to do with the XML documents you're going to process. Consider these questions to help you find the right approach.

Will your application be written in Java?

How will your application be deployed? If your application is going to be deployed as a Java applet, and you want to minimize the amount of downloaded code, keep in mind that SAX parsers are smaller than DOM parsers.

Once you parse the XML document, will you need to access that data many times? When a SAX event is fired, it's up to you the developer to save it somehow if you need it later.


If you need to access an event you didn't save, you have to parse the file again. DOM saves all of the data automatically.

Do you need just a few things from the XML source? SAX doesn't create objects for everything in the source document; you can decide what's important. With SAX, you can look at each event to see if it's relevant to your needs, then process it appropriately. Even better, once you've found what you're looking for, your code can throw an exception to stop the SAX parser altogether.

Are you working on a machine with very little memory? If so, SAX is your best choice, despite all the other factors that you might consider.

In addition to the base XML standard, other standards define schemas, style sheets, links, Web services, security, and other important items. This section covers the most popular standards for XML, and points you to references to find other standards.

Here is an example document. This type of structural validation is a core feature of XML and can easily be performed using any validating XML parser.

In addition to structural validation, it is also necessary to validate the contents of the publication logically. This ensures that elements in the DTD have been used in a consistent and correct manner. The content validation step is particularly important when the content originates from many sources.

An exception included Microsoft PowerPoint presentations, which had to be converted to the GCAPaper DTD structure before they could be included in the conference proceedings publication. Further validation of all papers was then required to ensure they adhered to specific authoring guidelines for the DTD. The authoring guidelines accompanying the GCAPaper DTD specify the correct usage of elements in the DTD and also define naming conventions for cross-references and images used within each paper.

Validation of authoring guidelines is especially important for conference proceedings as a variety of authoring tools are used to produce papers.

Once all conference papers were received and validated, they were imported into a master document representing the conference proceedings publication.

Step 3: Producing electronic publication formats from XML

XML is predominantly used to define markup vocabularies for a specific application domain and, unlike HTML, has no default formatting styles. Instead, stylesheets are used to associate presentational information with XML documents.

These steps are discussed in the following sections and are applicable to the production of any publication (electronic or print) with XML.

This means that the XML data is actually part of the script itself and can be manipulated in much the same way as other data types such as arrays, strings, and objects.

In short, E4X makes working with XML data a much more flexible process and speeds up the parsing of data since it exists as a primitive data type rather than an external resource.
