Reality provides five DataBasic subroutines to extract data from XML documents. You must:
You extract data from XML documents by running queries against them. There are three ways of doing this: running a single query against a single XML document, running multiple queries against the same XML document and running the same query against multiple XML documents.
The subroutine XML.QUERY allows you to execute a single query against an XML document. You must provide strings containing the XML document and the query, and variables in which to return an error message and the extracted data.
For example, if you have copied your XML document to the variable XMLDOC and your query to the variable XMLQ, you would call XML.QUERY as follows:
CALL XML.QUERY(EMSG, ANS, XMLDOC, XMLQ)
On return, ANS will contain the data extracted from the XML document, and EMSG will normally be an empty string (if an error occurred, EMSG would contain an appropriate error message).
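If you have a large XML document and want to run several queries against it, calling XML.QUERY for each query can be inefficient, because each call parses the whole document again. Instead, you can parse the XML document once and then run each query against the parsed document: call XML.INIT to obtain a handle to an XML parser, call XML.PARSE to parse the document, and then, for each query, call XML.PARSE.QUERY to parse the query and XML.EXTRACT to retrieve the result.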
For example, with your XML document in the variable XMLDOC and queries in the variables XMLQ1, XMLQ2 and XMLQ3:
CALL XML.INIT(RESULT, HANDLE)               ; * Get a reference to an XML parser
CALL XML.PARSE(RESULT, HANDLE, XMLDOC)      ; * Parse the XML document
CALL XML.PARSE.QUERY(RESULT, HANDLE, XMLQ1) ; * Parse query 1
CALL XML.EXTRACT(ANS1, HANDLE)              ; * Get the result
CALL XML.PARSE.QUERY(RESULT, HANDLE, XMLQ2) ; * Parse query 2
CALL XML.EXTRACT(ANS2, HANDLE)              ; * Get the result
CALL XML.PARSE.QUERY(RESULT, HANDLE, XMLQ3) ; * Parse query 3
CALL XML.EXTRACT(ANS3, HANDLE)              ; * Get the result
On completion, the results of the three queries will be available in ANS1, ANS2 and ANS3.
If you wish, you can parse an XML query once and then run the same query against different XML documents:
An XML query must follow the structure of the XML document you will run it against. For instance, if you have the following XML document:
<criminals>
  <record>
    <name>blogs</name>
    <aliases>
      <alias>joe</alias>
      <alias>jimmy</alias>
    </aliases>
  </record>
  <record>
    <name>smith</name>
    <aliases>
      <alias>fred</alias>
      <alias>freda</alias>
    </aliases>
  </record>
</criminals>
and want to find the aliases used by a person with the name "blogs", you would need the following query template:
<criminals>
  <record>
    <name>blogs</name>
    <aliases>
      <alias>%1%</alias>
    </aliases>
  </record>
</criminals>
This extracts the contents of <alias> tags that are children of <aliases> tags that follow <name> tags with the value "blogs" (note that there will only be a match if the <name> and <aliases> tags are at the correct place in the document structure). The sequence "%1%" marks the data to be extracted and specifies that it should be placed in attribute 1 of the result. If there is more than one match for a particular data item, all the matching values are placed in the same result attribute, separated by value marks. For the document and query shown above, the result is "joe]jimmy", where "]" represents a value mark.
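To make these matching rules concrete, here is a simplified model in Python. It is not Reality's implementation and rests on several assumptions: query and document nodes are compared by tag name and trimmed text, query nodes that have siblings are matched in document order, a query node with no siblings may match several document nodes, values from a subtree that fails to match are discarded, and the value and attribute marks are characters 253 and 254. The function names run_query, match_node and match_children are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

VM = "\xfd"  # value mark, char 253 (displayed as "]")
AM = "\xfe"  # attribute mark, char 254
PARAM = re.compile(r"^%(\d+)%$")  # an extraction sequence such as %1%


def text_of(node):
    return (node.text or "").strip()


def match_node(q, d, found):
    """Hypothetical model: match document node d against query node q.

    Values extracted from a subtree are kept only if the whole subtree
    matches; on failure, `found` is left unchanged."""
    if q.tag != d.tag:
        return False
    local = {}
    m = PARAM.match(text_of(q))
    if m:
        local.setdefault(int(m.group(1)), []).append(text_of(d))
    elif text_of(q) and text_of(q) != text_of(d):
        return False
    if not match_children(list(q), list(d), local):
        return False
    for param, values in local.items():
        found.setdefault(param, []).extend(values)
    return True


def match_children(qkids, dkids, found):
    if not qkids:
        return True
    if len(qkids) == 1:
        # A query node with no siblings may match several document nodes.
        results = [match_node(qkids[0], d, found) for d in dkids]
        return any(results)
    # Sequential mode: find each query node, in order, after the previous match.
    pos = 0
    for qk in qkids:
        while pos < len(dkids) and not match_node(qk, dkids[pos], found):
            pos += 1
        if pos == len(dkids):
            return False
        pos += 1
    return True


def run_query(doc_xml, query_xml):
    """Return a dynamic array: attribute n holds the values for %n%."""
    found = {}
    match_node(ET.fromstring(query_xml), ET.fromstring(doc_xml), found)
    if not found:
        return ""
    return AM.join(VM.join(found.get(n, [])) for n in range(1, max(found) + 1))


DOC = """<criminals>
  <record><name>blogs</name>
    <aliases><alias>joe</alias><alias>jimmy</alias></aliases></record>
  <record><name>smith</name>
    <aliases><alias>fred</alias><alias>freda</alias></aliases></record>
</criminals>"""

QUERY = """<criminals><record><name>blogs</name>
  <aliases><alias>%1%</alias></aliases></record></criminals>"""

result = run_query(DOC, QUERY)
print(result.replace(VM, "]"))  # joe]jimmy
```

Replacing "blogs" with "smith" in the query returns fred and freda instead, because the first <record> then fails at its <name> node and its partial results are dropped.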
You can also extract the values of XML attributes. For example, the following query extracts the name of the service using port 5001. The data will be placed in attribute 2 of the result:
<server>
  <services>
    <service port="5001" name="%2%" />
  </services>
</server>
Note: Fixed, defining attributes must appear in the query line before attributes containing extraction sequences. For example, in the above query, port="5001" appears before name="%2%".
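Attribute extraction can be sketched with a simplified Python model. It is not Reality's implementation: it checks every fixed attribute of a query tag before extracting any %n% attribute (the order the note above asks you to write them in), returns an empty value for an extraction attribute missing from the document, and otherwise follows the same assumed matching rules as the element-text case. The sample server document, including the "Time" service on port 5000, is hypothetical; only the query comes from the text above.

```python
import re
import xml.etree.ElementTree as ET

VM = "\xfd"  # value mark, char 253 (displayed as "]")
AM = "\xfe"  # attribute mark, char 254
PARAM = re.compile(r"^%(\d+)%$")


def text_of(node):
    return (node.text or "").strip()


def match_node(q, d, found):
    """Hypothetical model: match document node d against query node q."""
    if q.tag != d.tag:
        return False
    local = {}
    # Fixed, defining attributes must match exactly...
    for name, value in q.attrib.items():
        if not PARAM.match(value) and d.get(name) != value:
            return False
    # ...and only then are %n% attributes extracted (missing ones give "").
    for name, value in q.attrib.items():
        m = PARAM.match(value)
        if m:
            local.setdefault(int(m.group(1)), []).append(d.get(name, ""))
    m = PARAM.match(text_of(q))
    if m:
        local.setdefault(int(m.group(1)), []).append(text_of(d))
    elif text_of(q) and text_of(q) != text_of(d):
        return False
    if not match_children(list(q), list(d), local):
        return False
    for param, values in local.items():
        found.setdefault(param, []).extend(values)
    return True


def match_children(qkids, dkids, found):
    if not qkids:
        return True
    if len(qkids) == 1:
        # A query node with no siblings may match several document nodes.
        results = [match_node(qkids[0], d, found) for d in dkids]
        return any(results)
    pos = 0  # sequential mode: query nodes must be found in document order
    for qk in qkids:
        while pos < len(dkids) and not match_node(qk, dkids[pos], found):
            pos += 1
        if pos == len(dkids):
            return False
        pos += 1
    return True


def run_query(doc_xml, query_xml):
    found = {}
    match_node(ET.fromstring(query_xml), ET.fromstring(doc_xml), found)
    if not found:
        return ""
    return AM.join(VM.join(found.get(n, [])) for n in range(1, max(found) + 1))


# Hypothetical sample document.
SERVER_DOC = """<server>
  <services>
    <service port="5000" name="Time"/>
    <service port="5001" name="MultiGateWay"/>
  </services>
</server>"""

QUERY = """<server> <services> <service port="5001" name="%2%" /> </services> </server>"""

result = run_query(SERVER_DOC, QUERY)
print(result.split(AM))  # ['', 'MultiGateWay']: attribute 1 is empty
```

Because the data is placed in attribute 2, attribute 1 of the result is present but empty.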
One way of writing a query is to start with a typical XML document and remove the tags and attributes that are not needed. Then mark the data you require with %attributeNumber%.
If required, you can change the way in which the query is processed and the results presented. This is done by including the attribute __nis_cont in the query tag from which you want the change to take effect. For example, in the following query, the contents of the <services> tag are processed in parallel.
<server>
  <services __nis_cont="P">
    <service name="Time" port="%1%"/>
    <service name="MultiGateWay" port="%2%"/>
  </services>
</server>
The following __nis_cont options are available:
P Process this and lower levels in parallel. This option can be combined with the T option.
S Revert to sequential mode processing.
T Tabulate the results by including additional value marks where necessary.
D Diagnostic mode. This option can be combined with any of the others.
Queries are normally processed sequentially. The nodes in the document are searched until one matches the current query node; the search then moves on to the next query node and continues from that point in the document. Therefore, for a match to occur, the nodes in the document must appear in the same order as in the query. The only exception is a query node that has no sibling nodes: such a node is allowed to match multiple document nodes at the same level.
In parallel mode, each node in the XML document is compared with every node at the same level in the query. The document nodes do not therefore have to appear in the same order as in the query (see Example 5).
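The difference between the two modes can be sketched in Python. This is a simplified model, not Reality's implementation: it honours only the P and S options of __nis_cont, and in parallel mode it assumes that every query node must match at least one document node for that level to succeed. The sample document, in which the services appear in the opposite order to the __nis_cont="P" query shown above, and its port numbers are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

VM = "\xfd"  # value mark, char 253
AM = "\xfe"  # attribute mark, char 254
PARAM = re.compile(r"^%(\d+)%$")
CONTROL = "__nis_cont"


def text_of(node):
    return (node.text or "").strip()


def match_node(q, d, found, parallel):
    """Hypothetical model of matching document node d against query node q."""
    if q.tag != d.tag:
        return False
    opts = q.get(CONTROL, "")  # P or S applies to this level and below
    if "P" in opts:
        parallel = True
    elif "S" in opts:
        parallel = False
    attrs = {k: v for k, v in q.attrib.items() if k != CONTROL}
    for k, v in attrs.items():  # fixed attributes must match
        if not PARAM.match(v) and d.get(k) != v:
            return False
    local = {}
    for k, v in attrs.items():  # then extract %n% attributes
        m = PARAM.match(v)
        if m:
            local.setdefault(int(m.group(1)), []).append(d.get(k, ""))
    m = PARAM.match(text_of(q))
    if m:
        local.setdefault(int(m.group(1)), []).append(text_of(d))
    elif text_of(q) and text_of(q) != text_of(d):
        return False
    if not match_children(list(q), list(d), local, parallel):
        return False
    for param, values in local.items():
        found.setdefault(param, []).extend(values)
    return True


def match_children(qkids, dkids, found, parallel):
    if not qkids:
        return True
    if parallel:
        # Compare every document node with every query node at this level.
        matched = set()
        for d in dkids:
            for i, qk in enumerate(qkids):
                if match_node(qk, d, found, parallel):
                    matched.add(i)
        return len(matched) == len(qkids)
    if len(qkids) == 1:
        results = [match_node(qkids[0], d, found, parallel) for d in dkids]
        return any(results)
    pos = 0  # sequential: query nodes must be found in document order
    for qk in qkids:
        while pos < len(dkids) and not match_node(qk, dkids[pos], found, parallel):
            pos += 1
        if pos == len(dkids):
            return False
        pos += 1
    return True


def run_query(doc_xml, query_xml):
    found = {}
    match_node(ET.fromstring(query_xml), ET.fromstring(doc_xml), found, False)
    if not found:
        return ""
    return AM.join(VM.join(found.get(n, [])) for n in range(1, max(found) + 1))


DOC = """<server><services>
  <service name="MultiGateWay" port="5001"/>
  <service name="Time" port="5000"/>
</services></server>"""

SEQUENTIAL_QUERY = """<server><services>
  <service name="Time" port="%1%"/>
  <service name="MultiGateWay" port="%2%"/>
</services></server>"""
PARALLEL_QUERY = SEQUENTIAL_QUERY.replace("<services>", '<services __nis_cont="P">')

for label, query in (("sequential", SEQUENTIAL_QUERY), ("parallel", PARALLEL_QUERY)):
    print(label, run_query(DOC, query).split(AM))
# sequential ['']
# parallel ['5000', '5001']
```

In sequential mode the Time query node consumes the last <service> node, so nothing is left for MultiGateWay and the whole <services> level fails; in parallel mode both values are found.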
In result tabulation mode, the results are returned in a form that is suitable for display in a table. This is done by including additional value marks as placeholders for any missing values (see Example 6 and Example 7).
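The effect of tabulation on the shape of the result can be sketched in Python. The sketch does not model query matching at all: it assumes the matching stage has already produced one row per repeated document node (here, hypothetical services), mapping each parameter number to the single value found for it. build_result is a hypothetical name, and the attribute mark is shown as "^" purely for display.

```python
VM = "\xfd"  # value mark, char 253 (displayed as "]")
AM = "\xfe"  # attribute mark, char 254 (displayed here as "^")


def build_result(rows, tabulate):
    """Assemble a dynamic array from per-match rows (hypothetical model).

    Without tabulation, a missing value is simply skipped. With tabulation,
    an empty value is kept as a placeholder, so value n of every attribute
    comes from the same row."""
    last = max((p for row in rows for p in row), default=0)
    attributes = []
    for p in range(1, last + 1):
        if tabulate:
            values = [row.get(p, "") for row in rows]
        else:
            values = [row[p] for row in rows if p in row]
        attributes.append(VM.join(values))
    return AM.join(attributes)


# Hypothetical rows: three services, the second with no port recorded.
rows = [{1: "Time", 2: "5000"}, {1: "Echo"}, {1: "MultiGateWay", 2: "5001"}]

for tab in (False, True):
    print(build_result(rows, tab).replace(VM, "]").replace(AM, "^"))
# Time]Echo]MultiGateWay^5000]5001
# Time]Echo]MultiGateWay^5000]]5001
```

With tabulation, the empty second value of attribute 2 keeps port 5001 aligned with MultiGateWay, which is what makes the result usable as table columns.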
Diagnostic mode allows you to see how your XML document is processed against your query. For example, if you run this query
<criminals>
  <record __nis_cont="D">
    <name>blogs</name>
    <aliases>
      <alias>%1%</alias>
    </aliases>
  </record>
</criminals>
against the XML document shown in Writing an XML Query, the following will be displayed:
Looking for node <name> OK
Looking for text node 'blogs' OK
Looking for node <aliases> OK
Looking for node <alias> OK
Looking for text node '%1%' Set param 1 to 'joe'
Looking for node <alias> OK
Looking for text node '%1%' Set param 1 to 'jimmy'
Looking for node <record __nis_cont="D"> OK
Looking for node <name> OK
Looking for text node 'blogs' got text 'smith'
Looking for node <name> got node <aliases>
The topic XML Extraction Examples shows various types of XML query.