XML Extraction

Reality provides five DataBasic subroutines to extract data from XML documents: XML.QUERY, XML.INIT, XML.PARSE, XML.PARSE.QUERY and XML.EXTRACT.

You extract data from XML documents by running queries against them. There are three ways of doing this: running a single query against a single XML document, running multiple queries against the same XML document and running the same query against multiple XML documents.

Running a Single Query

The subroutine XML.QUERY allows you to execute a single query against an XML document. You must provide strings containing the XML document and the query, and variables in which to return any error message and the extracted data.

For example, if you have copied your XML document to the variable XMLDOC and your query to the variable XMLQ, you would call XML.QUERY as follows:

CALL XML.QUERY(EMSG, ANS, XMLDOC, XMLQ)

On return, ANS will contain the data extracted from the XML document, and EMSG will normally be an empty string. If an error occurs, EMSG will contain an appropriate error message.
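
As a minimal sketch, the call might be wrapped with an error check as follows. This assumes that XMLDOC and XMLQ have already been loaded and relies only on the EMSG behaviour described above; the messages displayed are illustrative:

```basic
CALL XML.QUERY(EMSG, ANS, XMLDOC, XMLQ)
IF EMSG # '' THEN
   * An error occurred; EMSG holds the error message.
   CRT 'XML query failed: ':EMSG
END ELSE
   * No error; show the data returned in attribute 1 of the result.
   CRT 'Attribute 1 of the result: ':ANS<1>
END
```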

Running Multiple Queries against the same XML Document

If you have a large XML document and want to make multiple queries, it can be inefficient to use XML.QUERY. Instead, you can parse the XML document once and then run each query against the parsed document. This is done as follows:

  1. Call the XML.INIT subroutine to obtain a reference to an XML parser object.
  2. Call XML.PARSE to parse your XML document.
  3. For each query, call XML.PARSE.QUERY to parse your query template and then call XML.EXTRACT to return the results.

For example, with your XML document in the variable XMLDOC and queries in the variables XMLQ1, XMLQ2 and XMLQ3:

CALL XML.INIT(RESULT, HANDLE)                ;* Get a reference to an XML parser
CALL XML.PARSE(RESULT, HANDLE, XMLDOC)       ;* Parse the XML document

CALL XML.PARSE.QUERY(RESULT, HANDLE, XMLQ1)  ;* Parse query 1
CALL XML.EXTRACT(ANS1, HANDLE)               ;* Get the result

CALL XML.PARSE.QUERY(RESULT, HANDLE, XMLQ2)  ;* Parse query 2
CALL XML.EXTRACT(ANS2, HANDLE)               ;* Get the result

CALL XML.PARSE.QUERY(RESULT, HANDLE, XMLQ3)  ;* Parse query 3
CALL XML.EXTRACT(ANS3, HANDLE)               ;* Get the result

On completion, the results of the three queries will be available in ANS1, ANS2 and ANS3.

Running the same Query against Multiple XML Documents

If you wish, you can parse an XML query once and then run the same query against different XML documents:

  1. Call the XML.INIT subroutine to obtain a reference to an XML parser object.
  2. Call XML.PARSE.QUERY to parse your XML query.
  3. For each document, call XML.PARSE and then call XML.EXTRACT to return the results.
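
The steps above might be sketched as follows. The variable names XMLQ, XMLDOC1, XMLDOC2, ANS1 and ANS2 are illustrative, and the argument order of each subroutine is assumed to be the same as when running multiple queries against one document:

```basic
* Get a reference to an XML parser
CALL XML.INIT(RESULT, HANDLE)
* Parse the query once
CALL XML.PARSE.QUERY(RESULT, HANDLE, XMLQ)

* Parse document 1 and get the result
CALL XML.PARSE(RESULT, HANDLE, XMLDOC1)
CALL XML.EXTRACT(ANS1, HANDLE)

* Parse document 2 and get the result
CALL XML.PARSE(RESULT, HANDLE, XMLDOC2)
CALL XML.EXTRACT(ANS2, HANDLE)
```

On completion, ANS1 and ANS2 should hold the results of running the query against each document.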

Writing an XML Query

An XML query must have a similar structure to the XML document you will run the query against. For instance, if you have the following XML document:

<criminals>
  <record>
    <name>blogs</name>
    <aliases>
      <alias>joe</alias>
      <alias>jimmy</alias>
    </aliases>
  </record>
  <record>
    <name>smith</name>
    <aliases>
      <alias>fred</alias>
      <alias>freda</alias>
    </aliases>
  </record>
</criminals>

and want to find the aliases used by a person with the name "blogs", you would need the following query template:

<criminals>
  <record>
    <name>blogs</name>
    <aliases>
      <alias>%1%</alias>
    </aliases>
  </record>
</criminals>

This extracts the contents of <alias> tags that are children of <aliases> tags that follow <name> tags with the value "blogs" (note that there will only be a match if the <name> and <aliases> tags are at the correct place in the document structure). The sequence "%1%" marks the data to be extracted and specifies that it should be placed in attribute 1 of the result. If there is more than one match for a particular data item, the matches are returned as multiple values in the result attribute, separated by value marks. In the case of the document and query shown above, the result will be "joe]jimmy" (where ] represents a value mark).
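
To work through the individual values in DataBasic, you could split the result attribute on value marks. The following is a sketch only: it assumes that ANS has been returned by XML.QUERY or XML.EXTRACT for the query above, and the variable names ALIASES, NUM and I are illustrative:

```basic
* CHAR(253) is the value mark
VM = CHAR(253)
ALIASES = ANS<1>
IF ALIASES = '' THEN
   CRT 'No aliases found'
END ELSE
   * Number of values = number of value marks + 1
   NUM = COUNT(ALIASES, VM) + 1
   FOR I = 1 TO NUM
      CRT 'Alias ':I:': ':ALIASES<1,I>
   NEXT I
END
```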

You can also extract the values of XML attributes. For example, the following query extracts the name of the service using port 5001. The data will be placed in attribute 2 of the result:

<server>
  <services>
    <service port="5001" name="%2%" />
  </services>
</server>

Note: Fixed, defining attributes must appear in the query tag before attributes containing extraction sequences. For example, in the query above, port="5001" appears before name="%2%".
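
A sketch of running this query from DataBasic is shown below. It assumes that XMLDOC already contains a matching <server> document. The query is written on a single line on the assumption that white space between tags is not significant, and it is enclosed in single quotes because it contains double quotes:

```basic
XMLQ = '<server><services><service port="5001" name="%2%" /></services></server>'
CALL XML.QUERY(EMSG, ANS, XMLDOC, XMLQ)
IF EMSG = '' THEN
   * The service name is placed in attribute 2, as requested by %2%
   CRT 'Service on port 5001: ':ANS<2>
END ELSE
   CRT 'XML query failed: ':EMSG
END
```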

One way of writing a query is to start with a typical XML document and remove the tags and attributes that are not needed. Then mark the data you require with %attributeNumber%.

Processing Options

If required, you can change the way in which the query is processed and the results presented. This is done by including the attribute __nis_cont in the query tag from which you want the change to take effect. For example, in the following query, the contents of the <services> tag are processed in parallel:

<server>
   <services __nis_cont="P">
      <service name="Time" port="%1%"/>
      <service name="MultiGateWay" port="%2%"/>
   </services>
</server>

The following __nis_cont options are available:

P Process this and lower levels in parallel. This option can be combined with the T option.

S Revert to sequential mode processing.

T Tabulate the results by including additional value marks where necessary.

D Diagnostic mode. This option can be combined with any of the others.

Sequential Mode

Queries are normally parsed sequentially. The nodes in the document are searched until one matches the current query node; then the query moves to the next node and the search continues. Therefore, for a match to occur, the nodes in the document must appear in the same order as in the query. The only exception to this is when the query node has no sibling nodes; the query node is then allowed to match multiple document nodes at the same level.

Parallel Mode

In parallel mode, each node in the XML document is compared with every node at the same level in the query. The document nodes therefore do not have to appear in the same order as in the query (see Example 5).

Result Tabulation

In result tabulation mode, the results are returned in a form that is suitable for display in a table. This is done by including additional value marks as placeholders for any missing values (cf. Example 6 and Example 7).
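
A tabulated result can then be read row by row. The sketch below assumes a query that returned data in attributes 1 and 2 with tabulation in effect, so that the values in each attribute line up by position. The variable names ROWS and I are illustrative:

```basic
* CHAR(253) is the value mark
VM = CHAR(253)
IF ANS # '' THEN
   * One row per value position; attribute 1 is taken to set the row count
   ROWS = COUNT(ANS<1>, VM) + 1
   FOR I = 1 TO ROWS
      * A missing value appears as an empty placeholder in its column
      CRT ANS<1,I>:' | ':ANS<2,I>
   NEXT I
END
```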

Diagnostic Mode

Diagnostic mode allows you to see how your XML document is processed against your query. For example, if you run this query

<criminals>
  <record __nis_cont="D">
    <name>blogs</name>
    <aliases>
      <alias>%1%</alias>
    </aliases>
  </record>
</criminals>

against the XML document shown in Writing an XML Query, the following will be displayed:

Looking for node <name> OK
Looking for text node 'blogs' OK
Looking for node <aliases> OK
Looking for node <alias> OK
Looking for text node '%1%' Set param 1 to 'joe'
Looking for node <alias> OK
Looking for text node '%1%' Set param 1 to 'jimmy'
Looking for node <record __nis_cont="D"> OK
Looking for node <name> OK
Looking for text node 'blogs' got text 'smith'
Looking for node <name> got node <aliases>

Examples

The topic XML Extraction Examples shows various types of XML query.
