Reading XML files

For reading XML files, there is a step named Get data from XML. To specify which fields to read from the file, you do two things:

  1. First, select the path that identifies the current node. Ideally, this is the repeating node in the file. You select the path by filling in the Loop XPath textbox in the Content tab.
  2. Then specify the fields to get. You do this by filling in the grid in the Fields tab, using XPath notation. Each location is relative to the path indicated in the Content tab.
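To make the relationship between the loop path and the relative field paths concrete, here is a small Python sketch that mimics the idea outside PDI. It is an analogy only, not how the step is implemented. It assumes Python's standard xml.etree.ElementTree module, which supports only a subset of XPath, so the loop path is written relative to the document element ("./customer") instead of as an absolute path. The sample data, the function name read_xml_rows, and the "@attr" convention for attributes are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file: <customer> is the repeating node.
SAMPLE_XML = """<customers>
  <customer id="1">
    <name>Ann</name>
    <city>Rome</city>
  </customer>
  <customer id="2">
    <name>Bob</name>
    <city>Oslo</city>
  </customer>
</customers>"""


def read_xml_rows(xml_text, loop_path, fields):
    """Return one dict (row) per node matched by loop_path.

    fields maps an output field name to a path relative to the loop node.
    A path starting with '@' reads an attribute of the loop node itself
    (a convention made up for this sketch).
    """
    root = ET.fromstring(xml_text)
    rows = []
    # The loop path plays the role of Loop XPath: one row per matching node.
    for node in root.findall(loop_path):
        row = {}
        for name, rel_path in fields.items():
            if rel_path.startswith("@"):
                # Attribute of the current loop node
                row[name] = node.get(rel_path[1:])
            else:
                # Text of a child element, located relative to the loop node
                row[name] = node.findtext(rel_path)
        rows.append(row)
    return rows


rows = read_xml_rows(SAMPLE_XML, "./customer", {"id": "@id", "name": "name", "city": "city"})
for row in rows:
    print(row)
```

Each dictionary corresponds to one output row: the loop path decides how many rows you get, and the field paths decide which values fill each row.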

Get data from XML is the step you will use to read XML structures in most cases. However, when the data structures in your files are very big or complex, or when the file itself is very large, there is an alternative step: XML Input Stream (StAX). Because it reads the file as a stream instead of building the whole document in memory, this step can process data quickly regardless of file size, and it is also very flexible for reading complex XML structures.
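The XML Input Stream (StAX) step is named after StAX, the Java streaming API for XML. The sketch below illustrates the same general streaming idea in Python with xml.etree.ElementTree.iterparse; it is not the step's implementation. It assumes a hypothetical file where <customer> is the repeating element, and the function name stream_rows is made up for illustration. Records are handled one at a time and then discarded, so memory use stays roughly constant as the file grows.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input; in practice this would be a large file on disk.
SAMPLE_XML = b"""<customers>
  <customer id="1"><name>Ann</name></customer>
  <customer id="2"><name>Bob</name></customer>
</customers>"""


def stream_rows(source, repeating_tag):
    """Yield (id, name) for each repeating element without keeping the whole tree."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the document element; keep a handle on it.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == repeating_tag:
            # The element and its children are complete at its "end" event.
            yield elem.get("id"), elem.findtext("name")
            # Drop already-processed records so memory does not grow with file size.
            root.clear()


rows = list(stream_rows(io.BytesIO(SAMPLE_XML), "customer"))
print(rows)  # [('1', 'Ann'), ('2', 'Bob')]
```

The trade-off is the same one the two PDI steps represent: a full in-memory tree is convenient for arbitrary XPath lookups, while a streaming reader only sees each part of the document as it passes by, which is what keeps it fast and memory-friendly on very large files.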

Earlier in the chapter, we showed how to read a very simple XML file with the first of these two steps. In the next chapter, we will look at XML structures in more detail.