- Learning Pentaho Data Integration 8 CE (Third Edition)
- María Carina Roldán
- 2025-04-04 17:49:50
Understanding the PDI rowset
Transformations deal with datasets or rowsets, that is, rows of data with predefined metadata. The metadata describes the structure of the data, that is, the list of fields and their definitions. The following table describes the metadata of a PDI dataset:

These metadata concepts shouldn't be new to you. Let's look at some examples:
- Recall the Hello World Transformation. You created a dataset with a Data Grid step. In the main tab, you defined the metadata (in that case you had only one field) and in the second tab, you entered the rows with data.
- In the Transformation of projects, you defined the dataset with a CSV file input step. In the grid, you defined the metadata: one string and two dates. In this case, the data was read from a file.
In Spoon, data is presented in a tabular form, where:
- Each column represents a field.
- Each row corresponds to a given member of the dataset. All rows in a dataset share the same metadata definition, that is, all rows have the same fields in the same order.
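The idea that all rows share one metadata definition can be sketched outside of PDI. The following Python snippet is only an illustration, not PDI code: the FieldMeta class and the validate_rowset function are hypothetical helpers, and the sample values are made up. It models the metadata as an ordered list of field definitions and checks that every row has the same fields in the same order:

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, List, Sequence


@dataclass(frozen=True)
class FieldMeta:
    """One field definition (hypothetical helper, not a PDI class)."""
    name: str
    py_type: type


def validate_rowset(metadata: Sequence[FieldMeta], rows: List[Sequence[Any]]) -> None:
    """Raise an error if any row does not match the shared metadata definition."""
    for index, row in enumerate(rows):
        if len(row) != len(metadata):
            raise ValueError(
                f"row {index}: expected {len(metadata)} values, got {len(row)}"
            )
        for field, value in zip(metadata, row):
            # None stands for a null value and is accepted for any field
            if value is not None and not isinstance(value, field.py_type):
                raise TypeError(
                    f"row {index}: field {field.name!r} expects {field.py_type.__name__}"
                )


# One string and two dates, as in the projects example; the values are made up
metadata = [
    FieldMeta("project_name", str),
    FieldMeta("start_date", date),
    FieldMeta("end_date", date),
]
rows = [
    ("Project A", date(2020, 1, 1), date(2020, 1, 31)),
    ("Project B", date(2020, 2, 1), None),
]
validate_rowset(metadata, rows)  # passes silently: every row fits the metadata
```

A row with a missing value, or with a string where a date is expected, would make validate_rowset raise an error. This mirrors the rule that a row cannot deviate from the metadata of its dataset.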
The following screenshot is an example of this. It is the result of the preview in the Calculator step in the Transformation of projects:

Sample rowset
In this case, you have four columns representing the four fields of your rowset: project_name, start_date, end_date, and diff_dates. You also have five rows of data, one for each project.
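As a rough sketch of how such a rowset is laid out, the following Python snippet builds a small table with the same four fields. It is not PDI code: the project names and dates are invented, the helper functions are hypothetical, and it assumes that diff_dates holds the number of days between start_date and end_date:

```python
from datetime import date

# Field names taken from the preview; the order of the columns matters
FIELDS = ("project_name", "start_date", "end_date", "diff_dates")


def add_diff_dates(rows):
    """Append the number of days between start_date and end_date to each row."""
    return [(name, start, end, (end - start).days) for name, start, end in rows]


def format_rowset(fields, rows):
    """Render a rowset as text: a header line of field names, then one line per row."""
    lines = [" | ".join(fields)]
    lines += [" | ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)


# Made-up input rows with three fields each
projects = [
    ("Project A", date(2020, 1, 1), date(2020, 1, 31)),
    ("Project B", date(2020, 2, 1), date(2020, 3, 1)),
]

rowset = add_diff_dates(projects)
print(format_rowset(FIELDS, rowset))
```

Each printed line after the header is one row, and each position in a row lines up with one field name, which is the same layout that the preview window shows.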
In the preview window of a rowset, you can see the field names and the data itself. If you move the mouse cursor over a column title (or click on any value in that column) and leave it there for a second, you will see a small pop-up telling you the data type of that field:

Column data type
To get the full details of the metadata, there is another option. Move the mouse cursor over the first of the Calculator steps and press the spacebar. A window named Step fields and their origin will appear:

Step fields and their origin
Alternatively, you can open this window from the contextual menu available in the mouseover assistance toolbar, or by right-clicking on the step. In either menu, select the Show output fields option.
As the name of the option suggests, it describes the fields leaving the step towards the next step. If you selected Show input fields instead, you would see the metadata of the incoming data, that is, data that left the previous step.
One of the columns in these windows is Step origin. This column gives the name of the step where each field was created or modified. It's easy to compare the input fields against the output fields of a step. For example, in the Calculator step, you created the field diff_dates. This field appears in the list of output fields of the step but not in the list of input fields, as expected.
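The comparison between input and output fields can be expressed as a simple list difference. The following Python sketch is illustrative only and does not use any PDI API: the created_fields function is a hypothetical helper, and the step names in the step_origin dictionary are assumptions based on the step types used in the example:

```python
def created_fields(input_fields, output_fields):
    """Return the fields that leave a step but did not enter it, in output order."""
    incoming = set(input_fields)
    return [name for name in output_fields if name not in incoming]


# Fields entering and leaving the Calculator step in the projects example
input_fields = ["project_name", "start_date", "end_date"]
output_fields = ["project_name", "start_date", "end_date", "diff_dates"]

# Assumed step names; in Spoon, the Step origin column shows the real ones
step_origin = {
    "project_name": "CSV file input",
    "start_date": "CSV file input",
    "end_date": "CSV file input",
    "diff_dates": "Calculator",
}

for name in created_fields(input_fields, output_fields):
    print(f"{name} was created in the step {step_origin[name]}")
```

With these lists, the only field reported is diff_dates, which matches what you see when you compare the Show input fields and Show output fields windows of the Calculator step.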