Book title: Learning Pentaho Data Integration 8 CE (Third Edition)
Author: María Carina Roldán
Updated: 2025-04-04 17:49:50

Reading only a subset of the file

In the main tutorial, we read the full file—all the rows and all the columns. What if you only need a subset of it?

If, for any reason, you don't want to read one or more of the trailing fields, such as PRICEEACH and SALES in our example, you don't have to put them in the grid. In that case, PDI will ignore them. On the other hand, even if you don't need a field in the middle, such as PRODUCTLINE, you cannot remove it from the grid, because the fields are matched by position and the remaining ones would be read from the wrong columns. Instead, you can remove the field later by using a Select values step.
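The positional behavior described above can be illustrated outside PDI with a minimal Python sketch. It is not PDI code: the helper `read_by_position` and the sample rows are hypothetical, and only the field names PRODUCTLINE, PRICEEACH, and SALES come from the example. It assumes a reader that maps the listed names onto columns strictly by position.

```python
import csv
import io

# Invented sample data for illustration only.
SAMPLE = (
    "ORDERNUMBER,PRODUCTLINE,QUANTITYORDERED,PRICEEACH,SALES\n"
    "10107,Motorcycles,30,95.70,2871.00\n"
    "10121,Classic Cars,34,81.35,2765.90\n"
)

ALL_FIELDS = ["ORDERNUMBER", "PRODUCTLINE", "QUANTITYORDERED", "PRICEEACH", "SALES"]


def read_by_position(text, field_names):
    """Map the listed names onto columns by position; extra columns are ignored."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line; names come from field_names instead
    return [dict(zip(field_names, row)) for row in reader]


# Leaving out trailing fields: the remaining names still line up.
trailing_dropped = read_by_position(SAMPLE, ALL_FIELDS[:3])

# Leaving out a middle field: every later name shifts one column to the left.
middle_dropped = read_by_position(
    SAMPLE, ["ORDERNUMBER", "QUANTITYORDERED", "PRICEEACH", "SALES"]
)

# Safer route: read every field, then discard the unwanted one afterwards,
# which is loosely what a Select values step does later in the flow.
selected = [
    {name: value for name, value in row.items() if name != "PRODUCTLINE"}
    for row in read_by_position(SAMPLE, ALL_FIELDS)
]

print(trailing_dropped[0])
print(middle_dropped[0])
print(selected[0])
```

In the second case, QUANTITYORDERED ends up holding the product line text, which is the kind of misread the paragraph warns about.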

As for the rows, the Content tab has a textbox named Limit that lets you set a maximum number of lines to read. Alternatively, instead of reading the first N lines, you may want to read only the rows that meet some conditions. Some steps allow you to filter the data, skip blank rows, read only the first N rows, and so on.
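To make the difference between limiting and filtering concrete, here is a small Python sketch, again outside PDI. The function `read_rows`, its parameters, and the sample data are hypothetical. Treating a limit of 0 as "no limit" and skipping blank lines before counting are assumptions of this sketch, not documented PDI behavior.

```python
import csv
import io
from itertools import islice

# Invented sample data, including one blank line.
ROWS = (
    "ORDERNUMBER,SALES\n"
    "10107,2871.00\n"
    "\n"
    "10121,2765.90\n"
    "10134,3884.34\n"
)


def read_rows(text, limit=0, skip_blank=True):
    """Return an iterator of rows as dicts; limit=0 means no limit in this sketch."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # csv.reader yields an empty list for a blank line.
    rows = (dict(zip(header, r)) for r in reader if r or not skip_blank)
    return islice(rows, limit) if limit > 0 else rows


# Limit: keep only the first two non-blank rows, whatever they contain.
first_two = list(read_rows(ROWS, limit=2))

# Condition: keep only the rows whose SALES value exceeds a threshold.
large = [r for r in read_rows(ROWS) if float(r["SALES"]) > 2800]

print([r["ORDERNUMBER"] for r in first_two])
print([r["ORDERNUMBER"] for r in large])
```

A limit depends only on a row's position in the file, while a condition depends on its content, so the two selections generally return different rows.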

If the criteria for keeping or discarding rows are more elaborate, you will most likely need some extra steps after the input step. You will learn to do this in the next chapters.