- Learning Pentaho Data Integration 8 CE (Third Edition)
- María Carina Roldán
- 2025-04-04 17:49:50
Reading spreadsheets
Spreadsheets are another very common kind of file used in Extract, Transform, and Load (ETL) processes. The PDI step for reading spreadsheets is Microsoft Excel Input. It supports both Excel 97-2003 (XLS) and Excel 2007 (XLSX) files and, despite its name, it can also read OpenOffice (ODS) files.
The main difference between this step and the steps that read plain files is that the Microsoft Excel Input step lets you specify which sheet to read. For each sheet, you provide its name as well as the row and column to start at.
Keep in mind that row and column numbers in a sheet start at 0.
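Because spreadsheet users usually think in A1-style references while the step expects zero-based numbers, it can help to see the conversion spelled out. The following is a minimal Python sketch, not PDI code: the function names `a1_to_zero_based` and `read_from` are hypothetical, and a sheet is modelled simply as a list of rows.

```python
import re
from typing import Any


def a1_to_zero_based(ref: str) -> tuple[int, int]:
    """Convert an A1-style cell reference such as "B3" into zero-based (row, column)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not an A1-style reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters form a bijective base-26 number: A=1, ..., Z=26, AA=27.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    # Spreadsheet rows and columns are 1-based; the step's start values are 0-based.
    return int(digits) - 1, col - 1


def read_from(grid: list[list[Any]], start_row: int, start_col: int) -> list[list[Any]]:
    """Return the part of a sheet (hypothetical list-of-rows model) from the start cell on."""
    return [row[start_col:] for row in grid[start_row:]]


# A sheet whose data table begins at cell B3, below a title row and a blank row.
sheet = [
    ["Report", None, None],
    [None, None, None],
    [None, "id", "name"],
    [None, 1, "Ana"],
]
start_row, start_col = a1_to_zero_based("B3")
print(start_row, start_col)                       # 2 1
print(read_from(sheet, start_row, start_col))     # [['id', 'name'], [1, 'Ana']]
```

So a table whose header sits in cell B3 would be configured with start row 2 and start column 1.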
You can read more than one sheet at a time, as long as they all share the same format. If you want to read all the sheets in the spreadsheet, or if you don't know the sheet names in the file, just leave the sheets grid empty.
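To make the "same format" requirement and the empty-grid behaviour concrete, here is a small Python sketch of the idea. It is only a model, not how PDI is implemented: `read_sheets` is a hypothetical function, the workbook is a plain dict of sheet name to rows, each sheet is assumed to have a header row at its start cell, and, when the sheets list is empty, every sheet is assumed to start at row 0, column 0.

```python
from typing import Any

Grid = list[list[Any]]


def read_sheets(workbook: dict[str, Grid],
                sheets: list[tuple[str, int, int]]) -> list[dict[str, Any]]:
    """Read (sheet name, start row, start column) entries into one stream of rows."""
    # An empty sheet list means "read every sheet" (assumed start cell: 0, 0).
    if not sheets:
        sheets = [(name, 0, 0) for name in workbook]
    header = None
    rows: list[dict[str, Any]] = []
    for name, start_row, start_col in sheets:
        block = [row[start_col:] for row in workbook[name][start_row:]]
        if not block:
            continue  # nothing to read at or after the start cell
        sheet_header, *data = block
        if header is None:
            header = sheet_header
        elif sheet_header != header:
            # All sheets must share the same layout to be read in one go.
            raise ValueError(f"sheet {name!r} does not share the same format")
        rows.extend(dict(zip(header, values)) for values in data)
    return rows


workbook = {
    "Jan": [["id", "amount"], [1, 10], [2, 20]],
    "Feb": [["id", "amount"], [3, 30]],
}
print(read_sheets(workbook, []))               # all sheets, Jan first, then Feb
print(read_sheets(workbook, [("Feb", 0, 0)]))  # [{'id': 3, 'amount': 30}]
```

Reading both sheets produces a single stream of rows, which only makes sense because "Jan" and "Feb" have identical headers; a sheet with a different layout is rejected.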
If you don't specify any sheet name, the button for getting fields will not work. If you want the fields to be retrieved automatically, you can configure the step with a sheet name just for this purpose. Once the fields are defined, you can safely remove the sheet name from the Sheet configuration tab.