Running transformations in an interactive fashion

So far, you have learned some basics about working with Spoon during the design process. Now you will continue learning about interacting with the tool.

First, we will create a Transformation, aiming to learn some new useful steps. After that, we will adapt that Transformation for inspecting the data as it is being created.

As you progress, feel free to preview the data that is being generated, even if you're not told to do so. This will help you understand what is going on. Testing each step as you move forward makes it easier to debug and craft a functional Transformation.

Let's start with the creation of the Transformation. The objective is to generate a dataset with all the dates within a given date range:

  1. Create a new Transformation.
  2. From the Input group of steps, drag to the canvas the Generate Rows step, and configure it as shown:

Configuring a Generate Rows step

Note that you have to change the default value for the Limit textbox, from 10 to 1.

  3. Close the window.
  4. From the Transform category of steps, add the Calculator step, and create a hop that goes from the Generate Rows step to this one.
  5. Double-click on the Calculator step and add the field named diff_dates as the difference between end_date and start_date. That is, configure it exactly the same way as you did in the previous section.
  6. Run a preview. You should see a single row with three fields: the start date, the end date, and a field with the number of days between both.
  7. Now add the Clone row step. You will find it inside the Utility group of steps.
  8. Create a hop from the Calculator step towards this new step.
  9. Edit the Clone row step.
  10. Select the Nr clone in field? option to enable the Nr Clone field textbox. In this textbox, type diff_dates.
  11. Now select the Add clone num to output? option to enable the Clone num field textbox. In this textbox, type delta.
  12. Run a preview. You should see the following:

Previewing cloned rows

  13. Add another Calculator step, and create a hop from the Clone row step to this one.
  14. Edit the new step, and add the field named a_single_date. As Calculation, select Date A + B Days. As Field A, select start_date and as Field B, select delta. Finally, as a Value type, select Date. For the rest of the columns, leave the default values.
  15. Run a final preview. You should see this:

 Previewing a range of dates

If you don't obtain the same results, check carefully that you followed the steps exactly as explained. If you hit errors in the middle of the section, you know how to deal with them. Take your time, read the log, fix the errors, and resume your work.
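To make the logic of the Transformation easier to follow, here is a minimal Python sketch of the same idea. It is only an analogy, not what PDI runs internally. The function name and the sample dates are hypothetical, and the sketch assumes that the Clone row step emits the original row with a clone number of 0 followed by the clones numbered from 1 to diff_dates:

```python
from datetime import date, timedelta


def date_range_rows(start_date: date, end_date: date) -> list[dict]:
    """Mimic Generate Rows -> Calculator -> Clone row -> Calculator."""
    # First Calculator: number of days between both dates.
    diff_dates = (end_date - start_date).days

    rows = []
    # Clone row (assumption): the original row gets delta 0,
    # the clones get delta 1..diff_dates.
    for delta in range(diff_dates + 1):
        rows.append(
            {
                "start_date": start_date,
                "end_date": end_date,
                "diff_dates": diff_dates,
                "delta": delta,
                # Second Calculator: Date A + B Days.
                "a_single_date": start_date + timedelta(days=delta),
            }
        )
    return rows


# Hypothetical sample range, chosen only for illustration.
sample_rows = date_range_rows(date(2023, 1, 1), date(2023, 1, 5))
for row in sample_rows:
    print(row["delta"], row["a_single_date"])
```

Each row carries the same start_date, end_date, and diff_dates, and only delta and a_single_date change from one row to the next, which is what the final preview should show as well.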

Now you will run the Transformation and inspect the data as the Transformation is being executed. Before doing that, we will make some changes to the Transformation so that it runs slowly, allowing us to see in detail what is happening:

  1. Edit the Generate Rows step and change the date range. As end_date, type 2023-12-31.
  2. From the Utility group of steps, drag to the work area the Delay row step. With this step, we will deliberately delay each row of data.
  3. Drag the step onto the hop between the Clone row step and the second Calculator step, until the hop changes its width:

 Inserting a step between two steps

  4. A window will appear asking you if you want to split the hop. Click on Yes. The hop will be split in two: one from the Clone row step to the Delay row step, and the second one from this step to the Calculator step.

You can configure PDI to split the hops automatically. You can do it by selecting the Don't ask again? checkbox in this same window, or by navigating to the Tools | Options... menu and checking the option Automatically split hops.

  5. Double-click on the Delay row step, and configure it using the following information: as Timeout, type 500, and in the drop-down list, select Milliseconds. Close the window.
  6. Save the Transformation and run it. You will see that it runs at a slower pace.
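The effect of the Delay row step can be pictured with a small Python generator that pauses before passing each row along. This is just a sketch of the behavior under that assumption, not PDI's implementation. The function name delay_rows is hypothetical, and the demo uses a short 10-millisecond timeout so that it finishes quickly:

```python
import time
from typing import Iterable, Iterator


def delay_rows(rows: Iterable[dict], timeout_ms: int) -> Iterator[dict]:
    """Pass every row through unchanged, pausing before each one."""
    for row in rows:
        time.sleep(timeout_ms / 1000)  # convert milliseconds to seconds
        yield row


# Hypothetical demo rows; a short timeout keeps the example fast.
demo_rows = [{"delta": n} for n in range(3)]

started = time.monotonic()
delayed = list(delay_rows(demo_rows, timeout_ms=10))
elapsed = time.monotonic() - started

print(f"{len(delayed)} rows in {elapsed:.3f} seconds")
```

The rows come out unchanged and in the same order. Only the pace changes, which is exactly what makes the running Transformation easier to observe.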

Now it is time to do the sniff testing, that is, looking at the rows that are coming into or out of a step in real time:

  1. Without stopping the execution, click on the second Calculator step. A pop-up window will show up describing the execution results of this step in real time. Ctrl-click two more steps: the Generate Rows step and the Clone row step. For each selected step, you will see the Step Metrics at runtime:

Runtime Step Metrics

  2. Now, let's inspect the data itself. Right-click on the second Calculator step and navigate to Sniff Test During Execution | Sniff test output rows. A window will appear showing the data as it's being generated.

In the Execution Results window, it's worth noting a column that we didn't mention before: Speed (r/s), which shows the processing speed of each step in rows per second.

As you put a delay of 500 milliseconds for each row, it's reasonable to see that the speed for the last step is two rows per second.
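The arithmetic behind that figure can be checked with a few lines of Python. The function names are hypothetical, the row count of 365 is only an illustrative assumption, and the estimate ignores any processing time other than the delay itself:

```python
def rows_per_second(delay_ms: float) -> float:
    """Upper bound on throughput when every row waits delay_ms."""
    return 1000 / delay_ms


def estimated_seconds(row_count: int, delay_ms: float) -> float:
    """Rough total running time if the delay dominates everything else."""
    return row_count * delay_ms / 1000


speed = rows_per_second(500)
duration = estimated_seconds(365, 500)  # 365 rows is a made-up example

print(speed)     # 2.0
print(duration)  # 182.5
```

In practice the measured speed may be slightly lower than two rows per second, because the steps also spend some time doing their actual work.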

Note that sniff testing slows down the Transformation and its use is recommended just for debugging purposes.

While the Transformation was running, you experimented with the feature for sniffing the output rows. In the same way, you could have selected the Sniff test input rows option to see the incoming rows of data.
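The difference between sniffing input rows and sniffing output rows can be illustrated with a Python generator that observes rows as they pass by without altering them. This is a loose analogy rather than a description of how Spoon implements the feature. The names sniff and add_single_date, as well as the sample rows, are hypothetical:

```python
from datetime import date, timedelta
from typing import Callable, Iterable, Iterator


def sniff(rows: Iterable[dict], on_row: Callable[[dict], None]) -> Iterator[dict]:
    """Report every row to on_row, then pass it along untouched."""
    for row in rows:
        on_row(row)
        yield row


def add_single_date(rows: Iterable[dict]) -> Iterator[dict]:
    """Stand-in for the second Calculator step (Date A + B Days)."""
    for row in rows:
        yield {**row, "a_single_date": row["start_date"] + timedelta(days=row["delta"])}


# Hypothetical rows as they might arrive at the second Calculator step.
incoming = [{"start_date": date(2023, 1, 1), "delta": d} for d in range(3)]

seen_in: list[dict] = []   # what "Sniff test input rows" would show
seen_out: list[dict] = []  # what "Sniff test output rows" would show

pipeline = sniff(add_single_date(sniff(incoming, seen_in.append)), seen_out.append)
result = list(pipeline)

print(seen_in[-1])
print(seen_out[-1])
```

The input side shows the rows before the new field exists, while the output side shows them with a_single_date already added.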

As an alternative to running previews on individual steps, you can use the continuous preview mode. Instead of running a preview, you can run the Transformation and see the output in the Preview data tab of the Execution Results window.