Pre-computed models can be used inside an R transformation. Such models of your data's behaviour are useful for predictions, among other things. Whatever your reason for using one, you can work with a pre-computed model in an R transformation through the standard R save() and load() functions.
You can also download a sample package for local development.
To show you how it works, let’s use an example in which we have a cashier-data table with the following data (full table):
number_of_items | time_spent_in_shop | … |
---|---|---|
11 | 452 | |
27 | 3006 | |
110 | 7456 | |
… |
The table contains some observed values of customers who visited the shop. Now, let’s find out how much time a customer with 40 items in their basket will spend in the shop. Create another table (cashier-data-predict) like the following one (full table):
number_of_items |
---|
40 |
… |
Only the second table will be used in the actual R transformation. Upload that table to your Storage.
First, it is necessary to get a file with the R model. A very simple model can be created and saved with a short R script that is meant to be executed outside Keboola, for example, on your local machine.
After executing the script, you get the time_model.rda binary file containing a very simple model of the dependency of the time_spent_in_shop column on the number_of_items column in the data (cashier-data.csv).
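A minimal sketch of such a script, assuming cashier-data.csv sits in the current working directory and has the two columns shown above. The helper name train_and_save is hypothetical; the model is stored in a variable named lm because the transformation later loads a variable of that name.

```r
# Fit a simple linear model of time_spent_in_shop on number_of_items
# and save it to a binary .rda file. Intended to run outside Keboola.
# train_and_save is a hypothetical helper name used for illustration.
train_and_save <- function(csv_path, model_path) {
  data <- read.csv(csv_path)
  # The variable is deliberately named `lm`: load() later restores
  # objects under the names they were saved with.
  lm <- stats::lm(time_spent_in_shop ~ number_of_items, data = data)
  save(lm, file = model_path)
  invisible(lm)
}

# Assumed file names; adjust them to your local setup.
if (file.exists("cashier-data.csv")) {
  train_and_save("cashier-data.csv", "time_model.rda")
}
```

Wrapping the steps in a function keeps the input and output paths explicit; the three statements inside it can equally be run as a plain top-level script.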
The second step is to save the model file to Keboola. Go to Storage – File uploads and upload the obtained file (time_model.rda); it should be marked as permanent, and a tag must be assigned to it. In this example, the file is given the tag predictionModel.
Finally, write the actual transformation. Create an R transformation, set the input and output mapping, and add the predictionModel tag to select stored files.
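Important: In the transformation, you reference only the file tag, not the actual uploaded file. The file selected by the tag is made available to the script in the in/user subdirectory under a name equal to the tag, with no extension (here, in/user/predictionModel).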
The transformation script uses the pre-computed model: the lm variable is loaded from the predictionModel file.
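A minimal sketch of such a transformation script, assuming the input mapping produces in/tables/cashier-data-predict.csv, the tagged file is available as in/user/predictionModel, and the output mapping reads out/tables/data-predicted.csv. The helper name predict_time is hypothetical.

```r
# Apply a pre-computed linear model to new data and write the result.
# predict_time is a hypothetical helper name used for illustration.
predict_time <- function(table_path, model_path, out_path) {
  data <- read.csv(table_path)
  # load() restores the saved object under its original name, `lm`.
  # Loading into a separate environment avoids confusing it with stats::lm.
  env <- new.env()
  load(model_path, envir = env)
  model <- env$lm
  # interval = "confidence" yields the columns fit, lwr and upr.
  predicted <- predict(model, newdata = data, interval = "confidence")
  result <- cbind(data["number_of_items"], as.data.frame(predicted))
  write.csv(result, out_path, row.names = FALSE)
  invisible(result)
}

# Assumed paths, relative to the working directory.
if (file.exists("in/tables/cashier-data-predict.csv") &&
    file.exists("in/user/predictionModel")) {
  dir.create("out/tables", recursive = TRUE, showWarnings = FALSE)
  predict_time("in/tables/cashier-data-predict.csv",
               "in/user/predictionModel",
               "out/tables/data-predicted.csv")
}
```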
The result table will be stored according to the output mapping setting and will look like this:
number_of_items | fit | lwr | upr |
---|---|---|---|
40 | 3481 | 3168 | 3795 |
… |
The table contains the predicted value (fit) and the lower (lwr) and upper (upr) bounds of its confidence interval. The predicted value was obtained from the (very simple linear) model that was created outside Keboola in the first step. This technique can also be used for other purposes, because the binary files can contain virtually any R code or data.
When attempting to run the above transformation locally, make sure to:
- put the R code in the script.R file.
- download the in.c-r-transformations.cashier-data-predict table from the input mapping, and place it inside the in/tables subdirectory of the working directory as the cashier-data-predict.csv file.
- download the file with the predictionModel tag from Storage File Uploads and place it inside the in/user subdirectory of the working directory as the predictionModel file. Make sure the downloaded file has no extension.
- store the result data.frame inside the out/tables subdirectory in data-predicted.csv.
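The directory layout for a local run can be prepared with a few lines of R. This is only a sketch: the helper name prepare_local_run and the source paths in the usage comment are hypothetical, and it assumes you have already downloaded the table and the model file by hand.

```r
# Create the working-directory layout for a local run and copy the
# downloaded files into place under the names the script expects.
# prepare_local_run is a hypothetical helper name used for illustration.
prepare_local_run <- function(workdir, table_csv, model_file) {
  for (d in c("in/tables", "in/user", "out/tables")) {
    dir.create(file.path(workdir, d), recursive = TRUE, showWarnings = FALSE)
  }
  ok_table <- file.copy(
    table_csv,
    file.path(workdir, "in/tables", "cashier-data-predict.csv"),
    overwrite = TRUE
  )
  # The model is stored under the tag name, without an extension.
  ok_model <- file.copy(
    model_file,
    file.path(workdir, "in/user", "predictionModel"),
    overwrite = TRUE
  )
  invisible(ok_table && ok_model)
}

# Example call with hypothetical download locations:
# prepare_local_run(".", "downloads/cashier-data-predict.csv", "downloads/time_model.rda")
```

After the files are in place, run script.R with the working directory set to the directory passed as workdir.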