Workspace

A workspace serves several purposes and can be used as

  • an interactive development environment (IDE) to create transformations.
  • an analytical workspace where you can interactively perform experiments and modelling with live production data.
  • an ephemeral workspace created on each run of a transformation to provide the staging area in which the transformation operates. Ephemeral transformation workspaces are not visible in the transformation UI, hence we won’t mention them further.

In legacy transformations, workspaces are called sandboxes and behave considerably differently.

Working with Workspaces

You can create a workspace on the Transformations page:

Workspace Introduction

Select a workspace type. Not all workspace types are supported in all projects; the available types match the available transformation types. For scripting languages (Python, R, and Julia), we provide JupyterLab with the matching kernel. For SQL workspaces, you can use either the Snowflake web interface or a database client of your choice (for example, DBeaver).

Workspace - Select Type

Name and optionally describe the workspace:

Workspace - Name workspace

Creating the workspace takes a short while, so it will not appear in the list immediately:

Workspace - Running

Once the workspace is ready, you’ll see it in the workspace list:

Workspace - Running

In the workspace detail, you can

  • get credentials and connect to the workspace,
  • configure workspace mappings,
  • load and unload data specified in the mappings, and
  • manage the workspace (resume, terminate, and delete).

Workspace - Running

Connecting to a Workspace

To connect to a JupyterLab workspace with the associated kernel (Python, R, Julia), use the URL and the password provided in the Credentials link, or use the Connect button to open the JupyterLab interface directly:

To connect to a Snowflake workspace, either

  • use the Snowflake web interface: open it via the Connect link and sign in with the username and password provided in the Credentials link, or
  • use your favorite database client (we like DBeaver) with the credentials provided in the Credentials link.

To connect to Synapse and Redshift workspaces, use your database client (we like DBeaver) with the connection options provided in the Credentials view. If your database client does not offer a Synapse driver, look for an Azure SQL Server driver instead.
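
The same credentials also work programmatically from any standard driver. Below is a minimal sketch of connecting to a Snowflake workspace from Python using the snowflake-connector-python package; every connection value is a placeholder that you would replace with the one shown in the Credentials view.

    # Minimal sketch: querying a Snowflake workspace from Python.
    # All values are placeholders; take the real ones from the workspace
    # Credentials view (host, username, password, database, schema, warehouse).
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345.eu-central-1",   # the host without ".snowflakecomputing.com"
        user="WORKSPACE_USER",
        password="WORKSPACE_PASSWORD",
        warehouse="WORKSPACE_WAREHOUSE",
        database="WORKSPACE_DATABASE",
        schema="WORKSPACE_SCHEMA",
    )
    try:
        cur = conn.cursor()
        cur.execute('SELECT COUNT(*) FROM "my_table"')  # a table loaded via input mapping
        print(cur.fetchone())
    finally:
        conn.close()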

Workspace Lifecycle

When a workspace is created, it enters the Active state and can be used. Database (Snowflake, Redshift, and Synapse) workspaces are billed by the runtime of the queries executed in them, so we leave them in the Active state until you delete them. JupyterLab workspaces are billed by their running time; to save money, they can be terminated and resumed.

Workspace - States

A workspace can be terminated manually, or it is terminated automatically after one hour of inactivity. Inactivity is measured from the last save of any notebook in JupyterLab. A terminated workspace is switched off and consumes no credits, and it can be resumed later. Resuming a workspace restores the last saved version of all notebooks in the home directory (/data) and loads the current data from the Input Mapping.

On a restored workspace, you’ll get

  • all notebook files with their results,
  • the current version of the input mapping sources (beware that these might have changed in the meantime), and
  • a new password.

Terminating and restoring a workspace means that you'll lose

  • the contents of memory,
  • the state of notebook executions (the results, however, are preserved), and
  • modifications to any data or temporary files on the local workspace drive.

Note: When you terminate and restore the workspace, its password changes.

If a workspace is terminated while you still have the JupyterLab interface open (e.g., after you wake your computer from sleep), you'll see the following error:

Server Connection Error
A connection to the Jupyter server could not be established. JupyterLab will continue trying to reconnect. Check your network connection or Jupyter server configuration.

If you see this error, please go to the list of workspaces in Keboola Connection, resume the workspace and reconnect from there.

Loading Data

To load arbitrary data into the workspace, configure Table Input Mapping or File Input Mapping (or both) and click the Load Data button.

Workspace - Load Data

When loading data into a workspace, you can specify entire buckets, which can be especially useful when you are not sure what tables you’ll need in your work. You can also take advantage of alias tables and prepare buckets with the tables you’ll need.
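
In a Python (JupyterLab) workspace, the loaded tables are available as local files you can read directly. The sketch below assumes the usual Keboola layout in which input mapping tables arrive as CSV files under /data/in/tables; the table name is a placeholder, and you should verify the exact path in your workspace's file browser.

    # Minimal sketch: reading a table loaded via Table Input Mapping.
    # The /data/in/tables path reflects the usual layout (an assumption here),
    # and "orders.csv" is a placeholder file name.
    import pandas as pd

    orders = pd.read_csv("/data/in/tables/orders.csv")
    print(orders.shape)
    print(orders.head())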

Unloading Data

You can also unload data from the workspace. To unload data, configure Table Output Mapping or File Output Mapping (or both) and click the Unload Data button.

Workspace - Unload Data

Unloading data is useful, for example, when your ad-hoc analysis leads to valuable results, or when you have trained a new model that you'd like to use in transformations.
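
For a Python workspace, the sketch below shows one way to prepare such results for unloading: write them as files that your output mapping picks up. The /data/out/tables and /data/out/files paths follow the usual Keboola layout and are assumptions here; the file names must match what you configure in the output mapping.

    # Minimal sketch: preparing results for Table/File Output Mapping.
    # Paths follow the usual Keboola layout (an assumption); file names are placeholders.
    import pickle
    import pandas as pd

    # A result table to unload via Table Output Mapping.
    scores = pd.DataFrame({"customer_id": [1, 2], "score": [0.87, 0.42]})
    scores.to_csv("/data/out/tables/scores.csv", index=False)

    # A trained model to unload via File Output Mapping.
    model = {"weights": [0.1, 0.2, 0.3]}  # stand-in for a real model object
    with open("/data/out/files/model.pickle", "wb") as f:
        pickle.dump(model, f)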

Developing Transformations

Workspaces are highly useful for developing transformations. When you configure mappings and develop a script in JupyterLab, you can use the Create Transformation button to deploy the notebook into a transformation.

Workspace - Create Transformation

Enter the name of the new transformation:

Workspace - Create Transformation

You can also create workspaces from transformations.

Analytical Workspaces

Apart from developing transformations, you can use workspaces to perform ad-hoc analysis of production data of your choice. A workspace provides you with a safe and isolated environment in which to experiment. The input mapping isolation also means that you can work on live production projects without the data in the workspace constantly changing; you refresh it on demand by loading data into the workspace.

In a private beta preview, the interactive workspaces can also be extended with MLflow and Spark integrations.