Skip to content
Transformations

Array Splitting

Array splitting is what you do if you have a list of values in a single cell delimited by a character (a comma, semi-colon, etc.), and want each of those delimited values in a separate row.

The following example will take a table like

idname
1first,second,third
2fourth,fifth,sixth

and produce a table like

globalIdglobalPosglobalVal
11first
12second
13third
21fourth
22fifth
23sixth

Use the sample table shown above and import it into your Storage. Then set the input and output mapping for your transformation.

Screenshot - Input/Output mapping

The following script will take each row of the source table, and split the column whose name is specified in the splitCol variable by the character specified in the splitChar variable. The resulting values will be put together with their ID specified in the idCol variable, and they will also be assigned a new sequential ID in the globalPos column.

splitChar = ','
splitCol = 'name'
idCol = 'id'
data <- read.csv(file = "in/tables/array-source.csv")
data <- data.frame(data)
# helper function to generate positions for each array
genPos = function(array){
array <- getVals(array)
return(match(array,array))
}
# helper function to grab all the individual values
getVals = function(array){
array <- toString(array)
array <- unlist(strsplit(array, splitChar, fixed = FALSE))
array <- array[array != ""]
return(array)
}
# rerun these three lines before the for loop, if not the first time
globalId <- c()
globalPos <- c()
globalVal <- c()
# parse through all data and do the actual formatting
for (i in 1:length(data[[splitCol]])) {
posi <- genPos(data[[splitCol]][i])
idi <- c(rep(data[[idCol]][i], length(posi)))
vali <- getVals(data[[splitCol]][i])
globalId <- c(globalId, idi)
globalPos <- c(globalPos, posi)
globalVal <- c(globalVal, vali)
}
processedData <- data.frame(cbind(globalId, globalPos, globalVal))
write.csv(processedData, file = "out/tables/split-values.csv", row.names = FALSE)

When you run the transformation, you will obtain a table with three columns:

  • globalId — ID of each row in the original table (the name of the id column is in the idCol variable),
  • globalPos — sequential ID of each value within the original row, and
  • globalVal — each value that has been split out of the original array column (the name of that column is specified in the splitCol variable).
Ask Kai

Ask anything about Keboola — I'll search the docs and cite the pages I use.