Array splitting is what you do if you have a list of values in a single cell delimited by a character (a comma, semi-colon, etc.), and want each of those delimited values in a separate row.
The following example will take a table like
| id | name |
|---|---|
| 1 | first,second,third |
| 2 | fourth,fifth,sixth |
and produce a table like
| globalId | globalPos | globalVal |
|---|---|---|
| 1 | 1 | first |
| 1 | 2 | second |
| 1 | 3 | third |
| 2 | 1 | fourth |
| 2 | 2 | fifth |
| 2 | 3 | sixth |
Use the sample table shown above and import it into your Storage. Then set the input and output mapping for your transformation.

The following script will take each row of the source table, and split the column whose name is specified in the
splitCol variable by the character specified in the splitChar variable. The resulting values will be put
together with their ID specified in the idCol variable, and they will also be assigned a new sequential ID
in the globalPos column.
splitChar = ','
splitCol = 'name'
idCol = 'id'
data <- read.csv(file = "in/tables/array-source.csv")
data <- data.frame(data)
# helper function to generate positions for each array
genPos = function(array){
array <- getVals(array)
return(match(array,array))
}
# helper function to grab all the individual values
getVals = function(array){
array <- toString(array)
array <- unlist(strsplit(array, splitChar, fixed = FALSE))
array <- array[array != ""]
return(array)
}
# rerun these three lines before the for loop, if not the first time
globalId <- c()
globalPos <- c()
globalVal <- c()
# parse through all data and do the actual formatting
for (i in 1:length(data[[splitCol]])) {
posi <- genPos(data[[splitCol]][i])
idi <- c(rep(data[[idCol]][i], length(posi)))
vali <- getVals(data[[splitCol]][i])
globalId <- c(globalId, idi)
globalPos <- c(globalPos, posi)
globalVal <- c(globalVal, vali)
}
processedData <- data.frame(cbind(globalId, globalPos, globalVal))
write.csv(processedData, file = "out/tables/split-values.csv", row.names = FALSE)When you run the transformation, you will obtain a table with three columns:
globalId – ID of each row in the original table (the name of the id column is in the idCol variable),globalPos – sequential ID of each value within the original row, andglobalVal – each value that has been split out of the original array column (the name of that column is specified in the splitCol variable).