How do you separate comma-separated values in different columns while maintaining values in the rest of the row in Google Sheets?

How do you split comma-separated values so that each value gets its own new row, with the other values in that row copied from the row the value came from? It would look like this:
From this..
..to this.
I'm looking for an answer that avoids Google Apps Script where possible and doesn't rely on gigantic, complex formulas. A pivot table within Google Sheets may be used, but it is not my preference. If it's not possible with formulas alone, I'm open to other answers as well.
I've had this question for over a year and couldn't find serious answers online after a few hours of searching. There may be answers that use Google Apps Script, but that doesn't really fall within the scope of my question. I am willing to adjust or rephrase my question if it remains unanswered.
I myself have no idea how to answer the question and the attempts I have made are not to be taken seriously.

Lambda Update
It's 2022. We now have LAMBDA and a bunch of array functions. Thus we can combine everything into a single formula, as originally desired. The idea is still the same as before, just much cleaner. (Also FLATTEN is no longer undocumented.)
=ArrayFormula(
  SPLIT(
    TRANSPOSE(
      SPLIT(
        JOIN(
          "",
          BYROW(
            A1:E4,
            LAMBDA(row,
              JOIN(
                ",",
                REDUCE(
                  ";",
                  row,
                  LAMBDA(cell1,cell2,
                    FLATTEN(FLATTEN(cell1)&","&SPLIT(cell2,","))
                  )
                )
              )
            )
          )
        ),
        ";,",
      )
    ),
    ","
  )
)
How it works
REDUCE combines all elements in an array into a single result using a function (a Named Function or a LAMBDA). Here we use the same permutation trick as in the old solution (combining a column vector with a row vector) to expand every combination of values in a row. This gives us a column of expanded rows, each starting with ;,
These rows are JOINed together with commas, serializing the array into one string.
BYROW applies a function to each row in a range, returning a single value for each row. Here, we use the process above in a LAMBDA function, which gives a single serialized string for each row in the range.
We then JOIN all these together. Each row is delimited by ;,.
Split on ;, to get a row of serialized rows.
Transpose the row to get a column.
Split each row again on , to deserialize it into cells.
For a much more readable solution, of course, you can name the lambdas, resulting in a formula that's cleaner still.
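To make the steps above concrete, here is a minimal Python sketch that mirrors the LAMBDA formula (Python is only used for illustration; it is not part of the Sheets solution). The names `serialize_row` and `expand_table` are hypothetical, and the sketch assumes no cell is blank and no value contains `;`.

```python
from functools import reduce


def serialize_row(row):
    """Mimic JOIN(",", REDUCE(";", row, LAMBDA(acc, cell, FLATTEN(acc & "," & SPLIT(cell, ","))))).

    The accumulator is a column of partial rows. Each cell's comma-separated
    values form a row vector; pairing every partial row with every value and
    flattening row by row expands all combinations.
    """
    def step(acc, cell):
        parts = [p for p in str(cell).split(",") if p]  # SPLIT drops empty pieces by default
        return [prefix + "," + part for prefix in acc for part in parts]

    expanded = reduce(step, row, [";"])  # each entry looks like ";,v1,v2,...,vn"
    return ",".join(expanded)


def expand_table(table):
    """Mimic the whole formula: BYROW + JOIN(""), SPLIT on ";,", TRANSPOSE, SPLIT on ","."""
    serialized = "".join(serialize_row(row) for row in table)
    records = [chunk for chunk in serialized.split(";,") if chunk]
    return [[value for value in record.split(",") if value] for record in records]


table = [["1", "a,b", "x"], ["2", "c", "y,z"]]
for record in expand_table(table):
    print(record)  # ['1', 'a', 'x'], then ['1', 'b', 'x'], ['2', 'c', 'y'], ['2', 'c', 'z']
```

The leading `;` seed plays the same role as in the formula: it guarantees every expanded row starts with `;,`, which is what the final split keys on.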
Draggable Formula solution (obsoleted by LAMBDA)
Let's see if I can get this ball rolling.
At a Glance
This solution is unfortunately unstable, as it relies on the undocumented FLATTEN function (which turns any range into a single column), and it requires two formulas to work. While I'm sure you can do the same thing without FLATTEN(), it at least saves us some typing, since we rely on it heavily. Without FLATTEN, we can achieve the same with TRANSPOSE(SPLIT(TEXTJOIN(...))), which is not nearly as elegant.
The core formula grows linearly with the number of columns and can get messy, but it follows an easy pattern to set up. It can also be dragged, which is the next best thing to a single ArrayFormula.
Stage 1: Serialize Rows
As you might have expected, we're going to use some string serialization tricks to get what we want. Here's the core formula:
=TEXTJOIN(",",,
ArrayFormula(
Flatten(Flatten(Flatten(Flatten(Flatten(
SPLIT(A1,",")&",")&
SPLIT(B1,",")&",")&
SPLIT(C1,",")&",")&
SPLIT(D1,",")&",")&
SPLIT(E1,",")&";")
))&","
As you can see, it accounts for any commas inside each cell in the row. To add more columns, simply add another Flatten( and add your column to the list. Just make sure that the last one uses a ; and not a ,.
We take advantage of the fact that, in general, when ArrayFormula combines a column vector with a row vector, the operation is applied to every pairing of an element from the column with an element from the row, producing a 2D result.
Examples:
=ArrayFormula({0;1}&{2,3}) is equivalent to ={"02","03";"12","13"}
=ArrayFormula(SEQUENCE(10)*SEQUENCE(1,10)) gives us a 10x10 multiplication table.
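The column-times-row behaviour in these examples is an outer product. A small Python sketch of the same idea (the function names are hypothetical, chosen for illustration):

```python
def outer_concat(column, row):
    """Like ArrayFormula(column & row): cell [i][j] is column[i] joined with row[j]."""
    return [[str(c) + str(r) for r in row] for c in column]


def outer_multiply(column, row):
    """Like ArrayFormula(SEQUENCE(n) * SEQUENCE(1, m)): a multiplication table."""
    return [[c * r for r in row] for c in column]


print(outer_concat([0, 1], [2, 3]))  # [['02', '03'], ['12', '13']]
table = outer_multiply(range(1, 11), range(1, 11))
print(table[6][7])  # 56, i.e. 7 x 8
```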
In our case, we use this to generate every possible combination of values based on the commas in each cell, serialize each resulting row into a CSV string ending in a semicolon, and then join all the rows into one long string. The extra "," at the end is there so we can concatenate the strings from multiple rows in the next stage.
When you're set up with the proper number of columns, drag this down to the height of the table. (Note: If some of your values can be blank, you also have to do some error checking around each SPLIT.)
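A minimal Python sketch of what Stage 1 produces for a single row, assuming no blank cells (the function name `stage1_serialize` is hypothetical). `itertools.product` varies the first column slowest, which matches the order the nested Flatten calls produce:

```python
from itertools import product


def stage1_serialize(cells):
    """Sketch of the Stage 1 formula for one row (assumes no blank cells).

    Each combination becomes "v1,...,vn;", combinations are joined with ","
    (the TEXTJOIN) and a final "," is appended (the trailing &",").
    """
    split_cells = [[p for p in str(cell).split(",") if p] for cell in cells]
    combos = [",".join(combo) + ";" for combo in product(*split_cells)]
    return ",".join(combos) + ","


print(stage1_serialize(["1", "a,b", "x"]))  # 1,a,x;,1,b,x;,
```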
Stage 2: Deserialize
This formula is considerably simpler. (Assuming serialization data is in column F.)
=ArrayFormula(
  SPLIT(
    TRANSPOSE(
      SPLIT(
        JOIN(,F:F),
        ";,",
      )
    ),
    ","
  )
)
First, glue all the strings together using JOIN. Since we know that each row ends with a ";,", we split on that to get our rows. After that, we can split each row up into cells by splitting on ",", resulting in our table.
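The same deserialization, sketched in Python under the assumption that column F holds Stage 1 strings like the ones above (`stage2_deserialize` is a hypothetical name):

```python
def stage2_deserialize(column_f):
    """Sketch of Stage 2: JOIN(, F:F), SPLIT on ";,", TRANSPOSE, then SPLIT on ",".

    Empty pieces are dropped, as SPLIT does by default.
    """
    joined = "".join(column_f)
    rows = [chunk for chunk in joined.split(";,") if chunk]
    return [[value for value in row.split(",") if value] for row in rows]


column_f = ["1,a,x;,1,b,x;,", "2,c,y;,2,c,z;,"]
for row in stage2_deserialize(column_f):
    print(row)
```

Because every serialized row ends with ";,", concatenating the column never merges two rows together.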
Conclusion
It's not a single ArrayFormula, sure, but neither of these formulas is really all that complex, which is nice. ArrayFormulas can get messy and confusing quickly.
We've managed to avoid scripting too, which is a plus.
You can also hide the serialization column if you find it unsightly!

Related

Google script custom function for different column [duplicate]

I'm trying to do a couple of different things with a spreadsheet in Google and running into some problems with the formulas I am using. I'm hoping someone might be able to direct me to a better solution or be able to correct the current issue I'm having.
First of all, here is a view of the data on Sheet 1 that I am pulling from:
Example Spreadsheet
The first task I'm trying to accomplish is to create a sheet that lists all of these shift days with the date in one column and the subject ("P: Ben" or "S: Nicole") in another column. This sheet would be used to import the data via a CSV into our calendar system each month. I tried doing an Index-Match where it used the date to pull the associated values, however I found that I had to keep adjusting the formula offsets in order to capture new information. It doesn't seem like Index-Match works when multiple rows/columns are involved. Is there a better way to pull this information?
The second task I am trying to accomplish is to create a new tab which lists all the dates a specific person is assigned to (that way this tab will update in real time and everyone can just look at their own sheet to see what days they are on-call). However, I run into the same problem here because for each new row I have to change the formula to reflect the correct information, otherwise it doesn't pull the correct cell when it finds a match.
I would appreciate any and all information/advice on how to accomplish these tasks with the formula combination I mentioned or suggestions on other formulas to use that I have not been able to find.
Thanks in advance!
Brandon, there are a few ways to attack your tasks, but given the structure of your data, I would use curly brackets {} to create arrays. Here is an excerpt of how Google explains arrays in Sheets:
You can also create your own arrays in a formula in your spreadsheet by using brackets { }. The brackets allow you to group together values, while you use the following punctuation to determine which order the values are displayed in:
Commas: Separate columns to help you write a row of data in an array. For example, ={1, 2} would place the number 1 in the first cell and the number 2 in the cell to the right in a new column.
Semicolons: Separate rows to help you write a column of data in an array. For example, ={1; 2} would place the number 1 in the first cell and the number 2 in the cell below in a new row.
Note: For countries that use commas as decimal separators (for example €1,00), commas would be replaced by backslashes (\) when creating arrays.
You can join multiple ranges into one continuous range using this same punctuation. For example, to combine values from A1-A10 with the values from D1-D10, you can use the following formula to create a range in a continuous column: ={A1:A10; D1:D10}
Knowing that, here's a sample sheet of your data.
First Task:
create a sheet that lists all of these shift days with the date in one column and the subject ("P: Ben" or "S: Nicole") in another column.
To organize dates and subjects into discrete arrays, we'll collect them using curly brackets...
Dates: {A3:G3,A7:G7,A11:G11,A15:G15}
Subjects: {A4:G4,A5:G5,A8:G8,A9:G9,A12:G12,A13:G13,A16:G16,A17:G17}
This actually produces two rows rather than columns, but we'll deal with that in a minute. You'll note that, because there are two subjects for every date, we effectively need to double each date captured.
Dates: {A3:G3,A3:G3,A7:G7,A7:G7,A11:G11,A11:G11,A15:G15,A15:G15}
Subjects: {A4:G4,A5:G5,A8:G8,A9:G9,A12:G12,A13:G13,A16:G16,A17:G17}
Still with me? If so, all that's left is to (a) turn these two rows into two columns using the TRANSPOSE function, (b) combine our two columns using another pair of curly brackets and a semicolon and (c) add a SORT function to list the dates in chronological order...
=SORT(TRANSPOSE({{A3:G3,A3:G3,A7:G7,A7:G7,A11:G11,A11:G11,A15:G15,A15:G15};{A4:G4,A5:G5,A8:G8,A9:G9,A12:G12,A13:G13,A16:G16,A17:G17}}),1,TRUE)
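For readers who want to see the pairing logic outside Sheets, here is a rough Python sketch. It assumes a hypothetical layout in which each week is a date row followed by its subject rows (like rows 3, 4 and 5 of the sample sheet), and it uses ISO date strings so that plain sorting is chronological:

```python
def shift_pairs(blocks):
    """Pair every date with each subject below it, then sort by date.

    blocks: list of (date_row, subject_row_1, subject_row_2, ...) tuples,
    one per week (a hypothetical stand-in for the sheet layout).
    """
    dates, subjects = [], []
    for date_row, *subject_rows in blocks:
        for subject_row in subject_rows:
            dates.extend(date_row)        # {A3:G3, A3:G3, ...}: repeat the dates per subject row
            subjects.extend(subject_row)  # {A4:G4, A5:G5, ...}
    # zip plays the role of TRANSPOSE({dates; subjects}); sorted() is SORT(..., 1, TRUE)
    return sorted(zip(dates, subjects), key=lambda pair: pair[0])


week = (["2024-01-02", "2024-01-01"], ["P: Ben", "P: Amy"], ["S: Nicole", "S: Joe"])
for date, subject in shift_pairs([week]):
    print(date, subject)
```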
Second Task:
create a new tab which lists all the dates a specific person is assigned to (that way this tab will update in real time and everyone can just look at their own sheet to see what days they are on-call).
Assuming the two-column array we just created lives in A2:B53 on a new sheet called "Shifts," then we can use the FILTER function and SEARCH based on each name. The formula at the top of Ben's sheet would look like this:
=FILTER(Shifts!A2:B53,SEARCH("Ben",Shifts!B2:B53))
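The FILTER/SEARCH combination keeps only rows whose subject contains the name. A minimal Python equivalent (hypothetical function name), keeping in mind that SEARCH is case-insensitive and matches substrings, so "Ben" would also match a name like "Benjamin":

```python
def shifts_for(name, shifts):
    """Mimic FILTER(Shifts!A2:B53, SEARCH(name, Shifts!B2:B53)).

    SEARCH is case-insensitive, so both sides are lower-cased; rows where the
    name does not appear are left out.
    """
    needle = name.lower()
    return [(date, subject) for date, subject in shifts if needle in subject.lower()]


shifts = [("2024-01-01", "P: Ben"), ("2024-01-01", "S: Nicole"), ("2024-01-02", "S: ben")]
print(shifts_for("Ben", shifts))
```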
Hopefully this helps, but please let me know if I've misinterpreted anything. Cheers.

How to reuse a formula in another cell but change the referenced columns [duplicate]

I am working with a large (but simple) formula in Google Sheets that reuses the same blocks of formulas repeatedly. To get data from a bunch of different tabs, I have to use 708 characters in that block of formulas. But then I need to reference that data over and over within a single cell, which multiplies the length of the formula to the point where I can't even tell what is going on any more.
For example I have a cell with the final code (with 2216 characters) of:
=iferror(IF(ISNUMBER(SEARCH("a",concatenate(TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Brown!$C$3:$C$68&Brown!$D$3:$D$68,Brown!H$3:H$68,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Godoy!$C$3:$C$76&Godoy!$D$3:$D$76,Godoy!H$3:H$76,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Sindel!$C$7:$C$60&Sindel!$D$7:$D$60,Sindel!H$7:H$60,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Taylor!$C$3:$C$82&Taylor!$D$3:$D$82,Taylor!H$3:H$82,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Wanner!$C$3:$C$55&Wanner!$D$3:$D$55,Wanner!H$3:H$55,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Gehrman!$C$3:$C$16&Gehrman!$D$3:$D$16,Gehrman!H$3:H$16,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Francois!$C$3:$C$17&Francois!$D$3:$D$17,Francois!H$3:H$17,"")))))),"A",average(ArrayFormula(mid(concatenate(TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Brown!$C$3:$C$68&Brown!$D$3:$D$68,Brown!H$3:H$68,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Godoy!$C$3:$C$76&Godoy!$D$3:$D$76,Godoy!H$3:H$76,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Sindel!$C$7:$C$60&Sindel!$D$7:$D$60,Sindel!H$7:H$60,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Taylor!$C$3:$C$82&Taylor!$D$3:$D$82,Taylor!H$3:H$82,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Wanner!$C$3:$C$55&Wanner!$D$3:$D$55,Wanner!H$3:H$55,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Gehrman!$C$3:$C$16&Gehrman!$D$3:$D$16,Gehrman!H$3:H$16,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Francois!$C$3:$C$17&Francois!$D$3:$D$17,Francois!H$3:H$17,"")))),sequence(len(concatenate(TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Brown!$C$3:$C$68&Brown!$D$3:$D$68,Brown!H$3:H$68,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Godoy!$C$3:$C$76&Godoy!$D$3:$D$76,Godoy!H$3:H$76,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Sindel!$C$7:$C$60&Sindel!$D$7:$D$60,Sindel!H$7:H$60,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Taylor!$C$3:$C$82&Taylor!$D$3:$D$82,Taylor!H$3:H$82,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Wanner!$C$3:$C$55&Wanner!$D$3:$D$55,Wanner!H$3:H$55,""))),TEXTJ
OIN("",TRUE,arrayformula(if($B3&$C3=Gehrman!$C$3:$C$16&Gehrman!$D$3:$D$16,Gehrman!H$3:H$16,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Francois!$C$3:$C$17&Francois!$D$3:$D$17,Francois!H$3:H$17,"")))))),1)*1))),"")
This looks crazy long, but it is only because I am using this one formula (with 708 characters):
concatenate(TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Brown!$C$3:$C$68&Brown!$D$3:$D$68,Brown!H$3:H$68,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Godoy!$C$3:$C$76&Godoy!$D$3:$D$76,Godoy!H$3:H$76,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Sindel!$C$7:$C$60&Sindel!$D$7:$D$60,Sindel!H$7:H$60,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Taylor!$C$3:$C$82&Taylor!$D$3:$D$82,Taylor!H$3:H$82,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Wanner!$C$3:$C$55&Wanner!$D$3:$D$55,Wanner!H$3:H$55,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Gehrman!$C$3:$C$16&Gehrman!$D$3:$D$16,Gehrman!H$3:H$16,""))),TEXTJOIN("",TRUE,arrayformula(if($B3&$C3=Francois!$C$3:$C$17&Francois!$D$3:$D$17,Francois!H$3:H$17,""))))
3 times within the cell.
Is it possible to have one cell just contain the block of functions that I want to use (as a string) and then somehow convert the string to code to reuse it without making a monster function?
For example, could I assign A1 to hold the long code that I want to have multiple times and then have a formula like:
=IFERROR(IF(ISNUMBER(SEARCH("a",textToFormula(A1))),"A",AVERAGE(ArrayFormula(mid(textToFormula(A1),sequence(len(textToFormula(A1))),1)*1))),"")
I should also mention that there is no room in my sheet to just put the string of data I am looking for in a separate cell, because I have to apply this formula to roughly 50 rows and 180 columns.
Ouch! That is a long formula! Sadly, there's no eval() like in JavaScript, but we can at least make you a simpler formula.
How about this one? It's still a bit long, but far less complex. It only Queries each sheet once. This one works in cell F3, but can be dragged.
=IF(
JOIN("",{Teacher1!G$7:G;Teacher2!G$7:G;Teacher3!G$7:G;Teacher4!G$7:G;Teacher5!G$7:G})<>"",
IFERROR(Average(ArrayFormula(--{
QUERY(ArrayFormula(TO_TEXT(Teacher1!$A$7:$GX)),"select Col"&COLUMN()+1&" where Col3='"&$B3&"' and Col4='"&$C3&"'");
QUERY(ArrayFormula(TO_TEXT(Teacher2!$A$7:$GX)),"select Col"&COLUMN()+1&" where Col3='"&$B3&"' and Col4='"&$C3&"'");
QUERY(ArrayFormula(TO_TEXT(Teacher3!$A$7:$GX)),"select Col"&COLUMN()+1&" where Col3='"&$B3&"' and Col4='"&$C3&"'");
QUERY(ArrayFormula(TO_TEXT(Teacher4!$A$7:$GX)),"select Col"&COLUMN()+1&" where Col3='"&$B3&"' and Col4='"&$C3&"'");
QUERY(ArrayFormula(TO_TEXT(Teacher5!$A$7:$GX)),"select Col"&COLUMN()+1&" where Col3='"&$B3&"' and Col4='"&$C3&"'")}
)),"A"),
""
)
The Query Statement:
Each cell queries every teacher sheet as a table and returns the row whose Col3 and Col4 (columns C and D) match the values in $B3 and $C3.
The COLUMN()+1 makes the corresponding columns line up. For example, if we're in column F (6), we want to look in column G (7).
The TO_TEXT allows us to look for non-numbers ("A").
After that, convert each query result to a number with --, then take the AVERAGE. If any value cannot be converted to a number, AVERAGE returns an error, and IFERROR assumes the value was "A".
In the case that all cells in a column for a date are blank (the blank JOIN), bypass the queries altogether and output a blank cell.
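The decision logic of this formula, reduced to a Python sketch (hypothetical names; it models only the IF/IFERROR/AVERAGE branching, not QUERY itself, and assumes the matched values contain no blank strings):

```python
def grade_cell(date_column, matches):
    """Sketch of IF(JOIN(date column) <> "", IFERROR(AVERAGE(--{matches}), "A"), "").

    date_column: every entry for this date across the teacher sheets.
    matches: the text values QUERY returned for the row identified by $B3 and $C3.
    """
    if "".join(date_column) == "":
        return ""  # the whole date column is blank: skip the queries
    try:
        numbers = [float(m) for m in matches]  # the -- coercion
    except ValueError:
        return "A"  # a non-number such as "A" makes AVERAGE error, so IFERROR gives "A"
    if not numbers:
        return "A"  # no matches at all: the QUERY errors, which IFERROR also turns into "A"
    return sum(numbers) / len(numbers)


print(grade_cell(["90", ""], ["90", "80"]))
print(grade_cell(["90"], ["90", "A"]))
```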

Removing duplicate/opposite entries in Google Sheets

I have a sheet containing the following data:
URL A, URL B, similar score in percent.
If URL A is 98% similar to URL B, then URL B is also 98% similar to URL A, and that reversed pair is listed as well.
I want to find and eliminate these duplicates/reversed entries. For now, I have tried having two extra columns concatenating URL A+URL B in one, and URL B+URL A in one. This way I have unique identifiers.
After this I'm kinda stuck, because I'm dealing with a lot of variables, as the data is in two different rows and two different columns. I might look into a script that takes the A+B value and iterates through the B+A values until it finds a match, then somehow marks it (or simply deletes it), since my knowledge of formulas for highlighting these duplicates is falling short.
This sheet shows the concept - the first 100 rows (it's about 11K in total): https://docs.google.com/spreadsheets/d/1YKsguAn1lYjV4FlP_6_TlKGvFcpFAEzn7bpAyOEmozQ/edit?usp=sharing
Any suggestions for what I should look into?
Try the filter(match()) pattern to find duplicate values, like this:
=unique(
  flatten(
    filter(
      A2:B,
      match(A2:A & B2:B, B2:B & A2:A, 0),
      C2:C >= 90
    )
  )
)
I ended up with a solution where I sorted by URL A and implemented this formula:
=IF(A2<B2,A2&B2,B2&A2)
This way I had the concatenation the same way for the real one and the opposite. I didn't know you could use "<" on strings.
After this, I could delete duplicated values in the column with the formula above.
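The =IF(A2<B2,A2&B2,B2&A2) trick builds an order-independent key, so a pair and its reverse produce the same value. A minimal Python sketch of that idea (hypothetical function name); it uses a tuple key instead of plain concatenation to avoid accidental collisions, and note that Python compares strings by code point, which may differ from Sheets for mixed-case text:

```python
def dedupe_pairs(rows):
    """Keep the first occurrence of each (A, B) / (B, A) pair.

    The key puts the smaller URL first, like =IF(A2<B2, A2&B2, B2&A2),
    so both directions of a pair map to the same key.
    """
    seen = set()
    kept = []
    for url_a, url_b, score in rows:
        key = (url_a, url_b) if url_a < url_b else (url_b, url_a)
        if key not in seen:
            seen.add(key)
            kept.append((url_a, url_b, score))
    return kept


rows = [("a.com", "b.com", 98), ("b.com", "a.com", 98), ("c.com", "a.com", 91)]
print(dedupe_pairs(rows))
```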

Split and repeat without

In this sheet, I have the input data below:
As seen, the courses are separated by /
I want to display the same in the format below, where each line shows one course only, with the data of the student repeated:
I know that =split(C3," / ",true,true) can split the courses into 2 columns in the same row, but I need them in the same column, so I tried =TRANSPOSE(split(C3," / ",true,true)), which works fine for the first row only but fails when used with ARRAYFORMULA.
Any thoughts? I'm open to any potential solution: formula, script, or anything else.
UPDATE
I tried this trick: creating a new column showing the number of courses for each student with =ArrayFormula(LEN(REGEXREPLACE(C11:C13, "[^/]", ""))+1)
Then I used REPT to repeat each row based on the number of courses, =arrayformula({transpose(split(concatenate(rept(B11:B13 & ",",D11:D13)),",",false,true)),transpose(split(concatenate(REPT(C11:C13 & ",",D11:D13)),",",false,true))}), and ended up with:
But here the courses are still joined together. How can I split them?
I've added two sheets to your sample spreadsheet. "Sheet2" is a cleanup of your testing sheet, "Sheet1." The other sheet ("Erik Help") references Sheet2, not Sheet1, and contains the following formula in cell A1:
=ArrayFormula({"Student ID","Student Name","Course";SUBSTITUTE(SPLIT(QUERY(FLATTEN(SPLIT(FILTER(SUBSTITUTE("/ "&Sheet2!C3:C,"/","/ "&Sheet2!A3:A&"zzz~"&Sheet2!B3:B&"~"),Sheet2!A3:A<>""),"/")),"Select * WHERE Col1 Is Not Null"),"~"),"zzz","")})
This one array formula produces all headers and results.
A virtual array is formed between the curly brackets { }. Headers are introduced first followed by a semicolon, which means "bump down one row to continue." The header titles can be changed as you like.
How It Works:
An additional "/ " is concatenated to the front of every entry in Sheet2!C3:C. Then SUBSTITUTE replaces each of these forward slashes with "/ " followed by the Col A data, "zzz~", the Col B data and "~", and FILTER keeps only the rows where Col A is non-blank. The tildes (~) will be used later by the outer SPLIT. The "zzz" is added to make sure that ID numbers are treated as text, so that they keep their formatting throughout the processing and don't turn into real numbers; later, the outer SUBSTITUTE will replace those with null (i.e., get rid of the 'zzz').
Once the initial concatenations are complete, they are SPLIT at the forward slash and then FLATTENed into one column. QUERY removes any blank rows in this virtual array so far. The remaining results are again SPLIT at the tilde. Finally, that outer SUBSTITUTE removes the temporary instances of 'zzz'.
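The net effect of this formula (not its SUBSTITUTE/"zzz" mechanics) can be sketched in a few lines of Python. The function name and the tuple layout are assumptions for illustration; IDs are kept as strings, which is the same goal the "zzz" marker serves in the formula:

```python
def split_and_repeat(rows, sep="/"):
    """One output row per course, with the student's ID and name repeated.

    rows: (student_id, student_name, courses) tuples. Rows without an ID are
    skipped (like FILTER(..., A3:A<>"")) and blank course pieces are dropped
    (like QUERY(..., "Select * WHERE Col1 Is Not Null")).
    """
    out = []
    for student_id, name, courses in rows:
        if student_id == "":
            continue
        for course in courses.split(sep):
            course = course.strip()
            if course:
                out.append((student_id, name, course))
    return out


data = [("001", "Ann", "Math / Art"), ("", "", ""), ("002", "Bob", "Bio")]
for row in split_and_repeat(data):
    print(row)
```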
I also added a custom conditional formatting (CF) formula for the color banding on alternate rows.
You can try this one:
Formula:
=ARRAYFORMULA(TRIM(QUERY(SPLIT(FLATTEN(IF(IFERROR(SPLIT(C3:C5, "/"))="",,
A3:A5&"×"&B3:B5&"×"&SPLIT(C3:C5, "/"))), "×"),
"where Col3 is not null")))
Output:
Reference:
How to transpose & split multiple columns and repeat specific cells in a column

Find Row Where Sum is Reached from Single Joined Column (not a range of cells)

I'm trying to run a formula to identify in which row a total sum is reached.
I've been able to do that calculation when I have an entire range of cells to work with, however, I'm doing a filter / join calculation because I need to do this from an individual row with all the data instead of an entire range of cells.
Here is an example Google Sheet (EDITABLE - feel free) where you can see the range and the working formula (both below). Help getting this working from the single-cell versions at the top would be very helpful. The error I get with both the row() and index() formulas is that the "argument must be a range".
If there's another way to do this besides the single-cell I had that doesn't require referencing the range (e.g. using FILTER) then I'm open to it.
My desired result is to pull the second column (date) at the point when the sum is reached (via the INDEX & MATCH formula I used or an alternative). This will tell me the earliest date that feeds into the desired sum.
Yes, unfortunately you can't do that trick with SUMIFS to get a running total unless the column being totalled is an actual range.
The only approach I know is to multiply successive values by a triangular array like this:
1 0 0 ...
1 1 0 ...
1 1 1 ...
so you get the sum of the first value, then the first 2 values, then the first 3 values, and so on up to n.
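A Python sketch of the triangular-matrix idea (hypothetical function names). `running_totals` is the MMULT step, and `match_le` mimics MATCH with its default match type 1, which returns the 1-based position of the largest value less than or equal to the target in ascending data:

```python
def running_totals(values):
    """Multiply a lower-triangular matrix of ones by the column of values (MMULT).

    Row r of the matrix has ones in columns 0..r, so row r of the product is
    the sum of the first r + 1 values.
    """
    n = len(values)
    tri = [[1 if r >= c else 0 for c in range(n)] for r in range(n)]
    return [sum(tri[r][c] * values[c] for c in range(n)) for r in range(n)]


def match_le(target, ascending):
    """MATCH(target, range) with match type 1 on ascending data; None stands in for #N/A."""
    position = None
    for i, v in enumerate(ascending, start=1):
        if v <= target:
            position = i
        else:
            break
    return position


totals = running_totals([5, 3, 2, 4])
print(totals)                # [5, 8, 10, 14]
print(match_le(10, totals))  # 3
```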
This is the formula in F5:
=ArrayFormula(match(E14,mmult(IF(ROW(A1:INDEX(A1:ALL1000,COUNT(split(A5,",")),COUNT(split(A5,","))))>=
COLUMN(A1:INDEX(A1:ALL1000,COUNT(split(A5,",")),COUNT(split(A5,",")))),1,0),TRANSPOSE(SPLIT(A5,",")))))
And the formula in F6 is just
=to_date(INDEX(TRANSPOSE(SPLIT(B5,",")),F5,1))
EDIT
You might have guessed that the above formula was adapted from Excel, where you try to avoid volatile functions like Offset and Indirect.
I have realised since posting this answer that it could be improved in two ways:
(1) By using Offset or Indirect, thus avoiding the need to define a range of arbitrary size like A1:ALL1000
(2) By implying a 2D array by comparing a row and column vector, rather than actually defining a 2D array. This would give you something like this in F5:
=ArrayFormula(match(E14,mmult(IF(ROW(indirect("A1:"&address(COUNT(split(A5,",")),1)))>=
COLUMN(indirect("A1:"&address(1,COUNT(split(A5,","))))),1,0),TRANSPOSE(SPLIT(A5,",")))))
which could be further simplified to:
=ArrayFormula(match(E14,mmult(IF(ROW(indirect("A1:A"&COUNT(split(A5,","))))>=
COLUMN(indirect("A1:"&address(1,COUNT(split(A5,","))))),1,0),TRANSPOSE(SPLIT(A5,",")))))