How to count a range of values in datastudio - csv

I am trying to rebuild a report that I made in Sheets in the newer datastudio tool. In my sheet, I have a table where several columns hold related data (each row can have more than one value, so I used the "split text into columns" function to represent them).
What I have is something similar to:
ID   Component  Component 1  Component 2
101  wood       metal        gold
102  metal      copper
103  wood       gold         metal
In my spreadsheet, I have a formula that counts the number of times a certain component appears, using =COUNTIF(<range>,<string>)
Therefore, with the above formula, I have something similar to:
Component  count
wood       2
metal      3
copper     1
gold       2
I want to be able to build the same in datastudio. Turns out that since the components are divided into columns, I can only use one dimension and the result only shows the count of the first column.
I want to know if there is an easy way to accomplish this. My original data source is like this:
ID   Component
101  wood;metal;gold
102  metal;copper
103  wood;gold;metal
Maybe it's easier to work directly with the previous format but again, using the component in this case only counts for the first occurrence and not across the whole string.
For now, the only solution I can think of is splitting the text into rows instead of columns, but that is not achievable using Google Sheets, or at least not that I am aware of.
Could somebody have an idea of how to accomplish this?
Thanks!
Edit
I am adding a minimal reproducible example here. This is the spreadsheet that I have (example) and the current report I am using so far (built in Sheets):
Now, I want to have the same report (plus more things) using datastudio. This is the report example I have in data studio. As you will see the record count for components is not accurate in DataStudio.

Here is a preliminary answer. I'll polish if we like it and delete it otherwise.
Consider the following sheet:
Range A2:B4 contains my data of interest. Rows 7:10 show the results I think you are looking for. The mechanism for creating this is the value of cell A6 which contains:
=ARRAYFORMULA(QUERY(FLATTEN(SPLIT(B2:B4,";")), "SELECT Col1, COUNT(Col1) WHERE Col1 is not null GROUP BY Col1"))
At the highest level, the formula splits each of the delimited text items into its own cell and then flattens all these cells into one column, against which we then run a SQL-like query to group and count them.
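For anyone who wants to check the numbers outside the sheet, here is a minimal Python sketch of the same split, flatten, group and count steps. The rows are the ones from the question, with row 102 read as metal;copper (which is what the expected counts imply); the function name count_components is just an illustration, not anything from Sheets or datastudio.

```python
from collections import Counter

# Rows copied from the question's original data source (ID, Component).
rows = [
    (101, "wood;metal;gold"),
    (102, "metal;copper"),
    (103, "wood;gold;metal"),
]

def count_components(rows, delimiter=";"):
    """Split every delimited cell, flatten the pieces, and count each value."""
    flattened = [
        part.strip()
        for _, cell in rows
        for part in cell.split(delimiter)
        if part.strip()  # skip empty pieces, like "WHERE Col1 is not null"
    ]
    return Counter(flattened)

counts = count_components(rows)
for component, n in sorted(counts.items()):
    print(component, n)
```

The Counter plays the role of the GROUP BY / COUNT part of the QUERY, and the list comprehension plays the role of SPLIT plus FLATTEN.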


How can I reduce google sheets lag by replacing thousands of cell by cell formula calculations with an elegant script?

first time posting, so bear with me.
I have successfully designed a google sheet that automatically creates a crop map based on the crop plan in another tab ('Successions')
In 'Successions' the following columns are relevant:
Column A - Succession ID
Column C - Row ID
Column R - Planting Date
Column W - Harvest Date
In "Map 1A" I have created a map of our field, with Column B representing the Row ID and the columns to the right each representing a week of the year, whose start dates are defined in row 2.
My goal is to map each succession on the appropriate row for the duration that it will be in the ground (Planting Date to Harvest Date). I will have different successions occupying the same row during different date ranges.
I accomplished this in three steps:
Step 1
Every cell (except in the leftmost column) contains a formula that returns, and text-joins if there are multiple results, any successions in that row during that week, if and only if the value it would return is different from the value the cell to the left would return. The result is that the succession ID is only displayed on the week that it starts. The formula in these cells is:
=Iferror(If((Textjoin(" / ",True,FILTER(Successions!$A$2:$A$647,RegExMatch(Successions!$C$2:$C$647,$B3),Successions!$R$2:$R$647<D$2,Successions!$W$2:$W$647>D$2)))=(Textjoin(" / ",True,Iferror(FILTER(Successions!$A$2:$A$647,RegExMatch(Successions!$C$2:$C$647,$B3),Successions!$R$2:$R$647<C$2,Successions!$W$2:$W$647>C$2),""))),"",(Textjoin(" / ",True,FILTER(Successions!$A$2:$A$647,RegExMatch(Successions!$C$2:$C$647,$B3),Successions!$R$2:$R$647<D$2,Successions!$W$2:$W$647>D$2)))),"")
The formula for the leftmost column, which should not check the cell to its left, is:
=Iferror((Textjoin(" / ",True,FILTER(Successions!$A$2:$A$647,RegExMatch(Successions!$C$2:$C$647,$B3),Successions!$R$2:$R$647<C$2,Successions!$W$2:$W$647>C$2))),"")
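To make the rule in Step 1 easier to reason about, here is a rough Python sketch of what the two formulas compute, done as one pass that builds the whole grid instead of one formula per cell. The field names (id, row, planted, harvest), the function names and the sample successions are all hypothetical; only the comparisons (planting date before the week start, harvest date after it, regex match on the row ID, " / " as the joiner) are taken from the formulas above.

```python
import re
from datetime import date

def week_label(successions, row_id, week_start):
    """Join the IDs of all successions occupying row_id on week_start."""
    ids = [
        s["id"]
        for s in successions
        if re.search(row_id, s["row"])                # like RegExMatch(C, $B3)
        and s["planted"] < week_start < s["harvest"]  # R < week start, W > week start
    ]
    return " / ".join(ids)

def build_map(successions, row_ids, week_starts):
    """Return one list of cell texts per row; a label is shown only when it changes."""
    grid = []
    for row_id in row_ids:
        cells, previous = [], None
        for week_start in week_starts:
            label = week_label(successions, row_id, week_start)
            # The leftmost column always shows its label; later columns only on change.
            cells.append(label if label != previous else "")
            previous = label
        grid.append(cells)
    return grid

# Hypothetical sample data, not taken from the real 'Successions' tab.
successions = [
    {"id": "S1", "row": "R1", "planted": date(2024, 1, 1), "harvest": date(2024, 1, 20)},
    {"id": "S2", "row": "R1", "planted": date(2024, 1, 18), "harvest": date(2024, 2, 10)},
]
week_starts = [date(2024, 1, 7), date(2024, 1, 14), date(2024, 1, 21), date(2024, 1, 28)]
grid = build_map(successions, ["R1", "R2"], week_starts)
```

Note that the regex match is kept as in the formula, so a row ID like R1 would also match R10; that may or may not be what is wanted.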
Step 2
Because every cell contains a formula, I could not get the succession ID to spill into the adjacent cell and thus be readable. To solve this, I have an adjacent tab called "Map Final" that mirrors "Map 1A" via an array formula. Because it carries over only values, not formulas, it allows text to spill over.
Step 3
Conditional formatting is applied to "Map Final" via the Custom Formula function. There will be a rule for each crop type. Each succession is automatically colored the color of its crop type, and the color fills all cells for the duration of the succession.
My question is this:
How could I accomplish this same mapping in a more efficient way? Currently any update to Successions takes 30 seconds to a minute to update. The progress bars are killing me and basically make this really cool tool unusable for crop planning purposes, during which we are going back and forth from data to map frequently to make placement decisions.
Do you think this is possible using a custom script that pushes data rather than pulling data?
Would it render faster, or is the way I'm doing it the most efficient way?
While that is my main question, I am certainly open to any advice you have for improving speed by simply refining my current method.
Thanks in advance for any help you can provide!

Consolidate response data from Google Form submissions where the respondents have the same name

I thought I could figure this out, but it seems I can't. Basically, students' information will be input to Google Sheets through Google Forms, but due to my Google Forms having sections, the data of one student appears in different rows. Each student rates 4 subjects, and they each have 4 teachers; the form separates each subject area into sections and Google Sheets places this info into 4 different rows.
I'm trying to consolidate all the data that has an identical name into one row, rather than appearing in 4 different rows. I've recreated what my data currently looks like in this example spreadsheet.
Any help would be much appreciated, this is driving me mental!
You could do it with a lot of VLOOKUPs.
Suppose you put the list of names in A21:A24
=unique(C2:C11)
Then do a lookup on each Name/subject pair in the appropriate column
=ArrayFormula(iferror(vlookup(A21:A24&{"Math","Math","English","English","Science","Science","HSIE","HSIE"},{C2:C11&D2:D11,E2:N11},{2,3,4,5,6,7,8,9},false),""))
Explanation
Maybe the way to explain it is to build it up from the case of one student - one subject to many students - many subjects? This assumes that you've got a list of students' names in A21:A24 and you want to pull out the info for each of them from the original data.
(1) One student, one subject
Hopefully trivial - just provide a reference to the student's report
=E2
(2) Several students, one subject (maths) (Names not necessarily in same order as original list)
Use vlookup to get report against each name
=ArrayFormula(vlookup(a21:a24,C2:E11,3,false))
It's an array formula so that it works through each of the 4 names, finding the first report associated with each. Note that it would pick up a blank for Susan in your data, because the first row with Susan in it contains a blank for maths.
(3) Several students, several subjects
=ArrayFormula(iferror(vlookup(A21:A24&{"Math","Math","English","English","Science","Science","HSIE","HSIE"},{C2:C11&D2:D11,E2:N11},{2,3,4,5,6,7,8,9},false),""))
So here it's looking up pairs like
BobMath BobMath BobEnglish BobEnglish...JimMath JimMath JimEnglish...
(student name + first pair of curly brackets)
in an array (second pair of curly brackets) where the first column is
BobMath
JimMath
SusanScience
...
and pulling out the correct row. It's also got to tell vlookup what column to look in (third pair of curly brackets).
So the first vlookup generated by this would be
vlookup("BobMath",array,2,false)
where array contains
BobMath Cross (several other columns)...
JimMath Blue (several other columns)...
...
BillHSIE (several other columns)... Hair
so it would pick up Cross
and the last vlookup would be
vlookup("BillHSIE",array,9,false)
which would pick up Hair.
Now it doesn't pick up a blank for Susan's maths because the first row it looks in is row 9 and it picks up Tickle from column E.
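If it helps to see the idea of case (3) outside a spreadsheet, here is a small Python sketch of a lookup on a concatenated name+subject key. The sample responses, the two-values-per-subject layout and the function name consolidate are made up for illustration; they are not the data from the example spreadsheet.

```python
# Hypothetical form responses: (name, subject, teacher, rating).
responses = [
    ("Bob", "Math", "Cross", 4),
    ("Jim", "Math", "Blue", 3),
    ("Bob", "English", "Green", 5),
    ("Jim", "HSIE", "Hair", 2),
]
subjects = ["Math", "English", "Science", "HSIE"]

def consolidate(responses, subjects):
    """Build one row per student, looking values up by a name+subject key."""
    # Keep only the first row for each key, as an exact-match VLOOKUP does.
    by_key = {}
    for name, subject, teacher, rating in responses:
        by_key.setdefault(name + subject, (teacher, rating))

    # Distinct names in order of first appearance, like UNIQUE.
    names = list(dict.fromkeys(name for name, *_ in responses))
    table = {}
    for name in names:
        row = []
        for subject in subjects:
            # Missing pairs become blanks, like the IFERROR(..., "") wrapper.
            row.extend(by_key.get(name + subject, ("", "")))
        table[name] = row
    return table

table = consolidate(responses, subjects)
```

Each student ends up with one row of 8 values (2 per subject), which is the same shape the array formula produces with its {2,3,4,5,6,7,8,9} column list.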
EDIT
If you then wanted to add some columns which only related to the student and not to any particular subject (like 'student attends counselling') you would need a different approach.
=ARRAYFORMULA(IFERROR(VLOOKUP($A21:$A24,FILTER({$C2:$C11,M2:M11},M2:M11<>""),2,FALSE),""))
This does a lookup as in (2) above but first skips any cells which are blank in the particular column being used for lookup. This formula only works on a single column, but can be pulled across for additional columns.

How to create a dynamic table in Excel?

I am trying to create a dynamic table - I have tried a Pivot Table, but cannot get it to work. So I thought that maybe it could be done with an IF-statement, but that did not work for me either.
Basically, I have 2 tables, 1 table containing the information (data source table) and 1 table that should be dynamic according to the data in the first table.
So if I change the data in the E-column, the Fruit table (image below) must be updated accordingly.
So if I write 2 instead of 1 in the count of Apples, then it should create 2 apples under the "Fruit" column. Data in the remaining columns will be calculated with a formula/fixed data - so that is not important.
I am open to any solutions; formulas, pivot tables, VBA, etc.
Have a nice weekend.
I have both Excel 2010 and 2013.
If you want to repeat some text a number of times you can use a somewhat complicated formula to do it. It relies on there not being duplicate entries in the Fruits table and no entries with 0 count.
Picture of ranges and results
The formulas involved are a starter cell in E2 and a repeating entry in E3 that is copied down. These are normal formulas; no array entry is required. Note that I have created a Table for the data, which allows me to use named fields to get the whole column.
E2 = INDEX(Table1[Fruits],1)
E3 = IF(
INDEX(Table1[Count],MATCH(E2,Table1[Fruits],0))>COUNTIF($E$2:E2,E2),
E2,
INDEX(Table1[Fruits],MATCH(E2,Table1[Fruits],0)+1))
How it works
This formula relies on checking the number of entries above the current one and comparing to the desired count. Some notes:
The starter cell is needed to get the first result.
After the first cell, it counts how often the value above has already appeared in the output list. This is compared to the desired count. If it is less than desired, the formula repeats the value from above. Otherwise, it goes to the next item in the list. A mixed absolute/relative reference ($E$2:E2) is used to count the cells above.
Since it goes to the next item in the list, don't put a 0 for a count or it will get included once.
You can copy this down for as many cells as you want. It will return #REF! when it runs out of data. You can wrap it in an IFERROR(..., "") to make these cells display as blanks.
If the non-0 rule is too much, it can probably be removed with a little effort. If there are duplicates, that will be much harder to deal with.
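To double-check the logic described in these notes, here is a short Python simulation of the starter cell plus the copied-down formula. The function name expand and the sample fruits are hypothetical; the comparisons mirror the INDEX/MATCH/COUNTIF formula, and the loop simply stops where the sheet would show #REF!.

```python
def expand(fruits, counts, n_cells):
    """Fill up to n_cells the way the copied-down formula does."""
    desired = dict(zip(fruits, counts))
    out = [fruits[0]]                          # starter cell: INDEX(Table1[Fruits],1)
    while len(out) < n_cells:
        above = out[-1]
        if desired[above] > out.count(above):  # desired count vs COUNTIF($E$2:E2,E2)
            out.append(above)                  # still short of the count: repeat
        else:
            nxt = fruits.index(above) + 1      # MATCH(...)+1
            if nxt >= len(fruits):
                break                          # the sheet would show #REF! here
            out.append(fruits[nxt])
    return out

result = expand(["Apple", "Banana", "Cherry"], [2, 1, 3], 10)
with_zero = expand(["Apple", "Kiwi"], [1, 0], 10)
```

The second call shows the caveat about zero counts: the item with count 0 still shows up once, because the formula moves on to the next item before it ever checks that item's count.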

How do I efficiently output to a non-contiguous range in Google Apps Script (GAS)

I'm new to Google Script, so appreciate your help!
Here's what my data looks like (3 non-contiguous records, 4 non-contiguous fields):
https://docs.google.com/spreadsheets/d/18FFB2HlcfcciHj7NPmihZbuf47op2UMdRTKfpyTqowU/edit#gid=0
I have an array of the items and each item is an object that contains 4 keys. I want to output to Google Sheets in as few SetValue requests as possible. If I can't do it in 1 call, then it makes most logical sense to output each item at a time.
My idea is that I can create ranges for fields 1 and 4 that span the entire column. Then I can create ranges that span the entire Item row. Then the INTERSECTION between the 2 is the range I want to output to, once I have assembled an array of 2 values.
Or perhaps, assuming I know the rows/columns of each cell below, I can return the ranges and then use a UNION of the ranges to create the mapping instead.
But is there a function to do Intersection or Union in GAS? Or am I better off just outputting each cell 1 by 1?
Thanks for your help!
There's no such function. It's either one by one or contiguous cells.
But there are quite a few tricks/alternatives that might work for you. As Sandy pointed out in a comment, you could get a contiguous range that includes all the required cells and set the non-required ones to blank or to their original values. This has the downside of not working for formulas.
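The contiguous-range trick is easiest to see on a plain 2D array. Below is a language-neutral sketch in Python (not Apps Script) of computing the bounding rectangle, keeping the original values and overlaying only the required cells. The function name bulk_block and the sample grid are hypothetical; in Apps Script the read and the write would be one getValues and one setValues call on that rectangle.

```python
def bulk_block(grid, updates):
    """Return (top, left, block) covering every updated cell in one rectangle.

    grid    -- current sheet values as a list of rows (0-based indexes)
    updates -- dict mapping (row, col) to the new value
    """
    rows = [r for r, _ in updates]
    cols = [c for _, c in updates]
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    # Start from the current contents so untouched cells keep their values.
    block = [row[left:right + 1] for row in grid[top:bottom + 1]]
    for (r, c), value in updates.items():
        block[r - top][c - left] = value
    return top, left, block

# Hypothetical sheet contents and two non-contiguous cells to change.
grid = [
    ["a", "b", "c", "d"],
    ["e", "f", "g", "h"],
    ["i", "j", "k", "l"],
]
top, left, block = bulk_block(grid, {(0, 1): "X", (2, 3): "Y"})
```

Everything between the two updated cells is written back unchanged, which is exactly why this approach flattens any formulas in that rectangle to their values.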
You could also pre-configure the required cells with simple formulas "pointing" to a contiguous range. You could then simply bulk update that range at once.
If bulk setting is really required, you could also grab all values and formulas of the wide range and convert all plain values to their equivalent formulas, e.g. abc becomes ="abc" and 1/1/2015 becomes =DATE(2015,1,1) (yes, it's cumbersome), and then use setFormulas to set everything back, both the original formulas and the values converted to formulas. This makes no actual content change in the cells you don't want to touch and changes the required ones, all in one bulk operation.
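The value-to-formula conversion is the fiddly part, so here is a rough sketch of it in Python (again not Apps Script). The function name to_formula is hypothetical, and the handling of booleans, numbers and embedded quotes is my assumption about what a complete version would need; only the string and date cases come from the examples above.

```python
from datetime import date

def to_formula(value):
    """Turn a plain cell value into a formula that evaluates to the same value."""
    if isinstance(value, str) and value.startswith("="):
        return value                             # already a formula: keep as-is
    if isinstance(value, bool):                  # check bool before int/float
        return "=TRUE" if value else "=FALSE"
    if isinstance(value, date):
        return f"=DATE({value.year},{value.month},{value.day})"
    if isinstance(value, (int, float)):
        return f"={value}"
    escaped = str(value).replace('"', '""')      # double any quotes inside the string
    return f'="{escaped}"'

converted = [to_formula(v) for v in ["abc", date(2015, 1, 1), 3, "=A1+1"]]
```

A 2D array built this way is what you would hand to setFormulas in a single call.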
Anyway, these are just workarounds. As stated in the first sentence, it's not possible, period. You probably want to star this report in Apps Script issue tracker to kind of vote for this feature and receive updates.

SSIS, Dealing with rows which depend on other rows

This is quite a strange problem, wasn't quite sure how to title it. The issue I have is some data rows in an SSIS task which need to be modified depending on other rows.
Name Location IsMultiple
Bob England
Jim Wales
John Scotland
Jane England
A simplified dataset, with some names, their locations, and a column 'IsMultiple' which needs to be updated to show which rows share locations. (Bob and Jane's rows would be flagged 'True' in the example above).
In my situation there is much more complex logic involved, so solutions using sql would not be suitable.
My initial thought was to use an asynchronous script task, take in all the data rows, parse them, and then output them all after the very last row has been input. The only way I could think of doing this was to call row creation in the PostExecute phase, which did not work.
Is there a better way to go about this?
A couple of options come to mind for SSIS solutions. With both options you would need the data sorted by location. If you can do this in your SQL source, that would be best. Otherwise, you have the Sort component.
With sorted data as your input you can use a Script component that compares the values of adjacent rows to see if multiple locations exist.
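As a rough illustration of the adjacent-row comparison, here it is in Python rather than in an actual SSIS Script component. The function name is hypothetical and the rows are the ones from the question. One assumption to flag: this sketch has the whole sorted list in memory and can look at the next row, whereas a Script component sees rows one at a time and would have to buffer or look back.

```python
def flag_multiples_sorted(rows):
    """rows: list of dicts sorted by 'Location'. Sets 'IsMultiple' on each row."""
    for i, row in enumerate(rows):
        same_as_prev = i > 0 and rows[i - 1]["Location"] == row["Location"]
        same_as_next = i + 1 < len(rows) and rows[i + 1]["Location"] == row["Location"]
        row["IsMultiple"] = same_as_prev or same_as_next
    return rows

people = [
    {"Name": "Bob", "Location": "England"},
    {"Name": "Jim", "Location": "Wales"},
    {"Name": "John", "Location": "Scotland"},
    {"Name": "Jane", "Location": "England"},
]
# Sort first, as the Sort component (or an ORDER BY in the source) would.
flagged = flag_multiples_sorted(sorted(people, key=lambda r: r["Location"]))
flags = {r["Name"]: r["IsMultiple"] for r in flagged}
```

Because the data is sorted, a row shares its location with some other row exactly when it matches the row directly before or after it.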
Another option would be to split your data path into two. Do this by adding a Multicast component. The first path would be your main path that you currently have. In the second path, add an Aggregate transformation after the Multicast component. Edit the Aggregate and select Location as a Group By operation. Select (*) as a Count all. The output will be rows with counts by location.
After the Aggregate, Add a Merge Join component and select your first and second data paths as inputs. Your join keys should be the Location column from each path. All the inputs from path 1 should be outputs and include the count from path 2 as an output.
In a derived column, modify the isMultiple column with an expression that expresses "If count is greater than 1 then true else false".
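The Multicast / Aggregate / Merge Join / Derived Column chain boils down to "count per location, join the count back, derive the flag". Here is a minimal Python sketch of that data flow, just to show the shape of the result; the function name is hypothetical and nothing here is SSIS API.

```python
from collections import Counter

def flag_multiples_by_count(rows):
    """Attach IsMultiple to each row based on how many rows share its Location."""
    # Aggregate: COUNT(*) grouped by Location.
    counts = Counter(row["Location"] for row in rows)
    # Merge Join + Derived Column: look up the count, then derive the flag.
    return [
        {**row, "IsMultiple": counts[row["Location"]] > 1}
        for row in rows
    ]

people = [
    {"Name": "Bob", "Location": "England"},
    {"Name": "Jim", "Location": "Wales"},
    {"Name": "John", "Location": "Scotland"},
    {"Name": "Jane", "Location": "England"},
]
result = flag_multiples_by_count(people)
```

Unlike the adjacent-row approach, this keeps the original row order in the output, although the Merge Join itself still needs both of its inputs sorted on Location.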
If possible, I might recommend doing it with pure SQL in a SQL task on your control flow prior to your data flow. A simple UPDATE query where you GROUP BY location and do a HAVING COUNT for everything greater than 1 should be able to do this. But if this is a simplified version this may not be feasible.
If the data isn't available until after the data flow is done you could place the SQL task after your data flow on your control flow.