GNUPlot - Arbitrary number of columns in stacked line - csv

I am working on a script that generates a CSV with an arbitrary number of y values for a given x value. The first row of the CSV has the names of the data sets. The x value is a Unix timestamp. I would like to use gnuplot to graph this data as a stacked line graph where the values are shown as fractions of the total for that row. How would I do this?
I've looked at the following solutions and attempted to integrate them, but I cannot figure out what I am doing wrong.
gnuplot either says there are not enough columns or reports some mismatch in the number of columns.
There are up to N columns of data for a given time index. The total I am looking at is the total for a given time index.
--
An example of my data:
KEY,CSS,JavaScript,Perl,Python,Shell
1428852630,0,0,0,0,406
1428852721,0,0,0,0,406
1428852793,0,0,0,0,406
1428853776,0,0,0,0,781
1429889154,0,0,0,0,1200
1429891056,0,0,0,0,1648
1429891182,0,0,0,0,1648
1429891642,0,0,0,0,1648
1430176065,0,0,0,0,2056
However, there might be a large number of columns, so I want a solution that determines the number of columns at run time.
http://gnuplot.sourceforge.net/demo/histograms.html - This seems to have issues with being modified to have an arbitrary number of columns.
plot 'immigration.dat' using (100.*$2/$24):xtic(1) t column(2), \
for [i=3:23] '' using (100.*column(i)/column(24)) title column(i)
https://newspaint.wordpress.com/2013/09/11/creating-a-filled-stack-graph-in-gnuplot/

This answer shows how to count columns, with a slight modification:
file="file.dat"
get_number_of_cols = "awk -F, 'NR == 1 { print NF; exit }' ".file
nc=system(get_number_of_cols)
Then you need to sum columns 2 to nc. Let's do it with recursion:
sum(c,C)=((c!=C)?column(c)+sum(c+1,C):column(c))
And now you can plot:
set key outside
set datafile separator ","
plot for [i=nc:2:-1] file using 0:(100*sum(2,i)/sum(2,nc)):xtic(1) title columnhead(i) with filledcurve y=0
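To make the arithmetic behind the plot command easier to check, here is a minimal Python sketch of the same idea: for each row, the running sum of columns 2..i divided by the sum of columns 2..nc. It is not gnuplot and not part of the answer. The function name `stacked_percentages` is hypothetical, the second sample row is made up for illustration, and the guard against an all-zero row is an added assumption.

```python
import csv
import io

def stacked_percentages(csv_text):
    """Return (names, rows); each row is (x, cumulative percentages per series)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    names = header[1:]  # every column after the first is a data series
    rows = []
    for record in reader:
        x = int(record[0])
        values = [float(v) for v in record[1:]]
        total = sum(values)
        running = 0.0
        cumulative = []
        for v in values:
            running += v
            # Same ratio as 100*sum(2,i)/sum(2,nc); the zero guard is an assumption.
            cumulative.append(100.0 * running / total if total else 0.0)
        rows.append((x, cumulative))
    return names, rows

# First data row is from the question; the second is invented for illustration.
sample = """KEY,CSS,JavaScript,Perl,Python,Shell
1428852630,0,0,0,0,406
1428853776,10,0,0,30,60
"""
names, rows = stacked_percentages(sample)
for x, cumulative in rows:
    print(x, cumulative)
```

The number of series is taken from the header at run time, which is the same job `nc` does in the gnuplot script.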

Related

Hide Empty Zero Data Points in SSRS Line Chart

I have an SSRS line chart that I need to figure out how to hide empty data points - stop the line from making markers/continuing the line where Category Group values are zero:
The values and series and groups are setup as so:
With the data looking like this:
I have tried filtering at both the chart level and the Category Group level to exclude the data that would create groups for Series 2020 and Categories October/November/December, which in my mind is what creates or fills in those points:
The filter expression is "=DateSerial(YEAR(today()),MONTH(today()), 1)", with the net result of filtering out data points/rows that come from an incomplete month. For example, when the report is run on 10/10/2020, only data from before 10/1/2020 should be used to generate groups.
The problem is that you are using COUNT() which will always return a value, zero if there are no records to count.
I created a simple dataset and using count of FILE_NUMBER I got this (replicating your issue) ...
The easiest way round this is to change the value expression to something like this...
=SUM(IIF(Fields!FILE_NUMBER.Value = Nothing, 0, 1))
This way we add 1 to the sum for every non-empty value and nothing if it's empty. If the total sum is still empty, by default, the chart will not plot that point.
So we end up with this...
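As a rough illustration of the difference the answer relies on, here is a small Python analogue (not SSRS code). It assumes, as the answer describes, that a count always produces a number while a sum over no rows stays empty, and that the chart skips empty values. The function names and the sample groups are hypothetical.

```python
def count_non_empty(values):
    # Like COUNT(): always yields a number, 0 when there is nothing to count.
    return sum(1 for v in values if v is not None)

def sum_or_none(values):
    # Like the SUM(IIF(...)) idea: a group with no rows yields None, not 0.
    flags = [0 if v is None else 1 for v in values]
    return sum(flags) if flags else None

# Hypothetical groups: October has no rows yet.
groups = {"September": ["F1", "F2"], "October": []}

counted = {month: count_non_empty(rows) for month, rows in groups.items()}
summed = {month: sum_or_none(rows) for month, rows in groups.items()}

# Only non-empty values become chart points in this sketch.
points = [(month, value) for month, value in summed.items() if value is not None]
print(counted)  # October shows up as 0
print(points)   # October is absent
```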

How do you separate comma separated values in different columns while maintaining values in the rest of the row in Google Sheets?

How do you split a cell that contains comma-separated values so that each value gets its own new row, while the other values in that new row stay the same as in the row the value came from? That would look like this:
From this..
..to this.
I'm actually looking for an answer that doesn't use google script when possible and without using gigantic long and complex formulas. The use of a pivot table within Google sheets may be used, but is also not my preference. But if it's not possible to use only formulas then I'm open to other answers as well.
I've had this question for over a year and I can't find serious answers online after a few hours of searching. There will be answers using a google script, but that doesn't really fall within the scope of my question. I am willing to adjust or rephrase my question if the current question remains unanswered.
I myself have no idea how to answer the question and the attempts I have made are not to be taken seriously.
Lambda Update
It's 2022. We now have LAMBDA and a bunch of array functions. Thus we can combine everything into a single formula, as originally desired. The idea is still the same as before, just much cleaner. (Also FLATTEN is no longer undocumented.)
=ArrayFormula(
  SPLIT(
    TRANSPOSE(
      SPLIT(
        JOIN(
          "",
          BYROW(
            A1:E4,
            LAMBDA(row,
              JOIN(
                ",",
                REDUCE(
                  ";",
                  row,
                  LAMBDA(cell1,cell2,
                    FLATTEN(FLATTEN(cell1)&","&SPLIT(cell2,","))
                  )
                )
              )
            )
          )
        ),
        ";,",
      )
    ),
    ","
  )
)
How it works
REDUCE combines all elements in an array to a single result using a function (Named Function or LAMBDA). In this case, we use the same permutation trick (combine column vector with row vector) as in the old solution to serialize every row. This gives us a column of rows, each starting with ;,
The rows are JOINed together, serializing the array.
BYROW applies a function to each row in a range, returning a single value for each row. Here, we use the process above in a LAMBDA function which gives a single serialized string for each row in the range.
We then JOIN all these together. Each row is delimited by ;,.
Split on the ;,, giving a row of serialized rows
Transpose the row to get a column
Split again on , deserializing each row.
For a much more readable solution, of course, you can name the lambdas, resulting in a formula that's cleaner still.
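For readers who want to check what the expanded table should contain, here is a minimal Python sketch of the end result. It swaps in a different technique, a direct Cartesian product via `itertools.product`, instead of the string serialization the formula uses. The function name `expand_rows` and the sample table are hypothetical.

```python
from itertools import product

def expand_rows(table):
    """Expand every row into one row per combination of its comma-separated values."""
    expanded = []
    for row in table:
        # Each cell becomes a list of alternatives: "1,2" -> ["1", "2"].
        options = [cell.split(",") for cell in row]
        for combo in product(*options):
            expanded.append(list(combo))
    return expanded

# Hypothetical input: the second cell of row 1 and the third cell of row 2 hold lists.
table = [["a", "1,2", "x"], ["b", "3", "y,z"]]
result = expand_rows(table)
for row in result:
    print(row)
```

Cells without a comma contribute a single alternative, so rows that need no splitting pass through unchanged.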
Draggable Formula solution (obsoleted by LAMBDA)
Let's see if I can get this ball rolling.
At a Glance
This solution is unfortunately unstable, as it relies on the undocumented Flatten function (which turns any range into a column array), and it requires two formulas to work. While I'm sure that you can do the same thing without Flatten(), it at least saves us some typing, as we rely on it heavily. Without Flatten, we can achieve the same with TRANSPOSE(SPLIT(TEXTJOIN(...))), which is not nearly as elegant.
The core formula, while it does have a linear growth factor and can get messy with more columns, does have an easy pattern to follow for the setup. It can also be dragged, which is the next best thing to a single ArrayFormula.
Stage 1: Serialize Rows
As you might have expected, we're going to use some string serialization tricks to get what we want. Here's the core formula:
=TEXTJOIN(",",,
  ArrayFormula(
    Flatten(Flatten(Flatten(Flatten(Flatten(
      SPLIT(A1,",")&",")&
      SPLIT(B1,",")&",")&
      SPLIT(C1,",")&",")&
      SPLIT(D1,",")&",")&
      SPLIT(E1,",")&";")
  ))&","
As you can see, it accounts for any commas inside each cell in the row. To add more columns, simply add another Flatten( and add your column to the list. Just make sure that the last one uses a ; and not a ,.
We take advantage of the fact that, in general, when ArrayFormula is applied to a column vector and a row vector, we can do an operation on every permutation of the two mixed together.
Examples:
=ArrayFormula({0;1}&{2,3}) is equivalent to ={"02","03";"12","13"}
=ArrayFormula(SEQUENCE(10)*SEQUENCE(1,10)) gives us a 10x10 multiplication table.
In our case, we use this to generate every possible permutation of rows based on the commas in each cell, serialize each row into a CSV string ending in a semicolon, then join all the rows into one long string. The extra "," is there so we can concatenate multiple tables together in the next stage.
When you're set up with the proper number of columns, drag this down to the height of the table. (Note: If some of your values can be blank, you also have to do some error checking around each SPLIT.)
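The column-vector-with-row-vector behaviour that Stage 1 leans on can be sketched in Python as a plain nested loop. This is only an analogue of what ArrayFormula does, and the helper name `outer` is hypothetical.

```python
def outer(column, row, op):
    """Apply op to every (column item, row item) pair, one output row per column item."""
    return [[op(c, r) for r in row] for c in column]

# Analogue of {0;1}&{2,3}: a 2x2 grid of concatenations.
concat = outer(["0", "1"], ["2", "3"], lambda c, r: c + r)
print(concat)  # [['02', '03'], ['12', '13']]

# Analogue of SEQUENCE(10)*SEQUENCE(1,10): a 10x10 multiplication table.
times = outer(range(1, 11), range(1, 11), lambda c, r: c * r)
print(times[2][3])  # 3 * 4 = 12
```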
Stage 2: Deserialize
This formula is considerably simpler. (Assuming serialization data is in column F.)
=ArrayFormula(
  SPLIT(
    TRANSPOSE(
      SPLIT(
        JOIN(,F:F),
        ";,",
      )
    ),
    ","
  )
)
First, glue all the strings together using JOIN. Since we know that each row ends with a ";,", we split on that to get our rows. After that, we can split each row up into cells by splitting on ",", resulting in our table.
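The serialize-then-deserialize round trip can be sketched in Python as follows. This is an illustration of the string trick, not the Sheets formulas themselves. It assumes no cell value contains "," or ends with ";", and the names `ROW_END`, `serialize` and `deserialize` are hypothetical.

```python
ROW_END = ";,"  # each serialized row ends with ";" followed by the joining ","

def serialize(rows):
    """Join each row's cells with "," and terminate every row with ROW_END."""
    return "".join(",".join(row) + ROW_END for row in rows)

def deserialize(blob):
    """Split on ROW_END to recover rows, then on "," to recover cells."""
    chunks = [chunk for chunk in blob.split(ROW_END) if chunk]
    return [chunk.split(",") for chunk in chunks]

rows = [["a", "1", "x"], ["a", "2", "x"]]
blob = serialize(rows)
print(blob)               # a,1,x;,a,2,x;,
print(deserialize(blob))  # [['a', '1', 'x'], ['a', '2', 'x']]
```

Because every row carries its own terminator, blobs from several source rows can simply be concatenated before deserializing, which is what the JOIN over column F does.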
Conclusion
It's not a single ArrayFormula, sure, but neither of these formulas is really all that complex, which is nice. ArrayFormulas can get messy and confusing quickly.
We've managed to avoid scripting too, which is a plus.
You can also hide the serialization column if you find it unsightly!

Find Row Where Sum is Reached from Single Joined Column (not a range of cells)

I'm trying to run a formula to identify in which row a total sum is reached.
I've been able to do that calculation when I have an entire range of cells to work with, however, I'm doing a filter / join calculation because I need to do this from an individual row with all the data instead of an entire range of cells.
Here is an example google sheet (EDITABLE - feel free) where you can see the range and working formula (both below). Help getting this from the single-cell versions on the top would be very helpful. The error I get with both row() & index() formulas is that the "argument must be a range".
If there's another way to do this besides the single-cell I had that doesn't require referencing the range (e.g. using FILTER) then I'm open to it.
My desired result is to be able to get the value from the second column (date) at the point when the sum is reached (this can be via the INDEX & MATCH formula I used or an alternative). This will tell me the earliest date that feeds into the desired sum.
Yes, unfortunately you can't do that trick with SUMIFS to get a running total unless the column being totalled is an actual range.
The only approach I know is to multiply successive values by a triangular array like this:
1 0 0 ...
1 1 0 ...
1 1 1 ...
so you get just the sum of the first value, the first 2 values, then 3 values up to n.
This is the formula in F5:
=ArrayFormula(match(E14,mmult(IF(ROW(A1:INDEX(A1:ALL1000,COUNT(split(A5,",")),COUNT(split(A5,","))))>=
COLUMN(A1:INDEX(A1:ALL1000,COUNT(split(A5,",")),COUNT(split(A5,",")))),1,0),TRANSPOSE(SPLIT(A5,",")))))
And the formula in F6 is just
=to_date(INDEX(TRANSPOSE(SPLIT(B5,",")),F5,1))
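The triangular-array approach can be sketched in Python to show what the MMULT step produces. This is an illustration only: the sample values are made up, the function names are hypothetical, and `bisect_right` is used to imitate MATCH's default behaviour on ascending data (the position of the last value less than or equal to the target), which assumes the values are non-negative so the running totals never decrease.

```python
from bisect import bisect_right

def running_totals(values):
    """Multiply a lower-triangular matrix of ones by the values to get running sums."""
    n = len(values)
    # Row r has ones in columns 0..r, mirroring IF(ROW(...)>=COLUMN(...),1,0).
    tri = [[1 if r >= c else 0 for c in range(n)] for r in range(n)]
    return [sum(t * v for t, v in zip(tri_row, values)) for tri_row in tri]

def match_position(target, totals):
    """1-based position of the last running total <= target (0 if there is none)."""
    return bisect_right(totals, target)

# Hypothetical single-cell input, split the same way SPLIT(A5,",") would.
values = [int(v) for v in "5,10,20,40".split(",")]
totals = running_totals(values)
pos = match_position(35, totals)
print(totals)  # [5, 15, 35, 75]
print(pos)     # 3
```

A result of 0 here corresponds to the case where MATCH would find no value and return an error.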
EDIT
You might have guessed that the above formula was adapted from Excel, where you try to avoid volatile functions like Offset and Indirect.
I have realised since posting this answer that it could be improved in two ways:
(1) By using Offset or Indirect, thus avoiding the need to define a range of arbitrary size like A1:ALL1000
(2) By implying a 2D array by comparing a row and column vector, rather than actually defining a 2D array. This would give you something like this in F5:
=ArrayFormula(match(E14,mmult(IF(ROW(indirect("A1:"&address(COUNT(split(A5,",")),1)))>=
COLUMN(indirect("A1:"&address(1,COUNT(split(A5,","))))),1,0),TRANSPOSE(SPLIT(A5,",")))))
which could be further simplified to:
=ArrayFormula(match(E14,mmult(IF(ROW(indirect("A1:A"&COUNT(split(A5,","))))>=
COLUMN(indirect("A1:"&address(1,COUNT(split(A5,","))))),1,0),TRANSPOSE(SPLIT(A5,",")))))

How to sum cells across two matrices in different datasets?

I've researched previous answers, nothing seems relevant to this question.
I have a simple matrix, showing data from "Dataset A" - counts in the rows and month/years in the columns.
I have a second matrix, showing data from "Dataset B" with the same layout.
I wish to create a third matrix, or add a total line to the second matrix, showing the sum of each month's figures. See screenshot;
Image showing dataset A (orange header) and dataset B (blue header) and items I wish to sum
I have tried adding a new row to the dataset B matrix and using the following formula -
=SUM(Fields!Counts.Value) + Sum(Fields!Counts.Value, "DatasetA")
However this fails to take into account the month groupings from Dataset A, and instead sums the correct number from Dataset B with the total number from Dataset A - not just the figure for the appropriate month.
How can I make this sum work, ensuring that only the relevant months are summed across both datasets?
Thanks!
Update: Using the LOOKUP function I have made a little progress, but I have still been unable to sum the two numbers (see image below). I tried the following lookup expression to build the total:
=Sum(Fields!Counts.Value) + LOOKUP(Fields!ReceivedMonth.Value, Fields!ReceivedMonth.Value, Fields!Counts.Value, "DatasetA")
But it appears that I'm unable to SUM the Fields!Counts.Value, "DatasetA" inside the lookup (it throws a syntax error), so the total is displayed as the total from DatasetB plus 1.
Image showing result of LOOKUP - not quite correct

Cumulative data series displays error in a table in Power BI

I would like to display plan and fact cumulative data series in a dashboard with a bar and line combined chart and a table next to each other using Power BI Version: 2.59.5135.781 64-bit (2018. June) edition.
My DAX formula looks like this:
CUMULATIVE_FACT = CALCULATE(
SUM('FACT_TABLE'[FACT_VALUE]);
FILTER(
ALL('DATES');
'DATES'[YEAR]=MAX('DATES'[YEAR]) &&
'DATES'[DATE]<=MAX('DATES'[DATE])
)
)
Which works fine and gives a result as such (bars displayed as TÉNY refer to cumulative fact)
The cumulative plan (line referred to as TERV) series is identical to this but with plan figures. Also you can change the year so the aggregation only runs for the current year.
However, I would like to display either null (blank) or zero values for the fact series after a certain date which is given as a parameter. This parameter value is stored in a table with a single column and single row in a date type value.
So I modified my formula as such
CUMULATIVE_FACT = IF(VALUES('DATES'[DATE])<= MAX(PARAMETER_TABLE[PARAMETER_DATE]);
CALCULATE(
SUM('FACT_TABLE'[FACT_VALUE]);
FILTER(
ALL('DATES');
'DATES'[YEAR]=MAX('DATES'[YEAR]) &&
'DATES'[DATE]<=MAX('DATES'[DATE])
)
); 0)
The formula works fine for the chart but my table visual gives an error.
So the chart looks okay, exactly the way I would like to display it, but the table gives back an 'A table of multiple values was supplied where a single value was expected' error message.
Error message:
The column referred to in the message is basically the CUMULATIVE_FACT measure, I just changed it for ease of understanding. I tried with BLANK() instead of 0, but it looks the same.
No idea why it is not working with the table visual. Any ideas?
The problem is coming from this piece:
VALUES('DATES'[DATE])
This returns all values in the current filter context, not just a single one. That's why you're getting
A table of multiple values was supplied where a single value was expected
when you try to compare it to MAX(PARAMETER_TABLE[PARAMETER_DATE]).
It works in the chart since VALUES('DATES'[DATE]) is always a single value that corresponds to the month on the axis, whereas the table has a total line that encompasses multiple months.
I think if you just turned off the total line, it would be OK. Otherwise, change VALUES('DATES'[DATE]) to an expression that returns a single date in the way you want. For example, MAX('DATES'[DATE]) might work.
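To see why collapsing the visible dates to a single one avoids the error, here is a rough Python analogue of the measure (it is not DAX). It assumes the current filter context can be modelled as a list of visible dates. The function name `cumulative_fact` and the sample figures are hypothetical.

```python
from datetime import date

def cumulative_fact(facts, visible_dates, cutoff):
    """Year-to-date sum up to the latest visible date, or 0 after the cutoff."""
    # max() always yields one date, even when a total row sees several dates.
    current = max(visible_dates)
    if current > cutoff:
        return 0
    return sum(
        value
        for day, value in facts.items()
        if day.year == current.year and day <= current
    )

# Hypothetical monthly facts and parameter date.
facts = {date(2018, 1, 31): 10, date(2018, 2, 28): 20, date(2018, 3, 31): 30}
cutoff = date(2018, 2, 28)

january = cumulative_fact(facts, [date(2018, 1, 31)], cutoff)
february = cumulative_fact(facts, [date(2018, 2, 28)], cutoff)
march = cumulative_fact(facts, [date(2018, 3, 31)], cutoff)
total_row = cumulative_fact(facts, list(facts), cutoff)
print(january, february, march, total_row)
```

In this sketch the total row is evaluated at the latest visible date, so it falls after the cutoff and shows 0. Whether that is the desired total is a design choice the measure's author would need to make.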