How to generate a survey curve with SQL query - mysql

I have a table with two columns, WorkItem and LiveDays. For example:
| WorkItem | LiveDays |
| A | 8 |
| B | 2 |
| C | 5 |
....
I would like to generate survey data for the work items. Each item is normalized to start on day 1 and end on day LiveDays, and the value for the nth day is how many work items are still live (in process) on that day. For example:
| Days | Counter | Comments |
| 1 | 3 | (A, B, C)|
| 2 | 3 | (A, B, C)|
| 3 | 2 | (A, C) |
| 4 | 2 | (A, C) |
| 5 | 2 | (A, C) |
| 6 | 1 | (A) |
| 7 | 1 | (A) |
| 8 | 1 | (A) |
Is it possible to do this with a SQL query instead of inserting data into a new table within a transaction?
Thanks

To show it can be done (which does, after all, answer your original question), here's your example reproduced using Db2 Developer-C 11.1 on dbfiddle.uk. You did say there were several databases you could use, and this will no doubt illustrate that different databases do things in different ways. Note: there is an additional MySQL solution further down.
CREATE TABLE surveydata AS (
    WITH t1(workitem, livedays)
         AS (VALUES ('A', 8), ('B', 2), ('C', 5)),
    numbers(seq)
         AS (VALUES (1)
             UNION ALL
             SELECT seq + 1
             FROM numbers
             WHERE seq < (SELECT MAX(livedays)
                          FROM t1)),
    xdata(workitem, ndays)
         AS (SELECT workitem,
                    seq
             FROM t1,
                  numbers
             WHERE seq <= livedays)
    SELECT ndays AS "Days",
           COUNT(*) AS "Counter",
           '(' || LISTAGG(workitem, ', ') WITHIN GROUP (ORDER BY workitem) || ')' AS "Comments"
    FROM xdata
    GROUP BY ndays
) WITH DATA;
SELECT * FROM surveydata then returns the Days, Counter and Comments rows shown in the question.
UPDATE: With a bit more fiddling, I've managed to get a solution using MySQL 8.0 as well:
WITH RECURSIVE t1(workitem, livedays)
     AS (SELECT 'A', 8
         UNION ALL SELECT 'B', 2
         UNION ALL SELECT 'C', 5),
-- numbers: 1 .. MAX(livedays)
numbers(seq)
     AS (SELECT 1 AS seq
         UNION ALL
         SELECT seq + 1
         FROM numbers
         WHERE seq < (SELECT MAX(livedays)
                      FROM t1)),
-- xdata: one row per work item per day on which it is still live
xdata(workitem, ndays)
     AS (SELECT workitem,
                seq
         FROM t1,
              numbers
         WHERE seq <= livedays)
SELECT ndays AS "Days",
       COUNT(*) AS "Counter",
       CONCAT('(', GROUP_CONCAT(workitem ORDER BY workitem SEPARATOR ', '), ')') AS "Comments"
FROM xdata
GROUP BY ndays
ORDER BY ndays;  -- GROUP BY alone does not guarantee row order in MySQL 8.0
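As a cross-check of what both queries compute, here is a minimal Python sketch of the same logic, assuming the three sample rows from the question. The names `live_days` and `survival_curve` are made up for illustration and are not part of either answer.

```python
# Sample data mirroring the table in the question (assumed, not read from a database).
live_days = {"A": 8, "B": 2, "C": 5}


def survival_curve(items):
    """Return (day, count, live item names) for each day from 1 to the longest LiveDays."""
    rows = []
    for day in range(1, max(items.values()) + 1):
        # An item is still live on `day` when day <= its LiveDays,
        # the same condition as `seq <= livedays` in the queries.
        alive = sorted(name for name, days in items.items() if day <= days)
        rows.append((day, len(alive), alive))
    return rows


for day, count, alive in survival_curve(live_days):
    print(day, count, "(" + ", ".join(alive) + ")")
```

The loop over `range(1, max + 1)` plays the role of the recursive `numbers` CTE, and the filter plays the role of the join in `xdata`.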

Related

How to get maximum appearance count of number from comma separated number string from multiple rows in MySQL?

My MySQL table has a column with comma-separated numbers. See the example below:
| style_ids |
| ---------- |
| 5,3,10,2,7 |
| 1,5,12,9 |
| 6,3,5,9,4 |
| 8,3,5,7,12 |
| 7,4,9,3,5 |
So my expected result should have the top 5 numbers with the highest appearance counts, in descending order, as 5 rows like below:
| number | appearance_count_in_all_rows |
| -------|----------------------------- |
| 5 | 5 |
| 3 | 4 |
| 9 | 3 |
| 7 | 3 |
| 4 | 2 |
Is it possible to get the above result with a MySQL query?
As already alluded to in the comments, this is a really bad idea. But here is one way of doing it:
WITH RECURSIVE seq (n) AS (
SELECT 1 UNION ALL SELECT n+1 FROM seq WHERE n < 20
), tbl (style_ids) AS (
SELECT '5,3,10,2,7' UNION ALL
SELECT '1,5,12,9' UNION ALL
SELECT '6,3,5,9,4' UNION ALL
SELECT '8,3,5,7,12' UNION ALL
SELECT '7,4,9,3,5'
)
SELECT seq.n, COUNT(*) appearance_count_in_all_rows
FROM seq
JOIN tbl ON FIND_IN_SET(seq.n, tbl.style_ids)
GROUP BY seq.n
ORDER BY appearance_count_in_all_rows DESC
LIMIT 5;
Just replace the tbl CTE with your table.
As already pointed out, you should fix the data if possible.
For further details, read "Is storing a delimited list in a database column really that bad?".
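To sanity-check the counts outside the database, the following Python sketch reproduces the same tally for the five sample rows. It is an illustration only: `style_ids` and `top_numbers` are hypothetical names, and the tie-breaking rule (smaller number first) is an assumption, since the query's ORDER BY leaves the order of equal counts unspecified.

```python
from collections import Counter

# Sample rows copied from the question.
style_ids = ["5,3,10,2,7", "1,5,12,9", "6,3,5,9,4", "8,3,5,7,12", "7,4,9,3,5"]


def top_numbers(rows, limit=5):
    """Count how many rows each number appears in and return the most frequent ones."""
    counts = Counter()
    for row in rows:
        # set() mirrors FIND_IN_SET, which matches a row at most once per number.
        counts.update({int(part) for part in row.split(",")})
    # Sort by count descending, then by number, so ties come out in a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


print(top_numbers(style_ids))
```

With this data, 4 and 12 both appear in two rows, so which of them lands in the fifth slot depends entirely on the tie-breaking rule.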
You could use the answer below, which is well explained here; a working fiddle can be found here.
Try:
select distinct_nr,count(distinct_nr) as appearance_count_in_all_rows
from ( select substring_index(substring_index(style_ids, ',', n), ',', -1) as distinct_nr
from test
join numbers on char_length(style_ids) - char_length(replace(style_ids, ',', '')) >= n - 1
) x
group by distinct_nr
order by appearance_count_in_all_rows desc ;
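The nested substring_index calls are the core of this approach: the inner call keeps the first n items, and the outer call keeps the last of those. A rough Python emulation, assuming a non-empty delimiter, may make that easier to follow. The helper names are invented for this sketch.

```python
def substring_index(text, delim, count):
    """Rough Python equivalent of MySQL SUBSTRING_INDEX for a non-empty delimiter."""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # everything before the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # everything after the count-th delimiter from the right
    return ""


def nth_item(text, n):
    """Take the first n items, then keep the last one of those."""
    return substring_index(substring_index(text, ",", n), ",", -1)


def split_items(text):
    # Number of items = number of commas + 1, the same bound as the join condition on `numbers`.
    item_count = len(text) - len(text.replace(",", "")) + 1
    return [nth_item(text, n) for n in range(1, item_count + 1)]


print(split_items("5,3,10,2,7"))
```

The bound on `n` matters: without it, asking for an item past the end of the list returns the last item again instead of nothing.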

Parsing JSON list in Snowflake - converting redshift sql to snowflake sql

I have some Redshift SQL that I'm trying to convert to Snowflake SQL to extract values from a JSON field. The issue I'm running into is how to specify the required array index.
Because I run A/B/n tests, there can be multiple indexes up to 'n'.
So I had this piece of SQL working for Redshift:
SELECT JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (e.splits,n.n),'split_type') types
, JSON_EXTRACT_PATH_TEXT(json_extract_array_element_text (e.splits,n.n),'weight') as weight
FROM experiments e, (SELECT (p0.n + p1.n*2 + p2.n * POWER(2,2) + p3.n * POWER(2,3) + p4.n * POWER(2,4) + p5.n * POWER(2,5)
+ p6.n * POWER(2,6) + p7.n * POWER(2,7) + p8.n * POWER(2,8) + p9.n * POWER(2,9))::int as n
FROM
(SELECT 0 as n UNION SELECT 1) p0,
(SELECT 0 as n UNION SELECT 1) p1,
(SELECT 0 as n UNION SELECT 1) p2,
(SELECT 0 as n UNION SELECT 1) p3,
(SELECT 0 as n UNION SELECT 1) p4,
(SELECT 0 as n UNION SELECT 1) p5,
(SELECT 0 as n UNION SELECT 1) p6,
(SELECT 0 as n UNION SELECT 1) p7,
(SELECT 0 as n UNION SELECT 1) p8,
(SELECT 0 as n UNION SELECT 1) p9
Order by 1
) n
WHERE types <> ''
AND weight <> ''
From reading Snowflake's documentation, it would seem like the following should work:
SELECT parse_json(parse_json(e.splits)[n.n]):split_type as types,
parse_json(parse_json(e.splits)[n.n]):weight as weight
FROM experiments e, (SELECT (p0.n ...
However I get the error "SQL compilation error: error line 1 at position 39 invalid identifier 'N.N'"
I'm wondering if someone would be able to help with this issue?
EDIT:
experiments table looks like:
exp_ID | splits
1 | [{"id":203,"weight":50,"split_type":"a"},{"id":204,"weight":50,"split_type":"control"}]
2 | [{"id":205,"weight":33.33,"split_type":"a"},{"id":206,"weight":33.33,"split_type":"b"},{"id":207,"weight":33.33,"split_type":"c"}]
3 | [{"id":208,"weight":25,"split_type":"a"},{"id":209,"weight":25,"split_type":"b"},{"id":210,"weight":25,"split_type":"c"},{"id":211,"weight":25,"split_type":"d"}]
required output:
exp_ID | ID | types | weight
1 | 203 | a | 50
1 | 204 | control | 50
2 | 205 | a | 33.33
2 | 206 | b | 33.33
2 | 207 | c | 33.33
3 | 208 | a | 25
3 | 209 | b | 25
3 | 210 | c | 25
3 | 211 | d | 25
With a table defined as
create temp table EXPERIMENTS(EXP_ID int, SPLITS variant);
You can insert rows like this (This is just for testing. Do not use single-row inserts for production pipelines):
insert into experiments select 1, parse_json('[{"id":203,"weight":50,"split_type":"a"},{"id":204,"weight":50,"split_type":"control"}]');
insert into experiments select 2, parse_json('[{"id":205,"weight":33.33,"split_type":"a"},{"id":206,"weight":33.33,"split_type":"b"},{"id":207,"weight":33.33,"split_type":"c"}]');
insert into experiments select 3, parse_json('[{"id":208,"weight":25,"split_type":"a"},{"id":209,"weight":25,"split_type":"b"},{"id":210,"weight":25,"split_type":"c"},{"id":211,"weight":25,"split_type":"d"}]');
With it stored in the table that way, you can query the JSON in columns like this:
select EXP_ID
,VALUE:id as ID
,VALUE:split_type::string as TYPES
,VALUE:weight as WEIGHT
from experiments
,lateral flatten(splits)
The article below demonstrates various examples of using LATERAL FLATTEN to extract information from a JSON document, including its use together with the GET_PATH, UNPIVOT, and SEQ functions.
https://community.snowflake.com/s/article/Dynamically-extracting-JSON-using-LATERAL-FLATTEN
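For readers who want to see what the flatten step does conceptually, here is a small Python sketch that produces one output row per array element from the sample data in the question. It only illustrates the row-per-element idea and says nothing about how Snowflake implements it. `experiments` and `flatten_splits` are hypothetical names.

```python
import json

# Sample rows copied from the question: (exp_id, splits as JSON text).
experiments = [
    (1, '[{"id":203,"weight":50,"split_type":"a"},{"id":204,"weight":50,"split_type":"control"}]'),
    (2, '[{"id":205,"weight":33.33,"split_type":"a"},{"id":206,"weight":33.33,"split_type":"b"},'
        '{"id":207,"weight":33.33,"split_type":"c"}]'),
    (3, '[{"id":208,"weight":25,"split_type":"a"},{"id":209,"weight":25,"split_type":"b"},'
        '{"id":210,"weight":25,"split_type":"c"},{"id":211,"weight":25,"split_type":"d"}]'),
]


def flatten_splits(rows):
    """Emit one (exp_id, id, split_type, weight) tuple per element of each splits array."""
    out = []
    for exp_id, splits in rows:
        for element in json.loads(splits):
            out.append((exp_id, element["id"], element["split_type"], element["weight"]))
    return out


for row in flatten_splits(experiments):
    print(row)
```

Because the inner loop runs over however many elements each array has, no helper table of index numbers is needed.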

Add row number after splitting a string field

I have a table that contains 2 fields:
ID: text
Suggestions: string (comma separated values)
I would like to write a select query that returns a new numbered row for each suggestion, with each suggestion's number matching its position in the original string.
Example:
Note: this ranking must be guaranteed to be the same every time I run the query.
Thanks
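If your MySQL version is 8.0 or later, a recursive CTE (WITH RECURSIVE) can be used, as in the following SELECT statement (after the CREATE TABLE and INSERT statements needed to set up the data). Note that row_number() here orders by the suggestion text, so the numbering matches the original positions only when each list is already in sorted order: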
mysql> create table tab( ID int, suggestions varchar(25));
mysql> insert into tab values(1,'A,B,C');
mysql> insert into tab values(2,'D,E,F,G,H');
mysql> select q2.*,
row_number()
over
(partition by q2.id order by q2.suggestion) as number
from
(
select distinct
id,
substring_index(
substring_index(suggestions, ',', q1.nr),
',',
-1
) as suggestion
from tab
cross join
(with recursive cte as
(
select 1 as nr
union all
select 1+nr from cte where nr<10
)
select * from cte) q1
) q2;
+------+------------+--------+
| id | suggestion | number |
+------+------------+--------+
| 1 | A | 1 |
| 1 | B | 2 |
| 1 | C | 3 |
| 2 | D | 1 |
| 2 | E | 2 |
| 2 | F | 3 |
| 2 | G | 4 |
| 2 | H | 5 |
+------+------------+--------+
The same problem is solved here:
https://gist.github.com/avoidwork/3749973
I would suggest a series of subqueries:
select id, substring_index(suggestions, ',', 1) as suggestion, 1
from example
where suggestions is not null
union all
select id, substring_index(substring_index(suggestions, ',', 2), ',', -1) as suggestion, 2
from example
where suggestions like '%,%'
union all
select id, substring_index(substring_index(suggestions, ',', 3), ',', -1) as suggestion, 3
from example
where suggestions like '%,%,%'
union all
select id, substring_index(substring_index(suggestions, ',', 4), ',', -1) as suggestion, 4
from example
where suggestions like '%,%,%,%'
union all
select id, substring_index(substring_index(suggestions, ',', 5), ',', -1) as suggestion, 5
from example
where suggestions like '%,%,%,%,%';
This can easily be extended if you have more than 5 options per id.
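The numbering that these subqueries hard-code (1, 2, 3, ...) is simply each item's position in the original string. A short Python sketch of that idea, using sample rows assumed for illustration (the third row is deliberately unsorted), looks like this. `numbered_suggestions` is a made-up name.

```python
# Assumed sample rows: (id, comma-separated suggestions). None stands in for SQL NULL.
rows = [(1, "A,B,C"), (2, "D,E,F,G,H"), (3, "Z,A"), (4, None)]


def numbered_suggestions(table):
    """Yield (id, suggestion, position), with positions starting at 1."""
    for row_id, suggestions in table:
        if suggestions is None:
            continue  # same effect as the `suggestions is not null` filter
        for position, suggestion in enumerate(suggestions.split(","), start=1):
            yield row_id, suggestion, position


for row in numbered_suggestions(rows):
    print(row)
```

Since the position comes from the string itself rather than from a sort, the numbering is the same on every run, which is the guarantee the question asks for.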

Custom sorting using two different columns sql

In my table, I have two columns called year and season that I'd like to sort by. Some examples of their values:
----------------------------
| id | etc | year | season |
| 0 | ... | 2016 | FALL |
| 1 | ... | 2015 | SPRING |
| 2 | ... | 2015 | FALL |
| 3 | ... | 2016 | SPRING |
----------------------------
How would I go about performing a select where I get the results as such?
| 1 | ... | 2015 | SPRING |
| 2 | ... | 2015 | FALL |
| 3 | ... | 2016 | SPRING |
| 0 | ... | 2016 | FALL |
The easy part would be ORDER BY table.year ASC, but how do I manage the seasons now? Thanks for any tips!
You can do this:
SELECT *
FROM yourtable
ORDER BY year, CASE WHEN season = 'spring' THEN 0 ELSE 1 END;
If you want to do the same for the other two seasons, you can extend the CASE, but it is easier and more readable to join to a lookup table, something like this:
SELECT t1.*
FROM yourtable AS t1
INNER JOIN
(
    SELECT 'spring' AS season, 0 AS sortorder
    UNION
    SELECT 'Fall' AS season, 1 AS sortorder
    UNION
    SELECT 'Winter' AS season, 2 AS sortorder
    UNION
    SELECT 'summer' AS season, 3 AS sortorder
) AS t2 ON t1.season = t2.season
ORDER BY t1.year, t2.sortorder;
If you want to order by all four seasons, starting with Spring, extend your CASE statement:
ORDER BY CASE season
WHEN 'spring' then 1
WHEN 'summer' then 2
WHEN 'fall' then 3
WHEN 'autumn' then 3
WHEN 'winter' then 4
ELSE 0 -- Default if an incorrect value is entered. Could be 5
END
Alternatively, to handle all possible cases, you might want to build a table with the season name and a sort order. Say, for example, some of your data was in German. You could have a table, SeasonSort, with the fields SeasonName and SortOrder. Then add data:
CREATE TABLE SeasonSort (SeasonName nvarchar(32), SortOrder tinyint);
INSERT INTO SeasonSort (SeasonName, SortOrder)
VALUES
    ('spring', 1),
    ('frühling', 1),
    ('fruhling', 1), -- Anglicized version of German name
    ('summer', 2),
    ('sommer', 2),
    ('fall', 3),
    ('autumn', 3),
    ('herbst', 3),
    ('winter', 4); -- same in English and German
Then your query would become:
SELECT t.*
FROM MyTable t
LEFT JOIN seasonSort ss
ON t.season = ss.SeasonName
ORDER BY t.Year,
isnull(ss.SortOrder, 0)
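The lookup-table idea carries over directly to application code. As a minimal sketch, assuming the four sample rows from the question with the `etc` column omitted, a Python dictionary can play the role of SeasonSort. `SEASON_ORDER` and `sort_key` are names invented here.

```python
# Season name -> sort order, playing the role of the SeasonSort table.
SEASON_ORDER = {"spring": 1, "summer": 2, "fall": 3, "autumn": 3, "winter": 4}

# Sample rows from the question: (id, year, season).
rows = [(0, 2016, "FALL"), (1, 2015, "SPRING"), (2, 2015, "FALL"), (3, 2016, "SPRING")]


def sort_key(row):
    _id, year, season = row
    # Unknown seasons get 0, matching the ELSE 0 / isnull(..., 0) defaults above.
    return (year, SEASON_ORDER.get(season.lower(), 0))


ordered = sorted(rows, key=sort_key)
print([row[0] for row in ordered])
```

Lower-casing the season before the lookup is an assumption made for this sketch. In SQL, whether 'FALL' matches 'fall' depends on the collation of the column.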

Count two or more repeated characters in a string

I have table like this:
-----------
ID | Value
-----------
1 | AAAA
2 | ABCD
3 | AADC
4 | ABBD
I am trying to figure out how to return the number of times a string occurs in each Value.
So, if I want to count the number of times 'A' and 'B' appear, the SQL statement should return something like this:
-------------------
ID | Value | Count
-------------------
1 | AAAA | 0
2 | ABCD | 1
3 | AADC | 0
4 | ABBD | 2
5 | ABBB | 3
6 | AABB | 3
7 | AAAB | 3
Is there any way to do this? I do not want to use PHP, VB, etc., just MySQL.
Seems you want to count the values and then combine the result. I believe something like this will work for you.
SQLFiddle
SELECT
id,
value,
ROUND (
(
LENGTH(value)
- LENGTH(REPLACE(value, "A", ""))
) / LENGTH("A")
) AS count
FROM chars
UNION ALL
SELECT
id,
value,
ROUND (
(
LENGTH(value)
- LENGTH(REPLACE(value, "B", ""))
) / LENGTH("B")
) AS count
FROM chars
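You can try this, mate:
SELECT
    ID,
    Value,
    -- occurrences = original length minus length with the character removed
    LENGTH(Value) - LENGTH(REPLACE(Value, 'A', '')) AS count_a,
    LENGTH(Value) - LENGTH(REPLACE(Value, 'B', '')) AS count_b
FROM
    your_table;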
or this one:
SELECT
ID,
Value,
LENGTH(REPLACE(Value, IF(LENGTH(REPLACE(Value, 'A','')) = 3, 'A', 'B'), '')) 'Count'
FROM
your_table;
This one is based on the given expected result
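The length-difference expression from the first answer (the length of the value minus the length with the search string removed, divided by the length of the search string) can be checked outside the database. The Python sketch below applies the same arithmetic to the four sample values from the question. `count_occurrences` is a hypothetical helper name.

```python
def count_occurrences(value, needle):
    """Occurrences of `needle` in `value`, computed by the length-difference trick."""
    removed = value.replace(needle, "")
    # Every removed occurrence shortens the string by len(needle) characters.
    return (len(value) - len(removed)) // len(needle)


# Sample values from the question.
values = ["AAAA", "ABCD", "AADC", "ABBD"]

for value in values:
    print(value, count_occurrences(value, "A"), count_occurrences(value, "B"))
```

Dividing by the length of the search string is what lets the same expression work for multi-character strings as well as single characters.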