How to transpose values in rows to columns in MySQL

This image shows what my raw table looks like:
These are the conditions for getting the transposed table:
Each row has a unique id.
We only need columns for groups A, B and C in the group field, not for the others.
There may be one or several ids in group A for the same app_id; I need the row with the minimum date.
There may be one or several ids in groups B and C for the same app_id; I need the row with the maximum date.
The image below shows what my final table should look like:

Each row has a unique id.
We only need columns for groups A, B and C in the group field, not for the others.
Add this to your query:
WHERE `GROUP` IN ('A','B','C')
There may be one or several ids in group A for the same app_id; I need the row with the minimum date.
Add this somewhere in the SELECT list:
MIN(`date`) OVER (PARTITION BY appid, `group`)
There may be one or several ids in groups B and C for the same app_id; I need the row with the maximum date.
Change the expression added for point 3 to:
CASE WHEN `group` IN ('B','C')
THEN MAX(`date`) OVER (PARTITION BY appid, `group`)
ELSE MIN(`date`) OVER (PARTITION BY appid, `group`)
END
Maybe this helps you make a serious attempt at solving this yourself (and learn from it) instead of asking for a solution and then copy/pasting it.
BTW: Naming fields with reserved words or keywords, like GROUP and DATE, is not a very smart thing to do. A better name for the column GROUP might be CategoryGroup (or whatever this group refers to).
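To see the hint end to end, here is a minimal sketch that computes the target date with the CASE expression and then keeps only the rows whose own date matches it. It runs against SQLite through Python's sqlite3 module as a stand-in for MySQL 8.0 (window functions need SQLite 3.25 or later). The sample rows are made up, and the window is partitioned by both appid and `group` so that each group's minimum or maximum is computed separately rather than across all groups of an app.

```python
import sqlite3

# SQLite stands in for MySQL 8.0 here; both accept backtick-quoted names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tab (
    id INTEGER, appid INTEGER, `date` TEXT, `group` TEXT,
    score1 INTEGER, score2 INTEGER
);
-- Hypothetical rows; ISO date strings sort correctly under MIN/MAX.
INSERT INTO tab VALUES
    (1, 100, '2021-01-05', 'A', 10, 20),
    (2, 100, '2021-01-01', 'A', 11, 21),
    (3, 100, '2021-01-03', 'B', 12, 22),
    (4, 100, '2021-01-09', 'B', 13, 23),
    (5, 100, '2021-01-02', 'C', 14, 24),
    (6, 100, '2021-01-04', 'D', 15, 25);
""")

# Compute the wanted date per (appid, group), then keep the matching rows.
rows = conn.execute("""
    SELECT id, appid, `date`, `group`
    FROM (
        SELECT t.*,
               CASE WHEN `group` IN ('B', 'C')
                    THEN MAX(`date`) OVER (PARTITION BY appid, `group`)
                    ELSE MIN(`date`) OVER (PARTITION BY appid, `group`)
               END AS target_date
        FROM tab t
        WHERE `group` IN ('A', 'B', 'C')
    ) x
    WHERE `date` = target_date
    ORDER BY appid, `group`
""").fetchall()

ids = [r[0] for r in rows]
print(ids)  # [2, 4, 5]
```

The surviving rows (one per app and group, barring ties) would still need to be pivoted into columns, for example with MAX(CASE ...) grouped by appid.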

I took a different approach to this. The SQL is longer but I think it's more auditable.
The main logic point is that I broke A and BC into 2 different subqueries, and used QUALIFY ROW_NUMBER() to choose the correct row, based on either ASC or DESC per your requirements.
I know you are using MySQL, and I don't have an instance to test this one on. Note that MySQL supports neither QUALIFY nor PIVOT, so this will not run there as-is. Here is the SQL I got from building this logic in Rasgo, which I tested on Snowflake, where it worked.
-- This splits the data into group A only
WITH CTE_A AS (
SELECT
*
FROM
{{ your_table }}
WHERE
my_group = 'A'
),
-- This splits the data into group B and C only
CTE_B AS (
SELECT
*
FROM
{{ your_table }}
WHERE
my_group IN('B', 'C')
),
-- Selecting from A only, it keeps the earliest row (ASCENDING)
CTE_A_FIRST AS (
SELECT
*
FROM
CTE_A QUALIFY ROW_NUMBER() OVER (
PARTITION BY APP_ID,
MY_GROUP
ORDER BY
MY_DATE ASC
) = 1
),
-- Selecting from B and C only, it keeps the most recent row (DESCENDING)
CTE_B_LAST AS (
SELECT
*
FROM
CTE_B QUALIFY ROW_NUMBER() OVER (
PARTITION BY APP_ID,
MY_GROUP
ORDER BY
MY_DATE DESC
) = 1
),
-- Here we union A and BC back together
CTE_ABC AS (
SELECT
ID,
APP_ID,
MY_DATE,
MY_GROUP,
SCORE1,
SCORE2
FROM
CTE_B_LAST
UNION ALL
SELECT
ID,
APP_ID,
MY_DATE,
MY_GROUP,
SCORE1,
SCORE2
FROM
CTE_A_FIRST
),
-- We pivot the date horizontally so we get a date for A B C
-- the MIN does not matter, since at this point, we only have 1
CTE_PVT_DATE AS (
SELECT
APP_ID,
B,
C,
A
FROM
(
SELECT
APP_ID,
MY_DATE,
MY_GROUP
FROM
CTE_ABC
) PIVOT (
MIN (MY_DATE) FOR MY_GROUP IN ('B', 'C', 'A')
) as p (APP_ID, B, C, A)
),
-- We pivot SCORE1 horizontally so we get a SCORE1 for A B C
-- the MIN does not matter, since at this point, we only have 1
CTE_PVT_SCORE1 AS (
SELECT
APP_ID,
B,
C,
A
FROM
(
SELECT
APP_ID,
SCORE1,
MY_GROUP
FROM
CTE_ABC
) PIVOT (
MIN (SCORE1) FOR MY_GROUP IN ('B', 'C', 'A')
) as p (APP_ID, B, C, A)
),
-- We pivot SCORE2 horizontally so we get a SCORE2 for A B C
-- the MIN does not matter, since at this point, we only have 1
CTE_PVT_SCORE2 AS (
SELECT
APP_ID,
B,
C,
A
FROM
(
SELECT
APP_ID,
SCORE2,
MY_GROUP
FROM
CTE_ABC
) PIVOT (
MIN (SCORE2) FOR MY_GROUP IN ('B', 'C', 'A')
) as p (APP_ID, B, C, A)
),
-- We join the subqueries above together on the APP_IDs
CTE_JOINED AS (
SELECT
t0.*,
t1.APP_ID as SCORE1_APP_ID,
t1.B as SCORE1_B,
t1.C as SCORE1_C,
t1.A as SCORE1_A,
t2.APP_ID as SCORE2_APP_ID,
t2.B as SCORE2_B,
t2.C as SCORE2_C,
t2.A as SCORE2_A
FROM
CTE_PVT_DATE t0
INNER JOIN CTE_PVT_SCORE1 t1 ON t0.APP_ID = t1.APP_ID
INNER JOIN CTE_PVT_SCORE2 t2 ON t0.APP_ID = t2.APP_ID
)
-- The final select is really just renaming ...
-- the magic has already happened
SELECT
A AS DATE_A,
B AS DATE_B,
C AS DATE_C,
APP_ID,
SCORE1_B,
SCORE1_C,
SCORE1_A,
SCORE2_B,
SCORE2_C,
SCORE2_A
FROM
CTE_JOINED
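Because MySQL has neither QUALIFY nor PIVOT, a port of the query above has to filter the ROW_NUMBER() results in an outer query and pivot with conditional aggregation. The following is a minimal sketch of that port, run against SQLite from Python's standard library as a stand-in for MySQL 8.0 (SQLite 3.25+ for window functions). The table name your_table and the sample rows are hypothetical; only SCORE1 is pivoted, and SCORE2 would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (
    id INTEGER, app_id INTEGER, my_date TEXT, my_group TEXT,
    score1 INTEGER, score2 INTEGER
);
INSERT INTO your_table VALUES
    (1, 100, '2021-01-05', 'A', 10, 20),
    (2, 100, '2021-01-01', 'A', 11, 21),
    (3, 100, '2021-01-03', 'B', 12, 22),
    (4, 100, '2021-01-09', 'B', 13, 23),
    (5, 100, '2021-01-02', 'C', 14, 24),
    (6, 100, '2021-01-04', 'D', 15, 25);
""")

query = """
WITH ranked AS (
    -- Same ROW_NUMBER() calls as CTE_A_FIRST / CTE_B_LAST, exposed as
    -- columns because there is no QUALIFY to filter on them directly.
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY app_id, my_group ORDER BY my_date ASC)  AS rn_first,
           ROW_NUMBER() OVER (PARTITION BY app_id, my_group ORDER BY my_date DESC) AS rn_last
    FROM your_table t
    WHERE my_group IN ('A', 'B', 'C')
),
picked AS (
    -- The filter QUALIFY would have applied: earliest A, latest B and C.
    SELECT * FROM ranked
    WHERE (my_group = 'A' AND rn_first = 1)
       OR (my_group IN ('B', 'C') AND rn_last = 1)
)
-- Conditional aggregation replaces PIVOT; each (app_id, group) has one row
-- left, so MAX simply returns it.
SELECT app_id,
       MAX(CASE WHEN my_group = 'A' THEN my_date END) AS date_a,
       MAX(CASE WHEN my_group = 'B' THEN my_date END) AS date_b,
       MAX(CASE WHEN my_group = 'C' THEN my_date END) AS date_c,
       MAX(CASE WHEN my_group = 'A' THEN score1 END)  AS score1_a,
       MAX(CASE WHEN my_group = 'B' THEN score1 END)  AS score1_b,
       MAX(CASE WHEN my_group = 'C' THEN score1 END)  AS score1_c
FROM picked
GROUP BY app_id
"""
result = conn.execute(query).fetchall()
print(result)  # [(100, '2021-01-01', '2021-01-09', '2021-01-02', 11, 13, 14)]
```

Computing both row numbers in one pass avoids the separate A and BC subqueries; splitting them as in the Snowflake version works just as well if that reads more clearly.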

I'll build my attempt in several steps and then show the full solution made up of these steps, so that you can understand it piece by piece. I assume the following definition of your input table:
CREATE TABLE tab(
id INT,
app_id INT,
`date` VARCHAR(20),
`group` VARCHAR(20),
score1 INT,
score2 INT
);
STEP 1. Convert date into a proper DATE value ("YYYY-MM-DD") with the STR_TO_DATE function. The format string '%m/%d/%Y' below assumes the raw values are stored as month/day/year.
WITH formatted_tab AS (
SELECT id,
app_id,
STR_TO_DATE(date, '%m/%d/%Y') AS date,
group,
score1,
score2
FROM tab
)
STEP 2. Extract the useful dates according to the group field. Since group "A" is treated differently from groups "B" and "C", the idea is to address each case with a separate query:
for group "A", the MIN aggregation function is applied;
for groups "B" and "C", the MAX aggregation function is applied.
The two result sets are then combined with a UNION operation.
(
SELECT app_id,
MIN(date) AS date,
group
FROM formatted_tab
WHERE group IN ('A')
GROUP BY app_id,
group
UNION
SELECT app_id,
MAX(date) AS date,
group
FROM formatted_tab
WHERE group IN ('B', 'C')
GROUP BY app_id,
group
) needed_dates
STEP 3. Retrieve the scores corresponding to each group and date. This is done with a simple INNER JOIN between the table generated in STEP 2 and the formatted table.
(
SELECT needed_dates.*,
formatted_tab.score1,
formatted_tab.score2
FROM needed_dates
INNER JOIN formatted_tab
ON needed_dates.app_id = formatted_tab.app_id
AND needed_dates.date = formatted_tab.date
AND needed_dates.group = formatted_tab.group
) needed_infos
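Steps 2 and 3 can be checked in isolation with the following sketch. It uses SQLite via Python's sqlite3 module in place of MySQL, and a hypothetical formatted_tab whose dates are already ISO strings, which is what STEP 1 would produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE formatted_tab (
    id INTEGER, app_id INTEGER, `date` TEXT, `group` TEXT,
    score1 INTEGER, score2 INTEGER
);
-- Hypothetical rows, already in YYYY-MM-DD form.
INSERT INTO formatted_tab VALUES
    (1, 100, '2021-01-05', 'A', 10, 20),
    (2, 100, '2021-01-01', 'A', 11, 21),
    (3, 100, '2021-01-03', 'B', 12, 22),
    (4, 100, '2021-01-09', 'B', 13, 23),
    (5, 100, '2021-01-02', 'C', 14, 24),
    (6, 100, '2021-01-04', 'D', 15, 25);
""")

needed_infos = conn.execute("""
    WITH needed_dates AS (
        -- STEP 2: earliest date for A, latest date for B and C
        SELECT app_id, MIN(`date`) AS `date`, `group`
        FROM formatted_tab
        WHERE `group` IN ('A')
        GROUP BY app_id, `group`
        UNION
        SELECT app_id, MAX(`date`) AS `date`, `group`
        FROM formatted_tab
        WHERE `group` IN ('B', 'C')
        GROUP BY app_id, `group`
    )
    -- STEP 3: join back to recover the scores of those rows
    SELECT nd.app_id, nd.`date`, nd.`group`, f.score1, f.score2
    FROM needed_dates nd
    INNER JOIN formatted_tab f
        ON nd.app_id = f.app_id
       AND nd.`date` = f.`date`
       AND nd.`group` = f.`group`
    ORDER BY nd.`group`
""").fetchall()
print(needed_infos)
```

If two rows of the same app and group share the extreme date, the join returns both; the MAX in STEP 4 then picks the larger value of each score column independently.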
STEP 4. Pivot the table using MySQL tools such as:
the IF() function, to retrieve the value corresponding to a specific group;
the MAX aggregation function, to collapse the rows of each app_id into a single row.
These tools are applied to each group you specified ('A', 'B' and 'C').
SELECT app_id,
MAX(IF(group='A', date , NULL)) AS date_groupA,
MAX(IF(group='B', date , NULL)) AS date_groupB,
MAX(IF(group='C', date , NULL)) AS date_groupC,
MAX(IF(group='A', score1, NULL)) AS score1_groupA,
MAX(IF(group='A', score2, NULL)) AS score2_groupA,
MAX(IF(group='B', score1, NULL)) AS score1_groupB,
MAX(IF(group='B', score2, NULL)) AS score2_groupB,
MAX(IF(group='C', score1, NULL)) AS score1_groupC,
MAX(IF(group='C', score2, NULL)) AS score2_groupC
FROM needed_infos
GROUP BY app_id
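The pivot in STEP 4 is conditional aggregation. This sketch reproduces it in SQLite (via Python's sqlite3), which has no IF() in older versions, so CASE WHEN ... THEN ... END is used instead; MySQL accepts the same CASE form. The needed_infos rows are invented, and app 200 deliberately has no group C row to show that the missing column comes back as NULL (None in Python). Only score1 is pivoted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE needed_infos (
    app_id INTEGER, `date` TEXT, `group` TEXT, score1 INTEGER, score2 INTEGER
);
INSERT INTO needed_infos VALUES
    (100, '2021-01-01', 'A', 11, 21),
    (100, '2021-01-09', 'B', 13, 23),
    (100, '2021-01-02', 'C', 14, 24),
    (200, '2021-02-01', 'A', 30, 40),
    (200, '2021-02-05', 'B', 31, 41);
""")

# Each CASE yields the value only for its group and NULL otherwise;
# MAX ignores NULLs, so it returns the single matching value per app_id.
pivoted = conn.execute("""
    SELECT app_id,
           MAX(CASE WHEN `group` = 'A' THEN `date` END) AS date_groupA,
           MAX(CASE WHEN `group` = 'B' THEN `date` END) AS date_groupB,
           MAX(CASE WHEN `group` = 'C' THEN `date` END) AS date_groupC,
           MAX(CASE WHEN `group` = 'A' THEN score1 END) AS score1_groupA,
           MAX(CASE WHEN `group` = 'B' THEN score1 END) AS score1_groupB,
           MAX(CASE WHEN `group` = 'C' THEN score1 END) AS score1_groupC
    FROM needed_infos
    GROUP BY app_id
    ORDER BY app_id
""").fetchall()
print(pivoted)
```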
Full attempt. This combines the previous snippets. The only difference is the backticks around field names, which prevent MySQL from confusing them with keywords such as "date" (the DATE type) or "group" (used in the GROUP BY clause).
WITH `formatted_tab` AS (
SELECT `id`,
`app_id`,
STR_TO_DATE(`date`, '%m/%d/%Y') AS `date`,
`group`,
`score1`,
`score2`
FROM `tab`
)
SELECT `app_id`,
MAX(IF(`group`='A', `date` , NULL)) AS date_groupA,
MAX(IF(`group`='B', `date` , NULL)) AS date_groupB,
MAX(IF(`group`='C', `date` , NULL)) AS date_groupC,
MAX(IF(`group`='A', `score1`, NULL)) AS score1_groupA,
MAX(IF(`group`='A', `score2`, NULL)) AS score2_groupA,
MAX(IF(`group`='B', `score1`, NULL)) AS score1_groupB,
MAX(IF(`group`='B', `score2`, NULL)) AS score2_groupB,
MAX(IF(`group`='C', `score1`, NULL)) AS score1_groupC,
MAX(IF(`group`='C', `score2`, NULL)) AS score2_groupC
FROM ( SELECT needed_dates.*,
formatted_tab.score1,
formatted_tab.score2
FROM ( SELECT `app_id`,
MIN(`date`) AS `date`,
`group`
FROM `formatted_tab`
WHERE `group` IN ('A')
GROUP BY `app_id`,
`group`
UNION
SELECT `app_id`,
MAX(`date`) AS `date`,
`group`
FROM `formatted_tab`
WHERE `group` IN ('B', 'C')
GROUP BY `app_id`,
`group`
) needed_dates
INNER JOIN formatted_tab
ON needed_dates.app_id = formatted_tab.app_id
AND needed_dates.date = formatted_tab.date
AND needed_dates.group = formatted_tab.group
) needed_infos
GROUP BY `app_id`
You'll find a tested SQL Fiddle here.

Related

Fastest way to join 2 tables to get the most recent record

I have 2 tables:
The first one is bom:
| Article |
|---------|
| AB |
| CD |
| EF |
| GH |
CREATE TABLE bom
(
Article VARCHAR(250)
);
INSERT INTO bom (Article)
VALUES
('AB'),
('CD'),
('EF'),
('GH');
The second one is purchases:
| Article | OrderDate | Price |
|---------|-----------|-------|
| AB | '2020-01-10' | 12 |
| AB | '2020-01-05' | 10 |
| AB | '2020-01-03' | 8 |
| EF | '2020-01-01' | 7 |
CREATE TABLE purchases
(
Article VARCHAR(250),
OrderDate DATE,
Price DOUBLE
);
INSERT INTO purchases (Article, OrderDate, Price)
VALUES
('AB', '2020-01-10', 12.0),
('AB', '2020-01-05', 10.0),
('AB', '2020-01-03', 8.0),
('EF', '2020-01-01', 7.0);
I want to extract the most recent price for each Article as of a given date.
For instance, with @evalDay = '2020-01-04', I want to get:
| Article | OrderDate | Price |
|---------|-----------|-------|
| AB | '2020-01-03' | 8 |
| EF | '2020-01-01' | 7 |
I've managed to make it work using a window function (row_number() over), but the performance is not as good as I need. This is a simplified example: my bom table has a few hundred rows, whereas purchases has about 1 million rows. On my computer, it takes approx. 50ms to execute. Of course I use indexes and compound indexes.
My solution:
set @evalDay = '2020-01-04';
with cte (Article, OrderDate, Price, rn) as (
select purchases.*,
row_number() over (
partition by bom.article
order by purchases.OrderDate desc
) as rn
from bom
join purchases on bom.Article = purchases.Article
where purchases.OrderDate <= @evalDay
)
select *
from cte
where rn = 1;
In this case, what's the fastest approach to get the answer?
I would try the following approaches:
Move the join outside the CTE:
with cte (Article, OrderDate, Price, rn) as (
select *,
row_number() over (partition by article order by OrderDate desc) as rn
from purchases
where OrderDate <= @evalDay)
select cte.*, bom.*
from cte
join bom
on cte.Article = bom.Article
where cte.rn = 1;
Remove the join if no additional columns are needed from bom (note that this also returns articles that are not in bom):
with cte (Article, OrderDate, Price, rn) as (
select *,
row_number() over (partition by article order by OrderDate desc) as rn
from purchases
where OrderDate <= @evalDay)
select *
from cte
where rn = 1;
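A minimal sketch of the join-free variant, using the sample data from the question. SQLite (through Python's sqlite3, version 3.25+ for window functions) stands in for MySQL, and a bound parameter replaces @evalDay. The index on (Article, OrderDate) and its name are made up as one plausible choice; whether it helps has to be measured on the real million-row table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (Article VARCHAR(250), OrderDate DATE, Price DOUBLE);
INSERT INTO purchases (Article, OrderDate, Price) VALUES
    ('AB', '2020-01-10', 12.0),
    ('AB', '2020-01-05', 10.0),
    ('AB', '2020-01-03', 8.0),
    ('EF', '2020-01-01', 7.0);
-- Hypothetical index name.
CREATE INDEX idx_purchases_article_date ON purchases (Article, OrderDate);
""")

eval_day = "2020-01-04"  # plays the role of @evalDay

# Number each article's eligible purchases from newest to oldest
# and keep only the newest one; bom is not touched at all.
latest = conn.execute("""
    WITH cte AS (
        SELECT Article, OrderDate, Price,
               ROW_NUMBER() OVER (PARTITION BY Article ORDER BY OrderDate DESC) AS rn
        FROM purchases
        WHERE OrderDate <= ?
    )
    SELECT Article, OrderDate, Price FROM cte WHERE rn = 1 ORDER BY Article
""", (eval_day,)).fetchall()
print(latest)  # [('AB', '2020-01-03', 8.0), ('EF', '2020-01-01', 7.0)]
```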
If the above still doesn't perform well enough, consider creating a table to store the results (Article, OrderDate, Price, evalDate), partitioned by evalDate.
If purchases has more than one row with the same latest date for the same Article, this returns all of them, which might not be what you want, but it's a fairly simple query:
set @evalDay = '2020-01-04';
Select a.*
From purchases a,
(select Article, Max(OrderDate) AS ODate
from purchases
where OrderDate <= @evalDay
group by Article) b
Where a.Article = b.Article
And a.OrderDate = b.ODate;
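The caveat about ties is easy to see with a small sketch. A hypothetical second AB purchase on 2020-01-03 is added to the question's data, SQLite via Python's sqlite3 stands in for MySQL, and a bound parameter replaces @evalDay. Both AB rows on the latest eligible date come back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchases (Article VARCHAR(250), OrderDate DATE, Price DOUBLE);
INSERT INTO purchases (Article, OrderDate, Price) VALUES
    ('AB', '2020-01-10', 12.0),
    ('AB', '2020-01-05', 10.0),
    ('AB', '2020-01-03', 8.0),
    ('AB', '2020-01-03', 9.0),   -- hypothetical second purchase on the same day
    ('EF', '2020-01-01', 7.0);
""")

# Latest eligible date per article, joined back to fetch the full rows.
rows = conn.execute("""
    SELECT a.Article, a.OrderDate, a.Price
    FROM purchases a
    JOIN (SELECT Article, MAX(OrderDate) AS ODate
          FROM purchases
          WHERE OrderDate <= ?
          GROUP BY Article) b
      ON a.Article = b.Article AND a.OrderDate = b.ODate
    ORDER BY a.Article, a.Price
""", ("2020-01-04",)).fetchall()
print(rows)  # [('AB', '2020-01-03', 8.0), ('AB', '2020-01-03', 9.0), ('EF', '2020-01-01', 7.0)]
```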

Find the latest rows by filtering the status

I have a table called person_list. The data is:
Insert into person_list(person_allocation_id, person_id, created_datetime, boss_user_name, allocation_status_id) values
(111008, 1190016, '2021-01-05 11:09:25', 'Rajesh', '2'),
(111007, 1190015, '2020-12-12 09:23:31', 'Sushmita', '2'),
(111006, 1190014, '2020-12-11 10:48:26', '', '3'),
(111005, 1190014, '2020-12-10 13:46:15', 'Rangarao', '2'),
(111004, 1190014, '2020-12-10 13:36:10', '', '3');
Here person_allocation_id is the primary key.
person_id may be duplicated at times.
All of these rows are sorted by person_allocation_id (in descending order).
I would like to select the rows that have allocation_status_id = '2' and a non-empty boss_user_name.
The difficulty is that I have to exclude a person_id whose latest status (by date) is allocation_status_id = '3'.
I can't figure out how to compare the date in one row with the date in the previous row.
So in the end I should get only 2 rows in my result set (person_allocation_id 111008 and 111007).
Somehow I achieved this in Oracle:
select person_id, person_allocation_id, created_datetime, boss_user_name, allocation_status_id
from (
select person_id, person_allocation_id, created_datetime, boss_user_name, allocation_status_id,
rank() over (partition by person_id order by created_datetime desc) rnk
from person_list
where allocation_status_id = '2')
where rnk = 1;
But I need this for MySQL. Can anyone help?
Thanks.
SELECT t1.*
FROM person_list t1
JOIN ( SELECT MAX(t2.person_allocation_id) person_allocation_id, t2.person_id
FROM person_list t2
GROUP BY t2.person_id ) t3 USING (person_allocation_id, person_id)
WHERE t1.allocation_status_id = '2'
fiddle
Add more conditions to WHERE clause if needed (for example, AND boss_user_name != '').
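A quick check of this answer against the question's data, with SQLite through Python's sqlite3 as a stand-in for MySQL. It relies on the assumption, implicit in the answer, that a higher person_allocation_id always means a more recent allocation; the boss_user_name condition is added as suggested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_list (
    person_allocation_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    created_datetime TEXT,
    boss_user_name TEXT,
    allocation_status_id TEXT
);
INSERT INTO person_list(person_allocation_id, person_id, created_datetime, boss_user_name, allocation_status_id) VALUES
    (111008, 1190016, '2021-01-05 11:09:25', 'Rajesh', '2'),
    (111007, 1190015, '2020-12-12 09:23:31', 'Sushmita', '2'),
    (111006, 1190014, '2020-12-11 10:48:26', '', '3'),
    (111005, 1190014, '2020-12-10 13:46:15', 'Rangarao', '2'),
    (111004, 1190014, '2020-12-10 13:36:10', '', '3');
""")

# Keep each person's highest allocation id, then apply the status filters.
ids = [r[0] for r in conn.execute("""
    SELECT t1.person_allocation_id
    FROM person_list t1
    JOIN (SELECT MAX(t2.person_allocation_id) person_allocation_id, t2.person_id
          FROM person_list t2
          GROUP BY t2.person_id) t3 USING (person_allocation_id, person_id)
    WHERE t1.allocation_status_id = '2'
      AND t1.boss_user_name != ''
    ORDER BY t1.person_allocation_id DESC
""")]
print(ids)  # [111008, 111007]
```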
You can use a correlated subquery to get the latest allocation_status_id value per person_id:
select person_allocation_id
, person_id
, created_datetime
, boss_user_name
, allocation_status_id
from (
select person_allocation_id
, person_id
, created_datetime
, boss_user_name
, allocation_status_id
, (select pl2.allocation_status_id
from person_list pl2
where pl2.person_id = pl.person_id
order by pl2.created_datetime desc
limit 1) latest_allocation_status_id
from person_list pl) t
where
allocation_status_id = '2' and latest_allocation_status_id <> '3'
and boss_user_name <> ''
The outer query checks the latest status and returns the expected result set. The query works on MySQL 5.7.
Demo here
As a side note, for MySQL 8.0 you can replace the correlated subquery with a window function:
first_value(allocation_status_id) over (partition by person_id
order by created_datetime desc)
Demo for window function
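The MySQL 8.0 variant can be sketched as follows, with SQLite (Python's sqlite3, 3.25+) standing in for MySQL and the question's data. It uses FIRST_VALUE: with ORDER BY created_datetime DESC and the default window frame, which ends at the current row, LAST_VALUE would return each row's own status (barring ties) instead of the latest one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_list (
    person_allocation_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    created_datetime TEXT,
    boss_user_name TEXT,
    allocation_status_id TEXT
);
INSERT INTO person_list(person_allocation_id, person_id, created_datetime, boss_user_name, allocation_status_id) VALUES
    (111008, 1190016, '2021-01-05 11:09:25', 'Rajesh', '2'),
    (111007, 1190015, '2020-12-12 09:23:31', 'Sushmita', '2'),
    (111006, 1190014, '2020-12-11 10:48:26', '', '3'),
    (111005, 1190014, '2020-12-10 13:46:15', 'Rangarao', '2'),
    (111004, 1190014, '2020-12-10 13:36:10', '', '3');
""")

# Attach each person's most recent status to every one of their rows.
rows = conn.execute("""
    SELECT person_allocation_id
    FROM (
        SELECT pl.*,
               FIRST_VALUE(allocation_status_id) OVER (
                   PARTITION BY person_id ORDER BY created_datetime DESC
               ) AS latest_allocation_status_id
        FROM person_list pl
    ) t
    WHERE allocation_status_id = '2'
      AND latest_allocation_status_id <> '3'
      AND boss_user_name <> ''
    ORDER BY person_allocation_id DESC
""").fetchall()
ids = [r[0] for r in rows]
print(ids)  # [111008, 111007]
```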

How many different ways are there to get the second row in a SQL search?

Let's say I was looking for the second highest record.
Sample Table:
CREATE TABLE `my_table` (
`id` int(2) NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`value` int(10),
PRIMARY KEY (`id`)
);
INSERT INTO `my_table` (`id`, `name`, `value`) VALUES (NULL, 'foo', '200'), (NULL, 'bar', '100'), (NULL, 'baz', '0'), (NULL, 'quux', '300');
The row with the second highest value is foo. How many ways can you get this result?
The obvious example is:
SELECT name FROM my_table ORDER BY value DESC LIMIT 1 OFFSET 1;
Can you think of other examples?
I was trying this one, but MySQL doesn't support LIMIT inside an IN/ALL/ANY/SOME subquery:
SELECT name FROM my_table WHERE value IN (
SELECT MIN(value) FROM my_table ORDER BY value DESC LIMIT 1
) LIMIT 1;
Eduardo's solution in standard SQL
select *
from (
select id,
name,
value,
row_number() over (order by value desc) as rn
from my_table t
) t
where rn = 2 -- change this number to pick any row
This works on any modern DBMS, including MySQL 8.0 and later (older MySQL versions lack window functions). This solution is usually faster than solutions using sub-selects. It can also easily return the 2nd, 3rd, ... row (which is achievable with Eduardo's solution as well).
It can also be adjusted to count by groups (adding a partition by) so the "greatest-n-per-group" problem can be solved with the same pattern.
Here is a SQLFiddle to play around with: http://sqlfiddle.com/#!12/286d0/1
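Applied to "second highest", the window-function approach orders by value DESC and keeps rn = 2. Here is a sketch with the question's table, using SQLite through Python's sqlite3 (3.25+); MySQL 8.0 accepts the same query. The helper nth_highest is hypothetical. ROW_NUMBER breaks ties arbitrarily; DENSE_RANK would be the choice if equal values should share a position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL, value INTEGER);
INSERT INTO my_table (id, name, value) VALUES
    (NULL, 'foo', 200), (NULL, 'bar', 100), (NULL, 'baz', 0), (NULL, 'quux', 300);
""")

def nth_highest(n):
    """Return the name at position n when ordered by value, highest first."""
    row = conn.execute("""
        SELECT name
        FROM (SELECT name, ROW_NUMBER() OVER (ORDER BY value DESC) AS rn
              FROM my_table) t
        WHERE rn = ?
    """, (n,)).fetchone()
    return row[0] if row else None

print(nth_highest(2))  # foo
```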
This only works for exactly the second highest:
SELECT * FROM my_table two
WHERE EXISTS (
SELECT * FROM my_table one
WHERE one.value > two.value
AND NOT EXISTS (
SELECT * FROM my_table zero
WHERE zero.value > one.value
)
AND NOT EXISTS (
SELECT * FROM my_table nx
WHERE nx.value > two.value
AND nx.value < one.value
)
)
LIMIT 1
;
This one emulates the window function rank() for platforms that don't have it. It can be adapted for ranks other than 2 by altering one constant:
SELECT one.*
-- , 1+COALESCE(agg.rnk,0) AS rnk
FROM my_table one
LEFT JOIN (
SELECT one.id , COUNT(*) AS rnk
FROM my_table one
JOIN my_table cnt ON cnt.value > one.value
GROUP BY one.id
) agg ON agg.id = one.id
WHERE agg.rnk=1 -- the aggregate starts counting at zero
;
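A sketch of the rank emulation parameterized by n, with SQLite via Python's sqlite3 standing in for MySQL and the question's sample rows. For each row it counts how many rows have a strictly greater value, and wraps the count in COALESCE so that the maximum (which has no greater rows and therefore no agg row) gets rank 0. The helper name nth_highest is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT NOT NULL, value INTEGER);
INSERT INTO my_table (id, name, value) VALUES
    (NULL, 'foo', 200), (NULL, 'bar', 100), (NULL, 'baz', 0), (NULL, 'quux', 300);
""")

def nth_highest(n):
    """Names whose number of strictly greater rows is n - 1."""
    rows = conn.execute("""
        SELECT one.name
        FROM my_table one
        LEFT JOIN (
            SELECT one.id, COUNT(*) AS rnk
            FROM my_table one
            JOIN my_table cnt ON cnt.value > one.value
            GROUP BY one.id
        ) agg ON agg.id = one.id
        WHERE COALESCE(agg.rnk, 0) = ?
    """, (n - 1,)).fetchall()
    return [r[0] for r in rows]

print(nth_highest(2))  # ['foo']
```

Because the count is "rows strictly greater", tied values share a rank and the next rank is skipped, just as with rank().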
Both solutions rely on self-joins, which MySQL allows; IIRC it only disallows selecting from a table in a subquery when that same table is the target of an UPDATE or DELETE.
The below one does not need window functions, but uses a recursive query to enumerate the rankings:
WITH RECURSIVE agg AS (
SELECT one.id
, one.value
, 1 AS rnk
FROM my_table one
WHERE NOT EXISTS (
SELECT * FROM my_table zero
WHERE zero.value > one.value
)
UNION ALL
SELECT two.id
, two.value
, agg.rnk+1 AS rnk
FROM my_table two
JOIN agg ON two.value < agg.value
WHERE NOT EXISTS (
SELECT * FROM my_table nx
WHERE nx.value > two.value
AND nx.value < agg.value
)
)
SELECT * FROM agg
WHERE rnk = 2
;
(WITH RECURSIVE requires MySQL 8.0 or later; older versions don't support it)
You can use inline variable initialization like this:
select * from (
select id,
name,
value,
@curRank := @curRank + 1 AS cur_rank
from my_table t, (SELECT @curRank := 0) r
order by value desc
) tb
where tb.cur_rank = 2
SELECT name
FROM my_table
WHERE value < (SELECT max(value) FROM my_table)
ORDER BY value DESC
LIMIT 1
SELECT name
FROM my_table
WHERE value = (
SELECT min(r.value)
FROM (
SELECT name, value
FROM my_table
ORDER BY value DESC
LIMIT 2
) r
)
LIMIT 1

Balancing out MYSQL select statements

I added 'vanity_name' and 'name' to the first and second SELECT statements respectively.
I get a mismatched number of columns error, which confuses me because I added a column to both SELECT statements to keep them balanced.
SQL Statement:
SELECT id,
vanity_name,
Date_format(DATE, '%M %e, %Y') AS DATE,
TYPE
FROM (SELECT resume_id AS id,
date_mod AS DATE,
'resume' AS TYPE
FROM resumes
WHERE user_id = '1'
UNION ALL
SELECT profile_id,
name,
date_mod AS DATE,
'profile'
FROM profiles
WHERE user_id = '1'
ORDER BY DATE DESC
LIMIT
5) AS d1
ORDER BY DATE DESC
Erm, you have four columns in your outer select, three in the inner select.
id, vanity_name, date, type
vs.
id, date, TYPE
Based on the parentheses, you're trying to union:
(SELECT resume_id AS id, date_mod AS date, 'resume' AS TYPE FROM resumes WHERE user_id = '1'
with
SELECT profile_id,name,date_mod AS date, 'profile' FROM profiles ... LIMIT 5)
and they obviously don't match. Reposition your parens.
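A sketch of the balanced version: each branch of the UNION returns the same four columns in the same order, and vanity_name is aliased to name, since a UNION takes its column names from the first SELECT. The table definitions and rows are hypothetical, SQLite (Python's sqlite3) stands in for MySQL, and DATE_FORMAT is left out because SQLite lacks it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas: the question does not show the real definitions.
conn.executescript("""
CREATE TABLE resumes (resume_id INTEGER, user_id INTEGER, vanity_name TEXT, date_mod TEXT);
CREATE TABLE profiles (profile_id INTEGER, user_id INTEGER, name TEXT, date_mod TEXT);
INSERT INTO resumes VALUES (1, 1, 'my-cv', '2012-03-01'), (2, 2, 'not-mine', '2012-03-05');
INSERT INTO profiles VALUES (10, 1, 'Jane', '2012-03-02');
""")

rows = conn.execute("""
    SELECT id, name, date_mod, type
    FROM (
        -- Both branches return id, name, date_mod, type in that order.
        SELECT resume_id AS id, vanity_name AS name, date_mod, 'resume' AS type
        FROM resumes WHERE user_id = 1
        UNION ALL
        SELECT profile_id, name, date_mod, 'profile'
        FROM profiles WHERE user_id = 1
        ORDER BY date_mod DESC
        LIMIT 5
    ) AS d1
    ORDER BY date_mod DESC
""").fetchall()
print(rows)  # [(10, 'Jane', '2012-03-02', 'profile'), (1, 'my-cv', '2012-03-01', 'resume')]
```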

mysql self join

I have a table called receiving with 4 columns:
id, date, volume, volume_units
The volume units are always stored as a value of either "Lbs" or "Gals".
I am trying to write an SQL query to get the sum of the volumes in Lbs and Gals for a specific date range. Something along the lines of: (which doesn't work)
SELECT sum(p1.volume) as lbs,
p1.volume_units,
sum(p2.volume) as gals,
p2.volume_units
FROM receiving as p1, receiving as p2
where p1.volume_units = 'Lbs'
and p2.volume_units = 'Gals'
and p1.date between "2012-01-01" and "2012-03-07"
and p2.date between "2012-01-01" and "2012-03-07"
When I run these queries separately the results are way off. I know the join is wrong here, but I don't know what I am doing wrong to fix it.
SELECT SUM(volume) AS total_sum,
volume_units
FROM receiving
WHERE `date` BETWEEN '2012-01-01'
AND '2012-03-07'
GROUP BY volume_units
You can achieve this in one query by using IF(condition,then,else) within the SUM:
SELECT SUM(IF(volume_units="Lbs",volume,0)) as lbs,
SUM(IF(volume_units="Gals",volume,0)) as gals
FROM receiving
WHERE `date` between "2012-01-01" and "2012-03-07"
This only adds volume if it is of the right unit.
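The difference between the original self-join and the conditional sums shows up with a handful of rows. In this sketch (SQLite via Python's sqlite3 standing in for MySQL, made-up data, CASE in place of IF()), the comma join pairs every Lbs row with every Gals row, so each total is multiplied by the number of rows of the other unit; the conditional SUM and zerkms's GROUP BY both return the correct totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE receiving (id INTEGER, `date` TEXT, volume INTEGER, volume_units TEXT);
INSERT INTO receiving VALUES
    (1, '2012-01-15', 10, 'Lbs'),
    (2, '2012-02-01', 5, 'Gals'),
    (3, '2012-03-01', 7, 'Lbs'),
    (4, '2012-03-05', 3, 'Gals'),
    (5, '2012-04-01', 100, 'Gals');   -- outside the date range
""")
DATE_RANGE = ("2012-01-01", "2012-03-07")

# One pass with conditional sums (CASE plays the role of MySQL's IF()).
conditional = conn.execute("""
    SELECT SUM(CASE WHEN volume_units = 'Lbs'  THEN volume ELSE 0 END) AS lbs,
           SUM(CASE WHEN volume_units = 'Gals' THEN volume ELSE 0 END) AS gals
    FROM receiving
    WHERE `date` BETWEEN ? AND ?
""", DATE_RANGE).fetchone()

# The original self-join: a cross join of Lbs rows with Gals rows.
cross = conn.execute("""
    SELECT SUM(p1.volume), SUM(p2.volume)
    FROM receiving p1, receiving p2
    WHERE p1.volume_units = 'Lbs' AND p2.volume_units = 'Gals'
      AND p1.`date` BETWEEN ? AND ?
      AND p2.`date` BETWEEN ? AND ?
""", DATE_RANGE * 2).fetchone()

# One row per unit, as in the GROUP BY answer.
grouped = conn.execute("""
    SELECT volume_units, SUM(volume) FROM receiving
    WHERE `date` BETWEEN ? AND ?
    GROUP BY volume_units ORDER BY volume_units
""", DATE_RANGE).fetchall()

print(conditional, cross, grouped)  # (17, 8) (34, 16) [('Gals', 8), ('Lbs', 17)]
```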
This query will display the totals for each ID.
SELECT s.`id`,
CONCAT(s.TotalLbsVolume, ' ', 'lbs') as TotalLBS,
CONCAT(s.TotalGalVolume, ' ', 'gals') as TotalGAL
FROM
(
SELECT a.`id`, SUM(a.`volume`) as TotalLbsVolume, b.TotalGalVolume
FROM receiving a INNER JOIN
(
SELECT `id`, SUM(`volume`) as TotalGalVolume
FROM receiving
WHERE (volume_units = 'Gals') AND
(`date` between '2012-01-01' and '2012-03-07')
GROUP BY `id`
) b ON a.`id` = b.`id`
WHERE (a.volume_units = 'Lbs') AND
(a.`date` between '2012-01-01' and '2012-03-07')
GROUP BY a.`id`, b.TotalGalVolume
) s
This is a cross join with no join condition; I don't think you meant that.
If you want to sum quantities, you don't need a join at all; just group as zerkms did.
You can simply group by date and volume_units without self-join.
SELECT date, volume_units, sum(volume) sum_vol
FROM receiving
WHERE date between "2012-01-01" and "2012-03-07"
GROUP BY date, volume_units
Sample test:
select d, vol_units, sum(vol) sum_vol
from
(
select 1 id, '2012-03-07' d, 1 vol, 'lbs' vol_units
union
select 2 id, '2012-03-07' d, 2 vol, 'Gals' vol_units
union
select 3 id, '2012-03-08' d, 1 vol, 'lbs' vol_units
union
select 4 id, '2012-03-08' d, 2 vol, 'Gals' vol_units
union
select 5 id, '2012-03-07' d, 10 vol, 'lbs' vol_units
) t
group by d, vol_units