Get specific values from same column within grouped rows - mysql

This is a problem for which I have a working query, but it feels horribly inefficient to me and I'd like some help constructing a better one. This is going into a live production environment, and the number of queries the db handles each day is incredibly high, so the more efficient this can be, the better. I have a table structured something like this (stripped to just the relevant parts):
id | type | datecolumn
1 | A | 2014-01-01
1 | B | 0000-00-00
2 | A | 2014-01-02
2 | B | 2014-01-10
3 | A | 2014-01-01
3 | B | 0000-00-00
There will always be two rows for each id, one of type A and one of type B. A will always have a valid date, and B will either have a date >= that of A, or all 0s. What I want is a query that will produce output similar to this:
id | date for A | date for B
1 | 2014-01-01 | None
2 | 2014-01-02 | 2014-01-10
3 | 2014-01-01 | None
The way I'm doing this now is as follows:
SELECT
id,
IF(MIN(datecolumn) > 0, MIN(datecolumn), MAX(datecolumn)) AS 'date for A',
IF(MIN(datecolumn) > 0, MAX(datecolumn), 'None') AS 'date for B'
GROUP BY id
But it really feels like I should be able to pluck the datecolumn value on a by-type basis somehow. I know the simplest solution should be to change the table structure so that each id only uses one row, but I'm afraid that is not possible in this case; there has to be two rows. Is there a way to leverage the type column properly in this query?
Edit: Also, this is on a table that will have upwards of 10,000,000 rows. So again, efficiency is key.

I'd stick with what you've got, but maybe write it this way...
CREATE TABLE my_table
(id INT NOT NULL
,type CHAR(1) NOT NULL
,datecolumn DATE NOT NULL DEFAULT '0000-00-00'
,PRIMARY KEY(id,type)
);
INSERT INTO my_table VALUES
(1 ,'A','2014-01-01'),
(1 ,'B','0000-00-00'),
(2 ,'A','2014-01-02'),
(2 ,'B','2014-01-10'),
(3 ,'A','2014-01-01'),
(3 ,'B','0000-00-00');
SELECT id
, MAX(CASE WHEN type = 'A' THEN datecolumn END) a
, MAX(REPLACE(CASE WHEN type='B' THEN datecolumn END,'0000-00-00','none')) b
FROM my_table
GROUP
BY id;
+----+------------+------------+
| id | a | b |
+----+------------+------------+
| 1 | 2014-01-01 | none |
| 2 | 2014-01-02 | 2014-01-10 |
| 3 | 2014-01-01 | none |
+----+------------+------------+
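For anyone who wants to try the conditional-aggregation idea without a MySQL server, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in engine, which is my assumption and not part of the answer above. Dates are stored as text, and the all-zero date is mapped to NULL instead of the string 'none'.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "id INTEGER NOT NULL, type TEXT NOT NULL, datecolumn TEXT NOT NULL, "
    "PRIMARY KEY (id, type))"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [
        (1, "A", "2014-01-01"), (1, "B", "0000-00-00"),
        (2, "A", "2014-01-02"), (2, "B", "2014-01-10"),
        (3, "A", "2014-01-01"), (3, "B", "0000-00-00"),
    ],
)

# Each CASE picks the date for a single type; MAX() then collapses the
# two rows of an id into one output row. The all-zero date yields NULL.
pivot_rows = conn.execute(
    """
    SELECT id,
           MAX(CASE WHEN type = 'A' THEN datecolumn END) AS date_a,
           MAX(CASE WHEN type = 'B' AND datecolumn <> '0000-00-00'
                    THEN datecolumn END) AS date_b
    FROM my_table
    GROUP BY id
    ORDER BY id
    """
).fetchall()

for row in pivot_rows:
    print(row)
```

Returning NULL rather than 'none' keeps the column a real date type in MySQL; the display string can be added in the application if needed.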

Make sure you have an index that covers both the id and type columns (e.g. ALTER TABLE tbl ADD INDEX (type, id)), then do:
SELECT
table_a.id,
table_a.datecolumn AS 'date for A',
IF(table_b.datecolumn > 0, table_b.datecolumn, 'None') AS 'date for B'
FROM tbl AS table_a
JOIN tbl AS table_b ON table_a.id = table_b.id AND table_b.type = 'B'
WHERE table_a.type = 'A';

Related

Joining table to itself with multiple join criteria logic

I'm trying to understand the logic behind the syntax below. Based on the following question, table and syntax:
Write a query that'll identify returning active users. A returning active user is a user that has made a second purchase within 7 days of any other of their purchases. Output a list of user_ids of these returning active users.
Column + Data Type:
id: int | user_id: int | item: varchar |created_at: datetime | revenue: int
SELECT DISTINCT(a1.user_id)
FROM amazon_transactions a1
JOIN amazon_transactions a2 ON a1.user_id=a2.user_id
AND a1.id <> a2.id
AND a2.created_at::date-a1.created_at::date BETWEEN 0 AND 7
ORDER BY a1.user_id
Why does the table need to be joined to itself in this case?
How does 'AND a1.id <> a2.id' portion of syntax contribute to the join?
You are looking for users that have two records in that table whose dates are no more than 7 days apart.
To accomplish this, you treat the table as if it were two different (but identical) tables, because you have to match a row in the first table with a row in the second.
Of course, you don't want to match a row with itself, so
AND a1.id <> a2.id
accomplishes that
The table needs to be joined with itself because you have just one table and you want to find returning users (by comparing the time between transaction dates for the same user).
The AND a1.id <> a2.id portion of the syntax prevents a transaction from being paired with itself in the joined result.
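To see what the a1.id <> a2.id condition buys you, here is a small sketch that runs the self-join with and without it. It uses Python's sqlite3 module as a stand-in engine (an assumption; the question's ::date casts suggest PostgreSQL), so the date arithmetic is written with SQLite's julianday(). The sample rows and the helper name returning_users are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE amazon_transactions "
    "(id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)"
)
# Hypothetical sample data: user 1 buys twice within 3 days, user 2 never
# buys twice within a week.
conn.executemany(
    "INSERT INTO amazon_transactions VALUES (?, ?, ?)",
    [
        (1, 1, "2020-01-05 15:33:22"),
        (2, 2, "2020-01-05 16:33:22"),
        (3, 1, "2020-01-08 18:33:22"),
        (4, 1, "2020-01-22 17:33:22"),
        (5, 2, "2020-02-05 15:33:22"),
        (6, 2, "2020-03-05 15:33:22"),
    ],
)


def returning_users(exclude_self_pairs):
    """Return user_ids with a pair of transactions 0 to 7 calendar days apart."""
    self_filter = "AND a1.id <> a2.id" if exclude_self_pairs else ""
    sql = f"""
        SELECT DISTINCT a1.user_id
        FROM amazon_transactions a1
        JOIN amazon_transactions a2
          ON a1.user_id = a2.user_id
         {self_filter}
         AND julianday(date(a2.created_at)) - julianday(date(a1.created_at))
             BETWEEN 0 AND 7
        ORDER BY a1.user_id
    """
    return [user_id for (user_id,) in conn.execute(sql)]


with_filter = returning_users(exclude_self_pairs=True)
without_filter = returning_users(exclude_self_pairs=False)
print(with_filter, without_filter)
```

Without the filter, every transaction pairs with itself at a gap of 0 days, so every user who has bought anything would be reported as "returning".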
There are two scenarios I can think of, depending on the id column values. Are the id values generated in chronological order? If so, to answer your first question: we can, but don't have to, use join syntax. Here is how to achieve your goal using a correlated subquery, with sample data created.
create table amazon_transactions(id int , user_id int , item varchar(20),created_at datetime , revenue int);
insert amazon_transactions (id,user_id,created_at) values
(1,1,'2020-01-05 15:33:22'),
(2,2,'2020-01-05 16:33:22'),
(3,1,'2020-01-08 18:33:22'),
(4,1,'2020-01-22 17:33:22'),
(5,2,'2020-02-05 15:33:22'),
(6,2,'2020-03-05 15:33:22');
select * from amazon_transactions;
-- sample set:
| id | user_id | item | created_at | revenue |
+------+---------+------+---------------------+---------+
| 1 | 1 | NULL | 2020-01-05 15:33:22 | NULL |
| 2 | 2 | NULL | 2020-01-05 16:33:22 | NULL |
| 3 | 1 | NULL | 2020-01-08 18:33:22 | NULL |
| 4 | 1 | NULL | 2020-01-22 17:33:22 | NULL |
| 5 | 2 | NULL | 2020-02-05 15:33:22 | NULL |
| 6 | 2 | NULL | 2020-03-05 15:33:22 | NULL |
-- Here is the answer using a correlated subquery:
select distinct user_id
from amazon_transactions t
where datediff(
(select created_at from amazon_transactions where user_id=t.user_id and id-t.id>=1 order by id limit 1 ),
created_at
)<=7
;
-- result:
| user_id |
+---------+
| 1 |
However, what if the id values are NOT based on transaction time? Then the id values are no help for our requirement. In this case, a JOIN is more capable than a correlated subquery, and we need to order the transactions by time for each user in order to build the join condition. To answer your second question, the AND a1.id <> a2.id portion of the syntax contributes by preventing a transaction from being paired with itself. However, to my understanding, the matching scope is broader than necessary. We only care whether CONSECUTIVE transactions are within 7 days of each other, but AND a1.id <> a2.id pairs every transaction with every other one. For instance, we want to check the gap between transaction1 and transaction2, and between transaction2 and transaction3, NOT between transaction1 and transaction3.
Note: by using the user variable row_id trick, we can produce the row id which is used to match consecutive transactions for each user, thus eliminating the wasteful job of random transaction check.
select distinct t1.user_id
from
(select user_id,created_at,@row_id:=@row_id+1 as row_id
from amazon_transactions ,(select @row_id:=0) t
order by user_id,created_at)t1
join
(select user_id,created_at,@row_num:=@row_num+1 as row_num
from amazon_transactions ,(select @row_num:=0) t
order by user_id,created_at)t2
on t1.user_id=t2.user_id and t2.row_num-t1.row_id=1 and datediff(t2.created_at,t1.created_at)<=7
;
-- result
| user_id |
+---------+
| 1 |

Mysql IN function

class_table
+----+-------+--------------+
| id |teac_id| student_id |
+----+-------+--------------+
| 1 | 1 | 1,2,3,4 |
+----+-------+--------------+
student_mark
+----+----------+--------+
| id |student_id| marks |
+----+----------+--------+
| 1 | 1 | 12 |
+----+----------+--------+
| 2 | 2 | 80 |
+----+----------+--------+
| 3 | 3 | 20 |
+----+----------+--------+
I have these two tables and I want to calculate the total marks of the students. My SQL is:
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN
(SELECT `student_id` FROM `class_table` WHERE `teac_id` = '1')
But this will return null, please help!!
Firstly, you should never store comma-separated data in a column. You should really normalize your data. Basically, you could have a many-to-many mapping table, teacher_to_student, with teac_id and student_id columns.
In this particular case, you can use the FIND_IN_SET() function.
From your current query, it seems that you are trying to get the total marks for a teacher (summing up the marks of all his/her students).
Try:
SELECT SUM(sm.`marks`)
FROM `student_mark` AS sm
JOIN `class_table` AS ct
ON FIND_IN_SET(sm.`student_id`, ct.`student_id`) > 0
WHERE ct.`teac_id` = '1'
In case you want to get total marks per student, you would need to add a GROUP BY. The query would look like:
SELECT sm.`student_id`,
SUM(sm.`marks`)
FROM `student_mark` AS sm
JOIN `class_table` AS ct
ON FIND_IN_SET(sm.`student_id`, ct.`student_id`) > 0
WHERE ct.`teac_id` = '1'
GROUP BY sm.`student_id`
Just in case you want to know why: the reason it returned null is that the subquery returned '1,2,3,4' as a single string. What you need is for it to return 1, 2, 3 and 4 as separate values.
What your query effectively became:
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN ('1,2,3,4')
What you expect is
SELECT SUM(`marks`)
FROM `student_mark`
WHERE `student_id` IN (1,2,3,4)
The best way is to normalize, as @madhur said. In your case, you need to make the teacher-to-student relation a one-to-many link, with one row per pair:
+----+-------+--------------+
| id |teac_id| student_id |
+----+-------+--------------+
| 1 | 1 | 1 |
+----+-------+--------------+
| 2 | 1 | 2 |
+----+-------+--------------+
| 3 | 1 | 3 |
+----+-------+--------------+
| 4 | 1 | 4 |
+----+-------+--------------+
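Here is a rough sketch contrasting the two designs. It runs on Python's sqlite3 module rather than MySQL (an assumption on my part, so the exact result of the first query in MySQL may differ depending on column types and implicit casts), and the table name teacher_student is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student_mark (id INTEGER, student_id INTEGER, marks INTEGER);
    INSERT INTO student_mark VALUES (1, 1, 12), (2, 2, 80), (3, 3, 20);

    -- Original design: one row holding a comma-separated list.
    CREATE TABLE class_table (id INTEGER, teac_id INTEGER, student_id TEXT);
    INSERT INTO class_table VALUES (1, 1, '1,2,3,4');

    -- Normalized design (hypothetical table name): one row per pair.
    CREATE TABLE teacher_student (teac_id INTEGER, student_id INTEGER);
    INSERT INTO teacher_student VALUES (1, 1), (1, 2), (1, 3), (1, 4);
    """
)

# The subquery yields the single value '1,2,3,4', which equals no student_id,
# so SUM() runs over zero rows and returns NULL.
total_from_list = conn.execute(
    "SELECT SUM(marks) FROM student_mark WHERE student_id IN "
    "(SELECT student_id FROM class_table WHERE teac_id = 1)"
).fetchone()[0]

# With one row per (teacher, student) pair, the same IN works as intended.
total_normalized = conn.execute(
    "SELECT SUM(marks) FROM student_mark WHERE student_id IN "
    "(SELECT student_id FROM teacher_student WHERE teac_id = 1)"
).fetchone()[0]

print(total_from_list, total_normalized)
```

With the normalized table, an index on (teac_id, student_id) can also be used, which is not possible with a string search through a list.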
If you want to filter your table based on a comma-separated list of IDs, my approach is to
append extra commas at the beginning and at the end of the list, as well as at the beginning and at the end of the ID, e.g.
1 becomes ,1, and the list becomes ,1,2,3,4,. The reason for that is to avoid ambiguous matches, such as 1 matching 21 or 12 in a list.
Also, EXISTS is well suited to that situation and, together with the INSTR function, should work:
SELECT SUM(`marks`)
FROM `student_mark` sm
WHERE EXISTS(SELECT 1 FROM `class_table`
WHERE `teac_id` = '1' AND
INSTR(CONCAT(',', student_id, ','), CONCAT(',', sm.student_id, ',')) > 0)
BUT you shouldn't store related IDs in one cell as a comma-separated list; it should be a foreign key column that forms a proper relation. Joins would become trivial then.
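The comma-wrapping trick is easy to check outside the database. This is only an illustration in plain Python; the helper name in_csv_list is made up, and the substring test plays the role of INSTR(...) > 0.

```python
def in_csv_list(item_id, csv_list):
    """Return True if item_id is one of the comma-separated entries."""
    wrapped_list = "," + csv_list + ","    # '21,12' -> ',21,12,'
    wrapped_id = "," + str(item_id) + ","  # 1 -> ',1,'
    return wrapped_id in wrapped_list


# A plain substring search finds '1' inside '21' and '12' ...
naive_match = "1" in "21,12"
# ... while the wrapped version only matches whole entries.
safe_match = in_csv_list(1, "21,12")
whole_entry_match = in_csv_list(2, "1,2,3,4")

print(naive_match, safe_match, whole_entry_match)
```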

Transposing rows into columns (MySQL)

So, let's say I have a table called "imports" that looks like this:
| id | importer_id | total_m | total_f |
|====|=============|=========|=========|
| 1 | 1 | 100 | 200 |
| 1 | 1 | 0 | 200 |
And I need the query to return it pivoted or transposed (rows to columns) in this way:
| total_m | sum(total_m) |
| total_f | sum(total_f) |
I can't think of a way to do this without using another table (maybe a temporary table?) and using unions, but there should be a better way to do this anyway (maybe with CASE or IF?).
Thanks in advance.
select 'total_m', sum(total_m) from imports
union
select 'total_f', sum(total_f) from imports
http://sqlfiddle.com/#!9/fc1c0/2/0
You can "unpivot" by first expanding the number of rows, which is done below by cross joining a 2 row subquery. Then on each of those rows use relevant case expression conditions to align the former columns to the new rows ("conditional aggregates").
MySQL 5.6 Schema Setup:
CREATE TABLE imports
(`id` int, `importer_id` int, `total_m` int, `total_f` int)
;
INSERT INTO imports
(`id`, `importer_id`, `total_m`, `total_f`)
VALUES
(1, 1, 100, 200),
(1, 1, 0, 200)
;
Query 1:
select
*
from (
select
i.importer_id
, concat('total_',cj.unpiv) total_type
, sum(case when cj.unpiv = 'm' then total_m
when cj.unpiv = 'f' then total_f else 0 end) as total
from imports i
cross join (select 'm' as unpiv union all select 'f') cj
group by
i.importer_id
, cj.unpiv
) d
Results:
| importer_id | total_type | total |
|-------------|------------|-------|
| 1 | total_f | 400 |
| 1 | total_m | 100 |

SELECT N rows before and after the row matching the condition?

The behaviour I want to replicate is like grep with the -A and -B flags.
e.g. grep -A 2 -B 2 "hello" myfile.txt will give me all the lines which have "hello" in them, but also 2 lines before and 2 lines after each.
Let's assume this table schema:
+--------+-------------------------+
| id | message |
+--------+-------------------------+
| 1 | One tow three |
| 2 | No error in this |
| 3 | My testing message |
| 4 | php module test |
| 5 | hello world |
| 6 | team spirit |
| 7 | puzzle game |
| 8 | social game |
| 9 | stackoverflow |
|10 | stackexchange |
+--------+-------------------------+
Now a query like :
Select * from theTable where message like '%hello%' will result in :
5 | hello world
How can I put another parameter "N" which selects N rows before, and N rows after the matched record i.e. for N = 2, the result should be :
| 3 | My testing message |
| 4 | php module test |
| 5 | hello world |
| 6 | team spirit |
| 7 | puzzle game |
For simplicity assume 'like %TERM%' matches only 1 row .
Here the result is supposed to be sorted on auto-increment id field.
Right, this works for me:
SELECT child.*
FROM stack as child,
(SELECT idstack FROM stack WHERE message LIKE '%hello%') as parent
WHERE child.idstack BETWEEN parent.idstack-2 AND parent.idstack+2;
Don't know if this is at all valid MySQL but how about
-- The matching row plus the 2 rows after it
(SELECT t.*
FROM theTable t
INNER JOIN (
SELECT id FROM theTable WHERE message LIKE '%hello%'
) m ON t.id >= m.id
ORDER BY t.id
LIMIT 3)
UNION ALL
-- The 2 rows before the matching row
(SELECT t.*
FROM theTable t
INNER JOIN (
SELECT id FROM theTable WHERE message LIKE '%hello%'
) m ON t.id < m.id
ORDER BY t.id DESC
LIMIT 2)
ORDER BY id
Try this simple one (edited) -
CREATE TABLE messages(
id INT(11) DEFAULT NULL,
message VARCHAR(255) DEFAULT NULL
);
INSERT INTO messages VALUES
(1, 'One tow three'),
(2, 'No error in this'),
(3, 'My testing message'),
(4, 'php module test'),
(5, 'hello world'),
(6, 'team spirit'),
(7, 'puzzle game'),
(8, 'social game'),
(9, 'stackoverflow'),
(10, 'stackexchange');
SET @text = 'hello world';
SELECT id, message FROM (
SELECT m.*, @n1:=@n1 + 1 num, @n2:=IF(message = @text, @n1, @n2) pos
FROM messages m, (SELECT @n1:=0, @n2:=0) n ORDER BY m.id
) t
WHERE @n2 >= num - 2 AND @n2 <= num + 2;
+------+--------------------+
| id | message |
+------+--------------------+
| 3 | My testing message |
| 4 | php module test |
| 5 | hello world |
| 6 | team spirit |
| 7 | puzzle game |
+------+--------------------+
The N value can be specified as a user variable; currently it is 2.
This query works with row numbers rather than ids, which guarantees that the nearest records are returned even if the ids have gaps.
Try
Select * from theTable
Where id >=
(Select id - variableHere from theTable where message like '%hello%')
Order by id
Limit (variableHere * 2) + 1
(MS SQL Server only)
The most reliable way would be to use the ROW_NUMBER function; that way it doesn't matter if there are gaps in the id. This will also work if there are multiple occurrences of the search result, and will properly return two rows above and below each result.
WITH
srt AS (
SELECT ROW_NUMBER() OVER (ORDER BY id) AS int_row, [id]
FROM theTable
),
result AS (
SELECT int_row - 2 AS int_bottom, int_row + 2 AS int_top
FROM theTable
INNER JOIN srt
ON theTable.id = srt.id
WHERE ([message] like '%hello%')
)
SELECT theTable.[id], theTable.[message]
FROM theTable
INNER JOIN srt
ON theTable.id = srt.id
INNER JOIN result
ON srt.int_row >= result.int_bottom
AND srt.int_row <= result.int_top
ORDER BY srt.int_row
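The same row-number idea can be tried without SQL Server. Below is a sketch on Python's sqlite3 module, assuming an SQLite build with window-function support (3.25 or later); MySQL 8.0 also has ROW_NUMBER(), but I have not run this exact query there. The ids deliberately contain gaps, and the helper name rows_with_context is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT)")
# Ids with gaps, so "id - 2 .. id + 2" would miss neighbouring rows.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        (1, "One two three"),
        (3, "My testing message"),
        (4, "php module test"),
        (9, "hello world"),
        (10, "team spirit"),
        (15, "puzzle game"),
        (16, "social game"),
    ],
)

CONTEXT_SQL = """
    WITH numbered AS (
        SELECT id, message, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM messages
    ),
    hits AS (
        SELECT rn FROM numbered WHERE message LIKE :pattern
    )
    SELECT DISTINCT nb.id, nb.message
    FROM numbered AS nb
    JOIN hits AS h ON nb.rn BETWEEN h.rn - :span AND h.rn + :span
    ORDER BY nb.id
"""


def rows_with_context(pattern, span):
    """Return matching rows plus `span` rows before and after each match."""
    return conn.execute(CONTEXT_SQL, {"pattern": pattern, "span": span}).fetchall()


context_ids = [row[0] for row in rows_with_context("%hello%", 2)]
overlap_ids = [row[0] for row in rows_with_context("%game%", 1)]
print(context_ids, overlap_ids)
```

DISTINCT is there so that rows falling into the context window of two nearby matches are only returned once.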
Adding an answer using a date instead of an id.
The use case here is an on-call rotation table with one record per week.
Due to edits, the id might be out of order for the intended purpose.
Any use case with several records per week, per date or similar will of course have to be adapted.
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| startdate| datetime | NO | | NULL | |
| person | int(11) | YES | MUL | NULL | |
+----------+--------------+------+-----+---------+----------------+
The query:
SELECT child.*
FROM `rota-table` as child,
(SELECT startdate
FROM `rota-table`
WHERE YEARWEEK(startdate, 3) = YEARWEEK(now(), 3) ) as parent
WHERE
YEARWEEK(child.startdate, 3) >= YEARWEEK(NOW() - INTERVAL 25 WEEK, 3)
AND YEARWEEK(child.startdate, 3) <= YEARWEEK(NOW() + INTERVAL 25 WEEK, 3)

Sorting some rows by average with SQL

All right, so here's a challenge for all you SQL pros:
I have a table with two columns of interest, group and birthdate. Only some rows have a group assigned to them.
I now want to print all rows sorted by birthdate, but I also want all rows with the same group to end up next to each other. The only semi-sensible way of doing this would be to use the groups' average birthdates for all the rows in the group when sorting. The question is, can this be done with pure SQL (MySQL in this instance), or will some scripting logic be required?
To illustrate, with the given table:
id | group | birthdate
---+-------+-----------
1 | 1 | 1989-12-07
2 | NULL | 1990-03-14
3 | 1 | 1987-05-25
4 | NULL | 1985-09-29
5 | NULL | 1988-11-11
and let's say that the "average" of 1987-05-25 and 1989-12-07 is 1988-08-30 (this can be found by averaging the UNIX timestamp equivalents of the dates and then converting back to a date. This average doesn't have to be completely correct!).
The output should then be:
id | group | birthdate | [sort_by_birthdate]
---+-------+------------+--------------------
4 | NULL | 1985-09-29 | 1985-09-29
3 | 1 | 1987-05-25 | 1988-08-30
1 | 1 | 1989-12-07 | 1988-08-30
5 | NULL | 1988-11-11 | 1988-11-11
2 | NULL | 1990-03-14 | 1990-03-14
Any ideas?
Cheers,
Jon
I normally program in T-SQL, so please forgive me if I don't translate the date functions perfectly to MySQL:
-- `group` is a reserved word in MySQL, so it is quoted with backticks.
SELECT
T.id,
T.`group`
FROM
Some_Table T
LEFT OUTER JOIN (
SELECT
`group`,
'1970-01-01' +
INTERVAL AVG(DATEDIFF(birthdate, '1970-01-01')) DAY AS avg_birthdate
FROM
Some_Table T2
GROUP BY
`group`
) SQ ON SQ.`group` = T.`group`
ORDER BY
COALESCE(SQ.avg_birthdate, T.birthdate),
T.`group`
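Here is a quick way to check the idea against the sample rows from the question. It is a sketch on Python's sqlite3 module rather than MySQL (my assumption), so the average is taken over SQLite's julianday() values instead of DATEDIFF, and the column is named grp because group is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, grp INTEGER, birthdate TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [
        (1, 1, "1989-12-07"),
        (2, None, "1990-03-14"),
        (3, 1, "1987-05-25"),
        (4, None, "1985-09-29"),
        (5, None, "1988-11-11"),
    ],
)

# Rows in a group sort by the group's average birthdate; ungrouped rows
# (no match in the LEFT JOIN) fall back to their own birthdate.
sorted_rows = conn.execute(
    """
    SELECT p.id, p.grp, p.birthdate,
           date(COALESCE(g.avg_day, julianday(p.birthdate))) AS sort_date
    FROM people AS p
    LEFT JOIN (
        SELECT grp, AVG(julianday(birthdate)) AS avg_day
        FROM people
        WHERE grp IS NOT NULL
        GROUP BY grp
    ) AS g ON g.grp = p.grp
    ORDER BY COALESCE(g.avg_day, julianday(p.birthdate)), p.grp, p.birthdate
    """
).fetchall()

sorted_ids = [row[0] for row in sorted_rows]
for row in sorted_rows:
    print(row)
```

The extra p.grp and p.birthdate sort keys keep members of one group adjacent and ordered, even if an ungrouped row happens to share the group's average date.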