MySQL: need NULL fields when join doesn't match

I'm struggling with these two tables in MySQL.
select emp_id, contrato_id,date from organização_rh;
+--------+-------------+------------+
| emp_id | contrato_id | date |
+--------+-------------+------------+
| 1 | 1 | 2000-01-01 |
| 1 | 2 | 2000-01-10 |
| 1 | 3 | 2000-02-01 |
| 2 | 1 | 1999-01-01 |
+--------+-------------+------------+
select id, codigo from contratotipo;
+----+---------------+
| id | codigo |
+----+---------------+
| 2 | determinado |
| 3 | fim |
| 1 | indeterminado |
+----+---------------+
What I'm trying to do is join them so that, where an employee didn't sign a contract, the date field is NULL. That is, I'd like output like the following:
+--------+-------------+------------+
| emp_id | contrato_id | date |
+--------+-------------+------------+
| 1 | 1 | 2000-01-01 |
| 1 | 2 | 2000-01-10 |
| 1 | 3 | 2000-02-01 |
| 2 | 1 | 1999-01-01 |
| 2 | 2 | NULL |
| 2 | 3 | NULL |
+--------+-------------+------------+
I’ve tried different joins to no avail, and, so far, none of them shows me a row with a NULL value in the date field. So, for example if I run
SELECT emp_id,contrato_id, date
FROM organização_rh as o right outer JOIN contratotipo c
ON o.contrato_id = c.id;
I don't get any NULL values when rows don’t match.
+--------+-------------+------------+
| emp_id | contrato_id | date |
+--------+-------------+------------+
| 1 | 3 | 2000-02-01 |
| 1 | 2 | 2000-01-10 |
| 1 | 1 | 2000-01-01 |
| 2 | 1 | 1999-01-01 |
+--------+-------------+------------+
Any help would be much appreciated!!
@John Ruddell, thanks for your help.
I came up with this solution. It is not that elegant, since it uses cursors, but it works with any number of employees and contracts. Your solution seems more elegant, as everything fits in just one query. However, I wasn't able to adapt it to a more general case.
BEGIN
    DECLARE var_contrato_id INT UNSIGNED;
    DECLARE contrato_finished INT DEFAULT 0;
    DECLARE contrato_cursor CURSOR FOR SELECT id FROM contratotipo;
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET contrato_finished = 1;
    OPEN contrato_cursor;
    -- Create an empty temporary table with the right column types
    -- (emp_id = -1 is assumed to match no row).
    DROP TABLE IF EXISTS temp_sp_rh_turnover;
    CREATE TEMPORARY TABLE temp_sp_rh_turnover AS (SELECT DISTINCT org_id, emp_id, CAST(0 AS UNSIGNED) AS contrato_id, CAST('1000-1-1' AS DATE) AS data FROM organização_rh WHERE emp_id = -1);
    -- Add one row per employee for every contract type, with a NULL date.
    conts: LOOP
        FETCH contrato_cursor INTO var_contrato_id;
        IF contrato_finished = 1 THEN
            LEAVE conts;
        END IF;
        INSERT INTO temp_sp_rh_turnover
        SELECT DISTINCT org_id, emp_id, var_contrato_id, NULL FROM organização_rh;
    END LOOP;
    CLOSE contrato_cursor;
    -- Fill in the real dates where the employee signed that contract.
    SELECT
        t.org_id, t.emp_id, t.contrato_id, o.data
    FROM
        temp_sp_rh_turnover AS t
    LEFT JOIN
        organização_rh AS o
    ON
        t.org_id = o.org_id AND
        t.emp_id = o.emp_id AND
        t.contrato_id = o.contrato_id;
    DROP TABLE IF EXISTS temp_sp_rh_turnover;
END

Doing something like this is not easy in MySQL, and it'll require a lot of manual work. From what I understand, you want to show a new row for each row in your other table.
That is, for every emp_id in table o, you want three rows related to it, one for each of the three rows in table c.
That is not simple at all, since there is no way to manually add rows unless you are doing an insert or a bunch of unions. So with this dataset (which I'm assuming is small compared to what you are working with) you can get the desired results like this.
SET @row_num := 1;
INSERT INTO organização_rh (emp_id, contrato_id, date)
SELECT
    emp_id, @ROW_NUM := @ROW_NUM + 1, NULL
FROM organização_rh
WHERE emp_id IN
(   SELECT emp_id
    FROM organização_rh
    GROUP BY emp_id
    HAVING COUNT(*) < 3
)
AND @ROW_NUM < 3;
Run this insert query twice and it'll give you the output you want.
OUTPUT:
23:20:25 set @row_num := 1 0 row(s) affected 0.000 sec
23:20:30 INSERT INTO organização_rh (emp_id, contrato_id, date) SELECT emp_id, @ROW_NUM := @ROW_NUM + 1, NULL FROM organização_rh WHERE emp_id IN ( SELECT emp_id FROM organização_rh GROUP BY emp_id HAVING COUNT(*) < 3 ) AND @ROW_NUM < 3 1 row(s) affected Records: 1 Duplicates: 0 Warnings: 0 0.001 sec
23:20:31 INSERT INTO organização_rh (emp_id, contrato_id, date) SELECT emp_id, @ROW_NUM := @ROW_NUM + 1, NULL FROM organização_rh WHERE emp_id IN ( SELECT emp_id FROM organização_rh GROUP BY emp_id HAVING COUNT(*) < 3 ) AND @ROW_NUM < 3 1 row(s) affected Records: 1 Duplicates: 0 Warnings: 0 0.001 sec
23:20:32 INSERT INTO organização_rh (emp_id, contrato_id, date) SELECT emp_id, @ROW_NUM := @ROW_NUM + 1, NULL FROM organização_rh WHERE emp_id IN ( SELECT emp_id FROM organização_rh GROUP BY emp_id HAVING COUNT(*) < 3 ) AND @ROW_NUM < 3 0 row(s) affected Records: 0 Duplicates: 0 Warnings: 0 0.002 sec
Notice that the last attempt inserted nothing, because the row number was maxed out. This takes some manual labor, but it's the only way I can see to do it solely in MySQL.
After inserting, this is the result set.
+--------+-------------+------------+
| emp_id | contrato_id | date |
+--------+-------------+------------+
| 1 | 1 | 2000-01-01 |
| 1 | 2 | 2000-01-10 |
| 1 | 3 | 2000-02-01 |
| 2 | 1 | 1999-01-01 |
| 2 | 2 | NULL |
| 2 | 3 | NULL |
+--------+-------------+------------+
NOTE: this is for updating the table. If you just want a query with this output, you will need to union in a new row for each missing one, which isn't really recommended. To be honest, you need to fix your table structure: add a column that you could do a Cartesian product join on, or something similar.
Anyway, here is the union query.
SELECT emp_id, contrato_id, date FROM organização_rh
CROSS JOIN (SELECT @ROW_NUM := 0) t
UNION ALL
SELECT
    emp_id, @ROW_NUM := @ROW_NUM + 1, NULL
FROM
    organização_rh
WHERE emp_id IN
(   SELECT emp_id
    FROM organização_rh
    GROUP BY emp_id
    HAVING COUNT(*) < 3
)
AND @ROW_NUM < 3
UNION ALL
SELECT
    emp_id, @ROW_NUM := @ROW_NUM + 1, NULL
FROM
    organização_rh
WHERE emp_id IN
(   SELECT emp_id
    FROM organização_rh
    GROUP BY emp_id
    HAVING COUNT(*) < 3
)
AND @ROW_NUM < 3;
OUTPUT is again the same as the insert
+--------+-------------+------------+
| emp_id | contrato_id | date |
+--------+-------------+------------+
| 1 | 1 | 2000-01-01 |
| 1 | 2 | 2000-01-10 |
| 1 | 3 | 2000-02-01 |
| 2 | 1 | 1999-01-01 |
| 2 | 2 | NULL |
| 2 | 3 | NULL |
+--------+-------------+------------+

The reason you are having trouble is that you cannot accomplish this without a third table listing all employees.
You might argue that you could get this information with
select distinct emp_id from organização_rh
but this would not cover the case where there is an employee that has not signed ANY contracts. Certainly you would want that employee to appear in the output with null dates for all contracts.
Create an employee table and then the join should be obvious.
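As a minimal sketch of that join, assuming the column names shown in the question: the derived table e below is only a stand-in for the employee table suggested above. It is not part of the original schema, and it still misses employees who have signed no contract at all. Cross joining employees with contratotipo produces every employee/contract pair, and the left join then fills in the date wherever a matching row exists.

```sql
SELECT e.emp_id,
       c.id     AS contrato_id,
       o.`date`
FROM (SELECT DISTINCT emp_id FROM organização_rh) AS e  -- stand-in for a real employee table
CROSS JOIN contratotipo AS c                            -- every employee paired with every contract type
LEFT JOIN organização_rh AS o                           -- NULL date when the pair was never signed
       ON o.emp_id = e.emp_id
      AND o.contrato_id = c.id
ORDER BY e.emp_id, c.id;
```

For the sample data this should return the six rows of the desired output, with NULL dates for contracts 2 and 3 of emp_id 2. Nothing in it depends on a fixed number of employees or contract types.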

Outer join the table to itself. Something like:
select a.emp_id, a.contrato_id, a.date from organização_rh as a
LEFT JOIN
(SELECT DISTINCT contrato_id FROM organização_rh) as b
ON a.contrato_id = b.contrato_id
should be a starting point.
Alternatively you could
select a.emp_id, a.contrato_id, a.date from organização_rh as a
LEFT JOIN
(select DISTINCT id from contratotipo) as b
ON a.contrato_id = b.id
if that's more to your liking.

@John Ruddell
This works great with the original values.
I added a new employee: emp_id=3, contrato_id=1, date=2001-1-1. If I run the query, contract 3 for emp_id=3 is missing.
+--------+-------------+------------+
| emp_id | contrato_id | date |
+--------+-------------+------------+
| 1 | 1 | 2000-01-01 |
| 1 | 2 | 2000-01-10 |
| 1 | 3 | 2000-02-01 |
| 2 | 1 | 1999-01-01 |
| 2 | 2 | NULL |
| 2 | 3 | NULL |
| 3 | 1 | 2001-01-01 |
| 3 | 2 | NULL |
+--------+-------------+------------+
Do I have to run the query:
SELECT
    emp_id, @ROW_NUM := @ROW_NUM + 1, NULL
FROM
    organização_rh
WHERE emp_id IN
(   SELECT emp_id
    FROM organização_rh
    GROUP BY emp_id
    HAVING COUNT(*) < 3
)
AND @ROW_NUM < 3
for each user after resetting @ROW_NUM?

Related

Time difference between rows

In a MySQL database (version 5.7.15) I've got a table named operation_list with the following columns: id (auto-increment integer, primary key), operation_date_time (datetime), operation (enumerated values: START, STOP). The table looks like this:
+------+---------------------+-----------+
| id | operation_date_time | operation |
+------+---------------------+-----------+
| 1 | 2000-01-01 06:30:45 | START |
| 2 | 2000-01-01 07:45:00 | STOP |
| 3 | 2000-01-01 08:18:12 | START |
| 4 | 2000-01-01 11:23:58 | STOP |
| 5 | 2000-01-01 15:45:01 | START |
| 6 | 2000-01-01 19:01:33 | STOP |
+------+---------------------+-----------+
Now, assuming that the first row is always a START, the last row is always a STOP, and each STOP always comes after a START, I need to retrieve the time difference between each START and its STOP in seconds. Hence, I need to write an SQL statement that would produce the following recordset:
+------+---------------------+-----------+
| id | operation_date_time | duration |
+------+---------------------+-----------+
| 1 | 2000-01-01 06:30:45 | 4455 |
| 3 | 2000-01-01 08:18:12 | 11146 |
| 5 | 2000-01-01 15:45:01 | 11792 |
+------+---------------------+-----------+
Where 4455 is equivalent to 1 hour, 14 minutes and 15 seconds,
11146 is equivalent to 3 hours, 5 minutes and 46 seconds,
11792 is equivalent to 3 hours, 16 minutes and 32 seconds, and so on.
What's the best way to do it in a single SQL statement without creating additional tables or dedicated scripting?
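The expected durations can be checked by hand with TIMESTAMPDIFF, which returns the difference between two datetimes in the requested unit. This is only a sanity check on the numbers above, using literal values copied from the sample table rather than a query against it:

```sql
-- Each pair is a START timestamp followed by its STOP timestamp.
SELECT TIMESTAMPDIFF(SECOND, '2000-01-01 06:30:45', '2000-01-01 07:45:00') AS d1,  -- 4455
       TIMESTAMPDIFF(SECOND, '2000-01-01 08:18:12', '2000-01-01 11:23:58') AS d2,  -- 11146
       TIMESTAMPDIFF(SECOND, '2000-01-01 15:45:01', '2000-01-01 19:01:33') AS d3;  -- 11792
```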
This works in MySQL 5.x, but it is uglier than what 8.0 allows.
SELECT
    MIN(id) id
    ,MIN(`operation_date_time`) `operation_date_time`
    ,MAX(diff) duration
FROM
    (SELECT
        id
        ,IF(`operation` = 'START', 0, TIME_TO_SEC(TIMEDIFF(`operation_date_time`, @datetime))) diff
        ,IF(`operation` = 'START', @count := @count + 1, @count := @count) groupby
        ,@datetime := `operation_date_time` `operation_date_time`
    FROM
        (SELECT * FROM timetable ORDER by `operation_date_time` ASC) t1, (SELECT @datetime := NOW()) a,
        (SELECT @count := 0) b) t2
GROUP by groupby;
CREATE TABLE timetable (
`id` INTEGER,
`operation_date_time` VARCHAR(19),
`operation` VARCHAR(5)
);
INSERT INTO timetable
(`id`, `operation_date_time`, `operation`)
VALUES
('1', '2000-01-01 06:30:45', 'START'),
('2', '2000-01-01 07:45:00', 'STOP'),
('3', '2000-01-01 08:18:12', 'START'),
('4', '2000-01-01 11:23:58', 'STOP'),
('5', '2000-01-01 15:45:01', 'START'),
('6', '2000-01-01 19:01:33', 'STOP');
Running the query above against this data returns:
id | operation_date_time | duration
-: | :------------------ | -------:
1 | 2000-01-01 06:30:45 | 4455
3 | 2000-01-01 08:18:12 | 11146
5 | 2000-01-01 15:45:01 | 11792
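-- MySQL 8.0+: pair each row with the next row's timestamp, then keep only the START rows.
SELECT operation_date_time, DURATION FROM (
    SELECT *, TIMESTAMPDIFF(SECOND, operation_date_time, LEAD(operation_date_time) OVER (ORDER BY ID)) AS DURATION FROM PRACTICE
) A
WHERE operation = 'START';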
Use window functions to get the ending time. I am going to use a cumulative conditional min, ordered from latest to earliest so that each row sees the earliest STOP at or after it:
select t.*,
       timestampdiff(second, operation_date_time, stop_dt) as diff_seconds
from (select t.*,
             min(case when operation = 'STOP' then operation_date_time end) over (order by operation_date_time desc) as stop_dt
      from t
     ) t
where operation = 'START';
If the data really is interleaved, then you could just use lead():
select t.*,
timestampdiff(second, operation_date_time, stop_dt) as diff_seconds
from (select t.*,
lead(operation_date_time) over (order by operation_date_time) as stop_dt
from t
) t
where operation = 'START';
EDIT:
In MySQL pre-8.0:
select t.*,
timestampdiff(second, operation_date_time, stop_dt) as diff_seconds
from (select t.*,
(select min(t2.operation_date_time)
from t t2
where t2.operation = 'STOP' and
t2.operation_date_time > t.operation_date_time
) as stop_dt
from t
) t
where operation = 'START';

How to limit a query by column value

Following query...
SELECT event_id, user_id FROM EventUser WHERE user_id IN (1, 2)
...gives me the following result:
+----------+---------+
| event_id | user_id |
+----------+---------+
| 3 | 1 |
| 2 | 1 |
| 1 | 1 |
| 5 | 1 |
| 4 | 1 |
| 6 | 1 |
| 4 | 2 |
| 2 | 2 |
| 1 | 2 |
| 5 | 2 |
+----------+---------+
Now, I want to modify the above query so that I only get for example two rows for each user_id, eg:
+----------+---------+
| event_id | user_id |
+----------+---------+
| 3 | 1 |
| 2 | 1 |
| 4 | 2 |
| 5 | 2 |
+----------+---------+
I am thinking about something like this, which of course does not work:
SELECT event_id, user_id FROM EventUser WHERE user_id IN (1, 2) LIMIT 2 by user_id
Ideally, this should work with offsets as well because I want to use it for paginations.
For performance reasons it is essential to use the WHERE user_id IN (1, 2) part of the query.
One method, assuming you have at least two rows for each user, would be:
(select min(event_id) as event_id, user_id
 from t
 where user_id in (1, 2)
 group by user_id
) union all
(select max(event_id) as event_id, user_id
 from t
 where user_id in (1, 2)
 group by user_id
);
Admittedly, this is not a "general" solution, but it might be the simplest solution for what you want.
If you want the two biggest or smallest, then an alternative also works:
select t.*
from t
where t.user_id in (1, 2) and
t.event_id >= (select t2.event_id
from t t2
where t2.user_id = t.user_id
order by t2.event_id desc
limit 1, 1
);
Here is a more general example for this kind of problem. Please note that I ran it on SQL Server and could not try it on MySQL yet, so please let me know how it works. It keeps the two highest scores for each number (ties aside).
CREATE TABLE mytable
(
    number INT,
    score INT
);
INSERT INTO mytable VALUES (1, 100);
INSERT INTO mytable VALUES (2, 100);
INSERT INTO mytable VALUES (2, 120);
INSERT INTO mytable VALUES (2, 110);
INSERT INTO mytable VALUES (3, 120);
INSERT INTO mytable VALUES (3, 150);
SELECT *
FROM mytable m
WHERE
(
    SELECT COUNT(*)
    FROM mytable m2
    WHERE m2.number = m.number AND
          m2.score >= m.score
) <= 2;
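How about this? It needs window functions, so MySQL 8.0 or later:
SELECT event_id, user_id
FROM (
    SELECT event_id, user_id, row_number() OVER (PARTITION BY user_id) AS row_num
    FROM EventUser WHERE user_id in (1,2)) AS ranked WHERE row_num <= n;
Here n is a placeholder for the number of rows you want per user.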
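Since the question also asks for offsets, here is a sketch of how the same row-number idea might be extended to per-user pagination. It assumes MySQL 8.0 or later. The ordering by event_id descending and the variables @page_size and @page_offset are illustrative choices, not something the question specifies.

```sql
-- Hypothetical paging parameters: rows per user, and rows to skip per user.
SET @page_size := 2;
SET @page_offset := 2;

SELECT event_id, user_id
FROM (
    SELECT event_id, user_id,
           -- An explicit ORDER BY makes the numbering (and so the pages) stable.
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_id DESC) AS row_num
    FROM EventUser
    WHERE user_id IN (1, 2)
) AS ranked
WHERE row_num >  @page_offset
  AND row_num <= @page_offset + @page_size
ORDER BY user_id, row_num;
```

With the sample rows, this second page would hold events 4 and 3 for user 1 and events 2 and 1 for user 2. The WHERE user_id IN (1, 2) filter stays inside the derived table, so only those users' rows are numbered.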
Late, but this may help: it uses a derived table and a cross join.
For the example in this post the query would be:
SELECT
    @row_number:=CASE
        WHEN @user_no = user_id
        THEN
            @row_number + 1
        ELSE
            1
    END AS num,
    @user_no:=user_id userid, event_id
FROM
    EventUser,
    (SELECT @user_no:=0,@row_number:=0) as t
group by user_id,event_id
having num < 3;

SQL: transform rows into columns in MySQL (SELECT statement)

I have two tables, orders and order_comments. Each order can have from 0 to n comments. I would like to get a list of all orders with their comments in a specific order.
Table orders:
order_id | order_nr
1 | 5252
4 | 6783
5 | 6785
Table order_comments
id_order_comments | order_fk | created_at | email | content
1 | 4 | 2015-01-12 | jack | some text here
2 | 5 | 2015-01-13 | marta | some text here
3 | 5 | 2015-01-14 | beata | some text here
4 | 4 | 2015-01-16 | julia | some text here
As a result, I would like to get 1 row for each order. Comments should be shown in separate columns, starting from the oldest comment. So desired output in this case is:
order_id | 1_comment_created_at | 1_comment_author | 1_comment_content | 2_comment_created_at | 2_comment_author | 2_comment_content
1 | NULL | NULL | NULL | NULL | NULL | NULL
4 | 2015-01-12 | jack | some text here | 2015-01-16 | julia | some text here
5 | 2015-01-13 | marta | some text here | 2015-01-14 | beata | some text here
I found this: MySQL - Rows to Columns - but I cannot use 'create view'.
I found this: http://dev.mysql.com/doc/refman/5.5/en/while.html - but I cannot create procedure in this db.
What I got:
SELECT @c := (SELECT count(*) FROM order_comments GROUP BY order_fk ORDER BY count(*) DESC LIMIT 1);
SET @rank=0;
SET @test=0;
SELECT
    CASE WHEN @test < @c AND temp.comment_id = @test THEN temp.created_at END AS created,
    CASE WHEN @test < @c AND temp.comment_id = @test THEN temp.author END AS author,
    CASE WHEN @test < @c AND temp.comment_id = @test THEN temp.content END AS content
    /*But I cannot set @test as +1. And I cannot name column with variable - like CONCAT(@test, '_created')*/
FROM (
    SELECT @rank := @rank +1 AS comment_id, created_at, author, content
    FROM order_comments
    WHERE order_fk = 4
    ORDER BY created_at
) AS temp
Problem: I would like to search more than 1 order. I should get orders with no comments too.
What can I do?
You can use variables for this type of pivot, but the query is a bit more complicated, because you need to enumerate the values for each order. (The counter is aliased rnk rather than rank, because RANK is a reserved word as of MySQL 8.0.2.)
SELECT o.order_id,
       MAX(case when rnk = 1 then created_at end) as created_at_1,
       MAX(case when rnk = 1 then email end) as email_1,
       MAX(case when rnk = 1 then content end) as content_1,
       MAX(case when rnk = 2 then created_at end) as created_at_2,
       MAX(case when rnk = 2 then email end) as email_2,
       MAX(case when rnk = 2 then content end) as content_2
FROM orders o LEFT JOIN
     (SELECT oc.*,
             (@rn := if(@o = order_fk, @rn + 1,
                        if(@o := order_fk, 1, 1)
                       )
             ) as rnk
      FROM order_comments oc CROSS JOIN
           (SELECT @rn := 0, @o := 0) vars
      ORDER BY order_fk, created_at
     ) oc
     ON o.order_id = oc.order_fk
GROUP BY o.order_id;
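On MySQL 8.0 or later, the per-order numbering can be done with ROW_NUMBER() instead of user variables. The following is a sketch under that assumption, using the table and column names from the question. The alias rn and the output column names are illustrative, and the number of comment slots is still fixed at two.

```sql
SELECT o.order_id,
       MAX(CASE WHEN oc.rn = 1 THEN oc.created_at END) AS created_at_1,
       MAX(CASE WHEN oc.rn = 1 THEN oc.email      END) AS email_1,
       MAX(CASE WHEN oc.rn = 1 THEN oc.content    END) AS content_1,
       MAX(CASE WHEN oc.rn = 2 THEN oc.created_at END) AS created_at_2,
       MAX(CASE WHEN oc.rn = 2 THEN oc.email      END) AS email_2,
       MAX(CASE WHEN oc.rn = 2 THEN oc.content    END) AS content_2
FROM orders AS o
LEFT JOIN (
    -- Number each order's comments from oldest to newest.
    SELECT order_fk, created_at, email, content,
           ROW_NUMBER() OVER (PARTITION BY order_fk ORDER BY created_at) AS rn
    FROM order_comments
) AS oc
       ON oc.order_fk = o.order_id
GROUP BY o.order_id
ORDER BY o.order_id;
```

Orders without comments still appear, with NULL in every comment column, because of the LEFT JOIN. Supporting a third comment would mean adding three more MAX(CASE ...) columns by hand.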

Select quantity of record instances separated by weeks

I have a table like the below:
CompanyID | Logged | UniqueID
A | 2014-06-24 | 8
B | 2014-06-24 | 7
A | 2014-06-16 | 6
B | 2014-06-16 | 5
A | 2014-06-08 | 4
B | 2014-06-08 | 3
A | 2014-06-01 | 2
B | 2014-06-01 | 1
I'm stuck trying to create an SQL statement that will return the quantity of rows found for each unique CompanyID, separated into 4 week periods, so something like the below:
CompanyID | Period (week) | Quantity
A | 0 | 1
B | 0 | 1
A | 1 | 1
B | 1 | 1
A | 2 | 1
B | 2 | 1
A | 3 | 1
B | 3 | 1
I have done something similar before, except by the last 7 days instead of last 4 weeks, but am not sure if this can be reworked:
select CompanyID,
case DATE_FORMAT(Logged, '%Y%m%d')
when '20140618' then '0'
when '20140619' then '1'
when '20140620' then '2'
when '20140621' then '3'
when '20140622' then '4'
when '20140623' then '5'
when '20140624' then '6'
end as period ,
count(UniqueID) as quantity from TABLE
where DATE_FORMAT(Logged, '%Y%m%d')
in (20140618,20140619,20140620,20140621,20140622,20140623,20140624) group by CompanyID,
DATE_FORMAT(Logged, '%Y%m%d')
Is there a more straightforward way to obtain the output desired above?
Maybe something like this?
The original query (shown further down) doesn't use any hard coding, which is generally a really bad practice. Its period count is inflated by 1, because it starts at one and you want it to start at zero. To fix this, select from the original query, subtract 1 from the period, and leave out the user-defined variable column:
SELECT CompanyID, Period - 1 as Period, Quantity FROM(
    SELECT
        CompanyID,
        if(@a = Logged, @b, @b := @b + 1) as Period,
        COUNT(*) as Quantity,
        @a := Logged
    FROM test
    JOIN (SELECT @a := '', @b := 0) as temp
    GROUP BY UniqueID
    ORDER BY Period
) as subQuery
ORIGINAL QUERY
SELECT
    CompanyID,
    if(@a = Logged, @b, @b := @b + 1) as Period,
    COUNT(*) as Quantity,
    @a := Logged
FROM test
JOIN (SELECT @a := '', @b := 0) as temp
GROUP BY UniqueID
ORDER BY Period
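Another possible approach is to derive the period from date arithmetic rather than from row order. The sketch below assumes that Logged is a DATE column, that the table is named test as in the answer above, and that periods are counted back in 7-day blocks from a reference date. The variable @ref_date is hypothetical and set here to the latest date in the sample.

```sql
-- Hypothetical reference date: period 0 is the 7 days ending on this date.
SET @ref_date := '2014-06-24';

SELECT CompanyID,
       DATEDIFF(@ref_date, Logged) DIV 7 AS Period,   -- whole weeks before the reference date
       COUNT(*) AS Quantity
FROM test
WHERE Logged >  DATE_SUB(@ref_date, INTERVAL 28 DAY)  -- keep only the last 4 weeks
  AND Logged <= @ref_date
GROUP BY CompanyID, Period
ORDER BY Period, CompanyID;
```

For the sample rows the day offsets are 0, 8, 16 and 23, which fall into periods 0 to 3, so each company should get one row per period with a quantity of 1.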

Sql to find timediff between two rows based on ID

The title is not very descriptive, sorry about that.
Here is the question:
I have a table structured as below, where pk is the primary key and id is a value shared by several rows.
+------+------+---------------------+
| pk | id | value |
+------+------+---------------------+
| 99 | 1 | 2013-08-06 11:10:00 |
| 100 | 1 | 2013-08-06 11:15:00 |
| 101 | 1 | 2013-08-06 11:20:00 |
| 102 | 1 | 2013-08-06 11:25:00 |
| 103 | 2 | 2013-08-06 15:10:00 |
| 104 | 2 | 2013-08-06 15:15:00 |
| 105 | 2 | 2013-08-06 15:20:00 |
+------+------+---------------------+
What I really need is the difference in value between the first two rows (ordered by value) of each group, where rows are grouped by id. So for the structure above I need
timediff(value100, value99) [for the id 1 group]
and timediff(value104, value103) [for the id 2 group],
i.e. the time difference between the first two rows of each group when ordered by value.
One way I can think of is to use 3 self joins (or 3 subqueries): two of them to find the first two rows, and a third to subtract them. Any suggestions?
Try this. CTEs are pretty powerful! (Written with standard AS aliases, so it should also run on MySQL 8.0 or later.)
WITH CTE AS (
    SELECT
        value, pk, id,
        ROW_NUMBER() OVER (PARTITION BY id ORDER BY value, pk) AS rnk,
        ROW_NUMBER() OVER (ORDER BY id, value, pk) AS rownum
    FROM test
)
SELECT
    curr.rnk, nxt.rnk, curr.rownum, nxt.rownum, curr.pk, nxt.pk, curr.id, nxt.id, curr.value, nxt.value, TIMEDIFF(nxt.value, curr.value) AS diff
FROM CTE curr
INNER JOIN CTE nxt ON curr.rownum = nxt.rownum - 1 AND curr.id = nxt.id
    AND curr.rnk <= 1;
It looks a bit weird, but you can try it this way:
SET @previous = 0;
SET @temp = 0;
SET @previousID = 0;
The step above may not be needed, but it makes sure nothing goes wrong.
SELECT pkid, id, diff, valtemp FROM (
    SELECT IF(@previousID = id, @temp := @temp + 1, @temp := 1) occ, @previousID := id,
        TIMEDIFF(`value`, @previous) diff, pk AS pkid, id, `value` AS valtemp, @previous := `value`
    FROM testtable) a WHERE occ = 2
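On MySQL 8.0 or later, the same result might be reached without self joins or user variables. This sketch assumes the table is named testtable, as in the answer above, with the columns from the question. The aliases first_ts, second_ts and rn are illustrative.

```sql
SELECT id,
       first_ts,
       second_ts,
       TIMEDIFF(second_ts, first_ts) AS diff
FROM (
    SELECT id,
           `value` AS first_ts,
           -- The next timestamp within the same id group, ordered by value.
           LEAD(`value`) OVER (PARTITION BY id ORDER BY `value`) AS second_ts,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY `value`) AS rn
    FROM testtable
) AS ranked
WHERE rn = 1;   -- keep only the first row of each group
```

For the sample data both groups should come out with a difference of 00:05:00. A group with a single row would get a NULL second_ts and a NULL diff.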