I have a log table in MySQL (5.7.14) with the following schema:
CREATE TABLE logs
(
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
entry_date DATE NOT NULL,
original_date DATE NOT NULL,
ref_no VARCHAR(30) NOT NULL
) Engine=InnoDB;
INSERT INTO logs VALUES
(1,'2020-01-01','2020-01-01','XYZ'),
(2,'2020-01-01','2020-01-01','ABC'),
(3,'2020-01-02','2020-01-01','XYZ'),
(4,'2020-01-02','2020-01-01','ABC'),
(5,'2020-01-03','2020-01-02','XYZ'),
(6,'2020-01-03','2020-01-01','ABC');
I want to return the first row for each unique (original_date, ref_no) pairing, where 'first' is defined as 'lowest id'.
For example, if I had the following data:
id|entry_date|original_date|ref_no
--+----------+-------------+------
1 |2020-01-01|2020-01-01 |XYZ
2 |2020-01-01|2020-01-01 |ABC
3 |2020-01-02|2020-01-01 |XYZ
4 |2020-01-02|2020-01-01 |ABC
5 |2020-01-03|2020-01-02 |XYZ
6 |2020-01-03|2020-01-01 |ABC
I would want the query to return:
id|entry_date|original_date|ref_no
--+----------+-------------+------
1 |2020-01-01|2020-01-01 |XYZ
2 |2020-01-01|2020-01-01 |ABC
5 |2020-01-03|2020-01-02 |XYZ
In other words:
Row 1 is returned because we haven't seen 2020-01-01,XYZ before.
Row 2 is returned because we haven't seen 2020-01-01,ABC before.
Row 3 is not returned because we have seen 2020-01-01,XYZ before (row 1).
Row 4 is not returned because we have seen 2020-01-01,ABC before (row 2).
Row 5 is returned because we haven't seen 2020-01-02,XYZ before.
Row 6 is not returned because we have seen 2020-01-01,ABC before (row 2).
Is there a way to do this directly in SQL? I've considered DISTINCT but I think that only returns the distinct columns, whereas I want the full row.
To avoid a correlated subquery you can do:
select l.*
from logs l
join (
select original_date, ref_no, min(id) as min_id
from logs
group by original_date, ref_no
) x on l.id = x.min_id
You can use a correlated subquery:
select l.*
from logs l
where l.id = (select min(l2.id)
from logs l2
where l2.original_date = l.original_date and
l2.ref_no = l.ref_no
);
For performance, you want an index on logs(original_date, ref_no, id).
Try this:
select t1.*
from logs AS t1
left join logs AS t2 on
(
t2.original_date = t1.original_date and
t2.ref_no = t1.ref_no and
t2.id < t1.id
)
where
t2.original_date is null and
t2.ref_no is null
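A quick way to sanity-check the three answers above against the sample data is to run them side by side. The sketch below is only an illustration: it uses Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL 5.7 (an assumption, the column types are simplified), and the QUERIES dictionary and first_ids helper are names made up for this example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logs (
    id            INTEGER PRIMARY KEY,
    entry_date    TEXT NOT NULL,
    original_date TEXT NOT NULL,
    ref_no        TEXT NOT NULL
);
INSERT INTO logs VALUES
    (1,'2020-01-01','2020-01-01','XYZ'),
    (2,'2020-01-01','2020-01-01','ABC'),
    (3,'2020-01-02','2020-01-01','XYZ'),
    (4,'2020-01-02','2020-01-01','ABC'),
    (5,'2020-01-03','2020-01-02','XYZ'),
    (6,'2020-01-03','2020-01-01','ABC');
""")

# The three approaches from the answers, reduced to the id column.
QUERIES = {
    "derived_table": """
        SELECT l.id FROM logs l
        JOIN (SELECT original_date, ref_no, MIN(id) AS min_id
              FROM logs GROUP BY original_date, ref_no) x
          ON l.id = x.min_id
        ORDER BY l.id""",
    "correlated": """
        SELECT l.id FROM logs l
        WHERE l.id = (SELECT MIN(l2.id) FROM logs l2
                      WHERE l2.original_date = l.original_date
                        AND l2.ref_no = l.ref_no)
        ORDER BY l.id""",
    "anti_join": """
        SELECT t1.id FROM logs t1
        LEFT JOIN logs t2
          ON t2.original_date = t1.original_date
         AND t2.ref_no = t1.ref_no
         AND t2.id < t1.id
        WHERE t2.id IS NULL
        ORDER BY t1.id""",
}

def first_ids(name):
    """Return the ids selected by the named query (hypothetical helper)."""
    return [row[0] for row in conn.execute(QUERIES[name])]

for name in QUERIES:
    print(name, first_ids(name))
```

On this data all three queries pick rows 1, 2 and 5, which matches the output the question asks for. Which one is fastest on a real table depends on the indexes and the MySQL version, so it is worth checking with EXPLAIN.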
I have the following MySQL table:
[orders]
===
order_id BIGINT UNSIGNED AUTO_INCREMENT (primary key)
order_ref_id VARCHAR(36) NOT NULL,
order_version BIGINT NOT NULL,
order_name VARCHAR(100) NOT NULL
...
lots of other fields
The orders are "versioned", meaning multiple orders can share the same order_ref_id but will have different versions (version 0, version 1, version 2, etc.).
I want to write a query that returns all order fields for an order based on its order_ref_id and its MAX version, so something like this:
SELECT
*
FROM
orders
WHERE
order_ref_id = '12345'
AND
MAX(order_version)
In other words: given the order_ref_id, give me the single orders record that has the highest version number. Hence if the following rows exist in the table:
order_id | order_ref_id | order_version | order_name
====================================================
1        | 12345        | 0             | "Hello"
2        | 12345        | 1             | "Goodbye"
3        | 12345        | 2             | "Wazzup"
I want a query that will return:
3 | 12345 | 2 | "Wazzup"
However the above query is invalid, and yields:
ERROR 1111 (HY000): Invalid use of group function
I know typical MAX() examples would have me writing something like:
SELECT
MAX(order_version)
FROM
orders
WHERE
order_ref_id = '12345';
But that just gives me order_version, not all (*) the fields. Can anyone help nudge me across the finish line here?
You could join against a subquery that computes the max version per order_ref_id:
select * from orders o
inner join (
SELECT order_ref_id
, MAX(order_version) max_ver
FROM orders
group by order_ref_id
) t on t.order_ref_id = o.order_ref_id
and t.max_ver = o.order_version
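To see the join above return the single latest row for one order_ref_id, here is a small sketch. It assumes Python's sqlite3 as a stand-in for MySQL, adds one made-up extra order ('67890') so the filter has something to exclude, and puts the order_ref_id filter inside the subquery. The second function uses a different technique (ORDER BY ... DESC LIMIT 1), which is not what the answer proposes but is a common shortcut when only one order_ref_id is needed. Both function names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id      INTEGER PRIMARY KEY,
    order_ref_id  TEXT NOT NULL,
    order_version INTEGER NOT NULL,
    order_name    TEXT NOT NULL
);
INSERT INTO orders VALUES
    (1, '12345', 0, 'Hello'),
    (2, '12345', 1, 'Goodbye'),
    (3, '12345', 2, 'Wazzup'),
    (4, '67890', 0, 'Other');  -- hypothetical extra row, not in the question
""")

def latest_version(ref_id):
    """Join each order to the max version of its order_ref_id."""
    sql = """
        SELECT o.order_id, o.order_ref_id, o.order_version, o.order_name
        FROM orders o
        JOIN (SELECT order_ref_id, MAX(order_version) AS max_ver
              FROM orders
              WHERE order_ref_id = ?
              GROUP BY order_ref_id) t
          ON t.order_ref_id = o.order_ref_id
         AND t.max_ver = o.order_version
    """
    return conn.execute(sql, (ref_id,)).fetchall()

def latest_version_limit(ref_id):
    """Alternative: sort one order_ref_id by version and keep the top row."""
    sql = """
        SELECT order_id, order_ref_id, order_version, order_name
        FROM orders
        WHERE order_ref_id = ?
        ORDER BY order_version DESC
        LIMIT 1
    """
    return conn.execute(sql, (ref_id,)).fetchall()

print(latest_version("12345"))
print(latest_version_limit("12345"))
```

Both calls return the "Wazzup" row for '12345'. The join form generalizes to all order_ref_ids at once by dropping the WHERE clause; the LIMIT form only works for a single order_ref_id.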
I have a large MySQL database with more than 1 million rows. How can I find the missing eids?
+----+-----+
| id | eid |
+----+-----+
| 1 | 1 |
+----+-----+
| 2 | 2 |
+----+-----+
| 3 | 4 |
+----+-----+
I'd like to list all missing eids (3 in this example). I've tried many things, but everything I have tried takes too much time.
I hope someone can help me.
Thanks
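You can use NOT EXISTS to find the required rows. Note that this lists every id that never appears as an eid, so it relies on id being a gapless 1..N sequence: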
create table t(id integer, eid integer);
insert into t values(1,1);
insert into t values(2,2);
insert into t values(3,4);
SELECT id
FROM t a
WHERE NOT EXISTS
( SELECT 1
FROM t b
WHERE b.eid = a.id );
or use NOT IN:
SELECT ID
FROM t
WHERE ID NOT IN
(SELECT EID
FROM t);
produces:
| id |
|----|
| 3 |
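Try the query below (replace your_table with the actual table name; table itself is a reserved word in MySQL):
SELECT ID FROM your_table WHERE ID NOT IN (SELECT EID FROM your_table);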
Finding duplicate numbers is easy:
select id, count(*) from sequence
group by id
having count(*) > 1;
In this case there are no duplicates, since I’m not concentrating on that in this post (finding duplicates is straightforward enough that I hope you can see how it’s done). I had to scratch my head for a second to find missing numbers in the sequence, though. Here is my first shot at it:
select l.id + 1 as start
from sequence as l
left outer join sequence as r on l.id + 1 = r.id
where r.id is null;
The idea is to do an exclusion join against the same sequence, but shifted by one position. Any number with an adjacent successor will join successfully, and the WHERE clause will eliminate successful matches, leaving the start of each run of missing numbers. Note that the number just past the end of the sequence is reported too, because it has no successor either.
Source: https://www.xaprb.com/blog/2005/12/06/find-missing-numbers-in-a-sequence-with-sql/
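Building on the shifted self-join idea, the following sketch reports each gap as a start/end range and leaves out the value just past the end of the sequence. It is an illustration under a few assumptions: Python's sqlite3 replaces MySQL, the table is called seq here (standing in for the sequence table), the sample values 1, 2, 4, 5, 8 are made up, and find_gaps is a hypothetical helper name.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seq (id INTEGER PRIMARY KEY)")
# Made-up sample data with two gaps: 3 and 6..7.
conn.executemany("INSERT INTO seq VALUES (?)", [(n,) for n in (1, 2, 4, 5, 8)])

GAPS_SQL = """
SELECT l.id + 1 AS gap_start,
       (SELECT MIN(r.id) FROM seq r WHERE r.id > l.id) - 1 AS gap_end
FROM seq l
LEFT JOIN seq nxt ON nxt.id = l.id + 1        -- shifted self-join
WHERE nxt.id IS NULL                          -- keep rows with no successor
  AND l.id < (SELECT MAX(id) FROM seq)        -- skip the end of the sequence
ORDER BY gap_start
"""

def find_gaps():
    """Return (gap_start, gap_end) pairs for every run of missing ids."""
    return conn.execute(GAPS_SQL).fetchall()

print(find_gaps())  # [(3, 3), (6, 7)]
```

Returning ranges instead of individual numbers keeps the result small when gaps are long, which matters on tables with millions of rows. An index on the searched column is needed for the self-join and the MIN subquery to stay fast.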
If you want a lighter way to search millions of rows, try the query below.
I tested it on more than 23 million rows with an old CPU (12.6 GB of data, needing about 1 GB of free RAM):
Affected rows: 0 Found rows: 346.764 Warnings: 0 Duration for 2 queries: 00:04:48.0 (+ 2,656 sec. network)
SET @idBefore=0, @st=0, @diffSt=0, @diffEnd=0;
SELECT res.idBefore `betweenID`, res.ID `andNextID`
, res.startEID, res.endEID
, res.diff `diffEID`
-- DON'T USE this missingEID for more than a thousand rows
-- this is just for sample view
, GROUP_CONCAT(b.aNum) `missingEID`
FROM (
SELECT
@idBefore `idBefore`
, @idBefore:=(a.id) `ID`
, @diffSt:=(@st) `startEID`
, @diffEnd:=(a.eid) `endEID`
, @st:=a.eid `end`
, @diffEnd-@diffSt-1 `diff`
FROM eid a
ORDER BY a.ID
) res
-- DON'T USE this integers table for more than a thousand rows
-- this is just for sample view
CROSS JOIN (SELECT a.ID + (b.ID * 10) + (c.ID * 100) AS aNum FROM integers a, integers b, integers c) b
WHERE res.diff>0 AND b.aNum BETWEEN res.startEID+1 AND res.endEID-1
GROUP BY res.ID;
Check out this fiddle: http://sqlfiddle.com/#!9/33deb3/9
and this one for missing IDs: http://sqlfiddle.com/#!9/3ea00c/9
I have an assigns table with the following columns:
id - int
id_lead - int
id_source - int
date_assigned - int (this represents a unix timestamp)
Now, let's say I have the following data in this table:
id id_lead id_source date_assigned
1 20 5 1462544612
2 20 6 1462544624
3 22 6 1462544615
4 22 5 1462544626
5 22 7 1462544632
6 25 6 1462544614
7 25 8 1462544621
Now, let's say I want to get a count of the rows whose id_source is 6 and which are the first entry for their lead (sorted by date_assigned asc).
So in this case, the count would = 2, because there are 2 leads (id_lead 22 and 25) whose first id_source is 6.
How would I write this query so that it is fast and would work fine as a subquery select? I was thinking something like this which doesn't work:
select count(*) from `assigns` where `id_source`=6 order by `date_assigned` asc limit 1
I have no idea how to write this query in an optimal way. Any help would be appreciated.
Pseudocode:
select rows
with a.id_source = 6
but only if
there do not exist any row
with same id_lead
and smaller date_assigned
Translate it to SQL
select * -- select rows
from assigns a
where a.id_source = 6 -- with a.id_source = 6
and not exists ( -- but only if there do not exist any row
select 1
from assigns a1
where a1.id_lead = a.id_lead -- with same id_lead
and a1.date_assigned < a.date_assigned -- and smaller date_assigned
)
Now replace select * with select count(*) and you'll get your result.
http://sqlfiddle.com/#!9/3dc0f5/7
Update:
The NOT EXISTS query can be rewritten as an excluding LEFT JOIN query:
select count(*)
from assigns a
left join assigns a1
on a1.id_lead = a.id_lead
and a1.date_assigned < a.date_assigned
where a.id_source = 6
and a1.id_lead is null
If you want to get the count for all values of id_source, the following query might be the fastest:
select a.id_source, count(1)
from (
select a1.id_lead, min(a1.date_assigned) date_assigned
from assigns a1
group by a1.id_lead
) a1
join assigns a
on a.id_lead = a1.id_lead
and a.date_assigned = a1.date_assigned
group by a.id_source
You still can replace group by a.id_source with where a.id_source = 6.
The queries need indexes on assigns(id_source) and assigns(id_lead, date_assigned).
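The queries above can be checked against the sample data from the question. This is a sketch only, assuming Python's sqlite3 as a stand-in for MySQL; count_first_source and counts_by_first_source are hypothetical helper names wrapping the NOT EXISTS query and the grouped query.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assigns (
    id            INTEGER PRIMARY KEY,
    id_lead       INTEGER,
    id_source     INTEGER,
    date_assigned INTEGER
);
INSERT INTO assigns VALUES
    (1, 20, 5, 1462544612),
    (2, 20, 6, 1462544624),
    (3, 22, 6, 1462544615),
    (4, 22, 5, 1462544626),
    (5, 22, 7, 1462544632),
    (6, 25, 6, 1462544614),
    (7, 25, 8, 1462544621);
""")

def count_first_source(source_id):
    """Count leads whose earliest assignment has the given id_source."""
    sql = """
        SELECT COUNT(*)
        FROM assigns a
        WHERE a.id_source = ?
          AND NOT EXISTS (
              SELECT 1 FROM assigns a1
              WHERE a1.id_lead = a.id_lead
                AND a1.date_assigned < a.date_assigned)
    """
    return conn.execute(sql, (source_id,)).fetchone()[0]

def counts_by_first_source():
    """Count first assignments per id_source using the MIN() derived table."""
    sql = """
        SELECT a.id_source, COUNT(*)
        FROM (SELECT id_lead, MIN(date_assigned) AS date_assigned
              FROM assigns GROUP BY id_lead) a1
        JOIN assigns a
          ON a.id_lead = a1.id_lead
         AND a.date_assigned = a1.date_assigned
        GROUP BY a.id_source
        ORDER BY a.id_source
    """
    return conn.execute(sql).fetchall()

print(count_first_source(6))      # 2
print(counts_by_first_source())   # [(5, 1), (6, 2)]
```

Lead 20 starts with source 5, while leads 22 and 25 both start with source 6, so the count for source 6 is 2 as expected in the question. If two rows of the same lead can share the same date_assigned, both forms count each tied row, so a tie-breaker on id may be needed.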
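A simple query for that would be the following (check here http://sqlfiddle.com/#!9/8666e0/7). Be aware that it relies on MySQL returning an arbitrary row per group for the non-aggregated columns, so the row picked for each id_lead is not guaranteed to be the earliest one, and the query is rejected when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5):
select count(*) from
(select * from assigns group by id_lead )t
where t.id_source=6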
Just a demo. I have two tables:
Table a (id, name)
--id---name----
1 John
2 Jack
3 Maria
4 Bill
Table b (id, empid, datewrk)
--id---empid----datewrk----
1 1 2012-12-12
2 2 2012-12-14
3 3 2012-12-16
4 4 2012-12-17
I want to set name to NULL in table a for every row whose date in table b is <= '2012-12-14'. The expected result is:
--id---name--
1 NULL
2 NULL
I have tried the code below, but it does not work (the subquery works only as a standalone SELECT statement). I tried it in MySQL Workbench and SQL Server 2012:
UPDATE a
SET name = NUll
WHERE id IN (SELECT a.id FROM a
JOIN b ON a.id = b.empid
WHERE b.datewrk <= '2012-12-14');
Thanks.
For MySQL:
UPDATE a
JOIN b ON a.id = b.empid
SET a.name = NULL
WHERE b.datewrk <= '2012-12-14';
You don't need a subquery. Just join the tables, put the SET clause right after the join, and then add the WHERE clause.
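Your UPDATE statement is close. MySQL converts the string '2012-12-14' to a date implicitly when comparing it with a DATE column, so the explicit STR_TO_DATE below is optional. The change that matters in MySQL is removing table a from the subquery, because MySQL does not allow an UPDATE to select from its own target table in a subquery (error 1093).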
UPDATE a
SET name = NULL
WHERE id IN (
SELECT empid FROM b
WHERE datewrk <= STR_TO_DATE('2012-12-14', '%Y-%m-%d'));
Note that in your subquery you don't need table A.
Hope this helps.
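To check the subquery form of the update against the sample data, here is a small sketch. It assumes Python's sqlite3 in place of MySQL, so only the portable IN (subquery) variant is shown; SQLite does not accept MySQL's UPDATE ... JOIN syntax. Dates are stored as ISO 'YYYY-MM-DD' text, which compares correctly as strings. The function name null_names_before is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, empid INTEGER, datewrk TEXT);
INSERT INTO a VALUES (1, 'John'), (2, 'Jack'), (3, 'Maria'), (4, 'Bill');
INSERT INTO b VALUES
    (1, 1, '2012-12-12'),
    (2, 2, '2012-12-14'),
    (3, 3, '2012-12-16'),
    (4, 4, '2012-12-17');
""")

def null_names_before(cutoff):
    """Set a.name to NULL for employees with a work date on or before cutoff."""
    conn.execute(
        """
        UPDATE a
        SET name = NULL
        WHERE id IN (SELECT empid FROM b WHERE datewrk <= ?)
        """,
        (cutoff,),
    )
    return conn.execute("SELECT id, name FROM a ORDER BY id").fetchall()

print(null_names_before("2012-12-14"))
# [(1, None), (2, None), (3, 'Maria'), (4, 'Bill')]
```

Rows 1 and 2 end up with a NULL name while Maria and Bill are untouched, which is the result the question asks for. Passing the cutoff as a bound parameter also avoids building the date into the SQL string.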
I'm trying to join distinct IDs from a subquery in a FROM clause onto a table that has the same IDs, but non-distinct, because they are repeated across several rows that together make up one entity. How can one do this? All of my attempts return only a single row per ID from the table with the non-distinct IDs.
For example:
Table 1
ID val_string val_int val_datetime
1 null 3435 null
1 bla null null
1 null null 2013-08-27
2 null 428 null
2 blob null null
2 null null 2013-08-30
etc. etc. etc.
Virtual "v_table" from SubQuery
ID
1
2
Now, if I create the query along the lines of:
SELECT t.ID, t.val_string, t.val_int, t.val_datetime
FROM table1 AS t
JOIN (subquery) AS v_table
ON t.ID = v_table.ID
I get the result:
Result Table:
ID val_string val_int val_datetime
1 null 3435 null
2 null 428 null
What I'd like is to see the whole of Table 1 based on this example. (Actual query has some more parameters, but this is the issue I'm stuck on).
How would I go about making sure that I get everything from Table 1 where the ID's match the ID's from a virtual table?
SELECT t.ID, t.val_string, t.val_int, t.val_datetime
FROM table1 AS t
LEFT JOIN (subquery) AS v_table
ON t.ID = v_table.ID
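It may help to compare what the two join types return on the sample data. The sketch below is an illustration under stated assumptions: Python's sqlite3 stands in for the real database, the row with ID 3 is made up, and SUBQUERY is a hypothetical stand-in for the real subquery (here it simply yields the distinct IDs 1 and 2). The helper name joined_ids is also hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (
    ID           INTEGER,
    val_string   TEXT,
    val_int      INTEGER,
    val_datetime TEXT
);
INSERT INTO table1 VALUES
    (1, NULL,   3435, NULL),
    (1, 'bla',  NULL, NULL),
    (1, NULL,   NULL, '2013-08-27'),
    (2, NULL,   428,  NULL),
    (2, 'blob', NULL, NULL),
    (2, NULL,   NULL, '2013-08-30'),
    (3, NULL,   99,   NULL);  -- hypothetical row whose ID is not in v_table
""")

# Hypothetical stand-in for the real subquery: distinct IDs 1 and 2.
SUBQUERY = "SELECT DISTINCT ID FROM table1 WHERE ID IN (1, 2)"

def joined_ids(join_kind):
    """Return the IDs of table1 rows produced by an INNER or LEFT join."""
    sql = f"""
        SELECT t.ID
        FROM table1 AS t
        {join_kind} JOIN ({SUBQUERY}) AS v_table
          ON t.ID = v_table.ID
        ORDER BY t.ID
    """
    return [row[0] for row in conn.execute(sql)]

print(joined_ids("INNER"))  # [1, 1, 1, 2, 2, 2]
print(joined_ids("LEFT"))   # [1, 1, 1, 2, 2, 2, 3]
```

In this sketch a plain inner join already returns every table1 row for each matching ID, while the LEFT JOIN additionally keeps rows whose ID is missing from v_table. If a real query collapses to one row per ID, the cause is likely elsewhere in it, for example a GROUP BY or DISTINCT in the outer query.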