Emulating row_number in MySQL 5.7 with date condition


I am working in MySQL 5.7.34 and I have the following:
create table employees (
employee_id int not null auto_increment,
first_name varchar(100) not null,
last_name varchar(100) not null,
primary key (employee_id)
);
create table documents (
document_id int not null auto_increment,
title varchar(100) not null,
last_modified datetime not null,
employee_id int not null,
primary key (document_id),
foreign key (employee_id) references employees(employee_id)
);
I have created a db-fiddle here. The columns shown in the fiddle are only a few, but in the real data set, the documents table will have 20 or more columns and all need to show.
SQL Query:
-- simple query
select e.employee_id,
e.first_name,
e.last_name,
d.title
from employees e
inner join documents d on e.employee_id = d.employee_id;
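Current Output:
+-------------+------------+-----------+----------------+
| employee_id | first_name | last_name | title          |
+-------------+------------+-----------+----------------+
| 1           | John       | Doe       | JD_Doc_Updated |
| 1           | John       | Doe       | JD_Doc         |
| 2           | Mike       | Anderson  | MA_Doc         |
+-------------+------------+-----------+----------------+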
Desired Output:
For each employee, I want to grab the most recent document only. In MS SQL Server (and recent versions of MySQL), I could use something like ROW_NUMBER() OVER(PARTITION BY d.employee_id ORDER BY d.last_modified DESC) AS num
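+-------------+------------+-----------+----------------+
| employee_id | first_name | last_name | title          |
+-------------+------------+-----------+----------------+
| 1           | John       | Doe       | JD_Doc_Updated |
| 2           | Mike       | Anderson  | MA_Doc         |
+-------------+------------+-----------+----------------+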
I am not sure how to achieve the same here.
I read this question, but I don't think the problem is the same: that question deals with data in a single table and doesn't use datetime as the data type. Unless I'm missing something?

A pretty simple method uses a correlated subquery:
select e.employee_id, e.first_name, e.last_name,
d.title
from employees e join
documents d
on e.employee_id = d.employee_id
where d.last_modified = (select max(d2.last_modified)
from documents d2
where d2.employee_id = d.employee_id
);
For performance, you want an index on documents(employee_id, last_modified).
Here is a db-fiddle.
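To try the correlated subquery without a MySQL server, here is a minimal sketch that runs it from Python against an in-memory SQLite database. SQLite is only a stand-in for MySQL 5.7 here, the timestamps are made up (the fiddle data is not reproduced in this thread), and the index name is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 5.7; the query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table employees (
    employee_id integer primary key,
    first_name  text not null,
    last_name   text not null
);
create table documents (
    document_id   integer primary key,
    title         text not null,
    last_modified text not null,  -- ISO-8601 strings sort chronologically
    employee_id   integer not null references employees(employee_id)
);
-- Composite index suggested in the answer (the name is an assumption).
create index idx_documents_emp_modified
    on documents (employee_id, last_modified);

insert into employees values (1, 'John', 'Doe'), (2, 'Mike', 'Anderson');
-- Timestamps are invented for this sketch.
insert into documents (title, last_modified, employee_id) values
    ('JD_Doc',         '2021-01-01 10:00:00', 1),
    ('JD_Doc_Updated', '2021-06-01 10:00:00', 1),
    ('MA_Doc',         '2021-03-01 10:00:00', 2);
""")

# Keep only the document whose last_modified equals the employee's maximum.
latest_docs = conn.execute("""
select e.employee_id, e.first_name, e.last_name, d.title
from employees e
join documents d on e.employee_id = d.employee_id
where d.last_modified = (select max(d2.last_modified)
                         from documents d2
                         where d2.employee_id = d.employee_id)
order by e.employee_id
""").fetchall()

for row in latest_docs:
    print(row)
```

If two documents of one employee share the same last_modified value, both rows come back, so a tie-breaker (for example document_id) may be needed on real data.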

You are using an old MySQL version which doesn't feature window functions. So, to get the latest document per employee you need two steps:
1. get the maximum document ID or date
2. get the row belonging to that ID or date
Let's say "latest document" refers to the one that was last modified:
select *
from employees e
join documents d
on d.employee_id = e.employee_id
and (d.employee_id, d.last_modified) in
(
select employee_id, max(last_modified)
from documents
group by employee_id
)
order by e.employee_id;
The IN clause ensures that the joined document is in the set of latest documents. There are several other ways to write this. You can replace the IN clause with a correlated subquery that refers to the employee's ID. Or you could join the tables and add a NOT EXISTS criterion that rules out any newer document. Or you could upgrade to MySQL 8 :-)
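The NOT EXISTS alternative mentioned above could look roughly like the following sketch. It runs from Python against an in-memory SQLite database as a stand-in for MySQL 5.7; the timestamps and the alias name `newer` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table employees (
    employee_id integer primary key,
    first_name  text not null,
    last_name   text not null
);
create table documents (
    document_id   integer primary key,
    title         text not null,
    last_modified text not null,
    employee_id   integer not null
);
insert into employees values (1, 'John', 'Doe'), (2, 'Mike', 'Anderson');
-- Timestamps are invented for this sketch.
insert into documents (title, last_modified, employee_id) values
    ('JD_Doc',         '2021-01-01 10:00:00', 1),
    ('JD_Doc_Updated', '2021-06-01 10:00:00', 1),
    ('MA_Doc',         '2021-03-01 10:00:00', 2);
""")

# A document is the latest one if no newer document exists for the same employee.
newest = conn.execute("""
select e.employee_id, e.first_name, e.last_name, d.title
from employees e
join documents d on d.employee_id = e.employee_id
where not exists (select 1
                  from documents newer
                  where newer.employee_id = d.employee_id
                    and newer.last_modified > d.last_modified)
order by e.employee_id
""").fetchall()

print(newest)
```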

You can use either of two methods. The first emulates ROW_NUMBER() from MSSQL with user variables. MySQL does not guarantee the evaluation order of expressions that assign user variables, so verify the result on your server version:
select employee_id,first_name,last_name,title
from
(select *,
@row_number:=CASE
WHEN @empl_no = employee_id
THEN
@row_number + 1
ELSE
1
END AS num,
@empl_no := employee_id EmplNumber
from
(select e.employee_id,
e.first_name,
e.last_name,
d.title
from employees e
inner join documents d on e.employee_id = d.employee_id
-- rows of one employee must be adjacent, newest first, for the counter to work
order by e.employee_id, d.last_modified desc) t1,(SELECT @empl_no :=0,@row_number:=0) as t2) T
where num = 1
order by employee_id
Or you can group by employee, select the maximum last_modified, and then join back to documents to fetch the title column:
select t1.employee_id,t1.first_name,t1.last_name,t2.title
from
(select e.employee_id,
max(e.first_name) as first_name,
max(e.last_name) as last_name,
max(d.last_modified ) as last_modified
from employees e
inner join documents d on e.employee_id = d.employee_id
group by e.employee_id) t1
join documents t2 on t2.employee_id = t1.employee_id
and t2.last_modified = t1.last_modified
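For comparison, a per-employee numbering can also be computed without user variables, by counting how many documents of the same employee are newer than the current one. The sketch below is not this answer's method but a swapped-in technique: it runs from Python against an in-memory SQLite database as a stand-in for MySQL 5.7, the timestamps are made up, and documents that tie on last_modified would share a number (so it behaves like RANK() rather than ROW_NUMBER()).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table employees (
    employee_id integer primary key,
    first_name  text not null,
    last_name   text not null
);
create table documents (
    document_id   integer primary key,
    title         text not null,
    last_modified text not null,
    employee_id   integer not null
);
insert into employees values (1, 'John', 'Doe'), (2, 'Mike', 'Anderson');
-- Timestamps are invented for this sketch.
insert into documents (title, last_modified, employee_id) values
    ('JD_Doc',         '2021-01-01 10:00:00', 1),
    ('JD_Doc_Updated', '2021-06-01 10:00:00', 1),
    ('MA_Doc',         '2021-03-01 10:00:00', 2);
""")

# num = 1 + number of newer documents belonging to the same employee.
numbered = conn.execute("""
select d.employee_id,
       d.title,
       (select count(*)
        from documents d2
        where d2.employee_id = d.employee_id
          and d2.last_modified > d.last_modified) + 1 as num
from documents d
order by d.employee_id, num
""").fetchall()

# Filtering on num = 1 keeps the most recent document per employee.
most_recent = [(emp_id, title) for emp_id, title, num in numbered if num == 1]

print(numbered)
print(most_recent)
```

The correlated count is evaluated once per document row, so on large tables an index on documents(employee_id, last_modified) matters even more than for the other approaches.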

Related

For ALL queries in SQL , Table division

I have a schema with three tables:
Project (project_id,proj_name,chief_arch)
Employee (emp_id,emp_name)
Assigned-to (project_id,emp_id)
I have created all tables with data on http://sqlfiddle.com/#!9/3f21e
You can view the all data (select * ...) on http://sqlfiddle.com/#!9/3f21e/1
Please first view the tables and data on SQLFIDDLE.
I have an existing query to get employee names who work on at least one project where employee 107 also worked:
select EMP_NAME from employee natural join `assigned-to`
WHERE EMP_ID<>'107' AND
PROJECT_ID IN(
SELECT PROJECT_ID FROM `assigned-to`
WHERE EMP_ID='107'
)
GROUP BY EMP_NAME;
SQLFiddle
But now I need to solve a slightly different problem. I need the names of the employees who work on ALL projects that employee 107 works on.
How can I write a query for this problem?

Try this:
SELECT EMP_NAME
FROM EMPLOYEE NATURAL JOIN `ASSIGNED-TO`
WHERE EMP_ID<>'107' AND
PROJECT_ID IN (
SELECT PROJECT_ID FROM `ASSIGNED-TO`
WHERE EMP_ID='107'
)
GROUP BY EMP_NAME
HAVING COUNT(*)=(
SELECT COUNT(*)
FROM `ASSIGNED-TO`
WHERE EMP_ID='107'
);
See it run on SQL Fiddle.

You can do this by counting the projects that each other employee has in common with employee 107, and then selecting only those employees whose count exactly matches the number of projects employee 107 works on.
SELECT EMP_ID FROM `ASSIGNED-TO` WHERE PROJECT_ID IN
(SELECT PROJECT_ID FROM `ASSIGNED-TO` WHERE EMP_ID = '107')
AND EMP_ID <> '107'
GROUP BY EMP_ID
HAVING COUNT(*) = (SELECT COUNT(*) FROM `ASSIGNED-TO` WHERE EMP_ID = '107')
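A small worked example may make the counting argument concrete. The sketch below runs from Python against an in-memory SQLite database; the table is named `assigned_to` (instead of `assigned-to`) to avoid identifier quoting, and the sample assignments are invented, since the SQL Fiddle data is not reproduced here. It also assumes each (project_id, emp_id) pair appears only once, otherwise the counts would be inflated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table assigned_to (
    project_id integer not null,
    emp_id     integer not null,
    primary key (project_id, emp_id)  -- no duplicate assignments
);
-- Invented data: 107 works on projects 1 and 2.
insert into assigned_to (project_id, emp_id) values
    (1, 107), (2, 107),
    (1, 101), (2, 101), (3, 101),  -- 101 covers both of 107's projects
    (1, 102);                      -- 102 covers only one of them
""")

# Count shared projects per employee and compare with 107's project count.
works_on_all = conn.execute("""
select emp_id
from assigned_to
where emp_id <> 107
  and project_id in (select project_id from assigned_to where emp_id = 107)
group by emp_id
having count(*) = (select count(*) from assigned_to where emp_id = 107)
order by emp_id
""").fetchall()

print(works_on_all)
```

Employee 101 shares two projects with 107, which equals 107's total of two, so 101 qualifies; employee 102 shares only one and is filtered out by the HAVING clause.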

This will work too. It additionally validates that each project id in assigned-to is found in the project table.
select e.emp_name
from employee e
natural join `assigned-to` a
where emp_id <> 107
and a.project_id in (
select project_id
from (
select emp_id, project_id
from employee natural join `assigned-to` natural join project
where emp_id = 107 ) t
)
group by e.emp_id
having count(project_id) = (select count(project_id) from `assigned-to` where emp_id = 107)

Optimizing MySQL query removing subquery

Having these tables:
customers
---------------------
`id` smallint(5) unsigned NOT NULL auto_increment,
`name` varchar(100) collate utf8_unicode_ci default NOT NULL,
....
customers_subaccounts
-------------------------
`companies_id` mediumint(8) unsigned NOT NULL,
`customers_id` mediumint(8) unsigned NOT NULL,
`subaccount` int(10) unsigned NOT NULL
I need to get all the customers who have been assigned more than one subaccount for the same company.
This is what I've got:
SELECT * FROM customers
WHERE id IN
(SELECT customers_id
FROM customers_subaccounts
GROUP BY customers_id, companies_id
HAVING COUNT(subaccount) > 1)
This query is too slow though. It's even slower if I add the DISTINCT modifier to customers_id in the SELECT of the subquery, which in the end retrieves the same customers list for the whole query. Maybe there's a better way without subquerying, anything faster will help, and I'm not sure whether it will retrieve an accurate correct list.
Any help?

You can replace the subquery with an INNER JOIN:
SELECT t1.id
FROM customers t1
INNER JOIN
(
SELECT DISTINCT customers_id
FROM customers_subaccounts
GROUP BY customers_id, companies_id
HAVING COUNT(*) > 1
) t2
ON t1.id = t2.customers_id

You can also try using EXISTS(), which may be faster than a join:
SELECT * FROM customers t
WHERE EXISTS(SELECT 1 FROM customers_subaccounts s
WHERE s.customers_id = t.id
GROUP BY s.customers_id, s.companies_id
HAVING COUNT(subaccount) > 1)
You should also consider adding the following indexes (if they do not exist yet):
customers_subaccounts (customers_id,companies_id,subaccount)
customers (id)
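To check that the EXISTS form and the derived-table join return the same customers, here is a minimal sketch run from Python against an in-memory SQLite database. SQLite stands in for MySQL, and the customer names, company ids, subaccount numbers and index name are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table customers (
    id   integer primary key,
    name text not null
);
create table customers_subaccounts (
    companies_id integer not null,
    customers_id integer not null,
    subaccount   integer not null
);
-- Index suggested above (the name is an assumption).
create index idx_cs_customer_company_sub
    on customers_subaccounts (customers_id, companies_id, subaccount);

insert into customers values (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
insert into customers_subaccounts values
    (10, 1, 100), (10, 1, 101),  -- customer 1: two subaccounts, same company
    (10, 2, 200), (11, 2, 201),  -- customer 2: two subaccounts, different companies
    (12, 3, 300);                -- customer 3: a single subaccount
""")

exists_rows = conn.execute("""
select c.id, c.name
from customers c
where exists (select 1
              from customers_subaccounts s
              where s.customers_id = c.id
              group by s.customers_id, s.companies_id
              having count(s.subaccount) > 1)
order by c.id
""").fetchall()

join_rows = conn.execute("""
select c.id, c.name
from customers c
join (select distinct customers_id
      from customers_subaccounts
      group by customers_id, companies_id
      having count(*) > 1) dup
  on dup.customers_id = c.id
order by c.id
""").fetchall()

print(exists_rows, join_rows)
```

Which form is faster depends on the data and the MySQL version, so comparing the EXPLAIN output of both on the real tables is the safer guide.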

Assuming that you want different subaccounts for the company (or that they are guaranteed to be different anyway), then the following could be faster under some circumstances:
select c.*
from (select distinct cs.customers_id
from customers_subaccounts cs join
customers_subaccounts cs2
on cs.customers_id = cs2.customers_id and
cs.companies_id = cs2.companies_id and
cs.subaccount < cs2.subaccount
) cc join
customers c
on c.id = cc.customers_id;
In particular, this can take advantage of an index on customers_subaccounts(customers_id, companies_id, subaccount).
Note: This assumes that the subaccounts are different for the rows you want. What is really needed is a way of defining unique rows in the customers_subaccounts table.

There is a way to speed up the query by caching the subquery result. A simple change to your query tells MySQL that it can cache the subquery result:
SELECT * FROM customers
WHERE id IN
(select * from
(SELECT distinct customers_id
FROM customers_subaccounts
GROUP BY customers_id, companies_id
HAVING COUNT(subaccount) > 1) t1);
I used it many years ago and it helped me very much.

Try the following ;)
SELECT DISTINCT t1.*
FROM customers t1
INNER JOIN customers_subaccounts t2 ON t1.id = t2.customers_id
GROUP BY t1.id, t1.name, t2.companies_id
HAVING COUNT(t2.subaccount) > 1
Also you may add index on customers_id.

SQL : select record from table 1 which are not in table 2

I have two database tables: "employee" and "department".
The employee table has two columns, id (the primary key) and emp_name, and the department table has emp_id and dept_name.
There is supposed to be a foreign-key relation between the tables,
but for some reason this relationship is only virtual (no constraint is defined).
The data in the tables looks like this:
employee
id emp_name
1 'abc'
2 'efg'
4 'hij'
department
emp_id dept_name
1 'it'
2 'engineering'
3 'management'
5 'process'
I want to select all records from the department table whose emp_id is not in the employee table.
One solution is
select d.*
from department d
where d.emp_id not in (select id from employee);
Is there a better-optimized way?

You can use LEFT JOIN:
SELECT d.*
FROM department d
LEFT JOIN employee e
ON d.emp_id = e.id
WHERE e.id IS NULL;
You should compare execution plans to check which query has the best performance.

Using in, as you did, is fine. Using exists, however, may perform a tad faster:
SELECT *
FROM department d
WHERE NOT EXISTS (SELECT *
FROM employee e
WHERE d.emp_id = e.id)
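The three formulations discussed in this thread (NOT IN, LEFT JOIN ... IS NULL, NOT EXISTS) can be compared side by side on the sample data from the question. This is a sketch run from Python against an in-memory SQLite database as a stand-in; only the table contents come from the question, the rest is scaffolding.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table employee (
    id       integer primary key,
    emp_name text not null
);
create table department (
    emp_id    integer not null,  -- no foreign key, as in the question
    dept_name text not null
);
insert into employee values (1, 'abc'), (2, 'efg'), (4, 'hij');
insert into department values
    (1, 'it'), (2, 'engineering'), (3, 'management'), (5, 'process');
""")

queries = {
    "not_in": """
        select d.emp_id, d.dept_name
        from department d
        where d.emp_id not in (select id from employee)
        order by d.emp_id""",
    "left_join": """
        select d.emp_id, d.dept_name
        from department d
        left join employee e on d.emp_id = e.id
        where e.id is null
        order by d.emp_id""",
    "not_exists": """
        select d.emp_id, d.dept_name
        from department d
        where not exists (select 1 from employee e where d.emp_id = e.id)
        order by d.emp_id""",
}

results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
```

All three agree here because employee.id is a primary key and therefore never NULL. If the subquery column could contain NULL, NOT IN would return no rows at all, which is one reason NOT EXISTS is often preferred.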

Selecting top result from SQL

This is, I think, a simple problem, but I can't seem to solve it.
I want to select the newest result from a table and join it with a single element in another table.
To put it better, here's a simple schema:
Table 1 - Person
personId -PK - INT - AUTO ID
name - VARCHAR
Table 2 - Event
eventId - PK - INT - AUTO ID
personId - FK
timestamp - DATETIME
event - ENUM ('Went Out', 'Came back')
What I'd like to do is return a list of all people and the latest action each person performed
Example Result:
name | personId | timestamp | eventId | event
bob  | 1        | 2011-08-7 | 3       | 'went out'
I did a simple query joining the two tables and then did a group by personId and order by timestamp but the result that was returned was always the first action for the person, not their latest.
Any Ideas?

SELECT
t1.Name,
t1.PersonId,
t2.TimeStamp,
t2.EventId,
t2.Event
FROM Table1 t1
INNER JOIN Table2 t2 ON t2.PersonId = t1.PersonID
INNER JOIN (SELECT
PersonId,
MAX(TimeStamp) as LastEventDateTime
FROM Table2
GROUP BY PersonID) LE
ON LE.PersonID = t2.PersonID
AND LE.LastEventDateTime = t2.TimeStamp
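The join against a per-person MAX(timestamp) can be tried on a small data set as follows. This is a sketch run from Python against an in-memory SQLite database; it uses the table and column names from the question's schema (Person, Event) rather than Table1/Table2, and all event rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table Person (
    personId integer primary key,
    name     text not null
);
create table Event (
    eventId   integer primary key,
    personId  integer not null references Person(personId),
    timestamp text not null,   -- ISO-8601 strings sort chronologically
    event     text not null    -- 'Went Out' or 'Came back'
);
insert into Person values (1, 'bob'), (2, 'alice');
-- Invented events for this sketch.
insert into Event values
    (1, 1, '2011-08-05 08:00:00', 'Went Out'),
    (2, 1, '2011-08-05 17:00:00', 'Came back'),
    (3, 1, '2011-08-07 09:00:00', 'Went Out'),
    (4, 2, '2011-08-06 08:30:00', 'Went Out'),
    (5, 2, '2011-08-06 18:00:00', 'Came back');
""")

# Step 1 (derived table LE): latest timestamp per person.
# Step 2: join back to Event to fetch the full row for that timestamp.
latest_events = conn.execute("""
select p.name, p.personId, e.timestamp, e.eventId, e.event
from Person p
inner join Event e on e.personId = p.personId
inner join (select personId, max(timestamp) as LastEventDateTime
            from Event
            group by personId) LE
        on LE.personId = e.personId
       and LE.LastEventDateTime = e.timestamp
order by p.personId
""").fetchall()

for row in latest_events:
    print(row)
```

People without any event are dropped by the inner joins; switching both joins to LEFT JOIN would keep them with NULL event columns.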

SELECT p.name, p.personId, e.timestamp, e.eventId, e.event
FROM person p
INNER JOIN Event e
ON e.eventId =
( SELECT MAX(eventId)
FROM Event
WHERE personId = p.personId
GROUP BY personId
LIMIT 1 )
OR
SELECT p.name, p.personId, e.timestamp, e.eventId, e.event
FROM person p
INNER JOIN Event e
ON e.eventId =
( SELECT e2.eventId
FROM Event e2
WHERE e2.personId = p.personId
ORDER BY e2.timestamp DESC LIMIT 1 )
PS: sorry but can't test both queries right now

you'd want to do an
ORDER by `timestamp` DESC
(desc from descending) to get the highest timestamp value instead of the lowest

The ANSI standard way would be:
select name, personid, timestamp, eventid, event
from person
join event on event.personid=person.personid
and event.timestamp=(select max(timestamp) from event e2
where e2.personid=person.personid)
I haven't used MySQL in a while and I don't have an installation handy, but you might get what you want with:
select name, personid, timestamp, eventid, event
from person
join event on event.personid=person.personid
group by personid
order by personid, timestamp desc
It's non-standard because by the standard, anything in the select must be in the group-by or be an aggregate, and here we don't want to do either. But as I recall MySQL doesn't require that, so I'd give this a whirl and see what happens.

An alternative solution makes use of a covering index. It assumes that ordering by eventId yields the same results as ordering by timestamp:
SELECT p.name, p.personId, e.timestamp, e.eventId, e.event
FROM person p
JOIN (
SELECT personId, MAX(eventId) AS eventId
FROM Event
GROUP BY personId
) me
ON me.personId = p.personId
JOIN Event e
ON e.eventId = me.eventId
Order by timestamp is more natural and probably safer, but this is quicker.

Find duplicate records in MySQL

I want to pull out duplicate records in a MySQL Database. This can be done with:
SELECT address, count(id) as cnt FROM list
GROUP BY address HAVING cnt > 1
Which results in:
100 MAIN ST 2
I would like to pull it so that it shows each row that is a duplicate. Something like:
JIM JONES 100 MAIN ST
JOHN SMITH 100 MAIN ST
Any thoughts on how this can be done? I'm trying to avoid doing the first one then looking up the duplicates with a second query in the code.

The key is to rewrite this query so that it can be used as a subquery.
SELECT firstname,
lastname,
list.address
FROM list
INNER JOIN (SELECT address
FROM list
GROUP BY address
HAVING COUNT(id) > 1) dup
ON list.address = dup.address;
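A quick way to see the subquery join at work is to run it on a tiny table. The sketch below uses Python with an in-memory SQLite database as a stand-in for MySQL; the two rows at 100 MAIN ST come from the question, while the third row is invented to show that non-duplicates are excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table list (
    id        integer primary key,
    firstname text not null,
    lastname  text not null,
    address   text not null
);
insert into list (firstname, lastname, address) values
    ('JIM',  'JONES', '100 MAIN ST'),
    ('JOHN', 'SMITH', '100 MAIN ST'),
    ('JANE', 'DOE',   '7 OAK AVE');   -- invented, unique address
""")

# Inner query: addresses that occur more than once.
# Outer query: every row that has one of those addresses.
duplicate_rows = conn.execute("""
select list.firstname, list.lastname, list.address
from list
inner join (select address
            from list
            group by address
            having count(id) > 1) dup
        on list.address = dup.address
order by list.id
""").fetchall()

for row in duplicate_rows:
    print(row)
```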

SELECT date FROM logs group by date having count(*) >= 2

Why not just INNER JOIN the table with itself?
SELECT a.firstname, a.lastname, a.address
FROM list a
INNER JOIN list b ON a.address = b.address
WHERE a.id <> b.id
A DISTINCT is needed if the address could exist more than two times.

I tried the best answer chosen for this question, but it confused me somewhat. I actually needed that just on a single field from my table. The following example from this link worked out very well for me:
SELECT COUNT(*) c,title FROM `data` GROUP BY title HAVING c > 1;

Isn't this easier :
SELECT *
FROM tc_tariff_groups
GROUP BY group_id
HAVING COUNT(group_id) >1
?

select `cityname` from `codcities` group by `cityname` having count(*)>=2
This is similar to the query you asked for; it works and is easy too.
Enjoy!!!

Find duplicate users by email address with this query...
SELECT users.name, users.uid, users.mail, from_unixtime(created)
FROM users
INNER JOIN (
SELECT mail
FROM users
GROUP BY mail
HAVING count(mail) > 1
) dupes ON users.mail = dupes.mail
ORDER BY users.mail;

We can also find duplicates based on more than one field. For those cases you can use the format below.
SELECT COUNT(*), column1, column2
FROM tablename
GROUP BY column1, column2
HAVING COUNT(*)>1;

Finding duplicate addresses is much more complex than it seems, especially if you require accuracy. A MySQL query is not enough in this case...
I work at SmartyStreets, where we do address validation and de-duplication and other stuff, and I've seen a lot of diverse challenges with similar problems.
There are several third-party services which will flag duplicates in a list for you. Doing this solely with a MySQL subquery will not account for differences in address formats and standards. The USPS (for US address) has certain guidelines to make these standard, but only a handful of vendors are certified to perform such operations.
So, I would recommend the best answer for you is to export the table into a CSV file, for instance, and submit it to a capable list processor. One such is LiveAddress which will have it done for you in a few seconds to a few minutes automatically. It will flag duplicate rows with a new field called "Duplicate" and a value of Y in it.

Another solution would be to use table aliases, like so:
SELECT p1.id, p2.id, p1.address
FROM list AS p1, list AS p2
WHERE p1.address = p2.address
AND p1.id != p2.id
All you're really doing in this case is taking the original list table, creating two pretend tables -- p1 and p2 -- out of that, and then performing a join on the address column (line 3). The 4th line makes sure that the same record doesn't show up multiple times in your set of results ("duplicate duplicates").

Not going to be very efficient, but it should work:
SELECT *
FROM list AS o
WHERE (SELECT COUNT(*)
FROM list AS i
WHERE i.address = o.address) > 1;

This will select duplicates in one table pass, no subqueries.
SELECT *
FROM (
SELECT ao.*, (@r := @r + 1) AS rn
FROM (
SELECT @_address := 'N'
) vars,
(
SELECT *
FROM
list a
ORDER BY
address, id
) ao
WHERE CASE WHEN @_address <> address THEN @r := 0 ELSE 0 END IS NOT NULL
AND (@_address := address ) IS NOT NULL
) aoo
WHERE rn > 1
This query actually emulates ROW_NUMBER(), which is available in Oracle and SQL Server.
See the article in my blog for details:
Analytic functions: SUM, AVG, ROW_NUMBER - emulating in MySQL.

This will also show you how many duplicates there are, and it orders the results without joins:
SELECT `Language` , id, COUNT( id ) AS how_many
FROM `languages`
GROUP BY `Language`
HAVING how_many >=2
ORDER BY how_many DESC

SELECT firstname, lastname, address FROM list
WHERE
Address in
(SELECT address FROM list
GROUP BY address
HAVING count(*) > 1)

select * from table_name t1 inner join (select distinct <attribute list> from table_name as temp)t2 where t1.attribute_name = t2.attribute_name
For your table it would be something like
select * from list l1 inner join (select distinct address from list as list2)l2 where l1.address=l2.address
This query will give you all the distinct address entries in your list table... I am not sure how this will work if you have any primary key values for name, etc..

Fastest duplicates removal queries procedure:
/* create temp table with one primary column id */
INSERT INTO temp(id) SELECT MIN(id) FROM list GROUP BY (isbn) HAVING COUNT(*)>1;
DELETE FROM list WHERE id IN (SELECT id FROM temp);
DELETE FROM temp;

Personally this query has solved my problem:
SELECT `SUB_ID`, COUNT(SRV_KW_ID) as subscriptions FROM `SUB_SUBSCR` group by SUB_ID, SRV_KW_ID HAVING subscriptions > 1;
This script shows all the subscriber IDs that appear more than once in the table, along with the number of duplicates found.
These are the table columns:
| SUB_SUBSCR_ID | int(11) | NO | PRI | NULL | auto_increment |
| MSI_ALIAS | varchar(64) | YES | UNI | NULL | |
| SUB_ID | int(11) | NO | MUL | NULL | |
| SRV_KW_ID | int(11) | NO | MUL | NULL | |
Hope it will be helpful for you too!

SELECT t.*,(select count(*) from city as tt where tt.name=t.name) as count FROM `city` as t where (select count(*) from city as tt where tt.name=t.name) > 1 order by count desc
Replace city with your Table.
Replace name with your field name

SELECT id, count(*) as c
FROM `list`
GROUP BY id HAVING c > 1
This returns each id together with the number of times it is repeated, or nothing if no id is repeated.
Change the column in the GROUP BY (e.g. to address) and it returns the number of times each address is repeated, identified by the first id found with that address.
SELECT id, count(*) as c
FROM `list`
GROUP BY address HAVING c > 1
I hope it helps. Enjoy ;)

SELECT *
FROM (SELECT address, COUNT(id) AS cnt
FROM list
GROUP BY address
HAVING ( COUNT(id) > 1 )) AS dup

I use the following:
SELECT * FROM mytable
WHERE id IN (
SELECT id FROM mytable
GROUP BY column1, column2, column3
HAVING count(*) > 1
)

Most of the answers here don't cope with the case when you have MORE THAN ONE duplicate result and/or when you have MORE THAN ONE column to check for duplications. When you are in such case, you can use this query to get all duplicate ids:
SELECT address, email, COUNT(*) AS QUANTITY_DUPLICATES, GROUP_CONCAT(id) AS ID_DUPLICATES
FROM list
GROUP BY address, email
HAVING COUNT(*)>1;
If you want to list every result as a single line, you need a more complex query. This is the one I found working:
CREATE TEMPORARY TABLE IF NOT EXISTS temptable AS (
SELECT GROUP_CONCAT(id) AS ID_DUPLICATES
FROM list
GROUP BY address, email
HAVING COUNT(*)>1
);
SELECT d.*
FROM list AS d, temptable AS t
WHERE FIND_IN_SET(d.id, t.ID_DUPLICATES)
ORDER BY d.id;

Find duplicate Records:
Suppose we have table : Student
student_id int
student_name varchar
Records:
+------------+---------------------+
| student_id | student_name |
+------------+---------------------+
| 101 | usman |
| 101 | usman |
| 101 | usman |
| 102 | usmanyaqoob |
| 103 | muhammadusmanyaqoob |
| 103 | muhammadusmanyaqoob |
+------------+---------------------+
Now we want to see duplicate records
Use this query:
select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;
+---------------------+------------+---+
| student_name | student_id | c |
+---------------------+------------+---+
| usman | 101 | 3 |
| muhammadusmanyaqoob | 103 | 2 |
+---------------------+------------+---+

To quickly see the duplicate rows you can run a single simple query
Here I am querying the table and listing all duplicate rows with same user_id, market_place and sku:
select user_id, market_place,sku, count(id)as totals from sku_analytics group by user_id, market_place,sku having count(id)>1;
To delete the duplicate row you have to decide which row you want to delete. Eg the one with lower id (usually older) or maybe some other date information. In my case I just want to delete the lower id since the newer id is latest information.
First double check if the right records will be deleted. Here I am selecting the record among duplicates which will be deleted (by unique id).
select a.user_id, a.market_place,a.sku from sku_analytics a inner join sku_analytics b where a.id< b.id and a.user_id= b.user_id and a.market_place= b.market_place and a.sku = b.sku;
Then I run the delete query to delete the dupes:
delete a from sku_analytics a inner join sku_analytics b where a.id< b.id and a.user_id= b.user_id and a.market_place= b.market_place and a.sku = b.sku;
Backup, Double check, verify, verify backup then execute.
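The preview-then-delete workflow can be rehearsed on throwaway data first. The sketch below uses Python with an in-memory SQLite database; because SQLite has no multi-table DELETE, the self-join is expressed as a correlated EXISTS with the same "a row with a higher id and the same key exists" condition. That rewritten form is an assumption made for this illustration and is not accepted by MySQL as-is, where the `delete a from ... inner join ...` statement above applies. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table sku_analytics (
    id           integer primary key,
    user_id      integer not null,
    market_place text not null,
    sku          text not null
);
-- Invented rows: ids 1, 2 and 4 are copies of the same record.
insert into sku_analytics (id, user_id, market_place, sku) values
    (1, 7, 'US', 'A-1'),
    (2, 7, 'US', 'A-1'),
    (3, 7, 'US', 'B-2'),
    (4, 7, 'US', 'A-1');
""")

# True for a row when a newer copy (higher id, same key columns) exists.
has_newer_copy = """
exists (select 1
        from sku_analytics b
        where b.id > sku_analytics.id
          and b.user_id = sku_analytics.user_id
          and b.market_place = sku_analytics.market_place
          and b.sku = sku_analytics.sku)
"""

# Step 1: preview which ids would be removed.
doomed_ids = [row[0] for row in conn.execute(
    f"select id from sku_analytics where {has_newer_copy} order by id")]

# Step 2: delete exactly those rows, then inspect what is left.
deleted_count = conn.execute(
    f"delete from sku_analytics where {has_newer_copy}").rowcount
conn.commit()

remaining_ids = [row[0] for row in conn.execute(
    "select id from sku_analytics order by id")]

print(doomed_ids, deleted_count, remaining_ids)
```

Reusing one condition string for the preview and the delete keeps the two steps from drifting apart, which is the point of double checking before executing.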

SELECT * FROM bookings
WHERE DATE(created_at) = '2022-01-11'
AND code IN (
SELECT code FROM bookings
GROUP BY code
HAVING COUNT(code) > 1
) ORDER BY id DESC

Would go with something like this:
SELECT t1.firstname, t1.lastname, t1.address FROM list t1
INNER JOIN list t2
WHERE
t1.id < t2.id AND
t1.address = t2.address;

select address from list where address = any (select address from (select address, count(id) cnt from list group by address having cnt > 1 ) as t1) order by address
the inner sub-query returns rows with duplicate address then
the outer sub-query returns the address column for address with duplicates.
the outer sub-query must return only one column because it used as operand for the operator '= any'

Powerlord's answer is indeed the best, and I would recommend one more change: use LIMIT to make sure the db does not get overloaded:
SELECT firstname, lastname, list.address FROM list
INNER JOIN (SELECT address FROM list
GROUP BY address HAVING count(id) > 1) dup ON list.address = dup.address
LIMIT 10
It is a good habit to use LIMIT if there is no WHERE and when making joins. Start with small value, check how heavy the query is and then increase the limit.
