Optimizing MySQL query removing subquery

Having these tables:
customers
---------------------
`id` smallint(5) unsigned NOT NULL auto_increment,
`name` varchar(100) collate utf8_unicode_ci NOT NULL,
....
customers_subaccounts
-------------------------
`companies_id` mediumint(8) unsigned NOT NULL,
`customers_id` mediumint(8) unsigned NOT NULL,
`subaccount` int(10) unsigned NOT NULL
I need to get all the customers who have been assigned more than one subaccount for the same company.
This is what I've got:
SELECT * FROM customers
WHERE id IN
(SELECT customers_id
FROM customers_subaccounts
GROUP BY customers_id, companies_id
HAVING COUNT(subaccount) > 1)
This query is too slow, though. It is even slower if I add the DISTINCT modifier to customers_id in the subquery's SELECT, although the whole query returns the same customer list either way. Maybe there is a better way without a subquery; anything faster will help. I'm also not sure whether this query returns a correct list.
Any help?

You can replace the subquery with an INNER JOIN:
SELECT t1.id
FROM customers t1
INNER JOIN
(
SELECT DISTINCT customers_id
FROM customers_subaccounts
GROUP BY customers_id, companies_id
HAVING COUNT(*) > 1
) t2
ON t1.id = t2.customers_id

You can also try using EXISTS(), which may be faster than a join:
SELECT * FROM customers t
WHERE EXISTS(SELECT 1 FROM customers_subaccounts s
WHERE s.customers_id = t.id
GROUP BY s.customers_id, s.companies_id
HAVING COUNT(subaccount) > 1)
You should also consider adding the following indexes (if they do not exist yet):
customers_subaccounts (customers_id,companies_id,subaccount)
customers (id)
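To sanity-check that the IN, derived-table JOIN and EXISTS forms pick the same customers, here is a minimal sketch that runs all three against a tiny in-memory dataset. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), the sample rows are made up, and it says nothing about which form is faster on MySQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customers_subaccounts (
    companies_id INTEGER NOT NULL,
    customers_id INTEGER NOT NULL,
    subaccount   INTEGER NOT NULL
);
-- Hypothetical sample data: only customer 1 has two subaccounts
-- within the same company (company 10).
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO customers_subaccounts VALUES
    (10, 1, 100), (10, 1, 101),
    (10, 2, 200), (20, 2, 201),
    (20, 3, 300);
""")

queries = {
    "in": """
        SELECT id FROM customers
        WHERE id IN (SELECT customers_id
                     FROM customers_subaccounts
                     GROUP BY customers_id, companies_id
                     HAVING COUNT(subaccount) > 1)
        ORDER BY id
    """,
    "join": """
        SELECT t1.id
        FROM customers t1
        INNER JOIN (SELECT DISTINCT customers_id
                    FROM customers_subaccounts
                    GROUP BY customers_id, companies_id
                    HAVING COUNT(*) > 1) t2
                ON t1.id = t2.customers_id
        ORDER BY t1.id
    """,
    "exists": """
        SELECT t.id FROM customers t
        WHERE EXISTS (SELECT 1 FROM customers_subaccounts s
                      WHERE s.customers_id = t.id
                      GROUP BY s.customers_id, s.companies_id
                      HAVING COUNT(s.subaccount) > 1)
        ORDER BY t.id
    """,
}

# Run each form and collect the customer ids it returns.
results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in queries.items()}
print(results)
```

If the three lists ever differ on your real data, that points at a logic difference rather than a performance one, so it is worth checking before comparing timings.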

Assuming that you want different subaccounts for the company (or that they are guaranteed to be different anyway), then the following could be faster under some circumstances:
select c.*
from (select distinct cs.customers_id
from customers_subaccounts cs join
customers_subaccounts cs2
on cs.customers_id = cs2.customers_id and
cs.companies_id = cs2.companies_id and
cs.subaccount < cs2.subaccount
) cc join
customers c
on c.id = cc.customers_id;
In particular, this can take advantage of an index on customers_subaccounts(customers_id, companies_id, subaccount).
Note: This assumes that the subaccounts are different for the rows you want. What is really needed is a way of defining unique rows in the customers_subaccounts table.
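The caveat in the note can be seen on a small example. The sketch below (Python's built-in sqlite3 as a stand-in for MySQL, with made-up rows) gives customer 2 the same subaccount number twice for one company: the COUNT-based query still reports that customer, while the self-join on `cs.subaccount < cs2.subaccount` does not.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_subaccounts (
    companies_id INTEGER NOT NULL,
    customers_id INTEGER NOT NULL,
    subaccount   INTEGER NOT NULL
);
-- Hypothetical data: customer 1 has two different subaccounts,
-- customer 2 has the same subaccount recorded twice.
INSERT INTO customers_subaccounts VALUES
    (10, 1, 100), (10, 1, 101),
    (10, 2, 200), (10, 2, 200);
""")

# Counts rows per (customer, company), duplicates included.
count_based = [r[0] for r in conn.execute("""
    SELECT DISTINCT customers_id
    FROM customers_subaccounts
    GROUP BY customers_id, companies_id
    HAVING COUNT(subaccount) > 1
    ORDER BY customers_id
""")]

# Requires two rows with strictly different subaccount values.
self_join = [r[0] for r in conn.execute("""
    SELECT DISTINCT cs.customers_id
    FROM customers_subaccounts cs
    JOIN customers_subaccounts cs2
      ON cs.customers_id = cs2.customers_id
     AND cs.companies_id = cs2.companies_id
     AND cs.subaccount < cs2.subaccount
    ORDER BY cs.customers_id
""")]

print(count_based, self_join)
```

Which result is the right one depends on whether duplicate subaccount rows should count, which is exactly the question of what makes a row unique in customers_subaccounts.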

You can speed up the query by caching the subquery result. A simple change to your query tells MySQL that it can cache the subquery result:
SELECT * FROM customers
WHERE id IN
(select * from
(SELECT distinct customers_id
FROM customers_subaccounts
GROUP BY customers_id, companies_id
HAVING COUNT(subaccount) > 1) t1);
I used it many years ago and it helped me very much.

Try the following:
SELECT DISTINCT t1.*
FROM customers t1
INNER JOIN customers_subaccounts t2 ON t1.id = t2.customers_id
GROUP BY t1.id, t1.name, t2.companies_id
HAVING COUNT(t2.subaccount) > 1
You may also add an index on customers_id.

Related

Issue with all MySQL SELECT queries containing EXISTS subquery and LEFT JOIN with ON where ON has reference to external SELECT

This is the problem. When running any query of that type
SELECT field1
FROM table1
WHERE EXISTS (SELECT table2.field2, table3.field3, table3.field4
FROM table2 LEFT JOIN table3 ON table3.field3 = table2.field2
AND table3.field4 = table1.field1
WHERE "some condition");
I get this error:
Unknown column 'table1.field1' in 'on clause'
On the other hand, this query
SELECT field1
FROM table1
WHERE EXISTS (SELECT table2.field2, table3.field3, table3.field4
FROM table2 LEFT JOIN table3 ON table3.field3 = table2.field2
WHERE "some condition"
AND table3.field4 = table1.field1);
works fine.
The details can vary: it can be an inner join rather than an outer join, the check can be negative (NOT EXISTS), the WHERE clause is not necessary, and the field list can be different. The only critical part is the EXISTS subquery together with a reference to table1.field1 in the ON condition of the JOIN.
I tried it on several MySQL and MariaDB servers with the same result. I also tried to find exactly the same issue online and here on SO, with no success.
As suggested in one of the comments, I have updated the question with a real example.
Tables:
CREATE TABLE `sessions` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(11) DEFAULT NULL,
`browser` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
)
CREATE TABLE `browsers` (
`id` int(11) NOT NULL DEFAULT '0',
`browser` varchar(255) DEFAULT NULL
)
And to get all users who used all browsers, I run this query
select distinct user_id
from sessions as t1
where not exists (select t2.id, browsers.id
from sessions as t2 LEFT JOIN browsers ON t2.browser = browsers.browser
AND t2.user_id = t1.user_id
where browsers.id IS NULL);
Error message I get:
Error Code : 1054
Unknown column 't1.user_id' in 'on clause'
And of course the desired output I need is select query result set with a listing of users.
I know how to rewrite the query for this particular task, so this is not a problem. The problem is to run the query with this pattern for any other task since it seems very logical and good SQL.
My question is: what am I doing wrong, and if this is a bug, how can I work around it while keeping the same query structure?

I think you have hit bug #96946: MySQL does not allow outer references in the JOIN ON clause.
If I am not mistaken, this is a rewrite of a double-nested NOT EXISTS query, and I think this statement will actually be accepted in MySQL:
SELECT DISTINCT user_id
FROM sessions AS s1
WHERE NOT EXISTS (SELECT *
FROM browsers AS b
WHERE NOT EXISTS (SELECT *
FROM sessions s2
WHERE s1.user_id = s2.user_id AND
s2.browser = b.browser
)
);
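To see what the double NOT EXISTS returns, here is a small sketch using Python's built-in sqlite3 as a stand-in for MySQL (an assumption for the sake of a runnable example; the rows are made up). It reads as "users for whom there is no browser that they have not used".

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER,
    browser TEXT
);
CREATE TABLE browsers (id INTEGER NOT NULL, browser TEXT);
-- Hypothetical data: only user 1 has used every listed browser.
INSERT INTO browsers VALUES (1, 'Firefox'), (2, 'Chrome');
INSERT INTO sessions (user_id, browser) VALUES
    (1, 'Firefox'), (1, 'Chrome'),
    (2, 'Firefox'),
    (3, 'Chrome'), (3, 'Chrome');
""")

# Outer NOT EXISTS: no browser b exists such that ...
# inner NOT EXISTS: ... the user has no session with b.
all_browser_users = [r[0] for r in conn.execute("""
    SELECT DISTINCT user_id
    FROM sessions AS s1
    WHERE NOT EXISTS (SELECT *
                      FROM browsers AS b
                      WHERE NOT EXISTS (SELECT *
                                        FROM sessions s2
                                        WHERE s1.user_id = s2.user_id
                                          AND s2.browser = b.browser))
    ORDER BY user_id
""")]
print(all_browser_users)
```

Here the outer reference s1.user_id sits in a WHERE clause rather than in a JOIN ... ON clause, which is the part the original query tripped over.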

Update mysql table based with group_concat

UPDATE BELOW!
Who can help me out
I have a table:
CREATE TABLE `group_c` (
`parent_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`child_id` int(11) DEFAULT NULL,
`number` int(11) DEFAULT NULL,
PRIMARY KEY (`parent_id`)
) ENGINE=InnoDB;
INSERT INTO group_c(parent_id,child_id)
VALUES (1,1),(2,2),(3,3),(4,1),(5,4),(6,4),(7,6),(8,1),(9,2),(10,1),(11,1),(12,1),(13,0);
I want to update the number field to 1 for each child that has multiple parents:
SELECT group_concat(parent_id), count(*) as c FROM group_c group by child_id having c>1
Result:
GROUP_CONCAT(PARENT_ID) C
12,11,10,8,1,4 6
9,2 2
6,5 2
So all rows with parent_id 12,11,10,8,1,4,9,2,6,5 should be updated to number =1
I've tried something like:
UPDATE group_c SET number=1 WHERE FIND_IN_SET(parent_id, SELECT pid FROM (select group_concat(parent_id), count(*) as c FROM group_c group by child_id having c>1));
but that is not working.
How can I do this?
SQLFIDDLE: http://sqlfiddle.com/#!2/acb75/5
[edit]
I tried to make the example simple but the real thing is a bit more complicated since I'm grouping by multiple fields. Here is a new fiddle: http://sqlfiddle.com/#!2/7aed0/11

Why use GROUP_CONCAT() and then try to do something with its result via FIND_IN_SET()? That's not how SQL is intended to work. You can use a simple JOIN to retrieve your records:
SELECT
parent_id
FROM
group_c
INNER JOIN
(SELECT
child_id,
count(*) as c
FROM
group_c
group by
child_id
having c>1) AS childs
ON childs.child_id=group_c.child_id
Check your modified demo. If you want an UPDATE, just use:
UPDATE
group_c
INNER JOIN
(SELECT
child_id,
count(*) as c
FROM
group_c
group by
child_id
having c>1) AS childs
ON childs.child_id=group_c.child_id
SET
group_c.number=1
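As a quick check of which rows should end up with number = 1, here is a sketch on the question's sample data using Python's built-in sqlite3 (an assumption, chosen so the example runs anywhere). SQLite does not accept MySQL's multi-table UPDATE ... INNER JOIN syntax, so the same selection is expressed with IN; treat it as a way to verify the expected rows, not as a drop-in MySQL statement.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE group_c (
    parent_id INTEGER PRIMARY KEY,
    child_id  INTEGER,
    number    INTEGER
);
-- Sample rows from the question.
INSERT INTO group_c (parent_id, child_id) VALUES
    (1,1),(2,2),(3,3),(4,1),(5,4),(6,4),(7,6),
    (8,1),(9,2),(10,1),(11,1),(12,1),(13,0);
""")

# Flag every row whose child_id occurs more than once in the table.
conn.execute("""
    UPDATE group_c
    SET number = 1
    WHERE child_id IN (SELECT child_id
                       FROM group_c
                       GROUP BY child_id
                       HAVING COUNT(*) > 1)
""")

updated = [r[0] for r in conn.execute(
    "SELECT parent_id FROM group_c WHERE number = 1 ORDER BY parent_id")]
untouched = [r[0] for r in conn.execute(
    "SELECT parent_id FROM group_c WHERE number IS NULL ORDER BY parent_id")]
print(updated, untouched)
```

The updated list matches the parent_id values the question expects (12,11,10,8,1,4,9,2,6,5), just sorted.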

For anyone interested. This is how I solved it. It's in two queries but in my case it's not really an issue.
UPDATE group_c INNER JOIN (
SELECT parent_id, count( * ) AS c
FROM `group_c`
GROUP BY child1,child2
HAVING c >1
) AS cc ON cc.parent_id = group_c.parent_id
SET group_c.number =1 WHERE number =0;
UPDATE group_c INNER JOIN group_c as gc ON
(gc.child1=group_c.child1 AND gc.child2=group_c.child2 AND gc.number=1)
SET group_c.number=1;
fiddle: http://sqlfiddle.com/#!2/46d0b4/1/0

Here's a similar solution...
UPDATE group_c a
JOIN
( SELECT DISTINCT x.child_id candidate
FROM group_c x
JOIN group_c y
ON y.child_id = x.child_id
AND y.parent_id < x.parent_id
) b
ON b.candidate = a.child_id
SET number = 1;
http://sqlfiddle.com/#!2/bc532/1

Is there a more efficient way to write this query?

Ok imagine the following DB structure
USERS:
id | name | company_id
1 John 1
2 Jane 1
3 Jack 2
4 Jill 3
COMPANIES:
id | name
1 CompanyA
2 CompanyB
3 CompanyC
4 CompanyD
First I want to SELECT all the companies that have more than one user
SELECT
`c`.`name`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
Easy enough. Now I want to SELECT all the users that belong to a company that has more than one user. I have this combined query, but I think it is not efficient:
SELECT * FROM `users` WHERE `company_id` = (
SELECT
`c`.`id`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
)
Basically I take the id returned from the first query (companies that have more than 1 user) and then query the users table to find all users with that company.

Why not
SELECT * FROM users u GROUP BY u.company_id HAVING COUNT(u.id) > 1
You don't really need any information from the companies table according to the data you say needs returning. "Now I want to SELECT all the users that belong to a company that has more than one user."

try this:
SELECT u.id,u.name,u.company_id FROM users u
inner join companies c on u.company_id = c.id
group by c.id
having count(u.id) > 1

Simplest way to get the users only is probably to keep the subquery but eliminate the join; since it's not a correlated subquery, it should be fairly efficient (obviously an index on company_id helps here);
SELECT u.* FROM USERS u WHERE company_id IN (
SELECT company_id FROM USERS GROUP BY company_id HAVING COUNT(*)>1
);
You could for example rewrite it as a LEFT JOIN, but I suspect it will actually be less efficient since you'd most likely need to use a DISTINCT when using a JOIN;
SELECT DISTINCT u.*
FROM USERS u
LEFT JOIN USERS u2
ON u.company_id=u2.company_id AND u.id<>u2.id
WHERE u2.id IS NOT NULL;
An SQLfiddle to test both.
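If no fiddle is at hand, the same comparison can be sketched locally. The snippet below uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption) with the USERS rows from the question, and runs the IN form, the LEFT JOIN form and an EXISTS form. For contrast it also runs a plain GROUP BY on company_id, which in this SQLite run collapses each company to a single row instead of listing its users.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
-- Rows from the question: only company 1 has more than one user.
INSERT INTO users VALUES
    (1, 'John', 1), (2, 'Jane', 1), (3, 'Jack', 2), (4, 'Jill', 3);
""")

def ids(sql):
    """Return the first column of every result row as a list."""
    return [row[0] for row in conn.execute(sql)]

in_form = ids("""
    SELECT u.id FROM users u
    WHERE u.company_id IN (SELECT company_id FROM users
                           GROUP BY company_id HAVING COUNT(*) > 1)
    ORDER BY u.id
""")

left_join_form = ids("""
    SELECT DISTINCT u.id
    FROM users u
    LEFT JOIN users u2
      ON u.company_id = u2.company_id AND u.id <> u2.id
    WHERE u2.id IS NOT NULL
    ORDER BY u.id
""")

exists_form = ids("""
    SELECT u.id FROM users u
    WHERE EXISTS (SELECT 1 FROM users u1
                  WHERE u.company_id = u1.company_id AND u.id <> u1.id)
    ORDER BY u.id
""")

# GROUP BY alone yields one row per qualifying company, not one per user.
collapsed = ids("""
    SELECT u.id FROM users u
    GROUP BY u.company_id
    HAVING COUNT(u.id) > 1
""")

print(in_form, left_join_form, exists_form, collapsed)
```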

Try also a semi-join query:
SELECT *
FROM users u
WHERE EXISTS (
SELECT null FROM users u1
WHERE u.company_id=u1.company_id
AND u.id <> u1.id
)
demo --> http://www.sqlfiddle.com/#!2/12dc34/2
Assuming that id is the primary key column, creating an index on the company_id column gives better performance.
If you are really obsessed with the performance of this query, create a composite index on columns company_id + id:
CREATE INDEX very_fast ON users( company_id, id );

Could you try this?
SELECT users.*
FROM users INNER JOIN
(
SELECT company_id
FROM users
GROUP BY company_id
HAVING COUNT(*) > 1
) x USING(company_id);
You should have an index: INDEX(company_id)
Performance Test
I have tested 3 queries in answers.
Q1 = sub-query (with GROUP BY) and INNER JOIN
Q2 = LEFT JOIN and IS NOT NULL
Q3 = EXISTS
All queries return the same result. The test was done with the TPC-H lineitem table, and the problem is "find the lineitem rows whose order has more than 1 item".
Test Results
The result depends on whether you want to retrieve the first N rows or all rows.
Q1 (get FIRST 10K rows) : 2.85 sec
Q2 (get FIRST 10K rows) : 0.03 sec
Q3 (get FIRST 10K rows) : 0.03 sec
Q1 (get all rows) : 8.19 sec
Q2 (get all rows) : 34.12 sec
Q3 (get all rows) : 29.54 sec
Schema and DATA
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM lineitem\G
*************************** 1. row ***************************
COUNT(*): 11997996
1 row in set (1.68 sec)
mysql> SHOW CREATE TABLE lineitem\G
*************************** 1. row ***************************
Table: lineitem
Create Table: CREATE TABLE `lineitem` (
`l_orderkey` int(11) NOT NULL,
`l_partkey` int(11) NOT NULL,
`l_suppkey` int(11) NOT NULL,
`l_linenumber` int(11) NOT NULL,
`l_quantity` decimal(15,2) NOT NULL,
`l_extendedprice` decimal(15,2) NOT NULL,
`l_discount` decimal(15,2) NOT NULL,
`l_tax` decimal(15,2) NOT NULL,
`l_returnflag` char(1) NOT NULL,
`l_linestatus` char(1) NOT NULL,
`l_shipDATE` date NOT NULL,
`l_commitDATE` date NOT NULL,
`l_receiptDATE` date NOT NULL,
`l_shipinstruct` char(25) NOT NULL,
`l_shipmode` char(10) NOT NULL,
`l_comment` varchar(44) NOT NULL,
PRIMARY KEY (`l_orderkey`,`l_linenumber`),
KEY `l_orderkey` (`l_orderkey`),
KEY `l_partkey` (`l_partkey`,`l_suppkey`),
CONSTRAINT `lineitem_ibfk_1` FOREIGN KEY (`l_orderkey`) REFERENCES `orders` (`o_orderkey`),
CONSTRAINT `lineitem_ibfk_2` FOREIGN KEY (`l_partkey`, `l_suppkey`) REFERENCES `partsupp` (`ps_partkey`, `ps_suppkey`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
Queries
Q1 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey)
LIMIT 10000;
Q2 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL
LIMIT 10000;
Q3 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
)
LIMIT 10000;
retrieve entire rows
Q1 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey);
Q2 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL;
Q3 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
);

Two select statements in one query

I need to get a user's age by their ID. Easy.
The problem is that at first I don't know their IDs; the only thing I know is that they are in a specific table, let's name it "second".
SELECT `age` FROM `users` WHERE `userid`=(SELECT `id` FROM `second`)
How can I do that?

SELECT age FROM users WHERE userid IN (SELECT id FROM second)
This should work

Your example
SELECT `age` FROM `users` WHERE `userid`=
(SELECT `id` FROM `second`
WHERE `second`.`name` = 'Berna')
should have worked as long as you add a WHERE criterion. This is called a subquery, and it is supported in MySQL 5. Reference: http://dev.mysql.com/doc/refman/5.1/en/comparisons-using-subqueries.html

SELECT
age
FROM
users
inner join
second
on
users.UserID = second.ID
An inner join will be more efficient than a sub-select

SELECT age FROM users WHERE userid IN (SELECT id FROM second)
but preferably
SELECT u.age FROM users u INNER JOIN second s ON u.userid = s.id
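One difference between the two forms is worth knowing before swapping one for the other: if `second` lists the same id more than once, the join returns that user once per match, while IN returns each user at most once. A minimal sketch with made-up rows, using Python's built-in sqlite3 as a stand-in for MySQL (an assumption):

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, age INTEGER);
CREATE TABLE second (id INTEGER);
-- Hypothetical data: id 1 appears twice in `second`.
INSERT INTO users VALUES (1, 30), (2, 40), (3, 50);
INSERT INTO second VALUES (1), (1), (3);
""")

# IN: each matching user appears once.
ages_in = [r[0] for r in conn.execute("""
    SELECT age FROM users
    WHERE userid IN (SELECT id FROM second)
    ORDER BY userid
""")]

# JOIN: a user appears once per matching row in `second`.
ages_join = [r[0] for r in conn.execute("""
    SELECT u.age
    FROM users u
    INNER JOIN second s ON u.userid = s.id
    ORDER BY u.userid
""")]

print(ages_in, ages_join)
```

If `second.id` is unique, the two queries return the same rows and the choice comes down to the execution plan.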

You want to use the 'in' statement:
select * from a
where x=8 and y=1 and z in (
select z from b where x=8 and active > '2010-01-07 00:00:00' group by z
)

How do I write this kind of query (returning the latest available data for each row)

I have a table defined like this:
CREATE TABLE mytable (id INT NOT NULL AUTO_INCREMENT, PRIMARY KEY(id),
user_id INT REFERENCES user(id) ON UPDATE CASCADE ON DELETE RESTRICT,
amount REAL NOT NULL CHECK (amount > 0),
record_date DATE NOT NULL
);
CREATE UNIQUE INDEX idxu_mybl_key ON mytable (user_id, amount, record_date);
I want to write a query that will have two columns:
user_id
amount
There should be only ONE entry in the returned result set for a given user. Furthermore, the amount returned should be the last recorded amount for the user (i.e. the one with MAX(record_date)).
The complication arises because weights are recorded on different dates for different users, so there is no single LAST record_date for all users.
How can I write a query (preferably ANSI SQL) that returns the columns mentioned above, while ensuring that only the last recorded amount for each user is returned?
As an aside, it is probably a good idea to return the 'record_date' column as well in the query, so that it is eas(ier) to verify that the query is working as required.
I am using MySQL as my backend db, but ideally the query should be db agnostic (i.e. ANSI SQL) if possible.

First you need the last record_date for each user:
select user_id, max(record_date) as last_record_date
from mytable
group by user_id
Now, you can join previous query with mytable itself to get amount for this record_date:
select
t1.user_id, last_record_date, amount
from
mytable t1
inner join
( select user_id, max(record_date) as last_record_date
from mytable
group by user_id
) t2
on t1.user_id = t2.user_id
and t1.record_date = t2.last_record_date
A problem appears because a user can have several rows for the same last_record_date (with different amounts). In that case you have to pick one of them, for example the maximum of the different amounts:
select
t1.user_id, t1.record_date as last_record_date, max(t1.amount)
from
mytable t1
inner join
( select user_id, max(record_date) as last_record_date
from mytable
group by user_id
) t2
on t1.user_id = t2.user_id
and t1.record_date = t2.last_record_date
group by t1.user_id, t1.record_date
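Here is a small runnable sketch of that last query, using Python's built-in sqlite3 as a stand-in for MySQL (an assumption) and made-up rows. User 1 has two amounts on the latest date, so the MAX(t1.amount) tie-break decides which one is returned.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    amount      REAL NOT NULL,
    record_date TEXT NOT NULL   -- ISO dates sort correctly as text
);
-- Hypothetical data: user 1 has a tie on the latest date.
INSERT INTO mytable (user_id, amount, record_date) VALUES
    (1, 10.0, '2024-01-01'),
    (1, 12.0, '2024-02-01'),
    (1, 15.0, '2024-02-01'),
    (2,  7.5, '2024-01-15');
""")

# Join each user's latest date back to the table, then collapse ties
# on that date by taking the largest amount.
latest = conn.execute("""
    SELECT t1.user_id, t1.record_date AS last_record_date,
           MAX(t1.amount) AS amount
    FROM mytable t1
    INNER JOIN (SELECT user_id, MAX(record_date) AS last_record_date
                FROM mytable
                GROUP BY user_id) t2
            ON t1.user_id = t2.user_id
           AND t1.record_date = t2.last_record_date
    GROUP BY t1.user_id, t1.record_date
    ORDER BY t1.user_id
""").fetchall()
print(latest)
```

Taking the maximum is only one possible tie-break; if a different rule applies (for example the row with the highest id), the outer aggregate would need to change accordingly.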

I do not know about MySQL, but in general SQL you need a subquery for that. You must join the query that calculates the greatest record_date with the original table to get the corresponding amount. Roughly like this:
SELECT B.*
FROM
(select user_id, max(record_date) max_date from mytable group by user_id) A
join
mytable B
on A.user_id = B.user_id and A.max_date = B.record_date

SELECT datatable.* FROM
mytable AS datatable
INNER JOIN (
SELECT user_id,max(record_date) AS max_record_date FROM mytable GROUP BY user_id
) AS selectortable ON
selectortable.user_id=datatable.user_id
AND
selectortable.max_record_date=datatable.record_date
in some SQLs you might need
SELECT MAX(user_id), ...
in the selectortable view instead of simply SELECT user_id,...

The definition of maximum: there is no larger(or: "more recent") value than this one. This naturally leads to a NOT EXISTS query, which should be available in any DBMS.
SELECT user_id, amount
FROM mytable mt
WHERE mt.user_id = $user
AND NOT EXISTS ( SELECT *
FROM mytable nx
WHERE nx.user_id = mt.user_id
AND nx.record_date > mt.record_date
)
;
BTW: your table definition allows more than one record to exist for a given {id,date}, but with different amounts. This query will return them all.
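The point about ties can be checked with a short sketch. It uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption), made-up rows, and a `?` placeholder where the query above has $user; the first query drops the user filter to cover all users at once.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    amount      REAL NOT NULL,
    record_date TEXT NOT NULL   -- ISO dates sort correctly as text
);
-- Hypothetical data: user 1 has two amounts on the latest date.
INSERT INTO mytable (user_id, amount, record_date) VALUES
    (1, 10.0, '2024-01-01'),
    (1, 12.0, '2024-02-01'),
    (1, 15.0, '2024-02-01'),
    (2,  7.5, '2024-01-15');
""")

# Keep a row only if no later row exists for the same user.
all_users = conn.execute("""
    SELECT mt.user_id, mt.amount
    FROM mytable mt
    WHERE NOT EXISTS (SELECT 1
                      FROM mytable nx
                      WHERE nx.user_id = mt.user_id
                        AND nx.record_date > mt.record_date)
    ORDER BY mt.user_id, mt.amount
""").fetchall()

# Same query restricted to one user via a bound parameter.
one_user = conn.execute("""
    SELECT mt.user_id, mt.amount
    FROM mytable mt
    WHERE mt.user_id = ?
      AND NOT EXISTS (SELECT 1
                      FROM mytable nx
                      WHERE nx.user_id = mt.user_id
                        AND nx.record_date > mt.record_date)
""", (2,)).fetchall()

print(all_users, one_user)
```

Both of user 1's rows for the latest date come back, so a tie-break is still needed if exactly one row per user is required.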
