At first I used whereHas, but then I switched to the query below. It performs better than whereHas, but I'm still not satisfied: the query takes 873 ms, and the table has 400k+ rows.
select count(*) as aggregate
from `orders`
where (`pickup_address_id` in (
select `id`
from `addresses`
where `region_id` = 12)
or `delivery_address_id` in (
select `id`
from `addresses`
where `region_id` = 12)
) and `orders`.`status` = 2
Try this:
select count(distinct o.`id`) as aggregate
from `orders` o
inner join `addresses` a ON a.`id` IN (o.`pickup_address_id`, o.`delivery_address_id`)
AND a.`region_id` = 12
where o.`status` = 2
Alternatively:
SELECT count(distinct id) as aggregate
FROM (
select o.`id`
from `orders` o
inner join `addresses` a ON a.`id` = o.`pickup_address_id`
AND a.`region_id` = 12
where o.`status` = 2
UNION
select o.`id`
from `orders` o
inner join `addresses` a ON a.`id` = o.`delivery_address_id`
AND a.`region_id` = 12
where o.`status` = 2
) t
But I don't know how much you can improve on scanning 400K rows in less than a second.
First, you can avoid writing (and possibly evaluating) the same subquery twice by moving it into a Common Table Expression:
WITH CTE(id) AS (
SELECT id
FROM addresses
WHERE region_id = 12
)
This CTE is defined once, so the subquery no longer appears twice in the statement.
Second, get the row count from the orders table joined with the CTE, matching on either delivery_address_id or pickup_address_id:
WITH CTE(id) AS (
SELECT id
FROM addresses
WHERE region_id = 12
)
SELECT COUNT(DISTINCT orders.id) -- DISTINCT: an order matching two region addresses would otherwise count twice
FROM orders
INNER JOIN CTE ON CTE.id = orders.delivery_address_id
OR CTE.id = orders.pickup_address_id
Finally, add the filter on status = 2, and the query becomes:
WITH CTE(id) AS (
SELECT id
FROM addresses
WHERE region_id = 12
)
SELECT COUNT(DISTINCT orders.id) -- DISTINCT: an order matching two region addresses would otherwise count twice
FROM orders
INNER JOIN CTE ON CTE.id = orders.delivery_address_id
OR CTE.id = orders.pickup_address_id
WHERE orders.status = 2
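One thing worth checking with a join on an OR condition is whether an order whose pickup and delivery addresses are two different addresses in the region gets counted once or twice. Below is a minimal sketch using Python's sqlite3 as a stand-in for MySQL; the five orders and three addresses are made up for illustration, and INNER JOIN is used for the join.

```python
import sqlite3

# Hypothetical miniature data set; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses (id INTEGER PRIMARY KEY, region_id INTEGER NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    pickup_address_id INTEGER,
    delivery_address_id INTEGER,
    status INTEGER NOT NULL
);
INSERT INTO addresses VALUES (1, 12), (2, 12), (3, 5);
INSERT INTO orders VALUES
    (1, 1, 2, 2),  -- two different addresses, both in region 12
    (2, 1, 3, 2),  -- only the pickup address is in region 12
    (3, 3, 3, 2),  -- neither address is in region 12
    (4, 2, 2, 1),  -- in region 12, but status is not 2
    (5, 2, 2, 2);  -- same region-12 address for pickup and delivery
""")

# The query from the question: two IN subqueries combined with OR.
ORIGINAL = """
SELECT COUNT(*) FROM orders
WHERE (pickup_address_id IN (SELECT id FROM addresses WHERE region_id = 12)
    OR delivery_address_id IN (SELECT id FROM addresses WHERE region_id = 12))
  AND status = 2
"""

# The CTE join, with the aggregate left as a placeholder.
CTE_JOIN = """
WITH cte(id) AS (SELECT id FROM addresses WHERE region_id = 12)
SELECT {agg}
FROM orders
INNER JOIN cte ON cte.id = orders.delivery_address_id
               OR cte.id = orders.pickup_address_id
WHERE orders.status = 2
"""


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


original_count = scalar(ORIGINAL)
join_count_star = scalar(CTE_JOIN.format(agg="COUNT(*)"))
join_count_distinct = scalar(CTE_JOIN.format(agg="COUNT(DISTINCT orders.id)"))

print(original_count, join_count_star, join_count_distinct)
```

On this data, order 1 joins to two CTE rows, so COUNT(*) over the join is one higher than the count from the original query, while COUNT(DISTINCT orders.id) agrees with it.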
Also you should have the following indexes:
addresses table:
INDEX (region_id)
orders table:
INDEX (pickup_address_id),
INDEX (delivery_address_id),
INDEX (status)
Give it a try.
With empty tables I got this:
Schema (MySQL v8.0)
create table addresses (
id int primary key,
region_id int not null,
index(region_id)
);
create table orders (
id int primary key,
pickup_address_id int,
delivery_address_id int,
status int not null,
index (pickup_address_id),
index (delivery_address_id),
index(status),
foreign key (pickup_address_id) references addresses(id),
foreign key (delivery_address_id) references addresses(id)
);
Query #1
explain with cte(id) as (
select id from addresses where region_id = 12)
select count(*) from orders
cross join cte on cte.id = orders.delivery_address_id
or cte.id = orders.pickup_address_id
where status = 2;
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | orders | | ref | pickup_address_id,delivery_address_id,status | status | 4 | const | 1 | 100 |
1 | SIMPLE | addresses | | ref | PRIMARY,region_id | region_id | 4 | const | 1 | 100 | Using where; Using index
I'm trying to get this to work. When I run the SELECT on the whole dataset, I know that the record with this cust_number shows up in position 6 (when using ORDER BY), but this code returns position 37327, which is its unordered position.
SELECT
x.position,
x.cust_number,
x.company,
x.surname,
x.first_name,
x.title
FROM
(SELECT
@rownum:=@rownum + 1 AS position,
c.cust_number,
company,
surname,
first_name,
title
FROM
1_customer_records c
LEFT JOIN addresses a ON c.fk_addresses_id = a.id
JOIN (SELECT @rownum:=0) r
ORDER BY a.company , c.surname , c.first_name , c.title) x
WHERE
x.cust_number = 43246;
Here is another approach using a temp table
CREATE TEMPORARY TABLE row_calc (id INT AUTO_INCREMENT, fk INT NULL, PRIMARY KEY (id)) ENGINE=MEMORY;
INSERT INTO row_calc(fk)
SELECT
cust_number
FROM
1_customer_records c
LEFT JOIN
addresses a ON c.fk_addresses_id = a.id
ORDER BY company,surname,first_name,title;
SELECT
id
FROM
row_calc
WHERE
fk = 43246 LIMIT 1;
DROP TABLE row_calc;
I have a base query which uses a view which uses another view, like this.
SELECT a,b,c,DEBIT_AMOUNT, CREDIT_AMOUNT FROM MAIN_VIEW WHERE a='foo' AND c='bar';
Here's the schema
create table BASE_TABLE (
id int not null auto_increment,
a varchar(20),
b varchar(20),
c varchar(20),
primary key (id));
create table OTHER_TABLE (
oid int not null auto_increment,
id int not null,
mtype varchar(10),
amount varchar(20),
primary key (oid));
create or replace view `MAIN_VIEW` AS
SELECT BT.a, BT.b, BT.c,SUB_VIEW.DEBIT_AMOUNT, SUB_VIEW.CREDIT_AMOUNT
FROM BASE_TABLE BT
LEFT JOIN SUB_VIEW ON SUB_VIEW.id = BT.id
create or replace view `SUB_VIEW` AS
SELECT BT.id,
( SELECT SUM(O.amount)
FROM OTHER_TABLE O
WHERE O.mtype = 'DR'
AND O.id = BT.id
) AS DEBIT_AMOUNT,
( SELECT SUM(O.amount)
FROM OTHER_TABLE O
WHERE O.mtype = 'CR'
AND O.id = BT.id
) AS CREDIT_AMOUNT
FROM BASE_TABLE BT
My query performs very slowly. To speed up execution, I modified MAIN_VIEW as shown below.
Since BASE_TABLE is already available in MAIN_VIEW, I thought I could fetch DEBIT_AMOUNT and CREDIT_AMOUNT right there rather than going through SUB_VIEW.
-- MAIN_VIEW ---
create or replace view `MAIN_VIEW` AS
SELECT BT.a, BT.b, BT.c,
( SELECT SUM(O.amount)
FROM OTHER_TABLE O
WHERE O.mtype = 'DR'
AND O.id = BT.id
) AS DEBIT_AMOUNT,
( SELECT SUM(O.amount)
FROM OTHER_TABLE O
WHERE O.mtype = 'CR'
AND O.id = BT.id
) AS CREDIT_AMOUNT
FROM BASE_TABLE BT
But after this modification, query performance is even worse. Can anyone help? I thought sub-views were bad for performance...
You need INDEX(id, mtype) on OTHER_TABLE (the columns can be in either order). This should make the subqueries faster, and hence the entire query faster.
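To see what the composite index does for the correlated subqueries, here is a small sketch using Python's sqlite3 as a stand-in for MySQL. The index name idx_other_id_mtype, the sample rows, and the numeric amount column are assumptions made for the illustration; SQLite's planner is not MySQL's, so treat the plan output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BASE_TABLE (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT);
CREATE TABLE OTHER_TABLE (
    oid INTEGER PRIMARY KEY,
    id INTEGER NOT NULL,
    mtype TEXT,
    amount NUMERIC          -- numeric here; the question stores it as varchar
);
-- Hypothetical index name; the columns are the ones suggested above.
CREATE INDEX idx_other_id_mtype ON OTHER_TABLE (id, mtype);

INSERT INTO BASE_TABLE VALUES (1, 'foo', 'x', 'bar'), (2, 'foo', 'y', 'baz');
INSERT INTO OTHER_TABLE (id, mtype, amount) VALUES
    (1, 'DR', 10), (1, 'DR', 5), (1, 'CR', 7), (2, 'CR', 3);
""")

# Same shape as the modified MAIN_VIEW, with the base query's filter applied.
QUERY = """
SELECT BT.a, BT.b, BT.c,
       (SELECT SUM(O.amount) FROM OTHER_TABLE O
         WHERE O.mtype = 'DR' AND O.id = BT.id) AS DEBIT_AMOUNT,
       (SELECT SUM(O.amount) FROM OTHER_TABLE O
         WHERE O.mtype = 'CR' AND O.id = BT.id) AS CREDIT_AMOUNT
FROM BASE_TABLE BT
WHERE BT.a = 'foo' AND BT.c = 'bar'
"""

rows = conn.execute(QUERY).fetchall()

# The fourth column of EXPLAIN QUERY PLAN holds the human-readable step.
plan = [step[3] for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

print(rows)
for step in plan:
    print(step)
```

With the index in place, each subquery can look up the rows for one id and one mtype directly instead of scanning OTHER_TABLE for every row of BASE_TABLE.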
I have a table containing form data. Each row contains a section_id and field_id. There are 50 distinct fields for each section. As users update an existing field, a new row is inserted with an updated date_modified. This keeps a rolling archive of changes.
The problem is that I'm getting erratic results when pulling the most recent set of fields to display on a page.
I've narrowed down the problem to a couple of fields, and have recreated a portion of the table in question on SQLFiddle.
Schema:
CREATE TABLE IF NOT EXISTS `cTable` (
`section_id` int(5) NOT NULL,
`field_id` int(5) DEFAULT NULL,
`content` text,
`user_id` int(11) NOT NULL,
`date_modified` datetime NOT NULL,
KEY `section_id` (`section_id`),
KEY `field_id` (`field_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This query shows all previously edited rows for field_id 39. There are five rows returned:
SELECT cT.*
FROM cTable cT
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Here's what I'm trying to do to pull the most recent row for field_id 39. No rows returned:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Record Count: 0;
If I try the same query on a different field_id, say 54, I get the correct result:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=54;
Record Count: 1;
Why would same query work on one field_id, but not the other?
In the subquery that computes the maxima, you need GROUP BY section_id, field_id. Grouping by field_id alone ignores the section_id on which you are filtering, so the maximum date may belong to a different section:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT section_id,field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY section_id,field_id
) AS max
ON(max.field_id =cT.field_id
AND max.date_modified=cT.date_modified
AND max.section_id=cT.section_id
)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
See Fiddle Demo
You are looking for the max(date_modified) per field_id. But you should look for the max(date_modified) per field_id where the section_id is 123. Otherwise you may find a date for which you find no match later.
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable
WHERE section_id = 123
GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Here is the SQL fiddle: http://www.sqlfiddle.com/#!2/0cefd8/19.
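The failure mode described in both answers can be reproduced on a few rows. This is a minimal sketch with Python's sqlite3 standing in for MySQL; the four rows (including the hypothetical section 456) are invented so that another section holds the latest date for field 39, and the derived table is aliased m rather than max.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cTable (
    section_id INTEGER NOT NULL,
    field_id INTEGER,
    content TEXT,
    date_modified TEXT NOT NULL
);
INSERT INTO cTable VALUES
    (123, 39, 'old',           '2014-01-01 10:00:00'),
    (123, 39, 'new',           '2014-01-02 10:00:00'),
    (456, 39, 'other section', '2014-01-03 10:00:00'),
    (123, 54, 'only',          '2014-01-01 09:00:00');
""")

# Maximum per field_id across ALL sections (the query from the question).
PER_FIELD = """
SELECT cT.content
FROM cTable cT
INNER JOIN (SELECT field_id, MAX(date_modified) AS date_modified
            FROM cTable GROUP BY field_id) AS m
        USING (field_id, date_modified)
WHERE cT.section_id = 123 AND cT.field_id = ?
"""

# Maximum per (section_id, field_id), joined on all three columns.
PER_SECTION_AND_FIELD = """
SELECT cT.content
FROM cTable cT
INNER JOIN (SELECT section_id, field_id, MAX(date_modified) AS date_modified
            FROM cTable GROUP BY section_id, field_id) AS m
        ON m.section_id = cT.section_id
       AND m.field_id = cT.field_id
       AND m.date_modified = cT.date_modified
WHERE cT.section_id = 123 AND cT.field_id = ?
"""

broken_39 = conn.execute(PER_FIELD, (39,)).fetchall()
broken_54 = conn.execute(PER_FIELD, (54,)).fetchall()
fixed_39 = conn.execute(PER_SECTION_AND_FIELD, (39,)).fetchall()

print(broken_39, broken_54, fixed_39)
```

For field 39 the overall maximum date belongs to section 456, so nothing in section 123 matches it; field 54 only exists in section 123, which is why the original query happened to work there.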
Ok imagine the following DB structure
USERS:
id | name | company_id
1 John 1
2 Jane 1
3 Jack 2
4 Jill 3
COMPANIES:
id | name
1 CompanyA
2 CompanyB
3 CompanyC
4 CompanyD
First I want to SELECT all the companies that have more than one user
SELECT
`c`.`name`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
Easy enough. Now I want to SELECT all the users that belong to a company that has more than one user. I have this combined query, but I don't think it is efficient:
SELECT * FROM `users` WHERE `company_id` = (
SELECT
`c`.`id`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
)
Basically I take the id returned from the first query (companies that have more than 1 user) and then query the users table to find all users with that company.
Why not
SELECT * FROM users u GROUP BY u.company_id HAVING COUNT(u.id) > 1
You don't really need any information from the companies table according to the data you say needs returning. "Now I want to SELECT all the users that belong to a company that has more than one user."
try this:
SELECT u.id,u.name,u.company_id FROM users u
inner join companies c on u.company_id = c.id
group by c.id
having count(u.id) > 1
Simplest way to get the users only is probably to keep the subquery but eliminate the join; since it's not a correlated subquery, it should be fairly efficient (obviously an index on company_id helps here);
SELECT u.* FROM USERS u WHERE company_id IN (
SELECT company_id FROM USERS GROUP BY company_id HAVING COUNT(*)>1
);
You could for example rewrite it as a LEFT JOIN, but I suspect it will actually be less efficient since you'd most likely need to use a DISTINCT when using a JOIN;
SELECT DISTINCT u.*
FROM USERS u
LEFT JOIN USERS u2
ON u.company_id=u2.company_id AND u.id<>u2.id
WHERE u2.id IS NOT NULL;
An SQLfiddle to test both.
Try also a semi-join query:
SELECT *
FROM users u
WHERE EXISTS (
SELECT null FROM users u1
WHERE u.company_id=u1.company_id
AND u.id <> u1.id
)
demo --> http://www.sqlfiddle.com/#!2/12dc34/2
Assuming that id is a primary key column, creating an index on the company_id column gives better performance.
If you are really obsessed with the performance of this query, create a composite index on columns company_id + id:
CREATE INDEX very_fast ON users( company_id, id );
Could you try this?
SELECT users.*
FROM users INNER JOIN
(
SELECT company_id
FROM users
GROUP BY company_id
HAVING COUNT(*) > 1
) x USING(company_id);
You should have an index INDEX(company_id)
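As a quick sanity check, the different formulations suggested in this thread can be run side by side on the sample data from the question. This sketch uses Python's sqlite3 as a stand-in for MySQL, and the index name idx_users_company is made up. It also runs a bare GROUP BY on users for comparison; SQLite accepts that form, whereas MySQL with ONLY_FULL_GROUP_BY enabled rejects it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
CREATE INDEX idx_users_company ON users (company_id);  -- hypothetical name
INSERT INTO users VALUES
    (1, 'John', 1), (2, 'Jane', 1), (3, 'Jack', 2), (4, 'Jill', 3);
""")

queries = {
    "in_subquery": """
        SELECT u.id, u.name FROM users u
        WHERE company_id IN (SELECT company_id FROM users
                             GROUP BY company_id HAVING COUNT(*) > 1)
    """,
    "left_join_distinct": """
        SELECT DISTINCT u.id, u.name FROM users u
        LEFT JOIN users u2
               ON u.company_id = u2.company_id AND u.id <> u2.id
        WHERE u2.id IS NOT NULL
    """,
    "exists": """
        SELECT u.id, u.name FROM users u
        WHERE EXISTS (SELECT NULL FROM users u1
                      WHERE u.company_id = u1.company_id AND u.id <> u1.id)
    """,
    "derived_join": """
        SELECT users.id, users.name FROM users
        INNER JOIN (SELECT company_id FROM users
                    GROUP BY company_id HAVING COUNT(*) > 1) x
             USING (company_id)
    """,
}

# Sort in Python so the comparison does not depend on row order.
results = {label: sorted(conn.execute(sql).fetchall())
           for label, sql in queries.items()}

# For contrast: grouping users directly yields one row per qualifying company.
group_by_only = conn.execute(
    "SELECT id, name FROM users GROUP BY company_id HAVING COUNT(id) > 1"
).fetchall()

for label, rows in results.items():
    print(label, rows)
print("group_by_only", group_by_only)
```

All four formulations return John and Jane here. The bare GROUP BY returns a single row for company 1, so it identifies the company but does not list every user in it.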
Performance Test
I tested the three queries from the answers.
Q1 = sub-query (with GROUP BY) and INNER JOIN
Q2 = LEFT JOIN and IS NOT NULL
Q3 = EXISTS
All queries return the same result. The test was done with the TPC-H lineitem table, and the problem is "find the lineitem rows whose order has more than one item".
Test Results
The result depends on whether you want to retrieve only the first N rows or all rows.
Q1 (get FIRST 10K rows) : 2.85 sec
Q2 (get FIRST 10K rows) : 0.03 sec
Q3 (get FIRST 10K rows) : 0.03 sec
Q1 (get all rows) : 8.19 sec
Q2 (get all rows) : 34.12 sec
Q3 (get all rows) : 29.54 sec
Schema and DATA
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM lineitem\G
*************************** 1. row ***************************
COUNT(*): 11997996
1 row in set (1.68 sec)
mysql> SHOW CREATE TABLE lineitem\G
*************************** 1. row ***************************
Table: lineitem
Create Table: CREATE TABLE `lineitem` (
`l_orderkey` int(11) NOT NULL,
`l_partkey` int(11) NOT NULL,
`l_suppkey` int(11) NOT NULL,
`l_linenumber` int(11) NOT NULL,
`l_quantity` decimal(15,2) NOT NULL,
`l_extendedprice` decimal(15,2) NOT NULL,
`l_discount` decimal(15,2) NOT NULL,
`l_tax` decimal(15,2) NOT NULL,
`l_returnflag` char(1) NOT NULL,
`l_linestatus` char(1) NOT NULL,
`l_shipDATE` date NOT NULL,
`l_commitDATE` date NOT NULL,
`l_receiptDATE` date NOT NULL,
`l_shipinstruct` char(25) NOT NULL,
`l_shipmode` char(10) NOT NULL,
`l_comment` varchar(44) NOT NULL,
PRIMARY KEY (`l_orderkey`,`l_linenumber`),
KEY `l_orderkey` (`l_orderkey`),
KEY `l_partkey` (`l_partkey`,`l_suppkey`),
CONSTRAINT `lineitem_ibfk_1` FOREIGN KEY (`l_orderkey`) REFERENCES `orders` (`o_orderkey`),
CONSTRAINT `lineitem_ibfk_2` FOREIGN KEY (`l_partkey`, `l_suppkey`) REFERENCES `partsupp` (`ps_partkey`, `ps_suppkey`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
Queries
Q1 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey)
LIMIT 10000;
Q2 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL
LIMIT 10000;
Q3 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
)
LIMIT 10000;
Retrieve all rows
Q1 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey);
Q2 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL;
Q3 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
);
(query updated for cdhowie comments)
Here's the situation.
I want to "count the number of tasks assigned to each worker within kind 1,2 of task location AND kind 3,4 of task department".
Suppose I have the following tables
Task : id, Name
Task_Worker_Combi : Task_id, Worker_id
Worker : id, Name
Task_Location_Combi : Task_id, Location_id
Task_Department_Combi : Task_id, Department_id
Location : id, Name
Department : id, Name
I got as far as the following (however, since it takes forever, there must be something wrong with the query):
SELECT W.id, W.Name, COUNT(TWC.Task_id) AS Count
FROM Worker AS W
LEFT JOIN Task_Worker_Combi AS TWC
ON (W.id=TWC.Worker_id)
WHERE W.id>0 AND TWC.Task_id IN
(
SELECT T.id
FROM Task as T
LEFT JOIN (Task_Location_Combi AS TLC, Task_Department_Combi AS TDC)
ON (T.id=TLC.Task_id AND T.id=TDC.Task_id)
WHERE 1 AND TLC.Location_id IN (1,2) AND TDC.Department_id IN (3,4)
GROUP BY T.id
)
GROUP BY W.id
ORDER BY W.Name
Without this subquery it returns "the number of tasks assigned to each worker unconditionally" fine.
AND TWC.Task_id IN
(
SELECT T.id
FROM Task as T
LEFT JOIN (Task_Location_Combi AS TLC, Task_Department_Combi AS TDC)
ON (T.id=TLC.Task_id AND T.id=TDC.Task_id)
WHERE 1 AND TLC.Location_id IN (1,2) AND TDC.Department_id IN (3,4)
GROUP BY T.id
)
Where did I go wrong, and how would you rewrite this query so that it runs efficiently? Please help. I have been stuck on this for over a week now!
The actual query is the following (read Job as Task and Industry as Worker in the simplified query above).
EXPLAIN SELECT id, Name, COUNT( J.Industry ) AS Count
FROM industry_db.industry AS I
LEFT JOIN job_db.industry AS J ON ( I.id = J.Industry )
WHERE I.id >0
AND J.Job
IN (
SELECT t1.id
FROM job_db.job AS t1
LEFT JOIN (
company_db.company AS t2, job_db.industry AS t3, location_db.city AS t4, job_db.function AS t5, job_db.tag AS t6, job_db.degree AS t7, location_db.state AS t8, location_db.group AS t9
) ON ( t1.Company = t2.id
AND t1.id = t3.Job
AND t1.City = t4.id
AND t1.id = t5.Job
AND t1.id = t6.Job
AND t1.id = t7.Job
AND t1.State = t8.id
AND t1.State_Group = t9.id )
WHERE 1
AND t1.Open = '1'
GROUP BY t1.id)
GROUP BY id
HAVING Count >0
ORDER BY Name
And the Explain result from phpmyadmin is the following.
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | I | range | PRIMARY,id | PRIMARY | 1 | NULL | 39 | Using where; Using temporary; Using filesort
1 | PRIMARY | J | ref | Industry | Industry | 1 | industry_db.I.id | 403 | Using where
2 | DEPENDENT SUBQUERY | t1 | index | NULL | PRIMARY | 4 | NULL | 2868 | Using where
2 | DEPENDENT SUBQUERY | t9 | eq_ref | PRIMARY | PRIMARY | 1 | job_db.t1.State_Group | 1 | Using index
2 | DEPENDENT SUBQUERY | t2 | eq_ref | PRIMARY | PRIMARY | 2 | job_db.t1.Company | 1 | Using index
2 | DEPENDENT SUBQUERY | t8 | eq_ref | PRIMARY | PRIMARY | 1 | job_db.t1.State | 1 | Using index
2 | DEPENDENT SUBQUERY | t4 | eq_ref | PRIMARY | PRIMARY | 4 | job_db.t1.City | 1 | Using index
2 | DEPENDENT SUBQUERY | t7 | ref | PRIMARY | PRIMARY | 4 | job_db.t1.id | 1 | Using index
2 | DEPENDENT SUBQUERY | t3 | ref | Job | Job | 4 | job_db.t1.id | 1 | Using index
2 | DEPENDENT SUBQUERY | t5 | ref | PRIMARY | PRIMARY | 4 | job_db.t7.Job | 1 | Using index
2 | DEPENDENT SUBQUERY | t6 | ref | PRIMARY | PRIMARY | 4 | job_db.t7.Job | 2 | Using index
Try this:
SELECT w.id, w.Name, COUNT( tw.Task_id )
FROM Worker AS w
LEFT JOIN Task_Worker_Combi AS tw
ON(
w.id = tw.Worker_id AND
EXISTS( SELECT Task_id FROM Task_Location_Combi
WHERE Task_id = tw.Task_id AND Location_id IN(1, 2) ) AND
EXISTS( SELECT Task_id FROM Task_Department_Combi
WHERE Task_id = tw.Task_id AND Department_id IN(3, 4) )
)
GROUP BY w.id, w.Name
ORDER BY w.Name
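Here is a small sketch of this EXISTS-in-the-join-condition approach, run through Python's sqlite3 as a stand-in for MySQL. The column names follow the schema in the question (Task_id), a GROUP BY per worker is included so the count is per worker, and the two workers and three tasks are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Worker (id INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Task_Worker_Combi (Task_id INTEGER, Worker_id INTEGER);
CREATE TABLE Task_Location_Combi (Task_id INTEGER, Location_id INTEGER);
CREATE TABLE Task_Department_Combi (Task_id INTEGER, Department_id INTEGER);

-- Hypothetical sample data.
INSERT INTO Worker VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Task_Worker_Combi VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO Task_Location_Combi VALUES (10, 1), (11, 5), (12, 2);
INSERT INTO Task_Department_Combi VALUES (10, 3), (11, 3), (12, 9);
""")

# Only task 10 has a location in (1, 2) AND a department in (3, 4).
QUERY = """
SELECT w.id, w.Name, COUNT(tw.Task_id) AS task_count
FROM Worker AS w
LEFT JOIN Task_Worker_Combi AS tw
  ON w.id = tw.Worker_id
 AND EXISTS (SELECT 1 FROM Task_Location_Combi AS tl
             WHERE tl.Task_id = tw.Task_id AND tl.Location_id IN (1, 2))
 AND EXISTS (SELECT 1 FROM Task_Department_Combi AS td
             WHERE td.Task_id = tw.Task_id AND td.Department_id IN (3, 4))
GROUP BY w.id, w.Name
ORDER BY w.Name
"""

counts = conn.execute(QUERY).fetchall()
print(counts)
```

Because the conditions sit in the ON clause of the LEFT JOIN rather than in WHERE, workers with no qualifying task still appear with a count of 0. Add HAVING task_count > 0 if only workers with matching tasks are wanted.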