I want to optimize my SQL query because of its long response time (MySQL)

First I used whereHas, but then I decided to do it this way. This way performs better than whereHas, but it still doesn't satisfy me: the query response time is 873 ms, and the table holds 400k+ rows.
select count(*) as aggregate
from `orders`
where (`pickup_address_id` in (
select `id`
from `addresses`
where `region_id` = 12)
or `delivery_address_id` in (
select `id`
from `addresses`
where `region_id` = 12)
) and `orders`.`status` = 2

Try this:
select count(distinct o.`id`) as aggregate
from `orders` o
inner join `addresses` a ON a.`id` IN (o.`pickup_address_id`, o.`delivery_address_id`)
AND a.`region_id` = 12
where o.`status` = 2
Alternatively:
SELECT count(distinct id) as aggregate
FROM (
select o.`id`
from `orders` o
inner join `addresses` a ON a.`id` = o.`pickup_address_id`
AND a.`region_id` = 12
where o.`status` = 2
UNION
select o.`id`
from `orders` o
inner join `addresses` a ON a.`id` = o.`delivery_address_id`
AND a.`region_id` = 12
where o.`status` = 2
) t
But I'm not sure you'll improve much on looking through 400K rows in less than a second.
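To see why the join version counts DISTINCT order ids, here is a minimal sketch that runs the original query and the join rewrite on a few made-up rows. SQLite (via Python's sqlite3) stands in for MySQL, and the sample data is hypothetical; it only illustrates that one order can match two addresses in the region.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses (id INTEGER PRIMARY KEY, region_id INTEGER NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    pickup_address_id INTEGER,
    delivery_address_id INTEGER,
    status INTEGER NOT NULL
);
INSERT INTO addresses VALUES (1, 12), (2, 12), (3, 7);
INSERT INTO orders VALUES
    (1, 1, 2, 2),  -- both addresses in region 12
    (2, 1, 3, 2),  -- only the pickup address in region 12
    (3, 3, 3, 2),  -- neither address in region 12
    (4, 2, 2, 1),  -- region 12, but status is not 2
    (5, 1, 1, 2);  -- same region-12 address for pickup and delivery
""")

ORIGINAL = """
SELECT COUNT(*) FROM orders
WHERE (pickup_address_id IN (SELECT id FROM addresses WHERE region_id = 12)
    OR delivery_address_id IN (SELECT id FROM addresses WHERE region_id = 12))
  AND status = 2
"""

JOIN_TEMPLATE = """
SELECT {agg} FROM orders o
INNER JOIN addresses a
    ON a.id IN (o.pickup_address_id, o.delivery_address_id)
   AND a.region_id = 12
WHERE o.status = 2
"""

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

original_count = scalar(ORIGINAL)
distinct_count = scalar(JOIN_TEMPLATE.format(agg="COUNT(DISTINCT o.id)"))
plain_count = scalar(JOIN_TEMPLATE.format(agg="COUNT(*)"))

print(original_count, distinct_count, plain_count)  # 3 3 4
```

Order 1 joins to two addresses, so a plain COUNT(*) over the join counts it twice; COUNT(DISTINCT o.id) matches the original query.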

First, you can avoid evaluating the same subquery twice by using a Common Table Expression (available in MySQL 8.0 and later):
WITH CTE(id) AS (
SELECT id
FROM addresses
WHERE region_id = 12
)
This CTE would be evaluated once.
Second, get the row count from the orders table joined with the CTE wherever pickup_address_id or delivery_address_id appears in the CTE.
WITH CTE(id) AS (
SELECT id
FROM addresses
WHERE region_id = 12
)
SELECT COUNT(*)
FROM orders
CROSS JOIN CTE ON CTE.id = orders.delivery_address_id
OR CTE.id = orders.pickup_address_id
Finally, add the filter status = 2, and the query becomes:
WITH CTE(id) AS (
SELECT id
FROM addresses
WHERE region_id = 12
)
SELECT COUNT(*)
FROM orders
CROSS JOIN CTE ON CTE.id = orders.delivery_address_id
OR CTE.id = orders.pickup_address_id
WHERE orders.status = 2
Also you should have the following indexes:
addresses table:
INDEX (region_id)
orders table:
INDEX (pickup_address_id),
INDEX (delivery_address_id),
INDEX (status)
Give it a try.
With empty tables I've got this
Schema (MySQL v8.0)
create table addresses (
id int primary key,
region_id int not null,
index(region_id)
);
create table orders (
id int primary key,
pickup_address_id int,
delivery_address_id int,
status int not null,
index (pickup_address_id),
index (delivery_address_id),
index(status),
foreign key (pickup_address_id) references addresses(id),
foreign key (delivery_address_id) references addresses(id)
);
Query #1
explain with cte(id) as (
select id from addresses where region_id = 12)
select count(*) from orders
cross join cte on cte.id = orders.delivery_address_id
or cte.id = orders.pickup_address_id
where status = 2;
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | SIMPLE | orders | | ref | pickup_address_id,delivery_address_id,status | status | 4 | const | 1 | 100 |
1 | SIMPLE | addresses | | ref | PRIMARY,region_id | region_id | 4 | const | 1 | 100 | Using where; Using index

Related

MySQL Row Position Using ORDER BY

I'm trying to get this to work. When I run the SELECT on the whole dataset, I know that the record with cust_number 43246 shows up in position 6 (when using ORDER BY), but this code returns position 37327, which is its unordered position.
SELECT
x.position,
x.cust_number,
x.company,
x.surname,
x.first_name,
x.title
FROM
(SELECT
@rownum:=@rownum + 1 AS position,
c.cust_number,
company,
surname,
first_name,
title
FROM
1_customer_records c
LEFT JOIN addresses a ON c.fk_addresses_id = a.id
JOIN (SELECT @rownum:=0) r
ORDER BY a.company , c.surname , c.first_name , c.title) x
WHERE
x.cust_number = 43246;
Here is another approach using a temp table
CREATE TEMPORARY TABLE row_calc (id INT AUTO_INCREMENT, fk INT NULL, PRIMARY KEY (id)) ENGINE=MEMORY;
INSERT INTO row_calc(fk)
SELECT
cust_number
FROM
1_customer_records c
LEFT JOIN
addresses a ON c.fk_addresses_id = a.id
ORDER BY company,surname,first_name,title;
SELECT
id
FROM
row_calc
WHERE
fk = 43246 LIMIT 1;
DROP TABLE row_calc;
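On MySQL 8.0 or later, a window function is another option; this is a different technique from the user-variable and temp-table approaches above, not something the original posts used. The sketch below runs ROW_NUMBER() in SQLite (3.25+) through Python's sqlite3 as a stand-in for MySQL. The table name customer_records and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical data; customer_records stands in for 1_customer_records.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE addresses (id INTEGER PRIMARY KEY, company TEXT);
CREATE TABLE customer_records (
    cust_number INTEGER PRIMARY KEY,
    surname TEXT,
    first_name TEXT,
    title TEXT,
    fk_addresses_id INTEGER
);
INSERT INTO addresses VALUES (1, 'Acme'), (2, 'Zeta');
INSERT INTO customer_records VALUES
    (43246, 'Smith', 'Al', 'Mr', 1),
    (10001, 'Adams', 'Bo', 'Ms', 1),
    (10002, 'Young', 'Cy', 'Dr', 2);
""")

# Number the rows in the requested sort order, then pick out one customer.
POSITION_SQL = """
SELECT x.position
FROM (
    SELECT c.cust_number,
           ROW_NUMBER() OVER (
               ORDER BY a.company, c.surname, c.first_name, c.title
           ) AS position
    FROM customer_records c
    LEFT JOIN addresses a ON c.fk_addresses_id = a.id
) x
WHERE x.cust_number = ?
"""

position = conn.execute(POSITION_SQL, (43246,)).fetchone()[0]
print(position)  # 2: Acme/Adams sorts before Acme/Smith
```

Because the numbering is defined by the OVER (ORDER BY ...) clause, the position does not depend on the order in which rows are read from the tables.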

Mysql View vs Query performance

I have a base query which uses a view which uses another view, like this.
SELECT a,b,c,DEBIT_AMOUNT, CREDIT_AMOUNT FROM MAIN_VIEW WHERE a='foo' AND c='bar';
Here's the schema
create table BASE_TABLE (
id int not null auto_increment,
a varchar(20),
b varchar(20),
c varchar(20),
primary key (id));
create table OTHER_TABLE (
oid int not null auto_increment,
id int not null,
mtype varchar(10),
amount varchar(20),
primary key (oid));
create or replace view `MAIN_VIEW` AS
SELECT BT.a, BT.b, BT.c,SUB_VIEW.DEBIT_AMOUNT, SUB_VIEW.CREDIT_AMOUNT
FROM BASE_TABLE BT
LEFT JOIN SUB_VIEW ON SUB_VIEW.id = BT.id;
create or replace view `SUB_VIEW` AS
SELECT BT.id,
( SELECT SUM(O.amount)
FROM OTHER_TABLE O
WHERE O.mtype = 'DR'
AND O.id = BT.id
) AS DEBIT_AMOUNT,
( SELECT SUM(O.amount)
FROM OTHER_TABLE O
WHERE O.mtype = 'CR'
AND O.id = BT.id
) AS CREDIT_AMOUNT
FROM BASE_TABLE BT;
My query's performance is very slow. To speed up execution, I modified MAIN_VIEW as shown below.
Since BASE_TABLE is already available in MAIN_VIEW, I thought of fetching DEBIT_AMOUNT and CREDIT_AMOUNT right there rather than going through SUB_VIEW.
-- MAIN_VIEW ---
create or replace view `MAIN_VIEW` AS
SELECT BT.a, BT.b, BT.c,
( SELECT SUM(O.amount)
FROM OTHER_TABLE O
WHERE O.mtype = 'DR'
AND O.id = BT.id
) AS DEBIT_AMOUNT,
( SELECT SUM(O.amount)
FROM OTHER_TABLE O
WHERE O.mtype = 'CR'
AND O.id = BT.id
) AS CREDIT_AMOUNT
FROM BASE_TABLE BT
But after this modification, query performance is even worse. Can anyone help? I thought subviews were bad for performance...
You need INDEX(id, mtype) on OTHER_TABLE (the columns can be in either order). This should make the correlated subqueries faster, and hence the entire query faster.
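As a rough illustration of that index, the sketch below creates it and checks that the lookup used by each subquery can be served from it. SQLite (through Python's sqlite3) stands in for MySQL, so the plan text differs from MySQL's EXPLAIN output; the index name idx_other_id_mtype is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OTHER_TABLE (
    oid INTEGER PRIMARY KEY,
    id INTEGER NOT NULL,
    mtype VARCHAR(10),
    amount VARCHAR(20)
);
-- idx_other_id_mtype is a hypothetical name. In MySQL the equivalent is:
--   ALTER TABLE OTHER_TABLE ADD INDEX idx_other_id_mtype (id, mtype);
CREATE INDEX idx_other_id_mtype ON OTHER_TABLE (id, mtype);
""")

# The same lookup each correlated subquery performs for one BASE_TABLE row.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT SUM(amount) FROM OTHER_TABLE WHERE mtype = 'DR' AND id = ?",
    (1,),
).fetchall()

# The fourth column of each plan row is the human-readable detail string.
plan_details = [row[3] for row in plan]
uses_index = any("idx_other_id_mtype" in detail for detail in plan_details)
print(plan_details, uses_index)
```

On MySQL, running EXPLAIN on the view query before and after adding the index is the way to confirm the same effect there.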

GROUP BY with MAX date field - erratic results

Have a table containing form data. Each row contains a section_id and field_id. There are 50 distinct fields for each section. As users update an existing field, a new row is inserted with an updated date_modified. This keeps a rolling archive of changes.
The problem is that I'm getting erratic results when pulling the most recent set of fields to display on a page.
I've narrowed down the problem to a couple of fields, and have recreated a portion of the table in question on SQLFiddle.
Schema:
CREATE TABLE IF NOT EXISTS `cTable` (
`section_id` int(5) NOT NULL,
`field_id` int(5) DEFAULT NULL,
`content` text,
`user_id` int(11) NOT NULL,
`date_modified` datetime NOT NULL,
KEY `section_id` (`section_id`),
KEY `field_id` (`field_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
This query shows all previously edited rows for field_id 39. There are five rows returned:
SELECT cT.*
FROM cTable cT
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Here's what I'm trying to do to pull the most recent row for field_id 39. No rows returned:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Record Count: 0;
If I try the same query on a different field_id, say 54, I get the correct result:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=54;
Record Count: 1;
Why would same query work on one field_id, but not the other?
In the subquery that computes the maxima, you need GROUP BY section_id, field_id. Using just GROUP BY field_id ignores the section_id that your outer filter is applied to:
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT section_id,field_id, MAX(date_modified) AS date_modified
FROM cTable GROUP BY section_id,field_id
) AS max
ON(max.field_id =cT.field_id
AND max.date_modified=cT.date_modified
AND max.section_id=cT.section_id
)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
See Fiddle Demo
You are looking for the max(date_modified) per field_id. But you should look for the max(date_modified) per field_id where the section_id is 123. Otherwise you may find a date for which you find no match later.
SELECT cT.*
FROM cTable cT
INNER JOIN (
SELECT field_id, MAX(date_modified) AS date_modified
FROM cTable
WHERE section_id = 123
GROUP BY field_id
) AS max USING (field_id, date_modified)
WHERE
cT.section_id = 123 AND
cT.field_id=39;
Here is the SQL fiddle: http://www.sqlfiddle.com/#!2/0cefd8/19.
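The failure is easy to reproduce with a handful of rows. In the sketch below (SQLite via Python's sqlite3 as a stand-in for MySQL, with hypothetical sample rows), the newest row for field_id 39 belongs to a different section, so the per-field maximum never matches a row in section 123.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cTable (
    section_id INTEGER NOT NULL,
    field_id INTEGER,
    content TEXT,
    user_id INTEGER NOT NULL,
    date_modified TEXT NOT NULL
);
-- Hypothetical rows: the newest field 39 edit is in section 456.
INSERT INTO cTable VALUES
    (123, 39, 'old',   1, '2014-01-01 10:00:00'),
    (123, 39, 'newer', 1, '2014-01-02 10:00:00'),
    (456, 39, 'other', 2, '2014-02-01 10:00:00'),
    (123, 54, 'only',  1, '2014-01-05 10:00:00');
""")

def latest(group_cols, field_id):
    """Return the content of the latest rows in section 123 for field_id,
    computing MAX(date_modified) per group_cols."""
    sql = f"""
        SELECT cT.content
        FROM cTable cT
        INNER JOIN (
            SELECT {group_cols}, MAX(date_modified) AS date_modified
            FROM cTable GROUP BY {group_cols}
        ) AS m USING ({group_cols}, date_modified)
        WHERE cT.section_id = 123 AND cT.field_id = ?
    """
    return [row[0] for row in conn.execute(sql, (field_id,))]

broken_39 = latest("field_id", 39)             # max date comes from section 456
broken_54 = latest("field_id", 54)             # works only by coincidence
fixed_39 = latest("section_id, field_id", 39)  # max date per section and field

print(broken_39, broken_54, fixed_39)
```

Field 54 happens to work in the original query only because its newest row is in section 123.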

Is there a more efficient way to write this query?

Ok imagine the following DB structure
USERS:
id | name | company_id
1 John 1
2 Jane 1
3 Jack 2
4 Jill 3
COMPANIES:
id | name
1 CompanyA
2 CompanyB
3 CompanyC
4 CompanyD
First I want to SELECT all the companies that have more than one user
SELECT
`c`.`name`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
Easy enough. Now I want to SELECT all the users that belong to a company that has more than one user. I have this combined query, but I think it is not efficient:
SELECT * FROM `users` WHERE `company_id` = (
SELECT
`c`.`id`
FROM `companies` AS `c`
LEFT JOIN `users` AS `u` ON `c`.`id` = `u`.`company_id`
GROUP BY `c`.`id`
HAVING COUNT(`u`.`id`) > 1
)
Basically I take the id returned from the first query (companies that have more than 1 user) and then query the users table to find all users with that company.
Why not
SELECT * FROM users u GROUP BY u.company_id HAVING COUNT(u.id) > 1
You don't really need any information from the companies table according to the data you say needs returning. "Now I want to SELECT all the users that belong to a company that has more than one user."
try this:
SELECT u.id,u.name,u.company_id FROM users u
inner join companies c on u.company_id = c.id
group by c.id
having count(u.id) > 1
Simplest way to get the users only is probably to keep the subquery but eliminate the join; since it's not a correlated subquery, it should be fairly efficient (obviously an index on company_id helps here);
SELECT u.* FROM USERS u WHERE company_id IN (
SELECT company_id FROM USERS GROUP BY company_id HAVING COUNT(*)>1
);
You could for example rewrite it as a LEFT JOIN, but I suspect it will actually be less efficient since you'd most likely need to use a DISTINCT when using a JOIN;
SELECT DISTINCT u.*
FROM USERS u
LEFT JOIN USERS u2
ON u.company_id=u2.company_id AND u.id<>u2.id
WHERE u2.id IS NOT NULL;
An SQLfiddle to test both.
Try also a semi-join query:
SELECT *
FROM users u
WHERE EXISTS (
SELECT null FROM users u1
WHERE u.company_id=u1.company_id
AND u.id <> u1.id
)
demo --> http://www.sqlfiddle.com/#!2/12dc34/2
Assuming that id is a primary key column, creating an index on the company_id column gives better performance.
If you are really obsessed with the performance of this query, create a composite index on columns company_id + id:
CREATE INDEX very_fast ON users( company_id, id );
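For a quick sanity check, the sketch below runs the IN-subquery and EXISTS forms on the sample users from the question, using SQLite through Python's sqlite3 as a stand-in for MySQL. It also shows that grouping users by company_id yields one row per company, which identifies the companies but cannot list the individual users by itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
INSERT INTO users VALUES
    (1, 'John', 1), (2, 'Jane', 1), (3, 'Jack', 2), (4, 'Jill', 3);
""")

def names(sql):
    """Return the sorted user names produced by a query."""
    return sorted(row[0] for row in conn.execute(sql))

in_form = names("""
    SELECT u.name FROM users u
    WHERE company_id IN (
        SELECT company_id FROM users GROUP BY company_id HAVING COUNT(*) > 1
    )
""")

exists_form = names("""
    SELECT u.name FROM users u
    WHERE EXISTS (
        SELECT NULL FROM users u1
        WHERE u.company_id = u1.company_id AND u.id <> u1.id
    )
""")

# Grouping collapses each company to a single row.
grouped = conn.execute("""
    SELECT company_id, COUNT(*) FROM users
    GROUP BY company_id HAVING COUNT(*) > 1
""").fetchall()

print(in_form, exists_form, grouped)
```

Both user-level queries return John and Jane, while the grouped query returns only the single row for company 1.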
Could you try this?
SELECT users.*
FROM users INNER JOIN
(
SELECT company_id
FROM users
GROUP BY company_id
HAVING COUNT(*) > 1
) x USING(company_id);
You should have an index INDEX(company_id)
Performance Test
I have tested 3 queries in answers.
Q1 = sub-query (with GROUP BY) and INNER JOIN
Q2 = LEFT JOIN and IS NOT NULL
Q3 = EXISTS
All queries return the same result. The test was done with the TPC-H lineitem table, and the problem is "find lineitem rows whose order has more than 1 item".
Test Results
The answer depends on whether you want to retrieve the first N rows or all rows.
Q1 (get FIRST 10K rows) : 2.85 sec
Q2 (get FIRST 10K rows) : 0.03 sec
Q3 (get FIRST 10K rows) : 0.03 sec
Q1 (get all rows) : 8.19 sec
Q2 (get all rows) : 34.12 sec
Q3 (get all rows) : 29.54 sec
Schema and DATA
mysql> SELECT SQL_NO_CACHE COUNT(*) FROM lineitem\G
*************************** 1. row ***************************
COUNT(*): 11997996
1 row in set (1.68 sec)
mysql> SHOW CREATE TABLE lineitem\G
*************************** 1. row ***************************
Table: lineitem
Create Table: CREATE TABLE `lineitem` (
`l_orderkey` int(11) NOT NULL,
`l_partkey` int(11) NOT NULL,
`l_suppkey` int(11) NOT NULL,
`l_linenumber` int(11) NOT NULL,
`l_quantity` decimal(15,2) NOT NULL,
`l_extendedprice` decimal(15,2) NOT NULL,
`l_discount` decimal(15,2) NOT NULL,
`l_tax` decimal(15,2) NOT NULL,
`l_returnflag` char(1) NOT NULL,
`l_linestatus` char(1) NOT NULL,
`l_shipDATE` date NOT NULL,
`l_commitDATE` date NOT NULL,
`l_receiptDATE` date NOT NULL,
`l_shipinstruct` char(25) NOT NULL,
`l_shipmode` char(10) NOT NULL,
`l_comment` varchar(44) NOT NULL,
PRIMARY KEY (`l_orderkey`,`l_linenumber`),
KEY `l_orderkey` (`l_orderkey`),
KEY `l_partkey` (`l_partkey`,`l_suppkey`),
CONSTRAINT `lineitem_ibfk_1` FOREIGN KEY (`l_orderkey`) REFERENCES `orders` (`o_orderkey`),
CONSTRAINT `lineitem_ibfk_2` FOREIGN KEY (`l_partkey`, `l_suppkey`) REFERENCES `partsupp` (`ps_partkey`, `ps_suppkey`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
Queries
Q1 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey)
LIMIT 10000;
Q2 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL
LIMIT 10000;
Q3 FIRST 10K
SELECT SQL_NO_CACHE DISTINCT u.l_orderkey, u.l_linenumber
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
)
LIMIT 10000;
retrieve entire rows
Q1 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u INNER JOIN
(
SELECT l_orderkey
FROM lineitem
GROUP BY l_orderkey
HAVING COUNT(*) > 1
) x USING (l_orderkey);
Q2 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
LEFT JOIN lineitem u2
ON u.l_orderkey=u2.l_orderkey AND u.l_linenumber<>u2.l_linenumber
WHERE u2.l_linenumber IS NOT NULL;
Q3 ALL
SELECT SQL_NO_CACHE COUNT(*)
FROM lineitem u
WHERE EXISTS (
SELECT null FROM lineitem u1
WHERE u.l_orderkey=u1.l_orderkey
AND u.l_linenumber <> u1.l_linenumber
);

mysql counting with subquery rewriting problem

(query updated for cdhowie comments)
Here's the situation.
I want to "count the number of tasks assigned to each worker within kind 1,2 of task location AND kind 3,4 of task department".
Suppose I have the following tables
Task : id, Name
Task_Worker_Combi : Task_id, Worker_id
Worker : id, Name
Task_Location_Combi : Task_id, Location_id
Task_Department_Combi : Task_id, Department_id
Location : id, Name
Department : id, Name
I got as far as the following (however, since it takes forever, there must be something wrong with the query):
SELECT W.id, W.Name, COUNT(TWC.Task_id) AS Count
FROM Worker AS W
LEFT JOIN Task_Worker_Combi AS TWC
ON (W.id=TWC.Worker_id)
WHERE W.id>0 AND TWC.Task_id IN
(
SELECT T.id
FROM Task as T
LEFT JOIN (Task_Location_Combi AS TLC, Task_Department_Combi AS TDC)
ON (T.id=TLC.Task_id AND T.id=TDC.Task_id)
WHERE 1 AND TLC.Location_id IN (1,2) AND TDC.Department_id IN (3,4)
GROUP BY T.id
)
GROUP BY W.id
ORDER BY W.Name
Without this subquery it returns "the number of tasks assigned to each worker unconditionally" fine.
AND TWC.Task_id IN
(
SELECT T.id
FROM Task as T
LEFT JOIN (Task_Location_Combi AS TLC, Task_Department_Combi AS TDC)
ON (T.id=TLC.Task_id AND T.id=TDC.Task_id)
WHERE 1 AND TLC.Location_id IN (1,2) AND TDC.Department_id IN (3,4)
GROUP BY T.id
)
What went wrong, and how would you rewrite this query to work efficiently? Can somebody please help? I've been stuck here for over a week now!
The actual query is the following. (assume Job as Task and Worker as Industry from above simplified query)
EXPLAIN SELECT id, Name, COUNT( J.Industry ) AS Count
FROM industry_db.industry AS I
LEFT JOIN job_db.industry AS J ON ( I.id = J.Industry )
WHERE I.id >0
AND J.Job
IN (
SELECT t1.id
FROM job_db.job AS t1
LEFT JOIN (
company_db.company AS t2, job_db.industry AS t3, location_db.city AS t4, job_db.function AS t5, job_db.tag AS t6, job_db.degree AS t7, location_db.state AS t8, location_db.group AS t9
) ON ( t1.Company = t2.id
AND t1.id = t3.Job
AND t1.City = t4.id
AND t1.id = t5.Job
AND t1.id = t6.Job
AND t1.id = t7.Job
AND t1.State = t8.id
AND t1.State_Group = t9.id )
WHERE 1
AND t1.Open = '1'
GROUP BY t1.id)
GROUP BY id
HAVING Count >0
ORDER BY Name
And the Explain result from phpmyadmin is the following.
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | PRIMARY | I | range | PRIMARY,id | PRIMARY | 1 | NULL | 39 | Using where; Using temporary; Using filesort
1 | PRIMARY | J | ref | Industry | Industry | 1 | industry_db.I.id | 403 | Using where
2 | DEPENDENT SUBQUERY | t1 | index | NULL | PRIMARY | 4 | NULL | 2868 | Using where
2 | DEPENDENT SUBQUERY | t9 | eq_ref | PRIMARY | PRIMARY | 1 | job_db.t1.State_Group | 1 | Using index
2 | DEPENDENT SUBQUERY | t2 | eq_ref | PRIMARY | PRIMARY | 2 | job_db.t1.Company | 1 | Using index
2 | DEPENDENT SUBQUERY | t8 | eq_ref | PRIMARY | PRIMARY | 1 | job_db.t1.State | 1 | Using index
2 | DEPENDENT SUBQUERY | t4 | eq_ref | PRIMARY | PRIMARY | 4 | job_db.t1.City | 1 | Using index
2 | DEPENDENT SUBQUERY | t7 | ref | PRIMARY | PRIMARY | 4 | job_db.t1.id | 1 | Using index
2 | DEPENDENT SUBQUERY | t3 | ref | Job | Job | 4 | job_db.t1.id | 1 | Using index
2 | DEPENDENT SUBQUERY | t5 | ref | PRIMARY | PRIMARY | 4 | job_db.t7.Job | 1 | Using index
2 | DEPENDENT SUBQUERY | t6 | ref | PRIMARY | PRIMARY | 4 | job_db.t7.Job | 2 | Using index
Try this:
SELECT w.id, w.Name, COUNT( tw.Task_id )
FROM Worker AS w
LEFT JOIN Task_Worker_Combi AS tw
ON(
w.id = tw.Worker_id AND
EXISTS( SELECT Task_id FROM Task_Location_Combi
WHERE Task_id = tw.Task_id AND Location_id IN(1, 2) ) AND
EXISTS( SELECT Task_id FROM Task_Department_Combi
WHERE Task_id = tw.Task_id AND Department_id IN(3, 4) )
)