SQL query chokes the server - MySQL

When I run this query, the MySQL server's CPU usage stays at 100% and chokes the server. What am I doing wrong?
SELECT *
FROM projects p, orders o, invoices i
WHERE p.project_state = 'product'
AND (
p.status = 'expired'
OR p.status = 'finished'
OR p.status = 'open'
)
AND p.user_id = '12'
AND i.projectid = 0
GROUP BY i.invoiceid
LIMIT 0, 30

You are including the orders table but not joining to it. This produces a full cross join that can easily generate millions of rows.

Use EXPLAIN to find out the query plan. From that you can work out what indexes will be required. Those indexes will vastly improve performance.
Also, you are not restricting the orders table in any way.
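For example, run EXPLAIN on the query as written; the candidate index below is only an illustrative guess from the WHERE clause (its name and column order are assumptions, so confirm against what EXPLAIN actually reports):
EXPLAIN
SELECT *
FROM projects p, orders o, invoices i
WHERE p.project_state = 'product'
  AND p.status IN ('expired', 'finished', 'open')
  AND p.user_id = '12'
  AND i.projectid = 0
GROUP BY i.invoiceid
LIMIT 0, 30;

-- A candidate index for the projects predicates:
CREATE INDEX idx_projects_user_state_status
ON projects (user_id, project_state, status);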

You didn't put any join conditions on the tables, so by default this produces a cross join. That means if you have 1,000 projects, 100,000 orders, and 100,000 invoices, the result set will be 10,000,000,000,000 (10 trillion) records.
You probably want to put some inner joins between those tables, as sketched below.
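For example, a sketch with explicit joins; the join columns (o.project_id and i.projectid referencing p.id) are assumptions, since the question doesn't show the real foreign keys:
SELECT p.*, o.*, i.*
FROM projects p
INNER JOIN orders o ON o.project_id = p.id    -- assumed foreign key
INNER JOIN invoices i ON i.projectid = p.id   -- assumed foreign key
WHERE p.project_state = 'product'
  AND p.status IN ('expired', 'finished', 'open')
  AND p.user_id = '12'
LIMIT 0, 30;
-- The original GROUP BY i.invoiceid is dropped here: SELECT * grouped by a
-- single column is ambiguous and rejected under ONLY_FULL_GROUP_BY.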

Related

MySQL slow left join on sort

I have millions of customers, and when I use a LEFT JOIN and then sort by a column it takes 4-5 seconds. Here is my query:
SELECT c.id AS id, o.description AS office_description, ... , d.type AS document_type, d.number AS document_number
FROM customers c INNER JOIN offices o ON (c.id_office = o.id)
INNER JOIN company cp ON (o.id_company = cp.id)
LEFT JOIN documents d ON (c.id = d.id_customer)
WHERE c.archive = 0
ORDER BY office_description
LIMIT 10
So when I remove the documents columns from my SELECT, the query is very fast.
Here is the query EXPLAIN (screenshot omitted).
I have 1 million customers; the other tables (company / office / documents) each have only 1 row.
I have indexes on c.archive and o.description, plus primary keys / foreign keys, of course. Here are the structures of these tables: http://sqlfiddle.com/#!9/a222f9
So I tried to build my query like this:
SELECT A.*, d.*
FROM (
SELECT c.id AS id, o.description AS office_description, ...
FROM customers c INNER JOIN offices o ON (c.id_office = o.id)
INNER JOIN company cp ON (o.id_company = cp.id)
WHERE c.archive = 0
ORDER BY o.description
LIMIT 10
) A LEFT JOIN documents d ON (A.id = d.id_customer)
And now, wow, it's very fast.
But I don't know if this is the best way to reduce the lag, or if I'm doing it wrong. I'd like to know if you know a better way to do it.
I hope there is an easier way, because this query will be complicated to use in my Phalcon project.
An explanation...
Your faster query can find the 10 rows before looking in documents. So, it needs only 10 probes into that table.
In the original query, the Optimizer was not too smart. It planned to execute the query as if there were no LIMIT, and decided to optimize the join to documents by fetching the entire table into the "join buffer" in RAM and building a hash index on it. While this would help some queries, it was a big waste for the mere 10 rows that you needed.
So, your reformulation convinced the Optimizer to do it a better way.
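If you want to verify that the join buffer is what hurt here, one diagnostic (not a fix, just a session-level experiment; the block_nested_loop flag exists in MySQL 5.6 and later) is:
SET SESSION optimizer_switch = 'block_nested_loop=off';
-- ... re-run the slow query and compare timings ...
SET SESSION optimizer_switch = 'block_nested_loop=on';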
If you had needed only one column from d, there is another way:
SELECT ...,
( SELECT col FROM d WHERE ... ) AS col,
... ((without the LEFT JOIN at all))
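Applied to the tables above, that form would look roughly like this, assuming document_number is the only column you need (the LIMIT 1 is an added guard in case a customer ever has several documents):
SELECT c.id AS id,
       o.description AS office_description,
       ( SELECT d.number
         FROM documents d
         WHERE d.id_customer = c.id
         LIMIT 1 ) AS document_number
FROM customers c
INNER JOIN offices o ON (c.id_office = o.id)
INNER JOIN company cp ON (o.id_company = cp.id)
WHERE c.archive = 0
ORDER BY o.description
LIMIT 10;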
As for an "easier" way, especially one that can be reverse-engineered into some 3rd-party package, I doubt it. (Packages tend to be crutches for getting started with databases. As you are finding out, you eventually need to learn more than they can teach you.)
A separate inefficiency:
WHERE c.archive = 0
ORDER BY o.description
LIMIT ...
If the archived rows had been removed from c, then the optimal execution would be to find the first 10 rows of o. Instead it must do a lengthy JOIN before sorting and limiting. (This is a common problem with "soft deletes". Neither MySQL nor the 3rd party package can optimize it.)
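If the schema can change, a minimal sketch of removing the soft-deleted rows (the archive table name is an assumption, and it presumes no child rows still reference the archived customers):
CREATE TABLE customers_archive LIKE customers;
INSERT INTO customers_archive
SELECT * FROM customers WHERE archive = 1;
DELETE FROM customers WHERE archive = 1;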

MySQL group by + order by incomprehensibly slow

I am not sure what is happening, but MySQL should handle this just fine in my opinion.
I have SQL like this.
SELECT u.id AS user_id, SUM(t.amount) AS total
FROM user u
INNER JOIN transaction t ON t.user_id = u.id
WHERE u.condition = true
GROUP BY u.id
ORDER BY total DESC;
This query runs for 10 seconds.
If I remove ORDER BY clause, the time is around 4 seconds.
The tables are very large, but after GROUP BY I have only 40 rows. Does it really take 6 seconds to sort 40 rows? I would say this should be handled by the optimizer.
However if I run the query like this:
SELECT *
FROM (
SELECT u.id AS user_id, SUM(t.amount) AS total
FROM user u
INNER JOIN transaction t ON t.user_id = u.id
WHERE u.condition = true
GROUP BY u.id
) data
ORDER BY total DESC;
This query runs for 4 seconds. I understand I forced MySQL to sort only the 40 records retrieved from the inner select.
I really do not understand one thing: MySQL cannot sort by total before the GROUP BY, so what is slowing the query down so much?
In this case I can use the second query, but if I had another inner select, MySQL would start creating temporary tables, and that might kill performance even more than the ORDER BY does. Another "problem" is that I use an ORM, and using raw SQL is really painful.
Thanks for suggestions.
EDIT:
(Execution plans with and without ORDER BY were attached as screenshots.)
I can see in the execution plan that there is additional filesort + temporary activity when using ORDER BY.

Related products approach suggestion, MySQL InnoDB/PHP

I want to expand the UI of my CodeIgniter shop with suggestions of what other people bought with the current product (either when viewing a product or when a product is put in the cart; that distinction is irrelevant to this question).
I have come up with this query (the orders table contains order details, order_items contains the products that are in a specific order via a foreign key, and the prd alias is for the products table where all the important info about a product is stored).
Query looks like this
SELECT
pr.product_id,
COUNT(*) AS num,
prd.*
FROM
orders AS o
INNER JOIN order_items AS po ON o.id = po.order_id
INNER JOIN order_items AS pr ON o.id = pr.order_id
INNER JOIN products AS prd ON pr.product_id = prd.id
WHERE
po.product_id = '14211'
AND pr.product_id <> '14211'
GROUP BY
pr.product_id
ORDER BY
num DESC
LIMIT 3
It works nice and dandy, query time is 0.030ish seconds, and it returns the products that are bought together with the one I am currently viewing.
As for questions and considerations, the Percona query analyzer complains about two things, "Non-deterministic GROUP BY" and "GROUP BY or ORDER BY on different tables", both of which I need so that the items on top are actually relevant to the related query. I have absolutely no idea how to fix them, or even whether I should be bothered by this notice from the query analyzer.
The second question is regarding performance: since this query is using temporary and filesort, I was thinking of creating a view out of this query and using it instead of actually executing the query each time some product is opened.
Mind you, I am not asking for CI model/view/controller tips, just tips on how to optimize this query and/or suggestions regarding performance and the view approach...
Any help is much appreciated.
SELECT p.num, prd.*
FROM
(
SELECT a.product_id, COUNT(*) AS num
FROM orders AS o
INNER JOIN order_items AS b ON o.id = b.order_id
INNER JOIN order_items AS a ON o.id = a.order_id
WHERE b.product_id = '14211'
AND a.product_id <> '14211'
GROUP BY a.product_id
ORDER BY num DESC
LIMIT 3
) AS p
JOIN products AS prd ON p.product_id = prd.id
ORDER BY p.num DESC
This should:
run faster (especially as your data grows),
avoid the GROUP BY complaint,
not over-inflate the count,
etc.
Ignore the complaint about GROUP BY and ORDER BY coming from different tables -- that is a performance issue; you need it.
As for translating that back to CodeIgniter, good luck.
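One more thing worth testing before reaching for a view: a composite index so both order_items lookups can be resolved from the index (the index name is an assumption; compare EXPLAIN before and after):
CREATE INDEX idx_order_items_product_order
ON order_items (product_id, order_id);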

JOINS instead of Sub-SELECT

Is it true that SUBSELECTs are less performant than JOINs?
I got this query
SELECT categories_id,
products_id
FROM products_to_categories a
WHERE date_added = (
SELECT MIN(date_added)
FROM products_to_categories b
WHERE a.products_id = b.products_id
)
AND categories_id != 0
GROUP BY products_id
and would like to change it into a query with JOIN.
Is it true that SUBSELECTs are less performant than JOINs?
Possibly. This depends entirely on the query in question. Many constructs that are frequently implemented with a subquery, but can just as easily be achieved with a join, are actually executed as a join internally by the query optimizer... at least in database systems with an enterprise-grade query optimizer, like SQL Server and Oracle. MySQL's query optimizer is notably worse at these kinds of optimizations; you'd have to look at the EXPLAIN output to see whether it is smart enough for your specific case. It could even decide not to apply the optimization when it sees it, simply because system load is low enough that optimizing would be slower than just executing the slower version.
Even if it is executed as a subquery, it depends on the query itself and the system load. A subquery might cause a quicker lock escalation, potentially causing table locks and thus slower execution in the case of more simultaneous queries on the same table. Without concurrency, extra locks don't cause noticeable extra slowdowns.
In general, try to use joins whenever possible instead of subqueries, but don't overdo it - subqueries usually perform perfectly fine and the query optimizer will do a good job of keeping the server alive. But also keep in mind that MySQL isn't exactly an 'enterprise grade RDBMS' and as such can be rather dumb in its optimizations.
SELECT DISTINCT a.products_id,
       b.MinDate
FROM products_to_categories a
JOIN (SELECT b.products_id,
             MIN(b.date_added) AS MinDate
      FROM products_to_categories b
      GROUP BY b.products_id) AS b
  ON a.products_id = b.products_id
 AND a.date_added = b.MinDate
WHERE a.categories_id != 0
Switching this to a join without a subquery or aggregation is not obvious.
The idea is to do a left outer join with a condition on date_added. When this condition never matches, you have the minimum:
SELECT a.categories_id, a.products_id
FROM products_to_categories a LEFT OUTER JOIN
     products_to_categories b
     ON a.products_id = b.products_id AND
        b.date_added < a.date_added
WHERE b.date_added IS NULL AND a.categories_id != 0;
SELECT a.categories_id,
       a.products_id,
       MIN(b.date_added)
FROM products_to_categories a
JOIN products_to_categories b
  ON b.products_id = a.products_id
WHERE a.categories_id != 0
GROUP BY a.categories_id, a.products_id
Yes, subqueries are more process-intensive because every query around the subquery needs to wait until that subquery has finished processing. This is not necessarily the case with joins.
Do you need help with the syntax of Joins? Or was my answer all you needed?
Here's what you're looking for (an aggregate can't appear in WHERE, so the test goes in HAVING):
SELECT a.categories_id,
       a.products_id
FROM products_to_categories a
LEFT JOIN products_to_categories b
  ON a.products_id = b.products_id
WHERE a.categories_id != 0
GROUP BY a.products_id, a.categories_id, a.date_added
HAVING a.date_added = MIN(b.date_added)

SQL Nested query interpreted as correlated incorrectly

I've got a serious problem with a nested query, which I suspect MySQL is interpreting as a correlated subquery when in fact it should be uncorrelated. The query spans two tables, one being a list of products and the other being their prices at various points in time. My aim is to return each price record for products whose price range over the whole period is above a certain value. My query looks like this:
SELECT oP.id, oP.title, oCR.price, oC.timestamp
FROM Crawl_Results AS oCR
JOIN Products AS oP
ON oCR.product = oP.id
JOIN Crawls AS oC
ON oCR.crawl = oC.id
WHERE oP.id
IN (
SELECT iP.id
FROM Products AS iP
JOIN Crawl_Results AS iCR
ON iP.id = iCR.product
WHERE iP.category = 2
GROUP BY iP.id
HAVING (
MAX(iCR.price) - MIN(iCR.price)
) > 1
)
ORDER BY oP.id ASC
Taken alone, the inner query executes fine and returns a list of the IDs of the products with a price range above the criterion. The outer query also works fine if I provide a simple list of IDs in the IN clause. When I run them together, however, the query takes ~3 min to return ~1500 rows, so I think it's executing the inner query for every row of the outer one, which is not ideal. I did have the columns aliased the same in the inner and outer queries, so I thought that aliasing them differently, as above, would fix it, but it didn't.
Any ideas as to what's going on here?
MySQL might think it can use indexes to execute the query faster by running it once for every oP.id. The first thing to check is whether your statistics are up to date.
You could rewrite the where ... in as a filtering inner join. This is less likely to be "optimized" for seeks:
SELECT *
FROM Crawl_Results AS oCR
JOIN Products AS oP
ON oCR.product = oP.id
JOIN Crawls AS oC
ON oCR.crawl = oC.id
JOIN (
SELECT iP.id
FROM Products AS iP
JOIN Crawl_Results AS iCR
ON iP.id = iCR.product
WHERE iP.category = 2
GROUP BY
iP.id
HAVING (MAX(iCR.price) - MIN(iCR.price)) > 1
) filter
ON oP.id = filter.id
Another option is to use a temporary table. You store the result of the subquery in a temporary table and join on that. That really forces MySQL not to execute the subquery as a correlated query.
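A sketch of that temporary-table variant (the table name volatile_products is an assumption):
CREATE TEMPORARY TABLE volatile_products AS
SELECT iP.id
FROM Products AS iP
JOIN Crawl_Results AS iCR ON iP.id = iCR.product
WHERE iP.category = 2
GROUP BY iP.id
HAVING (MAX(iCR.price) - MIN(iCR.price)) > 1;

SELECT oP.id, oP.title, oCR.price, oC.timestamp
FROM Crawl_Results AS oCR
JOIN Products AS oP ON oCR.product = oP.id
JOIN Crawls AS oC ON oCR.crawl = oC.id
JOIN volatile_products f ON oP.id = f.id
ORDER BY oP.id ASC;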