I'm unfamiliar with MySQL, and I don't know where to begin with solving this query issue.
SELECT *
FROM `rmedspa`.`clients` c
inner join `rmedspa`.`tickets` t on c.clientid = t.clientid
where c.fldclass is not null
AND t.ticketID > 0
This query returns just fine in MySQL Workbench, in 30 seconds, and the IDE is limiting the query results to 1000 records. The database is not on my own machine, but on a server that is in a different location (in other words, it's going out to the internet and it's slow). If I add an order by at the end, the query never returns.
SELECT *
FROM `rmedspa`.`clients` c
inner join `rmedspa`.`tickets` t on c.clientid = t.clientid
where c.fldclass is not null
AND t.ticketID > 0
ORDER BY t.ticketid
There are "many" tickets for 1 client. t.ticketid is an int. clientid is an int, too.
I don't know where to begin to find out why the ORDER BY is causing this query to never return. It doesn't fail, it just doesn't return.
This Post is a short review of previous comments plus some hints on SQL-Query for you.
Query building
Whenever the column names of a join condition match, you can rewrite the join statement as follows:
INNER JOIN {database}.{table} t USING({joinColumn})
Check your indexes - in your case: Are the columns of the join condition and the where statement indexed?
mySQL result
The ORDER BY statement often requires a reordering of the result set in memory. If you expected large results (many rows) and/or large rows (many columns) but see no result the mySQL Server is not properly configured for that case - How MySQL Uses Memory, Optimizing Queries with EXPLAIN
Related
We are checking if we can upgrade our project database from MySQL 5.7 to v.8. The system is 7 years old and has tons of code... Today we got a slightly strange bug which did not appear on 5.7 (I wonder why). The buggy request is the following:
SELECT TableA.Amount, SUM(TableB.Amount) AS Amount2
FROM
TableA LEFT JOIN TableB ON TableA.ReservID = TableB.ReservID
WHERE
TableB.InvoiceID IS NULL
AND TableB.InvoiceStatusID = 2
AND TableB.PersonID = 389
AND TableB.PersonTypeID = 1
AND TableA.ReservID = 4657;
There is one record in TableA and no records in TableB for the given conditions.
I know that WHERE conditions are applied after joining the tables. So it is not a suprise for me that the query return NULL, NULL on MySQL8. But our developer (who's still sure that this query is Ok) just showed me that it returns 67667.65, NULL on MySQL 5.7!
So I got 2 questions at ones. 1. Why it works on 5.7 when all data must be filtered out by the WHERE conditions on non-existent (all null in joint table) Table2 fields? 2. Is there a way to make MySQL8 work in the same 'tolerant' way as I am sure there are many such 'genius' queries all over our old code?
The problem in your query is not the (left) join. While it makes it less clear to the reader that your left join is treated as a join, having the comparisons in the where clause is completely valid sql. Every database will treat your left join correctly as a join, and I don't think that MySQL (5.7 or 8.0) would give you a different result if you replace left join with a join, as the internal representation would not change.
Your query has a problem with the aggregation. select colA, sum(colB) without using group by colA will leave the value of colA unclear, see MySQL Handling of GROUP BY:
SELECT name, MAX(age) FROM t;
Without GROUP BY, there is a single group and it is nondeterministic which name value to choose for the group.
MySQL is about the only database system that will even allow you to run this query, a very special behaviour that generates a lot of questions on stackoverflow. Most other databases will complain about that column listed in the select - exactly for the reason you face right now: they don't really know what to return.
So the value you get for tablea.amount is basically random. While it usually depends on how MySQL executes the query internally (so it could depend on some optimization setting), it looks unlikely that you can convince MySQL 8 to return the number value there - and especially to make sure it is consistent in all similar queries you may have.
And I want to emphasize: the value null in MySQL 8 is also not deterministic - it could be something different.
To make a proper query, use an aggregate function for tablea.amount too, and depending on your requirements, fix your left join, e.g.
SELECT MAX(TableA.Amount) AS Amount,
SUM(TableB.Amount) AS Amount2
FROM TableA LEFT JOIN TableB
ON TableA.ReservID = TableB.ReservID
AND TableB.InvoiceID IS NULL
AND TableB.InvoiceStatusID = 2
AND TableB.PersonID = 389
AND TableB.PersonTypeID = 1
WHERE TableA.ReservID = 4657
This should give you the behaviour from MySQL 5.7, e.g. <value for amount>, null. If you use join, you should get null, null. Both cases will be deterministic.
Although using just SELECT TableA.Amount, SUM(...) ... LEFT JOIN ... (with a proper left join) will return the amount instead of null for MySQL 8 too, it is still not valid sql! MySQL will only allow it because ReservID = 4657 limits it to a single row in TableA using the primary key. So if you have to check all queries anyway, fix it properly.
i dont know the reason, why its working on 5.7. but to get your expected result you can do:
SELECT
TableA.Amount,
SUM(TableB.Amount) AS Amount2
FROM TableA
LEFT JOIN TableB
ON TableA.ReservID = TableB.ReservID
AND TableB.InvoiceID IS NULL
AND TableB.InvoiceStatusID = 2
AND TableB.PersonID = 389
AND TableB.PersonTypeID = 1
WHERE TableA.ReservID = 4657;
A MySQL database contains two tables: customer and custmomer_orders
The customer table contains 80 million entries and contains 80 fields. Some of them I am interested in:
Id (PK, int(10))
Location (varchar 255, nullable).
Registration_Date (DateTime, nullable). Indexed.
The customer_orders table constains 40 million entries and contains only 3 fields:
Id (PK, int(10))
Customer_Id (int(10), FK to customer table)
Order_Date (DateTime, nullable)
When I run such query, it takes ~800 seconds to execute and returns 40 million entries:
SELECT o.*
FROM customer_orders o
LEFT JOIN customer c ON (c.Id = o.Customer_Id)
WHERE NOT (ISNULL(c.Location)) AND c.Registration_Date < '2018-01-01 00:00:00';
Machine with MySQL server has 32GB of RAM, 28GB assigned to MySQL.
MySQL version: 5.6.39.
Is it normal for MySQL to execute such query for this amount of time on the tables with such amount of records?
How can I improve the performance?
Update:
The customer_orders table does not contain any vital data we would like to store. It is some kind of copied table with orders made within last 10 days.
Every day we run a stored procedure, which deletes orders older than 10 days in scope of a transaction.
In some moment of time, this stored procedure ended up with a timeout due to not optimized query, and number of orders was growing every day.
Previous query contained also COUNT method, which, I suppose, exceeded the timeout.
Nevertheless, it surprised me, that it can take up to 15 minutes for MySQL to fetch 40m of records with additional conditions.
I think it's normal. It would be helpful if you share what explain returns for that query.
In order to optimize the query, it might not be a good idea to start with customer_orders, as you are not filtering it in anyway (so it's performing a full table scan over 40M records). Also, as pointed in the comments, a LEFT JOIN is not needed here.
I would write your query like this:
SELECT o.*
FROM customers c, customer_orders o
WHERE c.id = o.Customer_Id
AND c.Location IS NOT NULL
AND c.Registration_Date < '2018-01-01'
This will (depending on how many records satisfy the clause Registration_Date < '2018-01-01') filter the customers table first and then join with the customer_orders table which has and index by customer_id
Also, maybe not related but, is it normal for you that the query returns 40M records? I mean, it's like the whole customer_orders table. If I am right that means that all orders are from customer registered before '2018-01-01'
This is to long for a comment...
The first thing to note about your query is that it is not actually performing a LEFT JOIN, since it has conditions in the WHERE clause that refer to the LEFT JOINed table.
It could be rewritten as :
SELECT o.*
FROM customer_orders o
INNER JOIN customer c
ON c.Id = o.Customer_Id
AND c.Location is NOT NULL
AND c.Registration_Date < '2018-01-01 00:00:00';
Being explicit about the join type is better for readability and may help MySQL to find a better execution path for the query.
When it comes to performance, the basic advice is that, for this query, you would need a compound index on all three columns being searched, in the same sequence as the one being used in the query (usually, you want to put the more restrictive condition at the beginning, so you might want to adjust this) :
ALTER TABLE mytable ADD INDEX (Id, Location, Registration_Date );
For more advices on performance, you might want to update your question with the CREATE TABLE statements of your tables and the execution plan of your query.
If my comment, and GMB's answer don't end up helping performance much; you can always try writing the query with a different approach. I usually prefer joins to subqueries, but occasionally they turn out to be the best option for the data being handled.
Since you've said the customers table is relatively large compared to the orders table, this could be one of those situations.
SELECT o.*
FROM customer_orders AS o
WHERE o.Customer_Id IN (
SELECT Id
FROM customer
WHERE Location IS NOT NULL
AND Registration_Date < '2018-01-01 00:00:00'
);
I wanted to put a comment, but changed my mind to go with answer.
Because main issue is your question itself.
I don't know how many columns your customer_orders has, but if you are getting
40 million entries
back. I would say you are doing something wrong.
And probably that is not the query itself is slow, but data fetching.
To prove that try to execute EXPLAIN against your query:
EXPLAIN SELECT ...your query here... ;
Then execute
EXPLAIN SELECT ...your query here... LIMIT 1;
Try to LIMIT your results to 1000 for example:
SELECT ...your query here... LIMIT 1000;
When you have answers, outputs and stats for these queries we can discuss your following steps.
I had an issue with a SQL query that was slow and lead me to a 500 Internal Server Error.
After playing a bit with the query I had found out that moving join conditions outside makes the query work and the error vanish.
Now I wonder:
is the second query is equivalent to the first one ?
what was my bug mistake, why it was so slow ?
Here the queries.
Original (slow) SQL:
SELECT *
FROM `TABLE_1` AS `main_table`
INNER JOIN `TABLE_2` AS `w`
ON main_table.entity_id = w.product_id
LEFT JOIN `TABLE_3` AS `ur`
ON main_table.entity_id = ur.product_id
AND ur.category_id IS NULL
AND ur.store_id = 1
AND ur.is_system = 1
WHERE (w.website_id = '1')
Faster SQL:
SELECT *
FROM `TABLE_1` AS `main_table`
INNER JOIN `TABLE_2` AS `w`
ON main_table.entity_id = w.product_id
LEFT JOIN `TABLE_3` AS `ur`
ON main_table.entity_id = ur.product_id
WHERE (w.website_id = '1')
AND ur.category_id IS NULL
AND ur.store_id = 1
AND ur.is_system = 1
Is the 2nd query equivalent to the 1st?
Depending on the data in TABLE_3, the 2nd query is NOT equivalent to the first, since you're using a LEFT JOIN.
Consider what happens when you have a record in TABLE_1, with an entity_id that does not match any rows in TABLE_3: Your first query would still return this record from TABLE_1, with NULL values for all the columns of TABLE_3. The 2nd query applies filters to the columns from TABLE_3, but since these are all NULLs, the record now gets filtered out.
In general, query 2 would thus return fewer records than query 1 (unless, of course, all your entity_id's match a product_id from TABLE_3).
What was my bug mistake, why it was so slow?
Both queries are valid SQL, so none of them should give you a 500 Internal Server Error. This does not even look like a MySQL error, so maybe the bug arose elsewhere? Perhaps you have a web application that is unable to handle the NULL's returned from the 1st query? It's impossible to answer the question without more details about the error.
As for the speed of the queries, this depends a lot on what indexes are defined. Generally, I would not expect the 2nd query to be faster than the 1st, but that depends. As I mentioned above, the 1st query might return more records than the 2nd. If you're not querying the database directly, but instead through some application, could it be the application slowing down your query?
I have an SQL query(see below) that returns exactly what I need but when ran through phpMyAdmin takes anywhere from 0.0009 seconds to 0.1149 seconds and occasionally all the way up to 7.4983 seconds.
Query:
SELECT
e.id,
e.title,
e.special_flag,
CASE WHEN a.date >= '2013-03-29' THEN a.date ELSE '9999-99-99' END as date
CASE WHEN a.date >= '2013-03-29' THEN a.time ELSE '99-99-99' END as time,
cat.lastname,
FROM e_table as e
LEFT JOIN a_table as a ON (a.e_id=e.id)
LEFT JOIN c_table as c ON (e.c_id=c.id)
LEFT JOIN cat_table as cat ON (cat.id=e.cat_id)
LEFT JOIN m_table as m ON (cat.name=m.name AND cat.lastname=m.lastname)
JOIN (
SELECT DISTINCT innere.id
FROM e_table as innere
LEFT JOIN a_table as innera ON (innera.e_id=innere.id AND
innera.date >= '2013-03-29')
LEFT JOIN c_table as innerc ON (innere.c_id=innerc.id)
WHERE (
(
innera.date >= '2013-03-29' AND
innera.flag_two=1
) OR
innere.special_flag=1
) AND
innere.flag_three=1 AND
innere.flag_four=1
ORDER BY COALESCE(innera.date, '9999-99-99') ASC,
innera.time ASC,
innere.id DESC LIMIT 0, 10
) AS elist ON (e.id=elist.id)
WHERE (a.flag_two=1 OR e.special_flag) AND e.flag_three=1 AND e.flag_four=1
ORDER BY a.date ASC, a.time ASC, e.id DESC
Explain Plan:
The question is:
Which part of this query could be causing the wide range of difference in performance?
To specifically answer your question: it's not a specific part of the query that's causing the wide range of performance. That's MySQL doing what it's supposed to do - being a Relational Database Management System (RDBMS), not just a dumb SQL wrapper around comma separated files.
When you execute a query, the following things happen:
The query is compiled to a 'parametrized' query, eliminating all variables down to the pure structural SQL.
The compilation cache is checked to find whether a recent usable execution plan is found for the query.
The query is compiled into an execution plan if needed (this is what the 'EXPLAIN' shows)
For each execution plan element, the memory caches are checked whether they contain fresh and usable data, otherwise the intermediate data is assembled from master table data.
The final result is assembled by putting all the intermediate data together.
What you are seeing is that when the query costs 0.0009 seconds, the cache was fresh enough to supply all data together, and when it peaks at 7.5 seconds either something was changed in the queried tables, or other queries 'pushed' the in-memory cache data out, or the DBMS has other reasons to suspect it needs to recompile the query or fetch all data again. Probably some of the other variations have to do with used indexes still being cached freshly enough in memory or not.
Concluding this, the query is ridiculously slow, you're just sometimes lucky that caching makes it appear fast.
To solve this, I'd recommend looking into 2 things:
First and foremost - a query this size should not have a single line in its execution plan reading "No possible keys". Research how indexes work, make sure you realize the impact of MySQL's limitation of using a single index per joined table, and tweak your database so that each line of the plan has an entry under 'key'.
Secondly, review the query in itself. DBMS's are at their fastest when all they have to do is combine raw data. Using programmatic elements like CASE and COALESCE are by all means often useful, but they do force the database to evaluate more things at runtime than just take raw table data. Try to eliminate such statements, or move them to the business logic as post-processing with the retrieved data.
Finally, never forget that MySQL is actually a rather stupid DBMS. It is optimized for performance in simple data fetching queries such as most websites require. As such it is much faster than SQL Server and Oracle for most generic problems. Once you start complicating things with functions, cases, huge join or matching conditions etc., the competitors are frequently much better optimized, and have better optimization in their query compilers. As such, when MySQL starts becoming slow in a specific query, consider splitting it up in 2 or more smaller queries just so it doesn't become confused, and do some postprocessing in PHP or whatever language you are calling with. I've seen many cases where this increased performance a LOT, just by not confusing MySQL, especially in cases where subqueries were involved (as in your case). Especially the fact that your subquery is a derived table, and not just a subquery, is known to complicate stuff for MySQL beyond what it can cope with.
Lets start that both your outer and inner query are working with the "e" table WITH a minimum requirement of flag_three = 1 AND flag_four = 1 (regardless of your inner query's (( x and y ) or z) condition. Also, your outer WHERE clause has explicit reference to the a.Flag_two, but no NULL which forces your LEFT JOIN to actually become an (INNER) JOIN. Also, it appears every "e" record MUST have a category as you are looking for the "cat.lastname" and no coalesce() if none found. This makes sense at it appears to be a "lookup" table reference. As for the "m_table" and "c_table", you are not getting or doing anything with it, so they can be removed completely.
Would the following query get you the same results?
select
e1.id,
e1.Title,
e1.Special_Flag,
e1.cat_id,
coalesce( a1.date, '9999-99-99' ) ADate,
coalesce( a1.time, '99-99-99' ) ATime
cat.LastName
from
e_table e1
LEFT JOIN a_table as a1
ON e1.id = a1.e_id
AND a1.flag_two = 1
AND a1.date >= '2013-03-29'
JOIN cat_table as cat
ON e1.cat_id = cat.id
where
e1.flag_three = 1
and e1.flag_four = 1
and ( e1.special_flag = 1
OR a1.id IS NOT NULL )
order by
IF( a1.id is null, 2, 1 ),
ADate,
ATime,
e1.ID Desc
limit
0, 10
The Main WHERE clause qualifies for ONLY those that have the "three and four" flags set to 1 PLUS EITHER the ( special flag exists OR there is a valid "a" record that is on/after the given date in question).
From that, simple order by and limit.
As for getting the date and time, it appears that you only want records on/after the date to be included, otherwise ignore them (such as they are old and not applicable, you don't want to see them).
The order by, I am testing FIRST for a NULL value for the "a" ID. If so, we know they will all be forced to a date of '9999-99-99' and time of '99-99-99' and want them pushed to the bottom (hence 2), otherwise, there IS an "a" record and you want those first (hence 1). Then, sort by the date/time respectively and then the ID descending in case many within the same date/time.
Finally, to help on the indexes, I would ensure your "e" table has an index on
( id, flag_three, flag_four, special_flag ).
For the "a" table, index on
(e_id, flag_two, date)
I'm having a serious problem with MySQL (innoDB) 5.0.
A very simple SQL query is executed with a very unexpected query plan.
The query:
SELECT
SQL_NO_CACHE
mbCategory.*
FROM
MBCategory mbCategory
INNER JOIN ResourcePermission as rp
ON rp.primKey = mbCategory.categoryId
where mbCategory.groupId = 12345 AND mbCategory.parentCategoryId = 0
limit 20;
MBCategory - contains 216583 rows
ResourcePermission - contains 3098354 rows.
In MBCategory I've multiple indexes (columns order as in index):
Primary (categoryId)
A (groupId,parentCategoryId,categoryId)
B (groupId,parentCategoryId)
In ResourcePermission I've multiple indexes (columns order as in index):
Primary - on some column
A (primKey).
When I look into query plan Mysql changes tables sequence and selects rows from ResourcePermission at first and then it joins the MBCategory table (crazy idea) and it takes ages. So I added STRAIGHT_JOIN to force the innodb engine to use correct table sequence:
SELECT
STRAIGHT_JOIN SQL_NO_CACHE
mbCategory.*
FROM
MBCategory
mbCategory
INNER JOIN ResourcePermission as rp
ON rp.primKey = mbCategory.categoryId
where mbCategory.groupId = 12345 AND mbCategory.parentCategoryId = 0
limit 20;
But here the second problem materialzie:
In my opinion mysql should use index A (primKey) on the join operation instead it performs Range checked for each record (index map: 0x400) and it again takes ages !
Force index doesn't help, mysql still performing Range checked for each record .
There are only 23 rows in the MBCategory which fulfill where criteria, and after join there are only 75 rows.
How can I make mysql to choose correct index on this operation ?
Ok,
elementary problem.
I owe myself a beer.
The system I'm recently tunning is not a system I've developted - I've been assigned to it by my management to improve performance (originall team doesn't have knowledge on this topic).
After fee weeks of improving SQL queries, indexes, number of sql queries that are beeing executed by application I didn't check one of the most important things in this case !!
COLUMN TYPES ARE DIFFERENT !
Developer who have written than kind of code should get quite a big TALK.
Thanks for help !
I had the same problem with a different cause. I was joining a large table, and the ON clause used OR to compare the primary key (ii.itemid) to two different columns:
SELECT *
FROM share_detail sd
JOIN box_view bv ON sd.container_id = bv.id
JOIN boxes b ON b.id = bv.shared_id
JOIN item_index ii ON ii.itemid = bv.shared_id OR b.parent_itemid = ii.itemid;
Fortunately, it turned out the parent_itemid comparison was redundant, so I was able to remove it. Now the index is being used as expected. Otherwise, I was going to try splitting the item_index join into two separate joins.