What can cause MySQL performance degradation after a move?

I recently started moving my application from one host to another: from my home computer to a virtual machine in the cloud. When testing performance on the new node I noticed severe degradation, comparing the results of the same query, with the same data, on the same version of MySQL.
On my home computer:
mysql> SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+------+
| id |
+------+
| 8238 |
| 8369 |
+------+
2 rows in set (0,57 sec)
and on the new machine:
mysql> SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+------+
| id |
+------+
| 8369 |
+------+
1 row in set (26.70 sec)
That is a 46x slowdown, which is not okay. I tried to get an explanation of why it was so slow. On my home computer:
mysql> explain SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+----+--------------+-------------+--------+---------------+------------+---------+-------------------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+-------------+--------+---------------+------------+---------+-------------------+---------+-------------+
| 1 | SIMPLE | events | ALL | PRIMARY | NULL | NULL | NULL | 5370 | Using where |
| 1 | SIMPLE | <subquery2> | eq_ref | <auto_key> | <auto_key> | 5 | eventor.events.id | 1 | NULL |
| 2 | MATERIALIZED | results | ALL | idx_event | NULL | NULL | NULL | 1319428 | Using where |
+----+--------------+-------------+--------+---------------+------------+---------+-------------------+---------+-------------+
3 rows in set (0,00 sec)
And for my virtual node:
mysql> explain SELECT id FROM events WHERE id in (SELECT distinct event AS id FROM results WHERE status='Inactive') AND (DATEDIFF(NOW(), startdate) < 30) AND (DATEDIFF(NOW(), startdate) > -1) AND status <> 10 AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
+----+--------------------+---------+----------------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+----------------+---------------+-----------+---------+------+------+-------------+
| 1 | PRIMARY | events | ALL | NULL | NULL | NULL | NULL | 7297 | Using where |
| 2 | DEPENDENT SUBQUERY | results | index_subquery | idx_event | idx_event | 5 | func | 199 | Using where |
+----+--------------------+---------+----------------+---------------+-----------+---------+------+------+-------------+
2 rows in set (0.00 sec)
As you can see, the execution plans differ. I have not been able to figure out why. From all other points of view, the two system setups look similar.

In this case, the most likely culprit is how the subquery is processed. This changed between recent versions of MySQL: older versions do a poor job of optimizing such subqueries, while the newest versions do much better.
One simple fix is to replace the IN with EXISTS and a correlated subquery:
SELECT id
FROM events
WHERE EXISTS (SELECT 1
              FROM results
              WHERE results.status = 'Inactive' AND results.event = events.id
             )
  AND (DATEDIFF(NOW(), startdate) < 30)
  AND (DATEDIFF(NOW(), startdate) > -1)
  AND status <> 10
  AND (form = 'IndSingleDay' OR form = 'IndMultiDay');
This should work well in both versions, especially if you have an index on results(status, event).
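That supporting index could be sketched as follows (the index name is an arbitrary choice):

```sql
-- Lets each EXISTS probe resolve entirely from the index:
-- filter on status, then match event against events.id.
CREATE INDEX idx_results_status_event ON results (status, event);
```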

As discussed in the comments, the new subquery-handling optimizations introduced between 5.5 and 5.6 explain the difference in performance. That conclusion, however, masks the fact that the original query is not written optimally to begin with: there does not seem to be a need for a subquery here at all.
The "events" table needs an index on (status,form,startdate) and the "results" table needs an index on (status) and another index on (event).
SELECT DISTINCT e.id
FROM events e
JOIN results r ON r.event = e.id AND r.status = 'Inactive'
WHERE (e.form = 'IndSingleDay' OR e.form = 'IndMultiDay')
  AND e.status != 10
  AND e.startdate > DATE_SUB(DATE(NOW()), INTERVAL 30 DAY)
  AND e.startdate < DATE_SUB(DATE(NOW()), INTERVAL 2 DAY);
You might have to tweak the values "30" and "2" to get precisely the same logic, but the important principle is this: never use a column as an argument to a function in the WHERE clause if you can rewrite the expression another way. The optimizer can't look "backwards" through the function to discover the actual range of raw values you want it to find; instead, it has to evaluate the function against every row it can't otherwise eliminate.
Using functions to derive constant values for comparison to the column, as shown above, allows the optimizer to realize that it's actually looking for a range of start_date values, and narrow down the possible rows accordingly, assuming an index exists on the values in question.
If I've decoded your query correctly, this version should be faster than any subquery if the indexes are in place.
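The indexes suggested above could be declared like this (the names are arbitrary):

```sql
-- Equality columns (status, form) first, range column (startdate) last.
CREATE INDEX idx_events_status_form_startdate ON events (status, form, startdate);
CREATE INDEX idx_results_status ON results (status);
CREATE INDEX idx_results_event  ON results (event);
```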

Related

mysql query returns empty results intermittently

I have the following query that sometimes returns an empty set on the master but NEVER on the read replica, even though matching data exists in both databases. It is random, and I am wondering if there is a MySQL setting or something with the query cache involved. Running mysql 5.6.40-log on RDS.
I have tried setting optimizer_switch="index_merge_intersection=off" but it didn't work.
UPDATE: optimizer_switch="index_merge_intersection=off" seems to have worked after all (I cleared the query cache after making this change and the problem seems to have resolved itself).
One really odd issue: the query worked via the mysql command line 100% of the time, but the web application didn't work until I cleared the query cache (even though it connects as the same user).
Once I run OPTIMIZE TABLE phppos_items it fixes things for a little while (about 3 minutes) and then goes back to being erratic (mostly empty sets). These are all InnoDB tables.
settings:
https://gist.github.com/blasto333/82b18ef979438b93e4c39624bbf489d7
It seems to return an empty set more often during the busy time of day. The server is an RDS m4.large with 500 databases of 100 tables each.
Query:
SELECT SUM( phppos_sales_items.damaged_qty ) AS damaged_qty,
SUM( phppos_sales_items.subtotal ) AS subtotal,
SUM( phppos_sales_items.total ) AS total,
SUM( phppos_sales_items.tax ) AS tax,
SUM( phppos_sales_items.profit ) AS profit
FROM `phppos_sales`
JOIN `phppos_sales_items` ON `phppos_sales_items`.`sale_id` = `phppos_sales`.`sale_id`
JOIN `phppos_items` ON `phppos_sales_items`.`item_id` = `phppos_items`.`item_id`
WHERE `phppos_sales`.`deleted` =0
AND `sale_time` BETWEEN '2019-01-01 00:00:00' AND '2019-12-31 23:59:59'
AND `phppos_sales`.`location_id` IN ( 1 )
AND `phppos_sales`.`store_account_payment` =0
AND `suspended` <2
AND `phppos_items`.`deleted` =0
AND `phppos_items`.`supplier_id` = '485'
GROUP BY `phppos_sales_items`.`sale_id`
Explain:
+----+-------------+--------------------+-------------+-----------------------------------------------------------------------------------------------+-----------------------------+---------+-------------------------------------------------------+------+---------------------------------------------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+-------------+-----------------------------------------------------------------------------------------------+-----------------------------+---------+-------------------------------------------------------+------+---------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | phppos_items | index_merge | PRIMARY,phppos_items_ibfk_1,deleted,deleted_system_item | phppos_items_ibfk_1,deleted | 5,4 | NULL | 44 | Using intersect(phppos_items_ibfk_1,deleted); Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | phppos_sales_items | ref | PRIMARY,item_id,phppos_sales_items_ibfk_3,phppos_sales_items_ibfk_4,phppos_sales_items_ibfk_5 | item_id | 4 | phppoint_customer.phppos_items.item_id | 16 | NULL |
| 1 | SIMPLE | phppos_sales | eq_ref | PRIMARY,deleted,location_id,sales_search,phppos_sales_ibfk_10 | PRIMARY | 4 | phppoint_customer.phppos_sales_items.sale_id | 1 | Using where |
+----+-------------+--------------------+-------------+-----------------------------------------------------------------------------------------------+-----------------------------+---------+-------------------------------------------------------+------+---------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)
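Since the EXPLAIN shows an index_merge intersection on phppos_items, the two workarounds mentioned above can be sketched as follows. This is a hedged sketch, not a confirmed fix; the index names in the hint are taken from the EXPLAIN output:

```sql
-- Per-session: disable only the index_merge intersection strategy,
-- leaving the other index_merge variants enabled.
SET SESSION optimizer_switch = 'index_merge_intersection=off';

-- Per-query alternative: steer this one statement away from the
-- two indexes the optimizer was intersecting.
SELECT item_id
FROM phppos_items IGNORE INDEX (phppos_items_ibfk_1, deleted)
WHERE deleted = 0 AND supplier_id = '485';
```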

MySQL doesn't use indexes in a SELECT clause subquery

I have an "events" table
table events
id (pk, auto inc, unsigned int)
field1,
field2,
...
date DATETIME (indexed)
I am trying to analyse holes in the traffic (the moments where there are 0 events in a day).
I tried this kind of query:
SELECT
    e1.date AS date1,
    (
        SELECT date
        FROM events AS e2
        WHERE e2.date > e1.date
        LIMIT 1
    ) AS date2
FROM events AS e1
WHERE e1.date > NOW() - INTERVAL 10 DAY
It takes a huge amount of time.
Here is the EXPLAIN:
+----+--------------------+-------+-------+---------------------+---------------------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+-------+---------------------+---------------------+---------+------+----------+-------------+
| 1 | PRIMARY | t1 | range | DATE | DATE | 6 | NULL | 1 | Using where |
| 2 | DEPENDENT SUBQUERY | t2 | ALL | DATE | NULL | NULL | NULL | 58678524 | Using where |
+----+--------------------+-------+-------+---------------------+---------------------+---------+------+----------+-------------+
2 rows in set (0.00 sec)
Tested on MySQL 5.5
Why can't MySQL use the DATE index? Is it because of the subquery?
Your query suffers from the problem shown here, which also presents a quick solution with temp tables. (That is a MySQL forum page, which I unearthed through finding this Stack Overflow question.)
You may find that creating and populating such a table on the fly yields bearable performance, and it is easy to implement for the range of datetimes from NOW() less 10 days.
If you need assistance in crafting anything, let me know; I will see if I can help.
You are looking for dates with no events?
First build a table Days with all possible dates (dy). This will give you the uneventful days:
SELECT dy
FROM Days
WHERE NOT EXISTS ( SELECT *
                   FROM events
                   WHERE date >= Days.dy
                     AND date <  Days.dy + INTERVAL 1 DAY )
  AND dy > NOW() - INTERVAL 10 DAY
Please note that 5.6 has some optimizations in this general area.
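One way to build and populate that Days table is the classic digit-table cross join. A sketch; the start date and the 1000-day span are arbitrary assumptions, so extend as needed:

```sql
CREATE TABLE Days (dy DATE NOT NULL PRIMARY KEY);

-- Cross-join three digit tables (0..9) to generate 1000 consecutive dates.
INSERT INTO Days (dy)
SELECT DATE('2014-01-01') + INTERVAL a.i + 10 * b.i + 100 * c.i DAY
FROM (SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
      UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
      UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS a
CROSS JOIN
     (SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
      UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
      UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS b
CROSS JOIN
     (SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
      UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
      UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS c;
```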

Indexes and optimization

I'm not brilliant when it comes to going beyond the basics with MySQL; however, I'm trying to optimize a query:
SELECT DATE_FORMAT(t.completed, '%H') AS hour, t.orderId, t.completed as stamp,
t.deadline as deadline, t.completedBy as user, p.largeFormat as largeFormat
FROM tasks t
JOIN orders o ON o.id=t.orderId
JOIN products p ON p.id=o.productId
WHERE DATE(t.completed) = '2013-09-11'
AND t.type = 7
AND t.completedBy IN ('user1', 'user2')
AND t.suspended = '0'
AND o.shanleys = 0
LIMIT 0,100
+----+-------------+-------+--------+----------------------------+-----------+---------+-----------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------------------+-----------+---------+-----------------+-------+-------------+
| 1 | SIMPLE | o | ref | PRIMARY,productId,shanleys | shanleys | 2 | const | 54464 | Using where |
| 1 | SIMPLE | p | eq_ref | PRIMARY | PRIMARY | 4 | sfp.o.productId | 1 | |
| 1 | SIMPLE | t | ref | NewIndex1 | NewIndex1 | 5 | sfp.o.id | 6 | Using where |
+----+-------------+-------+--------+----------------------------+-----------+---------+-----------------+-------+-------------+
Before some of the indexes were added it was performing full table scans on both the p table and the o table.
Basically, I thought that MySQL would:
limit down the rows from the tasks table with the where clauses (should be 84 rows without the joins)
then go through the orders table to the products table to get a flag (largeFormat).
My questions are why does MySQL look up 50000+ rows when it's only got 84 different ids to look for, and is there a way that I can optimize the query?
I'm not able to add new fields or new tables.
Thank you in advance!
SQL needs to work with the available indexes to best qualify the query. I would add a compound index on
(type, suspended, completedBy, completed)
to match the criteria you have. Your orders and products tables appear OK with their existing indexes.
SELECT
DATE_FORMAT(t.completed, '%H') AS hour,
t.orderId,
t.completed as stamp,
t.deadline,
t.completedBy as user,
p.largeFormat as largeFormat
FROM
tasks t
JOIN orders o
ON t.orderId = o.id
AND o.shanleys = 0
JOIN products p
ON o.productId = p.id
WHERE
t.type = 7
AND t.suspended = 0
AND t.completedBy IN ('user1', 'user2')
AND t.completed >= '2013-09-11'
AND t.completed < '2013-09-12'
LIMIT
0,100
I suspect that suspended is a numeric (int) flag; if so, leave the criterion numeric and don't wrap the 0 in quotes.
For datetime fields, if you apply a function to the column, the index can't be utilized well. So, if you only care about a single day (or a range in other queries), notice that I compare the datetime field >= '2013-09-11' (which implies 12:00:00 AM) and < '2013-09-12', which allows up to 11:59:59 PM on 2013-09-11. That covers the entire day, and the index can take advantage of it.
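The compound index suggested above could be declared like this (the name is arbitrary):

```sql
-- Equality columns first (type, suspended, completedBy),
-- then the range column (completed) last.
CREATE INDEX idx_tasks_type_susp_user_completed
    ON tasks (type, suspended, completedBy, completed);
```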

MySQL performance using where

A simple query like the one below, on a properly indexed table populated with roughly 2M rows, is taking a lot longer to complete than I was hoping for: 95 rows in set (2.06 sec).
As this is my first experience with tables this size, am I looking into normal behavior?
Query:
SELECT t.id, t.symbol, t.feed, t.time,
FLOOR(UNIX_TIMESTAMP(t.time)/(60*15)) as diff
FROM data as t
WHERE t.symbol = 'XYZ'
AND DATE(t.time) = '2011-06-02'
AND t.feed = '1M'
GROUP BY diff
ORDER BY t.time ASC;
...and Explain:
+----+-------------+-------+------+--------------------+--------+---------+-------+--------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+--------------------+--------+---------+-------+--------+----------------------------------------------+
| 1 | SIMPLE | t | ref | unique,feed,symbol | symbol | 1 | const | 346392 | Using where; Using temporary; Using filesort |
+----+-------------+-------+------+--------------------+--------+---------+-------+--------+----------------------------------------------+
Try this:
...
AND t.time >= '2011-06-02' AND t.time < '2011-06-03'
...
Otherwise, your index(es) are wrong for this query. I'd expect one on (symbol, feed, time, id) or (feed, symbol, time, id) to cover it.
Edit, after comment:
If you put a function or other processing on a column, any index on it is liable to be ignored; the index is on x, not f(x), basically.
This change allows the index to be used because we now compare the raw column against a range (a <= x < y) to skip past the time-of-day part, instead of applying a function such as DATE(x) to every row.
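Putting that range predicate into the original query, the full statement might look like this (unchanged apart from the date condition):

```sql
SELECT t.id, t.symbol, t.feed, t.time,
       FLOOR(UNIX_TIMESTAMP(t.time) / (60 * 15)) AS diff
FROM data AS t
WHERE t.symbol = 'XYZ'
  AND t.time >= '2011-06-02' AND t.time < '2011-06-03'  -- sargable: no DATE() on the column
  AND t.feed = '1M'
GROUP BY diff
ORDER BY t.time ASC;
```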

Big SQL SELECT performance difference when using <= against using < on a DATETIME column

Given the following table:
desc exchange_rates;
+------------------+----------------+------+-----+---------+----------------+
| Field            | Type           | Null | Key | Default | Extra          |
+------------------+----------------+------+-----+---------+----------------+
| id               | int(11)        | NO   | PRI | NULL    | auto_increment |
| time             | datetime       | NO   | MUL | NULL    |                |
| base_currency    | varchar(3)     | NO   | MUL | NULL    |                |
| counter_currency | varchar(3)     | NO   | MUL | NULL    |                |
| rate             | decimal(32,16) | NO   |     | NULL    |                |
+------------------+----------------+------+-----+---------+----------------+
I have added indexes on time, base_currency and counter_currency, as well as a composite index on (time, base_currency, counter_currency), but I'm seeing a big performance difference when I perform a SELECT using <= against using <.
The first SELECT is:
ExchangeRate Load (95.5ms)
SELECT * FROM `exchange_rates` WHERE (time <= '2009-12-30 14:42:02' and base_currency = 'GBP' and counter_currency = 'USD') LIMIT 1
As you can see this is taking 95ms.
If I change the query such that I compare time using < rather than <= I see this:
ExchangeRate Load (0.8ms)
SELECT * FROM `exchange_rates` WHERE (time < '2009-12-30 14:42:02' and base_currency = 'GBP' and counter_currency = 'USD') LIMIT 1
Now it takes less than 1 millisecond, which sounds right to me. Is there a rational explanation for this behaviour?
The output from EXPLAIN provides further details, but I'm not 100% sure how to interpret it:
-- Output from the first, slow, select
id: 1 | select_type: SIMPLE | table: exchange_rates | type: index_merge | possible_keys: index_exchange_rates_on_time,index_exchange_rates_on_base_currency,index_exchange_rates_on_counter_currency,time_and_currency | key: index_exchange_rates_on_counter_currency,index_exchange_rates_on_base_currency | key_len: 5,5 | ref: NULL | rows: 813 | Extra: Using intersect(index_exchange_rates_on_counter_currency,index_exchange_rates_on_base_currency); Using where
-- Output from the second, fast, select
id: 1 | select_type: SIMPLE | table: exchange_rates | type: ref | possible_keys: index_exchange_rates_on_time,index_exchange_rates_on_base_currency,index_exchange_rates_on_counter_currency,time_and_currency | key: index_exchange_rates_on_counter_currency | key_len: 5 | ref: const | rows: 4988 | Extra: Using where
(Note: I'm producing these queries through ActiveRecord (in a Rails app) but these are ultimately the queries which are being executed)
In the first case, MySQL tries to combine results from all indexes. It fetches all records from both indexes and joins them on the value of the row pointer (table offset in MyISAM, PRIMARY KEY in InnoDB).
In the second case, it just uses a single index, which, considering LIMIT 1, is the best decision.
You need to create a composite index on (base_currency, counter_currency, time) (in this order) for this query to work as fast as possible.
The engine will use the index for filtering on the leading columns (base_currency, counter_currency) and for ordering on the trailing column (time).
It also seems you want to add something like ORDER BY time DESC to your query to get the latest exchange rate.
In general, any LIMIT without an ORDER BY should ring alarm bells.
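The suggested index and query shape, sketched out (the index name is arbitrary; the ORDER BY direction follows the advice above about fetching the latest rate):

```sql
CREATE INDEX base_counter_time
    ON exchange_rates (base_currency, counter_currency, time);

-- Latest GBP/USD rate at or before a point in time: the index filters
-- on the two equality columns and orders on the trailing time column.
SELECT *
FROM exchange_rates
WHERE base_currency = 'GBP'
  AND counter_currency = 'USD'
  AND time <= '2009-12-30 14:42:02'
ORDER BY time DESC
LIMIT 1;
```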