I have a table hardBrake with the following schema:
+-------------+------------------+
| Column Name | Type             |
+-------------+------------------+
| id          | CHAR(36) primary |
| vehicleId   | CHAR(36)         |
| address     | VARCHAR(50)      |
| time        | TIMESTAMP        |
+-------------+------------------+
Now when I run my query, grouping on the time column with the DAY() function, EXPLAIN EXTENDED on the SELECT shows Using temporary and Using filesort along with Using where and Using index.
I have a composite index on (time, vehicleId), and my select query is:
select count(1),CONVERT_TZ(time,'+00:00', :offset) as dateOfIncident
from hardBrake
where vehicleId in (vehicleIds)
and time between NOW() - INTERVAL 30 DAY and NOW()
group by day(dateOfIncident )
order by time DESC;
The :offset and vehicleIds values are passed from my Java code: :offset is the difference between the user's time zone and the DB time zone (which is GMT), and vehicleIds are the customer's vehicle IDs.
Is it possible to remove temporary and filesort for a query that uses GROUP BY on a function?
A sneak peek at my problem:
I want to get hardBrake incidents for a customer, date-wise, in his time zone. As an alternative, can I make changes on the Java side without having to handle the time zone in MySQL?
I am afraid this query cannot avoid the temporary table.
Refer to this link: http://dev.mysql.com/doc/refman/5.7/en/group-by-optimization.html
The most important preconditions for using indexes for GROUP BY are that all GROUP BY columns reference attributes from the same index, and that the index stores its keys in order .....
Since the query uses the function CONVERT_TZ(time,'+00:00', :offset) to obtain the GROUP BY values, MySQL cannot retrieve these values from the index; it must calculate them on the fly using the function, and must store them in a temporary table.
A similar problem causes the filesort;
read this link: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:
..........
..........
- You have different ORDER BY and GROUP BY expressions.
The query has different GROUP BY and ORDER BY expressions:
group by day(dateOfIncident )
order by time DESC;
therefore MySQL cannot use the index, and must use a filesort.
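A possible refinement (my sketch, not part of the answer above): using DATE() instead of DAY() avoids merging, say, January 3rd and February 3rd into one group, and making the ORDER BY use the same expression as the GROUP BY removes one of the two causes of the filesort. The temporary table for the function-based GROUP BY remains:

```sql
-- a sketch only; the temporary table cannot be avoided here,
-- but GROUP BY and ORDER BY now agree, and DATE() groups by full date
SELECT COUNT(1),
       DATE(CONVERT_TZ(time, '+00:00', :offset)) AS dateOfIncident
FROM hardBrake
WHERE vehicleId IN (:vehicleIds)
  AND time BETWEEN NOW() - INTERVAL 30 DAY AND NOW()
GROUP BY dateOfIncident
ORDER BY dateOfIncident DESC;
```

MySQL allows the select-list alias in both GROUP BY and ORDER BY, so the expression is computed once per row.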
Related
This might be a trivial question for some of you, but I haven't found or understood a solution to the following problem:
I have a large (roughly 60 GB) table structured the following way:
+------------+----------+------+-----+---------+-------+
| Field      | Type     | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| date       | datetime | YES  | MUL | NULL    |       |
| chgpct1d   | double   | YES  |     | NULL    |       |
| pair       | text     | YES  |     | NULL    |       |
+------------+----------+------+-----+---------+-------+
The table stores the last 10 years of daily percentage changes for roughly 200k different pair-trades. Thus neither date nor pair is a unique key (a combination of date + pair would be). There are roughly 2,600 distinct date entries and roughly 200k distinct pairs, which generate more than 520 million rows.
The following query takes multiple minutes to return a result.
SELECT date, chgpct1d, pair FROM db WHERE date = '2018-12-20';
What can I do to speed things up?
I've read about multiple-column indices but I'm not sure if that would help in my case given that all of the WHERE-queries will only ever point to the 'date' field.
MySQL probably does a full table scan to satisfy your query. That's like looking up a word in a dictionary that has its entries in random order: very slow.
Two things:
Create an index on these columns: (date, chgpct1d, pair).
Because the column named date has the DATETIME data type, it can potentially contain values like 2018-12-20 10:17:20. When you say WHERE date = '2018-12-20' it actually means WHERE date = '2018-12-20 00:00:00'. So use this instead:
WHERE date >= '2018-12-20'
  AND date <  '2018-12-21'
That will capture all the date values at any time on your chosen date.
Why does this help? Because your multicolumn index starts with date, MySQL can do a range scan on it given the WHERE statement you have. And, because the index contains everything needed by your query the database server doesn't have to look anywhere else, but can satisfy the query directly from the index. That index is said to cover the query.
Notice that with half a gigarow in your table, creating the index will take a while. Do it overnight or something.
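A sketch of the index creation (the index name and the 32-character prefix length are my assumptions). MySQL cannot index a TEXT column without a prefix length, so pair needs one here; note that a prefix index cannot fully cover the query for the pair value, so converting pair to VARCHAR and indexing it whole restores the covering behavior described above:

```sql
-- hypothetical index name; pick a prefix length at least as long
-- as your longest pair string, or convert pair to VARCHAR first
ALTER TABLE db
  ADD INDEX idx_date_chg_pair (date, chgpct1d, pair(32));
```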
I have the following query that counts the number of vessels in each zone for each week:
SELECT zone,
DATE_FORMAT(creation_date, '%Y%u') AS date,
COUNT(DISTINCT vessel_imo) AS vessel_count
FROM vessel_position
WHERE zone IS NOT NULL
AND creation_date >= DATE_SUB(CURDATE(), INTERVAL 12 MONTH)
GROUP BY zone, date;
The table has about 40 million rows. The execution plan for this is:
+----+-------------+-----------------+------------+-------+--------------------+------+---------+------+----------+----------+------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------------+------------+-------+--------------------+------+---------+------+----------+----------+------------------------------------------+
| 1 | SIMPLE | vessel_position | NULL | range | creation_date,zone | zone | 5 | NULL | 21190904 | 50.00 | Using where; Using index; Using filesort |
+----+-------------+-----------------+------------+-------+--------------------+------+---------+------+----------+----------+------------------------------------------+
Columns vessel_imo, zone and creation_date are each indexed. The primary key is the composite key (vessel_imo, creation_date).
When I look at the query profile, I can see that a large amount of time is spent in Creating sort index.
Is there anything I can do to improve this query further?
Assuming the data, once inserted, does not change, then build and maintain a Summary Table.
The table would have three columns: the zone, the week, and the count-distinct for that week. At the start of each week, build only the rows for the previous week (one per zone; skip NULL). Then build a query to work against that table -- it will be extremely fast since it will be fetching far fewer rows.
Meanwhile, the INDEX(creation_date, zone, vessel_imo) as a secondary index, will make the weekly task reasonably efficient (~52 times as fast as your current query).
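The summary table and its weekly refresh could be sketched as follows (the table and column names are mine, not from the answer, and the week-boundary arithmetic is simplified):

```sql
-- hypothetical names throughout
CREATE TABLE weekly_zone_counts (
  zone         INT     NOT NULL,
  yearweek     CHAR(6) NOT NULL,   -- e.g. '201852', matching DATE_FORMAT(..., '%Y%u')
  vessel_count INT     NOT NULL,
  PRIMARY KEY (zone, yearweek)
);

-- run once at the start of each week, for the week that just ended
INSERT INTO weekly_zone_counts (zone, yearweek, vessel_count)
SELECT zone,
       DATE_FORMAT(creation_date, '%Y%u'),
       COUNT(DISTINCT vessel_imo)
FROM vessel_position
WHERE zone IS NOT NULL
  AND creation_date >= DATE_SUB(CURDATE(), INTERVAL 1 WEEK)  -- adjust to exact week boundaries
  AND creation_date <  CURDATE()
GROUP BY zone, DATE_FORMAT(creation_date, '%Y%u');
```

The reporting query then reads from weekly_zone_counts, touching at most one row per zone per week instead of millions of raw positions.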
It depends on how selective your filtering condition is, and on your table structure. Does the filtering condition select 20% of the rows, 5%, 1%, 0.1%?
If your answer is less than 5% then the following index could help:
create index ix1_date_zone on vessel_position (creation_date, zone);
If your table has many and/or heavy columns, then this option could still be slow, depending on how selective your filtering condition is.
Otherwise, you could try using a more expensive index, to avoid using the table and do:
create index ix2_date_zone_imo on vessel_position
(creation_date, zone, vessel_imo);
This index is more expensive to maintain on INSERT, UPDATE, and DELETE, but it will be faster for your SELECT.
Try both options and pick the best for your needs.
SET @mystartdate = DATE_SUB(CURDATE(), INTERVAL 12 MONTH);
SELECT zone, DATE_FORMAT(creation_date, '%Y%u') AS date,
       COUNT(DISTINCT vessel_imo) AS vessel_count
FROM vessel_position
WHERE creation_date >= @mystartdate
  AND zone > 0
GROUP BY zone, date;
This may return results in less time; please post comparative timings of the second run of each (old and suggested).
Please post the new EXPLAIN SELECT … to confirm that the index on creation_date is now used.
Unless old data is allowed to change, why do you have to gather 12 months of history? The numbers from more than one month ago are not going to change.
I'm trying to figure out why one of my queries is slow and how I can fix it, but I'm a bit puzzled by my results.
I have an orders table with around 80 columns and 775,179 rows, and I'm running the following query:
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC LIMIT 200
which returns 38 rows in 4.5s
When removing the ORDER BY I'm getting a nice improvement :
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL LIMIT 200
38 rows in 0.30s
But when removing the LIMIT without touching the ORDER BY I'm getting an even better result :
SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC
38 rows in 0.10s (??)
Why is my LIMIT so hungry?
GOING FURTHER
I tried a few things before posting my question, and after noticing that I had an index on creation_date (which is a DATETIME) I removed it; the first query now runs in 0.10s. Why is that?
EDIT
Good guess; I have indexes on the other columns that are part of the WHERE.
mysql> explain SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC LIMIT 200;
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
| 1 | SIMPLE | orders | index | id_state_idx,id_mp_idx | creation_date | 5 | NULL | 1719 | Using where |
+----+-------------+--------+-------+------------------------+---------------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> explain SELECT * FROM orders WHERE id_state = 2 AND id_mp IS NOT NULL ORDER BY creation_date DESC;
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
| 1 | SIMPLE | orders | range | id_state_idx,id_mp_idx | id_mp_idx | 3 | NULL | 87502 | Using index condition; Using where; Using filesort |
+----+-------------+--------+-------+------------------------+-----------+---------+------+-------+----------------------------------------------------+
Indexes do not necessarily improve performance. To better understand what is happening, it would help if you included the explain for the different queries.
My best guess would be that you have an index on id_state, or even on (id_state, id_mp), that can be used to satisfy the where clause. If so, the first query without the order by would use this index. It should be pretty fast. Even without an index, this requires a sequential scan of the pages in the orders table, which can still be pretty fast.
Then when you add the index on creation_date, MySQL decides to use that index instead for the order by. This requires reading each row in the index, then fetching the corresponding data page to check the where conditions and return the columns (if there is a match). This reading is highly inefficient, because it is not in "page" order but rather as specified by the index. Random reads can be quite inefficient.
Worse, even though you have a limit, you still have to read the entire table because the entire result set is needed. Although you have saved a sort on 38 records, you have created a massively inefficient query.
By the way, this situation gets significantly worse if the orders table does not fit in available memory. Then you have a condition called "thrashing", where each new record tends to generate a new I/O read. So, if a page has 100 records on it, the page might have to be read 100 times.
You can make all these queries run faster by having an index on orders(id_state, id_mp, creation_date). The where clause will use the first two columns and the order by will use the last.
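That suggestion could be sketched as (the index name is mine):

```sql
-- covers the WHERE on (id_state, id_mp) and leaves creation_date
-- available for the ORDER BY
ALTER TABLE orders
  ADD INDEX idx_state_mp_created (id_state, id_mp, creation_date);
```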
The same problem happened in my project.
I did some testing and found out that LIMIT is slow because of row lookups.
See:
MySQL ORDER BY / LIMIT performance: late row lookups
So, the solution is:
(A) when using LIMIT, select not all columns, but only the PK column;
(B) select all the columns you need, and join with the result set of (A).
The SQL looks like:
SELECT O1.*                        -- this is what you want
FROM orders O1
JOIN (
    SELECT ID                      -- fetch the PK column only; this should be fast
    FROM orders
    WHERE [your query condition]   -- filter records by condition
    ORDER BY [your order by condition]  -- control the record order
    LIMIT 2000, 50                 -- filter records by paging condition
) AS O2 ON O1.ID = O2.ID
ORDER BY [your order by condition]      -- control the record order
In my DB,
the old SQL, which selected all columns using LIMIT 21560, 20, cost about 4.484s.
The new SQL costs only 0.063s; the new one is about 71 times faster.
I had a similar issue on a table of 2.5 million records. Without the LIMIT part, the query took a few seconds; with the LIMIT part it was stuck forever.
I solved it with a subquery. In your case it would become:
SELECT *
FROM
(SELECT *
FROM orders
WHERE id_state = 2
AND id_mp IS NOT NULL
ORDER BY creation_date DESC) tmp
LIMIT 200
I noted that the original query was fast when the number of selected rows was greater than the LIMIT parameter. So the query became extremely slow when the LIMIT parameter was effectively useless.
Another solution is to try forcing an index. In your case you can try:
SELECT *
FROM orders force index (id_mp_idx)
WHERE id_state = 2
AND id_mp IS NOT NULL
ORDER BY creation_date DESC
LIMIT 200
The problem is that MySQL is forced to sort the data on the fly. My query with a deep offset, like:
ORDER BY somecol LIMIT 99990, 10
took 2.5s.
I fixed it by creating a new table that has the data presorted by the column somecol and contains only the ids; there the deep offset (with no ORDER BY needed) takes 0.09s.
Still, 0.1s is not fast enough; 0.01s would be better.
I will end up creating a table that holds the page number as a special indexed column, so instead of doing LIMIT x, y I will query WHERE page = Z.
I just tried it and it is as fast as 0.0013s. The only problem is that the offsetting is based on static numbers (presorted into pages of, say, 10 items each). That is not a big problem though; you can still get any data from any page.
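That page table could be sketched as follows (table, column, and index names are mine, as is the page size of 10):

```sql
-- hypothetical names; ids are stored in presorted order,
-- 10 ids per page, so page N replaces LIMIT N*10, 10
CREATE TABLE presorted (
  page INT NOT NULL,
  id   INT NOT NULL,
  INDEX idx_page (page)
);

SELECT o.*
FROM presorted p
JOIN orders o ON o.id = p.id
WHERE p.page = 42;   -- instead of ORDER BY somecol LIMIT 420, 10
```

The trade-off is that the presorted table must be rebuilt whenever the ordering column changes.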
I've got a database of ~10 million entries, each of which contains a date stored as DATE.
I've indexed that column using a non-unique BTREE.
I'm running a query that counts the number of entries for each distinct year:
SELECT DISTINCT(YEAR(awesome_date)) as year, COUNT(id) as count
FROM all_entries
WHERE awesome_date IS NOT NULL
GROUP BY YEAR(awesome_date)
ORDER BY year DESC;
The query takes about 90 seconds to run at the moment, and the EXPLAIN output demonstrates why:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
----------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | all_entries | ALL | awesome_date | | | | 9759848 | Using where; Using temporary; Using filesort
If I FORCE KEY(awesome_date) that drops the rows count down to ~8 million and the key_len = 4, but is still Using where; Using temporary; Using filesort.
I also run queries selecting DISTINCT(MONTH(awesome_date)) and DISTINCT(DAY(awesome_date)) with the relevant WHERE conditions restricting them to a particular year or month.
Other than storing the year, month and day information in separate columns, is there a way of speeding up this query and/or avoiding temporary tables and filesort?
Without splitting the date into 3 columns, you could:
First, remove the DISTINCT; it is useless here.
Remove the ORDER BY year; that will improve speed (a bit). Change the GROUP BY to: GROUP BY YEAR(awesome_date) DESC (this works in the MySQL dialect only).
Change the COUNT(id) to COUNT(*) (assuming that id can never be NULL, this is faster in many MySQL versions).
In all, the query will become:
SELECT YEAR(awesome_date) AS year
     , COUNT(*) AS cnt            --- not good practice to use reserved words
                                  --- for aliases
FROM all_entries
WHERE awesome_date IS NOT NULL
GROUP BY YEAR(awesome_date) DESC ;
Even better (faster) solutions are:
your proposal to split the column into 3 (year, month, day)
change from MySQL to MariaDB (a MySQL fork) and use a virtual PERSISTENT column for the year, and add an index on that virtual column.
stay in MySQL and add a persistent year column yourself, maintained by triggers.
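The MariaDB option could be sketched as follows (the column and index names are mine; the syntax assumes MariaDB's virtual column support):

```sql
-- PERSISTENT stores the computed year on disk so it can be indexed
ALTER TABLE all_entries
  ADD COLUMN awesome_year SMALLINT AS (YEAR(awesome_date)) PERSISTENT,
  ADD INDEX idx_year (awesome_year);

-- the query can then group on the indexed column directly
SELECT awesome_year AS year, COUNT(*) AS cnt
FROM all_entries
WHERE awesome_year IS NOT NULL
GROUP BY awesome_year DESC;
```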
I have two simple Mysql tables:
SYMBOL
| id | symbol |
(INT primary key, VARCHAR)
PRICE
| id | id_symbol | date | price |
(INT primary key, INT indexed, DATE, DOUBLE)
I have to pass two symbols to get something like:
DATE A B
2001-01-01 | 100.25 | 25.26
2001-01-02 | 100.23 | 25.25
2001-01-03 | 100.24 | 25.24
2001-01-04 | 100.25 | 25.26
2001-01-05 | 100.26 | 25.28
2001-01-06 | 100.27 | 30.29
Where A and B are the symbols I need to search, and the date is the date of the prices (I need the same date to compare symbols).
If one symbol doesn't have a date that the other has, I have to skip it. I only need to retrieve the last N prices of those symbols.
Order: from the earliest date to the latest (for example, the last 100 prices of both).
How could I implement this query?
Thank you
Implementing these steps should bring you the desired result:
Get dates and prices for symbol A. (Inner join PRICE with SYMBOL to obtain the necessary rows.)
Similarly get dates and prices for symbol B.
Inner join the two result sets on the date column and pull the price from the first result set as the A column and the other one as B.
This should be simple if you know how to join tables.
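The steps above could be sketched as follows (a sketch only; the alias names are mine, and the inner LIMIT picks the latest 100 shared dates before reordering them ascending, as the question requires):

```sql
SELECT t.date, t.A, t.B
FROM (
    -- step 1 and 2: prices for symbol A and symbol B
    -- step 3: inner join on date, so non-shared dates are skipped
    SELECT a.date, a.price AS A, b.price AS B
    FROM PRICE a
    JOIN SYMBOL sa ON sa.id = a.id_symbol AND sa.symbol = 'A'
    JOIN PRICE b  ON b.date = a.date
    JOIN SYMBOL sb ON sb.id = b.id_symbol AND sb.symbol = 'B'
    ORDER BY a.date DESC
    LIMIT 100                -- the last N shared prices
) AS t
ORDER BY t.date ASC;         -- earliest to latest, as requested
```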
I think you should update your question to fix the mistakes in how you represented your data; I'm having a hard time following the details. However, based on what I am seeing, there are four MySQL concepts you need to solve your problem.
The first is JOINs: you would use a join to put two tables together so you can select related data using the key you describe as "id_symbol".
The second is LIMIT, which lets you specify the number of records to return: if you wanted one record you would use the keyword LIMIT 1, or for a hundred records, LIMIT 100.
The third is the WHERE clause, which lets you search for a specific value in one of the fields of the table you are querying.
The last is ORDER BY, which lets you specify a field to sort the returned records by, and the direction you want them sorted, ASC or DESC.
An example:
SELECT *
FROM table1
JOIN table2 ON table1.id = table2.table1_id
WHERE table1.searchfield = 'search string'
ORDER BY table1.orderfield DESC
LIMIT 100
(This is pseudo-code; adapt the names to your schema, but it should give you the right idea. Note that ORDER BY must come before LIMIT.)
I suggest referencing the MySQL documentation; it should provide everything you need to keep going.