I've got a report that needs to show the profit from my transactions by month and year. The query I wrote works, but it is very slow, and I can't figure out how to change it so that it takes less time to run.
SELECT MONTH(MT4_TRADES.CLOSE_TIME) as MONTH
, YEAR(MT4_TRADES.CLOSE_TIME) as YEAR
, SUM(MT4_TRADES.SWAPS) as SWAPS
, SUM(MT4_TRADES.VOLUME)/100 as VOLUME
, SUM(MT4_TRADES.PROFIT) AS PROFIT
FROM MT4_TRADES
JOIN MT4_USERS
ON MT4_TRADES.LOGIN = MT4_USERS.LOGIN
WHERE MT4_TRADES.CMD < 2
AND MT4_TRADES.CLOSE_TIME <> "1970-01-01 00:00:00"
AND MT4_USERS.AGENT_ACCOUNT <> "1"
GROUP
BY YEAR(MT4_TRADES.CLOSE_TIME)
, MONTH(MT4_TRADES.CLOSE_TIME)
ORDER
BY YEAR
This is the full query; any suggestions would be highly appreciated.
This is the result of EXPLAIN:
Echoing the comment from #Barmar: look at the EXPLAIN output to see the query execution plan, and verify that suitable indexes are being used.
Likely the big rock in terms of performance is the "Using filesort" operation.
To get around that, we would need a suitable index available, and that would require some changes to the table. (The typical "improve query performance" question on SO comes with the restriction that we "can't add indexes or make any changes to the table".)
I'd be looking at a functional index (a feature added in MySQL 8.0). For MySQL 5.7, I'd be looking at adding generated columns and including those generated columns in a secondary index (a feature added in MySQL 5.7):
CREATE INDEX `MT4_TRADES_ix2` ON MT4_TRADES ((YEAR(close_time)),(MONTH(close_time)))
I'd be tempted to go with a covering index, and also change the grouping to a single expression e.g. DATE_FORMAT(close_time,'%Y-%m')
CREATE INDEX `MT4_TRADES_ix3` ON MT4_TRADES ((DATE_FORMAT(close_time,'%Y-%m'))
,swaps,volume,profit,login,cmd,close_time)
From the query, it looks like login is going to be UNIQUE in the MT4_USERS table; likely that's the PRIMARY KEY or a UNIQUE KEY, so an index is going to be available, but we're just guessing...
With suitable indexes available, we could do something like this:
SELECT DATE_FORMAT(close_time,'%Y-%m') AS close_year_mo
, SUM(IF(t.cmd < 2 AND t.close_time <> '1970-01-01', t.swaps ,NULL)) AS swaps
, SUM(IF(t.cmd < 2 AND t.close_time <> '1970-01-01', t.volume ,NULL))/100 AS volume
, SUM(IF(t.cmd < 2 AND t.close_time <> '1970-01-01', t.profit ,NULL)) AS profit
FROM MT4_TRADES t
JOIN MT4_USERS u
ON u.login = t.login
AND u.agent_account <> '1'
GROUP BY close_year_mo
ORDER BY close_year_mo
and we'd expect MySQL to do a loose index scan, with the EXPLAIN output showing "using index for group-by" and not showing "Using filesort"
EDIT
For versions of MySQL before 5.7, we could create new columns, e.g. year_close and month_close, and populate them with the results of the expressions YEAR(close_time) and MONTH(close_time). (We could create BEFORE INSERT and BEFORE UPDATE triggers to handle that automatically for us.)
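A minimal sketch of those triggers, assuming the year_close and month_close columns have already been added to MT4_TRADES (the trigger names are illustrative):
DELIMITER $$
CREATE TRIGGER mt4_trades_bi BEFORE INSERT ON MT4_TRADES
FOR EACH ROW
BEGIN
  -- keep the derived columns in sync with close_time
  SET NEW.year_close  = YEAR(NEW.close_time);
  SET NEW.month_close = MONTH(NEW.close_time);
END$$
CREATE TRIGGER mt4_trades_bu BEFORE UPDATE ON MT4_TRADES
FOR EACH ROW
BEGIN
  SET NEW.year_close  = YEAR(NEW.close_time);
  SET NEW.month_close = MONTH(NEW.close_time);
END$$
DELIMITER ;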
Then we could create an index with those columns as the leading columns:
CREATE INDEX ... ON MT4_TRADES ( year_close, month_close, ... )
And then reference the new columns in the query
SELECT t.year_close AS `YEAR`
, t.month_close AS `MONTH`
FROM MT4_TRADES t
JOIN ...
WHERE ...
GROUP
BY t.year_close
, t.month_close
Ideally, include in the index all of the referenced columns from MT4_TRADES, to make it a covering index for the query.
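For example (a sketch; the index name is illustrative):
CREATE INDEX `MT4_TRADES_ix4` ON MT4_TRADES
  (year_close, month_close, swaps, volume, profit, login, cmd, close_time)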
Related
I have to sort the query result on an alias column (total_reports) that comes from a GROUP BY, with a HAVING condition and a limit of 50 records. Please let me know what I am missing:
SELECT Count(world_name) AS total_reports,
name,
Max(last_update) AS report
FROM `world`
WHERE ( `id` = ''
AND `status` = 1 )
AND `time` >= '2017-07-16'
AND `name` LIKE '%%'
GROUP BY `name`
HAVING `total_reports` >= 2
ORDER BY `total_reports` DESC
LIMIT 50 offset 0
The query returns what I need. However, it scans all records of the table before returning the result, which takes too much time and isn't the right way. I have thousands of records, so it's slow. I want to apply an index to the alias column, which is total_reports in my situation.
Create an index on a column from an aggregated result? No, I'm sorry, but MySQL cannot do that natively.
What you need is probably a Materialized View that you could index. That's not supported in MySQL (yet), unless you install extra plugins. See How to Create a Materialized View in MySQL.
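As a rough sketch of the usual workaround, you could maintain a summary table yourself and refresh it on a schedule (all names here are illustrative; this assumes the event scheduler is enabled, and note that REPLACE won't remove rows that disappear from the source):
CREATE TABLE world_report_summary (
  name          VARCHAR(255) NOT NULL PRIMARY KEY,
  total_reports INT NOT NULL,
  report        DATETIME NULL
);
CREATE EVENT ev_refresh_world_report_summary
ON SCHEDULE EVERY 5 MINUTE
DO
  -- rebuild the aggregates; stale rows would need separate cleanup
  REPLACE INTO world_report_summary
  SELECT name, COUNT(world_name), MAX(last_update)
  FROM `world`
  WHERE `id` = '' AND `status` = 1 AND `time` >= '2017-07-16'
  GROUP BY name;
The HAVING, ORDER BY and LIMIT would then run against this small table, where total_reports is a real, indexable column.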
The Long Answer
You cannot create an index on a column resulting from a GROUP BY statement. That column does not exist on the table, and cannot be derived at the row level (not a virtual column).
Your query may be slow since it's probably reading the whole table. To read only the relevant range of rows, add this index:
create index ix1 on `world` (`status`, `id`, `time`);
That should make the query use the filtering condition in a much better way and hopefully speed it up, by using an Index Range Scan.
Also, please change '%%' to '%'. A double % doesn't make much sense. Actually, you should remove this condition altogether -- it's not filtering anything.
Finally, if the query is still slow, please post the execution plan, using:
explain <my_query_here>
I have MariaDB 10.1.14. For a long time I've been running the following query without problems (it took about 3 seconds):
SELECT
sum(transaction_total) as sum_total,
count(*) as count_all,
transaction_currency
FROM
transactions
WHERE
DATE(transactions.created_at) = DATE(CURRENT_DATE)
AND transaction_type = 1
AND transaction_status = 2
GROUP BY
transaction_currency
Suddenly, and I'm not sure exactly why, this query takes about 13 seconds.
This is the EXPLAIN:
And these are all the indexes of the transactions table:
What is the reason for the sudden increase in query time, and how can I decrease it?
If you are adding more data to your table the query time will increase.
But you can do a few things to improve the performance.
Create a composite index on (transaction_type, transaction_status, created_at).
Remove the DATE() function (or any function) from your fields, because wrapping a column in a function prevents the engine from using the index. CURRENT_DATE is a constant, so it doesn't matter there, but the DATE() around it isn't necessary anyway, because CURRENT_DATE already returns a DATE.
If created_at isn't a DATE, you can use
created_at >= CURRENT_DATE and created_at < CURRENT_DATE + INTERVAL 1 DAY
or create a different field that saves only the date part.
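Putting these suggestions together, the query might be rewritten like this (a sketch; with the composite index above, every predicate can use the index):
SELECT
    sum(transaction_total) as sum_total,
    count(*) as count_all,
    transaction_currency
FROM
    transactions
WHERE
    transaction_type = 1
    AND transaction_status = 2
    AND created_at >= CURRENT_DATE
    AND created_at < CURRENT_DATE + INTERVAL 1 DAY
GROUP BY
    transaction_currency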
+1 to answer from #JuanCarlosOropeza, but you can go a little further with the index.
ALTER TABLE transactions ADD INDEX (
transaction_type,
transaction_status,
created_at,
transaction_currency,
transaction_total
);
As #RickJames mentioned in comments, the order of columns is important.
First, columns in equality comparisons
Next, you can index one column that is used for a range comparison (which is anything besides equality), or GROUP BY or ORDER BY. You have both range comparison and GROUP BY, but you can only get the index to help with one of these.
Last, other columns needed for the query, if you think you can get a covering index.
I describe more detail about index design in my presentation How to Design Indexes, Really (video: https://www.youtube.com/watch?v=ELR7-RdU9XU).
You're probably stuck with the "using temporary" since you have a range condition and also a GROUP BY referencing different columns. But you can at least eliminate the "using filesort" by this trick:
...
GROUP BY
transaction_currency
ORDER BY NULL
This supposes that it's not important to you which order the rows of the result are returned in.
I don't know what has made your query slower. More data? Fragmentation? New DB version?
However, I am surprised to see that there is no index that really supports the query. You should have a compound index, starting with the column of highest cardinality (the date? well, you can try different column orders and see which index the DBMS picks for the query):
create index idx1 on transactions(created_at, transaction_type, transaction_status);
If created_at contains a time part, then you may want to create a computed column created_on containing only the date, and index that instead (see the sketch below).
You can even extend this index to a covering index (where clause fields followed by group by clause fields followed by select clause fields):
create index idx2 on transactions(created_at, transaction_type, transaction_status,
transaction_currency, transaction_total);
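As a sketch of the computed-column idea mentioned above: MariaDB supports computed columns, and a column declared PERSISTENT can be indexed (the column and index names here are illustrative):
ALTER TABLE transactions
  ADD COLUMN created_on DATE AS (DATE(created_at)) PERSISTENT,
  ADD INDEX idx_created_on (created_on, transaction_type, transaction_status);
The query would then filter on created_on = CURRENT_DATE instead of DATE(created_at), so the index can be used.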
I have this query which basically goes through a bunch of tables to get me some formatted results, but I can't seem to find the bottleneck. The easiest bottleneck to eliminate was the ORDER BY RAND(), but the performance is still bad.
The query takes from 10 to 20 seconds without ORDER BY RAND():
SELECT
c.prix AS prix,
ST_X(a.point) AS X,
ST_Y(a.point) AS Y,
s.sizeFormat AS size,
es.name AS estateSize,
c.title AS title,
DATE_FORMAT(c.datePub, '%m-%d-%y') AS datePub,
dbr.name AS dateBuiltRange,
m.myId AS meuble,
c.rawData_id AS rawData_id,
GROUP_CONCAT(img.captionWebPath) AS paths
FROM
immobilier_ad_blank AS c
LEFT JOIN PropertyFeature AS pf ON (c.propertyFeature_id = pf.id)
LEFT JOIN Adresse AS a ON (c.adresse_id = a.id)
LEFT JOIN Size AS s ON (pf.size_id = s.id)
LEFT JOIN EstateSize AS es ON (pf.estateSize_id = es.id)
LEFT JOIN Meuble AS m ON (pf.meuble_id = m.id)
LEFT JOIN DateBuiltRange AS dbr ON (pf.dateBuiltRange_id = dbr.id)
LEFT JOIN ImageAd AS img ON (img.commonAd_id = c.rawData_id)
WHERE
c.prix != 0
AND pf.subCatMyId = 1
AND (
(
c.datePub > STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y')
)
OR date_format(c.datePub, '%d-%m-%Y') = '30-04-2016'
)
AND a.validPoint = 1
GROUP BY
c.id
#ORDER BY
# RAND()
LIMIT
5000
Here is the EXPLAIN output:
Visual portion:
And here is a screenshot of mysqltuner:
EDIT 1
I have many indexes. Here they are:
EDIT 2:
So you guys did it: the query is down to 0.5 to 2.5 secs.
I mostly followed all of your advice, changed some settings in my.cnf, and ran OPTIMIZE on my tables.
You're searching for dates in a very suboptimal way. Try this.
... c.datePub >= STR_TO_DATE('01-04-2016', '%d-%m-%Y')
AND c.datePub < STR_TO_DATE('30-04-2016', '%d-%m-%Y') + INTERVAL 1 DAY
That allows a range scan on an index on the datePub column. You should create a compound index for that table on (datePub, prix, adresse_id, rawData_id) and see if it helps.
Also try an index on Adresse (validPoint). Notice that your use of a geometry data type in that table is probably not helping anything.
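Sketches of both suggested indexes (the index names are illustrative; the column names are taken from the query):
ALTER TABLE immobilier_ad_blank
  ADD INDEX idx_datepub_cover (datePub, prix, adresse_id, rawData_id);
ALTER TABLE Adresse
  ADD INDEX idx_validpoint (validPoint);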
To begin with, you have quite a lot of indexes, but many of them are not useful. Remember: more indexes mean slower INSERTs and UPDATEs. Also, MySQL is not good at using more than one index per table in complex queries. The following indexes have a cardinality < 10 and should probably be dropped:
IDX_...E88B
IDX....62AF
IDX....7DEE
idx2
UNIQ...F210
UNIQ...F210..
IDX....0C00
IDX....A2F1
At this point I got tired of the exercise; there are many more.
Then you have some duplicated data:
point
lat
lng
The point field has the lat and lng in it, so the latter two are not needed. That means you can lose two more indexes: idxlat and idxlng. I am not quite sure how idxlng appears twice in the index list for the same table.
These optimizations will lead to an overall increase in performance for INSERTS and UPDATES and possibly for all SELECTs as well because the query planner needs to spend less time deciding which index to use.
Then we notice from your EXPLAIN that the query does not use any index on the table Adresse (a). But your WHERE clause has a.validPoint = 1, so clearly you need an index on it, as suggested by #Ollie-Jones.
However, I suspect that this index may have low cardinality. In that case I recommend that you create a composite index on this column plus another.
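For example (a sketch; pairing validPoint with id, the column the join needs, is a guess, and the index name is illustrative):
ALTER TABLE Adresse ADD INDEX idx_validpoint_id (validPoint, id);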
The problem is your join with (a). The table has an index, but the index can't be used, more than likely due to the sort (/GROUP BY), or possibly due to incompatible types. The EXPLAIN shows three-quarters of a million rows examined, which means that an index lookup was not possible.
When designing a query, look for the smallest possible result set - search by that index, and then join from there. Perhaps "c" isn't the best table for the primary query.
(You could try using FORCE INDEX (id) on table a; if it doesn't work, the error may give you more information.)
As others have pointed out, you need an index on a.validPoint, but what about c.datePub, which is also used in the WHERE clause? Why not a multiple-column index on (datePub, adresse_id)? The index on adresse_id is already used, so a multiple-column index will be better here.
I have a large table with about 100 million records, with fields start_date and end_date of DATE type. I need to check the number of overlaps with some date range, say between 2013-08-20 and 2013-08-30, so I use:
SELECT COUNT(*) FROM myTable WHERE end_date >= '2013-08-20'
AND start_date <= '2013-08-30'
The date columns are indexed.
The important point is that the date ranges I am searching for overlaps with are always in the future, while the majority of the records in the table are in the past (say about 97-99 million).
So, will this query be faster if I add a TINYINT column is_future, so that by checking that condition, like this:
SELECT COUNT(*) FROM myTable WHERE is_future = 1
AND end_date >= '2013-08-20' AND start_date <= '2013-08-30'
it will exclude the other 97 million or so records and check the date condition only for the remaining 1-3 million records?
I use MySQL
Thanks
EDIT
The MySQL engine is InnoDB, but would it matter considerably if it were, say, MyISAM?
Here is the CREATE TABLE:
CREATE TABLE `orders` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`title` varchar(255) DEFAULT NULL, -- type truncated in the original post; varchar(255) is an assumption
`start_date` date DEFAULT NULL,
`end_date` date DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=24 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
EDIT 2 (after #Robert Co's answer)
Partitioning looks like a good idea for this case, but it does not allow me to create a partition based on the is_future field unless I make that field part of the primary key, which would mean changing my main primary key, id, and I cannot do that. And if I do define that field as part of the primary key, is there still a point to partitioning? Wouldn't a search by is_future already be fast, since it would be part of the primary key?
EDIT 3
The actual query where I need to use this selects restaurants that have some free tables for that date range:
SELECT r.id, r.name, r.table_count
FROM restaurants r
LEFT JOIN orders o
ON r.id = o.restaurant_id
WHERE o.id IS NULL
OR (r.table_count > (SELECT COUNT(*)
FROM orders o2
WHERE o2.restaurant_id = r.id AND
end_date >= '2013-08-20' AND start_date <= '2013-08-30'
AND o2.status = 1
)
)
SOLUTION
After a lot more research and testing, the fastest way of counting the number of rows in my case was to just add one more condition: that start_date is later than the current date (because the date ranges being searched are always in the future).
SELECT COUNT(*) FROM myTable WHERE end_date >= '2013-09-01'
AND start_date >= '2013-08-20' AND start_date <= '2013-09-30'
It is also necessary to have one index with the start_date and end_date fields (thank you #symcbean).
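The index DDL might look like this (the index name is illustrative):
ALTER TABLE myTable ADD INDEX idx_start_end (start_date, end_date);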
As a result, the execution time on a table with 10M rows went from 7 seconds down to 0.050 seconds.
SOLUTION 2 (#Robert Co)
Partitioning worked in this case as well! Perhaps it is a better solution than indexing, or the two can be applied together.
Thanks
This is a perfect use case for table partitioning. If the Oracle INTERVAL feature makes it to MySQL, then it will just add to the awesomeness.
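A minimal sketch of what RANGE partitioning might look like for the orders table from the question (the partition boundary is illustrative, and columns are trimmed for brevity; note that MySQL requires the partitioning column to be part of every unique key, hence the composite primary key -- the very constraint raised in EDIT 2):
CREATE TABLE `orders` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `start_date` date NOT NULL,
  `end_date` date DEFAULT NULL,
  PRIMARY KEY (`id`, `start_date`)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(`start_date`)) (
  PARTITION p_past   VALUES LESS THAN (TO_DAYS('2013-08-20')),
  PARTITION p_future VALUES LESS THAN MAXVALUE
);
Queries that filter on start_date would then touch only the small p_future partition.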
The date columns are indexed
What type of index? A hash-based index is no use for range queries. If it's not a BTREE index, then change it now. And you've not shown us how the columns are indexed. Are both columns in the same index? Is there other stuff in there too? What order (end_date must appear as the first column)?
There are implicit type conversions in the script - this should be handled automatically by the optimizer, but it's worth checking....
SELECT COUNT(*) FROM myTable WHERE end_date >= 20130820000000
AND start_date <= 20130830235959
if I add a column is_future - TINYINT
First, in order to be of any use, this would require that the future dates be a small proportion of the total data stored in the table (less than 10%). And that's just to make it more efficient than a full table scan.
Secondly, it's going to require very frequent updates to the index to maintain it, which, in addition to the overhead of the initial population, is likely to lead to fragmentation of the index and degraded performance (depending on how the index is constructed).
Thirdly, if this still has to process 3 million rows of data (and specifically, via an index lookup) then it's going to be very slow even with the data pegged in memory.
Further, the optimizer is never likely to use this index without being forced to (due to the low cardinality).
I have done a simple test: I just created an index on the TINYINT column. The structures may not be the same, but with an index it seems to work.
http://www.sqlfiddle.com/#!2/514ab/1/0
and for count
http://www.sqlfiddle.com/#!2/514ab/2/0
View the execution plan there to see that the select scans just one row, which means it would process only the smaller number of records in your case.
So the simple answer is yes, with an index it would work.
The following SQL runs extremely slowly in MySQL. It takes well over an hour against a table of 250,000 rows (spanning a 3-year timeline).
select L.order_date,
L.segname,
sum(O.product_total) as c_product_total,
sum(O.num_orders) as c_num_orders
from report_PurchasesByOrderDate_Hour_bySegment as L
join report_PurchasesByOrderDate_Hour_bySegment as O
on L.order_date >= O.order_date
and L.segname = O.segname
group by L.order_date, L.segname
;
This query generates cumulative sums each date for each segname (segment name).
I have run it through EXPLAIN with indexes in place.
Does anyone have any thoughts on how this could be rewritten to work well on MySQL?
(This query works fine in DB2, but I have to use MySQL for this project.)
Thanks for any help!
Tadman requested I add the table definition, including indexes (which, admittedly, I should have posted initially), so here it is:
create table report_PurchasesByOrderDate_Hour_bySegment
(
order_date date not null,
hour_of_day int not null,
hourly_datetime datetime not null,
segname varchar(10),
product_total decimal(15,4),
num_orders bigint,
PRIMARY KEY (hourly_datetime, segname),
UNIQUE INDEX (order_date, hour_of_day, segname),
UNIQUE INDEX (hour_of_day, order_date, segname)
);
Note: the column hourly_datetime is actually redundant; I put it in while testing LEFT JOIN performance for another query.
Thanks for the feedback. hour_of_day is indeed used in a different query.
For testing purposes I have added the following indexes. (Only one of the two would be needed, but I created both for now to see which MySQL would use.)
create index test1 on report_PurchasesByOrderDate_Hour_bySegment (order_date, segname);
create index test2 on report_PurchasesByOrderDate_Hour_bySegment (segname, order_date);
Here is the EXPLAIN output, as produced within MySQL Workbench:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,O,ALL,"order_date,test1,test2",NULL,NULL,NULL,253519,"Using temporary; Using filesort"
1,SIMPLE,L,ref,"order_date,test1,test2",test2,12,wc_store.O.segname,1267,"Using where; Using index"
I have run this both on my own laptop and an Amazon Managed MySQL database instance. The explain is identical for both.
On a side note, here is why the hour_of_day column is also in the pre-existing indexes: there is another version of the select that aggregates by hour_of_day. It also performs badly (worse), but I posted the simpler of the two, since a solution for the first one above (if there is one) can be applied to the more complex example. The other version adds "L.hour_of_day" to the select list and to the group by clause, and has the following
ON clause in the join:
on L.order_date >= O.order_date
and L.hour_of_day = O.hour_of_day
and L.segname = O.segname
Update
cbranch: Correct, the goal is to have a running total per date which sums up all prior dates. I changed the query to match the one you gave, which correctly applies DISTINCT to order_date and segname. However, it did not improve performance. Given that MySQL sometimes has performance issues with subqueries used in a join, I went ahead and created a temporary table for the result of the subquery and put indexes on it. So here is the new version:
create temporary table tmp_order_segment as
select distinct order_date, segname from report_PurchasesByOrderDate_Hour_bySegment;
create unique index tmp_1 on tmp_order_segment (order_date, segname);
create unique index tmp_2 on tmp_order_segment (segname, order_date);
select L.order_date,
L.segname,
sum(O.product_total) as c_product_total,
sum(O.num_orders) as c_num_orders
from tmp_order_segment as L
join report_PurchasesByOrderDate_Hour_bySegment as O
on L.order_date >= O.order_date
and L.segname = O.segname
group by L.order_date, L.segname;
Unfortunately, this did not improve performance either. The query still runs for well over an hour. The explain output is:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,O,ALL,order_date,NULL,NULL,NULL,252264,"Using temporary; Using filesort"
1,SIMPLE,L,ref,"tmp_1,tmp_2",tmp_2,12,bsupply.O.segname,1,"Using where; Using index"
MySQL versions I have tried with this issue are: 5.5.24 and 5.5.27.
Thanks for any assistance.
Generally speaking, an open-ended greater-than comparison may not use an index as effectively as a bounded BETWEEN will.
Try this:
...
join report_PurchasesByOrderDate_Hour_bySegment as O
on L.order_date between O.order_date and now()
...
This has the same meaning, but will use an index on order_date if one exists. If it doesn't exist, create one.
You didn't show the output from EXPLAIN, so this is just a guess ...
You have two composite indexes that look like they MIGHT be usable for this query, except that both indexes include hour_of_day, which is not part of your search criteria, and that may disqualify them. Try changing your first unique index to one of:
UNIQUE INDEX (order_date, segname, hour_of_day)
or
UNIQUE INDEX (segname, order_date, hour_of_day)
NOTE: If the existing index is required for other queries, add a new index rather than replacing the existing one.
EDIT:
Is the goal to generate a running total which sums up all prior orders? If so, I think you need to do the grouping prior to joining. Otherwise, you're joining table O to every individual (hourly) row in table L rather than to one row per date per segment. See if this makes sense:
select
L.order_date,
L.segname,
sum(O.product_total) as c_product_total,
sum(O.num_orders) as c_num_orders
from
(select distinct order_date, segname from report_PurchasesByOrderDate_Hour_bySegment) as L
join report_PurchasesByOrderDate_Hour_bySegment as O
on (L.order_date >= O.order_date and L.segname = O.segname)
group by
L.order_date,
L.segname
;