MySQL EXPLAIN output differs between TEST and local - mysql

I just dumped the DB from the TEST machine to my local machine and ran the query below:
SELECT advert_id,A.category_id,A.subcategory_id,subcategory,model_id,model,make,price,price2,gst,cndtn,currency,photo_id, SUM(S.visits) AS visits
FROM adverts A
LEFT JOIN subcategories SC ON A.subcategory_id = SC.subcategory_id
LEFT JOIN photos P ON P.sale_id = A.advert_id AND P.thumb=1 AND P.sale_type_id=1
LEFT JOIN (
SELECT
entity_id , visits
FROM
sitestats_ga
WHERE
entity_type_id=1 AND (date <= DATE_FORMAT(NOW(), "%Y%m%d") && date >= DATE_FORMAT(DATE_SUB(NOW(), INTERVAL 10 DAY), "%Y%m%d"))
) S ON A.advert_id = S.entity_id
WHERE '2015-12-02 06:44:55' >= A.datetime_added AND '2015-12-02 06:44:55' < A.datetime_removed AND A.sold_currency = '' AND (A.subcategory_id = '500' OR A.category_id = '100')
GROUP BY A.advert_id
ORDER BY visits DESC, A.datetime_added DESC
LIMIT 0,12
Unfortunately, the query duration differs between TEST and LOCAL. To verify this I ran EXPLAIN in MySQL on both, and the plans differ as well.
Explain result on TEST
Explain result on LOCAL
Note that I took a fresh dump of the database to my local machine, which means both have the same indexes; I verified this after seeing the EXPLAIN results.
I got lucky today and ran the query on PROD, and the result is the same as on my LOCAL. I expected TEST to match PROD, but there seems to be a discrepancy in the DB, so now I want to fix the TEST environment.
How can I dig deeper into this issue? What else should I do?

What happened there is quite possible, because MySQL indexes are not necessarily identical across environments even when they hold the same amount of data. It all depends on how the data was manipulated. For example, when you delete rows the index is not necessarily refreshed for all the data, so some index entries may still point to data that is gone. When you run EXPLAIN, those empty entries are still counted as being scanned.
You can test this: create a fresh index on one column, then insert and delete data. Then create the same index again under a different name; sometimes the two indexes will report different numbers.
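One way to check this, sketched under the assumption that stale index statistics are what makes the two plans diverge: refresh the statistics on TEST for the tables in the query, then compare the Cardinality column reported by SHOW INDEX across environments. The table names come from the question; nothing else about the schema is assumed.

```sql
-- Confirm the servers run the same version; optimizer behaviour
-- (for example, how derived tables are handled) can differ between versions.
SELECT VERSION();

-- Refresh the index statistics the optimizer relies on (run on TEST).
ANALYZE TABLE adverts, subcategories, photos, sitestats_ga;

-- Compare the Cardinality column of each index between TEST, LOCAL and PROD.
SHOW INDEX FROM adverts;
SHOW INDEX FROM sitestats_ga;
```

After that, re-running EXPLAIN on the original query on TEST should show whether the plan now matches LOCAL and PROD.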

Related

mysql is scanning table despite index

I have the following MySQL query that I think should be faster. The database table has 1 million records and the query takes 3.5 seconds.
set @numberofdayssinceexpiration = 1;
set @today = DATE(now());
set @start_position = (@pagenumber-1)* @pagesize;
SELECT *
FROM (SELECT ad.id,
             title,
             description,
             startson,
             expireson,
             ad.appuserid UserId,
             user.email UserName,
             ExpiredCount.totalcount
      FROM advertisement ad
      LEFT JOIN (SELECT servicetypeid,
                        Count(*) AS TotalCount
                 FROM advertisement
                 WHERE Datediff(@today,expireson) =
                       @numberofdayssinceexpiration
                   AND sendreminderafterexpiration = 1
                 GROUP BY servicetypeid) AS ExpiredCount
        ON ExpiredCount.servicetypeid = ad.servicetypeid
      LEFT JOIN aspnetusers user
        ON user.id = ad.appuserid
      WHERE Datediff(@today,expireson) = @numberofdayssinceexpiration
        AND sendreminderafterexpiration = 1
      ORDER BY ad.id) AS expiredAds
LIMIT 20 offset 1;
Here's the execution plan:
Here are the indexes defined on the table:
I wonder what I am doing wrong.
Thanks for any help
First, I would like to point out some problems. Then I will get into your Question.
LIMIT 20 OFFSET 1 gives you 20 rows starting with the second row.
The lack of an ORDER BY in the outer query may lead to an unpredictable ordering. In particular, the Limit and Offset can pick whatever they want. New versions will actually throw away the ORDER BY in the subquery.
DATEDIFF, being a function, makes that part of the WHERE not 'sargable'. That is, it can't use an INDEX. The usual way (which is sargable) to compare dates is (assuming expireson is of datatype DATE):
WHERE expireson >= CURDATE() - INTERVAL 1 DAY
Please qualify each column name. With that, I may be able to advise on optimal indexes.
Please provide SHOW CREATE TABLE so that we can see what column(s) are in each index.
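As a rough sketch of the sargable form applied to the question's filter: DATEDIFF(today, expireson) = N is equivalent to a half-open range on expireson, which works whether the column is DATE or DATETIME. This assumes title and expireson belong to advertisement (the question does not qualify them), and it uses a user variable in place of the hard-coded 1.

```sql
SET @numberofdayssinceexpiration = 1;

SELECT ad.id, ad.title, ad.expireson
FROM advertisement AS ad
-- Rows that expired exactly N days ago, with no function wrapped around the column.
WHERE ad.expireson >= CURDATE() - INTERVAL @numberofdayssinceexpiration DAY
  AND ad.expireson <  CURDATE() - INTERVAL (@numberofdayssinceexpiration - 1) DAY
  AND ad.sendreminderafterexpiration = 1
ORDER BY ad.id          -- ORDER BY sits next to LIMIT so paging is deterministic
LIMIT 20 OFFSET 0;      -- OFFSET 0 returns the first page
```

Whether an index is then used still depends on which columns are indexed, which is why the SHOW CREATE TABLE output matters.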

Performance issue on query with math calculations

This is my query with its performance (slow_query_log):
SELECT j.`offer_id`, o.`offer_name`, j.`success_rate`
FROM
(
SELECT
t.`offer_id`,
(
SUM(CASE WHEN `offer_id` = t.`offer_id` AND `sales_status` = 'SUCCESS' THEN 1 ELSE 0 END) / COUNT(*)
) AS `success_rate`
FROM `tblSales` AS t
WHERE DATE(t.`sales_time`) = CURDATE()
GROUP BY t.`offer_id`
ORDER BY `success_rate` DESC
) AS j
LEFT JOIN `tblOffers` AS o
ON j.`offer_id` = o.`offer_id`
LIMIT 5;
# Time: 180113 18:51:19
# User@Host: root[root] @ localhost [127.0.0.1] Id: 71
# Query_time: 10.472599 Lock_time: 0.001000 Rows_sent: 0 Rows_examined: 1156134
Here, tblOffers has all the offers listed, and tblSales contains all the sales. What I am trying to find is the top-selling offers, based on the success rate (i.e. those sales which are SUCCESS).
The query works fine and provides the output I need, but it appears to be a bit slow.
offer_id and sales_status are already indexed in tblSales. Do you have any suggestions for improving the inner query (where it calculates the success rate) so that performance improves? I have been playing with the math for more than 2 hours but couldn't find a better way.
By the way, tblSales has lots of data. It contains sales which are SUCCESSFUL, FAILED, PENDING, etc.
Thank you
EDIT
As requested, I am including the table design as well (only relevant fields are included):
tblSales
`sales_id` bigint UNSIGNED NOT NULL AUTO_INCREMENT,
`offer_id` bigint UNSIGNED NOT NULL DEFAULT '0',
`sales_time` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
`sales_status` ENUM('WAITING', 'SUCCESS', 'FAILED', 'CANCELLED') NOT NULL DEFAULT 'WAITING',
PRIMARY KEY (`sales_id`),
KEY (`offer_id`),
KEY (`sales_status`)
There are some other fields in this table that hold other info (amount, user_id, etc.), which are not relevant to my question.
Numerous 'problems', none of which involve "math".
JOINs make things difficult. LEFT JOIN says "I don't care whether the row exists in the 'right' table." (I suspect you don't need LEFT??) But it also says "There may be multiple rows in the right table." Based on the column names, I will guess that there is only one offer_name for each offer_id. If this is correct, then here is my first recommendation. (This will convince the Optimizer that there is no issue with the JOIN.) Change from
SELECT ..., o.offer_name, ...
LEFT JOIN `tblOffers` AS o ON j.`offer_id` = o.`offer_id`
...
to
SELECT ...,
( SELECT offer_name FROM tblOffers WHERE offer_id = j.offer_id
) AS offer_name, ...
It also gets rid of a bug wherein you are assuming that the inner ORDER BY will be preserved for the LIMIT. This used to be the case, but in newer versions of MariaDB / MySQL, it is not. The ORDER BY in a "derived table" (your subquery) is now ignored.
2 down, a few more to go.
"Don't hide an indexed column in a function." I am referring to DATE(t.sales_time) = CURDATE(). Assuming you have no sales_time values for the 'future', then that test can be changed to t.sales_time >= CURDATE(). If you really need to restrict to just today, then do this:
AND sales_time >= CURDATE()
AND sales_time < CURDATE() + INTERVAL 1 DAY
The ORDER BY and the LIMIT should usually be put together. In your case, you may as well add the LIMIT to the "derived table", thereby leading to only 5 rows for the outer query to work with. But... There is still the question of getting them sorted correctly. So change from
SELECT ...
FROM ( SELECT ...
ORDER BY ... )
LIMIT ...
to
SELECT ...
FROM ( SELECT ...
ORDER BY ...
LIMIT 5 ) -- trim sooner
ORDER BY ... -- deal with the loss of ordering from derived table
Rolling it all together, I have
SELECT j.`offer_id`,
( SELECT offer_name
FROM tblOffers
WHERE offer_id = j.offer_id
) AS offer_name,
j.`success_rate`
FROM
( SELECT t.`offer_id`,
AVG(t.sales_status = 'SUCCESS') AS `success_rate`
FROM `tblSales` AS t
WHERE t.sales_time >= CURDATE()
GROUP BY t.`offer_id`
ORDER BY `success_rate` DESC
LIMIT 5
) AS j
ORDER BY `success_rate` DESC;
(I took the liberty of shortening the SUM(...) in two ways.)
Now for the indexes...
tblSales needs at least (sales_time), but let's go for a "covering" (with sales_time specifically first):
INDEX(sales_time, sales_status, offer_id)
If tblOffers has PRIMARY KEY(offer_id), then no further index is worth adding. Else, add this covering index (in this order):
INDEX(offer_id, offer_name)
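A minimal sketch of adding the covering index on tblSales and checking that the optimizer picks it up, using offer_id as the third column since that is the column the query groups by. The index name is made up for illustration.

```sql
-- Covering index for the derived table: the range column comes first.
ALTER TABLE tblSales
    ADD INDEX idx_sales_time_status_offer (sales_time, sales_status, offer_id);

-- "Using index" in the Extra column would indicate that the query is
-- answered from the index alone, without reading the table rows.
EXPLAIN
SELECT t.offer_id,
       AVG(t.sales_status = 'SUCCESS') AS success_rate
FROM tblSales AS t
WHERE t.sales_time >= CURDATE()
GROUP BY t.offer_id
ORDER BY success_rate DESC
LIMIT 5;
```

The existing single-column indexes on offer_id and sales_status do not help this WHERE clause, because the filter is on sales_time.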
(Apologies to other Answerers; I stole some of your ideas.)
Here, tblOffers has all the offers listed, and tblSales contains all the sales. What I am trying to find is the top-selling offers, based on the success rate (i.e. those sales which are SUCCESS).
Approach this with a simple JOIN and GROUP BY:
SELECT s.offer_id, o.offer_name,
AVG(s.sales_status = 'SUCCESS') as success_rate
FROM tblSales s JOIN
tblOffers o
ON o.offer_id = s.offer_id
WHERE s.sales_time >= CURDATE() AND
s.sales_time < CURDATE() + INTERVAL 1 DAY
GROUP BY s.offer_id, o.offer_name
ORDER BY success_rate DESC;
Notes:
The use of date arithmetic allows the query to make use of an index on tblSales(sales_time) -- or better yet tblSales(sales_time, offer_id, sales_status).
The arithmetic for success_rate has been simplified -- although this has minimal impact on performance.
I added offer_name to the GROUP BY. If you are learning SQL, you should always have all the unaggregated keys in the GROUP BY clause.
A LEFT JOIN is only needed if you have offers in tblSales which are not in tblOffers. I am guessing you have proper foreign key relationships defined, and this is not the case.
Based on the limited information you have provided (I mean the table schema), you could try the following.
SELECT `o`.`offer_id`, `o`.`offer_name`, SUM(CASE WHEN `t`.`sales_status` = 'SUCCESS' THEN 1 ELSE 0 END) AS `success_rate`
FROM `tblOffers` `o`
INNER JOIN `tblSales` `t`
ON `o`.`offer_id` = `t`.`offer_id`
WHERE DATE(`t`.`sales_time`) = CURDATE()
GROUP BY `o`.`offer_id`
ORDER BY `success_rate` DESC
LIMIT 0,5;
You can find a sample of this query in this SQL Fiddle example
Without knowing your schema, the lowest hanging fruit I see is this part....
WHERE DATE(t.`sales_time`) = CURDATE()
Try changing that to something like
WHERE t.sales_time >= CURDATE() AND t.sales_time <= TIMESTAMP(CURDATE(), '23:59:59')

MYSQL Check for record existence while fetching records

I've run into some performance issues with my database structure (or, better said, with my query).
I have the following table:
http://sqlfiddle.com/#!9/348cb
The following query fetches certain data and then checks whether there are other records matching my conditions, all in one query.
It works as expected; I am only asking whether there is a way to improve its performance, or another way to get the results.
As you can see, there are ( SELECT ) subqueries that check whether any other records contain the current query's data.
SELECT (
SELECT COUNT(*) FROM log AS LIKES
WHERE L.target_account=LIKES.target_account
AND LIKES.type='like'
) as liked,
(
SELECT COUNT(*) FROM log AS COMMENTS
WHERE L.target_account=COMMENTS.target_account
AND COMMENTS.type='follow_back'
) as follow_back,
(
SELECT COUNT(*) FROM log AS FOLLOW_BACK
WHERE L.target_account=FOLLOW_BACK.target_account
AND FOLLOW_BACK.type='follow_back'
) as follow_back,
L.*
FROM `log` as L
WHERE `L`.`information` = '".$target_name."'
AND `L`.`account_id` = '".$id."'
AND `L`.`date_ts` BETWEEN CURDATE() - INTERVAL ".$limit." DAY AND CURDATE()
This query takes too much time to fetch the data.
Thanks in advance.
You may be able to rewrite the query, depending on the relationship between target account and account id.
In the meantime, you want indexes. The two you want are instagram_log(target_account, type) and instagram_log(account_id, information, date_ts):
create index idx_instagram_log_1 on instagram_log(target_account, type);
create index idx_instagram_log_2 on instagram_log(account_id, information, date_ts);
SELECT SUM(LIKES) LIKES,SUM(FOLLOW_BACK) FOLLOW_BACK,SUM(COMMENTS) FROM
(
SELECT
CASE WHEN L.type='like' THEN 1 ELSE 0 END LIKES,
CASE WHEN L.type='follow_back' THEN 1 ELSE 0 END FOLLOW_BACK,
CASE WHEN L.type='comments' THEN 1 ELSE 0 END COMMENTS
FROM `log` as L
WHERE `L`.`information` = '".$target_name."'
AND `L`.`account_id` = '".$id."'
AND `L`.`date_ts` BETWEEN CURDATE() - INTERVAL ".$limit." DAY AND CURDATE()
)Z
Try the above query.
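Note that the query above counts only the rows matched by the outer WHERE, whereas the subqueries in the question count every row in log that shares the same target_account. If the per-target_account totals are what is needed, one possible sketch is to aggregate once and join the result back. The literal values below are placeholders standing in for the PHP variables.

```sql
SELECT L.*,
       COALESCE(c.liked, 0)       AS liked,
       COALESCE(c.follow_back, 0) AS follow_back
FROM `log` AS L
LEFT JOIN (
    -- One pass over log: per-target totals for each type of interest.
    SELECT target_account,
           SUM(type = 'like')        AS liked,
           SUM(type = 'follow_back') AS follow_back
    FROM `log`
    GROUP BY target_account
) AS c ON c.target_account = L.target_account
WHERE L.information = 'some_target'   -- placeholder for $target_name
  AND L.account_id = 1                -- placeholder for $id
  AND L.date_ts BETWEEN CURDATE() - INTERVAL 7 DAY AND CURDATE();  -- 7 stands in for $limit
```

Whether this beats the correlated subqueries depends on the data: the derived table aggregates the whole log table, so an index on (target_account, type) matters here as well, and EXPLAIN on both versions is the way to decide.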

insert a table record into another table

I want to insert records from one table into another. I am selecting user id, date and variance. When I insert the data of one user it works fine, but when I insert multiple records it gives me this error: SQL Error [1292] [22001]: Data truncation: Truncated incorrect time value: '841:52:24.000000'.
insert into
features.Daily_variance_of_time_between_calls(
uId,
date,
varianceBetweenCalls)
SELECT
table_test.uid as uId,
SUBSTRING(table_test.date, 1, 10) as date ,
VARIANCE(table_test.DurationSinceLastCall) as varianceBetweenCalls #calculating the variance of inter-event call time
FROM
(SELECT
id,m.uid, m.date,
TIME_TO_SEC(
timediff(m.date,
COALESCE(
(SELECT p.date FROM creditfix.call_logs AS p
WHERE
p.uid = m.uid
AND
p.`type` in (1,2)
AND
(p.id < m.id AND p.date < m.date )
ORDER BY m.date DESC, p.duration
DESC LIMIT 1 ), m.date))
) AS DurationSinceLastCall,
COUNT(1)
FROM
(select distinct id, duration, date,uid from creditfix.call_logs as cl ) AS m
WHERE
m.uId is not NULL
AND
m.duration > 0
# AND
# m.uId=171
GROUP BY 1,2
) table_test
GROUP BY 1,2
If I remove the comment markers, it works fine for one specific user.
Let's start with the error message:
Data truncation: Truncated incorrect time value: '841:52:24.000000'
This message suggests that at some stage MySQL is running into a value which it cannot convert to a date/time/datetime. Efforts in isolating the issue should therefore begin with a focus on where values are being converted to those data types.
Without knowing the data types of all the fields used, it's difficult to say where the problem is likely to be. However, once we knew that the query on its own ran without complaint, we also knew that the problem had to be with a conversion happening during the insert itself. Something in the selected data wasn't a valid date, but was being inserted into a date field. Although dates and times are involved in your calculation of varianceBetweenCalls, VARIANCE itself returns a numeric data type. Therefore I deduced the problem had to be with the data returned by SUBSTRING(table_test.date, 1, 10), which was being inserted into the date field.
As per the comments, this turned out to be correct. You can exclude the bad data and allow the insert to work by adding the clause:
WHERE
table_test.date NOT LIKE '841%'
AND table_test.DurationSinceLastCall NOT LIKE '841%' -- I actually think this line is not required.
Alternatively, you can retrieve only the bad data (with a view to fixing it), by removing the INSERT and using the clause
WHERE
table_test.date LIKE '841%'
OR table_test.DurationSinceLastCall LIKE '841%' -- I actually think this line is not required.
or better
SELECT *
FROM creditfix.call_logs m
WHERE m.date LIKE '841%'
However, I'm not sure of the data type of that field, so you may need to do it like this:
SELECT *
FROM creditfix.call_logs m
WHERE SUBSTRING(m.date, 1, 10) LIKE '841%'
Once you correct the offending data, you should be able to remove the "fix" from your INSERT/SELECT statement, though it would be wise to investigate how the bad data got into the system.
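It may also be worth ruling out the range limit of the TIME type. MySQL TIME values only span -838:59:59 to 838:59:59, and TIMEDIFF() returns a TIME, so a gap of 841 hours between two calls cannot be represented by it. The sketch below avoids the TIME type by computing the gap in seconds directly with TIMESTAMPDIFF(). It assumes call_logs.date holds DATETIME-compatible values, and it orders the inner lookup by p.date to pick the most recent earlier call, which is a guess at the intent of the original ORDER BY.

```sql
SELECT m.id, m.uid, m.date,
       TIMESTAMPDIFF(
           SECOND,
           COALESCE(
               (SELECT p.date
                FROM creditfix.call_logs AS p
                WHERE p.uid = m.uid
                  AND p.`type` IN (1, 2)
                  AND p.id < m.id
                  AND p.date < m.date
                ORDER BY p.date DESC   -- assumption: most recent earlier call
                LIMIT 1),
               m.date),                -- no earlier call: gap of 0 seconds
           m.date
       ) AS DurationSinceLastCall
FROM creditfix.call_logs AS m
WHERE m.uid IS NOT NULL
  AND m.duration > 0;
```

TIMESTAMPDIFF(SECOND, a, b) returns b minus a as an integer number of seconds, so the sign matches the original TIME_TO_SEC(TIMEDIFF(m.date, previous)).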

SQL - max of a column corresponding to two columns

A table contains four columns: "server", "directory", "usage" and "datetime". All servers have 10 directories in common. For each server and each of its directories, I need the usage at the latest datetime of each day.
For example, server A with directory B will have usage recorded at multiple times over several days. I need the query to report, for all servers and all their directories, the usage of the latest entry on each day.
If I understood your question correctly, you want to see the last usage for every server and directory. Given a table named "usagestats" with the given columns that would be:
SELECT a.server, a.directory, a.`usage`, a.datetime
FROM usagestats as a INNER JOIN (
SELECT server, directory, max(datetime) datetime
FROM usagestats
GROUP BY server, directory
) AS b ON (
a.server = b.server
and a.directory = b.directory
and a.datetime = b.datetime
)
ORDER BY a.server, a.directory, a.datetime
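If "the latest entry on each day" is meant literally, the query above returns only the single latest row per server and directory. A possible variant adds the calendar day to the grouping; this sketch assumes datetime is a DATETIME or TIMESTAMP column, and usage_day and latest are illustrative aliases.

```sql
SELECT a.server, a.directory, a.`usage`, a.datetime
FROM usagestats AS a
INNER JOIN (
    -- Latest timestamp per server, directory and calendar day.
    SELECT server, directory,
           DATE(datetime) AS usage_day,
           MAX(datetime)  AS latest
    FROM usagestats
    GROUP BY server, directory, DATE(datetime)
) AS b ON a.server = b.server
      AND a.directory = b.directory
      AND a.datetime = b.latest
ORDER BY a.server, a.directory, a.datetime;
```

If two rows share the same latest datetime for a server and directory, both are returned.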
I'm not sure that I understand your question correctly.
MySQL has an IF() function that returns one of two expressions according to the condition in its first parameter.
If you want to select the bigger value and there are only 2 columns, use the IF() function like this:
SELECT
if(columnA > columnB, columnA , columnB) as grtColumn,
if(columnA > columnB,'columnA is bigger', 'columnB is bigger') as whichColumnWasGrt
from yourtable;
Help: MySQL reference - IF() function (there is an IF statement and an IF() function - different things!)
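For just picking the larger of two columns, MySQL's built-in GREATEST() function is a shorter alternative to IF(). A minimal sketch with the same placeholder table and column names as above:

```sql
SELECT GREATEST(columnA, columnB) AS grtColumn
FROM yourtable;
```

Keep in mind that GREATEST() returns NULL if any argument is NULL, so wrap the columns in COALESCE() if NULLs are possible.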