Slow SQL Query, how to improve this query speed? - mysql

I have a table (call_history) with a list of phone call records; caller_id is the caller and start_date (DATETIME) is the call date. I need to make a report that shows how many people called for the first time on each day. For example:
2013-01-01 - 100
2013-01-02 - 80
2013-01-03 - 90
I have this query that does it perfectly, but it is very slow. There are indexes on both the start_date and caller_id columns; is there an alternative way to get this information that would speed the process up?
Here is the query:
SELECT SUBSTR(c1.start_date,1,10), COUNT(DISTINCT caller_id)
FROM call_history c1
WHERE NOT EXISTS
(SELECT id
FROM call_history c2
WHERE SUBSTR(c2.start_date,1,10) < SUBSTR(c1.start_date,1,10)
AND c2.caller_id=c1.caller_id)
GROUP BY SUBSTR(start_date,1,10)
ORDER BY SUBSTR(start_date,1,10) desc

The following "WHERE SUBSTR(c2.start_date,1,10)" is breaking your index (you shouldn't perform functions on the left hand side of a where clause)
Try the following instead:
SELECT DATE(c1.start_date), COUNT(DISTINCT c1.caller_id)
FROM call_history c1
LEFT OUTER JOIN call_history c2 ON c1.caller_id = c2.caller_id AND c2.start_date < c1.start_date
WHERE c2.id IS NULL
GROUP BY DATE(c1.start_date)
ORDER BY DATE(c1.start_date) DESC
Also, re-reading your problem, here is another way of writing it without using NOT EXISTS:
SELECT DATE(c1.start_date), COUNT(DISTINCT c1.caller_id)
FROM call_history c1
WHERE c1.start_date =
(SELECT MIN(c2.start_date) FROM call_history c2 WHERE c2.caller_id = c1.caller_id)
GROUP BY DATE(c1.start_date)
ORDER BY DATE(c1.start_date) DESC;

You are doing something problematic: using functions in the WHERE, GROUP BY, and ORDER BY clauses. MySQL will never use an index when a function has been applied to the column in the condition. So you cannot do much with this query as written; to improve your situation, you should alter your table structure and store the date in a separate DATE column (a single column). Then create an index on that column; after this you'll get much better results.
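A minimal sketch of that change, assuming a new column named call_date (the column and index names here are invented):
ALTER TABLE call_history ADD COLUMN call_date DATE;
UPDATE call_history SET call_date = DATE(start_date);
CREATE INDEX idx_call_history_call_date ON call_history (call_date);
After that, grouping by call_date and plain range conditions on call_date can both use the index.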

Try to replace the NOT EXISTS with a left outer join.

OK, here is the ideal solution;
the query now runs in 0.01 seconds:
SELECT first_call_date, COUNT(caller_id) AS caller_count
FROM (
SELECT caller_id, DATE(MIN(start_date)) AS first_call_date
FROM call_history
GROUP BY caller_id
) AS ch
GROUP BY first_call_date
ORDER BY first_call_date DESC
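For what it's worth, the inner derived table groups the whole table by caller_id, so a composite index on (caller_id, start_date) should let MySQL resolve MIN(start_date) per caller from the index alone. A sketch (the index name is invented):
CREATE INDEX idx_call_history_caller_start ON call_history (caller_id, start_date);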

Related

Mysql Query where max(time) less than today

I have two tables. The first table (job) stores the job data and the second table (job_locations) stores the locations for each job. I'm trying to show the number of jobs whose location cutoff time is earlier than today.
I use DATETIME for the date column.
Unfortunately, the numbers that appear after testing the following code are wrong.
My code:
SELECT *
FROM `job`
left join job_location
on job_location.job_id = job.id
where job_location.cutoff_time < CURDATE()
group by job.id
Please help me write a working query.
I think you need to rephrase your query slightly. Select a count of jobs where the cutoff time is earlier than the start of today.
SELECT
j.id,
COUNT(CASE WHEN jl.cutoff_time < CURDATE() THEN 1 END) AS cnt
FROM job j
LEFT JOIN job_location jl
ON j.id = jl.job_id
GROUP BY
j.id;
Note that the left join is important here because it means that we won't drop any jobs having no matching criteria. Instead, those jobs would still appear in the result set, just with a zero count.
As a note, you can simplify the count (in MySQL). And, assuming that all jobs have at least one location, you don't need a JOIN at all. So:
SELECT jl.job_id, sum( jl.cutoff_time < CURDATE() )
FROM job_location jl
GROUP BY jl.job_id;
If this is not correct (and you need the JOIN), then the condition on the date should go in the ON clause:
SELECT j.id, COUNT(jl.job_id)
FROM job j LEFT JOIN
job_location jl
ON jl.job_id = j.id AND jl.cutoff_time < CURDATE()
GROUP BY j.id;

How can I speed up a multiple inner join query?

I have two tables. The first table (users) is a simple "id, username" with 100,000 rows and the second (stats) is "id, date, stat" with 20M rows.
I'm trying to figure out which username went up the most in stat, and here's the query I have. On a powerful machine, this query takes minutes to complete. Is there a better way to write it to speed it up?
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON (b.id=a.id)
INNER JOIN stats AS c ON (c.id=a.id)
WHERE b.date = '2016-01-10'
AND c.date = '2016-01-13'
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
The other way I tried, but it doesn't seem optimal, is:
SELECT a.id, a.username,
(SELECT b.stat FROM stats AS b WHERE b.id=a.id AND b.date = '2016-01-10') AS start,
(SELECT c.stat FROM stats AS c WHERE c.id=a.id AND c.date = '2016-01-14') AS `end`,
((SELECT b.stat FROM stats AS b WHERE b.id=a.id AND b.date = '2016-01-10') -
(SELECT c.stat FROM stats AS c WHERE c.id=a.id AND c.date = '2016-01-14')) AS stat_diff
FROM users AS a
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
Introduction
Let's suppose we rewrite the statement like this:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
And we ensure that:
the users table has an index on the id field;
the stats table has an index on the composite fields (date, id): create index stats_idx_d_i on stats ( date, id );
Then
The database optimizer may use the indexes to select a Restricted Set of Data ('RSD'), that is, the rows that match the filtered dates. This is fast.
But
You are sorting by a calculated field:
(b.stat - c.stat) AS stat_diff #<-- calculated
ORDER BY stat_diff DESC #<-- this forces it to be calculated
There is no possible optimization for this sort, because every result in your 'RSD' (restricted set of data) has to be calculated one by one.
Conclusion
The question is: how many rows are in your 'RSD'? If there are only a few hundred rows your query may run fast; otherwise, it will be slow.
In any case, you should make sure that the first step of the query (without the sorting) is resolved by index and not by full scanning. Use the EXPLAIN command to be sure, as in the sketch below.
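A sketch of that check, simply prefixing the rewritten statement with EXPLAIN:
EXPLAIN
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d') AND b.id = a.id
INNER JOIN stats AS c ON c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d') AND c.id = a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100;
In the output, type = ref or range together with key = stats_idx_d_i means the date filter is using the index; type = ALL means a full scan.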
All you need to do is help the optimizer. At a bare minimum, have a check list that looks like the one below:
1. Are my join columns indexed?
2. Are the WHERE clauses sargable?
3. Are there any implicit or explicit conversions?
4. Am I seeing any statistics issues?
One more interesting aspect to look at is how your data is distributed. Once you understand the data, you will be able to interpret the execution plan and alter it as per your need.
EX:
Suppose I have a customers table with 100 rows, and each customer has a minimum of 10 orders (up to 10,000 orders in total). Now if you need to find only the top 3 orders by date, you don't want a scan happening on the orders table.
Now in your case, I would not go with the second option, even though the optimizer may choose a good plan for it as well. I would go with the first approach and see whether the execution time is acceptable; if not, I would go through my check list and try to tune it further.
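Applied to the tables in this question, item 1 of the check list translates into something like the following (a sketch; the index name is invented, and users.id is assumed to already be the primary key):
CREATE INDEX idx_stats_id_date ON stats (id, date);
(or (date, id), as suggested in the earlier answer, depending on whether the join or the date filter is more selective)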
The query seems OK; verify your indexes.
Or try this query:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN (select id,stat from stats where date = '2016-01-10') AS b ON (b.id=a.id)
INNER JOIN (select id,stat from stats where date = '2016-01-13') AS c ON (c.id=a.id)
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100

MySQL Query Optimization using multiple joins

I'm having trouble optimizing a query and could use some help. I'm currently pulling in events in a system that has to join several other tables to make sure the event is supposed to display, and so on. The query was running smoothly (around 480 ms) until I introduced another table into the mix. The query is as follows:
SELECT
keyword_terms,
`esf`.*,
`venue`.`name` AS venue_name,
...
`venue`.`zip`, ase.region_id,
(DATE(NOW()) BETWEEN...AND ase.region_id IS NULL) as featured,
getDistance(`venue`.`lat`, `venue`.`lng`, 36.073, -79.7903) as distance,
`network_exclusion`.`id` as net_exc_id
FROM (`event_search_flat` esf)
# Problematic part of query (pulling in the very next date for the event)
LEFT JOIN (
SELECT event_id, MIN(TIMESTAMP(CONCAT(event_date.date, ' ', event_date.end_time))) AS next_date FROM event_date WHERE
event_date.date >= CURDATE() OR (event_date.date = CURDATE() AND TIME(event_date.end_time) >= TIME(NOW()))
GROUP BY event_id
) edate ON edate.event_id=esf.object_id
# Pull in associated ad space
LEFT JOIN `ad_space` ads ON `ads`.`data_type`=`esf`.`data_type` AND ads.object_id=esf.object_id
# and make sure it is featured within region
LEFT JOIN `ad_space_exclusion` ase ON ase.ad_space_id=ads.id AND region_id =5
# Get venue details
LEFT JOIN `venue` ON `esf`.`venue_id`=`venue`.`id`
# Make sure this event should be listed
LEFT JOIN `network_exclusion` ON network_exclusion.data_type=esf.data_type
AND network_exclusion.object_id=esf.object_id
AND network_exclusion.region_id=5
WHERE `esf`.`event_type` IN ('things to do')
AND (`edate`.`next_date` >= '2013-07-18 16:23:53')
GROUP BY `esf`.`esf_id`
HAVING `net_exc_id` IS NULL
AND `distance` <= 40
ORDER BY DATE(edate.next_date) asc,
`distance` asc
LIMIT 6
It seems that the issue lies with the event_date table, but I'm unsure how to optimize this query (I tried various views, indexes, etc... to no avail). I ran EXPLAIN and received the following: http://cl.ly/image/3r3u1o0n2A46 .
At the moment, the query is taking 6.6 seconds. Any help would be greatly appreciated.
You may be able to get Using index on the event_date subquery by creating a compound index over (event_id, date, end_time). That may turn the subquery into an index-only query, which should speed it up slightly.
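A sketch of that compound index (the index name is invented):
CREATE INDEX idx_event_date_covering ON event_date (event_id, date, end_time);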
The subquery might be better written as the following, without GROUP BY:
SELECT event_id, TIMESTAMP(CONCAT(event_date.date, ' ', event_date.end_time)) AS next_date
FROM event_date
WHERE event_date.date >= CURDATE()
OR (event_date.date = CURDATE() AND TIME(event_date.end_time) >= TIME(NOW()))
ORDER BY next_date LIMIT 1
I'm more concerned that your EXPLAIN shows so many tables with type=ALL. That means it has to read every row from those tables and compare them to rows in other tables. You can get an idea of how much work it's doing by multiplying the values in the rows column. Basically, it's making billions of row comparisons to resolve the joins. As the tables grow, this query will get a lot worse.
Using LEFT [OUTER] JOIN has a specific purpose, and if you really mean to use INNER JOIN you should do that, because using an outer join where it doesn't belong can mess up the optimization. Use an outer join like A LEFT JOIN B only if you want rows in A that may not have matching rows in B.
For example, I assume based on column naming convention that LEFT JOIN venue ON esf.venue_id=venue.id should be an inner join, because there should always be a venue referenced by esf.venue_id (unless esf.venue_id is sometimes null).
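In that case the join would simply become (a sketch of the change):
INNER JOIN `venue` ON `esf`.`venue_id` = `venue`.`id`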
event_search_flat should have a compound index with columns used in the WHERE clause first, then columns to join to other tables: (event_type, object_id, data_type, event_id)
ad_space should have a compound index for the join: (data_type, object_id). Does this need to be an inner join too?
ad_space_exclusion should have a compound index for the join: (ad_space_id, region_id)
network_exclusion should have a compound index for the join: (data_type, object_id, region_id)
venue is okay because it's doing a primary key lookup already.
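Taken together, those suggestions would look roughly like this (a sketch; the index names are invented):
CREATE INDEX idx_esf_type_obj ON event_search_flat (event_type, object_id, data_type, event_id);
CREATE INDEX idx_ads_type_obj ON ad_space (data_type, object_id);
CREATE INDEX idx_ase_space_region ON ad_space_exclusion (ad_space_id, region_id);
CREATE INDEX idx_netexc_type_obj_region ON network_exclusion (data_type, object_id, region_id);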

count total records after groupBy select

I have a MySQL SELECT query that has a GROUP BY.
I want to count all the records after the GROUP BY statement.
Is there a way to do this directly in MySQL?
Thanks.
If the only thing you need is the count after grouping, and you don't want to use two separate queries to find the answer, you can do it with a subquery like so:
select count(*) as `count`
from (
select 0 as `doesn't matter`
from `your_table` yt
group by yt.groupfield
) sq
Note: You have to actually select something in the subquery, but what you select doesn't matter.
Note: Every derived table has to have a named alias, hence the "sq" at the end.
You can use FOUND_ROWS():
SELECT <your_complicated_query>;
SELECT FOUND_ROWS();
It's really intended for use with LIMIT, telling you how many rows would have been returned without the LIMIT, but it seems to work just fine for queries that don't use LIMIT.
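For example, combined with the subquery approach from the previous answer (your_table and groupfield are placeholders), it would look something like this:
SELECT yt.groupfield
FROM your_table yt
GROUP BY yt.groupfield;
SELECT FOUND_ROWS();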
See this query for an example.
It is used to find the available rooms for a hotel; just check it:
SELECT a.type_id, a.type_name, a.no_of_rooms,
(SELECT SUM(booked_rooms) FROM reservation
WHERE room_type = a.type_id
AND start_date >= '2010-04-12'
AND end_date <= '2010-04-15') AS booked_rooms,
(a.no_of_rooms - (SELECT SUM(booked_rooms)
FROM reservation
WHERE room_type = a.type_id
AND start_date >= '2010-04-12'
AND end_date <= '2010-04-15')) AS freerooms
FROM room_type AS a
LEFT JOIN reservation AS b
ON a.type_id = b.room_type
GROUP BY a.type_id ORDER BY a.type_id

Mysql Groupby and Orderby problem

Here is my data structure
Data structure: http://luvboy.co.cc/images/db.JPG
When I try this SQL:
select rec_id, customer_id, dc_number, balance
from payments
where customer_id='IHS050018'
group by dc_number
order by rec_id desc;
something is wrong somewhere, I don't know what.
I need:
rec_id customer_id dc_number balance
2 IHS050018 DC3 -1
3 IHS050018 52 600
I want the most recent balance of the customer with respect to each dc_number.
Thanks
There are essentially two ways to get this:
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id = (
select s.rec_id
from payments s
where s.customer_id='IHS050018' and s.dc_number = p.dc_number
order by s.rec_id desc
limit 1);
Also if you want to get the last balance for each customer you might do
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id = (
select s.rec_id
from payments s
where s.customer_id=p.customer_id and s.dc_number = p.dc_number
order by s.rec_id desc
limit 1);
What I consider essentially another way is utilizing the fact that selecting rec_id with ORDER BY ... DESC and LIMIT 1 is equivalent to selecting MAX(rec_id) with the appropriate GROUP BY; in full:
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id IN (
select max(s.rec_id)
from payments s
group by s.customer_id, s.dc_number
);
This should be faster (if you want the last balance for every customer), since MAX is normally less expensive than a sort (with indexes it might be the same).
Also, when written like this, the subquery is not correlated (it need not be run for every row of the outer query), which means it will be run only once and the whole query can be rewritten as a join.
Also notice that it might be beneficial to write it as a correlated query (by adding where s.customer_id = p.customer_id and s.dc_number = p.dc_number in the inner query, as sketched below), depending on the selectivity of the outer query.
This might improve performance if you are looking for the last balance of only one or a few rows.
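A sketch of that correlated variant:
select p.rec_id, p.customer_id, p.dc_number, p.balance
from payments p
where p.rec_id = (
select max(s.rec_id)
from payments s
where s.customer_id = p.customer_id and s.dc_number = p.dc_number
);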
I don't think there is a good way to do this in SQL without having window functions (like those in Postgres 8.4). You probably have to iterate over the dataset in your code and get the recent balances that way.
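For reference, on a database that does have window functions (PostgreSQL 8.4+, or MySQL 8.0 and later), the same result could be obtained roughly like this (a sketch):
select rec_id, customer_id, dc_number, balance
from (
select p.*,
row_number() over (partition by customer_id, dc_number order by rec_id desc) as rn
from payments p
) ranked
where rn = 1;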
The ORDER has to be applied before the GROUP, which in MySQL means sorting in a derived table first:
select rec_id, customer_id, dc_number, balance
from (
select rec_id, customer_id, dc_number, balance
from payments
where customer_id='IHS050018'
order by rec_id desc
) t
group by dc_number;
(Note that which row MySQL keeps per group is not guaranteed with this trick; the MAX(rec_id) approach above is more reliable.)