count total records after groupBy select - mysql

I have a MySQL SELECT query that has a GROUP BY.
I want to count all the records after the GROUP BY is applied.
Is there a way to do this directly in MySQL?
Thanks.

If the only thing you need is the count after grouping, and you don't want to run two separate queries to find the answer, you can do it with a subquery like so:
select count(*) as `count`
from (
select 0 as `doesn't matter`
from `your_table` yt
group by yt.groupfield
) sq
Note: You have to actually select something in the subquery, but what you select doesn't matter.
Note: Every derived table has to have an alias, hence the "sq" at the end.
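As a quick sanity check, here is that pattern run from Python against an in-memory SQLite database standing in for MySQL (the table name your_table and column groupfield are the placeholders from the answer; the sample data is made up):

```python
import sqlite3

# Build a throwaway table with three distinct group values: a, b, c.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (groupfield TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5)],
)

# Count the rows the GROUP BY produces, in a single query:
(count,) = conn.execute("""
    SELECT COUNT(*)
    FROM (
        SELECT 0
        FROM your_table yt
        GROUP BY yt.groupfield
    ) sq
""").fetchone()
print(count)  # 3 groups: a, b, c
```

The inner query collapses the five rows to one row per group; the outer COUNT(*) then just counts those rows.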

You can use FOUND_ROWS():
SELECT <your_complicated_query>;
SELECT FOUND_ROWS();
It's really intended for use with LIMIT, telling you how many rows would have been returned without the LIMIT, but it seems to work just fine for queries that don't use LIMIT. Note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17, so on newer versions the subquery approach is preferable.

See this query for an example. It is used to find the available-rooms record for a hotel:
SELECT a.type_id, a.type_name, a.no_of_rooms,
(SELECT SUM(booked_rooms) FROM reservation
WHERE room_type = a.type_id
AND start_date >= '2010-04-12'
AND end_date <= '2010-04-15') AS booked_rooms,
(a.no_of_rooms - (SELECT SUM(booked_rooms)
FROM reservation
WHERE room_type = a.type_id
AND start_date >= '2010-04-12'
AND end_date <= '2010-04-15')) AS freerooms
FROM room_type AS a
LEFT JOIN reservation AS b
ON a.type_id = b.room_type
GROUP BY a.type_id ORDER BY a.type_id

Related

Is there any faster way to perform group-by row counts using two tables in MySQL?

I am trying to find the total number of outlets (per nation_id) that exist in the orders table in a given date range.
select o2.nation_id,count(o2.id) as outlet_count
from outlets o2
where o2.id in (
SELECT distinct o.outlet_id
from orders o
where order_date >= '2022-05-01 00:00:00'
and order_date <= '2022-06-30 23:59:59'
)
group by o2.nation_id
Now, this query gives the exact result but it takes around 3 seconds. Is there any way to perform this query faster? Probably less than 1 second.
N.B.: The outlets table contains around 25k rows and the orders table contains around 1.2M rows.
Avoiding IN will boost performance, and creating indexes on the nation_id and id columns of the outlets table and the outlet_id column of the orders table will definitely improve the speed.
select o2.nation_id, count(Ord.outlet_id) as outlet_count
from outlets o2
LEFT JOIN
(
SELECT distinct o.outlet_id
from orders o
where order_date >= '2022-05-01 00:00:00'
and order_date <= '2022-06-30 23:59:59'
) Ord ON o2.id = Ord.outlet_id
group by o2.nation_id
Give this a try. It's something tricky that I just invented and tested (on different tables):
SELECT ou.nation_id, COUNT(*) AS outlet_count
FROM ( SELECT o.outlet_id, MIN(order_date)
FROM orders AS o
WHERE o.order_date >= '2022-05-01'
AND o.order_date < '2022-05-01' + INTERVAL 1 MONTH
GROUP BY o.outlet_id
) AS olist
JOIN outlets AS ou ON ou.id = olist.outlet_id
GROUP BY ou.nation_id
The GROUP BY together with the MIN (or MAX, etc.) is a trick to get the optimizer to hop through the index, touching only one row per outlet_id. Leaving out the MIN, or changing it to DISTINCT, failed to trigger the optimization. (Caveat: different versions of MySQL/MariaDB may behave differently here.)
orders: INDEX(outlet_id, order_date) -- required for the trick
If there are any issues, please provide SHOW CREATE TABLE and EXPLAIN SELECT ...
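To show what result shape the derived-table rewrite produces, here is a small sketch against SQLite from Python (SQLite stands in for MySQL, so the loose-index-scan optimization itself isn't demonstrated, only the query logic; `date(..., '+1 month')` is the SQLite spelling of `+ INTERVAL 1 MONTH`, and all the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outlets (id INTEGER PRIMARY KEY, nation_id INTEGER)")
conn.execute("CREATE TABLE orders (outlet_id INTEGER, order_date TEXT)")
conn.executemany("INSERT INTO outlets VALUES (?, ?)",
                 [(1, 10), (2, 10), (3, 20)])
conn.executemany("INSERT INTO orders VALUES (?, ?)", [
    (1, "2022-05-02"), (1, "2022-05-10"),   # outlet 1: two orders in May
    (2, "2022-05-03"),                       # outlet 2: one order in May
    (3, "2022-06-01"),                       # outlet 3: outside the range
])

# Derived table: one row per outlet with an order in the range,
# then roll up to a per-nation count.
rows = conn.execute("""
    SELECT ou.nation_id, COUNT(*) AS outlet_count
    FROM ( SELECT o.outlet_id, MIN(o.order_date) AS first_order
           FROM orders AS o
           WHERE o.order_date >= '2022-05-01'
             AND o.order_date < date('2022-05-01', '+1 month')
           GROUP BY o.outlet_id
         ) AS olist
    JOIN outlets AS ou ON ou.id = olist.outlet_id
    GROUP BY ou.nation_id
""").fetchall()
print(rows)  # [(10, 2)] -- outlets 1 and 2 share nation 10; outlet 3 is excluded
```

Outlet 3's order falls on the exclusive upper bound, so nation 20 does not appear at all, which matches the "outlets that exist in the orders table" wording of the question.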

How to use SQL to count events in the first week

I'm trying to write a SQL query, which says how many logins each user made in their first week.
Assume, for the purpose of this question, that I have a table with at least user_id and login_date. I'm trying to produce an output table with user_id and num_logins_first_week
Use aggregation to get the first date for each user. Then join in the logins and aggregate:
select t.user_id, count(*) as num_logins_first_week
from t join
(select user_id, min(login_date) as first_login_date
from t
group by user_id
) tt
on tt.user_id = t.user_id and
t.login_date >= tt.first_login_date and
t.login_date < tt.first_login_date + interval 7 day
group by t.user_id;
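Here is the same query run from Python against SQLite for illustration (the table name t and its columns come from the answer; `date(..., '+7 days')` replaces MySQL's `+ interval 7 day`, and the login rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER, login_date TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, "2023-01-01"), (1, "2023-01-03"), (1, "2023-01-10"),  # day 10 is past the first week
    (2, "2023-02-01"),
])

rows = conn.execute("""
    SELECT t.user_id, COUNT(*) AS num_logins_first_week
    FROM t
    JOIN (SELECT user_id, MIN(login_date) AS first_login_date
          FROM t
          GROUP BY user_id) tt
      ON tt.user_id = t.user_id
     AND t.login_date >= tt.first_login_date
     AND t.login_date < date(tt.first_login_date, '+7 days')
    GROUP BY t.user_id
    ORDER BY t.user_id
""").fetchall()
print(rows)  # [(1, 2), (2, 1)] -- user 1's third login is outside the window
```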

Mysql Query where max(time) less than today

I have two tables. The first table (job) stores the job data and the second table (job_locations) stores the locations for each job. I'm trying to show the number of jobs whose locations have a cutoff time earlier than today.
I use DATETIME for the date column.
Unfortunately, the numbers returned by the following code are wrong.
My code
SELECT *
FROM `job`
left join job_location
on job_location.job_id = job.id
where job_location.cutoff_time < CURDATE()
group by job.id
Please help me to write the working Query.
I think you need to rephrase your query slightly. Select a count of jobs where the cutoff time is earlier than the start of today.
SELECT
j.id,
COUNT(CASE WHEN jl.cutoff_time < CURDATE() THEN 1 END) AS cnt
FROM job j
LEFT JOIN job_location jl
ON j.id = jl.job_id
GROUP BY
j.id;
Note that the left join is important here because it means that we won't drop any jobs having no matching criteria. Instead, those jobs would still appear in the result set, just with a zero count.
As a note, you can simplify the count (in MySQL). And, assuming that all jobs have at least one location, you don't need a JOIN at all. So:
SELECT jl.job_id, sum( jl.cutoff_time < CURDATE() )
FROM job_location jl
GROUP BY jl.job_id;
If this is not correct (and you need the JOIN), then the condition on the date should go in the ON clause:
SELECT j.id, COUNT(jl.job_id)
FROM job j LEFT JOIN
job_location jl
ON jl.job_id = j.id AND jl.cutoff_time < CURDATE()
GROUP BY j.id;
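The SUM-of-a-boolean trick can be checked quickly from Python with SQLite (which, like MySQL, treats a comparison as 0/1; DATE('now') stands in for CURDATE(), and the job_location rows below are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job_location (job_id INTEGER, cutoff_time TEXT)")
conn.executemany("INSERT INTO job_location VALUES (?, ?)", [
    (1, "2000-01-01"),  # already past its cutoff
    (1, "2999-01-01"),  # far in the future
    (2, "2999-01-01"),
])

# SUM(condition) counts the rows where the condition is true (1).
rows = conn.execute("""
    SELECT jl.job_id, SUM(jl.cutoff_time < DATE('now')) AS past_cutoffs
    FROM job_location jl
    GROUP BY jl.job_id
    ORDER BY jl.job_id
""").fetchall()
print(rows)  # [(1, 1), (2, 0)] -- job 2 still appears, with a zero count
```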

How can I speed up a multiple inner join query?

I have two tables. The first table (users) is a simple "id, username" with 100,000 rows and the second (stats) is "id, date, stat" with 20M rows.
I'm trying to figure out which username went up by the most in stat and here's the query I have. On a powerful machine, this query takes minutes to complete. Is there a better way to write it to speed it up?
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON (b.id=a.id)
INNER JOIN stats AS c ON (c.id=a.id)
WHERE b.date = '2016-01-10'
AND c.date = '2016-01-13'
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
The other way I tried, which doesn't seem optimal, is:
SELECT a.id, a.username,
(SELECT b.stat FROM stats AS b WHERE b.id=a.id AND b.date = '2016-01-10') AS start,
(SELECT c.stat FROM stats AS c WHERE c.id=a.id AND c.date = '2016-01-14') AS `end`,
((SELECT b.stat FROM stats AS b WHERE b.id=a.id AND b.date = '2016-01-10') -
(SELECT c.stat FROM stats AS c WHERE c.id=a.id AND c.date = '2016-01-14')) AS stat_diff
FROM users AS a
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
Introduction
Let's suppose we rewrite sentence like this:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
And we ensure that:
the users table has an index on field id;
stats has index on composite field date, id: create index stats_idx_d_i on stats ( date, id );
Then
The database optimizer may use the indexes to select a Restricted Set of Data ('RSD'), that is, the rows that match the filtered dates. This is fast.
But
You are sorting by a calculated field:
(b.stat - c.stat) AS stat_diff #<-- calculated
ORDER BY stat_diff DESC #<-- this forces to calculate it
There is no possible optimization for this sort, because the value has to be calculated one by one for every row in your 'RSD' (restricted set of data).
Conclusion
The question is, how many rows are in your 'RSD'? If there are only a few hundred rows, your query may run fast; otherwise, it will be slow.
In any case, you should make sure the first step of the query (without the sort) is done via an index and not a full scan. Use the EXPLAIN command to verify.
All you need to do is help the optimizer. At a bare minimum, have a checklist that looks like the one below:
1. Are my join columns indexed?
2. Are the WHERE clauses sargable?
3. Are there any implicit or explicit conversions?
4. Am I seeing any statistics issues?
One more interesting aspect to look at is how your data is distributed. Once you understand the data, you will be able to interpret the execution plan and alter it as per your need.
EX:
Suppose I have a customers table with 100 rows, and each customer has at least 10 orders (up to 10,000 orders in total). If you need to find only the top 3 orders by date, you don't want a scan of the orders table to happen.
In your case, I would not go with the second option, even though the optimizer may choose a good plan for it as well. I would take the first approach and see whether the execution time is acceptable; if not, I would go through my checklist and try to tune it further.
The query seems OK; verify your indexes.
Or try this query:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN (select id,stat from stats where date = '2016-01-10') AS b ON (b.id=a.id)
INNER JOIN (select id,stat from stats where date = '2016-01-13') AS c ON (c.id=a.id)
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
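To illustrate the derived-table rewrite, here is a miniature version run from Python against SQLite (the users/stats schema comes from the question, the rows are invented, and the difference is computed as later minus earlier so that "went up by the most" sorts first; the original query's b.stat - c.stat has the opposite sign):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute("CREATE TABLE stats (id INTEGER, date TEXT, stat INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO stats VALUES (?, ?, ?)", [
    (1, "2016-01-10", 100), (1, "2016-01-13", 150),  # alice: +50
    (2, "2016-01-10", 200), (2, "2016-01-13", 220),  # bob:   +20
])

# Each derived table is filtered to a single date before the join,
# so only one stats row per user enters each side of the join.
rows = conn.execute("""
    SELECT a.id, a.username, (c.stat - b.stat) AS stat_diff
    FROM users AS a
    JOIN (SELECT id, stat FROM stats WHERE date = '2016-01-10') AS b ON b.id = a.id
    JOIN (SELECT id, stat FROM stats WHERE date = '2016-01-13') AS c ON c.id = a.id
    ORDER BY stat_diff DESC
    LIMIT 100
""").fetchall()
print(rows)  # [(1, 'alice', 50), (2, 'bob', 20)]
```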

Slow SQL Query, how to improve this query speed?

I have a table (call_history) with a list of phone call records; caller_id is the caller and start_date (DATETIME) is the call date. I need to make a report that shows how many people called for the first time on each day. For example:
2013-01-01 - 100
2013-01-02 - 80
2013-01-03 - 90
I have this query that does it perfectly, but it is very slow. There are indexes on both start_date and caller_id columns; is there an alternative way to get this information to speed the process up?
Here is the query:
SELECT SUBSTR(c1.start_date,1,10), COUNT(DISTINCT caller_id)
FROM call_history c1
WHERE NOT EXISTS
(SELECT id
FROM call_history c2
WHERE SUBSTR(c2.start_date,1,10) < SUBSTR(c1.start_date,1,10)
AND c2.caller_id=c1.caller_id)
GROUP BY SUBSTR(start_date,1,10)
ORDER BY SUBSTR(start_date,1,10) desc
The "WHERE SUBSTR(c2.start_date,1,10)" is breaking your index (you shouldn't apply functions to the column side of a WHERE comparison).
Try the following instead:
SELECT DATE(c1.start_date), COUNT(DISTINCT c1.caller_id)
FROM call_history c1
LEFT OUTER JOIN call_history c2 on c1.caller_id = c2.caller_id and c2.start_date < c1.start_date
where c2.id is null
GROUP BY DATE(start_date)
ORDER BY start_date desc
Also re-reading your problem, I think this is another way of writing without using NOT EXISTS
SELECT DATE(c1.start_date), COUNT(DISTINCT c1.caller_id)
FROM call_history c1
where start_date =
(select min(start_date) from call_history c2 where c2.caller_id = c1.caller_id)
GROUP BY DATE(start_date)
ORDER BY c1.start_date desc;
You are doing a weird thing: using functions in the WHERE, GROUP BY, and ORDER BY clauses. MySQL will never use an index when a function is applied to the column in a condition. So you cannot do much with this query as written; to improve your situation, you should alter your table structure and store your date in a DATE column (a single column). Then create an index on that column; after this you'll get much better results.
Try to replace the NOT EXISTS with a left outer join.
OK, here is the ideal solution; it now runs in 0.01 seconds:
SELECT first_call_date, COUNT(caller_id) AS caller_count
FROM (
SELECT caller_id, DATE(MIN(start_date)) AS first_call_date
FROM call_history
GROUP BY caller_id
) AS ch
GROUP BY first_call_date
ORDER BY first_call_date DESC
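The same two-level aggregation can be verified from Python against SQLite (the call_history schema comes from the question; the four sample calls are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE call_history (caller_id INTEGER, start_date TEXT)")
conn.executemany("INSERT INTO call_history VALUES (?, ?)", [
    (1, "2013-01-01 09:00:00"), (1, "2013-01-02 10:00:00"),  # caller 1 first calls on Jan 1
    (2, "2013-01-01 11:00:00"),                               # caller 2 first calls on Jan 1
    (3, "2013-01-02 12:00:00"),                               # caller 3 first calls on Jan 2
])

# Inner query: one row per caller, carrying their first call date.
# Outer query: count callers per first-call date.
rows = conn.execute("""
    SELECT first_call_date, COUNT(caller_id) AS caller_count
    FROM (SELECT caller_id, DATE(MIN(start_date)) AS first_call_date
          FROM call_history
          GROUP BY caller_id) AS ch
    GROUP BY first_call_date
    ORDER BY first_call_date DESC
""").fetchall()
print(rows)  # [('2013-01-02', 1), ('2013-01-01', 2)]
```

Caller 1's second call never reaches the outer query, which is why this avoids the correlated NOT EXISTS entirely.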