Query is taking too long even with 1k results - mysql

I have run several tests to optimize the query below, but none of them helped.
What I tried:
Adding extra indexes
Changing the query logic by checking other attributes as well in the IN clause
Testing suggestions from online query optimization tools (EverSQL etc.)
Indexes I am using:
radacct (`_accttime`);
radacct (`username`);
radacct (`acctstoptime`,`_accttime`);
Complete query:
(SELECT *
FROM `radacct`
WHERE (radacct._accttime > NOW() - INTERVAL 1.2 HOUR)
AND radacct.acctstoptime IN
(SELECT MAX(radacct.acctstoptime)
FROM `radacct`
GROUP BY radacct.username) )
UNION
(SELECT *
FROM `radacct`
WHERE (radacct._accttime >= DATE_SUB(NOW(), INTERVAL 2 MONTH)
AND radacct.acctstoptime IS NULL) )
When I execute the SELECT statements above by themselves, they take only a few milliseconds each.
I have an issue with the IN clause, so that is the query that takes ages.

As I see it, your problem is the dependent subquery in your IN. Apparently the optimizer doesn't realize that the subquery's result doesn't depend on the outer row (also, the query itself might be suboptimal). Essentially, the subquery is executed once for each row, which is bad.
Now we have to find out which part causes it to be treated as dependent, because it isn't really. My first try would be to give it a different alias (backticks are needed because INNER is a reserved word):
IN (SELECT MAX(`inner`.acctstoptime) FROM radacct AS `inner` GROUP BY `inner`.username)
If that isn't enough to make it independent, make it a full-blown join (INNER, so that non-joined rows [= non-max rows] are discarded from the result):
INNER JOIN (
SELECT MAX(`inner`.acctstoptime) as maxstoptime, `inner`.username
FROM `radacct` AS `inner`
GROUP BY `inner`.username
) sub ON (sub.maxstoptime=radacct.acctstoptime)
Hope that does the trick.
Note: since your result contains the rows of users with their max acctstoptime, it might, on rare occasions, contain more than one row for a user. That happens when a row has an acctstoptime that isn't the max for THAT user but matches the max of another user. In the join version, you can just add another condition to the ON clause. In the IN subquery, you would drop the explicit GROUP BY and add WHERE radacct.username=`inner`.username (which would indeed make it an explicitly dependent subquery, but the optimizer might be able to handle it).
update: due to miscommunication ...
The resulting complete query with the join:
(SELECT DISTINCT radacct.*
FROM radacct
INNER JOIN (
SELECT MAX(`inner`.acctstoptime) as maxstoptime, `inner`.username
FROM `radacct` AS `inner`
GROUP BY `inner`.username
) sub ON (sub.maxstoptime=radacct.acctstoptime)
WHERE (_accttime > NOW() - INTERVAL 1.2 HOUR)
)
UNION
(SELECT *
FROM `radacct`
WHERE (_accttime >= DATE_SUB(NOW(),INTERVAL 2 MONTH)
AND acctstoptime IS NULL)
)
You may still add the username comparison in the ON clause.
What this query does: it removes the IN selector and forces an intermediate result for the join (the max acctstoptime for each username). The join then matches a normal row to an intermediate result row if and only if its acctstoptime is the max for some user (or for THAT user, if you add the username comparison). A row that doesn't have the max acctstoptime, and thus has no join "partner", is discarded from the result (because of the INNER; the LEFT JOIN was insufficient), leaving only the rows with a max acctstoptime in the first part of the union.
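As a rough illustration of the join against a per-user intermediate result, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL purely so the example is self-contained. The helper name latest_sessions, the match_user flag and the sample rows are all hypothetical. It shows how adding the username comparison to the ON clause changes the result.

```python
import sqlite3

def latest_sessions(rows, match_user=True):
    """Join radacct to a derived table holding MAX(acctstoptime) per user.

    rows: iterable of (username, acctstoptime) tuples (made-up sample data).
    match_user: when True, the ON clause also compares username.
    """
    con = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
    con.execute("CREATE TABLE radacct (username TEXT, acctstoptime TEXT)")
    con.executemany("INSERT INTO radacct VALUES (?, ?)", rows)
    on_clause = "sub.maxstoptime = r.acctstoptime"
    if match_user:
        on_clause += " AND sub.username = r.username"
    sql = f"""
        SELECT r.username, r.acctstoptime
        FROM radacct AS r
        INNER JOIN (
            SELECT username, MAX(acctstoptime) AS maxstoptime
            FROM radacct
            GROUP BY username
        ) AS sub ON {on_clause}
        ORDER BY r.username, r.acctstoptime
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result

sample = [
    ("alice", "2024-01-01 10:00:00"),  # not alice's max, but equal to bob's max
    ("alice", "2024-01-01 12:00:00"),
    ("bob", "2024-01-01 09:00:00"),
    ("bob", "2024-01-01 10:00:00"),
]
per_user = latest_sessions(sample)                    # exactly one row per user
any_user = latest_sessions(sample, match_user=False)  # alice's 10:00 row slips in
```

Without the username comparison, alice's 10:00 row is kept because it happens to equal bob's maximum, which is the rare case described above.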

MariaDB select from inner joined query

I am not able to further select from a joined subquery.
I have data in three tables: "events", "records" and "work_list". Each table has one piece of the puzzle where work_list is the shortest and contains top-level data, and the events table tracks many tiny frequent events.
I need to calculate many statistical variables from the events based on some key variables defined in work_list like weighted moving average etc. I have those metrics ready and working, but I have problems filtering the data in events based on selected parameters stored in work_list.
Here is code that does not work. The SELECT * is not important, I will change it to be more meaningful later, it is for clarity. However, I have tried many selections in place of the * without success.
What is wrong with this query from subquery?
Query example 1:
SELECT * FROM
(SELECT events.id, events.type,events.timestamp, work_list.task
FROM
( events
INNER JOIN records ON events.record_id = records.id
INNER JOIN work_list ON records.work_list_id = work_list.id
)
WHERE work_list.customer_number = '1234' AS subquery
);
#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'as subquery ) LIMIT 0, 25' at line 8
The inner joined subquery works and it returns a normal table.
Query example 2:
SELECT events.id, events.type,events.timestamp, work_list.task
FROM (
events
INNER JOIN records ON events.record_id = records.id
INNER JOIN work_list ON records.work_list_id = work_list.id
)
WHERE work_list.customer_number = '1234';
I tried using parentheses in different orders, and I changed the selected columns in SELECT events.id, events.type,events.timestamp, work_list.task. I wonder if this is a poor way of doing this. I have the calculation part. So even if there might be better structures for this, I am interested in solutions that maintain this structure.
The goal of this phase is to filter the events table for further queries that are coded on top of it replacing the SELECT *.
These are the final calculations made earlier which I plan to use when I figure out the problem with Query example 1.
Query example 3:
SELECT *, ((SUM(rate * diff) OVER (ORDER BY startTime
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW)) /
(SUM(diff) OVER(ORDER BY startTime
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW))) as rate_WMA
FROM (
SELECT id, startTime, counts, diff, (counts / diff)*3600 as rate
FROM (
SELECT id, TIMESTAMPDIFF(SECOND, MIN(timestamp), MAX(timestamp))AS diff, SUM(change) as counts, MIN(timestamp) as startTime
FROM `the filtered subquery here`
GROUP BY id
) AS subquery
WHERE diff > 0
) AS totaltotal;
You have extra parentheses (no need for those), and the alias for the subquery should be placed after the subquery's closing parenthesis:
SELECT *
FROM (
SELECT events.id, events.type,events.timestamp, work_list.task
FROM events
INNER JOIN records ON events.record_id = records.id
INNER JOIN work_list ON records.work_list_id = work_list.id
WHERE work_list.customer_number = '1234'
) AS subquery;

mySql Query running super slow

I have seen a few posts regarding slow queries, but none had the answer I'm hoping for.
I've been staring at this query for ages and for some reason can't see what's making it so slow, with date ranges such as 2022-01-01 > 2022-12-21 even taking 80 seconds.
So here is the query:
SELECT
accounts.first_name,
accounts.last_name,
accounts.email,
(
SELECT
COUNT(ID)
FROM
customer_migration_details
WHERE
date_opened BETWEEN '2022-01-01' AND '2022-12-31' AND customer_migration_details.Assigned_to = accounts.email GROUP BY `accounts`.`email` LIMIT 1
) AS 'New Customers'
FROM
customer_migration_details
RIGHT JOIN accounts ON customer_migration_details.Assigned_to = accounts.email
WHERE
date_opened BETWEEN '2022-01-01' AND '2022-12-31' AND customer_migration_details.Assigned_to = accounts.email AND accounts.role LIKE '%Sales%'
GROUP BY
`accounts`.`email`
Here are the results,
but here is the annoying part:
Showing rows 0 - 7 (8 total, Query took 109.5797 seconds.)
There's got to be something I'm missing, maybe in the subquery, that's causing this to take so long.
acc: INDEX(email)
cmd: INDEX(Assigned_to, date_opened)
Having GROUP BY acc.email in the subquery seems wrong. And it may be unnecessary in the outer query.
Do not say COUNT(x) unless you need to avoid counting rows with x IS NULL. Instead, say simply COUNT(*).
If date_opened is a DATETIME, then you have excluded all but one second of New Years Eve.
LIKE with an initial wildcard is a performance problem. Are there multiple "roles" with "Sales" in them?
My brain gets scrambled when I see RIGHT JOIN. Can it be turned around to be a LEFT JOIN? Anyway, it seems to be an INNER JOIN.
Please provide EXPLAIN SELECT ...
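To make the DATETIME point concrete, here is a small sketch, again using Python's sqlite3 as a stand-in for MySQL (an assumption for the sake of a runnable example; the two sample rows are made up). SQLite compares these values as text, which for these particular rows gives the same outcome as MySQL's DATETIME comparison. A half-open range (>= start AND < next day) is one common way to include the whole last day.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
con.execute(
    "CREATE TABLE customer_migration_details "
    "(id INTEGER PRIMARY KEY, date_opened TEXT)"
)
con.executemany(
    "INSERT INTO customer_migration_details (date_opened) VALUES (?)",
    [("2022-06-15 09:00:00",), ("2022-12-31 15:30:00",)],  # made-up rows
)

# BETWEEN with a date-only upper bound stops at the start of 2022-12-31,
# so the row opened that afternoon is missed.
between_count = con.execute(
    "SELECT COUNT(*) FROM customer_migration_details "
    "WHERE date_opened BETWEEN '2022-01-01' AND '2022-12-31'"
).fetchone()[0]

# A half-open range covers every moment of the last day.
half_open_count = con.execute(
    "SELECT COUNT(*) FROM customer_migration_details "
    "WHERE date_opened >= '2022-01-01' AND date_opened < '2023-01-01'"
).fetchone()[0]
con.close()
```

If date_opened is a plain DATE column, both forms return the same rows.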
Use a JOIN with GROUP BY or use a correlated sub-query, but not both at the same time.
SELECT
accounts.first_name,
accounts.last_name,
accounts.email,
COUNT(customer_migration_details.id) AS new_customers
FROM
accounts
LEFT JOIN
customer_migration_details
ON customer_migration_details.assigned_to = accounts.email
AND customer_migration_details.date_opened BETWEEN '2022-01-01' AND '2022-12-31'
WHERE
accounts.role LIKE '%Sales%'
GROUP BY
accounts.email
Or...
SELECT
accounts.first_name,
accounts.last_name,
accounts.email,
(
SELECT
COUNT(ID)
FROM
customer_migration_details
WHERE
date_opened BETWEEN '2022-01-01' AND '2022-12-31'
AND assigned_to = accounts.email
)
AS new_customers
FROM
accounts
WHERE
accounts.role LIKE '%Sales%'
Notes:
It's bad practice to put spaces, etc, in column names, so I changed New Customers to new_customers.
LIKE '%Sales%' can't use an index, so will scan each and every account row.
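The two alternatives (LEFT JOIN with GROUP BY, or a correlated subquery) should return the same counts, including zero for sales accounts with no matching rows. The sketch below checks that on a tiny made-up data set, with Python's sqlite3 standing in for MySQL. The table contents are hypothetical, the schema is reduced to the columns the queries touch, and date_opened is assumed to hold date-only values.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite stands in for MySQL in this sketch
con.executescript("""
    CREATE TABLE accounts (email TEXT PRIMARY KEY, role TEXT);
    CREATE TABLE customer_migration_details (
        id INTEGER PRIMARY KEY, assigned_to TEXT, date_opened TEXT);
    INSERT INTO accounts VALUES
        ('a@example.com', 'Sales Rep'),
        ('b@example.com', 'Sales Manager'),
        ('c@example.com', 'Support');
    INSERT INTO customer_migration_details (assigned_to, date_opened) VALUES
        ('a@example.com', '2022-03-01'),
        ('a@example.com', '2022-07-15'),
        ('a@example.com', '2021-12-30'),
        ('c@example.com', '2022-05-05');
""")

# Variant 1: LEFT JOIN with the date filter in ON, then GROUP BY.
join_rows = con.execute("""
    SELECT a.email, COUNT(c.id) AS new_customers
    FROM accounts AS a
    LEFT JOIN customer_migration_details AS c
           ON c.assigned_to = a.email
          AND c.date_opened BETWEEN '2022-01-01' AND '2022-12-31'
    WHERE a.role LIKE '%Sales%'
    GROUP BY a.email
    ORDER BY a.email
""").fetchall()

# Variant 2: correlated subquery, no join and no GROUP BY.
subquery_rows = con.execute("""
    SELECT a.email,
           (SELECT COUNT(*)
            FROM customer_migration_details AS c
            WHERE c.assigned_to = a.email
              AND c.date_opened BETWEEN '2022-01-01' AND '2022-12-31'
           ) AS new_customers
    FROM accounts AS a
    WHERE a.role LIKE '%Sales%'
    ORDER BY a.email
""").fetchall()
con.close()
```

Keeping the date filter in the ON clause (rather than WHERE) is what lets the join variant still report 0 for an account with no rows in the period.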

SQL Count on JOIN query is taking forever to execute?

I'm trying to run a count query on a two-table join. The e_amazing_client table has a million rows and m_user has just 50 rows, BUT the count query is taking forever!
SELECT COUNT(`e`.`id`) AS `count`
FROM `e_amazing_client` AS `e`
LEFT JOIN `user` AS `u` ON `e`.`cx_hc_user_id` = `u`.`id`
WHERE ((`e`.`date_created` >= '2018-11-11') AND (`e`.`date_created` >= '2018-11-18')) AND (`e`.`id` >= 1)
I don't know what is wrong with this query?
First, I'm guessing that this is sufficient:
SELECT COUNT(*) AS `count`
FROM e_amazing_client e
WHERE e.date_created >= '2018-11-18' AND e.id >= 1;
If user has only 50 rows, I doubt it is creating duplicates. The comparisons on date_created are redundant: both are >=, so only the later bound ('2018-11-18') actually filters.
For this query, try creating an index on e_amazing_client(date_created, id).
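When two lower bounds on the same column are combined with AND, only the stricter one has any effect. The following sketch (Python's sqlite3 as a stand-in for MySQL, with a handful of made-up dates) compares the original pair of >= conditions, the single later bound, and a genuine closed range.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
con.execute(
    "CREATE TABLE e_amazing_client (id INTEGER PRIMARY KEY, date_created TEXT)"
)
con.executemany(
    "INSERT INTO e_amazing_client (date_created) VALUES (?)",
    [("2018-11-10",), ("2018-11-12",), ("2018-11-15",),
     ("2018-11-18",), ("2018-11-20",)],  # hypothetical sample dates
)

def count_where(condition):
    """Count rows matching a fixed, trusted WHERE condition."""
    sql = f"SELECT COUNT(*) FROM e_amazing_client WHERE {condition}"
    return con.execute(sql).fetchone()[0]

# Both comparisons are >=, as in the question's query.
both_lower = count_where(
    "date_created >= '2018-11-11' AND date_created >= '2018-11-18'")
# Only the later bound is kept.
later_only = count_where("date_created >= '2018-11-18'")
# A real range: lower bound and upper bound.
in_range = count_where(
    "date_created >= '2018-11-11' AND date_created <= '2018-11-18'")
```

The first two counts agree, while the closed range picks a different set of rows, which is why the direction of the second comparison matters.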
Maybe you wanted this:
SELECT COUNT(`e`.`id`) AS `count`
FROM `e_amazing_client` AS `e`
LEFT JOIN `user` AS `u` ON `e`.`cx_hc_user_id` = `u`.`id`
WHERE ((`e`.`date_created` >= '2018-11-11') AND (`e`.`date_created` <= '2018-11-18')) AND (`e`.`id` >= 1)
to check between dates?
Also, do you really need
AND (`e`.`id` >= 1)
If id is what an id is usually in a table, is there a case to be <1?
Your query is pulling ALL records on or after 2018-11-18: both date comparisons are >=, so the later date wins, and ID >= 1 excludes nothing. You have no clause in there for a specific user. You MAY have meant that you only wanted the count WITHIN the week 11/11 to 11/18, in which case the comparisons SHOULD have been >= 11-11 and <= 11-18.
As for the count, you are getting ALL people (assuming no entry has an ID less than 1) and thus a single count for that date range. If you want it per user, as you indicated, you need to group by the cx_hc_user_id (user) column to see who has the most, or make the user part of the WHERE clause to get one person.
SELECT
e.cx_hc_user_id,
count(*) countPerUser
from
e_amazing_client e
WHERE
e.date_created >= '2018-11-11'
AND e.date_created <= '2018-11-18'
group by
e.cx_hc_user_id
You can order by the count descending to get the user with the highest count, but I'm still not positive what you are asking.

Is there a way to create an SQL query faster than this one?

I have a MySQL table which stores the data of a hotel's reservations.
I need a query to see the amount of guests who stayed in the hotel for each date.
I was able to create a query (using a subquery) but it performs very slowly. Is there a better way to get the requested data? (For example join the table to itself, or whatever.)
My query is:
SELECT CheckOutDate AS Date,
(SELECT SUM(NrOfGuests) FROM tblGuests tG
WHERE tG.CheckInDate <= tblGuests.CheckOutDate
AND tG.CheckOutDate > tblGuests.CheckOutDate
AND tG.IsCancelled = False AND tG.NoShow = False)
AS NrOfGestsStaying
FROM tblGuests
GROUP BY CheckOutDate
What is the best way to make it perform faster?
In the original query, the SELECT returns a SUM on every row of the table using a subquery. The duplicates are removed afterwards using a group by CheckOutDate. So, in other words, this is the SUM(NrOfGuests) for distinct CheckOutDate.
You can remove duplicate CheckOutDate values in advance by selecting the distinct CheckOutDate values in a subquery. The outer query then computes the SUM just once per distinct CheckOutDate:
SELECT dT.CheckOutDate
,(SELECT SUM(NrOfGuests)
FROM tblGuests tG
WHERE tG.CheckInDate <= dT.CheckOutDate
AND tG.CheckOutDate > dT.CheckOutDate
AND tG.IsCancelled = 0
AND tG.NoShow = 0
) AS NrOfGuests
FROM (
SELECT DISTINCT CheckOutDate
FROM tblGuests
) AS dT
ORDER BY dT.CheckOutDate
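A minimal sketch of the distinct-dates approach, runnable with Python's sqlite3 as a stand-in for MySQL. The three bookings are invented for illustration, and the strict > comparison on CheckOutDate follows the question's query, so a guest is not counted on the day they check out.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
con.executescript("""
    CREATE TABLE tblGuests (
        CheckInDate TEXT, CheckOutDate TEXT, NrOfGuests INTEGER,
        IsCancelled INTEGER, NoShow INTEGER);
    -- Hypothetical bookings: two valid stays and one cancelled stay.
    INSERT INTO tblGuests VALUES
        ('2024-01-01', '2024-01-03', 2, 0, 0),
        ('2024-01-02', '2024-01-05', 3, 0, 0),
        ('2024-01-01', '2024-01-05', 4, 1, 0);
""")

# The SUM subquery runs once per distinct CheckOutDate,
# not once per row of tblGuests.
rows = con.execute("""
    SELECT dT.CheckOutDate,
           (SELECT SUM(tG.NrOfGuests)
            FROM tblGuests AS tG
            WHERE tG.CheckInDate <= dT.CheckOutDate
              AND tG.CheckOutDate > dT.CheckOutDate
              AND tG.IsCancelled = 0
              AND tG.NoShow = 0
           ) AS NrOfGuests
    FROM (SELECT DISTINCT CheckOutDate FROM tblGuests) AS dT
    ORDER BY dT.CheckOutDate
""").fetchall()
con.close()
```

On 2024-01-03 only the second booking is still in house. On 2024-01-05 nobody is, so SUM yields NULL (None in Python); wrapping it in COALESCE(..., 0) would turn that into 0 if preferred.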

How to fix SQL query with Left Join and subquery?

I have SQL query with LEFT JOIN:
SELECT COUNT(stn.stocksId) AS count_stocks
FROM MedicalFacilities AS a
LEFT JOIN stocks stn ON
(stn.stocksIdMF = ( SELECT b.MedicalFacilitiesIdUser
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY stn.stocksId DESC LIMIT 1)
AND stn.stocksEndDate >= UNIX_TIMESTAMP() AND stn.stocksStartDate <= UNIX_TIMESTAMP())
With this query I want to select one row from the stocks table, subject to these conditions and with the field equal to a.MedicalFacilitiesIdUser.
I always get count_stocks = 0 in the result, but I need to get 1.
The count(...) aggregate doesn't count null, so its argument matters:
COUNT(stn.stocksId)
Since stn is your right hand table, this will not count anything if the left join misses. You could use:
COUNT(*)
which counts every row, even if all its columns are null. Or a column from the left hand table (a) that is never null:
COUNT(a.ID)
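The difference between the three COUNT forms can be seen on a LEFT JOIN that finds no match. This is a small sketch with Python's sqlite3 standing in for MySQL. The single facility row is made up, the stocks table is deliberately left empty, and the ID column on MedicalFacilities is an assumption taken from the COUNT(a.ID) suggestion above.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
con.executescript("""
    CREATE TABLE MedicalFacilities (
        ID INTEGER PRIMARY KEY, MedicalFacilitiesIdUser INTEGER);
    CREATE TABLE stocks (stocksId INTEGER PRIMARY KEY, stocksIdMF INTEGER);
    INSERT INTO MedicalFacilities VALUES (1, 42);  -- hypothetical row
""")

# The join misses, so every stn column in the single result row is NULL.
count_right, count_star, count_left = con.execute("""
    SELECT COUNT(stn.stocksId),  -- skips NULLs from the missed join
           COUNT(*),             -- counts the row regardless
           COUNT(a.ID)           -- left-hand column, never NULL here
    FROM MedicalFacilities AS a
    LEFT JOIN stocks AS stn ON stn.stocksIdMF = a.MedicalFacilitiesIdUser
""").fetchone()
con.close()
```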
Your subquery in the on looks very strange to me:
on stn.stocksIdMF = ( SELECT b.MedicalFacilitiesIdUser
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY stn.stocksId DESC LIMIT 1)
This is comparing MedicalFacilitiesIdUser to stocksIdMF. Admittedly, you have no sample data or data layouts, but the naming of the columns suggests that these are not the same thing. Perhaps you intend:
on stn.stocksIdMF = ( SELECT b.stocksId
-----------------------------^
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY b.stocksId DESC
LIMIT 1)
Also, ordering by stn.stocksid wouldn't do anything useful, because that would be coming from outside the subquery.
Your subquery seems redundant, and the main query is hard to read, as many of the join conditions could be placed in the WHERE clause. Additionally, the original query might have a performance issue.
Recall that WHERE is an implicit join and JOIN is an explicit join. For inner joins, query optimizers make no distinction between the two if they use the same expressions, but readability and maintainability are another matter. (With a LEFT JOIN, note that conditions on the right-hand table placed in WHERE also filter out the unmatched rows.)
Consider the revised version (notice I added a GROUP BY):
SELECT COUNT(stn.stocksId) AS count_stocks
FROM MedicalFacilities AS a
LEFT JOIN stocks stn ON stn.stocksIdMF = a.MedicalFacilitiesIdUser
WHERE stn.stocksEndDate >= UNIX_TIMESTAMP()
AND stn.stocksStartDate <= UNIX_TIMESTAMP()
GROUP BY stn.stocksId
ORDER BY stn.stocksId DESC
LIMIT 1