So I'm having serious speed problems using a left join to count ticket comments from another table. I've tried using a sub-select in the count field and had precisely the same performance.
With the count, the query takes about 1 second on maybe 30 tickets, and 5 seconds for 19,000 tickets (I have both a production and a development server, so times are skewed a tad). I'm trying to optimize this because four variations of the query need to be run on each page refresh.
Without the count, I see execution time fall from 1 second to 0.03 seconds, so certainly this count is being run across all tickets and not just the ones which are selected.
Here's a trimmed down version of the query in question:
SELECT tickets.ticket_id,
       ticket_severity,
       ticket_short_description,
       ticket_full_description,
       count(*) as commentCount
FROM tickets
LEFT JOIN tickets_comment ON tickets.ticket_id = tickets_comment.ticket_id
WHERE ticket_status = 'Open'
  AND ticket_owner_id = '133783475'
GROUP BY
    everything,
    under,
    the,
    sun
Now, not all tickets have comments, so I can't just do a right or standard (inner) join. When I do, the speed is fairly good (about 1/10th the current time), but any tickets without comments are left out.
I see three fixes for this, and would like any and all advice you have.
Create a new column comment_count and use a trigger/update query on new comment
Work with the UI and grab comments on the fly (not really wanted)
Hope stackoverflow folks have a more elegant solution :þ
Ideas?
A co-worker has come to the rescue. The query was just using join improperly.
What must be done here is create a second, derived table with a query like:

SELECT ticket_id, count(*) AS commentCount
FROM tickets_comment
WHERE (clause matches other)
GROUP BY ticket_id

which produces a count for each ticket id. Then join that derived table to the tickets table where the ticket ids match. It's not as wicked fast as maintaining a new column, but it takes about a tenth of the time it did, so I'm pleased as punch.
The last step is converting NULLs (on tickets where there were no comments) into zeros.
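Put together, the final query ends up shaped roughly like this (a sketch; the aliases and the COALESCE for the NULL-to-zero step are illustrative):

SELECT t.ticket_id,
       t.ticket_severity,
       t.ticket_short_description,
       t.ticket_full_description,
       COALESCE(c.commentCount, 0) AS commentCount
FROM tickets AS t
LEFT JOIN (
    SELECT ticket_id, count(*) AS commentCount
    FROM tickets_comment
    GROUP BY ticket_id
) AS c ON c.ticket_id = t.ticket_id
WHERE t.ticket_status = 'Open'
  AND t.ticket_owner_id = '133783475';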
The first option (a comment_count column maintained by a trigger) is by far the fastest solution, and you'll see it done in Rails all the time (counter caches) because it really is that fast; a minimal sketch follows below.
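A sketch of that approach, assuming MySQL triggers (the column and trigger names are illustrative):

-- counter-cache column kept up to date by a trigger (hypothetical names)
ALTER TABLE tickets ADD COLUMN comment_count INT NOT NULL DEFAULT 0;

CREATE TRIGGER tickets_comment_after_insert
AFTER INSERT ON tickets_comment
FOR EACH ROW
    UPDATE tickets
    SET comment_count = comment_count + 1
    WHERE ticket_id = NEW.ticket_id;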
count(*) is really only useful when you aren't selecting any other attributes. Try count(tickets_comment.ticket_id) and see if that helps; with a LEFT JOIN, counting a column from the joined table also returns 0 instead of 1 for tickets with no comments. I can't run EXPLAIN so I can't test it myself, but if your analysis is correct it should help.
Try running EXPLAIN on the query to make sure the correct indexes are being used. If no indexes are being used, create one on the columns in the JOIN and WHERE clauses.
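For example (a sketch based on the trimmed query above; the index column choices are guesses from the WHERE and JOIN clauses):

EXPLAIN
SELECT tickets.ticket_id, count(*) AS commentCount
FROM tickets
LEFT JOIN tickets_comment ON tickets.ticket_id = tickets_comment.ticket_id
WHERE ticket_status = 'Open'
  AND ticket_owner_id = '133783475'
GROUP BY tickets.ticket_id;

-- candidate indexes if EXPLAIN shows full table scans:
CREATE INDEX idx_tickets_owner_status ON tickets (ticket_owner_id, ticket_status);
CREATE INDEX idx_comment_ticket ON tickets_comment (ticket_id);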
Related
Recently, through all the help here, I have been able to get all the queries I wanted to work, but one just doesn't work as well as I hoped in terms of how long it takes to complete. It can take up to 15 minutes for MS Access to finish and unfreeze. When I run the query on a small amount of data (150 or so records) it takes 1-2 minutes, but the larger the data set, the longer it takes. What I am trying to do is take two queries, one that has every result (about 18,000 records), compare it against another that has just the people who have "passed", and find the ones that have only ever "failed". This is what a fellow member showed me, and what I used:

SELECT *
FROM All_ESD_Results_Date_Changed
WHERE ((([EmpID] & [Date]) Not In (SELECT EmpID & Date FROM All_Pass)));

Is there a way to speed this up, or is it just a limitation of MS Access? All the other queries I use on this same data set take seconds. I really appreciate all the help, and I never would have been able to get any of this done without this forum. Thank you.
You can try left joining the two tables together to achieve the same logic. Something like this:
SELECT t1.*
FROM All_ESD_Results_Date_Changed t1
LEFT JOIN All_Pass t2
ON t1.[EmpID] = t2.[EmpID] AND
t1.[Date] = t2.[Date]
WHERE t2.[EmpID] IS NULL
If this doesn't help, then a further step to take would be adding indices on the EmpID and Date columns in the All_Pass table.
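For example (a sketch in Access DDL; the index name is arbitrary, and [Date] is bracketed because Date is a reserved word):

CREATE INDEX idx_allpass_emp_date
ON All_Pass (EmpID, [Date]);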
I'm trying to do what I think is a set of simple set operations on a database table: several intersections and one union. But I don't seem to be able to express that in a simple way.
I have a MySQL table called Moment, which has many millions of rows. (It happens to be a time-series table, but that doesn't bear on my problem here; what matters is that these data have a column 'source' and a column 'time', both indexed.) Queries to pull data out of this table are created dynamically (coming in from an API), and ultimately boil down to a small pile of temporary tables indicating which 'source's we care about, and maybe the 'time' ranges we care about.
Let's say we're looking for
(source in Temp1) AND (
((source in Temp2) AND (time > '2017-01-01')) OR
((source in Temp3) AND (time > '2016-11-15'))
)
Just for excitement, let's say Temp2 is empty --- that part of the API request was valid but happened to include 'no actual sources'.
If I then do
SELECT m.* from Moment as m,Temp1,Temp2,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... I get a heaping mound of nothing, because the empty Temp2 gives an empty Cartesian product before we get to the WHERE clause.
Okay, I can do
SELECT m.* from Moment as m
LEFT JOIN Temp1 on m.source=Temp1.source
LEFT JOIN Temp2 on m.source=Temp2.source
LEFT JOIN Temp3 on m.source=Temp3.source
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... but this takes >70ms even on my relatively small development database.
If I manually eliminate the empty table,
SELECT m.* from Moment as m,Temp1,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... it finishes in 10ms. That's the kind of time I'd expect.
I've also tried putting a single unmatchable row in the empty table and doing SELECT DISTINCT, and it splits the difference at ~40ms. Seems an odd solution though.
This really feels like I'm just conceptualizing the query wrong, that I'm asking the database to do more work than it needs to. What is the Right Way to ask the database this question?
Thanks!
--UPDATE--
I did some actual benchmarks on my actual database, and came up with some really unexpected results.
For the scenario above, all tables indexed on the columns being compared, with an empty table,
doing it with left joins took 3.5 minutes (!!!)
doing it without joins (just 'FROM...WHERE') and adding a null row to the empty table, took 3.5 seconds
even more striking, when there wasn't an empty table, but rather ~1000 rows in each of the temporary tables,
doing the whole thing in one query took 28 minutes (!!!!!), but,
doing each of the three AND clauses separately and then doing the final combination in the code took less than a second.
I still feel I'm expressing the query in some foolish way, since again, all I'm trying to do is one set union (OR) and a few set intersections. It really seems like the DB is making this gigantic Cartesian product when it seriously doesn't need to. All in all, as pointed out in the answer below, keeping some of the intelligence up in the code seems to be the better approach here.
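For the record, one way to express the OR as a set union without the empty-table trap is a UNION of the two intersection branches (a sketch; note that true MySQL TEMPORARY tables can't be opened twice in one statement, in which case the two branches have to run as separate queries, which matches what I found above):

SELECT m.*
FROM Moment AS m
JOIN Temp1 AS t1 ON m.source = t1.source
JOIN Temp2 AS t2 ON m.source = t2.source
WHERE m.time > '2017-01-01'
UNION
SELECT m.*
FROM Moment AS m
JOIN Temp1 AS t1 ON m.source = t1.source
JOIN Temp3 AS t3 ON m.source = t3.source
WHERE m.time > '2016-11-15';

Here an empty Temp2 simply yields an empty first branch instead of wiping out the whole result.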
There are various ways to tackle the problem. Needless to say it depends on
how many queries are sent to the database,
the amount of data you are processing in a time interval,
how the database backend is configured to manage it.
For your use case, a little more information would be helpful. One option would be to optimize the query itself, using CASE/COUNT(*) or CASE/LIMIT combinations to sort out empty tables; however, such conditional queries tend to cost more time.
You could split the SQL code to downgrade the scaling of the problem from 1*N^x to y*N^z, where z should be smaller than x.
You said that an API is involved; maybe you can handle the temporary "no data" tables differently, or even avoid storing them at all?
Another option would be to enable query caching:
https://dev.mysql.com/doc/refman/5.5/en/query-cache-configuration.html
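For example (a configuration sketch; the values are illustrative, this applies to MySQL 5.5 as in the linked page, and the query cache was removed entirely in MySQL 8.0):

# my.cnf
[mysqld]
query_cache_type = 1      # cache eligible SELECT results
query_cache_size = 64M    # memory reserved for the cache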
I have the following query..
SELECT Flights.flightno,
Flights.timestamp,
Flights.route
FROM Flights
WHERE Flights.adshex = '400662'
ORDER BY Flights.timestamp DESC
This returns the results shown in the following screenshot.
However, I cannot use a simple GROUP BY because, for example, BCS6515 also appears much later in the list, and I only want to "condense" rows that are identical and adjacent in this list.
An example of the desired output (note BCS6515 appears twice in this list, since its rows were not adjacent in the first query):
This is why a GROUP BY flightno alone will not work.
I don't think there's a good way to do so in SQL without a column to help you. At best, I'm thinking it would require a subquery that would be ugly and inefficient. You have two options that would probably end up with better performance.
One would be to code the logic yourself to prune the results. (Added:) This can be done with a procedure clause of a select statement, if you want to handle it on the database server side.
Another would be to either use other information in the table or add new information to the table for this purpose. Do you currently have something in your table that is a different value for each instance of a number of BCS6515 rows?
If not, and if I'm making correct assumptions about the data in your table, there will be only one flight with the same number per day, though the flight number is reused to denote a flight with the same start/end and times on other days (e.g. the 10 a.m. from NRT to DTW has the same flight number every day). If the timestamps always fell on the same day, you could use DAY(timestamp) in the GROUP BY; however, that doesn't allow for overnight flights. Thus, you'll probably need something such as a departure date to group by in order to identify all the rows belonging to the same physical flight.
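A sketch of that idea, assuming a hypothetical departure_date column were added:

SELECT flightno, departure_date, MAX(timestamp) AS last_seen
FROM Flights
WHERE adshex = '400662'
GROUP BY flightno, departure_date
ORDER BY last_seen DESC;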
GROUP BY does not work because the timestamp value is different for the two BCS6515 records.
It will only work if you drop the timestamp column:
SELECT Flights.flightno,
Flights.route
FROM Flights
WHERE Flights.adshex = '400662'
GROUP BY (Flights.flightno)
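If you truly need to condense only adjacent duplicates in SQL, one fragile sketch numbers each run of identical flight numbers with MySQL session variables (behavior is version- and sql_mode-dependent; treat it as illustrative, not authoritative):

SELECT flightno,
       MIN(timestamp) AS first_seen,
       MAX(timestamp) AS last_seen
FROM (
    SELECT f.flightno,
           f.timestamp,
           @grp := IF(@prev = f.flightno, @grp, @grp + 1) AS grp,
           @prev := f.flightno AS prev
    FROM Flights AS f
    CROSS JOIN (SELECT @prev := NULL, @grp := 0) AS vars
    WHERE f.adshex = '400662'
    ORDER BY f.timestamp DESC
) AS runs
GROUP BY grp, flightno
ORDER BY last_seen DESC;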
I have a query which I have been working on for hours and can't seem to get working.
The query is meant to generate a list of customers that have spent > €100 in the last 365 days. I am creating a video rental database.
This is how far I have gotten, but I can't seem to link the data together with the date_rented data table:
SELECT CUST_ID, CUSTOMER_SPEND
FROM ACCOUNT_TEST
WHERE CUSTOMER_SPEND > 100;
The tables and fields I am working with are cust_id, customer_spend, date_rented and account_test.
Instead of trying to answer the question that was (sort of) asked, maybe it makes sense to step back, look at the (apparently) desired result, and show how that could be achieved. For the moment, I'm going to only look at one table out of what should be a number. This table will hold the details of an individual customer rental:
customer_id
date_rented
cost
More fields are certainly needed, but those seem to cover what we care about for this query. From this, we want a list of customers who've spent more than 100 (of whatever unit cost is in), along with the amount spent by each. The only slightly tricky part is that we can't use an aggregate like sum(cost) in a WHERE clause, so we put that condition in a HAVING clause instead.
select customer_id, sum(cost) as customer_paid
from rental_details
where to_days(now()) - to_days(date_rented) <= 365
group by customer_id
having customer_paid > 100
As a quick warning, that might need minor tweaking to work with MySQL -- most of what I've written recently has been for SQL Server.
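One further tweak worth considering (my suggestion, not part of the original answer): wrapping date_rented in to_days() prevents MySQL from using an index on that column, so the date filter can be rewritten in a sargable form:

select customer_id, sum(cost) as customer_paid
from rental_details
where date_rented >= curdate() - interval 365 day
group by customer_id
having customer_paid > 100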
It's common to have a table where, for example, the fields are account, value, and time. What's the best design pattern for retrieving the last value for each account? Unfortunately, the LAST keyword in a grouping gives you the last physical record in the database, not the last record by any sorting, which means IMHO it should never be used. The two clumsy approaches I use are either a subquery, or a secondary query to determine the last record which I then join back to the table to find the value. Isn't there a more elegant approach?
could you not do:
select account,last(value),max(time)
from table
group by account
I tested this (granted for a very small, almost trivial record set) and it produced proper results.
Edit:
that also doesn't work, after some more testing. I did a fair bit of Access programming in a past life and feel like there is a way to do what you're asking in one query, but I'm drawing a blank at the moment. Sorry.
After literally years of searching, I finally found the answer at the link below (#3). The sub-queries above will work, but are very slow -- debilitatingly slow for my purposes.
The more popular answer is a tri-level query: the 1st level finds the max, the 2nd level gets the field values based on the 1st query, and the result is then joined in as a table to the main query. Fast, but complicated and time-consuming to code/maintain.
This link works, still runs pretty fast and is a lot less work to code/maintain. Thanks to the authors of this site.
http://access.mvps.org/access/queries/qry0020.htm
The subquery option sounds best to me, something like the following pseudo-SQL. It may be possible/necessary to optimize it via a join; that will depend on the capabilities of the SQL engine.
select *
from table
where account+time in (select account+max(time)
from table
group by account
order by time)
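In actual Access SQL, the concatenation operator would be & rather than +; a concrete version of the same idea, with a hypothetical table name:

SELECT t.*
FROM MyTable AS t
WHERE (t.account & t.[time]) IN (
    SELECT account & Max([time])
    FROM MyTable
    GROUP BY account
);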
This is a good trick for returning the last record in a table:
SELECT TOP 1 * FROM TableName ORDER BY Time DESC
Check out this site for more info.
#Tom
It might be easier for me in general to do the "In" query that you've suggested. Generally I do something like
select T1.account, T1.value
from table as T1
where T1.time = (select max(T2.time)
                 from table as T2
                 where T1.account = T2.account)
#shs
Yes, that select last(value) SHOULD work, but it doesn't... My understanding, although I can't produce an authoritative source, is that last(value) gives the last physical record in the Access file, which means it could be the first one timewise but the last one physically. So I don't think you should use last(value) for anything other than a really bad random row.
I'm trying to find the latest date in a group using the Access 2003 query builder, and ran into the same problem trying to use LAST on a date field. But it looks like using MAX finds the latest date.
Perhaps the following SQL is clumsy, but it seems to work correctly in Access.
SELECT
a.account,
a.time,
a.value
FROM
tablename AS a INNER JOIN [
SELECT
account,
Max(time) AS MaxOftime
FROM
tablename
GROUP BY
account
]. AS b
ON
(a.time = b.MaxOftime)
AND (a.account = b.account)
;