I am trying to consolidate data from multiple tables for reporting. I need all the details of every header that is complete after a certain date, and I must check for the zero date ('0000-00-00') because of bad data.
I am working with legacy tables that I, sadly, did not create and cannot change. This statement times out in my application and takes about 40 seconds to run against a database on a local Virtual Machine. Is it possible to refactor the query for better performance? Any help is appreciated!
SELECT detail.*
FROM detail
JOIN header
ON detail.invoice = header.invoice
WHERE detail.dateapply != '0000-00-00'
AND header.dateapply >= '2017-09-17'
AND header.orderstatus IN ('complete', 'backorder')
ORDER BY detail.dateapply;
For header:
INDEX(orderstatus, dateapply)
INDEX(invoice, orderstatus, dateapply)
For detail:
INDEX(invoice, dateapply)
INDEX(dateapply, invoice)
The ordering of the columns is deliberate. Since I can't tell which table the Optimizer will start with, I provided indexes that should be optimal either way.
If the two dateapply columns are in sync, then there are probably further optimizations.
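If it helps, that could be written as DDL along these lines (a sketch only; the index names are invented, and the legacy schema may of course not be changeable):
ALTER TABLE header
  ADD INDEX idx_status_date (orderstatus, dateapply),
  ADD INDEX idx_inv_status_date (invoice, orderstatus, dateapply);
ALTER TABLE detail
  ADD INDEX idx_inv_date (invoice, dateapply),
  ADD INDEX idx_date_inv (dateapply, invoice);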
Between my coworker and me, we came up with the query below, which returns results more quickly. The GROUP BY eliminated duplicate records. I would like to thank everyone for taking the time to help me look at my issue, and I look forward to the possibility of using indexes in the future.
SELECT d.*
FROM detail d
JOIN header h
ON d.invoice = h.invoice
WHERE d.qbposted = 0
AND d.dateapply != '0000-00-00'
AND h.dateapply >= '2017-09-17'
AND h.orderstatus IN ('complete', 'backorder')
GROUP BY d.rec
ORDER BY d.dateapply;
Related
So I'm facing a difficult scenario: I have a legacy app, badly written and designed, with a table, t_booking. This app has a calendar view where, for every hall and every day in the month, it shows the reservation status with this query:
SELECT mr1b.id, mr1b.idreserva, mr1b.idhotel, mr1b.idhall, mr1b.idtiporeserva, mr1b.date, mr1b.ampm, mr1b.observaciones, mr1b.observaciones_bookingarea, mr1b.tipo_de_navegacion, mr1b.portal, r.estado
FROM t_booking mr1b
LEFT JOIN a_reservations r ON mr1b.idreserva = r.id
WHERE mr1b.idhotel = '$sIdHotel' AND mr1b.idhall = '$hall' AND mr1b.date = '$iAnyo-$iMes-$iDia'
AND IF (r.comidacena IS NULL OR r.comidacena = '', mr1b.ampm = 'AM', r.comidacena = 'AM' AND mr1b.ampm = 'AM')
AND (r.estado <> 'Cancelled' OR r.estado IS NULL OR r.estado = '')
LIMIT 1;
(at first there was also an ORDER BY r.estado DESC, which I took out)
This query, after proper (I think) indexing, takes 0.004 seconds each time, and the overall calendar view is presented in a reasonable time. There are indexes over idhotel, idhall, and date.
Now I have a new module, well written ;-), which stores reservations in another table, but I must present both types of reservations in the same calendar view. My first approach was to create a view joining the content of both tables and selecting the data for the calendar view from this view instead of from t_booking.
The view is defined like this:
CREATE OR REPLACE VIEW
t_booking_hall_reservation
AS
SELECT id,
idreserva,
idhotel,
idhall,
idtiporeserva,
date,
ampm,
observaciones,
observaciones_bookingarea,
tipo_de_navegacion, portal
FROM t_booking
UNION ALL
SELECT HR.id,
HR.convention_id as idreserva,
H.id_hotel as idhotel,
HR.hall_id as idhall,
99 as idtiporeserva,
date,
session as ampm,
observations as observaciones,
'new module' as observaciones_bookingarea,
'events' as tipo_de_navegacion,
'new module' as portal
FROM new_hall_reservation HR
JOIN a_halls H on H.id = HR.hall_id
;
(the table new_hall_reservation has the same indexes)
I tried UNION ALL instead of UNION, as I read it is much more efficient.
Well, the former query, with t_booking changed to t_booking_hall_reservation, takes 1.5 seconds, which has to be multiplied by each hall and each day, and that makes the calendar view impossible to finish.
The app is spaghetti code, so looping twice, once over t_booking and then over new_hall_reservation, and combining the results is rather difficult.
Is it possible to tune the view to make this query fast enough? Another approach?
Thanks
PS: the less I modify the original query, the less I'll need to modify the legacy app, which is, at best, risky to modify.
This is too long for a comment.
A view is (almost) never going to help performance. Yes, they make queries simpler. Yes, they incorporate important logic. But no, they don't help performance.
One key problem is the execution of the view -- it doesn't generally take the filters in the overall query into account (although the most recent versions of MySQL are better at this).
One suggestion -- which might be a fair bit of work -- is to materialize the view as a table. When the underlying tables change, you need to update t_booking_hall_reservation using triggers. Then you can create indexes on the table to achieve your performance goals.
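As a rough illustration of that idea (an untested sketch; the table, index, and trigger names are invented, and matching AFTER UPDATE/DELETE triggers plus triggers on new_hall_reservation would also be needed):
-- Seed a real table from the view and index it for the calendar query.
CREATE TABLE t_booking_hall_reservation_mat AS
SELECT * FROM t_booking_hall_reservation;
ALTER TABLE t_booking_hall_reservation_mat
  ADD INDEX idx_hotel_hall_date (idhotel, idhall, date);
-- Keep the copy in sync when the legacy table changes.
DELIMITER //
CREATE TRIGGER trg_t_booking_ai AFTER INSERT ON t_booking
FOR EACH ROW
BEGIN
  INSERT INTO t_booking_hall_reservation_mat
    (id, idreserva, idhotel, idhall, idtiporeserva, date, ampm,
     observaciones, observaciones_bookingarea, tipo_de_navegacion, portal)
  VALUES
    (NEW.id, NEW.idreserva, NEW.idhotel, NEW.idhall, NEW.idtiporeserva,
     NEW.date, NEW.ampm, NEW.observaciones, NEW.observaciones_bookingarea,
     NEW.tipo_de_navegacion, NEW.portal);
END//
DELIMITER ;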
t_booking, unless it is a VIEW, needs
INDEX(idhotel, idhall, date)
VIEWs are syntactic sugar; they do not enhance performance; sometimes they are slower than the equivalent SELECT.
I'm trying to do what I think is a set of simple set operations on a database table: several intersections and one union. But I don't seem to be able to express that in a simple way.
I have a MySQL table called Moment, which has many millions of rows. (It happens to be a time-series table but that doesn't impact on my problem here; however, these data have a column 'source' and a column 'time', both indexed.) Queries to pull data out of this table are created dynamically (coming in from an API), and ultimately boil down to a small pile of temporary tables indicating which 'source's we care about, and maybe the 'time' ranges we care about.
Let's say we're looking for
(source in Temp1) AND (
((source in Temp2) AND (time > '2017-01-01')) OR
((source in Temp3) AND (time > '2016-11-15'))
)
Just for excitement, let's say Temp2 is empty --- that part of the API request was valid but happened to include 'no actual sources'.
If I then do
SELECT m.* from Moment as m,Temp1,Temp2,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... I get a heaping mound of nothing, because the empty Temp2 gives an empty Cartesian product before we get to the WHERE clause.
Okay, I can do
SELECT m.* from Moment as m
LEFT JOIN Temp1 on m.source=Temp1.source
LEFT JOIN Temp2 on m.source=Temp2.source
LEFT JOIN Temp3 on m.source=Temp3.source
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... but this takes >70ms even on my relatively small development database.
If I manually eliminate the empty table,
SELECT m.* from Moment as m,Temp1,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... it finishes in 10ms. That's the kind of time I'd expect.
I've also tried putting a single unmatchable row in the empty table and doing SELECT DISTINCT, and it splits the difference at ~40ms. Seems an odd solution though.
This really feels like I'm just conceptualizing the query wrong, that I'm asking the database to do more work than it needs to. What is the Right Way to ask the database this question?
Thanks!
--UPDATE--
I did some actual benchmarks on my actual database, and came up with some really unexpected results.
For the scenario above, all tables indexed on the columns being compared, with an empty table,
doing it with left joins took 3.5 minutes (!!!)
doing it without joins (just 'FROM...WHERE') and adding a null row to the empty table, took 3.5 seconds
even more striking, when there wasn't an empty table, but rather ~1000 rows in each of the temporary tables,
doing the whole thing in one query took 28 minutes (!!!!!), but,
doing each of the three AND clauses separately and then doing the final combination in the code took less than a second.
I still feel I'm expressing the query in some foolish way, since again, all I'm trying to do is one set union (OR) and a few set intersections. It really seems like the DB is making this gigantic Cartesian product when it seriously doesn't need to. All in all, as pointed out in the answer below, keeping some of the intelligence up in the code seems to be the better approach here.
There are various ways to tackle the problem. Needless to say it depends on
how many queries are sent to the database,
the amount of data you are processing in a time interval,
how the database backend is configured to manage it.
For your use case, a little more information would be helpful. Optimizing your query with CASE/COUNT(*) or CASE/LIMIT combinations to sort out empty tables would be one option; however, such conditional queries tend to cost more time.
You could also split the SQL code to reduce the scaling of the problem from 1*N^x to y*N^z, where z should be smaller than x.
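For instance, a hedged sketch of such a split for the query above (names follow the question; nothing here is benchmarked): each intersection runs on its own, UNION provides the set union and removes duplicate rows, and a branch whose temp table is empty can simply be skipped by the calling code.
SELECT m.*
FROM Moment AS m
JOIN Temp1 ON Temp1.source = m.source
JOIN Temp2 ON Temp2.source = m.source
WHERE m.time > '2017-01-01'
UNION
SELECT m.*
FROM Moment AS m
JOIN Temp1 ON Temp1.source = m.source
JOIN Temp3 ON Temp3.source = m.source
WHERE m.time > '2016-11-15';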
You said that an API is involved; maybe you are able to handle the temporary "no data" tables differently, or even avoid storing them at all?
Another option would be to enable query caching:
https://dev.mysql.com/doc/refman/5.5/en/query-cache-configuration.html
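A hedged example of turning it on (the query cache is deprecated in MySQL 5.7 and removed in 8.0; the size below is purely illustrative, and on some versions query_cache_type can only be enabled at server startup):
SET GLOBAL query_cache_size = 67108864;  -- 64 MB cache
SET GLOBAL query_cache_type = 1;         -- cache eligible SELECT statements
SHOW VARIABLES LIKE 'query_cache%';      -- verify the settings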
I have created a complex view which gives me output within a second on Oracle 10g, but the same view takes 2-3 minutes on MySQL. I have created indexes on all the fields included in the view definition and also increased query_cache_size, but I still failed to get an answer in less time. My query is given below:
select * from results where beltno<1000;
And my view is:
create view results as
select person_biodata.*, current_rank.*, current_posting.*
from person_biodata, current_rank, current_posting
where person_biodata.belt_no = current_rank.belt_no
  and person_biodata.belt_no = current_posting.belt_no;
The current_posting view is defined as follows:
select p.belt_no belt_no, ps.ps_name police_station, pl.pl_name posting_as, p.st_date
from p_posting p, post_list pl, police_station ps
where p.ps_id = ps.ps_id
  and p.pl_id = pl.pl_id
  and (p.belt_no, p.st_date) in (select belt_no, max(st_date) from p_posting group by belt_no);
The current_rank view is defined as follows:
select p.belt_no belt_no, r.r_name
from p_rank p, rank r
where p.r_id = r.r_id
  and (p.belt_no, p.st_date) in (select belt_no, max(st_date) from p_rank group by belt_no)
Some versions of MySQL have a particular problem with in and subqueries, which you have in this view:
select p.belt_no belt_no,ps.ps_name police_station,pl.pl_name posting_as,p.st_date
from p_posting p,post_list pl,police_station ps
where p.ps_id=ps.ps_id and p.pl_id=pl.pl_id and
(p.belt_no,p.st_date) IN(select belt_no,max(st_date) from p_posting group by belt_no)
Try changing that to:
where exists (select 1
              from p_posting
              group by belt_no
              having belt_no = p.belt_no and p.st_date = max(st_date)
             )
There may be other issues, of course. At the very least, you could format your queries so they are readable and use ANSI-standard join syntax. Being able to read the queries is the first step to improving their performance. Then you should use EXPLAIN in MySQL to see what the query plans look like.
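Putting those two suggestions together, an untested sketch of the current_posting definition (names taken from the question) might look like this:
create or replace view current_posting as
select p.belt_no  belt_no,
       ps.ps_name police_station,
       pl.pl_name posting_as,
       p.st_date
from p_posting p
join police_station ps on ps.ps_id = p.ps_id
join post_list pl on pl.pl_id = p.pl_id
where exists (select 1
              from p_posting p2
              group by p2.belt_no
              having p2.belt_no = p.belt_no and p.st_date = max(p2.st_date));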
Muhammad Jawad, it's quite simple: you have already created indexes on the table, which allow the database to find data quickly. But if you change an indexed table (e.g. INSERT, UPDATE, DELETE), it takes more time than on a table with no indexes, because every index also has to be updated, and that can take a lot of time. So you should apply indexes only to columns and tables that are used for searching. Hope this helps. Thank you.
My expertise is not in MySQL, so I wrote this query and it is starting to run increasingly slowly, as in 5 minutes or so, with 100k rows in EquipmentData and 30k or so in EquipmentDataStaging (which to me is very little data):
CREATE TEMPORARY TABLE dataCompareTemp
SELECT eds.eds_id FROM equipmentdatastaging eds
INNER JOIN equipment e ON e.e_id_string = eds.eds_e_id_string
INNER JOIN equipmentdata ed ON e.e_id = ed.ed_e_id
AND eds.eds_ed_log_time=ed.ed_log_time
AND eds.eds_ed_unit_type=ed.ed_unit_type
AND eds.eds_ed_value = ed.ed_value
I am using this query to compare data rows pulled from a client's device against the current data sitting in their database. From there I take the temp table and use the IDs on it to make conditional decisions. I have e_id_string indexed and e_id indexed, and everything else is not. I know it looks odd that I have to compare all this information, but the client's system is spitting out redundant data and I am using this query to find it. Any help on this would be greatly appreciated, whether it be a different approach in SQL or in MySQL management. I feel like when I do stuff like this in MSSQL it handles the requests much better, but that is probably because I have something set up incorrectly here.
TIPS
index all necessary columns that are used in ON or WHERE conditions
here you need to index eds_ed_log_time, eds_e_id_string, eds_ed_unit_type, eds_ed_value, ed_e_id, ed_log_time, ed_unit_type and ed_value
change the syntax to SELECT STRAIGHT_JOIN ... (a sketch follows below)
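A hedged sketch of both tips applied to the query from the question (index names are invented, and the composite indexes assume these columns are of indexable types):
ALTER TABLE equipmentdatastaging
  ADD INDEX idx_eds_compare (eds_e_id_string, eds_ed_log_time, eds_ed_unit_type, eds_ed_value);
ALTER TABLE equipmentdata
  ADD INDEX idx_ed_compare (ed_e_id, ed_log_time, ed_unit_type, ed_value);
-- STRAIGHT_JOIN forces the tables to be joined in the order listed.
CREATE TEMPORARY TABLE dataCompareTemp
SELECT STRAIGHT_JOIN eds.eds_id
FROM equipmentdatastaging eds
INNER JOIN equipment e ON e.e_id_string = eds.eds_e_id_string
INNER JOIN equipmentdata ed ON ed.ed_e_id = e.e_id
  AND ed.ed_log_time = eds.eds_ed_log_time
  AND ed.ed_unit_type = eds.eds_ed_unit_type
  AND ed.ed_value = eds.eds_ed_value;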
It's common to have a table where, for example, the fields are account, value, and time. What's the best design pattern for retrieving the last value for each account? Unfortunately the Last keyword in a grouping gives you the last physical record in the database, not the last record by any sorting, which means IMHO it should never be used. The two clumsy approaches I use are either a subquery approach, or a secondary query to determine the last record and then a join back to the table to find the value. Isn't there a more elegant approach?
could you not do:
select account,last(value),max(time)
from table
group by account
I tested this (granted for a very small, almost trivial record set) and it produced proper results.
Edit:
that also doesn't work after some more testing. I did a fair bit of Access programming in a past life and feel like there is a way to do what you're asking in one query, but I'm drawing a blank at the moment. Sorry.
After literally years of searching, I finally found the answer at the link below (#3). The subqueries above will work, but are very slow -- debilitatingly slow for my purposes.
The more popular answer is a three-level query: the 1st level finds the max, the 2nd level gets the field values based on the 1st query, and the result is then joined in as a table to the main query. Fast, but complicated and time-consuming to code and maintain.
This link works, still runs pretty fast, and is a lot less work to code and maintain. Thanks to the authors of this site.
http://access.mvps.org/access/queries/qry0020.htm
The subquery option sounds best to me, something like the following pseudo-SQL. It may be possible/necessary to optimize it via a join; that will depend on the capabilities of the SQL engine.
select *
from table
where account+time in (select account+max(time)
from table
group by account
order by time)
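In MySQL, the concatenation trick isn't needed: a row constructor expresses the same idea directly (a hedged sketch, reusing the thread's generic table name; older MySQL versions may still optimize IN subqueries poorly, in which case the join form shown at the end of this thread is usually faster):
SELECT *
FROM tablename
WHERE (account, time) IN (SELECT account, MAX(time)
                          FROM tablename
                          GROUP BY account);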
This is a good trick for returning the last record in a table:
SELECT TOP 1 * FROM TableName ORDER BY Time DESC
#Tom
It might be easier for me in general to do the "In" query that you've suggested. Generally I do something like
select T1.account, T1.value
from table as T1
where T1.time = (select max(T2.time) from table as T2 where T1.account = T2.account)
#shs
yes, that select last(value) SHOULD work, but it doesn't... My understanding, although I can't produce an authoritative source, is that last(value) gives the last physical record in the Access file, which means it could be the first one timewise but the last one physically. So I don't think you should use last(value) for anything other than a really bad random row.
I'm trying to find the latest date in a group using the Access 2003 query builder, and ran into the same problem trying to use LAST for a date field. But it looks like using MAX finds the latest date.
Perhaps the following SQL is clumsy, but it seems to work correctly in Access.
SELECT
a.account,
a.time,
a.value
FROM
tablename AS a INNER JOIN [
SELECT
account,
Max(time) AS MaxOftime
FROM
tablename
GROUP BY
account
]. AS b
ON
(a.time = b.MaxOftime)
AND (a.account = b.account)
;
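For what it's worth, roughly the same max-per-account join in more portable SQL (a hedged sketch; outside Access the square-bracket subquery syntax isn't needed):
SELECT a.account, a.time, a.value
FROM tablename AS a
INNER JOIN (SELECT account, MAX(time) AS MaxOftime
            FROM tablename
            GROUP BY account) AS b
  ON a.account = b.account
 AND a.time = b.MaxOftime;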