Query Run time with Inner Join - mysql

I tried running a query with an inner join in Sequel Pro to get the most recent records/invoices using this:
SELECT tt.Hotel_Property, tt.Preferred_Hotel_Status
FROM hotel_detail tt
INNER JOIN
(SELECT Hotel_Property, MAX(STR_TO_DATE (`Invoice_Date`, '%m/%d/%Y')) AS MaxDateTime
FROM hotel_detail
GROUP BY Hotel_Property) groupedtt
ON tt.Hotel_Property = groupedtt.Hotel_Property
AND tt.Invoice_Date = groupedtt.MaxDateTime
But it's running the query for a long time and I'm not sure if it'll actually execute (cancelled it after waiting 14 mins). I know it's a lot of data to work through but wondered if anyone had suggestions to make it run faster?
*Ideally I want one record for each hotel property giving the most recent invoice date and the status associated with that max invoice
Thanks!

As far as I know, Sequel Pro is a client for MySQL, so you might not be able to use analytic (window) functions, which MySQL only supports from version 8.0. If they are available, try the following:
SELECT
Hotel_Property
, Preferred_Hotel_Status
, MAX(STR_TO_DATE (`Invoice_Date`, '%m/%d/%Y')) OVER(PARTITION BY hotel_property) AS MaxDateTime
FROM hotel_detail
If this doesn't work, then I'd suggest running the query in 'chunks' based on the date, for instance one day at a time, by adding a WHERE clause. For example:
SELECT tt.Hotel_Property, tt.Preferred_Hotel_Status
FROM hotel_detail tt
INNER JOIN
(SELECT Hotel_Property, MAX(STR_TO_DATE (`Invoice_Date`, '%m/%d/%Y')) AS MaxDateTime
FROM hotel_detail
WHERE DATE = "the_date_you_want_to_run"
GROUP BY Hotel_Property) groupedtt
ON tt.Hotel_Property = groupedtt.Hotel_Property
AND tt.Invoice_Date = groupedtt.MaxDateTime
WHERE DATE = "the_date_you_want_to_run"
Then you can either look at the results for different days separately, or simply INSERT them into a new table where you can perform more analysis.
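On the window-function idea at the top of this answer: MAX(...) OVER (PARTITION BY ...) on its own still returns every invoice row, just with the maximum repeated next to each one. To get one row per property you can rank the rows and keep only the first. Below is a minimal sketch of that. It runs against SQLite (3.25 or later) through Python's sqlite3 module purely so that it is self-contained; that stand-in, the sample rows, and the ISO-formatted dates are all assumptions, not the asker's data.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hotel_detail ("
    "Hotel_Property TEXT, Preferred_Hotel_Status TEXT, Invoice_Date TEXT)"
)
# Invented sample rows; dates are ISO strings so they sort chronologically.
conn.executemany(
    "INSERT INTO hotel_detail VALUES (?, ?, ?)",
    [
        ("Alpha Inn", "N", "2016-01-15"),
        ("Alpha Inn", "Y", "2016-03-02"),
        ("Beta Lodge", "Y", "2015-12-30"),
        ("Beta Lodge", "N", "2016-02-11"),
    ],
)

# Rank invoices per property, newest first, then keep only rank 1.
latest_sql = """
SELECT Hotel_Property, Preferred_Hotel_Status, Invoice_Date
FROM (
    SELECT hd.*,
           ROW_NUMBER() OVER (
               PARTITION BY Hotel_Property
               ORDER BY Invoice_Date DESC
           ) AS rn
    FROM hotel_detail hd
) ranked
WHERE rn = 1
ORDER BY Hotel_Property
"""
latest_rows = conn.execute(latest_sql).fetchall()
print(latest_rows)
```

If two invoices for the same property share the latest date, ROW_NUMBER keeps just one of them, chosen arbitrarily unless you add a tie-breaker to the ORDER BY.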

Try using a correlated subquery:
select hd.*
from hotel_detail hd
where str_to_date(hd.invoice_date, '%m/%d/%Y') =
(select max(str_to_date(hd2.invoice_date, '%m/%d/%Y'))
from hotel_detail hd2
where hd2.hotel_property = hd.hotel_property
);
This can take advantage of an index on hotel_detail(hotel_property, invoice_date). The index would be more effective if you stored the date properly using the native SQL format of date or datetime.
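To illustrate the point about storing the date properly, here is a rough sketch. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption made so the example runs on its own), converts invented m/d/Y strings to ISO dates once at load time, and then runs the same correlated subquery against an index on (hotel_property, invoice_date). In MySQL the converted column would be a DATE rather than ISO text.

```python
import sqlite3
from datetime import datetime

# Invented sample data in the original m/d/Y text format.
raw_rows = [
    ("Alpha Inn", "N", "01/15/2016"),
    ("Alpha Inn", "Y", "12/02/2015"),
    ("Beta Lodge", "Y", "02/11/2016"),
    ("Beta Lodge", "N", "11/30/2015"),
]

# Comparing the raw strings picks the wrong "latest" date for Alpha Inn.
naive_max = max(d for p, _, d in raw_rows if p == "Alpha Inn")

def to_iso(us_date):
    """Convert 'mm/dd/YYYY' text to an ISO 'YYYY-MM-DD' string."""
    return datetime.strptime(us_date, "%m/%d/%Y").date().isoformat()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hotel_detail ("
    "hotel_property TEXT, preferred_hotel_status TEXT, invoice_date TEXT)"
)
# Convert once on insert, so queries never need a per-row conversion.
conn.executemany(
    "INSERT INTO hotel_detail VALUES (?, ?, ?)",
    [(p, s, to_iso(d)) for p, s, d in raw_rows],
)
conn.execute(
    "CREATE INDEX idx_property_date "
    "ON hotel_detail (hotel_property, invoice_date)"
)

latest = conn.execute(
    """
    SELECT hd.hotel_property, hd.preferred_hotel_status, hd.invoice_date
    FROM hotel_detail hd
    WHERE hd.invoice_date = (
        SELECT MAX(hd2.invoice_date)
        FROM hotel_detail hd2
        WHERE hd2.hotel_property = hd.hotel_property
    )
    ORDER BY hd.hotel_property
    """
).fetchall()
print(naive_max, latest)
```

With the conversion done up front, the comparison inside the subquery is on the indexed column itself, not on the result of a function call.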

Related

MySQL Long Response Time

I have a valid MySQL query that selects the latest occupancy percentage for each community entered in my DB, but it seems to be scanning every entry in the table, as the lookup takes roughly 3-4 seconds.
With the details provided in the query below, can someone suggest a faster or better way to look up the latest timestamp for each community? I need the query to return every community in the table, but only one row per community: the one with the latest timestamp. (For example, the community named "Test Community" may have hundreds of submissions, but I only need its most recent one.)
SELECT t1.reportID, t1.communityID, t1.region, t1.percentOccupied,
t1.TIMESTAMP, Communities.fullName
FROM NightlyReports t1
INNER JOIN Communities On t1.communityID = Communities.communityID
WHERE t1.TIMESTAMP = ( SELECT MAX( TIMESTAMP ) FROM NightlyReports WHERE
t1.communityID = NightlyReports.communityID )
AND t1.region = 'GA' ORDER BY percentOccupied DESC
In my experience, correlated subqueries often have rather poor performance; try this instead:
SELECT t1.reportID, t1.communityID, t1.region, t1.percentOccupied
, t1.TIMESTAMP, Communities.fullName
FROM NightlyReports AS t1
INNER JOIN Communities ON t1.communityID = Communities.communityID
INNER JOIN (
SELECT communityID, MAX( TIMESTAMP ) AS lastTimestamp
FROM NightlyReports
WHERE region = 'GA'
GROUP BY communityID
) AS lastReports ON t1.communityID = lastReports.communityID
AND t1.TIMESTAMP = lastReports.lastTimestamp
WHERE t1.region = 'GA'
ORDER BY percentOccupied DESC
Your query is fine. For this query (which is rewritten just a bit):
SELECT nr.reportID, nr.communityID, nr.region, nr.percentOccupied,
nr.TIMESTAMP, c.fullName
FROM NightlyReports nr INNER JOIN
Communities c
ON nr.communityID = c.communityID
WHERE nr.TIMESTAMP = (SELECT MAX(nr2.TIMESTAMP)
FROM NightlyReports nr2
WHERE nr.communityID = nr2.communityID
) AND
nr.region = 'GA'
ORDER BY percentOccupied DESC;
You want indexes on:
NightlyReports(region, timestamp, communityid)
NightlyReports(communityid, timestamp)
Communities(communityID) (this may already exist)
The correlated subquery is not per se a problem.
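To make the two approaches in this thread concrete, here is a small sketch that creates the two NightlyReports indexes listed above and runs both the correlated-subquery form and the join-to-grouped-subquery form, checking that they return the same rows. It uses SQLite via Python's sqlite3 module as a stand-in for MySQL, and the sample rows are invented; it shows equivalence of results only, not relative speed on a real server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript(
    """
    CREATE TABLE Communities (communityID INTEGER PRIMARY KEY, fullName TEXT);
    CREATE TABLE NightlyReports (
        reportID INTEGER PRIMARY KEY, communityID INTEGER, region TEXT,
        percentOccupied REAL, TIMESTAMP TEXT);
    -- The two NightlyReports indexes suggested in the answer above.
    CREATE INDEX idx_region_ts_comm
        ON NightlyReports (region, TIMESTAMP, communityID);
    CREATE INDEX idx_comm_ts ON NightlyReports (communityID, TIMESTAMP);
    -- Invented sample data.
    INSERT INTO Communities VALUES (1, 'Test Community'), (2, 'Oak Park');
    INSERT INTO NightlyReports VALUES
        (1, 1, 'GA', 80.0, '2014-05-01 22:00:00'),
        (2, 1, 'GA', 85.5, '2014-05-02 22:00:00'),
        (3, 2, 'GA', 91.0, '2014-05-01 22:00:00'),
        (4, 2, 'GA', 70.0, '2014-05-02 22:00:00');
    """
)

# Form 1: correlated subquery picks the newest report per community.
correlated_sql = """
SELECT nr.reportID, c.fullName, nr.percentOccupied
FROM NightlyReports nr
INNER JOIN Communities c ON nr.communityID = c.communityID
WHERE nr.TIMESTAMP = (SELECT MAX(nr2.TIMESTAMP)
                      FROM NightlyReports nr2
                      WHERE nr2.communityID = nr.communityID)
  AND nr.region = 'GA'
ORDER BY nr.percentOccupied DESC
"""

# Form 2: join against a grouped subquery of per-community maxima.
joined_sql = """
SELECT t1.reportID, c.fullName, t1.percentOccupied
FROM NightlyReports t1
INNER JOIN Communities c ON t1.communityID = c.communityID
INNER JOIN (SELECT communityID, MAX(TIMESTAMP) AS lastTimestamp
            FROM NightlyReports
            WHERE region = 'GA'
            GROUP BY communityID) lastReports
        ON t1.communityID = lastReports.communityID
       AND t1.TIMESTAMP = lastReports.lastTimestamp
WHERE t1.region = 'GA'
ORDER BY t1.percentOccupied DESC
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
joined_rows = conn.execute(joined_sql).fetchall()
print(correlated_rows == joined_rows, joined_rows)
```

Which form is faster depends on the server version and the data, so it is worth comparing the EXPLAIN output for both on the real table.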

MySQL Syntax Issue combining two working queries

I'm just starting to learn SQL, and managed to cobble together a couple of working queries, but then when I combine them I am getting a syntax error. The query throwing the error:
SELECT sca_ticket_status.name As Status, AVG(QueueTime)
FROM (SELECT DateDiff (created, now()) as 'QueueTime'
FROM sca_ticket as SubQuery
LEFT JOIN sca_ticket_status
ON sca_ticket.status_id = sca_ticket_status.id
GROUP BY name
ORDER BY sort
For reference, the two working queries that I am attempting to leverage are as follows:
SELECT sca_ticket_status.name As Status, COUNT(sca_ticket.ticket_id) AS Count
FROM sca_ticket
LEFT JOIN sca_ticket_status
ON sca_ticket.status_id = sca_ticket_status.id
WHERE sca_ticket.created between date_sub(now(),INTERVAL 1 WEEK) and now()
GROUP BY name
ORDER BY sort
SELECT AVG(QueueTime)
FROM (SELECT DateDiff (created, now()) as 'QueueTime'
FROM `sca_ticket`
WHERE `status_id` = 1) as SubQuery
Try closing your second SELECT statement (the subquery is missing its closing parenthesis):
SELECT sca_ticket_status.name As Status, AVG(QueueTime)
FROM (SELECT status_id, DateDiff (created, now()) as 'QueueTime'
FROM sca_ticket) q1
LEFT JOIN sca_ticket_status
ON q1.status_id = sca_ticket_status.id
GROUP BY name
ORDER BY sort
You will also need to expose the status_id column in your inner select list if you want to join on it later.
You do not need a subquery at all. It just slows down the processing in MySQL: the optimizer is not very smart about derived tables and, in older versions at least, always materializes them, which loses index information.
SELECT ts.name As Status, AVG(DateDiff(t.created, now()))
FROM sca_ticket t LEFT JOIN
sca_ticket_status ts
ON t.status_id = ts.id
GROUP BY ts.name
ORDER BY sort
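Here is a minimal sketch of the two shapes discussed above: the derived table that exposes status_id, and the plain join without a subquery. It uses SQLite via Python's sqlite3 module as a stand-in for MySQL, with invented rows and a fixed "now" (the AS_OF value) so the output is repeatable. Note that MySQL's DATEDIFF(a, b) returns a minus b in days, so DATEDIFF(created, NOW()) comes out negative for past tickets; this sketch computes "now minus created" instead, and julianday() is SQLite's substitute for DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript(
    """
    CREATE TABLE sca_ticket_status (id INTEGER PRIMARY KEY, name TEXT, sort INTEGER);
    CREATE TABLE sca_ticket (ticket_id INTEGER PRIMARY KEY, status_id INTEGER, created TEXT);
    -- Invented sample data.
    INSERT INTO sca_ticket_status VALUES (1, 'Open', 1), (2, 'Closed', 2);
    INSERT INTO sca_ticket VALUES
        (1, 1, '2015-06-01'), (2, 1, '2015-06-05'), (3, 2, '2015-06-09');
    """
)

AS_OF = "2015-06-11"  # fixed stand-in for NOW(), hypothetical

# Shape 1: derived table that exposes status_id so the outer join can use it.
derived_sql = """
SELECT ts.name AS Status, AVG(q1.QueueTime) AS AvgQueueDays
FROM (SELECT status_id,
             CAST(julianday(?) - julianday(created) AS INTEGER) AS QueueTime
      FROM sca_ticket) q1
LEFT JOIN sca_ticket_status ts ON q1.status_id = ts.id
GROUP BY ts.name
ORDER BY ts.sort
"""

# Shape 2: no subquery; aggregate directly over the joined rows.
direct_sql = """
SELECT ts.name AS Status,
       AVG(CAST(julianday(?) - julianday(t.created) AS INTEGER)) AS AvgQueueDays
FROM sca_ticket t
LEFT JOIN sca_ticket_status ts ON t.status_id = ts.id
GROUP BY ts.name
ORDER BY ts.sort
"""

derived_rows = conn.execute(derived_sql, (AS_OF,)).fetchall()
direct_rows = conn.execute(direct_sql, (AS_OF,)).fetchall()
print(derived_rows, direct_rows)
```

Both shapes give the same averages here; the direct join is simply less to write and leaves the optimizer free to use indexes on sca_ticket.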

Alternative to mysql WHERE IN SELECT GROUP BY when wanting max value in group by

I have the following query, which was developed from a hint found online because of a problem with a GROUP BY returning the maximum value; but it's running really slowly.
Having looked online I'm seeing that WHERE IN (SELECT.... GROUP BY) is probably the issue, but, to be honest, I'm struggling to find a way around this:
SELECT *
FROM tbl_berths a
JOIN tbl_active_trains b on a.train_uid=b.train_uid
WHERE (a.train_id, a.TimeStamp) in (
SELECT a.train_id, max(a.TimeStamp)
FROM a
GROUP BY a.train_id
)
I'm thinking I possibly need a derived table, but my experience in this area is zero and it's just not working out!
You can move that into a derived table (a subquery in the FROM clause) that you join to, and also select only the required columns instead of all of them (*):
SELECT a.train_uid
FROM tbl_berths a
JOIN tbl_active_trains b on a.train_uid=b.train_uid
JOIN (SELECT train_id, max(TimeStamp) as TimeStamp
FROM tbl_berths
GROUP BY train_id )T
on a.train_id = T.train_id
and a.TimeStamp = T.TimeStamp
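As a quick check that the rewrite returns the same rows as the original WHERE (...) IN (SELECT ... GROUP BY) form, here is a sketch using SQLite via Python's sqlite3 module as a stand-in for MySQL. The berth column and all sample rows are hypothetical, and the inner query reads from tbl_berths directly, since the outer alias a cannot be used as a table name inside the subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript(
    """
    CREATE TABLE tbl_active_trains (train_uid TEXT PRIMARY KEY);
    CREATE TABLE tbl_berths (
        train_uid TEXT, train_id TEXT, TimeStamp TEXT, berth TEXT);
    -- Invented sample data; train U3 is not in tbl_active_trains.
    INSERT INTO tbl_active_trains VALUES ('U1'), ('U2');
    INSERT INTO tbl_berths VALUES
        ('U1', '1A01', '2013-01-01 10:00:00', 'B1'),
        ('U1', '1A01', '2013-01-01 10:05:00', 'B2'),
        ('U2', '2C07', '2013-01-01 09:50:00', 'B7'),
        ('U2', '2C07', '2013-01-01 10:02:00', 'B8'),
        ('U3', '9Z99', '2013-01-01 10:10:00', 'B9');
    """
)

# Original shape: row-value IN against a grouped subquery.
in_sql = """
SELECT a.train_id, a.berth, a.TimeStamp
FROM tbl_berths a
JOIN tbl_active_trains b ON a.train_uid = b.train_uid
WHERE (a.train_id, a.TimeStamp) IN (
    SELECT train_id, MAX(TimeStamp) FROM tbl_berths GROUP BY train_id)
ORDER BY a.train_id
"""

# Rewritten shape: join to the grouped subquery as a derived table.
join_sql = """
SELECT a.train_id, a.berth, a.TimeStamp
FROM tbl_berths a
JOIN tbl_active_trains b ON a.train_uid = b.train_uid
JOIN (SELECT train_id, MAX(TimeStamp) AS TimeStamp
      FROM tbl_berths
      GROUP BY train_id) T
  ON a.train_id = T.train_id AND a.TimeStamp = T.TimeStamp
ORDER BY a.train_id
"""

in_rows = conn.execute(in_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(in_rows == join_rows, join_rows)
```

An index on tbl_berths (train_id, TimeStamp) would likely help both the grouping and the join back to the table, though that is worth confirming with EXPLAIN on the real data.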

Indexing on MySQL

I have a bulk query with a subquery. It works fine when I run it on the development server, but when I tried it on the live server, it took too long to produce any output. I think that's because the live server holds much more data. Can anyone help me index the tables in MySQL so that the execution time goes down?
Here is my query:
SELECT prd.fldemployeeno AS Empno,
(SELECT fldemployeename FROM tblprofile prf WHERE prf.fldemployeeno = prd.fldemployeeno LIMIT 0,1) AS Empname,
'01' AS `Week`,
COUNT(DISTINCT isAud.fldid) AuditedFiles,
COUNT(qua.seqid) ErrorCount,
COUNT(DISTINCT qua.fldid) OrdersWithError
FROM tbldownloadITL dwn
INNER JOIN tblproductionITL prd
ON dwn.fldid = prd.fldglobalid
INNER JOIN (SELECT p.fldemployeeno,fldglobalid,p.fldstarttime,COALESCE(q.fldstarttime,p.fldstarttime) `AuditDate`
FROM tblproductionitl p
LEFT JOIN tblqualityaudit q
ON p.fldemployeeno=q.fldemployeeno
AND p.fldstarttime=q.fldprodstarttime
AND p.fldglobalid=q.fldid
WHERE p.fldprojectgroup='PROJGROUP') temp
ON prd.fldglobalid=temp.fldglobalid
AND prd.fldemployeeno=temp.fldemployeeno
AND prd.fldstarttime=temp.fldstarttime
INNER JOIN tblisauditedITL isAud
USING (fldid)
LEFT JOIN tblqualityaudit qua
ON qua.fldid = dwn.fldid
AND qua.fldbusunit = dwn.fldbusunit
AND qua.fldprojectGroup = dwn.fldprojectGroup
AND qua.fldemployeeno = prd.fldemployeeno
AND qua.fldprodstarttime = prd.fldstarttime
AND qua.flderrorstatus != 'NOT ERROR'
LEFT JOIN tblerrorcategory
USING (flderrorcategoryid)
LEFT JOIN tblerrortypes
USING (flderrortypeid)
WHERE dwn.fldbusunit = 'BUSUNIT'
AND dwn.fldprojectGroup = 'PROJGROUP'
AND temp.AuditDate BETWEEN '2011-07-29 00:00:00' AND '2011-07-29 23:59:59'
GROUP BY prd.fldemployeeno
ORDER BY Empname
Here is also the description of the query:
I would suggest installing Sphinx on your server if you have access. That way you have an indexed resource at your fingertips for extremely fast searching; on top of that, you can run what is called a 'delta' index to allow real-time updating from your MySQL database. It is highly customizable. Hopefully this will help you out.
http://sphinxsearch.com/

Slow subquery in MySQL

I'm trying to generate a report using CodeIgniter and Datatables.net.
Now I'm trying to get the number of closed jobs (it's a human resources system). I used to query all jobs, loop over them with a foreach in PHP, and do the calculations there.
Because I want to use all the features of Datatables (sorting specifically), I'm trying to do all the calculations in MySQL.
The problem is that the second subquery is very, very slow.
SELECT
jobs.jobs_id, clients.nome_fantasia, concat_ws(' ', user_profiles.first_name, user_profiles.last_name) as fullname,
jobs.titulo_vaga, jobs.qtd_vagas, company.name as nome_company, jobs_status.name as status_name, DATEDIFF(NOW(), jobs.data_abertura) as date_idade,
(select count(job_cv.jobs_id) from job_cv where job_cv.jobs_id = jobs.jobs_id) as qtd_int,
(select count(distinct job_cv.user_id) from job_cv_history join job_cv on job_cv.job_cv_id = job_cv_history.job_cv_id where job_cv_history.status = '11' and job_cv.jobs_id = jobs.jobs_id ) as fechadas
FROM (jobs)
JOIN clients ON clients.clients_id=jobs.clients_id
JOIN user_profiles ON jobs.consultor_id=user_profiles.user_id
JOIN jobs_status ON jobs.status=jobs_status.jobs_status_id
JOIN company ON jobs.company_id=company.company_id
LIMIT 50
Can someone help me? I can provide more information if it's needed.
UPDATE
The idea of using a JOIN instead of a SELECT works for the first subquery but not for the second one. Is there a way to pass a 'variable' into the subquery, like the current jobs_id?
UPDATE AGAIN
This query works fine by itself, but inside the subquery it takes about a minute and returns wrong values:
SELECT job_cv.jobs_id,count(distinct job_cv.user_id) AS fechadas
FROM job_cv_history
JOIN job_cv
ON job_cv.job_cv_id = job_cv_history.job_cv_id
WHERE job_cv_history.status = '11'
GROUP BY job_cv.jobs_id
It is not the subquery itself that is slow. It's the fact that you're executing these subqueries for each row returned from the outer query. Move them into joins instead, and you should see an increase in performance.
SELECT
jobs.jobs_id, clients.nome_fantasia, concat_ws(' ', user_profiles.first_name, user_profiles.last_name) as fullname,
jobs.titulo_vaga, jobs.qtd_vagas, company.name as nome_company, jobs_status.name as status_name, DATEDIFF(NOW(), jobs.data_abertura) as date_idade,
qtd.qtd_int,
fechadas.fechadas
FROM (jobs)
JOIN clients ON clients.clients_id=jobs.clients_id
JOIN user_profiles ON jobs.consultor_id=user_profiles.user_id
JOIN jobs_status ON jobs.status=jobs_status.jobs_status_id
JOIN company ON jobs.company_id=company.company_id
JOIN (
SELECT jobs_id, count(jobs_id) AS qtd_int FROM job_cv GROUP BY jobs_id
) AS qtd ON qtd.jobs_id = jobs.jobs_id
JOIN (
SELECT job_cv.jobs_id, count(distinct job_cv.user_id) AS fechadas
FROM job_cv_history
JOIN job_cv
ON job_cv.job_cv_id = job_cv_history.job_cv_id
WHERE job_cv_history.status = '11'
GROUP BY job_cv.jobs_id
) AS fechadas ON fechadas.jobs_id = jobs.jobs_id
LIMIT 50
You may try to create these indexes:
ALTER TABLE `job_cv` ADD INDEX `job_cv_cindex` (`job_cv_id` ASC, `jobs_id` ASC, `user_id` ASC);
ALTER TABLE `job_cv_history` ADD INDEX `job_cv_history_cindex` (`job_cv_id` ASC, `status` ASC);
Use joins instead of subqueries; this often improves performance significantly in MySQL.
Try a LEFT JOIN in your case and see whether performance improves.
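Putting the suggestions in this thread together, here is a rough sketch that pre-aggregates both counts per jobs_id in derived tables and LEFT JOINs them, so jobs with no CVs still show up with a count of 0. It is trimmed to the three tables that matter for the counts, uses invented rows, and runs on SQLite via Python's sqlite3 module as a stand-in for MySQL; all of that is assumption. The two indexes are the ones proposed above, written in SQLite syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript(
    """
    CREATE TABLE jobs (jobs_id INTEGER PRIMARY KEY);
    CREATE TABLE job_cv (job_cv_id INTEGER PRIMARY KEY, jobs_id INTEGER, user_id INTEGER);
    CREATE TABLE job_cv_history (job_cv_id INTEGER, status TEXT);
    CREATE INDEX job_cv_cindex ON job_cv (job_cv_id, jobs_id, user_id);
    CREATE INDEX job_cv_history_cindex ON job_cv_history (job_cv_id, status);
    -- Invented sample data; job 3 has no CVs at all.
    INSERT INTO jobs VALUES (1), (2), (3);
    INSERT INTO job_cv VALUES (10, 1, 100), (11, 1, 101), (12, 2, 100);
    INSERT INTO job_cv_history VALUES (10, '11'), (10, '11'), (11, '5'), (12, '11');
    """
)

# Original shape: two correlated subqueries evaluated per job.
correlated_sql = """
SELECT jobs.jobs_id,
       (SELECT COUNT(job_cv.jobs_id) FROM job_cv
         WHERE job_cv.jobs_id = jobs.jobs_id) AS qtd_int,
       (SELECT COUNT(DISTINCT job_cv.user_id)
          FROM job_cv_history
          JOIN job_cv ON job_cv.job_cv_id = job_cv_history.job_cv_id
         WHERE job_cv_history.status = '11'
           AND job_cv.jobs_id = jobs.jobs_id) AS fechadas
FROM jobs
ORDER BY jobs.jobs_id
"""

# Join shape: aggregate once per jobs_id, then LEFT JOIN the results.
joined_sql = """
SELECT jobs.jobs_id,
       COALESCE(qtd.qtd_int, 0) AS qtd_int,
       COALESCE(f.fechadas, 0) AS fechadas
FROM jobs
LEFT JOIN (SELECT jobs_id, COUNT(jobs_id) AS qtd_int
           FROM job_cv GROUP BY jobs_id) AS qtd
       ON qtd.jobs_id = jobs.jobs_id
LEFT JOIN (SELECT job_cv.jobs_id, COUNT(DISTINCT job_cv.user_id) AS fechadas
           FROM job_cv_history
           JOIN job_cv ON job_cv.job_cv_id = job_cv_history.job_cv_id
           WHERE job_cv_history.status = '11'
           GROUP BY job_cv.jobs_id) AS f
       ON f.jobs_id = jobs.jobs_id
ORDER BY jobs.jobs_id
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
joined_rows = conn.execute(joined_sql).fetchall()
print(correlated_rows == joined_rows, joined_rows)
```

The key detail is that each derived table groups by jobs_id and is joined on its own jobs_id column. With a plain (inner) JOIN in place of LEFT JOIN, job 3 would disappear from the report.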