Select with Inner Join operator and IN (long results) - Optimize Query - mysql

I have a MySQL query that looks for results in two different tables.
Tables
Contract
id, contract, creditor_id, client_id, event_id
Invoice
id, contract_id, invoice, due, value
The idea is to select the contracts using some parameters in the query, such as:
initial and final delay, initial and final value, events, creditor.
For this, I use INNER JOIN, HAVING and IN.
Details:
After receiving the result, I take the values and loop over them to run an UPDATE for each row of the result, using the row's ID.
I built an example in SQL Fiddle for better visualization.
The problem is that when this query returns very long results, thousands of rows, it is really slow.
So I wanted to know if there is a better, more optimal way to write the same query.
Query:
SELECT `c`.`id`,
       `c`.`contract`,
       `c`.`creditor_id`,
       `c`.`client_id`,
       `c`.`event_id`,
       `t`.`total_value`,
       `delay`
FROM `contract` `c`
INNER JOIN
    (SELECT contract_id,
            Sum(value) total_value,
            Datediff(Curdate(), due) AS delay
     FROM invoice t
     GROUP BY contract_id
     HAVING delay <= 99999
        AND delay >= 1
        AND total_value >= 1
        AND total_value < 99999) t ON `t`.`contract_id` = `c`.`id`
WHERE `c`.`creditor_id` = 1
  AND `c`.`event_id` IN (4, 7, 5, 8, 13, 3, 6, 15, 2, 24, 1, 21, 20, 14, 17, 18, 16, 23, 25, 22, 9, 10, 26, 12, 19, 11)

If "1..99999" means "any value", then remove the test from the query. That is construct a different query when the user wants an open-ended test.
Deal with the lack of due in the GROUP BY.
Change Datediff(Curdate(), due) > 123 to due < CURDATE() - INTERVAL 123 DAY. That will give us a chance to use due in an INDEX.
Qualify due and value; we can't tell which table they are in.
Please provide SHOW CREATE TABLE.
c could use INDEX(creditor_id, event_id), but after the above issues are addressed, there may be an even better index.
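Here is a hedged sketch of such a rewrite, assuming due and value both live in invoice and that the 1..99999 bounds were open-ended placeholders (the delay test stays in HAVING because it is computed from an aggregate):

SELECT c.id, c.contract, c.creditor_id, c.client_id, c.event_id,
       t.total_value, t.delay
FROM contract c
INNER JOIN
    (SELECT contract_id,
            SUM(value) AS total_value,
            -- MAX(due) is one explicit choice; use whichever due date your business rule requires
            DATEDIFF(CURDATE(), MAX(due)) AS delay
     FROM invoice
     GROUP BY contract_id
     -- append HAVING total_value >= ? AND delay >= ? only when the user supplies real bounds
    ) t ON t.contract_id = c.id
WHERE c.creditor_id = 1
  AND c.event_id IN (4, 7, 5, 8, 13, 3, 6, 15, 2, 24, 1, 21, 20, 14, 17, 18, 16, 23, 25, 22, 9, 10, 26, 12, 19, 11);

-- and the suggested index (the name is illustrative):
ALTER TABLE contract ADD INDEX idx_creditor_event (creditor_id, event_id);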

MySQL Procedure / MySQL Function

I am still relatively new to MySQL and am stuck on a bit of data engineering.
I have a table with the following columns:
Event_ID, Minutes, EventCode
I have multiple rows with the same Event_ID, recording which event occurred (EventCode) and when, in minutes (Minutes).
What I want to do is output to a new table the sequence of events, ordered by minutes, for each event_id:
Eg:
Source:
Event_ID, Minutes, EventCode
12, 45, A
12, 49, B
12, 78, A
Would be transformed into:
12, 45, A, 1
12, 49, B, 2
12, 78, A, 3
So the last column shows the sequence. Although it can be assumed the source table is sorted by event_id followed by minutes, I would prefer a solution that also works if it is unsorted, if possible.
Some pointers would be great!
Thanks
In MySQL 8 and higher you can use the row_number() window function.
SELECT event_id,
minutes,
eventcode,
row_number() OVER (PARTITION BY event_id
ORDER BY minutes)
FROM elbat;
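If the sequence should land in a new table, as the question asks, one option is CREATE TABLE ... AS SELECT; the target table name event_sequence here is hypothetical:

-- materialize the numbered rows into a new table
CREATE TABLE event_sequence AS
SELECT event_id,
       minutes,
       eventcode,
       ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY minutes) AS seq
FROM elbat;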
Try this query:
select event_id, minutes, eventcode, @rownum := @rownum + 1 AS No
from elbat, (SELECT @rownum := 0) r;
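Note that this numbers all rows globally: it neither orders by minutes nor restarts at 1 for each event_id. Before MySQL 8, a per-event sequence is usually emulated with user variables over a pre-sorted derived table, something like the sketch below; MySQL does not guarantee user-variable evaluation order, so treat it as a workaround rather than a robust solution:

SELECT event_id, minutes, eventcode,
       @seq := IF(@prev_event = event_id, @seq + 1, 1) AS seq,
       -- carry the current event_id forward for the next row's comparison
       @prev_event := event_id AS prev_event
FROM (SELECT * FROM elbat ORDER BY event_id, minutes) sorted,
     (SELECT @seq := 0, @prev_event := NULL) vars;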

MySQL - Pull most recent value within date range for group of IDs

I have the query below
SELECT SUM(CAST(hd.value AS SIGNED)) as case_count
FROM historical_data hd
WHERE hd.tag_id IN (45,109,173,237,301,365,429)
AND hd.shift = 1
AND hd.timestamp BETWEEN '2018-04-10' AND '2018-04-11'
ORDER BY TIMESTAMP DESC
and with this I'm trying to select a SUM of the value for each of the IDs passed, during the time frame in the BETWEEN statement, but only the most recent value within that time frame. So the end result would be a SUM of the case_count values for each ID passed in, taken at the last timestamp that ID has in that date range.
I am having trouble figuring out HOW to accomplish this. My historical_data table is HUGE; however, I do have very specific indexing on it that allows queries to perform fairly well, as well as partitioning on the table by YEAR.
Can anyone provide a pointer on how to get the data I need? I'd rather not loop over the list of IDs and run this query without the SUM and a LIMIT 1, but I guess I can if that's the only way.
Here is one method:
SELECT SUM(CAST(hd.value AS SIGNED)) as case_count
FROM historical_data hd
WHERE hd.tag_id IN (45, 109, 173, 237, 301, 365, 429) AND
hd.shift = 1 AND
hd.timestamp = (SELECT MAX(hd2.timestamp)
FROM historical_data hd2
WHERE hd2.tag_id = hd.tag_id AND
hd2.shift = hd.shift AND
hd2.timestamp BETWEEN '2018-04-10' AND '2018-04-11'
);
The optimal index for this query is on historical_data(shift, tag_id, timestamp).
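For reference, that index could be created like so (the index name is illustrative):

CREATE INDEX idx_shift_tag_timestamp ON historical_data (shift, tag_id, timestamp);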

Aggregating statistics into JSON in Postgresql

So I am trying to calculate overview statistics into JSON, but am having trouble wrangling them into a query.
There are 2 tables:
appointments
- time timestamp
- patients int
assignments
- user_id int
- appointment_id int
I want to calculate the number of patients by user, by hour for the day. Ideally, it would look like this:
[
{hour: "2015-07-01T08:00:00.000Z", assignments: [
{user_id: 123, patients: 3},
{user_id: 456, patients: 10},
{user_id: 789, patients: 4},
]},
{hour: "2015-07-01T09:00:00.000Z", assignments: [
{user_id: 456, patients: 1},
{user_id: 789, patients: 6}
]},
{hour: "2015-07-01T10:00:00.000Z", assignments: []}
...
]
I got kind of close:
with assignment_totals as (
    select user_id, sum(patients) as patients, date_trunc('hour', appointments.time) as hour
    from assignments
    inner join appointments on appointments.id = assignments.appointment_id
    group by date_trunc('hour', appointments.time), user_id
), hours as (
    select to_char(date_trunc('hour', time), 'YYYY-MM-DD"T"HH24:00:00.000Z') as hour,
           array_to_json(array_agg(DISTINCT assignment_totals)) as assignments
    from appointments
    left join assignment_totals on date_trunc('hour', time) = assignment_totals.hour
    where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z'
    group by date_trunc('hour', time)
    order by date_trunc('hour', time)
)
select array_to_json(array_agg(hours)) as hours from hours;
Which outputs:
[
{hour: "2015-07-01T08:00:00.000Z", assignments: [
{user_id: 123, patients: 3, hour: "2015-07-01T08:00:00.000Z" },
{user_id: 456, patients: 10, hour: "2015-07-01T08:00:00.000Z"},
{user_id: 789, patients: 4, hour: "2015-07-01T08:00:00.000Z"},
]},
{hour: "2015-07-01T09:00:00.000Z", assignments: [
{user_id: 456, patients: 1, hour: "2015-07-01T09:00:00.000Z"},
{user_id: 789, patients: 6, hour: "2015-07-01T09:00:00.000Z"}
]},
{hour: "2015-07-01T10:00:00.000Z", assignments: [null]}
...
]
While this works, there are 2 issues, which may or may not be independent of each other:
If there are no appointments that hour, I still want the hour to be included in the array (like 10AM in the example), but with an empty "assignments" array. Right now it puts a null in there, and I can't figure out how to get rid of it while still keeping the hour in the output.
I have to include the hour in the assignment entries along with user_id and patients, because I need it to join the assignment_totals query to the hours query. But it's redundant, since the hour is already in the parent object.
I feel like this should be doable with one CTE and one query, but right now I'm using two CTEs and can't figure out how to condense it and make it work.
I wanted to do something like
hours as (
select to_char(date_trunc('hour',time),'YYYY-MM-DD"T"HH24:00:00.000Z') as hour, sum(appointments.patients) OVER(partition by assignments.user_id) as appointments
from appointments
left join assignments on appointments.id = assignments.appointment_id
where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z'
group by date_trunc('hour',time)
order by date_trunc('hour',time)
)
select array_to_json(array_agg(hours)) as hours from hours
but I can't get it to work without a "must appear in the GROUP BY clause or be used in an aggregate function" error.
Anyone know how to fix any of these issues? Thanks in advance!
The main issue with your last query seems to be that it conflates window functions with aggregate functions. Window functions use the OVER syntax and do not in themselves require a GROUP BY when there are other fields in the SELECT clause. Aggregate functions, on the other hand, require a GROUP BY when there are other (non-aggregate) fields in the SELECT clause. One practical consequence of this difference is that window functions are not automatically DISTINCT.
The issue with NULL values resulting from the window function can be resolved with a simple COALESCE such that zero is used instead of null.
So, to write your query using a window function, use something like:
WITH hours AS
(
SELECT DISTINCT to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z') AS hour,
COALESCE(SUM(ap.patients) OVER (PARTITION BY asgn.user_id), 0) AS appointment_count
FROM appointments ap
LEFT JOIN assignments asgn ON ap.id = asgn.appointment_id
WHERE ap.time >= '2015-07-01T07:00:00.000Z'
AND ap.time < '2015-07-02T07:00:00.000Z'
)
SELECT array_to_json(array_agg(hours ORDER BY hour)) AS hours
FROM hours
With an aggregate function:
WITH hours AS
(
SELECT to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z') AS hour,
SUM(COALESCE(ap.patients, 0)) AS appointment_count,
asgn.user_id
FROM appointments ap
LEFT JOIN assignments asgn ON ap.id = asgn.appointment_id
WHERE ap.time >= '2015-07-01T07:00:00.000Z'
AND ap.time < '2015-07-02T07:00:00.000Z'
GROUP BY asgn.user_id, to_char(date_trunc('hour', ap.time), 'YYYY-MM-DD"T"HH24:00:00.000Z')
)
SELECT array_to_json(array_agg(hours ORDER BY hour)) AS hours
FROM hours
My syntax may not be quite correct, so double-check before using this solution or one like it (and feel free to edit to correct any errors).
Most of my frustration with this came because I was not looking at the Postgres 9.4 documentation, which has new functions for dealing with json.
The solution I found builds upon the original query, but then breaks the assignments array down using json_array_elements, filters using where, then builds it back up again. It seems pointless to have essentially:
json_agg(json_array_elements(json_agg(*)))
But it makes very little performance difference and gets me where I need to go. Feel free to comment if you find a better solution! It should also be possible in <9.4 using array_agg and unnest but I was having trouble because I was trying to unnest a record type returned from my CTE, instead of an actual row type with column definitions.
with assignment_totals as (
select
date_trunc('hour',appointments.time) as hour,
user_id,
coalesce(sum(patients),0) as patients
from appointments
left outer join assignments on appointments.id = assignments.appointment_id
where time >= '2015-07-01T07:00:00.000Z' and time < '2015-07-02T07:00:00.000Z'
group by date_trunc('hour',appointments.time),user_id
), hours as (
select
to_char(assignment_totals.hour,'YYYY-MM-DD"T"HH24:00:00.000Z') as hour,
(
select coalesce(json_agg(json_build_object('user_id',(t->'user_id'),'patients',(t->'patients')) order by (t->>'user_id')),'[]'::json)
from json_array_elements(json_agg(assignment_totals)) t
where (t->>'patients') != '0'
) as patients
from assignment_totals
group by assignment_totals.hour
order by assignment_totals.hour
)
select array_to_json(array_agg(hours)) as hours from hours
Thanks to Andrew for pointing out that I can coalesce nulls to 0. But I still want to filter out entries where patients = 0. This solves all my problems by giving me the ability to filter them out with a where, and then gives me the ability to take out the time by building a new json object with json_build_object.
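As an aside: if hours with no appointments at all must also appear, the list of hours can be generated independently of the data with generate_series, a route neither answer above takes; a minimal sketch of just the hour series, to be left-joined against the hourly aggregates:

-- one row per hour of the business day, regardless of data
select to_char(h, 'YYYY-MM-DD"T"HH24:00:00.000Z') as hour
from generate_series(
         '2015-07-01T07:00:00Z'::timestamptz,
         '2015-07-02T06:00:00Z'::timestamptz,
         interval '1 hour'
     ) as h;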

Long running MYSQL query, Frame Table and View with grouping

I have a table t_date_interval_30 that is a Cartesian product of a 365-day calendar year of dates and a time field incremented at 30-minute intervals. I use this as a framework to hang call data on.
t_date_interval_30
DATE, DAYNAME, INTERVAL
'2013-01-01', 'Tuesday', '00:00:00'
'2013-01-01', 'Tuesday', '00:30:00'
'2013-01-01', 'Tuesday', '01:00:00'
'2013-01-01', 'Tuesday', '01:30:00'
'2013-01-01', 'Tuesday', '02:00:00'
'2013-01-01', 'Tuesday', '02:30:00'
ETC...
Next I have a view v_call_details that is a summarized view of the call data. Calls are summarized down to one row per call session initiated; the source for this can have multiple rows per call session, e.g. when a call rolls Ring No Answer from one target to another, each leg of the call adds a new record row.
v_call_details
CLIENT, CSQ, SESS_ID, DATE, CALL_START, CONT_DISP, MET_SLA
'Acme','ACME_CSQ','123-123456789-01','2013-01-01','2013-01-01 00:12:34','ABANDONED',TRUE
'Acme','ACME_CSQ','123-123456998-01','2013-01-01','2013-01-01 00:45:02','HANDLED',TRUE
'Acme','ACME_CSQ','123-123457291-01','2013-01-02','2013-01-02 13:31:58','HANDLED',FALSE
ETC...
So, when I run the below query it takes forever.
SELECT
cd.`client`,
cd.`csq`,
di.`date`,
di.`dayname`,
di.`interval`,
count(cd.`sess_id`) AS `calls`,
(count(cd.`sess_id`) - sum(IF(cd.`cont_disp` = 'ABANDONED'
AND cd.`met_sla` > 0,
1,
0))) AS `presented`
FROM
t_date_interval_30 di
LEFT JOIN
v_call_details cd ON (di.`date` = cd.`date`
AND di.`interval` = SEC_TO_TIME((TIME_TO_SEC(cd.`call_start`) DIV 1800) * 1800))
WHERE
di.`date` BETWEEN '2013-05-01' AND '2013-05-02'
GROUP BY cd.`csq`, di.`date`, di.`interval`
I have never really worked with indexes (though I have tried adding a few on the DATE and CALL_START columns). When I run an EXPLAIN EXTENDED I get the results below.
id, select_type, table, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, PRIMARY, di, range, i_date, i_date, 3, , 96, 100.00, Using where; Using temporary; Using filesort
1, PRIMARY, <derived2>, ALL, , , , , 153419, 100.00, 
2, DERIVED, t_cisco_csq_agent_details, ALL, , , , , 161925, 100.00, Using temporary; Using filesort
2, DERIVED, t_lkp_clients, ALL, , , , , 56, 100.00, 
Any advice would be greatly appreciated. Right now, returning results for 2 days' worth of data takes roughly 70 seconds. At that rate, a 90-day report will take an hour and a half... I need to find a way to bring that down.
First, don't assume that 90 days worth of data will require 45 times the effort of 2 days. Your query is doing a full scan of the call details table, and this may account for much of the effort. MySQL can propagate the condition on date from di to cd through the equijoin. I'm not sure if it does in this case (because of the second condition).
Second, you are using a view. That might make it impossible to actually improve performance. You can try, but you should try to write the query without the view.
My next question is how long this takes to run:
select cd.csq, cd.`date`,
       SEC_TO_TIME((TIME_TO_SEC(cd.`call_start`) DIV 1800) * 1800) as `interval`,
       count(*)
from v_call_details cd
WHERE cd.`date` BETWEEN '2013-05-01' AND '2013-05-02'
GROUP BY cd.csq, cd.`date`, `interval`;
If this takes a reasonable amount of time, then test it for 90 days. If that works, then you can do the aggregation first and then join back to the di table, as sketched below. This is just an idea; I suspect the real performance problem is in the view.
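A hedged sketch of that aggregate-first shape, reusing the column names from the post (whether it helps depends on how expensive the view is to materialize):

SELECT di.`date`,
       di.`dayname`,
       di.`interval`,
       agg.client,
       agg.csq,
       agg.calls,
       agg.presented
FROM t_date_interval_30 di
LEFT JOIN
    -- aggregate the call details first, at 30-minute granularity
    (SELECT cd.client, cd.csq, cd.`date`,
            SEC_TO_TIME((TIME_TO_SEC(cd.`call_start`) DIV 1800) * 1800) AS `interval`,
            COUNT(cd.sess_id) AS calls,
            COUNT(cd.sess_id) - SUM(IF(cd.cont_disp = 'ABANDONED' AND cd.met_sla > 0, 1, 0)) AS presented
     FROM v_call_details cd
     WHERE cd.`date` BETWEEN '2013-05-01' AND '2013-05-02'
     GROUP BY cd.client, cd.csq, cd.`date`, `interval`) agg
    ON agg.`date` = di.`date` AND agg.`interval` = di.`interval`
WHERE di.`date` BETWEEN '2013-05-01' AND '2013-05-02';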

getting sum of count() in groupby

I don't know if I am breaking community guidelines by posting a continuation question as a new question. If so, I am sorry!
Now, using,
SELECT count(alertid) as cnt,date(alertdate) as alertDate
FROM alertmaster a,subscriptionmaster s
WHERE alertDate BETWEEN DATE_SUB(CURDATE(),INTERVAL 7 DAY) AND CURDATE()
GROUP BY date(alertDate),s.subId
ORDER BY a.alertDate DESC;
produces:
13, '2011-04-08'
13, '2011-04-08'
13, '2011-04-08'
14, '2011-04-07'
13, '2011-04-07'
Where I want is:
39, '2011-04-08'
27, '2011-04-07'
How to achieve this?
The reason you are getting more than one row per date is that you have GROUP BY date(alertDate),s.subId. If you don't actually want separate groups for each s.subId and date combination, just change your GROUP BY to
GROUP BY date(alertDate)
Also, the code you posted is missing a JOIN condition, so as written it produces a Cartesian product of the two tables. This is one reason why the explicit (ANSI 92) JOIN syntax is preferred; a sketch with an explicit JOIN follows.
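For illustration, the same query with an explicit JOIN; the join column (subId on both tables) is an assumption, since the original query omits the join condition entirely:

SELECT COUNT(a.alertid) AS cnt,
       DATE(a.alertdate) AS alertDate
FROM alertmaster a
INNER JOIN subscriptionmaster s ON s.subId = a.subId  -- assumed join column
WHERE a.alertdate BETWEEN DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND CURDATE()
GROUP BY DATE(a.alertdate)
ORDER BY alertDate DESC;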