MySQL Complex Inner Join - mysql

Suppose equity has a column called TickerID. I would like to replace the 111's with equity.TickerID. MySQL can't seem to resolve the scope and returns an unknown column when I try that. This SQL statement works but I need to run it for each ticker. Would be nice if I could get a full table.
SELECT Ticker,
IF(tbl_m200.MA200_Count = 200,tbl_m200.MA200,-1) AS MA200,
IF(tbl_m50.MA50_Count = 50,tbl_m50.MA50,-1) AS MA50,
IF(tbl_m20.MA20_Count = 20,tbl_m20.MA20,-1) AS MA20
FROM equity
INNER JOIN
(SELECT TickerID,AVG(Y.Close) AS MA200,COUNT(Y.Close) AS MA200_Count FROM
(
SELECT Close,TickerID FROM equity_pricehistory_daily
WHERE TickerID = 111
ORDER BY Timestamp DESC LIMIT 0,200
) AS Y
) AS tbl_m200
USING(TickerID)
INNER JOIN
(SELECT TickerID,AVG(Y.Close) AS MA50,COUNT(Y.Close) AS MA50_Count FROM
(
SELECT Close,TickerID FROM equity_pricehistory_daily
WHERE TickerID = 111
ORDER BY Timestamp DESC LIMIT 50
) AS Y
) AS tbl_m50
USING(TickerID)
INNER JOIN
(SELECT TickerID,AVG(Y.Close) AS MA20,COUNT(Y.Close) AS MA20_Count FROM
(
SELECT Close,TickerID FROM equity_pricehistory_daily
WHERE TickerID = 111
ORDER BY Timestamp DESC LIMIT 0,20
) AS Y
) AS tbl_m20
USING(TickerID)

This seems to be some bug or "feature" of MySQL. Many persons seems to have the same problem with outer tables being out of scope.
Anyway... You could create functions that retrieve the information you want:
DROP FUNCTION IF EXISTS AveragePriceHistory_20;
CREATE FUNCTION AveragePriceHistory_20(MyTickerID INT)
RETURNS DECIMAL(9,2) DETERMINISTIC
RETURN (
SELECT AVG(Y.Close)
FROM (
SELECT Z.Close
FROM equity_pricehistory_daily Z
WHERE Z.TickerID = MyTickerID
ORDER BY Timestamp DESC
LIMIT 20
) Y
HAVING COUNT(*) = 20
);
SELECT
E.TickerID,
E.Ticker,
AveragePriceHistory_20(E.TickerID) AS MA20
FROM equity E;
You would get NULL instead of -1. If this is undesirable, you could wrap the function-call with IFNULL(...,-1).
Another way of solving this, would be to select for the time-frame, instead of using LIMIT.
SELECT
E.TickerID,
E.Ticker,
(
SELECT AVG(Y.Close)
FROM equity_pricehistory_daily Y
WHERE Y.TickerID = E.TickerID
AND Y.Timestamp > ADDDATE(CURRENT_TIMESTAMP, INTERVAL -20 DAY)
) AS MA20
FROM equity E;

Related

MYSQL error code: 1054 Unknown column in where clause. Error occurring in Nested SubQueries

I am trying to get through a problem where there are multiple accounts of same scheme on same customer id. On a given txn date I want to retrieve the total Sanctioned Limit and total utilized amount from these accounts. Below is the SQL query I have constructed.
SELECT
cust_id,
tran_date,
rollover_date,
next_rollover,
(
SELECT
acc_num as kcc_ac
FROM
dbzsubvention.acc_disb_amt a
WHERE
(a.tran_date <= AB.tran_date)
AND a.sch_code = 'xxx'
AND a.cust_id = AB.cust_id
ORDER BY
a.tran_date desc
LIMIT
1
) KCC_ACC,
(
SELECT
SUM(kcc_prod)
FROM
(
SELECT
prod_limit as kcc_prod,
acc_num,
s.acc_status
FROM
dbzsubvention.acc_disb_amt a
inner join dbzsubvention.acc_rollover_all_sub_status s using (acc_num)
left join dbzsubvention.acc_close_date c using (acc_num)
WHERE
a.cust_id = AB.cust_id
AND a.tran_date <= AB.tran_date
AND (
ac_close > AB.tran_date || ac_close is null
)
AND a.sch_code = 'xxx'
AND s.acc_status = 'R'
AND s.rollover_date <= AB.tran_date
AND (
AB.tran_date < s.next_rollover || s.next_rollover is null
)
GROUP BY
acc_num
order by
a.tran_date
) t
) kcc_prod,
(
SELECT
sum(disb_amt)
FROM
(
SELECT
disb_amt,
acc_num,
tran_date
FROM
(
SELECT
disb_amt,
a.acc_num,
a.tran_date
FROM
dbzsubvention.acc_disb_amt a
inner join dbzsubvention.acc_rollover_all_sub_status s using (acc_num)
left join dbzsubvention.acc_close_date c using (acc_num)
WHERE
a.tran_date <= AB.tran_date
AND (
c.ac_close > AB.tran_date || c.ac_close is null
)
AND a.sch_code = 'xxx'
AND a.cust_id = AB.cust_id
AND s.acc_status = 'R'
AND s.rollover_date <= AB.tran_date
AND (
AB.tran_date < s.next_rollover || s.next_rollover is null
)
GROUP BY
acc_num,
a.tran_date
order by
a.tran_date desc
) t
GROUP BY
acc_num
) tt
) kcc_disb
FROM
dbzsubvention.acc_disb_amt AB
WHERE
AB.cust_id = 'abcdef'
group by
cust_id,
tran_date
order by
tran_date asc;
This query isn't working. Upon research I have found that correlated subquery works only till 1 level down. However I couldn't get a workaround to this problem.
I have tried searching the solution around this problem but couldn't find the desired one. Using the SUM function at the inner query will not give desired results as
In the second subquery that will sum all the values in column before applying the group by clause.
In third subquery the sorting has to be done first then the grouping and finally the sum.
Therefore I am reaching out to the community for help to suggest a workaround to the issue.
You're correct - external column cannot be transferred through the nesting level immediately.
Try this workaround:
SELECT ... -- outer query
( -- correlated subquery nesting level 1
SELECT ...
( -- correlated subquery nesting level 2
SELECT ...
...
WHERE table0_level1.column0_1 ... -- moved value
)
FROM table1
-- move through nesting level making it a source of current level
CROSS JOIN ( SELECT table0.column0 AS column0_1 ) AS table0_level1
) AS ...,
...
FROM table0
...

Mysql query return too slow

I have written a query. It works better. But currently, all tables have 100K rows, and one of my queries returns too slow. Can you please suggest to me how I can optimize the query?
select *
from tbl_xray_information X
WHERE locationCode = (SELECT t.id
from tbl_location t
where CODE = '202')
AND ( communicate_with_pt is NULL || communicate_with_pt='')
AND x.patientID NOT IN (SELECT patientID
FROM tbl_gxp_information
WHERE center_id = '202')
order by insertedON desc LIMIT 2000
Please note here 'patientID' is varchar.
This may run faster:
select *
from tbl_xray_information AS X
WHERE locationCode =
( SELECT t.id
from tbl_location t
where CODE = '202'
)
AND ( x.communicate_with_pt is NULL
OR x.communicate_with_pt = '' )
AND NOT EXISTS ( SELECT 1 FROM tbl_gxp_information
WHERE x.patientID = patientID
AND center_id = '202' )
order by insertedON desc
LIMIT 2000
These indexes may help:
tbl_location: INDEX(CODE)
tbl_gxp_information: INDEX(center_id, patientID) -- (either order)
Since OR is poorly optimized, it may be better to pick either NULL or empty-string for communicate_with_pt (to avoid testing for both).

Efficiently Select the latest row grouped by a column

I have a table:
QUOTE
| id | value | mar_id | date |
And I am trying to select the latest row for each mar_id (market id). I have managed to achieve what I need from the query below:
SELECT
q.*
FROM quote q
WHERE q.date = (
SELECT MAX(q1.date)
FROM quote q1
WHERE q.mar_id = q1.mar_id
)
However I find that the query is incredibly slow (>60s), to the extent my database kills the connection.
I did an EXPLAIN to find out why and got the result:
I have a composite unique index QUO_UQ on mar_id, date which appears to be getting used.
logically this doesn't seem such a tough query to run, what can I do to do this more efficiently?
An example of an uncorrelated subquery
SELECT x.*
FROM quote x
JOIN
( SELECT mar_id
, MAX(date) date
FROM quote
GROUP
BY mar_id
) y
ON y.mar_id = x.mar_id
AND y.date = x.date;
select * from (
select mar_id, [date],row_number() over (partition by mar_id order by [date] desc ) as [Rank] from
qoute
group by mar_id, [date]) q where Rank = 1
Your query is fine:
SELECT q.*
FROM quote q
WHERE q.date = (SELECT MAX(q1.date)
FROM quote q1
WHERE q.mar_id = q1.mar_id
);
I recommend an index on quote(mar_id, date). This is probably the fastest method to get your result.
EDIT:
I'm curious if you find that this uses the index:
SELECT q.*
FROM quote q
WHERE q.date = (SELECT q1.date
FROM quote q1
WHERE q.mar_id = q1.mar_id
ORDER BY q1.date DESC
LIMIT 1
);

Select Status That Last for More Than 1 Second

I have got a problem looks simple, but I could not find the solution.
So, I have got a table with two cols like this:
Time Status
00:00:00.111 Off
00:00:00.222 On
00:00:00.345 On
00:00:01.555 On
00:00:01.666 Off
00:00:02.222 On
00:00:02.422 On
00:00:02.622 Off
00:00:05.888 Off
00:00:05.999 Off
I want to select all statuses of On which lasted for more than 1 second,
in this example, I want the sequence:
00:00:00.222 On
00:00:00.345 On
00:00:01.555 On
Could you guys give me any clue? Many thanks!
A simple GROUP BY and SUM can not do this on your current dataset, so my idea is to add a helper column:
CREATE TABLE someTable(
`time` DATETIME,
status CHAR(3),
helperCol INT
);
The helperCol is an INT and will be set as follows:
CREATE PROCEDURE setHelperCol()
BEGIN
DECLARE finished,v_helperCol INT;
DECLARE status CHAR(3);
DECLARE ts DATETIME;
DECLARE CURSOR st FOR SELECT `time`,status,helperCol FROM someTable WHERE helperCol IS NOT NULL; -- Handy for re-use: No need to go over all data, so you can save the helperCol as permanent value.
DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished = 1;
SELECT #maxVal:=MAX(helperCol) FROM helperCol;
SET finished=0;
SET helperCol=#maxVal;
IF(!helperCol>0) SET helperCol=1;
OPEN st;
FETCH ts,status,v_helperCol FROM st;
WHILE(finished=0) DO
IF(status='Off') v_helperCol=v_helperCol+1;
UPDATE someTable SET helperCol=v_helperCol WHERE `time`=ts; -- Assuming `time` is unique;
FETCH ts,status,v_helperCol FROM st;
END WHILE;
CLOSE st;
END;
Execute the procedure and the result is:
Time Status helperCol
00:00:00.111 Off 2
00:00:00.222 On 2
00:00:00.345 On 2
00:00:01.555 On 2
00:00:01.666 Off 3
00:00:02.222 On 3
00:00:02.422 On 3
00:00:02.622 Off 4
This can now be grouped and processed:
SELECT MAX(`time`)-MIN(`time`) AS diffTime
FROM someTable
WHERE status='ON'
GROUP BY helperCol
HAVING MAX(`time`)-MIN(`time`)>1;
The result of that is (you need to search for the correct datetime functions to apply in the MAX-MIN part):
1.333
Alternative:
You can also process the MAX-MIN in the stored procedure, but that would not be efficiently repeatable as the helperColumn solution is.
SELECT a.time start
, MIN(c.time) end
, TIMEDIFF(MIN(c.time),a.time) duration
FROM
( SELECT x.*, COUNT(*) rank FROM my_table x JOIN my_table y ON y.time <= x.time GROUP BY time ) a
LEFT
JOIN
( SELECT x.*, COUNT(*) rank FROM my_table x JOIN my_table y ON y.time <= x.time GROUP BY time ) b
ON b.status = a.status
AND b.rank = a.rank - 1
JOIN
( SELECT x.*, COUNT(*) rank FROM my_table x JOIN my_table y ON y.time <= x.time GROUP BY time ) c
ON c.rank >= a.rank
LEFT
JOIN
( SELECT x.*, COUNT(*) rank FROM my_table x JOIN my_table y ON y.time <= x.time GROUP BY time ) d
ON d.status = c.status
AND d.rank = c.rank + 1
WHERE b.rank IS NULL
AND d.rank IS NULL
AND a.status = 1
GROUP
BY a.time
HAVING duration >= 1;
Another, faster, method might be along these lines - unfortunately I don't think the data types and functions in my version of MySQL support fractions of a second, so this is probably a little bit wrong (there may also be a logical error)...
SELECT time
, status
, cumulative
FROM
( SELECT *
, CASE WHEN #prev = status THEN #i:=#i+duration ELSE #i:=0 END cumulative
, #prev:=status
FROM
( SELECT x.*
, TIME_TO_SEC(MIN(y.time))-TIME_TO_SEC(x.time) duration
FROM my_table x
JOIN my_table y
ON y.time > x.time
GROUP
BY x.time
) n
ORDER
BY time
) a
WHERE cumulative >= 1
AND status = 1;

Checking for maximum length of consecutive days which satisfy specific condition

I have a MySQL table with the structure:
beverages_log(id, users_id, beverages_id, timestamp)
I'm trying to compute the maximum streak of consecutive days during which a user (with id 1) logs a beverage (with id 1) at least 5 times each day. I'm pretty sure that this can be done using views as follows:
CREATE or REPLACE VIEW daycounts AS
SELECT count(*) AS n, DATE(timestamp) AS d FROM beverages_log
WHERE users_id = '1' AND beverages_id = 1 GROUP BY d;
CREATE or REPLACE VIEW t AS SELECT * FROM daycounts WHERE n >= 5;
SELECT MAX(streak) AS current FROM ( SELECT DATEDIFF(MIN(c.d), a.d)+1 AS streak
FROM t AS a LEFT JOIN t AS b ON a.d = ADDDATE(b.d,1)
LEFT JOIN t AS c ON a.d <= c.d
LEFT JOIN t AS d ON c.d = ADDDATE(d.d,-1)
WHERE b.d IS NULL AND c.d IS NOT NULL AND d.d IS NULL GROUP BY a.d) allstreaks;
However, repeatedly creating views for different users every time I run this check seems pretty inefficient. Is there a way in MySQL to perform this computation in a single query, without creating views or repeatedly calling the same subqueries a bunch of times?
This solution seems to perform quite well as long as there is a composite index on users_id and beverages_id -
SELECT *
FROM (
SELECT t.*, IF(#prev + INTERVAL 1 DAY = t.d, #c := #c + 1, #c := 1) AS streak, #prev := t.d
FROM (
SELECT DATE(timestamp) AS d, COUNT(*) AS n
FROM beverages_log
WHERE users_id = 1
AND beverages_id = 1
GROUP BY DATE(timestamp)
HAVING COUNT(*) >= 5
) AS t
INNER JOIN (SELECT #prev := NULL, #c := 1) AS vars
) AS t
ORDER BY streak DESC LIMIT 1;
Why not include user_id in they daycounts view and group by user_id and date.
Also include user_id in view t.
Then when you are queering against t add the user_id to the where clause.
Then you don't have to recreate your views for every single user you just need to remember to include in your where clause.
That's a little tricky. I'd start with a view to summarize events by day:
CREATE VIEW BView AS
SELECT UserID, BevID, CAST(EventDateTime AS DATE) AS EventDate, COUNT(*) AS NumEvents
FROM beverages_log
GROUP BY UserID, BevID, CAST(EventDateTime AS DATE)
I'd then use a Dates table (just a table with one row per day; very handy to have) to examine all possible date ranges and throw out any with a gap. This will probably be slow as hell, but it's a start:
SELECT
UserID, BevID, MAX(StreakLength) AS StreakLength
FROM
(
SELECT
B1.UserID, B1.BevID, B1.EventDate AS StreakStart, DATEDIFF(DD, StartDate.Date, EndDate.Date) AS StreakLength
FROM
BView AS B1
INNER JOIN Dates AS StartDate ON B1.EventDate = StartDate.Date
INNER JOIN Dates AS EndDate ON EndDate.Date > StartDate.Date
WHERE
B1.NumEvents >= 5
-- Exclude this potential streak if there's a day with no activity
AND NOT EXISTS (SELECT * FROM Dates AS MissedDay WHERE MissedDay.Date > StartDate.Date AND MissedDay.Date <= EndDate.Date AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND MissedDay.Date = B2.EventDate))
-- Exclude this potential streak if there's a day with less than five events
AND NOT EXISTS (SELECT * FROM BView AS B2 WHERE B1.UserID = B2.UserID AND B1.BevID = B2.BevID AND B2.EventDate > StartDate.Date AND B2.EventDate <= EndDate.Date AND B2.NumEvents < 5)
) AS X
GROUP BY
UserID, BevID