Help with a nested query in MySQL - mysql

I am trying to do some calculations based on records from a master table and want to store the computed results in a separate test table.
>Table:Master:
>C1 C2 C3 C4
>---------- -------- -- --
>2011-02-19 Test-A 31 3
>2011-02-19 Test-B 34 3
>2011-02-19 Test-C 17 1
>2011-02-15 Test-A* 48 =I 4
>2011-02-15 Test-B 64 6
>2011-02-15 Test-C 55 5
>2011-02-11 Test-A 64 =I2 6
>2011-02-11 Test-B 53 5
>2011-02-11 Test-C 17 1
>2011-02-10 Test-A 12 =I3 1 =J
>2011-02-10 Test-B 02 0
>2011-02-10 Test-C 54 5
Three kinds of tests are conducted at random on the same day, but the date is not important for this case; only the last three test records are used for the calculation.
I am trying to perform sequential calculations as below, starting from the 3rd oldest record. For example, for Test-A, I (iteration) will be 48 (3rd oldest record, column C3), and R2 and R3 will then be calculated from I2 and I3. Finally, I want to display the average of R, R2, R3 minus J. (C4 = latest record.)
Expected result:
>Table:Test-A
>SR Date I I2 I3 I4
>-- ---------- ----- ----------- ----------- -------------------
>1 2011/02/17 48 -52.96 -24.18 -10.71
>Formula:
>SR Date R R2 R3 R4
>-- ---------- ----- ----------- ----------- -------------------
>1 today() 48=C3 (I*0.23-I2) (I*0.23-I3) =avg(I,I1,I2,I3)-C4
I guess I need to use a sub/nested query with a join, but I couldn't figure out how to handle I. All results will be placed in individual test tables. Your input will be much appreciated. TIA

Setup test case:
CREATE TABLE `m1`
(c1 DATE
,c2 VARCHAR(6)
,c3 SMALLINT
,c4 TINYINT
) DEFAULT CHARSET=latin1;
INSERT INTO `m1` VALUES
('2011-02-19','Test-A',31,3)
,('2011-02-19','Test-B',34,3)
,('2011-02-19','Test-C',17,1)
,('2011-02-15','Test-A',48,4)
,('2011-02-15','Test-B',64,6)
,('2011-02-15','Test-C',55,5)
,('2011-02-11','Test-A',64,6)
,('2011-02-11','Test-B',53,5)
,('2011-02-11','Test-C',17,1)
,('2011-02-10','Test-A',12,1)
,('2011-02-10','Test-B',02,0)
,('2011-02-10','Test-C',54,5);
This query makes use of one user variable (@i). Provide the test_name ('Test-A') and the date ('2011-02-17') in the query, shown as literals here.
SELECT o.tn AS `Test`
, o.dt AS `Date`
, SUM(CASE WHEN o.n = 1 THEN o.c3*1.00 ELSE NULL END) AS R
, SUM(CASE WHEN o.n = 1 THEN o.c3*0.23 WHEN o.n = 2 THEN -1.00*o.c3 ELSE NULL END) AS R2
, SUM(CASE WHEN o.n = 1 THEN o.c3*0.23 WHEN o.n = 3 THEN -1.00*o.c3 ELSE NULL END) AS R3
, AVG(CASE WHEN o.n < 4 THEN c3*1.00 ELSE NULL END)-SUM(CASE WHEN n = 3 THEN c4*1.00 ELSE NULL END) AS R4
FROM (
SELECT @i := @i + 1 AS n
, s.tn
, s.dt
-- , m.c1
, m.c3
, m.c4
FROM (SELECT '2011-02-17' AS dt,_latin1'Test-A' AS tn, @i := 0) s
JOIN m1 m
ON m.c2 = s.tn AND m.c1 <= s.dt
ORDER BY m.c1 DESC
LIMIT 0,3
) o
GROUP BY o.tn, o.dt
HAVING SUM(1) >= 3
You can run just the inner query, with m.c1 uncommented in the select list, to check the rows returned (the 1st, 2nd and 3rd latest, on or before the supplied date).
This query returns a different value for R3 than shown in the question, but the result returned by the query appears to be the correct result for the given formula.
Also, the formula for R4 references 5 values: avg(I,I1,I2,I3)-J3. The formula used in the query is effectively =avg(I1,I2,I3)-J3
To get the result for all tests, as of a given date:
SELECT o.tn AS `Test`
, o.dt AS `Date`
, SUM(CASE WHEN o.n = 1 THEN o.c3 ELSE NULL END) AS R
, SUM(CASE WHEN o.n = 1 THEN o.c3*0.23 WHEN o.n = 2 THEN -1.00*o.c3 ELSE NULL END) AS R2
, SUM(CASE WHEN o.n = 1 THEN o.c3*0.23 WHEN o.n = 3 THEN -1.00*o.c3 ELSE NULL END) AS R3
, AVG(CASE WHEN o.n <= 3 THEN c3*1.00 ELSE NULL END)-SUM(CASE WHEN n = 3 THEN c4 ELSE NULL END) AS R4
FROM (
SELECT @i := CASE WHEN @prev_tn = m.c2 THEN @i + 1 ELSE 1 END AS n
, @prev_dt := s.dt AS dt
, @prev_tn := m.c2 AS tn
, m.c1
, m.c3
, m.c4
FROM (SELECT '2011-02-17' AS dt, @i := 0, @prev_tn := NULL) s
JOIN m1 m
ON m.c1 <= s.dt
ORDER BY s.dt, m.c2, m.c1 DESC
) o
GROUP BY o.tn, o.dt
HAVING SUM(1) >= 3
(The HAVING clause guarantees that the query returns results only if there are at least three rows for a given test, preceding the given date.) Here is the query output for two different dates, the 17th and the 20th:
Test Date R R2 R3 R4
------ ---------- -- ------ ------ -----
Test-A 2011-02-17 48 -52.96 -0.96 40.33
Test-B 2011-02-17 64 -38.28 12.72 39.67
Test-C 2011-02-17 55 -4.35 -41.35 37.00
Test Date R R2 R3 R4
------ ---------- -- ------ ------ -----
Test-A 2011-02-20 31 -40.87 -56.87 41.67
Test-B 2011-02-20 34 -56.18 -45.18 45.33
Test-C 2011-02-20 17 -51.09 -13.09 28.67
(The query would be somewhat more involved, to get results for more than one date.)
This may not be the best way to solve the problem, but I've successfully used this approach with MySQL.
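As a cross-check of the arithmetic, here is a minimal Python sketch of the same calculation outside the database. It assumes the formulas as the queries above implement them (R = I1, R2 = I1*0.23 - I2, R3 = I1*0.23 - I3, R4 = avg(I1,I2,I3) - J3); the names ROWS and results_as_of are made up for this illustration.

```python
from collections import defaultdict
from datetime import date

# Rows copied from the m1 test table: (c1, c2, c3, c4).
ROWS = [
    (date(2011, 2, 19), "Test-A", 31, 3), (date(2011, 2, 19), "Test-B", 34, 3),
    (date(2011, 2, 19), "Test-C", 17, 1), (date(2011, 2, 15), "Test-A", 48, 4),
    (date(2011, 2, 15), "Test-B", 64, 6), (date(2011, 2, 15), "Test-C", 55, 5),
    (date(2011, 2, 11), "Test-A", 64, 6), (date(2011, 2, 11), "Test-B", 53, 5),
    (date(2011, 2, 11), "Test-C", 17, 1), (date(2011, 2, 10), "Test-A", 12, 1),
    (date(2011, 2, 10), "Test-B", 2, 0), (date(2011, 2, 10), "Test-C", 54, 5),
]


def results_as_of(rows, as_of):
    """Return {test: (R, R2, R3, R4)} from the three latest rows on or before as_of."""
    by_test = defaultdict(list)
    for c1, c2, c3, c4 in rows:
        if c1 <= as_of:
            by_test[c2].append((c1, c3, c4))

    out = {}
    for test, recs in by_test.items():
        if len(recs) < 3:  # same role as HAVING SUM(1) >= 3
            continue
        recs.sort(key=lambda rec: rec[0], reverse=True)  # newest first
        (_, i1, _), (_, i2, _), (_, i3, j3) = recs[:3]
        r2 = i1 * 0.23 - i2
        r3 = i1 * 0.23 - i3
        r4 = (i1 + i2 + i3) / 3 - j3
        out[test] = (i1, round(r2, 2), round(r3, 2), round(r4, 2))
    return out


if __name__ == "__main__":
    for test, values in sorted(results_as_of(ROWS, date(2011, 2, 17)).items()):
        print(test, values)
```

Calling results_as_of with the 17th or the 20th should give the same figures as the two output tables above, which is a handy way to validate the SQL against a small data set.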

Related

MySQL: count appearance of sequences

I have a table with:
id mode
-- ----
1 B
2 B
3 A
4 A
5 A
6 A
7 B
8 B
9 C
10 C
11 C
12 B
13 A
14 A
15 A
16 B
17 C
18 B
19 C
20 B
21 B
I would like to count the following sequences:
"start": xA -> xB -> xC
"stop": xC -> xB ->xA
so that final result for this table would be:
START = 2 (ID: 3-11, 13-17)
STOP = 1 (ID: 9-15)
The point is that I need to count only the right mode changes, no matter how many times a mode is recorded.
Can anybody help? (Thanks!)
ok, i got this solved with:
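select
  sum(case when prev = 'A' and nxt = 'C' then 1 else 0 end) as start,
  sum(case when prev = 'C' and nxt = 'A' then 1 else 0 end) as stop
from (
  -- prev/nxt are taken over the collapsed runs, so a repeated mode
  -- (e.g. B, B) does not hide the mode that came before the run
  select
    id,
    LAG(mode) OVER (order by id) as prev,
    mode,
    LEAD(mode) OVER (order by id) as nxt
  from (
    -- keep only the first row of each run of identical modes
    select id, mode, LAG(mode) OVER (order by id) as prev_mode
    from prva
  ) runs
  where prev_mode is null or mode <> prev_mode
) data
where mode = 'B'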
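The counting rule (only mode changes matter, not how often a mode repeats) can also be sketched outside SQL. This is a minimal Python illustration, assuming the sample data from the question; count_sequences and MODES are names invented here. It first collapses each run of identical modes to a single entry, then looks at every B and its neighbours.

```python
from itertools import groupby

# Modes for id 1..21, in id order, copied from the question's table.
MODES = list("BBAAAABBCCCBAAABCBCBB")


def count_sequences(modes):
    """Return (start, stop): counts of A->B->C and C->B->A mode changes."""
    # Collapse consecutive duplicates: B B A A -> B A
    runs = [mode for mode, _ in groupby(modes)]
    start = stop = 0
    for prev, cur, nxt in zip(runs, runs[1:], runs[2:]):
        if cur != "B":
            continue
        if prev == "A" and nxt == "C":
            start += 1
        elif prev == "C" and nxt == "A":
            stop += 1
    return start, stop


if __name__ == "__main__":
    print(count_sequences(MODES))
```

For the sample table this yields 2 starts and 1 stop, matching the expected result stated in the question.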

Unable to date range query results in MySQL

I am having trouble querying the correct date range. My query does not seem to consider the Where clause for post_date > (date provided).
SELECT `code`,
`description`,
SUM( IF( month = 3 && year = 2018, monthly_quantity_total, 0 ) ) AS monthlyqt,
SUM( IF( month = 3 && year = 2018, monthly_price_total, 0 ) ) AS monthlypt,
SUM( monthly_quantity_total ) AS yearlyqt,
SUM( monthly_price_total ) AS yearlypt
FROM (
SELECT `invoices_items`.`code`,
`invoices_items`.`description`,
SUM( invoices_items.discounted_price * invoices_items.quantity_supplied ) AS monthly_price_total,
SUM( invoices_items.quantity_supplied ) AS monthly_quantity_total,
YEAR( invoices_items.datetime_created ) AS year,
MONTH( invoices_items.datetime_created ) AS month
FROM `invoices_items`
JOIN `invoices` ON `invoices`.`id` = `invoices_items`.`invoice_id`
WHERE `invoices`.`is_finalised` = 1
AND `invoices`.`post_date` > 2018-02-28
AND `invoices_items`.`type` = 1
GROUP BY `year`, `month`, `invoices_items`.`code`
UNION ALL
SELECT `credit_notes_items`.`code`,
`credit_notes_items`.`description`,
SUM( credit_notes_items.discounted_price * credit_notes_items.quantity_supplied * -1 ) AS monthly_price_total,
SUM( credit_notes_items.quantity_supplied ) AS monthly_quantity_total,
YEAR( credit_notes_items.datetime_created ) AS year,
MONTH( credit_notes_items.datetime_created ) AS month
FROM `credit_notes_items`
JOIN `credit_notes` ON `credit_notes`.`id` = `credit_notes_items`.`credit_note_id`
WHERE `credit_notes`.`is_finalised` = 1
AND `credit_notes`.`post_date` > 2018-02-28
AND `credit_notes_items`.`type` = 1
GROUP BY `year`, `month`, `credit_notes_items`.`code`
) AS sub
GROUP BY code;
There are basically 4 tables being queried here. invoices, invoices_items, credit_notes and credit_notes_items.
table 1 - invoices
id post_date is_finalised
1 2018-01-01 1
2 2018-02-01 1
3 2018-03-01 1
table 2 - invoices_items
id invoice_id code description discounted_price quantity_total type
1 1 TEST-01 Test product 9.99 1 1
2 1 TEST-01 Test product 9.99 2 1
3 2 TEST-01 Test product 9.99 5 1
4 3 TEST-01 Test product 9.99 5 1
I have given some example rows above. From tables 1 and 2 above, the desired result should be:
Desired Output
code description monthlyqt monthlypt yearlyqt yearlypt
TEST-01 Test product 5 49.95 5 49.95
However the output I am receiving is as below;
Received Output
code description monthlyqt monthlypt yearlyqt yearlypt
TEST-01 Test product 5 49.95 13 129.87
The query works as intended except for the date range I am trying to achieve with the WHERE clause. You can see I am trying to filter out any rows that do not match invoices.post_date > 2018-02-28 (and also credit_notes.post_date > 2018-02-28).
I am not sure what I have done wrong here but any help will be much appreciated.
Thanks in advance.
I worked it out. Basically, the variable that was passing the date value needed to be enclosed in single quotes.
e.g.
invoices.post_date > '2018-02-28'
It's always something this simple which throws you off.
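To see why the quotes matter: without them, 2018-02-28 is not a date literal but an arithmetic expression (2018 minus 2 minus 28, i.e. 1988). The sketch below shows the effect using SQLite through Python's sqlite3 module purely as a stand-in; MySQL's type-conversion rules are different in detail, so treat this as an illustration of the symptom rather than of MySQL internals. The table holds only the three example invoice rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER, post_date TEXT, is_finalised INTEGER)"
)
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "2018-01-01", 1), (2, "2018-02-01", 1), (3, "2018-03-01", 1)],
)

# Unquoted: the "date" is evaluated as integer subtraction.
unquoted_value = conn.execute("SELECT 2018-02-28").fetchone()[0]

# Unquoted comparison: every row passes the filter.
unquoted_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM invoices WHERE post_date > 2018-02-28 ORDER BY id"
    )
]

# Quoted comparison: only rows after the given date pass.
quoted_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM invoices WHERE post_date > '2018-02-28' ORDER BY id"
    )
]

if __name__ == "__main__":
    print(unquoted_value, unquoted_ids, quoted_ids)
```

When the date comes from application code, passing it as a bound parameter (as the INSERT does here) avoids the quoting problem altogether.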

How to Efficiently Find Number of Specific Day Between Two Dates in MySQL?

Different variations of this question have been asked before, but none for the use case that I'm looking for. I'd like to find the specific number of weekdays between two dates for each row of a MySQL table and then update a column of each row with the result of that operation. This is part of an ETL process, and I'd like to keep this in a stored procedure if at all possible.
Data
Dates are of DATE type, and I'd like to find the number of each specific weekday because I have 7 day columns, each holding a flag that says whether a record occurs on that day of the week. Like this (1 is Monday):
day_1 | day_2 | day_3 | day_4 | day_5 | day_6 | day_7
----- | ----- | ----- | ----- | ----- | ----- | -----
0 | 1 | 0 | 1 | 1 | 0 | 1
Example Use Case
I'm doing this because I'm trying to find the frequency of rows for a timeframe that's not available in the input data (call it input). So for a record that had start and end date values of 2016-01-01 and 2016-03-01, I'd want to know how often that record would have occurred only from 2016-01-01 to 2016-01-31, inclusive. I initially tried to do this by making a table that contained all datevalues for many years into the future like:
datevalue
---------
2016-01-01
2016-01-02
...
and then joining input to that table on start_date and end_date and then aggregating up while counting the number of each day like this:
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 1 THEN 1 ELSE 0 END) * day_1 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 2 THEN 1 ELSE 0 END) * day_2 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 3 THEN 1 ELSE 0 END) * day_3 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 4 THEN 1 ELSE 0 END) * day_4 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 5 THEN 1 ELSE 0 END) * day_5 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 6 THEN 1 ELSE 0 END) * day_6 +
SUM(CASE WHEN WEEKDAY(B.datevalue) + 1 = 7 THEN 1 ELSE 0 END) * day_7 AS adj_total_frequency
That worked perfectly on a smaller dataset, but input has > 30 million records, and when I tried running on that procedure it ran for 36 hours before I killed it.
Is there a more efficient way of doing this in MySQL?
Too long for a comment, but combined with the pre-calculation of weekday I originally suggested, how well does this (using a single SUM with a complete CASE) work out for you?
SUM(CASE WHEN B.weekdayval = 1 AND day_1 THEN 1
WHEN B.weekdayval = 2 AND day_2 THEN 1
WHEN B.weekdayval = 3 AND day_3 THEN 1
WHEN B.weekdayval = 4 AND day_4 THEN 1
WHEN B.weekdayval = 5 AND day_5 THEN 1
WHEN B.weekdayval = 6 AND day_6 THEN 1
WHEN B.weekdayval = 7 AND day_7 THEN 1
ELSE 0 END) AS adj_total_frequency
Actually, this could be better; it could theoretically mean B.weekdayval only gets evaluated once per row (I say theoretically because MySQL does not guarantee irrelevant THEN clauses will not be evaluated, just not "returned" from the CASE).
SUM(CASE B.weekdayval
WHEN 1 THEN day_1
WHEN 2 THEN day_2
WHEN 3 THEN day_3
WHEN 4 THEN day_4
WHEN 5 THEN day_5
WHEN 6 THEN day_6
WHEN 7 THEN day_7
ELSE 0 END) AS adj_total_frequency
Edit: As far as the DATEDIFF method goes (I wrote "datesub" earlier, but meant DATEDIFF), I don't have the time to write a full solution, but to start you (or other potential answerers) on that:
You can get the number of whole weeks between the start and end with DATEDIFF(end, start) DIV 7.
Multiply that by the number of days in a week that apply to get an approximation.
Then (the hardest part), figure out the number of days to add for the fractional week not covered by DIV.
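The whole-weeks-plus-remainder idea can be worked out in closed form, without joining to a calendar table. Below is a minimal Python sketch of that arithmetic, assuming an inclusive date range and the question's convention that day_1 is Monday; count_weekday and adjusted_frequency are hypothetical helper names, not part of any existing procedure.

```python
from datetime import date


def count_weekday(start, end, weekday):
    """Count the days in [start, end] (inclusive) whose date.weekday() == weekday.

    weekday uses Python's convention: 0 = Monday ... 6 = Sunday.
    """
    if end < start:
        return 0
    total_days = (end - start).days + 1
    full_weeks, remainder = divmod(total_days, 7)
    # Every full week contains the weekday exactly once. The leftover
    # `remainder` days start on start.weekday(); the target weekday is
    # among them if it lies fewer than `remainder` days after that.
    offset = (weekday - start.weekday()) % 7
    return full_weeks + (1 if offset < remainder else 0)


def adjusted_frequency(start, end, day_flags):
    """Sum the per-weekday counts for the flagged days.

    day_flags[0] corresponds to day_1 (Monday), day_flags[6] to day_7 (Sunday).
    """
    return sum(
        flag * count_weekday(start, end, weekday)
        for weekday, flag in enumerate(day_flags)
    )


if __name__ == "__main__":
    flags = [0, 1, 0, 1, 1, 0, 1]  # the example row from the question
    print(adjusted_frequency(date(2016, 1, 1), date(2016, 1, 31), flags))
```

The same expression translates to SQL with DATEDIFF, DIV, MOD and WEEKDAY, which would let each row be updated from its own start and end dates with no join at all.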
(Sometimes) MySQL has big troubles optimizing GROUP BY statements with a JOIN. To overcome that you can store the joined result into a temporary table so you can use GROUP BY with one table.
drop temporary table if exists tmp;
create temporary table tmp (id int unsigned not null)
engine=myisam
select i.id
from input i
straight_join dates B
on B.datevalue >= i.`start`
and B.datevalue < i.`end`
where (
-- WEEKDAY() returns 0 for Monday through 6 for Sunday; day_1 is Monday
(WEEKDAY(B.datevalue ) = 0) AND i.day_1 OR
(WEEKDAY(B.datevalue ) = 1) AND i.day_2 OR
(WEEKDAY(B.datevalue ) = 2) AND i.day_3 OR
(WEEKDAY(B.datevalue ) = 3) AND i.day_4 OR
(WEEKDAY(B.datevalue ) = 4) AND i.day_5 OR
(WEEKDAY(B.datevalue ) = 5) AND i.day_6 OR
(WEEKDAY(B.datevalue ) = 6) AND i.day_7
)
-- and i.id > 000000
-- and i.id <= 100000
;
drop temporary table if exists tmp1;
create temporary table tmp1 (id int unsigned not null, cnt int unsigned not null)
engine=myisam
select id, count(1) as cnt
from tmp
group by id
;
update input i
join tmp1 using(id)
set i.numdays = tmp1.cnt
where 1=1;
My test data contains 1M rows with random day bits (round(rand())) and an average date range of 50 days. So the tmp table contains about 25M rows.
On my system it takes about 500 msec for 10K rows, 5 sec for 100K rows and 2 mins for 1M rows. So if you split the updates in chunks of 100K rows (using the commented id range condition in the first statement) you should be ready in about 30 minutes.

How to build this query?

i have table
http://oi58.tinypic.com/2s7xreo.jpg
id action_date type item_id quantity
--- ----------- ---- ------- --------
87 4/25/2014 1 s-1 100
88 4/1/2014 1 s-1 150
89 4/4/2014 1 s-1 200
90 4/3/2014 1 s-2 222
91 4/7/2014 1 s-2 10
96 4/4/2014 1 s-2 8
97 4/22/2014 1 s-2 8
98 4/21/2014 2 s-1 255
99 4/5/2014 2 s-1 6
100 4/6/2014 2 s-2 190
101 4/6/2014 2 s-3 96
102 4/8/2014 2 s-1 120
103 4/15/2014 2 s-2 3
104 4/16/2014 2 s-2 3
The type column means the following: 1 is an item coming in to my shop, and 2 is an item going out of my shop.
I need a query that gives me a result like this:
item in out net
s1 300 195 105
and so on.
How do I write a query that gives me this result?
And if I must put them in two tables (one table for the in and one for the out), if that helps, how would the query be built?
Thanks in advance :)
Note: I am working in Access.
One way to get the result is to use an aggregate function on an expression that conditionally returns the value of quantity, based on the value of type.
SELECT t.item
, SUM(CASE WHEN t.type = 1 THEN t.quantity ELSE 0 END) AS in_
, SUM(CASE WHEN t.type = 2 THEN t.quantity ELSE 0 END) AS out_
, SUM(CASE WHEN t.type = 1 THEN t.quantity ELSE 0 END)
- SUM(CASE WHEN t.type = 2 THEN t.quantity ELSE 0 END) AS net_
FROM mytable t
GROUP BY t.item
This approach works in Oracle, as well as MySQL and SQL Server.
(If you remove the SUM() aggregate function and the GROUP BY clause, you can see how that CASE expression is working. The query above gives the result you specified, this one is just a demonstration that helps "explain" how that query works.)
SELECT t.item
, CASE WHEN t.type = 1 THEN t.quantity ELSE 0 END AS in_
, CASE WHEN t.type = 2 THEN t.quantity ELSE 0 END AS out_
, t.*
FROM mytable t
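For readers who want to check the numbers by hand, the same conditional aggregation can be sketched in a few lines of Python. This is only an illustration using the sample rows from the question; ROWS and in_out_net are names made up here.

```python
from collections import defaultdict

# (id, action_date, type, item_id, quantity), copied from the question.
ROWS = [
    (87, "4/25/2014", 1, "s-1", 100), (88, "4/1/2014", 1, "s-1", 150),
    (89, "4/4/2014", 1, "s-1", 200), (90, "4/3/2014", 1, "s-2", 222),
    (91, "4/7/2014", 1, "s-2", 10), (96, "4/4/2014", 1, "s-2", 8),
    (97, "4/22/2014", 1, "s-2", 8), (98, "4/21/2014", 2, "s-1", 255),
    (99, "4/5/2014", 2, "s-1", 6), (100, "4/6/2014", 2, "s-2", 190),
    (101, "4/6/2014", 2, "s-3", 96), (102, "4/8/2014", 2, "s-1", 120),
    (103, "4/15/2014", 2, "s-2", 3), (104, "4/16/2014", 2, "s-2", 3),
]


def in_out_net(rows):
    """Return {item_id: (in_total, out_total, net)}."""
    totals = defaultdict(lambda: [0, 0])  # [in, out] per item
    for _id, _action_date, type_, item_id, quantity in rows:
        if type_ == 1:      # same role as CASE WHEN type = 1 THEN quantity ELSE 0
            totals[item_id][0] += quantity
        elif type_ == 2:    # same role as CASE WHEN type = 2 THEN quantity ELSE 0
            totals[item_id][1] += quantity
    return {
        item: (in_total, out_total, in_total - out_total)
        for item, (in_total, out_total) in sorted(totals.items())
    }


if __name__ == "__main__":
    for item, values in in_out_net(ROWS).items():
        print(item, *values)
```

Note that an item with only "out" rows (s-3 in the sample) still gets an in total of 0, which is what the ELSE 0 branch achieves in the SQL.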
UPDATE
Unfortunately, Microsoft Access doesn't support CASE expressions. But Access does have an iif function. The same approach should work, the syntax might be something like this:
SELECT t.item
, SUM(iif(t.type = 1, t.quantity, 0)) AS in_
, SUM(iif(t.type = 2, t.quantity, 0)) AS out_
, SUM(iif(t.type = 1, t.quantity, 0))
- SUM(iif(t.type = 2, t.quantity, 0)) AS net_
FROM mytable t
GROUP BY t.item
SELECT item_id,
In_col,
Out_col,
(In_col - Out_col) AS Net
FROM
(SELECT item_id,
sum(CASE WHEN TYPE = 1 THEN quantity END) AS In_col,
sum(CASE WHEN TYPE = 2 THEN quantity END) AS Out_col
FROM TABLE
GROUP BY item_id) AS t1;
I hope this is what you need.

Select All Columns By Most Recent Date and Highest Version

I have been stumped on this for quite a while. Request#, SlotId, Segment, and Version together make up the primary key. What I want from my stored proc is to retrieve all rows by passing in the Request # and Segment, but for each slot I want the most recent effective date on or before today's date, and from that I need the highest version #. I appreciate your time.
Values in database
Request# SlotId Segment Version Effective Date ContentId
A123 1 A 1 2012-01-01 1
A123 2 A 1 2012-01-01 2
A123 2 A 2 2012-02-01 34
A123 2 A 3 2012-02-01 24
A123 2 A 4 2015-01-01 6 //beyond todays date. dont want
Values I want to return from my stored proc when i pass in A123 for Request # and A for Segment.
A123 1 A 1 2012-01-01 1
A123 2 A 3 2012-02-01 24
The query could be written like this:
; WITH cte AS
( SELECT Request, SlotId, Segment, Version, [Effective Date], ContentId,
ROW_NUMBER() OVER ( PARTITION BY Request, Segment, SlotId
ORDER BY [Effective Date] DESC, Version DESC ) AS RowN
FROM
tableX
WHERE
Request = @Req AND Segment = @Seg --- the 2 parameters
AND [Effective Date] < DATEADD(day, 1, GETDATE())
)
SELECT Request, SlotId, Segment, Version, [Effective Date], ContentId
FROM cte
WHERE RowN = 1 ;
Consider this:
;
WITH A as
(
SELECT DISTINCT
Request
, Segment
, SlotId
FROM Table1
)
SELECT A.Request
, A.SlotId
, A.Segment
, B.EffectiveDate
, B.Version
, B.ContentID
FROM A
-- CROSS APPLY lets the derived table reference A row by row;
-- a plain JOIN to a derived table cannot see A's columns
CROSS APPLY (
SELECT Top 1
Request
, SlotId
, Segment
, EffectiveDate
, Version
, ContentId
FROM Table1 t1
WHERE t1.Request = A.Request
AND t1.SlotId = A.SlotId
AND T1.Segment = A.Segment
AND T1.EffectiveDate <= GetDate()
ORDER BY
T1.EffectiveDate DESC
, T1.Version DESC
) as B
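The selection rule itself (per slot: latest effective date on or before today, then highest version) is easy to verify on the sample rows. Here is a small Python sketch of that rule; ROWS and current_rows are illustrative names, and "today" is passed in as a parameter so the result is reproducible.

```python
from datetime import date

# (Request#, SlotId, Segment, Version, EffectiveDate, ContentId), from the question.
ROWS = [
    ("A123", 1, "A", 1, date(2012, 1, 1), 1),
    ("A123", 2, "A", 1, date(2012, 1, 1), 2),
    ("A123", 2, "A", 2, date(2012, 2, 1), 34),
    ("A123", 2, "A", 3, date(2012, 2, 1), 24),
    ("A123", 2, "A", 4, date(2015, 1, 1), 6),
]


def current_rows(rows, request, segment, today):
    """Return one row per slot: latest effective date <= today, then highest version."""
    best = {}
    for row in rows:
        req, slot, seg, version, effective, _content = row
        if req != request or seg != segment or effective > today:
            continue
        # Compare on (effective date, version), the same order as
        # ORDER BY EffectiveDate DESC, Version DESC in SQL.
        if slot not in best or (effective, version) > (best[slot][4], best[slot][3]):
            best[slot] = row
    return [best[slot] for slot in sorted(best)]


if __name__ == "__main__":
    for row in current_rows(ROWS, "A123", "A", date(2014, 1, 1)):
        print(row)
```

With a "today" of 2014-01-01 the version 4 row is excluded because its effective date is in the future, leaving version 1 for slot 1 and version 3 for slot 2, as the question asks.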