Moving average query MS Access

I am trying to calculate the moving average of my data. I have googled and found many examples on this site and others but am still stumped. For the selected record, I need to calculate the average of the previous 5 flows for the specific product.
My Table looks like the following:
TMDT Prod Flow
8/21/2017 12:01:00 AM A 100
8/20/2017 11:30:45 PM A 150
8/20/2017 10:00:15 PM A 200
8/19/2017 5:00:00 AM B 600
8/17/2017 12:00:00 AM A 300
8/16/2017 11:00:00 AM A 200
8/15/2017 10:00:31 AM A 50
I have been trying the following query:
SELECT b.TMDT, b.Flow, (SELECT AVG(Flow) as MovingAVG
FROM(SELECT TOP 5 *
FROM [mytable] a
WHERE Prod="A" AND [a.TMDT]< b.TMDT
ORDER BY a.TMDT DESC))
FROM mytable AS b;
When I try to run this query I get an input prompt for b.TMDT. Why is b.TMDT not being pulled from mytable?
Should I be using a different method altogether to calculate my moving averages?
I would like to add that I started with another method that works but is extremely slow. It runs fast enough for tables with 100 records or less. However, if the table has more than 100 records it feels like the query comes to a screeching halt.
Original method below.
I created two queries for each product code (There are 15 products): Q_ProdA_Rank and Q_ProdA_MovAvg
Q_ProdA_Rank (T_ProdA is a table with Product A's information):
SELECT a.TMDT, a.Flow, (Select count(*) from [T_ProdA]
where TMDT<=a.TMDT) AS Rank
FROM [T_ProdA] AS a
ORDER BY a.TMDT DESC;
Q_ProdA_MovAvg
SELECT b.TMDT, b.Flow, Round((Select sum(Flow) from [Q_PRodA_Rank] where
Rank between b.Rank-1 and (b.Rank-5))/IIf([Rank]<5,Rank-1,5),0) AS
MovingAvg
FROM [Q_ProdA_Rank] AS b;

The problem is that you're using a nested subquery, and as far as I know (can't find the right site for the documentation at the moment), variable scope in subqueries is limited to the direct parent of the subquery. This means that for your nested query, b.TMDT is outside of the variable scope.
Edit: As this is an interesting problem, and a properly asked question, here is the full SQL answer. It's somewhat more complex than your attempt, but should run more efficiently.
It contains a nested subquery that first lists the 5 previous flows per TMDT and Prod, then averages them, and then joins the result to the actual query.
SELECT A.TMDT, A.Prod, B.MovingAverage
FROM MyTable AS A LEFT JOIN (
SELECT Prev.TMDT, Prev.Prod, Avg(Prev.Flow) AS MovingAverage
FROM (
SELECT JoinKeys.TMDT, JoinKeys.Prod, Top5.Flow
FROM MyTable AS JoinKeys INNER JOIN MyTable AS Top5 ON JoinKeys.Prod = Top5.Prod
WHERE Top5.TMDT IN (
SELECT TOP 5 T.TMDT FROM MyTable AS T WHERE T.Prod = JoinKeys.Prod AND T.TMDT < JoinKeys.TMDT ORDER BY T.TMDT DESC
)
) AS Prev
GROUP BY Prev.TMDT, Prev.Prod
) AS B
ON A.Prod = B.Prod AND A.TMDT = B.TMDT
While in my previous version I advocated a VBA approach, this is probably more efficient, only more difficult to write and adjust.
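For readers who want to experiment with the idea, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in for Access. The assumptions are mine: SQLite has no TOP, so LIMIT is used instead; the timestamps are rewritten as ISO strings so they sort correctly as text; and the function name previous_flow_averages and its window parameter are hypothetical. Rows with no earlier record for the same product do not appear in this result, which is why the full query wraps this step in a LEFT JOIN.

```python
import sqlite3

# Sample rows from the question, with timestamps rewritten in ISO format.
ROWS = [
    ("2017-08-21 00:01:00", "A", 100),
    ("2017-08-20 23:30:45", "A", 150),
    ("2017-08-20 22:00:15", "A", 200),
    ("2017-08-19 05:00:00", "B", 600),
    ("2017-08-17 00:00:00", "A", 300),
    ("2017-08-16 11:00:00", "A", 200),
    ("2017-08-15 10:00:31", "A", 50),
]


def previous_flow_averages(window=5):
    """Average the `window` most recent earlier flows per (TMDT, Prod)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (TMDT TEXT, Prod TEXT, Flow INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", ROWS)
    sql = """
        SELECT k.TMDT, k.Prod, AVG(p.Flow)
        FROM mytable AS k
        JOIN mytable AS p ON p.Prod = k.Prod
        WHERE p.TMDT IN (
            -- newest-first ordering, so LIMIT keeps the latest earlier rows
            SELECT t.TMDT FROM mytable AS t
            WHERE t.Prod = k.Prod AND t.TMDT < k.TMDT
            ORDER BY t.TMDT DESC
            LIMIT ?
        )
        GROUP BY k.TMDT, k.Prod
    """
    result = {(tmdt, prod): avg for tmdt, prod, avg in conn.execute(sql, (window,))}
    conn.close()
    return result


if __name__ == "__main__":
    for key, avg in sorted(previous_flow_averages().items(), reverse=True):
        print(key, avg)
```

Note that the subquery references only its direct parent (k), which sidesteps the scoping problem described at the start of this answer.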

Related

I would like to know if there is a better way to write this query (multiple joins of the same table)

here is the problem:
I have vehicles table in db (fields of this table are not so important), what's important is that each vehicle has a model_id, which refers to the vehicle_models table.
Vehicle models table has id, class, model, series, cm3hp, created_at and updated_at fields.
I need to define the stock age in terms of how many vehicles of a certain model class are in stock for the given criteria. The criteria are: 0-30 days, 31-60 days, 61-90 days... 360+ days...
I don't know if that is clear enough, so let me try to explain it better: for each day range I need to find the count of vehicles with the given model class. There are other criteria, but they are not important for what I am trying to find out. To help you better understand the problem, I'll include a screenshot of what the structure should look like:
I am using MySQL 8.
The query I wrote is:
SELECT DISTINCT vm.class,
IFNULL(t1.count, 0) as t1c,
IFNULL(t2.count, 0) as t2c,
IFNULL(t3.count, 0) as t3c,
IFNULL(t4.count, 0) as t4c,
IFNULL(t5.count, 0) as t5c,
IFNULL(t6.count, 0) as t6c,
IFNULL(t7.count, 0) as t7c
FROM vehicle_models vm
LEFT JOIN (
SELECT
vm.class as class,
count(*) as count
FROM a3s186jg7ffmm0q8.vehicles v
JOIN vehicle_models vm
ON vm.id = v.model_id
WHERE
DATEDIFF(IFNULL(v.retail_date, now()), v.wholesale_date) BETWEEN 0 AND 30
GROUP BY vm.class
) t1 ON t1.class = vm.class
*** MORE SAME LEFT JOINS ***
ORDER BY vm.class;
Now, this provides the desired results, but what I would like to know is whether there is a better way to write this query in terms of performance and code structure.
I guess you are presenting a report of inventory aging (how long a car sits on the dealer's lot before somebody buys it). You can put the age ranges in your top-level SELECT rather than putting each one in a separate subquery. That will make your query faster (subqueries have a cost) and shorter and easier to read.
Try something like this nested query. The inner query gives back one row per vehicle with its aging number. The outer query aggregates them.
SELECT class,
COUNT(*) total,
SUM(age BETWEEN 0 AND 30) t1c,
SUM(age BETWEEN 31 AND 60) t2c,
SUM(age BETWEEN 61 AND 90) t3c,
... etc ...
FROM (
SELECT vm.class,
DATEDIFF(IFNULL(v.retail_date, now()), v.wholesale_date) age
FROM a3s186jg7ffmm0q8.vehicles v
JOIN vehicle_models vm ON vm.id = v.model_id
) subq
GROUP BY class
ORDER BY class;
This SUM() trick works in MySQL because expressions like age BETWEEN 0 AND 30 have the value 1 when true and 0 when false.
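A minimal sketch of the SUM() trick, run against SQLite through Python's sqlite3 module rather than MySQL (an assumption on my part; SQLite also evaluates a comparison to 1 or 0). The table aged, the function age_buckets and the sample ages are hypothetical; the ages are supplied directly, standing in for the DATEDIFF subquery.

```python
import sqlite3


def age_buckets(rows):
    """Count vehicles per class and per age range in a single pass."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE aged (class TEXT, age INTEGER)")
    conn.executemany("INSERT INTO aged VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT class,
               COUNT(*) AS total,
               SUM(age BETWEEN 0 AND 30)  AS t1c,  -- each comparison is 1 or 0
               SUM(age BETWEEN 31 AND 60) AS t2c,
               SUM(age BETWEEN 61 AND 90) AS t3c
        FROM aged
        GROUP BY class
        ORDER BY class
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    sample = [("A", 5), ("A", 45), ("A", 50), ("B", 10), ("B", 400)]
    for row in age_buckets(sample):
        print(row)
```

A vehicle whose age falls outside every listed range (400 days here) still counts toward total but adds 0 to each bucket.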

SQL Capture duplicate records across two DIFFERENT columns

I am writing an Exception Catching Page using MySQL for catching duplicate billing entries in the following scenario.
Items details are entered in a table which has the following two columns (among others).
ItemCode VARCHAR(50), BillEntryDate DATE
It often happens that the same item's bill is entered multiple times, but over a period of a few days. For example:
"Football","2019-01-02"
"Basketball","2019-01-02"
...
...
"Football","2019-01-05"
"Rugby","2019-01-05"
...
"Handball","2019-01-05"
"Rugby","2019-01-07"
"Rugby","2019-01-10"
In the above example, the item Football is billed twice - first on 2Jan and again on 5Jan. Similarly, item Rugby is billed thrice on 5,7,10Jan.
I am looking to write simple SQL which can pickup each item [say, using distinct(ItemCode) clause], and then display all the records which are duplicates over a period of 30 days.
In the above case, the expected output should be the following 5 records:
"Football","2019-01-02"
"Football","2019-01-05"
"Rugby","2019-01-05"
"Rugby","2019-01-07"
"Rugby","2019-01-10"
I am trying to run the following SQL:
select * from tablen a, tablen b where a.ItemCode=b.ItemCode and a.BillEntryDate = b.BillEntryDate+30;
However, this seems to be highly inefficient as it is running for long without displaying any records.
Is there any possibility for getting a less complex and faster method?
I did explore existing topics (like How do I find duplicates across multiple columns?), but it is catching duplicates where BOTH columns have same value. My requirement is one column same value, and second column varying over a month-long date range.
You can use:
select t.*
from tablen t
where exists (select 1
from tablen t2
where t2.ItemCode = t.ItemCode and
t2.BillEntryDate <> t.BillEntryDate and
t2.BillEntryDate >= t.BillEntryDate - interval 30 day and t2.BillEntryDate <= t.BillEntryDate + interval 30 day
);
This will pick up both duplicates in the pair.
For performance, you want an index on (ItemCode, BillEntryDate).
With EXISTS:
select ItemCode, BillEntryDate
from tablename t
where exists (
select 1 from tablename
where
ItemCode = t.ItemCode
and
abs(datediff(BillEntryDate, t.BillEntryDate)) between 1 and 30
)
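The EXISTS approach can be tried out with a small sketch in Python's sqlite3 module. This is an illustration under assumptions, not the MySQL query itself: SQLite has no DATEDIFF, so the day difference is computed with julianday, and the function name find_repeat_bills is hypothetical. The sample rows are the ones shown in the question, plus one extra Handball row added to show that an entry more than 30 days away is not reported.

```python
import sqlite3

ROWS = [
    ("Football", "2019-01-02"),
    ("Basketball", "2019-01-02"),
    ("Football", "2019-01-05"),
    ("Rugby", "2019-01-05"),
    ("Handball", "2019-01-05"),
    ("Rugby", "2019-01-07"),
    ("Rugby", "2019-01-10"),
    ("Handball", "2019-03-15"),  # extra row, not in the question: 69 days later
]


def find_repeat_bills(rows, days=30):
    """Return rows that have another bill for the same item within `days`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablen (ItemCode TEXT, BillEntryDate TEXT)")
    # Composite index, as recommended in the answer.
    conn.execute("CREATE INDEX idx_item_date ON tablen (ItemCode, BillEntryDate)")
    conn.executemany("INSERT INTO tablen VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT t.ItemCode, t.BillEntryDate
        FROM tablen AS t
        WHERE EXISTS (
            SELECT 1 FROM tablen AS t2
            WHERE t2.ItemCode = t.ItemCode
              AND ABS(julianday(t2.BillEntryDate) - julianday(t.BillEntryDate))
                  BETWEEN 1 AND ?
        )
        ORDER BY t.ItemCode, t.BillEntryDate
        """,
        (days,),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in find_repeat_bills(ROWS):
        print(row)
```

The lower bound of 1 day is what keeps a row from matching itself.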

Subquery with max value in a big table SQL

I'm trying to make a query to get the date of last work experience of a person and also the date they left the company (in some cases that value is null because the person is still working on the company).
I have something like:
SELECT r.idcurriculum, r.startdate, r.lastdate FROM (
SELECT idcurriculum, max(startdate) as startdate
FROM workexperience
GROUP BY idcurriculum) as s
INNER JOIN workexperience r on (r.idcurriculum = s.idcurriculum)
The structure should come out something like this:
idcurriculum | startdate | lastdate
1234 | 2010-05-01| null
2532 | 2005-10-01| 2010-02-28
5234 | 2011-07-01| 2013-10-31
1025 | 2012-04-01| 2014-03-31
I tried running that query but had to stop it because it was taking too long. The workexperience table weighs approx. 20 GB. I don't know if the query is wrong; I only ran it for 10 minutes.
Help will be much appreciated.
You might try rephrasing the query as:
select we.*
from workexperience we
where not exists (select 1
from workexperience we2
where we2.idcurriculum = we.idcurriculum and
we2.startdate > we.startdate
);
Important: for performance reasons you need a composite index on idcurriculum, startdate:
create index idx_workexperience_idcurriculum_startdate on workexperience(idcurriculum, startdate)
The logic of the query is: "Get me all rows from workexperience where there is no row for the same idcurriculum that has a larger startdate". That is a fancy way of saying "get me the maximum".
With the group by, MySQL has to do an aggregation, which would typically involve sorting the data -- expensive on 20 Gbytes. With this method, it can look up the results using the index, which should be faster.
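To see the NOT EXISTS logic on a small scale, here is a sketch using Python's sqlite3 module in place of MySQL (my assumption; the query text is otherwise the same). The sample rows and the function name latest_experience are hypothetical, loosely modelled on the expected output in the question. It demonstrates the result only, not the performance on a 20 GB table.

```python
import sqlite3

ROWS = [
    (1234, "2008-01-01", "2010-04-30"),
    (1234, "2010-05-01", None),          # still working at the company
    (2532, "2001-03-01", "2005-09-30"),
    (2532, "2005-10-01", "2010-02-28"),
]


def latest_experience(rows):
    """Return, per idcurriculum, the row with the largest startdate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workexperience "
        "(idcurriculum INTEGER, startdate TEXT, lastdate TEXT)"
    )
    conn.execute(
        "CREATE INDEX idx_workexperience_idcurriculum_startdate "
        "ON workexperience (idcurriculum, startdate)"
    )
    conn.executemany("INSERT INTO workexperience VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT we.idcurriculum, we.startdate, we.lastdate
        FROM workexperience AS we
        WHERE NOT EXISTS (
            -- no row for the same person with a later start date
            SELECT 1 FROM workexperience AS we2
            WHERE we2.idcurriculum = we.idcurriculum
              AND we2.startdate > we.startdate
        )
        ORDER BY we.idcurriculum
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_experience(ROWS):
        print(row)
```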
As an alternative to Gordon's answer you could also write the query as:
SELECT we.*
FROM workexperience we
LEFT JOIN workexperience we2
ON we2.idcurriculum = we.idcurriculum
AND we2.startdate > we.startdate
WHERE we2.idcurriculum IS NULL;
You can, however, run into problems when several rows in a group share the maximum startdate: all of them are returned.
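The tie case is easy to reproduce. The sketch below runs the LEFT JOIN form against SQLite from Python (an assumption, standing in for MySQL) with hypothetical sample data in which one person has two rows sharing the latest startdate; the function name latest_by_anti_join is also hypothetical.

```python
import sqlite3

ROWS = [
    (1, "2009-01-01", "2010-04-30"),
    (1, "2010-05-01", "2011-01-01"),
    (1, "2010-05-01", None),  # same startdate as the row above: a tie
]


def latest_by_anti_join(rows):
    """Anti-join: keep rows for which no later row of the same person exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE workexperience "
        "(idcurriculum INTEGER, startdate TEXT, lastdate TEXT)"
    )
    conn.executemany("INSERT INTO workexperience VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT we.idcurriculum, we.startdate, we.lastdate
        FROM workexperience AS we
        LEFT JOIN workexperience AS we2
               ON we2.idcurriculum = we.idcurriculum
              AND we2.startdate > we.startdate
        WHERE we2.idcurriculum IS NULL
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Both rows that start on 2010-05-01 come back for idcurriculum 1.
    print(latest_by_anti_join(ROWS))
```

If exactly one row per person is required, some tie-breaker (for example a unique id column, if the table has one) would have to be added to the comparison.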

MySQL query for items where average price is less than X?

I'm stumped with how to do the following purely in MySQL, and I've resorted to taking my result set and manipulating it in ruby afterwards, which doesn't seem ideal.
Here's the question. With a dataset of 'items' like:
id state_id price issue_date listed
1 5 450 2011 1
1 5 455 2011 1
1 5 490 2011 1
1 5 510 2012 0
1 5 525 2012 1
...
I'm trying to get something like:
SELECT * FROM items
WHERE ([some conditions], e.g. issue_date >= 2011 and listed=1)
AND state_id = 5
GROUP BY id
HAVING AVG(price) <= 500
ORDER BY price DESC
LIMIT 25
Essentially I want to grab a "group" of items whose average price fall under a certain threshold. I know that my above example "group by" and "having" are not correct since it's just going to give the AVG(price) of that one item, which doesn't really make sense. I'm just trying to illustrate my desired result.
The important thing here is I want all of the individual items in my result set, I don't just want to see one row with the average price, total, etc.
Currently I'm just doing the above query without the HAVING AVG(price) and adding up the individual items one-by-one (in ruby) until I reach the desired average. It would be really great if I could figure out how to do this in SQL. Using subqueries or something clever like joining the table onto itself are certainly acceptable solutions if they work well! Thanks!
UPDATE: In response to Tudor's answer below, here are some clarifications. There is always going to be a target quantity in addition to the target average. And we would always sort the results by price low to high, and by date.
So if we did have 10 items that were all priced at $5 and we wanted to find 5 items with an average < $6, we'd simply return the first 5 items. We wouldn't return the first one only, and we wouldn't return the first 3 grouped with the last 2. That's essentially how my code in ruby is working right now.
I would do almost an inverse of what Jasper provided... Start your query with your criteria to explicitly limit the few items that MAY qualify instead of getting all items and running a sub-select on each entry. Could pose as a larger performance hit... could be wrong, but here's my offering..
select
i2.*
from
( SELECT i.id
FROM items i
WHERE
i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500 ) PreQualify
JOIN items i2
on PreQualify.id = i2.id
AND i2.issue_date > 2011
AND i2.listed = 1
AND i2.state_id = 5
order by
i2.price desc
limit
25
Not sure of the order by, especially if you wanted grouping by item... In addition, I would ensure an index on (state_id, Listed, id, issue_date)
CLARIFICATION per comments
I think I AM correct on it. Don't confuse the HAVING clause with WHERE. WHERE decides which rows are included, before any grouping happens. HAVING is checked after the WHERE filtering and the grouping are done: at that point each group is only a POTENTIAL result, and it is included in the result set only if it also satisfies the HAVING condition; otherwise it is thrown out. Try the following INNER query alone, once WITHOUT the HAVING clause and then again WITH it:
SELECT i.id, avg( i.price )
FROM items i
WHERE i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500
As you get more into writing queries, try the parts individually to see what you are getting vs what you are thinking... You'll find how / why certain things work. In addition, you are now talking in your updated question about getting multiple IDs and prices at apparent low and high range... yet you are also applying a limit. If you had 20 items, and each had 10 qualifying records, your limit of 25 would show all of the first item and 5 into the second... which is NOT what I think you want... you may want 25 of each qualified "id". That would wrap this query into yet another level...
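The experiment suggested in this answer can be sketched as follows, using Python's sqlite3 module in place of MySQL (an assumption). The sample rows are hypothetical: unlike the question's sample, they use several distinct id values so that the grouping has something to separate. The function name demo is hypothetical as well.

```python
import sqlite3

ROWS = [  # (id, state_id, price, issue_date, listed)
    (1, 5, 450, 2012, 1),
    (1, 5, 455, 2012, 1),
    (1, 5, 490, 2012, 1),
    (1, 5, 510, 2012, 0),  # not listed: removed by WHERE
    (1, 5, 525, 2012, 1),
    (2, 5, 600, 2012, 1),
    (2, 5, 700, 2012, 1),
    (3, 5, 100, 2010, 1),  # too old: removed by WHERE
]

INNER = """
    SELECT i.id AS id, AVG(i.price) AS avg_price
    FROM items AS i
    WHERE i.issue_date > 2011 AND i.listed = 1 AND i.state_id = 5
    GROUP BY i.id
"""
HAVING = " HAVING AVG(i.price) <= 500"


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER, state_id INTEGER, price INTEGER, "
        "issue_date INTEGER, listed INTEGER)"
    )
    conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", ROWS)
    # Step 1: groups that survive WHERE, before HAVING is applied.
    without_having = conn.execute(INNER + " ORDER BY id").fetchall()
    # Step 2: the same groups after HAVING has discarded the expensive ones.
    with_having = conn.execute(INNER + HAVING + " ORDER BY id").fetchall()
    # Step 3: join back to get the individual items of the qualifying ids.
    detail = conn.execute(
        "SELECT i2.id, i2.price FROM (" + INNER + HAVING + ") AS PreQualify "
        "JOIN items AS i2 ON PreQualify.id = i2.id "
        "AND i2.issue_date > 2011 AND i2.listed = 1 AND i2.state_id = 5 "
        "ORDER BY i2.price DESC LIMIT 25"
    ).fetchall()
    conn.close()
    return without_having, with_having, detail


if __name__ == "__main__":
    for part in demo():
        print(part)
```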
What MySQL does makes perfect sense. What you want to do does not:
if you have, let's say, 4 items, each with a price of 5, and you put HAVING AVERAGE <= 7, what you are saying is that the query should return ALL the combinations, like:
{1} - since item with id 1, can be a group by itself
{1,2}
{1,3}
{1,4}
{1,2,3}
{1,2,4}
...
and so on?
Your algorithm for computing the average in Ruby is also not valid: if you have items with values 5, 1, 7, 10 and seek an average value of less than 7, the element with value 10 can be returned only in a group with the element of value 1. But by your algorithm (if I understood correctly), the element with value 1 is returned in the first group.
Update
What you want is something like the Knapsack problem and your approach is using some kind of Greedy Algorithm to solve it. I don't think there are straight, easy and correct ways to implement that in SQL.
After a google search, I found this article which tries to solve the knapsack problem with AI written in SQL.
By considering your item price as a weight, having the number of items and the desired average, you could compute the maximum value that can be entered in the 'knapsack' by multiplying desired_cost with number_of_items
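The capacity calculation can be written out in a few lines of Python. This is only a sketch of the arithmetic, under the clarification in the question that there is always a target quantity and that items are taken cheapest first; the function name cheapest_selection is hypothetical, and "average at most the target" is assumed as the condition.

```python
def cheapest_selection(prices, quantity, target_avg):
    """Pick the `quantity` cheapest prices if their average is <= target_avg.

    A set of `quantity` items has an average <= target_avg exactly when its
    total is <= target_avg * quantity, which is the knapsack "capacity".
    The cheapest items have the smallest possible total, so if they do not
    fit, no other selection of that size can.
    """
    capacity = target_avg * quantity
    chosen = sorted(prices)[:quantity]
    if len(chosen) < quantity or sum(chosen) > capacity:
        return None
    return chosen


if __name__ == "__main__":
    # Ten items at $5, five wanted, average at most $6: the first five.
    print(cheapest_selection([5] * 10, 5, 6))
    # Values from the counterexample: 5, 1, 7, 10 with a target of 7.
    print(cheapest_selection([5, 1, 7, 10], 2, 7))
```

This covers only the cheapest-first reading of the problem; choosing an arbitrary subset that maximizes some other value under the same capacity is the harder knapsack variant.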
I'm not entirely sure from your question, but I think this is a solution to your problem:
SELECT * FROM items
WHERE (some "conditions", e.g. issue_date > 2011 and listed=1)
AND state_id = 5
AND id IN (SELECT id
FROM items
GROUP BY id
HAVING AVG(price) <= 500)
ORDER BY price DESC
LIMIT 25
note: This is off the top of my head and I haven't done complex SQL in a while, so it might be wrong. I think this or something like it should work, though.

Very complex Group By / Unique / Limit by SQL-command

I actually don't even know how to call this :P, but...
I have one table, let's call it "uploads"
id owner date
-----------------------------
0 foo 20100101120000
1 bar 20100101120300
2 foo 20100101120400
3 bar 20100101120600
.. .. ..
6 foo 20100101120800
Now, when I do something like:
SELECT id FROM uploads ORDER BY date DESC
This would result in:
id owner date
-----------------------------
6 foo 20100101120800
.. .. ..
3 bar 20100101120600
2 foo 20100101120400
1 bar 20100101120300
0 foo 20100101120000
Question: Nice, but I want to go even further. Because now, when you build a timeline (and I did :P), you are 'spammed' by messages saying foo and bar uploaded something. I'd like to group them and return the first result with a time limit of '500' on the date field.
What kind of SQL-command do I need that would result in:
id owner date
-----------------------------
6 foo 20100101120800
3 bar 20100101120600
0 foo 20100101120000
Then, after that, I can perform a call for each record to get the associated records in a timeframe of 5 minutes (this is an example for id=6):
SELECT id FROM uploads WHERE date>=20100101120800-500 ORDER BY date DESC
Does anyone know how I should do the first step? (so limiting/grouping the results)
(btw. I know that when I want to use this, I should convert every date (YmdHis=60) to Unix-time (=100), but I don't need the 5 minutes to be exactly 5 minutes, they may be a minute less sometimes...)
I'm not quite clear on the result you are trying to get, even with your examples. Perhaps something with rounding and group by.
SELECT max(id) max_id,owner, (ROUND(date/500)*500) date_interval, max(date) date
FROM uploads GROUP BY date_interval,owner
You may want to use FLOOR or CEILING instead of ROUND, depending on what you want.
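A sketch of this bucketing, run against SQLite from Python (an assumption; the answer does not name a DBMS). It uses integer division, which behaves like the FLOOR variant, and the function name bucketed_uploads is hypothetical. Only the rows shown in the question are loaded; the elided ones are left out.

```python
import sqlite3

ROWS = [  # (id, owner, date) as shown in the question
    (0, "foo", 20100101120000),
    (1, "bar", 20100101120300),
    (2, "foo", 20100101120400),
    (3, "bar", 20100101120600),
    (6, "foo", 20100101120800),
]


def bucketed_uploads(rows, width=500):
    """Group uploads per owner into fixed buckets of `width` date units."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE uploads (id INTEGER, owner TEXT, date INTEGER)")
    conn.executemany("INSERT INTO uploads VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT MAX(id) AS max_id,
               owner,
               (date / ?) * ? AS date_interval,  -- integer division = FLOOR
               MAX(date) AS latest
        FROM uploads
        GROUP BY (date / ?) * ?, owner
        ORDER BY date_interval DESC, owner
        """,
        (width, width, width, width),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in bucketed_uploads(ROWS):
        print(row)
```

With this data the fixed buckets yield four rows rather than the three in the question, because foo's uploads at 120400 and 120800 fall on different sides of the 120500 boundary.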
Standard SQL doesn't deal with intervals very well.
You are going to need to do a self-join of the table to compare dates of different tuples.
That way, you can easily find all pairs of tuples of which the dates are no more than 500 apart.
However, you really want to cluster the dates in sets no more than 500 apart - and that can't be expressed in SQL at all, as far as I know.
What you can do is something quite similar: split the total time interval into fixed 500-unit ranges, and then cluster all tuples in the table based on the interval they're in. For that, you first need a table or query result with the start times of the intervals; this can be created using a SQL query on your table and a function that either "rounds off" a timestamp to the starting time in its interval, or computes its interval sequence number. Then as a second step you can join the table with that result to group its timestamps according to their corresponding start time. I can't give the SQL because it's DBMS-dependent, and I certainly can't tell you if this is the best way of accomplishing what you want in your situation.
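If the clustering is done in application code instead of SQL, the output in the question can be reproduced by anchoring each group at its newest row. The following Python sketch is one possible reading of the requirement, not something taken from the answers: the function name collapse_timeline is hypothetical, and a row is assumed to join the current group of its owner when its date is no more than 500 units older than that group's newest row.

```python
def collapse_timeline(rows, window=500):
    """Keep one row per owner and cluster, walking from newest to oldest.

    rows: iterable of (id, owner, date) tuples.
    A row starts a new cluster when it is more than `window` date units
    older than the newest row of that owner's current cluster.
    """
    anchors = {}  # owner -> date of the newest row in the current cluster
    kept = []
    for row in sorted(rows, key=lambda r: r[2], reverse=True):
        _id, owner, date = row
        anchor = anchors.get(owner)
        if anchor is None or date < anchor - window:
            anchors[owner] = date
            kept.append(row)
    return kept


if __name__ == "__main__":
    uploads = [
        (0, "foo", 20100101120000),
        (1, "bar", 20100101120300),
        (2, "foo", 20100101120400),
        (3, "bar", 20100101120600),
        (6, "foo", 20100101120800),
    ]
    for row in collapse_timeline(uploads):
        print(row)
```

Because the comparison is done on the raw YmdHis numbers, the 500-unit window is only approximately 5 minutes, which the question says is acceptable.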
Use an inline view? e.g. something like
SELECT u1.*
FROM uploads u1,
(SELECT date
FROM uploads u2
WHERE u2.owner='foo') datum_points
WHERE u1.date BETWEEN datum_points.date
AND DATE_ADD(datum_points.date, INTERVAL 5 MINUTE)
should return all the posts made within 5 minutes of 'foo' making a post.