MySQL: Querying time-related rows grouped by century?

I am working on a project which features one database table looking like this (structurally, but not data-wise, phew):
year | event | category
------------------------------------------------------------
1970 | Someone ate a cheeseburger | food
2010 | Justin bieber was discovered | other
1500 | Columbus makes 3rd trip to America | notable
------------------------------------------------------------
How would I query this table, so that my result is grouped in a per-century way?
2000-century:
2010 - Justin bieber was discovered
1900-century:
1970 - Someone ate a cheeseburger
1500-century:
1500 - Columbus makes 3rd trip to America
Sorry for the cheesy pseudodata :)

MySQL doesn't have a century function, but it does have year, so you basically do:
SELECT whatever
FROM yourtable
WHERE ...
GROUP BY FLOOR(Year(datetimefield) / 100)
Of course, this doesn't take into account that centuries officially start on their year "1", and not on the year "0" (e.g. 1900 is still in the 19th century, not the 20th). But if you're not a stickler for precision, the simple "divide by 100" method is the quickest/easiest.

SELECT FLOOR(Year(datetimefield) / 100) AS century, someotherfields
FROM yourtable
WHERE ...
ORDER BY datetimefield
This is a better approach to the question, I think.
(pseudo code borrowed from Marc B)
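For the sample table in the question, where the year is stored as a plain integer rather than a datetime, the same idea can be sketched as below (the table name events is an assumption; adjust the names to your schema). GROUP_CONCAT collapses each century's rows into a single list:
-- Sketch only: "events" and its columns mirror the sample data above.
SELECT
  (`year` DIV 100) * 100 AS century_start,  -- 2010 -> 2000, 1970 -> 1900, 1500 -> 1500
  GROUP_CONCAT(CONCAT(`year`, ' - ', `event`)
               ORDER BY `year`
               SEPARATOR '; ') AS events_in_century
FROM events
GROUP BY century_start
ORDER BY century_start DESC;
The century_start column reproduces the "2000-century" style labels from the question; splitting each list back into one line per event would then be done in the presentation layer.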

Related

How to fit autocorrelation structures in a diving glmer model vs nlme and glmmTMB models?

I want to compare diving behaviour between penguin populations breeding at different locations using generalized linear mixed effects models (glmer). The data looks like this:
# A tibble: 12 × 7
# Groups:   ID [1]
   ID               begdesc              bout maxdep island phase diurnal
   <chr>            <dttm>              <dbl>  <dbl> <chr>  <chr> <chr>
 1 Nelson_Bro1_HP01 2019-01-01 19:53:11     1  52.0  Nelson Chick day
 2 Nelson_Bro1_HP01 2019-01-01 20:08:18     2   5.34 Nelson Chick day
 3 Nelson_Bro1_HP01 2019-01-01 20:14:39     2  52.0  Nelson Chick day
 4 Nelson_Bro1_HP01 2019-01-01 20:24:46     3  64.1  Nelson Chick day
 5 Nelson_Bro1_HP01 2019-01-01 20:28:44     3  75.5  Nelson Chick day
 6 Nelson_Bro1_HP01 2019-01-01 20:39:44     4  68.5  Nelson Chick day
 7 Nelson_Bro1_HP01 2019-01-01 20:46:58     4  62.8  Nelson Chick day
 8 Nelson_Bro1_HP01 2019-01-01 20:52:19     4  62.0  Nelson Chick day
 9 Nelson_Bro1_HP01 2019-01-01 20:56:43     4  60.7  Nelson Chick day
10 Nelson_Bro1_HP01 2019-01-01 20:59:56     4  62.1  Nelson Chick day
11 Nelson_Bro1_HP01 2019-01-01 21:04:05     4  62.8  Nelson Chick day
12 Nelson_Bro1_HP01 2019-01-01 21:10:05     4  51.5  Nelson Chick day
ID = individual penguin, begdesc = date and time of dive, bout = bout number, maxdep = dive depth, island = breeding location, phase = breeding phase (incubation or chick-rearing), and diurnal = day/night.
The response variable is maximum dive depth and the fixed effects are Island ('Kop' or 'Nel') and Diurnal (Day or Night). I include the ID of each penguin as a random effect to account for the repeated measures. The diving depths have a Gamma error distribution and each row represents a single dive. The times of dives are irregularly spaced. I used the following syntax for the generalized linear mixed models.
glmer(maxdepth ~ island + diurnal + (1|ID), data = dive.stats, family = Gamma(link = "log"))
I realized that the dives closer in time are highly autocorrelated.
ACF plot without an important time variable
So now I also have a variable 'bout' that groups dives that are close together in time into the same bout, and dives that are separated by more than 5 minutes into a different bout, so that the timing of dives is accounted for. I account for repeated measures of bouts within individuals as a nested random effect:
glmer(maxdepth ~ island + diurnal + (1|ID/bout), data = dive.stats, family = Gamma(link = "log"))
Once I account for bouts of dives within individuals, the ACF plot improved a lot.
ACF plot with a variable that accounts for time.
But I still want to include an autocorrelation structure in the glmer model to improve the model estimates. I have looked online for how to include autocorrelation structures in a generalized linear mixed effects model, but I can't find any good answers. Has anyone been able to do this before?
Another option I explored is to use 'lme' instead, where I can add autocorrelation structures easily, but that also means I can't specify that the data are Gamma distributed. I would then need to log-transform the diving depth variable so that the errors are normally distributed, which has been discouraged before because interpretation is not easy with transformed data. The ACF plots look even better when I use 'lme' compared to the glmer models.
ACF plot using lme
I've also tried 'glmmTMB', where I can specify the Gamma (link: log) error distribution, but I am struggling to figure out how to include the AR1 correlation structure. Can I specify 'bout' as the time factor needed for the ar1 correlation structure, since it groups dives within 5 minutes of each other into the same bout? Or should I try to include another time variable, e.g. the hour when the dive took place?
According to this vignette, if I have irregularly spaced dives I should rather use the ou (Ornstein-Uhlenbeck) correlation structure, but then I need to include the locations of dives?
I would appreciate any suggestions you might have!

How to set up crosstab queries to count days for negative stock counts?

Hello Stack Overflow (and anyone googling similar questions in the future)!
I have a dataset that regularly reports which products are absent on a warehouse stock check, which I am trying to use to analyse when stock is or isn't available. I'm essentially trying to identify: "Has a part been reported as missing? If so, count the number of days it is missing until another part in the same category is reported as missing but the original part is not reported as missing on that date (at which point we can assume it's back in stock)."
I've managed to make this work in Excel, but my spreadsheet began to die from the calculations for 5 locations' worth of categories and parts, let alone the 600+ I'm working on! As a result, I'm trying to set up a similar function in Access to analyse which parts were out of stock, and for how long.
My dataset looks something like:
Location number | Location   | Category | Date reported | Part Number | Part Description | Order number
1               | London     | Car      | 03/06/2021    | 2021        | Wheel A          | 1
2               | London     | Bus      | 03/06/2021    | 1491        | Seat C           | 2
3               | Manchester | Car      | 01/06/2021    | 2021        | Wheel A          | 3
My assumptions are that:-
• My data is fed by individual workers who each cover a location, and check all stock for a random selection of categories each visit (with the idea that they cover all of their location’s categories within a certain number of visits) and record which parts are missing. There is no particular visit plan – it can be a random number of days between each visit. This data gets fed into a central table, which I have access to.
• As my workers may not check all categories in a location on each visit, I must assume that a previously reported missing part is out of stock until they check products in the same category but do not report that part again.
I made this work in Excel by setting up another column that concatenated my location, part number, and date reported, and then setting up three tables (all of which essentially have locations, categories, and parts down one axis, and dates across the other):-
• Table1 looks at whether my concatenated code was reported on each day (and if so, outputs 1) – essentially, was each part reported as missing, for each category and location, on each date?
• Table2 looks at whether any parts were reported for each category, for each location – i.e. how many parts were reported for each category at each location; a value greater than 0 means we can assume that that category at that location was checked by my workers on that date.
• Table3 asks, as a formula for each location+category+day: IF(the category was checked that day as per Table2, pull the value of 1 for that part/location/category from Table1, otherwise re-use yesterday's value for this part/location/category in this table). For the 1st day in my date range, I used the Table1 values for that day as a starting point.
When I look at Table3, I can visually see the run of days products were out of stock, and from there I can crunch numbers related to that, which is what I want!
My initial Access plan was to set up three crosstab queries to mirror my three Excel tables. I can make Table1 and Table2 very easily, but for the life of me can't make Table3 work (I currently have a calculated expression that mirrors the formula I had in Table3, but something has gone amiss…).
I’m looking for a steer/advice on setting up the expression in my crosstab query, or other ideas/approaches I could use to calculate how long each part is missing for. Any help would be greatly appreciated, as I’ve lost my mind going in circles today!
Edit:-
Simplified dataset I'm working with:-
Location number | Location   | Category | Date Reported | Part number | Part Description | Order number | Concatenate code | Concatenate Code 2
1               | London     | Car      | 03/06/2021    | 2021        | Wheel            | 1            | 1443502021       |
1               | London     | Bus      | 03/06/2021    | 1491        | Seat             | 2            | 1443501491       |
2               | Manchester | Car      | 05/06/2021    | 2021        | Wheel            | 3            | 2443522021       |
1               | London     | Car      | 05/06/2021    | 2021        | Wheel            | 4            | 1443522021       |
1               | London     | Car      | 07/06/2021    | 2021        | Wheel            | 5            | 1443542021       |
1               | London     | Bus      | 05/06/2021    | 1860        | Seatbelt         | 6            | 1443521860       |
1               | London     | Bus      | 05/06/2021    | 1860        | Seatbelt         | 7            | 1443521860       |
2               | manchester | Bus      | 01/06/2021    | 1860        | Seatbelt         | 8            | 2443481860       |
2               | Manchester | Bus      | 06/06/2021    | 1860        | Seatbelt         | 9            | 2443531860       |
2               | manchester | Bus      | 04/06/2021    | 1491        | Seat             | 10           | 2443511491       |
2               | Manchester | Bus      | 06/06/2021    | 1491        | Seat             | 11           | 2443531491       |
I'm trying to output something like (which I've made work in Excel):-
Location number | Location   | Category | Part code | 01/06/2021 | 02/06/2021 | 03/06/2021 | 04/06/2021 | 05/06/2021 | 06/06/2021 | 07/06/2021
1               | London     | Car      | 2021      |            |            | 1          | 1          | 1          | 1          |
1               | London     | Car      | 2626      |            |            |            |            |            |            |
1               | London     | Bus      | 1491      |            |            | 1          | 1          |            |            |
1               | London     | Bus      | 1860      |            |            |            |            | 1          | 1          |
2               | Manchester | Car      | 2021      |            |            |            |            | 1          | 1          |
2               | Manchester | Car      | 2626      |            |            |            |            |            |            |
2               | Manchester | Bus      | 1491      |            |            |            | 1          | 1          | 1          |
2               | Manchester | Bus      | 1860      | 1          | 1          | 1          |            |            | 1          |
3               | Liverpool  | Car      | 2021      |            |            |            |            |            |            |
3               | Liverpool  | Car      | 2626      |            |            |            |            |            |            |
3               | Liverpool  | Bus      | 1491      |            |            |            |            |            |            |
Or, alternatively, to return how many consecutive days a part has been out of stock as of each day, like this version:-
Location number | Location   | Category | Part code | 01/06/2021 | 02/06/2021 | 03/06/2021 | 04/06/2021 | 05/06/2021 | 06/06/2021 | 07/06/2021
1               | London     | Car      | 2021      |            |            | 1          | 2          | 3          | 4          |
1               | London     | Car      | 2626      |            |            |            |            |            |            |
1               | London     | Bus      | 1491      |            |            | 1          | 2          |            |            |
1               | London     | Bus      | 1860      |            |            |            |            | 1          | 2          |
2               | Manchester | Car      | 2021      |            |            |            |            | 1          | 2          |
2               | Manchester | Car      | 2626      |            |            |            |            |            |            |
2               | Manchester | Bus      | 1491      |            |            |            | 1          | 2          | 3          |
2               | Manchester | Bus      | 1860      | 1          | 2          | 3          |            |            | 1          |
3               | Liverpool  | Car      | 2021      |            |            |            |            |            |            |
3               | Liverpool  | Car      | 2626      |            |            |            |            |            |            |
3               | Liverpool  | Bus      | 1491      |            |            |            |            |            |            |
My Access SQL (which I then turned into a crosstab) to identify ordered parts per day:
SELECT DISTINCT T_stores.[Store Nos], T_stores.[Store Name], t_Stands.Brand, t_Productlookup.TPND, t_Productlookup.TITLE, t_gapdata.Quantity, t_gapdata.[Requested Date]
FROM ((T_stores
INNER JOIN t_Stands ON T_stores.[Store Nos] = t_Stands.[Store Nos])
INNER JOIN t_gapdata ON (t_Stands.[Brand] = t_gapdata.[Brand]) AND (t_Stands.[Store Nos] = t_gapdata.[Store No]))
INNER JOIN t_Productlookup ON t_gapdata.[Part Number] = t_Productlookup.[EAN];
And likewise, to identify if parts were ordered for a location's category:-
SELECT DISTINCT T_stores.[Store Nos], T_stores.[Store Name], t_Stands.Brand, t_Productlookup.TPND, t_Productlookup.TITLE, t_gapdata.Quantity, t_gapdata.[Requested Date]
FROM ((T_stores
INNER JOIN t_Stands ON T_stores.[Store Nos] = t_Stands.[Store Nos])
INNER JOIN t_gapdata ON (t_Stands.[Brand] = t_gapdata.[Brand]) AND (t_Stands.[Store Nos] = t_gapdata.[Store No]))
INNER JOIN t_Productlookup ON t_gapdata.[Part Number] = t_Productlookup.[EAN];
These first two work fine, but I'm struggling to put them together with some sort of Iif calculated field for a third query:-
SELECT First(q_gaps_per_product.[Store Nos]) AS [FirstOfStore Nos],
       First(q_gaps_per_product.[Store Name]) AS [FirstOfStore Name],
       First(q_gaps_per_product.Brand) AS FirstOfBrand,
       First(q_gaps_per_brand_store.[Order Id]) AS [FirstOfOrder Id],
       First(q_gaps_per_product.TPND) AS FirstOfTPND,
       First(q_gaps_per_product.TITLE) AS FirstOfTITLE,
       First(q_gaps_per_product.[Requested Date]) AS [FirstOfRequested Date],
       First(IIf([q_gaps_per_brand_store]![Requested Date]>=[q_gaps_per_product]![Requested Date],[Quantity],"PREVIOUS DAY")) AS Expr1,
       [q_gaps_per_product]![Store Nos] & [q_gaps_per_product]![Quantity] & [q_gaps_per_product]![TPND] AS Expr2
FROM q_gaps_per_product
LEFT JOIN q_gaps_per_brand_store
       ON q_gaps_per_product.[Brand] = q_gaps_per_brand_store.[Brand]
GROUP BY [q_gaps_per_product]![Store Nos] & [q_gaps_per_product]![Quantity] & [q_gaps_per_product]![TPND];
Expr1 is supposed to be how many days a product is out of stock, with the idea that "PREVIOUS DAY" would return the same criteria for the previous day, to show either running gaps or that a product was in fact available as a 0, but I haven't got that far yet.
Expr2 is basically something I tried to make up to group the results by, as I had an insane number of results due to my janky table relationships.
I sort of think this query is DOA, and that I need to go back to the drawing board to reproduce something like my Excel tables and work out how many consecutive days products have been out of stock.
Sorry for the sheer storm of words!
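One way to express the rule described above directly in Access SQL, rather than carrying values forward day by day, is sketched below: for every "missing" report, find the first later date on which the same location and category was checked again but that part was not reported, and treat that as the date the part came back into stock. The table and field names here (tblReports, [Location number], [Category], [Part Number], [Date reported]) are placeholders based on the dataset described above, not the real schema, and Access's SQL view doesn't accept comments, so the explanation stays in prose:
SELECT r.[Location number], r.[Category], r.[Part Number], r.[Date reported],
       (SELECT MIN(c.[Date reported])
          FROM tblReports AS c
         WHERE c.[Location number] = r.[Location number]
           AND c.[Category] = r.[Category]
           AND c.[Date reported] > r.[Date reported]
           AND NOT EXISTS
               (SELECT *
                  FROM tblReports AS p
                 WHERE p.[Location number] = c.[Location number]
                   AND p.[Category] = c.[Category]
                   AND p.[Part Number] = r.[Part Number]
                   AND p.[Date reported] = c.[Date reported])) AS BackInStock
FROM tblReports AS r;
A second query over this one could then return DateDiff("d", [Date reported], Nz([BackInStock], Date())) as the number of days out of stock; consecutive reports of the same part would still need de-duplicating, so treat this as a starting point rather than a finished solution.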

MS Access Report for multiple queries

I want to create a report in Access that shows a sales forecast in 3 subreports (the next 30 days, 31-60 days and 61-90 days). I haven't been able to figure out how to create a report like that. I have 3 queries, one for each date range. I would like to group the data for each salesperson showing company, amount and date. Something like below.
Can I do it using subforms or is there a better way? Thanks.
30 days
Bob Smith
Acme $3000 10/31/2019
Detla $2000 11/12/2019
60 days
Julie Brown
Bigly $4000 12/05/2019
90 days
Bob Smith
Acme $2500 01/10/2020
Julie Brown
Bigly $3300 01/12/2020
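One possible alternative to three separate subreports, sketched below under assumed names (qrySalesForecast, Salesperson, Company, Amount and ForecastDate are placeholders for the actual queries and fields), is a single UNION query that labels each row with its date bucket:
SELECT Salesperson, Company, Amount, ForecastDate, "0-30 days" AS Bucket
FROM qrySalesForecast
WHERE ForecastDate BETWEEN Date() AND Date()+30
UNION ALL
SELECT Salesperson, Company, Amount, ForecastDate, "31-60 days"
FROM qrySalesForecast
WHERE ForecastDate BETWEEN Date()+31 AND Date()+60
UNION ALL
SELECT Salesperson, Company, Amount, ForecastDate, "61-90 days"
FROM qrySalesForecast
WHERE ForecastDate BETWEEN Date()+61 AND Date()+90;
With that as the record source, one report with grouping levels on Bucket and then Salesperson should produce the layout sketched above without needing subreports at all.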

How to group by two fields and concatenate information from a third field in a query

I want to group data in ONE table by two columns (numbers) and concatenate the information from a third column (short text) for the grouped results.
I am a non-programming end user of Access with only a little experience in SQL and no experience in VBA, but I managed to follow the steps in this other question, which already gets me half way to solving my own problem:
Concatenate multiple rows in one field in Access?
That's why I will reuse the example data used in that post.
One probably just needs to tweak the code slightly for everything to work as intended.
The slightly changed example data from a referenced question looks like this:
Table "YourTable"
Year | Order Number | Product Types
2014 | 100001 | TV
2014 | 100001 | Phone
2016 | 100001 | Internet
2014 | 100002 | Phone
2014 | 100002 | Phone
2014 | 100003 | TV
2014 | 100003 | Internet
2015 | 100003 | Phone
2015 | 100003 | Notebook
For each available combination of Year and Order Number I want all corresponding differing entries in the column "Product Types" listed, separated by a slash or semicolon.
To do this with only one column to group by (Order Number), you can find the solution under the above-linked question in the answer by HansUp:
https://stackoverflow.com/a/12428291/3954188
He uses the function "Concatenate values from related records" provided by Allen Browne to achieve the desired result and also provides the final query as an example. Everything works fine for grouping by one column using these resources.
How would I modify the query to get it working the way I'd like it to, or is this impossible and in need of another solution?
Please post the modified function code and/or query if possible. I managed to implement the function and the example solution from the other question, but I'm not well versed in using SQL or VBA.
(I'm using Win 7, 64-bit and MS Office 2013)
Include Year and Order Number in your query's GROUP BY. Then you want to concatenate the Product Types values within each of those groups.
I stored your sample data in an Access 2010 table named YourTable. With those data, this is the output from the query below ...
Year Order Number Expr1
---- ------------ --------------
2014 100001 Phone;TV
2014 100002 Phone;Phone
2014 100003 Internet;TV
2015 100003 Notebook;Phone
2016 100001 Internet
SELECT
y.Year,
y.[Order Number],
ConcatRelated
(
'[Product Types]',
'YourTable',
'[Year]=' & y.Year & ' AND [Order Number]=' & y.[Order Number],
'[Product Types]',
';'
) AS Expr1
FROM YourTable AS y
GROUP BY
y.Year,
y.[Order Number];
You can also get this in MySQL using GROUP_CONCAT:
SELECT `Year`, `Order Number`, GROUP_CONCAT(`Product Types` SEPARATOR ';')
FROM YourTable
GROUP BY `Year`, `Order Number`;

How to construct price time series from price changes using MS-Access 2007 queries?

Background
I've got a table of price changes (tblPriceChanges) for various items (tblItems); example below, with made-up data:
[tblPriceChanges]:
Timestamp Item Price
9AM 01/01/2013 Orange 50p
9AM 01/01/2013 Apple 30p
2PM 01/01/2013 Pen 80p
2PM 02/01/2013 Orange 55p
2PM 02/01/2013 Pen 85p
9AM 03/01/2013 Apple 25p
9AM 05/01/2013 Pencil 10p
9AM 05/01/2013 Pen 70p
2PM 05/01/2013 Pencil 15p <- Notice there can be multiple price changes on the same day
...
[tblItems]:
Item Category Ratio
Orange Fruit 1
Apple Fruit 3
Pen Stationary 2
Pencil Stationary 5
...
Problem
The end result is that I want to be able to see how the average price of each category changes through time, charted as one series per category.
Specifically, the average price series of Fruit, for instance, should be calculated as the weighted average of Orange and Apple prices in a ratio of 1:3. So in the end I'm looking to generate (via some combination of queries) the following table as the underlying data for the chart:
Timestamp Fruit Stationary
01/01/2013 40.0 80.0
02/01/2013 55.0 85.0
03/01/2013 50.0 85.0
04/01/2013 50.0 85.0
05/01/2013 50.0 75.0
...
(this data is also made up and so probably not consistent with the original example)
I've managed to get an inner join on the two tables, but I'm not too sure how to proceed. My main problem is how to handle days with no price changes, such as 04/01/2013: the average prices still exist on those days, but they don't get picked up by any query I try.
So how to use queries to construct data for the chart?
One remark before we start on a solution: be careful not to use reserved words for field names. Timestamp is a reserved word. Access will let you name a column that way, but you may encounter strange issues later on, especially if you ever move to another database in the future or use some other tools that fetch data from your Access database.
So here, I renamed your Timestamp column to DateTimeStamp.
I think that your requirements would be more straightforward to implement in VBA rather than in pure SQL queries: while you can easily build a query that gets you the average of each category per day, you are going to struggle to fill in the data for the days where you have no data.
Simple but incomplete SQL solution
However, maybe having these holes in your data isn't that much of an issue since the graph will simply skip those missing values (it's not like they would show up as a data value of 0.00).
In that case, the following query should give you the results:
SELECT Dateserial(Year([DateTimeStamp]),
Month([DateTimeStamp]),
Day([DateTimeStamp])) AS NormalisedDate,
tblItems.Category,
SUM([Price]*[Ratio])/SUM([Ratio]) AS AvgPrice
FROM tblPriceChanges
INNER JOIN tblItems
ON tblPriceChanges.Item=tblItems.Item
GROUP BY Dateserial(Year([DateTimeStamp]),
Month([DateTimeStamp]),
Day([DateTimeStamp])),
tblItems.Category
This would result in the following:
NormalisedDate Category AvgPrice
01/01/2013 Fruit 35
01/01/2013 Stationary 80
02/01/2013 Fruit 55
02/01/2013 Stationary 85
03/01/2013 Fruit 25
05/01/2013 Stationary 22.0833333333333
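If the missing days do matter, one way to fill them in without VBA is sketched below. It assumes a small helper table of calendar dates, here called tblDates with a single date field DayDate covering the charting range (both names are placeholders), and picks each item's most recent price on or before each day before taking the weighted average:
SELECT d.DayDate,
       i.Category,
       SUM(p.Price * i.Ratio) / SUM(i.Ratio) AS AvgPrice
FROM tblDates AS d, tblPriceChanges AS p, tblItems AS i
WHERE i.Item = p.Item
  AND p.DateTimeStamp =
      (SELECT MAX(p2.DateTimeStamp)
         FROM tblPriceChanges AS p2
        WHERE p2.Item = p.Item
          AND p2.DateTimeStamp < d.DayDate + 1)
GROUP BY d.DayDate, i.Category;
Items that have no price yet on a given day simply drop out of that day's average, and a crosstab (TRANSFORM ... PIVOT) over this result would turn the Category values into the Fruit/Stationary columns of the target table. Treat it as a sketch to test against your data rather than a drop-in answer.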