Using JOIN (?) to intentionally return more results than rows - mysql

Sorry for the length of detail required to ask the question.
There are four tables (related to research, not really having anything to do with a sporting facility). They're as follows:
1) Let's say the first table is a list of tennis courts, and let's say there are hundreds of possibilities (not just indoor and outdoor).
-------------
TENNIS_COURTS
ID Type
-------------
1 Indoor
2 Outdoor
…
2) We want to note which day of the year they're available for rental. To prevent redundant rows, we can list individual days (e.g., just the 2nd day of the year, entered as "From:2", "To:2") or blocks (e.g., from the 24th day through the 25th day, entered as "From:24", "To:25"). In this example, the indoor court is the most available while the outdoor court has only two date ranges (obviously unrealistic for winter).
---------------------------
DAYS_AVAILABLE
ID ProductID From To
---------------------------
1 1 2 2 《 Indoor
2 2 24 25 《 Outdoor
3 2 140 170 《 Outdoor
4 1 280 300 《 Indoor
5 1 340 345 《 Indoor
…
3) We also want to add a list of attributes which will grow quite long over time. So rather than incorporating these in a field rule, there's an Attributes table.
-----------------------
ATTRIBUTES
ID Attribute
-----------------------
1 Age of Player
2 Time of Day
3 Outside Temperature
…
4) Lastly, we want to add a list of Considerations (factors to weigh when renting a court). In this example, risk of injury applies to both indoor and outdoor courts, but visibility and temperature apply only to the outdoor court.
--------------------------------------------------
CONSIDERATIONS
ID ProductID AttributeID Effect Link
--------------------------------------------------
1 1 1 Risk of injury www… 《 Indoor
2 2 1 Risk of injury www… 《 Outdoor
3 2 2 Hard to see www… 《 Outdoor
4 2 3 Gets cold www… 《 Outdoor
…
Utilizing the individual tables above, we'd like to create a consolidated saved view that contains at least one row for each date in the range, starting from the first day of the year (in which a court is available) through the last day of the year (for which a court is available). We also want to repeat the applicable considerations for each day listed.
Based on the data shown above, it would look like this:
----------------------------------------
CONSOLIDATED VIEW
Day Court Consideration Link
----------------------------------------
2 Indoor 《 from DAYS_AVAILABLE
2 Indoor Risk of injury www… 《 from CONSIDERATIONS
24 Outdoor 《 from DAYS_AVAILABLE
24 Outdoor Risk of injury www… 《 from CONSIDERATIONS
24 Outdoor Hard to see www… 《 from CONSIDERATIONS
24 Outdoor Gets cold www… 《 from CONSIDERATIONS
25 Outdoor 《 from DAYS_AVAILABLE
25 Outdoor Risk of injury www… 《 from CONSIDERATIONS
25 Outdoor Hard to see www… 《 from CONSIDERATIONS
25 Outdoor Gets cold www… 《 from CONSIDERATIONS
…
We can then query the consolidated view (e.g., "SELECT * FROM CONSOLIDATED_VIEW where Day = 24") to produce a simple output like:
Court: Outdoor
Available: 24th day
Note: Risk of injury (www…)
Hard to see (www…)
Gets cold (www…)
We want to produce the example shown above from a consolidated view because, once the data is stored, it won't change frequently, and we very likely will not be querying single days at a time anyhow. It's more likely that a web client will fetch all of the rows into a large array (TBD based on the total size) and then present it to users without further server interaction.
Can we produce the CONSOLIDATED_VIEW solely with an SQL query, or do we need to perform some other coding (e.g., PHP or NodeJS)?

The real question here is: how can I get a list of the available days so I can join my other tables and produce my output, right? Once you have a list of days, all you need to do is JOIN the other tables.
As you have a limited list (days of the year), I'd suggest creating a table with a single column containing the 365 (or 366) day numbers (1, 2, 3, ...) and JOINing it with your other tables. The query would be something similar to:
SELECT ... -- the fields you want
FROM YOUR_NEW_TABLE n
JOIN DAYS_AVAILABLE D ON (n.DAY BETWEEN D.From AND D.To)
JOIN ... -- the other tables you need information from
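For example, assuming a helper table called DAY_NUMBERS with a single DAY column filled once with 1 through 366 (the names here are only placeholders), an untested sketch of the whole consolidated view, including the blank availability row plus one row per consideration as in your example, could be:
CREATE TABLE DAY_NUMBERS (DAY SMALLINT PRIMARY KEY);
-- populate it once with 1, 2, 3, ... 366 (a short script or a series of INSERTs)

CREATE VIEW CONSOLIDATED_VIEW AS
SELECT n.DAY, t.Type AS Court, NULL AS Consideration, NULL AS Link
FROM DAY_NUMBERS n
JOIN DAYS_AVAILABLE d ON n.DAY BETWEEN d.`From` AND d.`To`
JOIN TENNIS_COURTS t ON t.ID = d.ProductID
UNION ALL
SELECT n.DAY, t.Type, c.Effect, c.Link
FROM DAY_NUMBERS n
JOIN DAYS_AVAILABLE d ON n.DAY BETWEEN d.`From` AND d.`To`
JOIN TENNIS_COURTS t ON t.ID = d.ProductID
JOIN CONSIDERATIONS c ON c.ProductID = d.ProductID;
The backticks are needed because From and To are reserved words in MySQL. With that in place, SELECT * FROM CONSOLIDATED_VIEW WHERE DAY = 24 should return rows like the ones in your sample, with no PHP or NodeJS involved.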

Related

How to set up crosstab queries to count days for negative stock counts?

Hello Stack Overflow (and anyone googling similar questions in the future)!
I have a dataset that regularly reports which products are absent on a warehouse stock check, which I am trying to use to analyse when stock is or isn't available. Essentially: has a part been reported as missing? If so, count the number of days it stays missing until another part in the same category is reported as missing while the original part is not reported on that date (at which point we can assume it's back in stock).
I've managed to make this work in Excel, but my spreadsheet began to die from calculating 5 locations' worth of categories and parts, let alone the 600+ I'm working on! As a result, I'm trying to set up a similar function in Access to analyse which parts were out of stock, and for how long.
My dataset looks something like:
Location number | Location | Category | Date reported | Part Number | Part Description | Order number
1 | London | Car | 03/06/2021 | 2021 | Wheel A | 1
2 | London | Bus | 03/06/2021 | 1491 | Seat C | 2
3 | Manchester | Car | 01/06/2021 | 2021 | Wheel A | 3
My assumptions are that:-
• My data is fed by individual workers who each cover a location, and check all stock for a random selection of categories each visit (with the idea that they cover all of their location’s categories within a certain number of visits) and record which parts are missing. There is no particular visit plan – it can be a random number of days between each visit. This data gets fed into a central table, which I have access to.
• As my workers may not check all categories in a location on each visit, I must assume that a previously reported missing part is OOS until they check products in the same category, but do not report that part again.
I made this work on excel by setting up another column that concatenated my location, part number, and date reported, and then set up three tables (all of which are essentially locations, categories, and parts down my X axis, and dates across the Y axis):-
• Table1, to check whether my concatenated code was reported for each day (and if so, output 1) – in other words, was each part reported as missing for each category and location?
• Table2, to check whether any parts were reported for each category at each location – i.e. how many parts were reported for each category at each location; a value greater than 0 means we can assume that category at that location was checked by my workers on that date.
• Table3, which for each location+category+day asks, as a formula: IF(the category was checked per Table2, pull the value of 1 for that part/location/category from Table1, otherwise re-use yesterday's value for this part/location/day in this table). For the first day in my date range, I used the Table1 values for that day as a starting point.
When I look at Table3, I can visually see the run of days products were out of stock, and can from there crunch numbers related to that, which is what I want!
My initial Access plan was to set up three crosstab queries, to mirror my three excel tables. I can make Table1 and Table2 very easily, but for the life of me can’t make table3 work (currently have a calculated expression that mirrors the formula I had in table 3, but something has gone amiss…).
I’m looking for a steer/advice on setting up the expression in my crosstab query, or other ideas/approaches I could use to calculate how long each part is missing for. Any help would be greatly appreciated, as I’ve lost my mind going in circles today!
Edit:-
Simplified dataset I'm working with:-
Location | Category | Date Reported | Part number | Part Description | Order number | Concatenate code | Concatenate Code 2
1 London | Car | 03/06/2021 | 2021 | Wheel | 1 | 1443502021 |
1 London | Bus | 03/06/2021 | 1491 | Seat | 2 | 1443501491 |
2 Manchester | Car | 05/06/2021 | 2021 | Wheel | 3 | 2443522021 |
1 London | Car | 05/06/2021 | 2021 | Wheel | 4 | 1443522021 |
1 London | Car | 07/06/2021 | 2021 | Wheel | 5 | 1443542021 |
1 London | Bus | 05/06/2021 | 1860 | Seatbelt | 6 | 1443521860 |
1 London | Bus | 05/06/2021 | 1860 | Seatbelt | 7 | 1443521860 |
2 manchester | Bus | 01/06/2021 | 1860 | Seatbelt | 8 | 2443481860 |
2 Manchester | Bus | 06/06/2021 | 1860 | Seatbelt | 9 | 2443531860 |
2 manchester | Bus | 04/06/2021 | 1491 | Seat | 10 | 2443511491 |
2 Manchester | Bus | 06/06/2021 | 1491 | Seat | 11 | 2443531491 |
I'm trying to output something like (which I've made work in Excel):-
Location | Category | Part code | Daily values (01/06/2021 to 07/06/2021)
1 London | Car | 2021 | 1 1 1 1
1 London | Car | 2626 |
1 London | Bus | 1491 | 1 1
1 London | Bus | 1860 | 1 1
2 Manchester | Car | 2021 | 1 1
2 Manchester | Car | 2626 |
2 Manchester | Bus | 1491 | 1 1 1
2 Manchester | Bus | 1860 | 1 1 1 1
3 Liverpool | Car | 2021 |
3 Liverpool | Car | 2626 |
3 Liverpool | Bus | 1491 |
Or, to return for each day how many consecutive days a part has been out of stock, as in this version:-
Location | Category | Part code | Consecutive days out of stock (01/06/2021 to 07/06/2021)
1 London | Car | 2021 | 1 2 3 4
1 London | Car | 2626 |
1 London | Bus | 1491 | 1 2
1 London | Bus | 1860 | 1 2
2 Manchester | Car | 2021 | 1 2
2 Manchester | Car | 2626 |
2 Manchester | Bus | 1491 | 1 2 3
2 Manchester | Bus | 1860 | 1 2 3 1
3 Liverpool | Car | 2021 |
3 Liverpool | Car | 2626 |
3 Liverpool | Bus | 1491 |
My Access SQL (which I then turned into a crosstab) to identify ordered parts per day:
SELECT DISTINCT T_stores.[Store Nos], T_stores.[Store Name], t_Stands.Brand, t_Productlookup.TPND, t_Productlookup.TITLE, t_gapdata.Quantity, t_gapdata.[Requested Date]
FROM ((T_stores
INNER JOIN t_Stands ON T_stores.[Store Nos] = t_Stands.[Store Nos])
INNER JOIN t_gapdata ON (t_Stands.[Brand] = t_gapdata.[Brand]) AND (t_Stands.[Store Nos] = t_gapdata.[Store No]))
INNER JOIN t_Productlookup ON t_gapdata.[Part Number] = t_Productlookup.[EAN];
And likewise, to identify if parts were ordered for a location's category:-
SELECT DISTINCT T_stores.[Store Nos], T_stores.[Store Name], t_Stands.Brand, t_Productlookup.TPND, t_Productlookup.TITLE, t_gapdata.Quantity, t_gapdata.[Requested Date]
FROM ((T_stores
INNER JOIN t_Stands ON T_stores.[Store Nos] = t_Stands.[Store Nos])
INNER JOIN t_gapdata ON (t_Stands.[Brand] = t_gapdata.[Brand]) AND (t_Stands.[Store Nos] = t_gapdata.[Store No]))
INNER JOIN t_Productlookup ON t_gapdata.[Part Number] = t_Productlookup.[EAN];
These first two work fine, but I'm struggling to put them together with some sort of IIf calculated field for a third query:-
SELECT First(q_gaps_per_product.[Store Nos]) AS [FirstOfStore Nos],
       First(q_gaps_per_product.[Store Name]) AS [FirstOfStore Name],
       First(q_gaps_per_product.Brand) AS FirstOfBrand,
       First(q_gaps_per_brand_store.[Order Id]) AS [FirstOfOrder Id],
       First(q_gaps_per_product.TPND) AS FirstOfTPND,
       First(q_gaps_per_product.TITLE) AS FirstOfTITLE,
       First(q_gaps_per_product.[Requested Date]) AS [FirstOfRequested Date],
       First(IIf([q_gaps_per_brand_store]![Requested Date] >= [q_gaps_per_product]![Requested Date],
                 [Quantity], "PREVIOUS DAY")) AS Expr1,
       [q_gaps_per_product]![Store Nos] & [q_gaps_per_product]![Quantity] & [q_gaps_per_product]![TPND] AS Expr2
FROM q_gaps_per_product
     LEFT JOIN q_gaps_per_brand_store ON q_gaps_per_product.[Brand] = q_gaps_per_brand_store.[Brand]
GROUP BY [q_gaps_per_product]![Store Nos] & [q_gaps_per_product]![Quantity] & [q_gaps_per_product]![TPND];
Expr1 is supposed to be how many days a product is out of stock, with the idea that "PREVIOUS DAY" would return the same criteria for the previous day, to show either running gaps or that a product was in fact available as a 0, but I haven't got that far yet.
Expr2 is basically something I tried to make up to group the results by, as I had an insane number of results due to my janky table relationships.
I sort of think this query is DOA, and I need to go back to the drawing board to reproduce something like my Excel tables, i.e. how many days products have been concurrently out of stock.
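For reference, here is an untested, generic SQL sketch of the same carry-forward rule expressed as a single set-based query rather than chained crosstabs. Every table and column name is made up for illustration: reports holds one row per "part reported missing" (location, category, part number, date reported), and calendar holds one row per date in the range being analysed.
SELECT DISTINCT c.TheDate, r.Location, r.Category, r.PartNumber
FROM calendar AS c, reports AS r
WHERE r.DateReported <= c.TheDate
  AND NOT EXISTS (
        SELECT *
        FROM reports AS chk
        WHERE chk.Location = r.Location
          AND chk.Category = r.Category
          AND chk.DateReported > r.DateReported
          AND chk.DateReported <= c.TheDate
          AND NOT EXISTS (
                SELECT *
                FROM reports AS same
                WHERE same.Location = chk.Location
                  AND same.Category = chk.Category
                  AND same.DateReported = chk.DateReported
                  AND same.PartNumber = r.PartNumber));
The inner NOT EXISTS looks for a later check of the same location and category that did not list the part again (which would mean it came back into stock). The result is one row per part per assumed out-of-stock day, which a single crosstab (TRANSFORM Count(*) ... PIVOT on the date) could then spread across date columns.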
Sorry for the sheer storm of words!

Is this MySQL database design suitable for these queries?

I am designing a database, and I would like to know: can I answer the following questions with queries, and is this a good structure for doing so?
how much skill employees earned from these trainings
how much money was spent per department
how much skill was earned per employee
how much skill was earned per department
id | session_name | Skill | impact | sugg dept | function | training_value | training no
1 | PHP | Software | 3 | Sales | 2 | 100usd | 1
2 | PHP | Software | 3 | Finance | 2 | 100usd | 1
3 | PHP | communication | 2 | Sales | 2 | 100usd | 1
4 | PHP | communication | 2 | Finance | 2 | 100usd | 1
5 | ASP | Software | 4 | Sales | 2 | 200usd | 2
6 | ASP | Software | 4 | Finance | 2 | 200usd | 2
7 | ASP | database | 1 | Sales | 2 | 200usd | 2
8 | ASP | database | 1 | Finance | 2 | 200usd | 2
attended training table
id | student_id | training_no
1 | 1 | 1
1 | 1 | 2
student table
id | name | department
1 | John | 1
2 | Mary | 2
department table
id | name
1 | sales
2 | finance
In the end I need to find skills for each student
john
software 7
communication 2
database 1
total spent
john 300 usd
total spent by department
sales 300 usd
Your schema looks OK to me.
You should, however, think about entities and relationships.
Your entities seem to be trainings, people, and departments.
You have a many:many relationship for people:trainings. That's good.
You have a one:many relationship for departments:people. That's also good.
It looks like you want some kind of relationship for trainings:departments. I'm guessing here, but you have a sugg dept column in your trainings table. Is that supposed to have a direct relationship to your departments table?
Do you actually need an extra entity called "attendance" rather than just a many-to-many relationship people:trainings? Do you want to record when a person did a training? Do you want to record how much that particular attendance cost? How about what marks they received if there was a quiz?
In that case, you'll want relationships where each person has zero or more attendances, each attendance has exactly one training, and each training has zero or more attendances.
My point: do the hard work of thinking through your entities and relationships, and the result will be a good design for your tables.
If I may put it another way: what part of the real world are you trying to capture in your database? What's valuable in the real world that you want your database to hold? In your application ...
Students are people. They are, umm, inherently valuable and persistent entities.
Trainings represent the labor and cost of creating them and presenting them.
Attendances represent the effort of students.
Departments probably pay the bill for attendances. They certainly represent power centers in your application.
What other items of value exist in this corner of the real world? Teachers? Managers? Venues (classrooms)? Equipment? Customers?
My point is, figure out your entities -- the items of value -- and the relationships between them. Then write your table definitions.
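As a rough illustration only (every name below is made up, and the per-skill rows are split out of your current trainings table into their own table), the definitions and one of your queries might look something like this in MySQL:
CREATE TABLE department (
  id   INT PRIMARY KEY,
  name VARCHAR(50) NOT NULL
);

CREATE TABLE student (
  id            INT PRIMARY KEY,
  name          VARCHAR(50) NOT NULL,
  department_id INT NOT NULL,
  FOREIGN KEY (department_id) REFERENCES department (id)
);

CREATE TABLE training (
  training_no    INT PRIMARY KEY,
  session_name   VARCHAR(50) NOT NULL,
  training_value DECIMAL(10,2) NOT NULL   -- cost of the training in USD
);

-- one row per skill a training teaches, with its impact
CREATE TABLE training_skill (
  training_no INT NOT NULL,
  skill       VARCHAR(50) NOT NULL,
  impact      INT NOT NULL,
  PRIMARY KEY (training_no, skill),
  FOREIGN KEY (training_no) REFERENCES training (training_no)
);

-- the "attendance" entity: which student attended which training, and when
CREATE TABLE attendance (
  student_id  INT NOT NULL,
  training_no INT NOT NULL,
  attended_on DATE NULL,
  PRIMARY KEY (student_id, training_no),
  FOREIGN KEY (student_id)  REFERENCES student (id),
  FOREIGN KEY (training_no) REFERENCES training (training_no)
);

-- "how much skill did each student earn?"
SELECT s.name, ts.skill, SUM(ts.impact) AS total_impact
FROM student s
JOIN attendance a      ON a.student_id   = s.id
JOIN training_skill ts ON ts.training_no = a.training_no
GROUP BY s.name, ts.skill;
With your sample data that query would give John: Software 7, communication 2, database 1. Money spent per student or per department is then a similar join from attendance to training, summing training_value.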

Database table structure for storing statistics data

I am trying to create a table in my MySQL database for storing click data for my posts on a daily basis; what I came up with is something like this:
ID | post_id | click_type | created_date
1 | 1 | page_click | 2015-12-11 18:13:13
2 | 2 | page_click | 2015-12-13 11:16:34
3 | 3 | page_click | 2015-12-13 13:24:01
4 | 1 | page_click | 2015-12-15 15:31:10
With this type of storage I can get how many clicks post number 1 got in December 2015, and even how many clicks a given post got on 15 December between 1 and 11 pm. However, let's say I am getting 2,000 clicks per day: that means 2,000 rows per day, 60,000 per month and 720,000 per year.
Another approach that comes to mind is the one below, which stores one row per post per day and, if there is more than one click on that day, increments the count:
ID | post_id | click_type | created_date | count
1 | 1 | page_click | 2015-12-11 | 13
2 | 2 | page_click | 2015-12-11 | 26
3 | 3 | page_click | 2015-12-11 | 152
4 | 1 | page_click | 2015-12-12 | 14
5 | 2 | page_click | 2015-12-12 | 123
6 | 3 | page_click | 2015-12-12 | 163
With this approach, if every post is clicked at least once each day (which means creating the row), it will generate 1,000 rows per day (let's say I have 1,000 posts), 30,000 per month and 360,000 per year.
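As a sketch of how that second approach might be implemented in MySQL (table and column names here are illustrative), the primary key makes one row per post, click type and day, and an upsert bumps the counter on every click:
CREATE TABLE post_clicks_daily (
  post_id     INT UNSIGNED NOT NULL,
  click_type  VARCHAR(32)  NOT NULL,
  click_date  DATE         NOT NULL,
  click_count INT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY (post_id, click_type, click_date)
);

-- on every click: insert the day's row, or increment it if it already exists
INSERT INTO post_clicks_daily (post_id, click_type, click_date, click_count)
VALUES (1, 'page_click', CURRENT_DATE, 1)
ON DUPLICATE KEY UPDATE click_count = click_count + 1;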
I am looking for advice on how to store these statistics while still being able to get daily click statistics. I have some concerns about performance (of course it's nothing for big data guys :D but sorry for my lack of experience). Do you think it will be OK if there are over 1 million rows in that table after 2-3 years? And which one do you think is going to be more effective for me?
720,000 records per year is not necessarily a lot of data. One option may be not to worry about it. Something to consider may be how long the click data matters. If after a year you don't really care anymore then you can have an historical data cleanup protocol that removes data that is older than you care about.
If you are worried about storing large amounts of data and you don't want to erase history, then you can consider pre-calculating your summarized statistics and storing them instead of your transaction detail.
The issue with this is that you have to know in advance what the smallest resolution of time will be that you will continue to care about. Also, if your motivation is saving space then you have to be careful that your summary data doesn't end up taking more space than the original transactions. This can easily happen if you store summarized data at multiple resolutions, as you might in a data warehouse arrangement.
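For example (an illustrative, untested sketch, assuming a raw clicks table like the first layout above and a daily summary table such as the post_clicks_daily one sketched earlier), the pre-calculation and the cleanup protocol could be as simple as:
-- roll completed days up into the summary table
INSERT INTO post_clicks_daily (post_id, click_type, click_date, click_count)
SELECT post_id, click_type, DATE(created_date), COUNT(*)
FROM clicks
WHERE created_date < CURDATE()
GROUP BY post_id, click_type, DATE(created_date)
ON DUPLICATE KEY UPDATE click_count = VALUES(click_count);

-- then drop transaction detail older than you care about
DELETE FROM clicks
WHERE created_date < CURDATE() - INTERVAL 1 YEAR;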
This seems like a good application for rrdtool (http://oss.oetiker.ch/rrdtool/). Here you can specify several resolutions for different time intervals, e.g.:
average 5 min for 1 day
average 30 min for 1 week
average 2 hours for 1 month
average 1 day for 1 Year
etc. This is also often used for graphs. Usually rrdtool works with RRD files, but it can also be backed by MySQL with rrdgraph_libdbi.

Isolating unique observations and calculating the average in Stata

Currently I have a dataset that appears as follows:
mnbr firm contribution
1591 2 1
9246 6 1
812 6 1
674 6 1
And so on. The idea is that mnbr is the member number of employees who work at firm # whatever. If contribution is 1 (and I have dropped all the 0s for this purpose) said employee has contributed to a certain fund.
I additionally used codebook to determine the number of unique firms that exist. The goal is to determine the average number of contributions per firm, i.e. there was 1 contribution for firm 2, 3 contributions for firm 6, and so on. The problem I arrive at is accessing the number of unique values reported by codebook.
I read some documentation online for
inspect *varlist*
display r(N_unique)
which suggests to me that using r(N_unique) would store that value, yet unfortunately this method did not work for me. So that is part 1.
Part 2 is I'd also like to create a variable that shows the contributions in each firm i.e.
mnbr firm contribution average
1591 2 1 1
9246 6 . 2/3
812 6 1 2/3
674 6 1 2/3
to show that for firm 6, 2 out of the 3 employees contributed to this fund.
Thanks in advance for the help.
To answer your comment, this works for me:
clear
set more off
input ///
mnbr firm cont
1591 2 1
9246 6 .
812 6 1
674 6 1
end
list
// problem 1
inspect firm
display r(N_unique)
// problem 2
bysort firm: egen totc = total(cont)
by firm: gen share = totc / _N
list
You have to use r(N_unique) before running another Stata command, or it can get lost. You can also save that result to a local or scalar.
Problem 2 is also addressed.

get row associated with the nearest result in mysql

I am trying to build a recipe-scaling method in my app which will return nicer measurements adapted to the amount the user is serving.
For example, a recipe for 6 people calls for 1 cup of flour. If you scale that to one person, 1/6 of a cup of flour is about 2.5 tablespoons, which is a nicer way of saying it (why search for and dirty a measuring cup when you can just use a spoon?).
So I have in the db a weights table with weight in grams, corresponding measurement and amount.
eg. for flour
amount | measure | grams
----------+-------------+---------
1 | cup | 160
1 | tbsp | 10
1 | pound | 454
In my app (using ActiveRecord preferably), I'm trying to get the best-fit measurement for each ingredient in the recipe.
#ingredients = Recipe.select('food_names.name,
ABS(ingredients.grams-weights.grams) as nearest_weight,
weights.amount,
weights.measure'
).joins(
{:ingredients=> :food_name},
{:food_names=> :weights}
).where(
"recipes.recipe_id", :recipe_id
).order(
:nearest_weight
).reverse_order
The nearest_weight expression measures how close each stored weight is, but I need the weight and measure associated with the closest row, and at the moment I'm getting all the rows returned.
What I need to do is somehow limit the nearest_weight to one row, and then get that row so I know what the weight and measure are, and I'm hoping I can do that all in one query.
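In raw MySQL, one way to keep only the closest weight per ingredient is a correlated subquery on the minimum difference. This is only a sketch: the join columns (food_name_id and so on) are assumptions about the schema, and the real recipe id would be fed in by the app:
SELECT fn.name,
       w.amount,
       w.measure,
       ABS(i.grams - w.grams) AS nearest_weight
FROM ingredients AS i
JOIN food_names  AS fn ON fn.id = i.food_name_id
JOIN weights     AS w  ON w.food_name_id = fn.id
WHERE i.recipe_id = 1                      -- the recipe being scaled
  AND ABS(i.grams - w.grams) = (           -- keep only the closest weight row per ingredient
        SELECT MIN(ABS(i.grams - w2.grams))
        FROM weights AS w2
        WHERE w2.food_name_id = fn.id);
Ties (two weights equally close) would still return two rows, and in ActiveRecord this could be run through find_by_sql or expressed as a where clause on a subquery.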