Calculating frequency in MySQL - mysql

I am creating a library database with the four tables shown below.
I have been researching ways to work out frequency in MySQL, but after a long time and a lot of misunderstanding I've decided to ask for an example on tables that I understand.
I am looking to work out the loan frequency of every book that has been loaned two or more times. That would also show me how to work out frequency for selected values rather than all values.
Looking at my tables, I would have to select the 'code' from the loan table, keep the values that occur twice or more, and then work out the frequency of occurrence.
From my research I would use an INNER JOIN to connect the tables, COUNT to count the values, GROUP BY to group them, and HAVING, since WHERE cannot be used to filter on an aggregate. I am having trouble writing the query and keep running into errors. Could anyone use the example below to explain how to work out the frequency of each book loaned two or more times? Thanks in advance.
Table 1 - book
isbn title author
111-2-33-444444-5 Pro JavaFX Dave Smith
222-3-44-555555-6 Oracle Systems Kate Roberts
333-4-55-666666-7 Expert jQuery Mike Smith
Table 2 - copy
code isbn duration
1011 111-2-33-444444-5 21
1012 111-2-33-444444-5 14
1013 111-2-33-444444-5 7
2011 222-3-44-555555-6 21
3011 333-4-55-666666-7 7
3012 333-4-55-666666-7 14
Table 3 - student
no name school embargo
2001 Mike CMP No
2002 Andy CMP Yes
2003 Sarah ENG No
2004 Karen ENG Yes
2005 Lucy BUE No
Table 4 - loan
code no taken due return
1011 2002 2015.01.10 2015.01.31 2015.01.31
1011 2002 2015.02.05 2015.02.26 2015.02.23
1011 2003 2015.05.10 2015.05.31
1013 2003 2014.03.02 2014.03.16 2014.03.10
1013 2002 2014.08.02 2014.08.16 2014.08.16
2011 2004 2013.02.01 2013.02.22 2013.02.20
3011 2002 2015.07.03 2015.07.10
3011 2005 2014.10.10 2014.10.17 2014.10.20

You didn't specify the type of frequency, but this query calculates the number of loans per week for each book that was loaned more than once in 2014:
select b.isbn
, b.title
, count(*) / 52 -- loans/week
from loan l
join copy c
on c.code = l.code
join book b
on b.isbn = c.isbn
where '2014-01-01' <= taken and taken < '2015-01-01'
group by
b.isbn
, b.title
having count(*) > 1 -- loaned more than once
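To try the query above without a MySQL server, here is a minimal sketch that runs it against SQLite from Python. It assumes the `taken` dates are stored as ISO `YYYY-MM-DD` text (the question shows them with dots), renames the `no` column to `student_no`, and divides by `52.0` so that SQLite does not truncate the result with integer division. The helper name `loan_frequency` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT);
CREATE TABLE copy (code INTEGER PRIMARY KEY, isbn TEXT);
CREATE TABLE loan (code INTEGER, student_no INTEGER, taken TEXT);
INSERT INTO book VALUES
  ('111-2-33-444444-5', 'Pro JavaFX'),
  ('222-3-44-555555-6', 'Oracle Systems'),
  ('333-4-55-666666-7', 'Expert jQuery');
INSERT INTO copy VALUES
  (1011, '111-2-33-444444-5'), (1012, '111-2-33-444444-5'),
  (1013, '111-2-33-444444-5'), (2011, '222-3-44-555555-6'),
  (3011, '333-4-55-666666-7'), (3012, '333-4-55-666666-7');
INSERT INTO loan VALUES
  (1011, 2002, '2015-01-10'), (1011, 2002, '2015-02-05'),
  (1011, 2003, '2015-05-10'), (1013, 2003, '2014-03-02'),
  (1013, 2002, '2014-08-02'), (2011, 2004, '2013-02-01'),
  (3011, 2002, '2015-07-03'), (3011, 2005, '2014-10-10');
""")


def loan_frequency(conn, start, end, weeks):
    """Loans and loans/week per book for start <= taken < end, books loaned more than once."""
    return conn.execute(
        """
        SELECT b.isbn, b.title, COUNT(*) AS loans, COUNT(*) / ? AS loans_per_week
        FROM loan l
        JOIN copy c ON c.code = l.code
        JOIN book b ON b.isbn = c.isbn
        WHERE l.taken >= ? AND l.taken < ?
        GROUP BY b.isbn, b.title
        HAVING COUNT(*) > 1
        ORDER BY b.isbn
        """,
        (float(weeks), start, end),
    ).fetchall()


rows_2014 = loan_frequency(conn, "2014-01-01", "2015-01-01", 52)
for row in rows_2014:
    print(row)
```

With the sample data only Pro JavaFX qualifies for 2014, because both of its 2014 loans were of copy 1013; widening the date range changes which books pass the HAVING filter.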

Related

Need assistance with a SQL query to find percentage share

I am using MySQL to run my SQL queries.
Below is the structure of the table
Table-1: job_data
job_id: unique identifier of jobs
actor_id: unique identifier of actor
event: decision/skip/transfer
language: language of the content
time_spent: time spent to review the job in seconds
org: organization of the actor
ds: date in the yyyy/mm/dd format. It is stored in the form of text and we use presto to run. no need for date function
Dataset:
dates job_id actor_id event language time_spent org
11/30/2020 21 1001 skip English 15 A
11/30/2020 22 1006 transfer Arabic 25 B
11/29/2020 23 1003 decision Persian 20 C
11/28/2020 23 1005 transfer Persian 22 D
11/28/2020 25 1002 decision Hindi 11 B
11/27/2020 11 1007 decision French 104 D
11/26/2020 23 1004 skip Persian 56 A
11/25/2020 20 1003 transfer Italian 45 C
I am trying to find the percentage share of each language, that is, each language's share of the content reviewed in the last 30 days.
My Query
SELECT language,
ROUND(100.0 * SUM(IF(event IN ('transfer', 'decision'), 1, 0)) / COUNT(job_id), 2) AS percentage_share
FROM job_data
WHERE ds BETWEEN DATE_SUB(CURDATE(), INTERVAL 7 DAY) AND CURDATE()
GROUP BY language;
0 rows returned
I am not getting any result whatsoever
You need to parse the dates since they're not stored in the format that MySQL can parse automatically.
WHERE STR_TO_DATE(ds, '%m/%d/%Y') BETWEEN ...
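As a rough illustration of why parsing matters, the sketch below loads the sample rows into SQLite from Python. SQLite has no `STR_TO_DATE`, so the `mm/dd/yyyy` text is rearranged with `substr` instead; that substitution, the fixed "today" of 2020-11-30, and the reading of "share" as each language's fraction of all rows in the window are assumptions of this sketch, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE job_data (ds TEXT, job_id INTEGER, actor_id INTEGER, "
    "event TEXT, language TEXT, time_spent INTEGER, org TEXT)"
)
conn.executemany("INSERT INTO job_data VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("11/30/2020", 21, 1001, "skip", "English", 15, "A"),
    ("11/30/2020", 22, 1006, "transfer", "Arabic", 25, "B"),
    ("11/29/2020", 23, 1003, "decision", "Persian", 20, "C"),
    ("11/28/2020", 23, 1005, "transfer", "Persian", 22, "D"),
    ("11/28/2020", 25, 1002, "decision", "Hindi", 11, "B"),
    ("11/27/2020", 11, 1007, "decision", "French", 104, "D"),
    ("11/26/2020", 23, 1004, "skip", "Persian", 56, "A"),
    ("11/25/2020", 20, 1003, "transfer", "Italian", 45, "C"),
])

# Comparing the raw mm/dd/yyyy text against ISO dates matches nothing.
raw_matches = conn.execute(
    "SELECT COUNT(*) FROM job_data WHERE ds BETWEEN '2020-10-31' AND '2020-11-30'"
).fetchone()[0]

# SQLite stand-in for STR_TO_DATE(ds, '%m/%d/%Y'): rebuild yyyy-mm-dd from the text.
ISO_DS = "substr(ds, 7, 4) || '-' || substr(ds, 1, 2) || '-' || substr(ds, 4, 2)"

shares = conn.execute(
    f"""
    WITH recent AS (
        SELECT language FROM job_data
        WHERE {ISO_DS} BETWEEN date(:today, '-30 days') AND :today
    )
    SELECT language,
           ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM recent), 2) AS pct
    FROM recent
    GROUP BY language
    ORDER BY language
    """,
    {"today": "2020-11-30"},
).fetchall()

print(raw_matches)
print(shares)
```

In MySQL the same filter would use `STR_TO_DATE(ds, '%m/%d/%Y')` on the left-hand side, as in the answer above.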

How to use a self join to get the COUNT of how many people on my platform have worked at another company prior to my current one using MySQL?

I have a table in the form:
id comp employment_year
1 ShoesCo 2000
1 FeetOrg 2006
1 SizeEight 2012
2 ShoesCo 2001
2 SizeEight 2004
2 FeetOrg 2007
3 SizeEight 2001
3 ShoesCo 2004
3 FeetOrg 2007
I want to count the total number of people who worked at ShoesCo before working at SizeEight (comparing employment_year). The id is the unique ID for each employee. I am thinking of a self-join but have limited experience with SQL.
The answer should be 2 for this example.
If the data have no duplicates by (id,comp) then
SELECT COUNT(DISTINCT id)
FROM table t1
JOIN table t2 USING (id)
WHERE t1.comp = 'ShoesCo'
AND t2.comp = 'SizeEight'
AND t1.employment_year < t2.employment_year
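The self-join above can be checked against the sample data with a small SQLite sketch in Python. The table name `employment` is a stand-in for the placeholder `table` (which is a reserved word), and the join is written with `ON` rather than `USING`, which is equivalent here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employment (id INTEGER, comp TEXT, employment_year INTEGER)")
conn.executemany("INSERT INTO employment VALUES (?, ?, ?)", [
    (1, "ShoesCo", 2000), (1, "FeetOrg", 2006), (1, "SizeEight", 2012),
    (2, "ShoesCo", 2001), (2, "SizeEight", 2004), (2, "FeetOrg", 2007),
    (3, "SizeEight", 2001), (3, "ShoesCo", 2004), (3, "FeetOrg", 2007),
])

# t1 is the ShoesCo row, t2 the SizeEight row of the same employee.
moved = conn.execute("""
    SELECT COUNT(DISTINCT t1.id)
    FROM employment t1
    JOIN employment t2 ON t2.id = t1.id
    WHERE t1.comp = 'ShoesCo'
      AND t2.comp = 'SizeEight'
      AND t1.employment_year < t2.employment_year
""").fetchone()[0]

print(moved)
```

Employees 1 and 2 joined ShoesCo before SizeEight, while employee 3 did the reverse, so the count matches the expected answer of 2.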

Selecting Numbers from a Range of Years

I am trying to execute a SQL query such that the following table:
id in_year out_year
------- ---------- -------------
1 2001 2002
2 2002 2002
3 2004 2007
can be queried such that I get all the years within that range mapped to the id. For instance, I would like to get:
id year
--------- ---------
1 2001
1 2002
2 2002
3 2004
3 2005
3 2006
3 2007
Specifically, let's say the table represents a shop with elements and their arrival and sale dates. The query would return each element id mapped to every year in which it was in the shop.
You can construct a temp table with the years in your range of data i.e.
CREATE TABLE tmp_years (
yr YEAR NOT NULL,
PRIMARY KEY (yr)
) ENGINE=INNODB;
INSERT INTO tmp_years (yr) VALUES (2000), (2001), (2002), (2003), (2004), (2005), (2006), (2007);
and then do a JOIN:
SELECT w.id, y.yr FROM wesams_table w
INNER JOIN tmp_years y ON (y.yr >= w.in_year AND y.yr <= w.out_year);
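Here is a minimal sketch of the years-table join, run against SQLite from Python so it can be tried without a server. The `YEAR` column type and `ENGINE=INNODB` clause are MySQL-specific and are replaced by a plain integer column here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wesams_table (id INTEGER, in_year INTEGER, out_year INTEGER);
INSERT INTO wesams_table VALUES (1, 2001, 2002), (2, 2002, 2002), (3, 2004, 2007);
CREATE TABLE tmp_years (yr INTEGER PRIMARY KEY);
""")
# One row per calendar year covering the whole range of the data.
conn.executemany("INSERT INTO tmp_years (yr) VALUES (?)",
                 [(year,) for year in range(2000, 2008)])

rows = conn.execute("""
    SELECT w.id, y.yr
    FROM wesams_table w
    INNER JOIN tmp_years y ON y.yr >= w.in_year AND y.yr <= w.out_year
    ORDER BY w.id, y.yr
""").fetchall()

for row in rows:
    print(row)
```

Each id appears once per year between `in_year` and `out_year` inclusive, which gives the seven rows requested in the question.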
The tidiest solution would be to create a UDF to return the range of years and use CROSS APPLY.
Performance should be rather good as the UDF will be deterministic
Edit: Sorry, I don't think this applies to MySQL.

Subquery in Access

I have 2 tables in Access with these fields
Student:
ID(PK) Name Family Tel
Lesson:
ID StudentRef(FK(Student)) Name Score
Imagine we have these records
Student :
1 Tom Allen 09370045230
2 Jim leman 09378031380
Lesson:
1 1 Math 18
2 1 Geography 20
3 2 Economic 15
4 2 Math 12
How can I write a query that result will be this (2 fields)?
Tom Math : 18 , Geography 20
Jim Economic :15 , Math :12
SELECT s.Name, l.Name, l.Score
FROM tbl_students as s
INNER JOIN tbl_lessons as l ON s.student_id = l.student_id
That won't give you your formatting, but it'll get you the data.
The trickiest part of your problem is how to aggregate strings in your sub-query. Access SQL itself has no aggregate function that concatenates strings (the only aggregate that applies to strings is Count()), and you cannot define one in SQL alone. This means you can't get the desired "subject:score , subject:score" concatenation from a plain query. If you can do without it, the solution in the answer by Corith Malin gives you the data.
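One way around the missing string aggregate is to fetch the joined rows and build the concatenated text in client code. The sketch below does this in Python against SQLite rather than Access, using the table and column names from the question (`Student`, `Lesson`, `StudentRef`); the `Subject : Score` output format is an assumption based on the desired result shown there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT, Family TEXT, Tel TEXT);
CREATE TABLE Lesson (ID INTEGER PRIMARY KEY, StudentRef INTEGER, Name TEXT, Score INTEGER);
INSERT INTO Student VALUES (1, 'Tom', 'Allen', '09370045230'),
                           (2, 'Jim', 'leman', '09378031380');
INSERT INTO Lesson VALUES (1, 1, 'Math', 18), (2, 1, 'Geography', 20),
                          (3, 2, 'Economic', 15), (4, 2, 'Math', 12);
""")

# The join only returns one row per lesson; ordering keeps each student's rows together.
rows = conn.execute("""
    SELECT s.Name, l.Name, l.Score
    FROM Student AS s
    INNER JOIN Lesson AS l ON l.StudentRef = s.ID
    ORDER BY s.ID, l.ID
""").fetchall()

# Concatenate "lesson : score" pieces per student on the client side.
lessons_by_student = {}
for student, lesson, score in rows:
    lessons_by_student.setdefault(student, []).append(f"{lesson} : {score}")

summary = {student: " , ".join(parts) for student, parts in lessons_by_student.items()}
for student, text in summary.items():
    print(student, text)
```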

Need help with MySQL query getting results to average for year y and y+1

I have a MySQL query:
SELECT px.player, px.pos, px.year, px.age, px.gp, px.goals, px.assists
, 1000 - ABS(p1.gp - px.gp) - ABS(p1.goals - px.goals) - ABS(p1.assists - px.assists) sim
FROM hockey p1
JOIN hockey px
ON px.player <> p1.player
WHERE p1.player = 'John Smith'
AND p1.year = 2010
HAVING sim >= 900
ORDER BY sim DESC
This gets me a table of results, something like this:
player pos year age gp goals assists sim
Player1 LW 2002 25 75 29 32 961
Player2 LW 2000 27 82 29 27 956
Player3 RW 2000 27 78 29 33 955
Player4 LW 2009 26 82 30 30 940
Player5 RW 2001 25 79 33 24 938
Player6 LW 2008 25 82 23 24 936
Player7 LW 2006 27 79 26 33 932
Instead, I would like it to do two things. Average the data and add a player count, so I get something like:
players age gp goals assists sim
7 26 79 28 29 945
I tried avg(px.age), avg(px.gp), avg(px.goals)...etc but I am running into errors with my "sim" formula.
Second issue is that underneath that, I would like to have the average of the data for the FOLLOWING year. In other words data from Player1 in 2003, data from Player2 in 2001, etc.
I am stuck as to HOW to get the data to average AND to get it for the following year.
Can anyone help me with either or both of these issues?
To get a single subtotal of counts and averages, just wrap your original query AS the inner select... something like... (pq = "PreQuery" select result)
Select
max( "Tot Players" ) Players,
max( "->" ) position,
count(*) Year,
avg( pq.age ) AvgAge,
avg( pq.gp ) AvgGP,
avg( pq.goals ) AvgGoals,
avg( pq.assists ) AvgAssists,
avg( pq.sim ) AvgSim
from
( SELECT
px.player,
px.pos,
px.year,
px.age,
px.gp,
px.goals,
px.assists,
1000 - ABS(p1.gp - px.gp)
- ABS(p1.goals - px.goals)
- ABS(p1.assists - px.assists) sim
FROM
hockey p1
JOIN hockey px ON px.player <> p1.player
WHERE
p1.player = 'John Smith'
AND p1.year = 2010
HAVING
sim >= 900
ORDER BY
sim DESC ) pq
If your original query worked, this should give you the overall averages. However, the HAVING and ORDER BY in the inner query might cause a problem. You can drop the ORDER BY, since it makes no difference to the outermost query, and the HAVING clause in the inner query might need to become a WHERE pq.sim >= 900 in the outer SELECT.
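The variant with the filter moved to the outer query can be sketched as follows, run against SQLite from Python. The Player1 and Player2 rows come from the question's sample output; John Smith's 2010 stats and the dissimilar `Player8` row are made-up values for illustration only, so the resulting numbers are not the asker's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hockey (player TEXT, pos TEXT, year INTEGER, age INTEGER, "
    "gp INTEGER, goals INTEGER, assists INTEGER)"
)
conn.executemany("INSERT INTO hockey VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("John Smith", "LW", 2010, 26, 80, 30, 30),  # hypothetical stats
    ("Player1", "LW", 2002, 25, 75, 29, 32),
    ("Player2", "LW", 2000, 27, 82, 29, 27),
    ("Player8", "D", 2005, 30, 20, 2, 3),        # hypothetical, sim below 900
])

# Inner query computes sim per compared season; the outer query filters and averages.
averages = conn.execute("""
    SELECT COUNT(*)        AS players,
           AVG(pq.age)     AS avg_age,
           AVG(pq.gp)      AS avg_gp,
           AVG(pq.goals)   AS avg_goals,
           AVG(pq.assists) AS avg_assists,
           AVG(pq.sim)     AS avg_sim
    FROM (
        SELECT px.player, px.age, px.gp, px.goals, px.assists,
               1000 - ABS(p1.gp - px.gp)
                    - ABS(p1.goals - px.goals)
                    - ABS(p1.assists - px.assists) AS sim
        FROM hockey p1
        JOIN hockey px ON px.player <> p1.player
        WHERE p1.player = 'John Smith'
          AND p1.year = 2010
    ) pq
    WHERE pq.sim >= 900
""").fetchone()

print(averages)
```

Filtering in the outer WHERE avoids relying on MySQL's acceptance of HAVING on a non-aggregated query, which other engines such as SQLite reject.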
Additionally, if you wanted the results for all players first and THEN the total, take your original query and UNION it with this one. To keep the columns in sync between BOTH queries, I've put in placeholder values for player and position so the UNION won't fail on mismatched columns. Notice that my COUNT column lines up with the YEAR column of the ORIGINAL query.
For the following year... As Rob mentioned, you would just do a UNION of the two queries, each one filtered to the respective year you are qualifying for.
EDIT --- CLARIFICATION for 2nd YEAR....
Per your subsequent comment, you would use year + 1 as the basis for the second set of stats. If you then want the overall averages again, wrap this in an outer query with max / avg, etc. But I think THIS is what you want for the subsequent year per player:
SELECT
PrimaryQry.PrimaryPlayer,
PrimaryQry.PrimaryPos,
PrimaryQry.PrimaryYear,
PrimaryQry.PrimaryAge,
PrimaryQry.PrimaryGP,
PrimaryQry.PrimaryGoals,
PrimaryQry.PrimaryAssists,
PrimaryQry.player,
PrimaryQry.pos,
PrimaryQry.year,
PrimaryQry.age,
PrimaryQry.gp,
PrimaryQry.goals,
PrimaryQry.assists,
PrimaryQry.sim,
p2.pos PrimaryPos2,
p2.year PrimaryYear2,
p2.age PrimaryAge2,
p2.gp PrimaryGP2,
p2.goals PrimaryGoals2,
p2.assists PrimaryAssists2,
px2.player player2,
px2.pos pos2,
px2.year year2,
px2.age age2,
px2.gp gp2,
px2.goals goals2,
px2.assists assists2,
1000 - ABS(p2.gp - px2.gp)
- ABS(p2.goals - px2.goals)
- ABS(p2.assists - px2.assists) sim2
FROM
( SELECT
p1.player PrimaryPlayer,
p1.pos PrimaryPos,
p1.year PrimaryYear,
p1.age PrimaryAge,
p1.gp PrimaryGP,
p1.goals PrimaryGoals,
p1.assists PrimaryAssists,
px.player,
px.pos,
px.year,
px.age,
px.gp,
px.goals,
px.assists,
1000 - ABS(p1.gp - px.gp)
- ABS(p1.goals - px.goals)
- ABS(p1.assists - px.assists) sim
FROM
hockey p1
JOIN hockey px
ON p1.player <> px.player
WHERE
p1.player = 'John Smith'
AND p1.year = 2010
HAVING
sim >= 900 ) PrimaryQry
JOIN hockey p2
ON PrimaryQry.PrimaryPlayer = p2.player
AND PrimaryQry.PrimaryYear +1 = p2.year
JOIN hockey px2
ON PrimaryQry.Player = px2.Player
AND PrimaryQry.Year +1 = px2.year
If you follow the logic here, the inner query returns about 10 other players, and I keep the stats of the primary player IN that result too. THEN I join that result set back to the hockey table TWICE. The first join matches the primary player to his/her own row for year + 1; the SECOND join matches each qualifying player to that player's own row for year + 1. The final columns hold the entire first-year comparison followed by the second-year comparison.
So it will all be on one row, consecutively:
John Smith 2010 Compare Person 1 YearA John Smith 2011 Compare Person 1 YearA+1
John Smith 2010 Compare Person 2 YearB John Smith 2011 Compare Person 2 YearB+1
John Smith 2010 Compare Person 3 YearC John Smith 2011 Compare Person 3 YearC+1
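A cut-down sketch of the year + 1 join, keeping only the comparison players' side, might look like this when run against SQLite from Python. All stats other than the Player1 2002 and Player2 2000 rows are invented for illustration, and the filter is placed in the outer WHERE instead of an inner HAVING so that it also runs outside MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hockey (player TEXT, year INTEGER, gp INTEGER, goals INTEGER, assists INTEGER)"
)
conn.executemany("INSERT INTO hockey VALUES (?, ?, ?, ?, ?)", [
    ("John Smith", 2010, 80, 30, 30),  # hypothetical stats
    ("Player1", 2002, 75, 29, 32),
    ("Player1", 2003, 80, 31, 35),     # hypothetical following season
    ("Player2", 2000, 82, 29, 27),
    ("Player2", 2001, 70, 20, 25),     # hypothetical following season
])

# sim_set: seasons similar to John Smith 2010; nxt: the same player's next season.
following = conn.execute("""
    SELECT sim_set.player, sim_set.year, nxt.year, nxt.gp, nxt.goals, nxt.assists
    FROM (
        SELECT px.player, px.year,
               1000 - ABS(p1.gp - px.gp)
                    - ABS(p1.goals - px.goals)
                    - ABS(p1.assists - px.assists) AS sim
        FROM hockey p1
        JOIN hockey px ON px.player <> p1.player
        WHERE p1.player = 'John Smith'
          AND p1.year = 2010
    ) sim_set
    JOIN hockey nxt
      ON nxt.player = sim_set.player
     AND nxt.year = sim_set.year + 1
    WHERE sim_set.sim >= 900
    ORDER BY sim_set.player, sim_set.year
""").fetchall()

for row in following:
    print(row)
```

Because this is an inner join, a similar season with no recorded following season simply drops out of the result; wrapping the query in an outer SELECT with AVG would then give the following-year averages.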
What query are you using to get the averages?
Just applying AVG to your expression for 'sim' should work in MySQL, e.g.
AVG(1000 - ABS(p1.gp - px.gp) - ABS(p1.goals - px.goals) - ABS(p1.assists - px.assists)) sim
To aggregate over different years, I think there is no alternative to using a subselect or union.
Reference:
http://dev.mysql.com/doc/refman/5.0/en/subqueries.html
http://dev.mysql.com/doc/refman/5.0/en/union.html
Something like:
(ORIGINAL AVG QUERY)
UNION ALL
(ORIGINAL AVG QUERY WITH NEW YEAR)
should do the trick.
(Note that your original query selects data from every year to compare it to the data for John Smith in 2010, which may not be what you want.)