Using SQL to Aggregate and Calculate Stats - mysql

I have a shoot 'em up game where users compete against each other over the course of a week to accumulate the most points. I want to write a query that aggregates statistical data from the shots table. The tables and relationships of concern here are:
user has many competition_periods
competition_period belongs to user
competition_period has many shots
shot belongs to competition_period
In the shots table I have the following fields to work with:
result --> string values: WON, LOST or TIED
amount_won --> integer values: e.g., -100, 0, 2000, etc.
For each user, I want to return a result set with the following aggregated stats:
won_count
lost_count
tied_count
total_shots_count (won_count + lost_count + tied_count)
total_amount_won (sum of amount_won)
avg_amount_won_per_shot (total_amount_won / total_shots_count)
I've worked on this query for a few hours now, but haven't made much headway. The statistical functions trip me up. A friend suggested that I try to return the results in a new virtual table called shot_records.

Here is the basic solution, computing the statistics across all shots for a given player (you didn't specify if you want them on a per-competition-period basis or not):
SELECT U.user_id, SUM(IF(result = 'WON', 1, 0)) AS won_count,
SUM(IF(result = 'LOST', 1, 0)) AS lost_count,
SUM(IF(result = 'TIED', 1, 0)) AS tied_count,
COUNT(*) AS total_shots_count,
SUM(amount_won) AS total_amount_won,
(SUM(amount_won) / COUNT(*)) AS avg_amount_won_per_shot
FROM user U INNER JOIN competition_periods C ON U.user_id = C.user_id
INNER JOIN shots S ON C.competition_period_id = S.competition_period_id
GROUP BY U.user_id
Note that this includes negative amounts when calculating the "total won" figure (that is, losses decrease the total). If that's not the correct rule for your game, change SUM(amount_won) to SUM(IF(amount_won > 0, amount_won, 0)) in both places it occurs in the query.
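To sanity-check the conditional-aggregation idea, here is a minimal sketch that runs the same query from Python against an in-memory SQLite database. SQLite only stands in for MySQL here, so MySQL's IF() is swapped for the portable CASE WHEN, and the average multiplies by 1.0 because SQLite's `/` truncates when both operands are integers (MySQL's `/` does not). The table layout follows the question; the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY);
    CREATE TABLE competition_periods (competition_period_id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE shots (shot_id INTEGER PRIMARY KEY, competition_period_id INTEGER,
                        result TEXT, amount_won INTEGER);
    INSERT INTO user VALUES (1), (2);
    INSERT INTO competition_periods VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO shots (competition_period_id, result, amount_won) VALUES
        (10, 'WON', 2000), (10, 'LOST', -100), (11, 'TIED', 0),
        (20, 'LOST', -100), (20, 'LOST', -100);
""")

rows = conn.execute("""
    SELECT U.user_id,
           -- each CASE yields 1 for a matching shot and 0 otherwise, so SUM counts matches
           SUM(CASE WHEN S.result = 'WON'  THEN 1 ELSE 0 END) AS won_count,
           SUM(CASE WHEN S.result = 'LOST' THEN 1 ELSE 0 END) AS lost_count,
           SUM(CASE WHEN S.result = 'TIED' THEN 1 ELSE 0 END) AS tied_count,
           COUNT(*)                                           AS total_shots_count,
           SUM(S.amount_won)                                  AS total_amount_won,
           -- 1.0 * forces floating-point division in SQLite
           1.0 * SUM(S.amount_won) / COUNT(*)                 AS avg_amount_won_per_shot
    FROM user U
    INNER JOIN competition_periods C ON U.user_id = C.user_id
    INNER JOIN shots S ON C.competition_period_id = S.competition_period_id
    GROUP BY U.user_id
    ORDER BY U.user_id
""").fetchall()

for row in rows:
    print(row)
```

For per-competition-period figures instead, adding C.competition_period_id to the SELECT list and the GROUP BY should be enough.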

Related

I would like to know if there is a better way to write this query (multiple joins of the same table)

here is the problem:
I have a vehicles table in the database (its fields are not important here); what matters is that each vehicle has a model_id, which refers to the vehicle_models table.
The vehicle_models table has id, class, model, series, cm3hp, created_at and updated_at fields.
I need to determine the stock age, in terms of how many vehicles of a given model class are in stock, for each of these ranges: 0-30 days, 31-60 days, 61-90 days... 360+ days...
I don't know if that is clear enough, so let me try to explain better: for each day range I need to find the count of vehicles with the given model class. There are other criteria, but they aren't important for what I'm trying to find out. To help you better understand the problem, I'll include a screenshot of what the structure should look like:
I am using MySQL 8.
The query I wrote is:
SELECT DISTINCT vm.class,
IFNULL(t1.count, 0) as t1c,
IFNULL(t2.count, 0) as t2c,
IFNULL(t3.count, 0) as t3c,
IFNULL(t4.count, 0) as t4c,
IFNULL(t5.count, 0) as t5c,
IFNULL(t6.count, 0) as t6c,
IFNULL(t7.count, 0) as t7c
FROM vehicle_models vm
LEFT JOIN (
SELECT
vm.class as class,
count(*) as count
FROM a3s186jg7ffmm0q8.vehicles v
JOIN vehicle_models vm
ON vm.id = v.model_id
WHERE
DATEDIFF(IFNULL(v.retail_date, now()), v.wholesale_date) BETWEEN 0 AND 30
GROUP BY vm.class
) t1 ON t1.class = vm.class
*** MORE SAME LEFT JOINS ***
ORDER BY vm.class;
Now, this provides the desired results, but I would like to know whether there is a better way to write this query in terms of performance and code structure.
I guess you are presenting an inventory aging report (how long a car sits on the dealer's lot before somebody buys it). You can put the age ranges in your top-level SELECT rather than putting each one in a separate subquery. That will make your query faster (each of those subqueries repeats the vehicles/vehicle_models join) and shorter / easier to read.
Try something like this nested query. The inner query gives back one row per vehicle with its aging number. The outer query aggregates them.
SELECT class,
COUNT(*) total,
SUM(age BETWEEN 0 AND 30) t1c,
SUM(age BETWEEN 31 AND 60) t2c,
SUM(age BETWEEN 61 AND 90) t3c,
... etc ...
FROM (
SELECT vm.class,
DATEDIFF(IFNULL(v.retail_date, now()), v.wholesale_date) age
FROM a3s186jg7ffmm0q8.vehicles v
JOIN vehicle_models vm ON vm.id = v.model_id
) subq
GROUP BY class
ORDER BY class;
This SUM() trick works in MySQL because expressions like age BETWEEN 0 AND 30 have the value 1 when true and 0 when false.
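Here is a minimal sketch of the same one-pass bucketing, run from Python against SQLite (which also treats a true comparison as 1 and a false one as 0). Assumptions: SQLite has no DATEDIFF, so the age is a julianday() difference cast to an integer; now() is replaced by a fixed :today parameter so the result is reproducible; only three ranges plus an open-ended "older" bucket are shown; the sample vehicles are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle_models (id INTEGER PRIMARY KEY, class TEXT);
    CREATE TABLE vehicles (id INTEGER PRIMARY KEY, model_id INTEGER,
                           wholesale_date TEXT, retail_date TEXT);
    INSERT INTO vehicle_models VALUES (1, 'A'), (2, 'B');
    INSERT INTO vehicles (model_id, wholesale_date, retail_date) VALUES
        (1, '2024-01-01', '2024-01-20'),
        (1, '2024-01-01', '2024-02-15'),
        (1, '2024-01-01', NULL),          -- still in stock
        (2, '2024-01-01', '2024-03-15'),
        (2, '2024-03-01', NULL);          -- still in stock
""")

def aging_report(conn, today):
    """One row per model class: total vehicles plus a count per age range (days)."""
    return conn.execute("""
        SELECT class,
               COUNT(*)                   AS total,
               SUM(age BETWEEN 0 AND 30)  AS t1c,
               SUM(age BETWEEN 31 AND 60) AS t2c,
               SUM(age BETWEEN 61 AND 90) AS t3c,
               SUM(age > 90)              AS older
        FROM (
            -- inner query: one row per vehicle with its age in days
            SELECT vm.class,
                   CAST(julianday(COALESCE(v.retail_date, :today))
                        - julianday(v.wholesale_date) AS INTEGER) AS age
            FROM vehicles v
            JOIN vehicle_models vm ON vm.id = v.model_id
        ) subq
        GROUP BY class
        ORDER BY class
    """, {"today": today}).fetchall()

rows = aging_report(conn, "2024-04-01")
print(rows)
```

The tables are scanned and joined once, and every bucket is just another SUM over the same rows, which is where the saving over one LEFT JOIN subquery per range comes from.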

MySQL Left Join throwing off my count numbers

I'm doing a left join on a table to get the number of leads we've generated today and how many times we've called those leads. I figured a left join would be the best thing to do, so I wrote the following query:
SELECT
COUNT(rad.phone_number) as lead_number, rals.lead_source_name as source, COUNT(racl.phone_number) as calls, SUM(case when racl.contacted = 1 then 1 else 0 end) as contacted
FROM reporting_app_data rad
LEFT JOIN reporting_app_call_logs racl ON rad.phone_number = racl.phone_number, reporting_app_lead_sources rals
WHERE DATE(rad.created_at) = CURDATE() AND rals.list_id = rad.lead_source
GROUP BY rad.lead_source;
The problem is that if the reporting_app_call_logs table has multiple entries for the same phone number (that is, the number was called several times that day), lead_number (which should count how many leads were generated on the current day, grouped by lead_source) equals the number of calls. So the count from the LEFT table equals the count from the RIGHT table.
How do I write a SQL query that gets the number of leads and the total number of calls per lead source?
Try COUNT(DISTINCT expression)
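In other words, change COUNT(rad.phone_number) to COUNT(DISTINCT rad.phone_number). The LEFT JOIN repeats each lead once per matching call-log row, so a plain COUNT counts joined rows, while COUNT(DISTINCT ...) counts each lead's phone number only once. Leave COUNT(racl.phone_number) as it is: it counts the call-log rows, which is the number of calls.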
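A small reproduction of the fan-out, sketched in Python with SQLite. The tables leads and call_logs are hypothetical, simplified stand-ins for reporting_app_data and reporting_app_call_logs (the date filter and the lead-sources table are left out), and the phone numbers are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leads (phone_number TEXT, lead_source INTEGER);
    CREATE TABLE call_logs (phone_number TEXT, contacted INTEGER);
    INSERT INTO leads VALUES ('555-0001', 1), ('555-0002', 1), ('555-0003', 2);
    -- 555-0001 was called three times, 555-0002 never, 555-0003 once
    INSERT INTO call_logs VALUES ('555-0001', 0), ('555-0001', 0), ('555-0001', 1),
                                 ('555-0003', 1);
""")

rows = conn.execute("""
    SELECT l.lead_source,
           COUNT(l.phone_number)          AS inflated_leads,  -- one per joined row
           COUNT(DISTINCT l.phone_number) AS lead_number,     -- one per lead
           COUNT(c.phone_number)          AS calls,           -- NULLs from uncalled leads are skipped
           SUM(CASE WHEN c.contacted = 1 THEN 1 ELSE 0 END) AS contacted
    FROM leads l
    LEFT JOIN call_logs c ON l.phone_number = c.phone_number
    GROUP BY l.lead_source
    ORDER BY l.lead_source
""").fetchall()

print(rows)
```

For lead source 1 the plain COUNT reports 4 (three call rows plus one unmatched lead), while COUNT(DISTINCT ...) reports the 2 actual leads. One caveat: if two leads on the same day share a phone number, COUNT(DISTINCT ...) counts them once; counting a lead's own primary key instead would avoid that, if the table has one.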

Simple MySQL query using SELF JOIN

I have a table called image, which holds images taken by cameras. Each row records a car registration in column reg, the number of the camera that took the image in column camera, and a timestamp of the form yyyy-mm-dd hh:mm:ss in column whn. I have been asked to find the following:
"For each of the vehicles caught by camera 19 - show the registration, the earliest time at camera 19 and the time and camera at which it left the zone."
So I am finding the earliest time each car was captured by camera 19, and then the latest time on that date each car was captured, along with the camera that captured it. So far, I have the following code:
SELECT early.reg,
LEFT(MIN(early.whn), 10) AS date,
RIGHT(MIN(early.whn), 8) AS 'in',
RIGHT(MAX(late.whn), 8) AS 'out'
FROM image late
JOIN image early ON (early.reg = late.reg)
WHERE (early.camera = 19)
GROUP BY early.reg
This works perfectly fine. I just need to add the camera at which the maximum time was captured, where the max time is given by RIGHT(MAX(late.whn), 8) AS 'out', and I am struggling to do it. I tried adding late.camera to the SELECT list, but then I also have to add it to the GROUP BY, which returns the latest time the car was captured at each camera. Any help appreciated.
OK, now that I understand better, let me restate the problem to make sure I get what you want. The impression I am getting is as follows:
You are monitoring traffic, such as on toll roads. There are different cameras along the route, either north/south or east/west bound. On any given date, you have a list of all transactions recorded by all cameras along the route. For any car that passed a specific camera (and we don't know which direction it was travelling), you want to know where the car finally left camera range on that route. Ex: cameras 1-30. You know about cars that came within sight of camera 19, but they may have left the road after camera 22, 26, 29, whatever. So, for the cars seen at camera 19, which was the LAST camera they passed?
If this is correct, I was close on the intent. The inner query is ALMOST the same. For a given vehicle registration, I still store the minimum and maximum date/times it was spotted (it could have got on the road at camera 4, for example, and off at camera 27, assuming cameras are sequential, though that isn't required). The HAVING clause then requires that camera 19 WAS ONE of the cameras included in the trip. If someone got on at camera 1 and off at camera 18, they would NOT be included (this example assumes the cameras are truly sequential and is mainly there to make the logic easier to follow).
So now I have all registrations with their min and max date/times. Next, I re-join to the same image table on the registration and the respective min or max date/time. Since that matches a single record per registration, no GROUP BY is needed at the outer level: a vehicle is never recorded twice at exactly the same time, and the matching row must exist because the PQ query derived the time from it.
Now just pull the respective camera. So the query below gives both the camera where each vehicle (qualified as passing camera 19) was FIRST identified and the camera where it was LAST identified.
SELECT
PQ.Reg,
LEFT(PQ.MinDateForDay, 10) AS date,
RIGHT(PQ.MinDateForDay, 8) AS 'in',
iMinDateCam.Camera CameraIn,
PQ.TimeAtCamera19,
RIGHT(PQ.MaxDateForDay, 8) AS 'out',
iMaxDateCam.Camera CameraOut
from
( SELECT
i.reg,
min( i.whn ) as MinDateForDay,
MAX( case when i.Camera = 19 then i.whn else '' end ) TimeAtCamera19,
max( i.whn ) as MaxDateForDay
FROM
image i
GROUP BY
i.reg
having
MAX( case when i.Camera = 19 then 1 else 0 end ) = 1 ) PQ
join image iMinDateCam
ON PQ.reg = iMinDateCam.reg
AND PQ.MinDateForDay = iMinDateCam.whn
join image iMaxDateCam
ON PQ.reg = iMaxDateCam.reg
AND PQ.MaxDateForDay = iMaxDateCam.whn
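A minimal sketch of the same "aggregate, then join back on the min/max time" pattern, run from Python with SQLite. Assumptions: SQLite has no LEFT()/RIGHT(), so substr() extracts the date and time parts; the aliases 'in' and 'out' become time_in and time_out because IN is a reserved word; the sample sightings are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (camera INTEGER, reg TEXT, whn TEXT);
    INSERT INTO image VALUES
        (4,  'AB12', '2024-05-01 08:00:00'),
        (19, 'AB12', '2024-05-01 08:10:00'),
        (27, 'AB12', '2024-05-01 08:30:00'),
        (1,  'CD34', '2024-05-01 09:00:00'),   -- never seen by camera 19
        (18, 'CD34', '2024-05-01 09:20:00');
""")

rows = conn.execute("""
    SELECT PQ.reg,
           substr(PQ.MinDateForDay, 1, 10) AS day,
           substr(PQ.MinDateForDay, 12, 8) AS time_in,
           iMin.camera                     AS CameraIn,
           PQ.TimeAtCamera19,
           substr(PQ.MaxDateForDay, 12, 8) AS time_out,
           iMax.camera                     AS CameraOut
    FROM (
        SELECT i.reg,
               MIN(i.whn) AS MinDateForDay,
               MAX(CASE WHEN i.camera = 19 THEN i.whn ELSE '' END) AS TimeAtCamera19,
               MAX(i.whn) AS MaxDateForDay
        FROM image i
        GROUP BY i.reg
        -- keep only registrations seen at least once by camera 19
        HAVING MAX(CASE WHEN i.camera = 19 THEN 1 ELSE 0 END) = 1
    ) PQ
    -- re-join on the exact min / max time to recover which camera took that image
    JOIN image iMin ON PQ.reg = iMin.reg AND PQ.MinDateForDay = iMin.whn
    JOIN image iMax ON PQ.reg = iMax.reg AND PQ.MaxDateForDay = iMax.whn
""").fetchall()

print(rows)
```

As in the answer, this groups by registration only, so a vehicle seen on several dates would be treated as a single trip; adding the date part of whn to the GROUP BY and the join conditions would split it per day.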
You are nearly there. Keep the extra GROUP BY, and add a WHERE condition that restricts late to the row holding that registration's latest whn:
SELECT early.reg,
LEFT(MIN(early.whn), 10) AS date,
RIGHT(MIN(early.whn), 8) AS 'in',
RIGHT(MAX(late.whn), 8) AS 'out',
late.camera
FROM image late
JOIN image early ON (early.reg = late.reg)
WHERE (early.camera = 19)
and (late.whn = (select max(i2.whn) from image i2 where i2.reg = late.reg))
GROUP BY early.reg, late.camera
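A quick check of that correlated-subquery filter, sketched in Python with SQLite. A derived alias such as late cannot be used as a table inside a subquery, so the subquery reads image again (as i2) and correlates on the registration. substr() replaces LEFT()/RIGHT(), which SQLite lacks, the 'in'/'out' aliases are renamed, and the sample sightings are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (camera INTEGER, reg TEXT, whn TEXT);
    INSERT INTO image VALUES
        (4,  'AB12', '2024-05-01 08:00:00'),
        (19, 'AB12', '2024-05-01 08:10:00'),
        (27, 'AB12', '2024-05-01 08:30:00'),
        (1,  'CD34', '2024-05-01 09:00:00'),
        (18, 'CD34', '2024-05-01 09:20:00');
""")

rows = conn.execute("""
    SELECT early.reg,
           substr(MIN(early.whn), 1, 10) AS day,
           substr(MIN(early.whn), 12, 8) AS time_in,   -- earliest sighting at camera 19
           substr(MAX(late.whn), 12, 8)  AS time_out,
           late.camera
    FROM image late
    JOIN image early ON early.reg = late.reg
    WHERE early.camera = 19
      -- keep only the latest sighting of this registration
      AND late.whn = (SELECT MAX(i2.whn) FROM image i2 WHERE i2.reg = late.reg)
    GROUP BY early.reg, late.camera
""").fetchall()

print(rows)
```

Because only one late row survives per registration, grouping by late.camera no longer splits the result into one row per camera.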

Relational Database Logic

I'm fairly new to php / mysql programming and I'm having a hard time figuring out the logic for a relational database that I'm trying to build. Here's the problem:
I have different leaders who will be in charge of a store anytime between 9am and 9pm.
A customer who has visited the store can rate their experience on a scale of 1 to 5.
I'm building a site that will allow me to store the shifts that a leader worked as seen below.
When I hit submit, the site would take the data leaderName:"George", shiftTimeArray: 11am, 1pm, 6pm (from the example in the picture) and the shiftDate and send them to an SQL database.
Later, I want to be able to get the average score for a person by sending a query to mysql, retrieving all of the scores that that leader received and averaging them together. I know the code to build the forms and to perform the search. However, I'm having a hard time coming up with the logic for the tables that will relate the data. Currently, I have a mysql table called responses that contains the following fields,
leader_id
shift_date // contains the date that the leader worked
shift_time // contains the time that the leader worked
visit_date // contains the date that the survey/score was given
visit_time // contains the time that the survey/score was given
score // contains the actual score of the survey (1-5)
I enter the shifts that the leader works at the beginning of the week and then enter the survey scores in as they come in during the week.
So here's the question: what MySQL tables and fields should I create to relate this data so that I can query a leader's name and get the average score from all of their surveys?
You want tables like:
Leader (leader_id, name, etc)
Shift (leader_id, shift_date, shift_time)
SurveyResult (visit_date, visit_time, score)
Note: I omitted the surrogate primary keys for Shift and SurveyResult, which I would probably include.
To query, join shifts to surveys, group by leader and take the average, then join that back to Leader to get the name.
The query might be something like this (I haven't actually built it in MySQL to verify the syntax):
SELECT name
,AverageScore
FROM Leader a
INNER JOIN (
SELECT leader_id
, AVG(score) AverageScore
FROM Shift
INNER JOIN
SurveyResult ON shift_date = visit_date
AND shift_time = visit_time -- what this really needs to be depends on how you are recording time
GROUP BY leader_id
) b ON a.leader_id = b.leader_id
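Since the answer notes the query was not verified, here is a minimal sketch that runs it from Python against SQLite. Assumption: shift and visit times are both stored as hourly slots ('HH:00'), so the equality join works; the leaders, dates and scores are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Leader (leader_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Shift (leader_id INTEGER, shift_date TEXT, shift_time TEXT);
    CREATE TABLE SurveyResult (visit_date TEXT, visit_time TEXT, score INTEGER);
    INSERT INTO Leader VALUES (1, 'George'), (2, 'Ann');
    -- hourly shift slots, stored as 'HH:00'
    INSERT INTO Shift VALUES (1, '2024-06-03', '11:00'), (1, '2024-06-03', '13:00'),
                             (2, '2024-06-03', '12:00');
    INSERT INTO SurveyResult VALUES ('2024-06-03', '11:00', 4), ('2024-06-03', '13:00', 5),
                                    ('2024-06-03', '12:00', 2);
""")

rows = conn.execute("""
    SELECT a.name, b.AverageScore
    FROM Leader a
    INNER JOIN (
        -- attribute each survey to whoever was on shift at that date and hour
        SELECT leader_id, AVG(score) AS AverageScore
        FROM Shift
        INNER JOIN SurveyResult
                ON shift_date = visit_date
               AND shift_time = visit_time
        GROUP BY leader_id
    ) b ON a.leader_id = b.leader_id
    ORDER BY a.name
""").fetchall()

print(rows)
```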
I would do the following structure:
leaders
id
name
leaders_timetable (can be multiple per leader)
id,
leader_id
shift_datetime (I assume it stores the date and hour here; minutes and seconds are always 0)
survey_scores
id,
visit_datetime
score
SELECT l.id, l.name, AVG(s.score) FROM leaders l
INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
INNER JOIN survey_scores s ON lt.shift_datetime = DATE_FORMAT(s.visit_datetime, '%Y-%m-%d %H:00:00')
GROUP BY l.id
DATE_FORMAT here truncates visit_datetime to the hour (it zeroes the minutes and seconds) so that it can be matched against shift_datetime. Note the argument order: the value comes first and the format string second. This is a MySQL function, so if you use another database you'll need a different function.
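The truncate-to-the-hour join can be tried out in SQLite, whose strftime() takes the format string first (the reverse of MySQL's DATE_FORMAT). This is a sketch with made-up shifts and scores; the 15:30 survey falls in an hour with no shift, so it is not counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leaders (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE leaders_timetable (id INTEGER PRIMARY KEY, leader_id INTEGER, shift_datetime TEXT);
    CREATE TABLE survey_scores (id INTEGER PRIMARY KEY, visit_datetime TEXT, score INTEGER);
    INSERT INTO leaders VALUES (1, 'George');
    INSERT INTO leaders_timetable (leader_id, shift_datetime) VALUES
        (1, '2024-06-03 11:00:00'), (1, '2024-06-03 13:00:00'), (1, '2024-06-03 18:00:00');
    INSERT INTO survey_scores (visit_datetime, score) VALUES
        ('2024-06-03 11:42:10', 3), ('2024-06-03 13:05:00', 5), ('2024-06-03 15:30:00', 1);
""")

rows = conn.execute("""
    SELECT l.id, l.name, AVG(s.score) AS average_score
    FROM leaders l
    INNER JOIN leaders_timetable lt ON lt.leader_id = l.id
    -- truncate each visit to the hour so it matches the hourly shift slot
    INNER JOIN survey_scores s
            ON lt.shift_datetime = strftime('%Y-%m-%d %H:00:00', s.visit_datetime)
    GROUP BY l.id, l.name
""").fetchall()

print(rows)
```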
Say you have a 'leader' who has 5 survey rows with scores 1, 2, 3, 4 and 5.
If you select all of this leader's surveys, sum the scores and divide by 5 (the total number of surveys this leader has), you get the average, in this case 3.
(1 + 2 + 3 + 4 + 5) / 5 = 3
You wouldn't need to create any more tables or fields, you have what you need.

mysql query - multiple counts using left join and where clause

I'm currently trying to get the following data:
UserName, UserImageURL, Total Games Played, Games Completed, Games Lost, Average Won (as percentage) and Points of the user
And as well another set of data:
User Statistics data such as:
Most Games Played on League: 23 - Monster Killers
Games Most Won On: 19/23 - Monster Killers
Games Most Lost On: 3/32 - Frog Racers
Your Game Winning Accuracy (total from all games) - 68% accuracy
Site Stats:
Most Games Played on League: 650 - Helicopter Run
Top Game Played: 1200 - Monster Killers
Whole site winning accuracy: 82%
I have the following Tables:
-User Table-
userID (int-pk), userName (varchar), userImageUrl (text)
-Games table-
gameId (int-pk), gameName (varchar), gameUserID (int), gameLeagueId (int), score1 (int), score2 (int), gameResultOut (0 or 1), gameWon (0 or 1)
-UserBalance table-
ubId (int-pk), userId (int), balance (int)
-League table-
leagueId (int-pk), leagueName (varchar)
Just to give you a heads up on what's happening: when a user plays a game and chooses some results, a row is inserted into the games table. Since the game is time based, when the results are out, a check looks for any games with that id and updates gameResultOut to 1 and gameWon to 1 or 0, according to what the user had selected as a score.
I tried the following:
SELECT u.userID, u.userName, u.userImageUrl, l.leagueName ,
COUNT(g.gameId) AS predTotal,
(SELECT COUNT(g.gameId) FROM games AS g WHERE g.gameResultOut = 1 AND g.gameWon = 1) AS gamesWon,
(SELECT COUNT(g.gameId) FROM games AS g WHERE g.gameResultOut = 1 AND g.gameWon = 0) AS gamesLost,
ub.balance
FROM games AS g
LEFT JOIN league AS l ON l.leagueId = g.gameLeagueId
LEFT JOIN user AS u ON u.user_id = g.gameUserID
LEFT JOIN user_balance AS ub ON ub.userId = u.userID
WHERE l.leagueId = 4
GROUP BY u.userId
ORDER BY ub.balance DESC
I can easily calculate the win percentage after the query, so that's not a problem, but the Won and Lost results are the same for every user, and even when I change the leagueId the results stay the same, which is not what I want.
Can anyone help?
Thanks & Regards,
Necron
As far as I see, the games table stores games that users played. So, in order to know how many games each user played/won/lost, you're missing the link in the subqueries between games and users.
Your subqueries are:
(SELECT COUNT(g.gameId ) FROM games AS g WHERE g.gameResultOut = 1 AND g.gameWon = 1) AS gamesWon,
(SELECT COUNT(g.gameId) FROM games AS g WHERE g.gameResultOut = 1 AND g.gameWon = 0) AS gamesLost,
And they should be:
(SELECT COUNT(gw.gameId) FROM games AS gw WHERE gw.gameResultOut = 1 AND gw.gameWon = 1 AND gw.gameUserID = u.userID) AS gamesWon,
(SELECT COUNT(gl.gameId) FROM games AS gl WHERE gl.gameResultOut = 1 AND gl.gameWon = 0 AND gl.gameUserID = u.userID) AS gamesLost,
I guess this is what you're looking for :)
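As an alternative to the correlated subqueries (a different technique from the one in this answer), conditional aggregation counts wins and losses in the same pass as the total. Because all three counts read the same filtered rows, the league filter applies to them too, which addresses the "changing the leagueId changes nothing" symptom. Sketched in Python with SQLite; the balance table is left out, and the users and games are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (userID INTEGER PRIMARY KEY, userName TEXT);
    CREATE TABLE games (gameId INTEGER PRIMARY KEY, gameName TEXT, gameUserID INTEGER,
                        gameLeagueId INTEGER, gameResultOut INTEGER, gameWon INTEGER);
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO games (gameName, gameUserID, gameLeagueId, gameResultOut, gameWon) VALUES
        ('Monster Killers', 1, 4, 1, 1),
        ('Monster Killers', 1, 4, 1, 0),
        ('Monster Killers', 1, 4, 0, 0),  -- result not out yet
        ('Frog Racers',     1, 5, 1, 1),  -- other league, filtered out below
        ('Monster Killers', 2, 4, 1, 1);
""")

rows = conn.execute("""
    SELECT u.userID, u.userName,
           COUNT(g.gameId)                            AS predTotal,
           -- a true comparison is 1 and a false one is 0, so SUM counts matching games
           SUM(g.gameResultOut = 1 AND g.gameWon = 1) AS gamesWon,
           SUM(g.gameResultOut = 1 AND g.gameWon = 0) AS gamesLost
    FROM games g
    JOIN user u ON u.userID = g.gameUserID
    WHERE g.gameLeagueId = :league
    GROUP BY u.userID, u.userName
    ORDER BY u.userID
""", {"league": 4}).fetchall()

print(rows)
```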
EDIT based on comments, adding tips for User and Site statistics:
For that information you'll need to run several separate queries, as most of them sum some values and/or group by a different column, which won't fit into a single query. I'll try to give you some ideas so you can work on them.
User Statistics
Most Games Won or Lost
The previous answer for the query you provided counts how many times the user has lost/won any game, but does not break this data down by game.
So, if you want to know in which game user has most wins/losses, you should have something like this:
SELECT
g.gameName,
-- How many times the user won each game (result out and won)
SUM(g.gameResultOut = 1 AND g.gameWon = 1) AS gamesWon,
-- How many times the user played each game
COUNT(g.gameId) AS gamesPlayed,
-- The Win Ratio. This may need a little work, depending on what you want.
-- A SELECT alias can't be reused in the same SELECT list, so the expressions are repeated.
-- Be aware that if a user played a game 1 time and won, its ratio will be 1 (100%)
-- So maybe you'll want to add your own rule to determine which game should show up here
SUM(g.gameResultOut = 1 AND g.gameWon = 1) / COUNT(g.gameId) AS winRatio
FROM
games g
INNER JOIN user u ON u.userID = g.gameUserID
-- The user you are reporting on (1 is a placeholder id)
WHERE u.userID = 1
-- Groups and counts data per game (gameId is unique per play, so grouping by it would count 1 every time)
GROUP BY g.gameName
-- Now you order by the win ratio
ORDER BY winRatio DESC
-- And get only the first result, which is the game with the player's best win ratio.
LIMIT 1
For lost games, it's pretty much the same query, changing the desired fields and maths.
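A runnable sketch of the per-game win ratio, using Python and SQLite with made-up plays. Assumptions: the user id is passed as a query parameter instead of the placeholder 1, and the ratio multiplies by 1.0 because SQLite's `/` truncates integers (MySQL's does not). ORDER BY may refer to the winRatio alias even though the SELECT list itself cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (gameId INTEGER PRIMARY KEY, gameName TEXT, gameUserID INTEGER,
                        gameResultOut INTEGER, gameWon INTEGER);
    INSERT INTO games (gameName, gameUserID, gameResultOut, gameWon) VALUES
        ('Monster Killers', 1, 1, 1), ('Monster Killers', 1, 1, 1), ('Monster Killers', 1, 1, 0),
        ('Frog Racers',     1, 1, 0), ('Frog Racers',     1, 1, 1),
        ('Helicopter Run',  2, 1, 1);
""")

def best_game(conn, user_id):
    """Return (gameName, gamesWon, gamesPlayed, winRatio) for the user's best game."""
    return conn.execute("""
        SELECT gameName,
               SUM(gameResultOut = 1 AND gameWon = 1) AS gamesWon,
               COUNT(gameId)                          AS gamesPlayed,
               1.0 * SUM(gameResultOut = 1 AND gameWon = 1) / COUNT(gameId) AS winRatio
        FROM games
        WHERE gameUserID = ?
        GROUP BY gameName
        ORDER BY winRatio DESC
        LIMIT 1
    """, (user_id,)).fetchone()

print(best_game(conn, 1))
```

For the "most lost" variant, swapping gameWon = 1 for gameWon = 0 in both SUM expressions should be enough.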
Game winning accuracy
Similar to the previous query, except that you won't group by the game anymore. Just group by the user and do your math.
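One way that per-user math could look, sketched with Python and SQLite on made-up rows. Design choice (an assumption, not from the question): only games whose result is out count toward accuracy, and NULLIF turns a user with no finished games into a NULL accuracy instead of dividing by zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (gameId INTEGER PRIMARY KEY, gameName TEXT, gameUserID INTEGER,
                        gameResultOut INTEGER, gameWon INTEGER);
    INSERT INTO games (gameName, gameUserID, gameResultOut, gameWon) VALUES
        ('Monster Killers', 1, 1, 1), ('Monster Killers', 1, 1, 1), ('Frog Racers', 1, 1, 0),
        ('Frog Racers',     1, 1, 1), ('Frog Racers',     1, 0, 0),   -- last one still pending
        ('Helicopter Run',  2, 1, 0);
""")

rows = conn.execute("""
    SELECT gameUserID,
           -- wins divided by finished games, as a percentage rounded to 1 decimal
           ROUND(100.0 * SUM(gameResultOut = 1 AND gameWon = 1)
                 / NULLIF(SUM(gameResultOut = 1), 0), 1) AS accuracy_pct
    FROM games
    GROUP BY gameUserID
    ORDER BY gameUserID
""").fetchall()

print(rows)
```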
Site Statistics
Well, as far as I see, we're still on a similar query. The difference is that for the whole Site statistics you won't ever group by user. You may group by game or league, depending on what you are trying to achieve.
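A sketch of two site-wide figures with no user grouping: the league with the most games played, and the whole-site winning accuracy (counting finished games only, as an assumption). It uses Python and SQLite; the leagues and games are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE league (leagueId INTEGER PRIMARY KEY, leagueName TEXT);
    CREATE TABLE games (gameId INTEGER PRIMARY KEY, gameName TEXT, gameUserID INTEGER,
                        gameLeagueId INTEGER, gameResultOut INTEGER, gameWon INTEGER);
    INSERT INTO league VALUES (4, 'Monster Killers'), (5, 'Frog Racers');
    INSERT INTO games (gameName, gameUserID, gameLeagueId, gameResultOut, gameWon) VALUES
        ('Monster Killers', 1, 4, 1, 1), ('Monster Killers', 2, 4, 1, 1),
        ('Monster Killers', 2, 4, 1, 0), ('Frog Racers', 1, 5, 1, 1),
        ('Frog Racers',     2, 5, 0, 0);   -- result not out yet
""")

# Group by league instead of by user, then keep the busiest league.
top_league = conn.execute("""
    SELECT l.leagueName, COUNT(*) AS played
    FROM games g
    JOIN league l ON l.leagueId = g.gameLeagueId
    GROUP BY l.leagueId, l.leagueName
    ORDER BY played DESC
    LIMIT 1
""").fetchone()

# No GROUP BY at all: one accuracy figure for the whole site.
site_accuracy = conn.execute("""
    SELECT ROUND(100.0 * SUM(gameResultOut = 1 AND gameWon = 1)
                 / NULLIF(SUM(gameResultOut = 1), 0), 1)
    FROM games
""").fetchone()[0]

print(top_league, site_accuracy)
```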
Bottom line: most of these queries are similar; you'll have to play with them and adapt them for each piece of information you need to retrieve. Please note that they might not work as-is, since I could not test them on your DB. You may need to correct some inconsistencies according to your database/table schema.
I hope this may give you some insight to work on.