Getting difference between counts of two subqueries - mysql

I'm trying to determine the score of an entry by finding the difference between the number of upvotes and downvotes it has received in MySQL by running SELECT (SELECT COUNT(vote_id) AS vote_up FROM votes WHERE vote='UP'),(SELECT COUNT(vote_id) AS vote_down FROM votes WHERE vote='DOWN'),(vote_up - vote_down AS vote_score). When I try to run this, though, it tells me that I do not have proper syntax. What am I doing wrong?
Also, is there a better way to write this?
And finally, what is the ideal way to find the item with the highest and lowest number of votes? Would I just ORDER BY [above query]?

You can do it with
SELECT some_id
, SUM(
CASE
WHEN vote = 'UP'
THEN 1
WHEN vote = 'DOWN'
THEN -1
ELSE 0
END
) as vote_score
FROM votes
GROUP BY some_id
Note that the better approach is to have +1 or -1 stored in vote, then you can just do:
SELECT some_id, SUM(vote) as vote_score
FROM votes
GROUP BY some_id
BTW if my formatting looks odd to you, I explained it in http://bentilly.blogspot.com/2011/02/sql-formatting-style.html.

You can do it by pulling that last clause into a (SELECT ...) block as well:
SELECT
(SELECT COUNT(vote_id) FROM votes WHERE vote='UP') AS vote_up,
(SELECT COUNT(vote_id) FROM votes WHERE vote='DOWN') AS vote_down,
(SELECT vote_up - vote_down) AS vote_score
ORDER BY vote_whatever;
Note btilly's answer about having +/- 1 be the upvote / downvote representation. It makes a lot more sense in this context, and allows for smaller tables, faster comparisons, and use of the SUM() function when necessary:
SELECT SUM(vote) from votes;
Also note: You'll only get vote_up and vote_down counts using the multiple (SELECT ...) method - SUM(CASE) will only give you the total.
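That said, conditional aggregation can also return the separate counts if each one gets its own SUM(CASE ...) column. Below is a minimal sketch of that idea. It uses Python's sqlite3 module as a stand-in for MySQL so it can be run anywhere; the some_id column follows btilly's answer, and the sample rows and function names are made up for illustration.

```python
import sqlite3

def vote_summary(conn):
    """Return (some_id, vote_up, vote_down, vote_score) per item in one pass."""
    return conn.execute(
        """
        SELECT some_id,
               SUM(CASE WHEN vote = 'UP'   THEN 1 ELSE 0 END) AS vote_up,
               SUM(CASE WHEN vote = 'DOWN' THEN 1 ELSE 0 END) AS vote_down,
               SUM(CASE WHEN vote = 'UP'   THEN 1
                        WHEN vote = 'DOWN' THEN -1
                        ELSE 0 END) AS vote_score
        FROM votes
        GROUP BY some_id
        ORDER BY some_id
        """
    ).fetchall()

def demo_votes_connection():
    """Build an in-memory table with hypothetical sample votes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE votes (vote_id INTEGER PRIMARY KEY, some_id INTEGER, vote TEXT)"
    )
    conn.executemany(
        "INSERT INTO votes (some_id, vote) VALUES (?, ?)",
        [(1, "UP"), (1, "UP"), (1, "DOWN"), (2, "DOWN"), (2, "DOWN"), (3, "UP")],
    )
    return conn

if __name__ == "__main__":
    for row in vote_summary(demo_votes_connection()):
        print(row)
```

The SQL itself is plain enough that it should carry over to MySQL unchanged, but I have only run it against SQLite.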

Following up on btilly's answer, if you need to know the lowest and highest scores but do not need to know which ID has them:
SELECT MIN(score), MAX(score)
FROM (
SELECT SUM(IF(vote = 'DOWN', -1, vote = 'UP')) AS score
FROM votes
GROUP BY ID
) AS scores
If you do need to know the ID, use the inner query (add the ID to the select) with an ORDER BY score LIMIT 1 to get the lowest and ORDER BY score DESC LIMIT 1 to get the highest.
Note that in the case of ties, this will return only one of the tied IDs.
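Here is a rough sketch of both variants, run through Python's sqlite3 module instead of MySQL so it is easy to try. The votes table layout, the sample rows, and the helper names are assumptions for the example. The extra ID in the ORDER BY is my own addition so that ties are broken the same way on every run.

```python
import sqlite3

# Per-ID score; CASE is used instead of IF() so the query also runs on SQLite.
SCORES_SQL = """
    SELECT ID,
           SUM(CASE WHEN vote = 'UP' THEN 1
                    WHEN vote = 'DOWN' THEN -1
                    ELSE 0 END) AS score
    FROM votes
    GROUP BY ID
"""

def score_range(conn):
    """Lowest and highest score, without the IDs."""
    return conn.execute(
        "SELECT MIN(score), MAX(score) FROM (" + SCORES_SQL + ") AS scores"
    ).fetchone()

def extreme_item(conn, highest):
    """(ID, score) of the highest or lowest scoring item; ties go to the smallest ID."""
    direction = "DESC" if highest else "ASC"
    return conn.execute(
        SCORES_SQL + f" ORDER BY score {direction}, ID LIMIT 1"
    ).fetchone()

def demo_scores_connection():
    """In-memory table with hypothetical votes; IDs 1 and 4 tie for the top score."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE votes (vote_id INTEGER PRIMARY KEY, ID INTEGER, vote TEXT)")
    conn.executemany(
        "INSERT INTO votes (ID, vote) VALUES (?, ?)",
        [(1, "UP"), (1, "UP"), (2, "DOWN"), (3, "UP"),
         (3, "DOWN"), (3, "UP"), (4, "UP"), (4, "UP")],
    )
    return conn

if __name__ == "__main__":
    conn = demo_scores_connection()
    print(score_range(conn), extreme_item(conn, True), extreme_item(conn, False))
```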

Related

MySQL Return latest and longest streak of rows with certain conditions

Please have a look at this fiddle.
https://www.db-fiddle.com/f/71CxYHKkzwmXJnovzpFheV/7
I'm trying to accomplish 2 things:
How do I get the length and date of the LATEST STREAK OF CORRECT GUESSES (meaning Result = Guess) without any skipped dates? In this case, it would be 4, starting from 2021-01-05 to 2021-01-08. (Although 2021-01-03 is correct, because there was no guess on 2021-01-04, it should not be included).
How do I get the length and date of the LONGEST STREAK OF CORRECT GUESSES OF ALL TIME? Again meaning Result = Guess, but can be anywhere in the table. Let's say it's 10 from 3 months ago.
To further complicate things, guesses can be made by multiple users AND there will be multiple results (for different game categories for example) on the same day. So the table above is for one user and one game category. I think I can handle this if I can get some guidance on the goals up above.
This is beyond my understanding. Any and all help would be appreciated.
EDIT: I've changed the table to show that the date is not always sequential. Also, I was informed that I should be using MySQL 8.0 for this task as using variables is not good practice for this problem.
Edit: Using the window functions, starting to get somewhere:
Please check the fiddle. It's pretty close to what I'm trying to get to, but the '4' in the total column should be a 1. In other words, the "sum" should restart. Not sure how to achieve this, because it's clear that the window function will group based on the conditions, breaking the order and thus the streak.
Updated: I've updated the fiddle per @The Impaler's request. The table here is more representative of what I'm actually working with (still not exact, but much closer).
Since this new fiddle is more representative, I'll also explain my final goal. I'd also like to get the streak for each game_type. The way I've been comparing game_type result on a given day to "community" (basically all the users) guess is by summing all the 0's and 1's for each game_type on that date from all the users and then using whichever greater as the 'guess'. This way, I can get how the "community" is doing as a whole. This works for individual dates, but to do a streak, I'm not sure.
Update 2
So this is as far as I've got:
https://www.db-fiddle.com/f/71CxYHKkzwmXJnovzpFheV/11
I tried to do a nested window function but that's not allowed. I have the proper groupings and column for when guess = result. Now I need help figuring out the streak within the groups.
This is a typical "Gaps & Islands" problem. Once you assemble the islands the query becomes easy.
For example, for a single user, as stated in the fiddle you can get the LONGEST STREAK by doing:
with
i as (
select
min(dayt) as starting_day,
max(dayt) as ending_day,
count(*) as streak_length
from (
select *, sum(beach) over(order by dayt) as island
from (
select *,
guess = result as inland,
case when (guess = result) <> (
lag(guess) over(order by dayt) = lag(result) over(order by dayt))
then 1 else 0 end as beach
from mytable
) x
where inland = 1
) y
group by island
)
select *
from i
order by streak_length desc
limit 1;
Result:
starting_day ending_day streak_length
------------- ----------- -------------
2021-01-06 2021-01-08 3
To get the LATEST STREAK you just need to change the ORDER BY clause at the end as shown below:
with
i as (
select
min(dayt) as starting_day,
max(dayt) as ending_day,
count(*) as streak_length
from (
select *, sum(beach) over(order by dayt) as island
from (
select *,
guess = result as inland,
case when (guess = result) <> (
lag(guess) over(order by dayt) = lag(result) over(order by dayt))
then 1 else 0 end as beach
from mytable
) x
where inland = 1
) y
group by island
)
select *
from i
order by ending_day desc
limit 1;
Result (same result as before):
starting_day ending_day streak_length
------------- ----------- -------------
2021-01-06 2021-01-08 3
See running example at DB Fiddle.
Note: You can remove the LIMIT clause at the end to see all the islands, not just the selected one.
For multi-users it's just a matter of modifying the windows (adding partitioning) and the rest of the query remains the same. If you provide a fiddle for multi-users I can add the solution as well.
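One thing to be aware of: the lag()-based query compares each row with the previous row, so a skipped date does not break a streak. If skipped dates should break it, as the question asks, a common variant (not the approach used above) is to subtract a row number from a day number over the correct guesses only; rows that belong to the same run of consecutive days then share the same difference. The sketch below tries this with Python's sqlite3 module and julianday(); in MySQL a day number such as TO_DAYS(dayt) would presumably play the same role. It assumes one row per day for a single user and game, and the sample data is invented.

```python
import sqlite3

# Correct guesses only; day number minus row number is constant within a run
# of consecutive calendar days, so it can serve as the island key.
STREAKS_SQL = """
    WITH correct AS (
        SELECT dayt,
               CAST(julianday(dayt) AS INTEGER)
                   - ROW_NUMBER() OVER (ORDER BY dayt) AS island
        FROM mytable
        WHERE guess = result
    )
    SELECT MIN(dayt) AS starting_day,
           MAX(dayt) AS ending_day,
           COUNT(*)  AS streak_length
    FROM correct
    GROUP BY island
"""

def latest_streak(conn):
    return conn.execute(STREAKS_SQL + " ORDER BY ending_day DESC LIMIT 1").fetchone()

def longest_streak(conn):
    return conn.execute(
        STREAKS_SQL + " ORDER BY streak_length DESC, ending_day DESC LIMIT 1"
    ).fetchone()

def demo_streak_connection():
    """Hypothetical data: a 5-day streak in December, then 4 days after a skipped date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (dayt TEXT, guess INTEGER, result INTEGER)")
    rows = [(f"2020-12-{d}", 1, 1) for d in range(20, 25)]
    rows += [("2021-01-01", 1, 1), ("2021-01-02", 0, 1), ("2021-01-03", 1, 1)]
    # 2021-01-04 is deliberately missing
    rows += [(f"2021-01-0{d}", 0, 0) for d in range(5, 9)]
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    return conn

if __name__ == "__main__":
    conn = demo_streak_connection()
    print(latest_streak(conn))
    print(longest_streak(conn))
```

With this sample data the latest streak starts on 2021-01-05 rather than 2021-01-03, because no guess exists for 2021-01-04.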
So, it took a while, but thanks to @The Impaler providing me the basis and the link below, I was able to solve the problem.
https://www.red-gate.com/simple-talk/sql/t-sql-programming/efficient-solutions-to-gaps-and-islands-challenges/
Here is the full solution:
with GAME_LOG as (
select
*,
guess = result as correct,
lag(case when (guess = result) then 1 else 0 end) over(partition by user_id, game_type order by dayt DESC) as previous_game_result,
lead(case when (guess = result) then 1 else 0 end) over(partition by user_id, game_type order by dayt DESC) as next_game_result,
row_number() over(partition by user_id, game_type order by dayt DESC) as ilocation
from mytable
),
CTE_ISLAND_START as (
select
*,
row_number() over(partition by user_id, game_type order by dayt DESC) as inumber,
dayt as island_start_time,
ilocation as island_start_location
from GAME_LOG
where correct = 1 AND
(previous_game_result <> 1 OR previous_game_result is null)
),
CTE_ISLAND_END as (
select
*,
row_number() over(partition by user_id, game_type order by dayt DESC) as inumber,
dayt as island_end_time,
ilocation as island_end_location
from GAME_LOG
where correct = 1 AND
(next_game_result <> 1 OR next_game_result is null)
)
select
CTE_ISLAND_START.user_id,
CTE_ISLAND_START.game_type,
CTE_ISLAND_START.island_start_time as streak_end,
CTE_ISLAND_END.island_end_time as streak_start,
cast(CTE_ISLAND_END.island_end_location as signed) -
cast(CTE_ISLAND_START.island_start_location as signed) + 1 as streak
from CTE_ISLAND_START
inner join CTE_ISLAND_END
on CTE_ISLAND_START.inumber = CTE_ISLAND_END.inumber AND
CTE_ISLAND_START.user_id = CTE_ISLAND_END.user_id AND
CTE_ISLAND_START.game_type = CTE_ISLAND_END.game_type
This will give all the streaks for each user_id, each game_type, as well as the start and end dates of the streak.
You can simply add a WHERE clause to filter by game_type and user_id.
Here's the fiddle with slightly updated dataset.
Fiddle

PHP/MYSQL Displaying hiscores with a group by working but not as intended

The output is grouped by username and shows the correct score, but it won't pull the other columns from the same row.
I've tried adding proof to the GROUP BY as well, but that doesn't work: it just adds another record to the output, showing two hiscores when I really only want one per user.
SELECT MAX(total) AS total, sUsername, proof, approved
FROM userrankings
WHERE category = 0
GROUP BY sUsername
ORDER BY total DESC
This is the output:
Rank: 1
User: Test User
Score: 2414
Proof: html site 1
The score matches what is in the database, but the proof should be HTML 2, because the second entry has the highest score, not the first. With GROUP BY sUsername, the query grabs the very first entry rather than the entry I need it to display.
I understand that, for each user, you want to pull out the record that has the highest total.
If you are running MySQL 8.0, you can use window function ROW_NUMBER() to rank the records of each user in a subquery, and do the filtering in the outer query:
SELECT total, sUsername, proof, approved
FROM (
SELECT
total,
sUsername,
proof,
approved,
ROW_NUMBER() OVER(PARTITION BY sUsername ORDER BY total DESC ) rn
FROM userrankings
WHERE category = 0
) x WHERE rn = 1
ORDER BY total DESC
On previous versions of MySQL, one solution is to use a correlated subquery with a NOT EXISTS condition to ensure that only the relevant records are displayed, like:
SELECT total, sUsername, proof, approved
FROM userrankings u
WHERE
category = 0
AND NOT EXISTS (
SELECT 1
FROM userrankings u1
WHERE
u1.category = 0
AND u1.sUsername = u.sUsername
AND u1.total > u.total
)
ORDER BY total DESC
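To see how the two approaches compare, here is a small sketch that runs both against the same data using Python's sqlite3 module (SQLite 3.25 or later is needed for ROW_NUMBER()). The table and column names come from the question; the sample rows and the helper name are made up.

```python
import sqlite3

WINDOW_SQL = """
    SELECT total, sUsername, proof, approved
    FROM (
        SELECT total, sUsername, proof, approved,
               ROW_NUMBER() OVER (PARTITION BY sUsername ORDER BY total DESC) AS rn
        FROM userrankings
        WHERE category = 0
    ) x
    WHERE rn = 1
    ORDER BY total DESC
"""

NOT_EXISTS_SQL = """
    SELECT total, sUsername, proof, approved
    FROM userrankings u
    WHERE category = 0
      AND NOT EXISTS (
          SELECT 1
          FROM userrankings u1
          WHERE u1.category = 0
            AND u1.sUsername = u.sUsername
            AND u1.total > u.total
      )
    ORDER BY total DESC
"""

def demo_rankings_connection():
    """Hypothetical rows: two category-0 entries for Test User, one row in another category."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE userrankings "
        "(sUsername TEXT, total INTEGER, proof TEXT, approved INTEGER, category INTEGER)"
    )
    conn.executemany(
        "INSERT INTO userrankings VALUES (?, ?, ?, ?, ?)",
        [
            ("Test User", 1200, "html site 1", 1, 0),
            ("Test User", 2414, "html site 2", 1, 0),
            ("Other User", 900, "html site 3", 1, 0),
            ("Other User", 5000, "html site 4", 1, 1),  # different category, ignored
        ],
    )
    return conn

if __name__ == "__main__":
    conn = demo_rankings_connection()
    print(conn.execute(WINDOW_SQL).fetchall())
    print(conn.execute(NOT_EXISTS_SQL).fetchall())
```

One difference worth knowing: if a user has two rows with the same top score, the NOT EXISTS version returns both of them, while ROW_NUMBER() keeps only one.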
You need to GROUP_CONCAT all the proofs:
SELECT MAX(total) AS total, sUsername, GROUP_CONCAT(proof ORDER BY total DESC), approved
FROM userrankings
WHERE category = 0
GROUP BY sUsername
ORDER BY total DESC
Because of the ORDER BY total DESC inside GROUP_CONCAT, the first proof in the concatenated list is the one that belongs to the highest total.

Calculating rank not working - Mysql

I have a DB Table user_points which contains user's points and I am trying to calculate ranking based on points. It is working fine for all users except users having 1 point.
If a user has 1 point, the rank is shown as 0, but it should be last or close to last, e.g. 12083.
The higher the points, the higher the ranking should be. For example:
1000 points = rank 1
1 point = rank 1223
Following is the query.
SELECT id, mobileNo, points,
FIND_IN_SET( points, (SELECT GROUP_CONCAT( points ORDER BY points DESC )
FROM users_points )) AS rank
FROM users_points
WHERE mobileNo = '03214701777'
What should I change to fix it?
SELECT a.id, a.mobileNo, a.points,
IFNULL((SELECT COUNT(*)
FROM users_points b
WHERE b.points > a.points), 0)+1 AS `rank`
FROM users_points a
WHERE a.mobileNo = '03214701777'
Seems to be what you are looking for. While it is still very inefficient, it is better than your approach using FIND_IN_SET(). If you really want to use FIND_IN_SET(), then you need to pad the scores to a consistent width and divide by the width+1 to get the rank.
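A minimal sketch of the counting approach, written so that the highest score gets rank 1 (the rank is the number of users with more points, plus one). It runs on Python's sqlite3 module rather than MySQL; the table and column names follow the question, while the user_rank alias, the sample rows, and the helper names are assumptions.

```python
import sqlite3

RANK_SQL = """
    SELECT a.id, a.mobileNo, a.points,
           (SELECT COUNT(*)
            FROM users_points b
            WHERE b.points > a.points) + 1 AS user_rank
    FROM users_points a
    WHERE a.mobileNo = ?
"""

def rank_for(conn, mobile_no):
    """Return (id, mobileNo, points, user_rank) for one user."""
    return conn.execute(RANK_SQL, (mobile_no,)).fetchone()

def demo_points_connection():
    """Hypothetical users, including a tie at 500 points and one user with 1 point."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users_points (id INTEGER, mobileNo TEXT, points INTEGER)")
    conn.executemany(
        "INSERT INTO users_points VALUES (?, ?, ?)",
        [(1, "0300", 1000), (2, "0301", 500), (3, "0302", 500), (4, "0303", 1)],
    )
    return conn

if __name__ == "__main__":
    conn = demo_points_connection()
    for number in ("0300", "0301", "0302", "0303"):
        print(rank_for(conn, number))
```

Users with equal points share a rank here, and the next rank is skipped, so the user with 1 point ends up at rank 4 rather than 3.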

sql return most prevalent column value

I'm a beginner at SQL, how do I get a query which returns the most prevalent column value? Probably there is an answer somewhere but I don't know how to google it.
For example in the user_id column the query should return the value 1 because this is the most prevalent number.
One approach is to do a GROUP BY aggregation and then apply a LIMIT trick:
SELECT user_id, COUNT(*) AS cnt
FROM yourTable
GROUP BY user_id
ORDER BY COUNT(*) DESC
LIMIT 1;
If you want something more complex, then you would be getting into the realm of ranking functions. MySQL versions before 8.0 have no built-in ranking functions, so such queries can be tricky to write there.
SELECT top 1 user_id, COUNT(*) AS cnt
FROM yourTable
GROUP BY user_id
ORDER BY COUNT(*) DESC
Have a common table expression that counts each user_id. Select user_id where the count is the max count. Will return both user_id's in case of a tie.
with cte as
(
SELECT user_id, COUNT(*) AS cnt
FROM yourTable
GROUP BY user_id
)
select user_id
from cte
where cnt = (select max(cnt) from cte)
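A quick way to check the tie behaviour is to run the CTE against a few rows. The sketch below does that with Python's sqlite3 module; yourTable and user_id come from the answers above, the sample values and function names are invented, and the ORDER BY is only there to make the output order predictable.

```python
import sqlite3

MODE_SQL = """
    WITH cte AS (
        SELECT user_id, COUNT(*) AS cnt
        FROM yourTable
        GROUP BY user_id
    )
    SELECT user_id
    FROM cte
    WHERE cnt = (SELECT MAX(cnt) FROM cte)
    ORDER BY user_id
"""

def most_prevalent(conn):
    """All user_id values that share the highest count."""
    return [row[0] for row in conn.execute(MODE_SQL)]

def demo_mode_connection():
    """Hypothetical data in which user_id 1 and 3 both appear three times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (user_id INTEGER)")
    conn.executemany(
        "INSERT INTO yourTable VALUES (?)",
        [(1,), (1,), (1,), (2,), (3,), (3,), (3,), (4,)],
    )
    return conn

if __name__ == "__main__":
    print(most_prevalent(demo_mode_connection()))
```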

SQL Distinct - Get all values

Thanks for looking, I'm trying to get 20 entries from the database randomly and unique, so the same one doesn't appear twice. But I also have a questionGroup field, which should also not appear twice. I want to make that field distinct, but then get the ID of the field selected.
Below is my script, which does NOT work because it also treats the ID as distinct:
SELECT DISTINCT `questionGroup`,`id`
FROM `questions`
WHERE `area`='1'
ORDER BY rand() LIMIT 20
Any advice is greatly appreciated!
Thanks
Try doing the group by/distinct first in a subquery:
select *
from (select distinct `questionGroup`,`id`
from `questions`
where `area`='1'
) qc
order by rand()
limit 20
I see . . . What you want is to select a random row from each group, and then limit it to 20 groups. This is a harder problem. I'm not sure if you can do this accurately with a single query in mysql, not using variables or outside tables.
Here is an approximation:
select *
from (select q.`questionGroup`,
coalesce(max(case when rand()*num < 1 then id end), min(id)) as id
from `questions` q join
(select questionGroup, count(*) as num
from questions
group by questionGroup
) qg
on qg.questionGroup = q.questionGroup
where `area`='1'
group by q.`questionGroup`
) qc
) qc
order by rand()
limit 20
This uses rand() to select an id, taking, on average, one per grouping (but it is random, so sometimes 0, 1, 2, etc.). It chooses the max() of these. If none appear, then it takes the minimum.
This will be slightly biased away from the maximum id (or minimum, if you switch the min's and max's in the equation). For most applications, I'm not sure that this bias would make a big difference. In other databases that support ranking functions, you can solve the problem directly.
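For what it's worth, here is a sketch of what the ranking-function route could look like: number the rows of each questionGroup in random order, keep the first one, then take 20 random groups. It is written against Python's sqlite3 module (SQLite 3.25+), where the random function is random(); on MySQL 8.0 the same shape with RAND() should apply, though I have not run it there. The table and column names come from the question; the sample data and function names are hypothetical.

```python
import sqlite3

def pick_questions(conn, area, limit=20):
    """One random id per questionGroup, for up to `limit` random groups."""
    return conn.execute(
        """
        SELECT questionGroup, id
        FROM (
            SELECT questionGroup, id,
                   ROW_NUMBER() OVER (
                       PARTITION BY questionGroup ORDER BY random()
                   ) AS rn
            FROM questions
            WHERE area = ?
        ) AS ranked
        WHERE rn = 1
        ORDER BY random()
        LIMIT ?
        """,
        (area, limit),
    ).fetchall()

def demo_questions_connection():
    """30 groups of 3 questions in area '1', plus a few rows in area '2'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE questions (id INTEGER PRIMARY KEY, questionGroup INTEGER, area TEXT)"
    )
    rows = [(group, "1") for group in range(30) for _ in range(3)]
    rows += [(100 + group, "2") for group in range(10)]
    conn.executemany("INSERT INTO questions (questionGroup, area) VALUES (?, ?)", rows)
    return conn

if __name__ == "__main__":
    for row in pick_questions(demo_questions_connection(), "1"):
        print(row)
```

Every row within a group has the same chance of being numbered first, so this avoids the bias toward the minimum or maximum id.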
Something like this
SELECT DISTINCT *
FROM (
SELECT `questionGroup`,`id`
FROM `questions`
WHERE `area`='1'
ORDER BY rand()
) As q
LIMIT 20