Aggregation and finding mode of data set - sql-server-2008

I am aggregating data and I cannot sum certain columns so I would like to take the most frequent observation from that column, or the mode value. Each ID can have only one site and number, so if there are ties then pick the smaller of the two numbers.
Example follows:
ID site number
1 3 45
1 3 45
1 2 56
1 3 56
2 4 5
2 5 5
2 5 3
2 5 5
I want it to look like:
ID site number
1 3 45
2 5 5

Here's one way of doing it:
with aggregation as
(
select id
, site
, number
, numberCount = count(1)
from SiteNumbers
group by id
, site
, number
), aggregateRanks as
(
select *
, idRank = row_number() over (partition by id order by numberCount desc, number, site)
from aggregation
)
select id
, site
, number
from aggregateRanks
where idRank = 1
It matches your expected results, but it might need some tweaking depending on your other cases; hopefully it gives you some ideas.
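As a cross-check of the ranking logic outside the database, here is a minimal Python sketch. It assumes the same reading as the query above: count each (id, site, number) combination, then per id keep the most frequent one, breaking ties by smaller number and then smaller site. The function name `most_frequent_per_id` is made up for illustration.

```python
from collections import Counter

# Sample rows from the question: (id, site, number)
rows = [
    (1, 3, 45), (1, 3, 45), (1, 2, 56), (1, 3, 56),
    (2, 4, 5), (2, 5, 5), (2, 5, 3), (2, 5, 5),
]

def most_frequent_per_id(rows):
    """Return one (id, site, number) per id: the most frequent combination,
    with ties broken by smaller number, then smaller site."""
    counts = Counter(rows)  # plays the role of GROUP BY id, site, number
    best = {}
    for (row_id, site, number), n in counts.items():
        # Mirrors ORDER BY numberCount DESC, number, site
        rank_key = (-n, number, site)
        if row_id not in best or rank_key < best[row_id][0]:
            best[row_id] = (rank_key, (row_id, site, number))
    return [best[row_id][1] for row_id in sorted(best)]

result = most_frequent_per_id(rows)
print(result)
```

On the sample data this yields (1, 3, 45) and (2, 5, 5), the same rows as the expected output in the question.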

Related

mysql complex request by distinct pair (non commutative)

I'm trying to write an SQL (MariaDB) query that selects multiple columns, where two of the columns must form a unique pair, and the row kept for each pair must be the one with the smallest created_at value among the duplicate pairs.
Here is what my table approximately looks like :
id from_user_id to_user_id created_at
1 1 2 1000000005
2 2 1 1000000002
3 2 3 1000000008
4 5 6 999999999
5 6 5 100000006
I chose this data specifically to illustrate the request.
I want to select distinct pairs (from_user_id, to_user_id), where order does not matter: the couple (1,2) and the couple (2,1) count as the same pair and should appear only once. The second rule is that, for each pair, the row with the minimum created_at value should be picked.
So the result table I want is :
id from_user_id to_user_id created_at
2 2 1 1000000002
3 2 3 1000000008
4 5 6 999999999
The row (2,1,1000000002) is chosen because its created_at is lower than that of the other row for the same couple, (1,2,1000000005).
With that in place, if I want only rows with created_at above 999999999, I just have to add one condition.
I really hope my question is clear. I'm struggling to make distinct pairs work with other columns.
Thanks in advance for your answers.
WITH
cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY GREATEST(from_user_id,to_user_id),
LEAST(from_user_id,to_user_id)
ORDER BY created_at) rn
FROM your_table -- placeholder: TABLE is a reserved word, so substitute the real table name
)
SELECT *
FROM cte
WHERE rn = 1
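To see what the GREATEST/LEAST partitioning does, here is a small Python sketch of the same idea: normalize each pair so that (1,2) and (2,1) share a key, then keep the row with the smallest created_at per key. The function name `earliest_per_unordered_pair` is hypothetical. Note that with the sample values exactly as printed in the question, row 5 has created_at 100000006 (nine digits), which is smaller than row 4's 999999999, so row 5 is the one kept for the (5,6) pair. The expected table in the question lists row 4 instead, which may mean a digit is missing from the sample value, though that is only a guess.

```python
# Sample rows as printed in the question: (id, from_user_id, to_user_id, created_at)
rows = [
    (1, 1, 2, 1000000005),
    (2, 2, 1, 1000000002),
    (3, 2, 3, 1000000008),
    (4, 5, 6, 999999999),
    (5, 6, 5, 100000006),
]

def earliest_per_unordered_pair(rows):
    """Keep, for each unordered user pair, the row with the smallest created_at."""
    best = {}
    for row in rows:
        _row_id, from_user, to_user, created_at = row
        # Same role as PARTITION BY GREATEST(...), LEAST(...): order-insensitive key
        pair = (min(from_user, to_user), max(from_user, to_user))
        if pair not in best or created_at < best[pair][3]:
            best[pair] = row
    return sorted(best.values())

kept = earliest_per_unordered_pair(rows)
print(kept)
```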

SQL - Max value from a group by when creating a new field

I have a database with a table called BOOKINGS containing the following values
main-id place-id start-date end-date
1 1 2018-8-1 2018-8-8
2 2 2018-6-6 2018-6-9
3 3 2018-5-5 2018-5-8
4 4 2018-4-4 2018-4-5
5 5 2018-3-3 2018-3-10
5 1 2018-1-1 2018-1-6
4 2 2018-2-1 2018-2-10
3 3 2018-3-1 2018-3-28
2 4 2018-4-1 2018-4-6
1 5 2018-5-1 2018-5-15
1 3 2018-6-1 2018-8-8
1 4 2018-7-1 2018-7-6
1 1 2018-8-1 2018-8-18
1 2 2018-9-1 2018-9-3
1 5 2018-10-1 2018-10-6
2 5 2018-11-1 2018-11-5
2 3 2018-12-1 2018-12-25
2 2 2018-2-2 2018-2-19
2 4 2018-4-4 2018-4-9
2 1 2018-5-5 2018-5-23
For each main-id, I need to find the place-id with the largest total number of days. Basically, I need to determine where each main-id has spent the most time.
This information must then be put into a view, so unfortunately I can't use temporary tables.
The query that gets me the closest is
CREATE VIEW `MOSTTIME` (`main-id`,`place-id`,`total`) AS
SELECT `BOOKINGS`.`main-id`, `BOOKINGS`.`place-id`, SUM(DATEDIFF(`end-date`, `begin-date`)) AS `total`
FROM `BOOKINGS`
GROUP BY `BOOKINGS`.`main-id`,`BOOKINGS`.`place-id`
Which yields:
main-id place-id total
1 1 24
1 2 18
1 5 5
2 1 2
2 2 20
2 4 9
3 1 68
3 2 24
3 3 30
4 1 5
4 2 10
4 4 1
5 1 19
5 2 4
5 5 7
What I need is then the max total for each distinct main-id:
main-id place-id total
1 1 24
2 2 20
3 1 68
4 2 10
5 1 19
I've dug through a large amount of similar posts that recommend things like self joins; however, due to the fact that I have to create the new field total using an aggregate function (SUM) and another function (DATEDIFF) rather than just querying an existing field, my attempts at implementing those solutions have been unsuccessful.
I am hoping that my query that got me close will only require a small modification to get the correct solution.
Having a hyphen character - in a column name is a really bad idea, because it is also the minus operator. Do consider replacing it with an underscore _.
One possible way is to use derived tables. One derived table computes the total for each combination of main id and place id. Another derived table takes the maximum of those totals per main id. We can then join the two to keep only the row that matches the maximum value.
CREATE VIEW `MOSTTIME` (`main-id`,`place-id`,`total`) AS
SELECT b1.main_id, b1.place_id, b1.total
FROM
(
SELECT `main-id` AS main_id,
`place-id` AS place_id,
SUM(DATEDIFF(`end-date`, `begin-date`)) AS total
FROM BOOKINGS
GROUP BY main_id, place_id
) AS b1
JOIN
(
SELECT dt.main_id, MAX(dt.total) AS max_total
FROM
(
SELECT `main-id` AS main_id,
`place-id` AS place_id,
SUM(DATEDIFF(`end-date`, `begin-date`)) AS total
FROM BOOKINGS
GROUP BY main_id, place_id
) AS dt
GROUP BY dt.main_id
) AS b2
ON b1.main_id = b2.main_id AND
b1.total = b2.max_total
On MySQL 8+, an alternative is to use ROW_NUMBER():
CREATE VIEW `MOSTTIME` (`main-id`,`place-id`,`total`) AS
SELECT b.main_id, b.place_id, b.total
FROM
(
SELECT dt.main_id,
dt.place_id,
dt.total,
ROW_NUMBER() OVER (PARTITION BY dt.main_id
ORDER BY dt.total DESC) AS row_num -- 1 = largest total for this main_id
FROM
(
SELECT `main-id` AS main_id,
`place-id` AS place_id,
SUM(DATEDIFF(`end-date`, `begin-date`)) AS total
FROM BOOKINGS
GROUP BY main_id, place_id
) AS dt
) AS b
WHERE b.row_num = 1
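The "join back to the per-group maximum" step can be tried in isolation. The sketch below is an illustration only: it uses Python's built-in sqlite3 module rather than MySQL, and it loads the intermediate totals listed in the question into a hypothetical table named `totals` instead of recomputing them from BOOKINGS.

```python
import sqlite3

# Per-(main_id, place_id) totals as listed in the question's intermediate output.
totals = [
    (1, 1, 24), (1, 2, 18), (1, 5, 5),
    (2, 1, 2), (2, 2, 20), (2, 4, 9),
    (3, 1, 68), (3, 2, 24), (3, 3, 30),
    (4, 1, 5), (4, 2, 10), (4, 4, 1),
    (5, 1, 19), (5, 2, 4), (5, 5, 7),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE totals (main_id INTEGER, place_id INTEGER, total INTEGER)")
conn.executemany("INSERT INTO totals VALUES (?, ?, ?)", totals)

# Inner query: maximum total per main_id. Outer join: keep the rows that hit it.
query = """
SELECT t.main_id, t.place_id, t.total
FROM totals AS t
JOIN (
    SELECT main_id, MAX(total) AS max_total
    FROM totals
    GROUP BY main_id
) AS m
  ON t.main_id = m.main_id AND t.total = m.max_total
ORDER BY t.main_id
"""
top_places = conn.execute(query).fetchall()
print(top_places)
```

One difference between the two approaches is worth keeping in mind: if two places tie for the maximum, the join returns both rows for that main id, whereas filtering on ROW_NUMBER() = 1 returns exactly one of them.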

What should be the MySQL query having dynamic group by clause?

I need a MySQL query for the problem below.
Consider a table holding students and their marks in a particular subject.
Schema
std_id int(11)
marks int(11)
Sample data
std_id marks
1 10
2 15
3 90
4 120
5 25
6 29
7 121
8 122
Now I have a web app in which a form takes an integer input from the user.
For example, 12.
I then need to show the total number of student ids (std_id) in each corresponding marks group.
For example:
std_total (tot no of students) group (marks range we got from form)
1 0-11
1 12-23
2 24-35
1 84-95
3 120-131
@Barmar Your answer was almost correct; I made a few changes to clean up the output. Your query gives the output below:
0-11 2
1-12 2
2-13 1
3-14 1
4-15 1
6-17 1
7-18 2
My query returns this output:
0-11 2
12-23 2
24-35 1
36-47 1
48-59 1
72-83 1
84-95 2
SELECT CONCAT(FLOOR(marks/12)*12, '-', FLOOR(marks/12)*12+11) AS `group`, COUNT(*) as `std_total`
FROM yourTable
GROUP BY `group`
Use division and FLOOR() to get the beginning of each range.
SELECT CONCAT(FLOOR(marks/12), '-', FLOOR(marks/12)+11) AS `group`, COUNT(*) as `std_total`
FROM yourTable
GROUP BY `group`
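Since the range width comes from a form, it can be passed as a bound parameter instead of hard-coding 12. The following is a sketch under assumptions, not a drop-in MySQL solution: it uses Python's built-in sqlite3 module, a hypothetical table name `student_marks`, and a hypothetical helper `count_by_range`. SQLite's integer division stands in for FLOOR(marks/width); in MySQL the `/` operator does not truncate, so FLOOR() or the DIV operator would be needed there.

```python
import sqlite3

# Sample data from the question: (std_id, marks)
marks = [(1, 10), (2, 15), (3, 90), (4, 120), (5, 25), (6, 29), (7, 121), (8, 122)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_marks (std_id INTEGER, marks INTEGER)")
conn.executemany("INSERT INTO student_marks VALUES (?, ?)", marks)

def count_by_range(conn, width):
    """Count students per marks range of the given width (e.g. 0-11, 12-23 for 12)."""
    query = """
    SELECT (marks / :w) * :w          AS range_start,  -- integer division in SQLite
           (marks / :w) * :w + :w - 1 AS range_end,
           COUNT(*)                   AS std_total
    FROM student_marks
    GROUP BY range_start, range_end
    ORDER BY range_start
    """
    return conn.execute(query, {"w": width}).fetchall()

for start, end, total in count_by_range(conn, 12):
    print(f"{total} {start}-{end}")
```

With a width of 12 this reproduces the groups requested in the question (0-11, 12-23, 24-35, 84-95, 120-131); any other positive integer can be passed the same way.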

MySQL Winning Streak for every Player

I have a table with winner and loser statistics from a game:
id winner_id loser_id
1 1 2
2 1 2
3 3 4
4 4 3
5 1 2
6 2 1
7 3 4
8 3 2
9 3 5
10 3 6
11 2 3
12 3 6
13 2 3
I want a result table showing the highest winning streak of every player in the game. A player's streak is broken when they lose a game (player_id = loser_id). It should look like:
player_id win_streak
1 3
2 2
3 4
4 1
5 0
6 0
I tried many queries with user-defined variables etc., but I can't find a solution. Thanks!
SQL Fiddle : http://sqlfiddle.com/#!9/3da5f/1
I'm not quite sure whether this is the same as Alex's approach, except that it seems to have one distinct advantage... ;-)
SELECT player_id, MAX(CASE WHEN result = 'winner' THEN running ELSE 0 END) streak
FROM
( SELECT *
, IF(player_id = @prev_player,IF(result=@prev_result,@i:=@i+1,@i:=1),@i:=1) running
, @prev_result := result
, @prev_player:=player_id
FROM
( SELECT id, 'winner' result, winner_id player_id FROM my_table
UNION
SELECT id, 'loser', loser_id FROM my_table
) x
,
( SELECT @i:=1,@prev_result := '',@prev_player:='' ) vars
ORDER
BY x.player_id
, x.id
) a
GROUP
BY player_id;
I think you would be better off doing this on the PHP side (or in whatever other language you use).
But as an experiment, and to give you an idea that might be useful in some special cases,
here is my approach:
http://sqlfiddle.com/#!9/57cc65/1
SELECT r.winner_id,
(SELECT MAX(IF(winner_id=r.winner_id,IF(@i IS NULL, @i:=1,@i:=@i+1), IF(loser_id = r.winner_id, @i:=0,0)))
FROM Results r1
WHERE r1.winner_id = r.winner_id
OR r1.loser_id = r.winner_id
GROUP BY IF(winner_id=r.winner_id, winner_id,loser_id)) win_streak
FROM ( SELECT winner_id
FROM Results
GROUP BY winner_id
) r
For now it returns only the players who have won at least once, not all ids. To improve on that, you probably have a user table; if so, it would simplify the query. If you have no user table, you need to UNION in the users who have never won.
Feel free to ask if you have any questions.
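The suggestion above to compute streaks in application code can be sketched in a few lines. This is a Python illustration rather than PHP, and the function name `longest_win_streaks` is made up; it assumes games are processed in id order and that a loss resets the loser's current streak to zero.

```python
# Sample data from the question: (id, winner_id, loser_id)
games = [
    (1, 1, 2), (2, 1, 2), (3, 3, 4), (4, 4, 3), (5, 1, 2), (6, 2, 1), (7, 3, 4),
    (8, 3, 2), (9, 3, 5), (10, 3, 6), (11, 2, 3), (12, 3, 6), (13, 2, 3),
]

def longest_win_streaks(games):
    """Return {player_id: longest run of consecutive wins}, including players with 0."""
    current = {}  # running streak per player
    best = {}     # longest streak seen so far per player
    for _game_id, winner, loser in sorted(games):  # process in id order
        current[winner] = current.get(winner, 0) + 1
        best[winner] = max(best.get(winner, 0), current[winner])
        current[loser] = 0          # a loss breaks the streak
        best.setdefault(loser, 0)   # make sure players who never win are listed
    return dict(sorted(best.items()))

streaks = longest_win_streaks(games)
print(streaks)
```

Because every player who appears as a loser is registered too, players 5 and 6 show up with a streak of 0 without needing a separate user table.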

Access Totals Query Not Necessarily Returning First Record

I have a table of data like this:
id user_id A B C
=====================
1 15 1 2 3
2 15 1 2 5
3 20 1 3 9
4 20 1 3 7
I need to remove duplicate user ids and keep the record that sorts lowest when sorting by A then B then C. So using the above table, I set up a temp query (qry_temp) that simply does the sort--first on user_id, then on A, then on B, then on C. It returns the following:
id user_id A B C
====================
1 15 1 2 3
2 15 1 2 5
4 20 1 3 7
3 20 1 3 9
Then I wrote a Totals Query based on qry_temp that just had user_id (Group By) and then id (First), and I assumed this would return the following:
user_id id
===========
15 1
20 4
But it doesn't seem to do that--instead it appears to be just returning the lowest id in a group of duplicate user ids (so I get 1 and 3 instead of 1 and 4). Shouldn't the Totals query use the order of the query it's based upon? Is there a property setting in the query that might impact this or another way to get what I need? If it helps, here is the SQL:
SELECT qry_temp.user_id, First(qry_temp.ID) AS FirstOfID
FROM qry_temp
GROUP BY qry_temp.user_id;
You need a different type of query, for example:
SELECT tmp.id,
tmp.user_id,
tmp.a,
tmp.b,
tmp.c
FROM tmp
WHERE (( ( tmp.id ) IN (SELECT TOP 1 id
FROM tmp t
WHERE t.user_id = tmp.user_id
ORDER BY t.a,
t.b,
t.c,
t.id) ));
Here tmp is the name of your table. First, Last, Min and Max do not depend on the sort order of the query they are based on. In relational databases, sort orders are quite ephemeral.
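The correlated-subquery pattern can be tried outside Access. The sketch below is an assumption-laden illustration: it uses Python's built-in sqlite3 module, where `LIMIT 1` plays the role of Access's `TOP 1`, and it loads the four sample rows from the question into a table named tmp.

```python
import sqlite3

# Sample data from the question: (id, user_id, A, B, C)
records = [
    (1, 15, 1, 2, 3),
    (2, 15, 1, 2, 5),
    (3, 20, 1, 3, 9),
    (4, 20, 1, 3, 7),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (id INTEGER, user_id INTEGER, a INTEGER, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO tmp VALUES (?, ?, ?, ?, ?)", records)

# For each row, the subquery finds the id that sorts first for the same user;
# only rows whose own id matches are kept.
query = """
SELECT tmp.user_id, tmp.id
FROM tmp
WHERE tmp.id = (SELECT t.id
                FROM tmp AS t
                WHERE t.user_id = tmp.user_id
                ORDER BY t.a, t.b, t.c, t.id
                LIMIT 1)
ORDER BY tmp.user_id
"""
kept = conn.execute(query).fetchall()
print(kept)
```

The ordering lives inside the subquery itself, so the result does not rely on the sort order of any intermediate query; including id as the last sort key makes the pick deterministic when A, B and C all tie.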