How to optimize the subqueries in SQL? - mysql

I have a data set. Sample rows look like this:
Date ID Cost
05/01 1001 30
05/01 1024 19
05/01 1001 29
05/02 1001 28
05/02 1002 19
05/02 1008 16
05/03 1017 89
05/04 1003 28
05/04 1001 16
05/05 1017 28
05/06 1002 44
... etc...
I want to create a table that shows the top payer (the ID with the highest cost) for each day, so the table has only two columns. The output should look like this:
Date ID
05/01 1001
05/02 1001
05/03 1017
05/04 1003
...etc...
I know this question is simple; my problem is that I want to simplify my query.
My query:
select Date, ID
from (select Date, ID, max(SumCost)
from (select Date, ID, sum(cost) as SumCost
from table1
group by Date, ID) a
group by Date, ID) b;
It seems clumsy, and I want to optimize it. The point is that I only want to output two columns, Date and ID.
Any suggestions?

Here is a method using a correlated subquery:
select t.*
from t
where t.cost = (select max(t2.cost) from t t2 where t2.date = t.date);

If we take the maximum single cost when an ID has multiple costs on the same day, then the query below will work. The query that you have written above is incorrect.
Select date, ID
from
(
Select Date, ID, row_number() over(partition by date order by cost desc) as rnk
from table
) a
where rnk = 1
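If the goal is instead the highest total cost per ID per day (as the SUM in the question suggests), the aggregation and the per-day maximum can be combined. Below is a minimal sketch, not a definitive answer: it runs the SQL through Python's built-in sqlite3 module so it can be tried without a MySQL server, and it assumes the table is named table1 as in the question. The CTE name totals is my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Date TEXT, ID INTEGER, Cost INTEGER)")
# First rows of the sample data from the question.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("05/01", 1001, 30), ("05/01", 1024, 19), ("05/01", 1001, 29),
     ("05/02", 1001, 28), ("05/02", 1002, 19), ("05/02", 1008, 16),
     ("05/03", 1017, 89), ("05/04", 1003, 28), ("05/04", 1001, 16)],
)

query = """
WITH totals AS (                      -- total cost per ID per day
    SELECT Date, ID, SUM(Cost) AS SumCost
    FROM table1
    GROUP BY Date, ID
)
SELECT t.Date, t.ID
FROM totals t
WHERE t.SumCost = (SELECT MAX(t2.SumCost)   -- highest total on that day
                   FROM totals t2
                   WHERE t2.Date = t.Date)
ORDER BY t.Date
"""
top_payers = conn.execute(query).fetchall()
print(top_payers)
```

With these rows the result matches the output in the question: 1001, 1001, 1017, 1003. If two IDs tie for the highest total on a day, both are returned.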

Related

SQL nested query under WHERE

One of the test questions came with the following schemas and asked for the best doctor in terms of:
Best score;
Most times/attempts;
For each medical procedure (by name).
[doctor] table
id  first_name  last_name  age
1   Phillip     Singleton  50
2   Heidi       Elliott    34
3   Beulah      Townsend   35
4   Gary        Pena       36
5   Doug        Lowe       45
[medical_procedure] table
id  doctor_id  name           score
1   3          colonoscopy    44
2   1          colonoscopy    37
3   4          ulcer surgery  98
4   2          angiography    79
5   3          angiography    84
6   3          embolization   87
and the list goes on...
The given solution is as follows:
WITH cte AS(
SELECT
name,
first_name,
last_name,
COUNT(*) AS procedure_count,
RANK() OVER(
PARTITION BY name
ORDER BY COUNT(*) DESC) AS place
FROM
medical_procedure p JOIN doctor d
ON p.doctor_id = d.id
WHERE
score >= (
SELECT AVG(score)
FROM medical_procedure pp
WHERE pp.name = p.name)
GROUP BY
name,
first_name,
last_name
)
SELECT
name,
first_name,
last_name
FROM cte
WHERE place = 1;
It would mean a lot if someone could explain how the WHERE clause with the subquery works:
How it works in general
Why we must match pp.name and p.name for it to return the correct rows
...
WHERE
score >= (
SELECT AVG(score)
FROM medical_procedure pp
WHERE pp.name = p.name)
...
Thanks a heap!
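The query joins doctor and medical_procedure and groups by procedure name and doctor, because you need the doctor with the most attempts among those with the best scores.
The subquery is correlated on the procedure name: for each row it computes the average score of that row's procedure, and the WHERE clause keeps only the rows whose score is at least that average. Without pp.name = p.name, the average would be taken over all procedures instead.
Several doctors can be at or above the average, so RANK() orders them by procedure count within each procedure (most attempts first), and place = 1 then picks the top one.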
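To see what the pp.name = p.name condition changes, here is a small sketch that runs the same WHERE clause with and without the correlation. It uses Python's sqlite3 module and only the six sample rows shown in the question; the variable names are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE medical_procedure "
    "(id INTEGER, doctor_id INTEGER, name TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO medical_procedure VALUES (?, ?, ?, ?)",
    [(1, 3, "colonoscopy", 44), (2, 1, "colonoscopy", 37),
     (3, 4, "ulcer surgery", 98), (4, 2, "angiography", 79),
     (5, 3, "angiography", 84), (6, 3, "embolization", 87)],
)

# Correlated: the average is recomputed per procedure name of the outer row.
correlated = [row[0] for row in conn.execute("""
    SELECT p.id
    FROM medical_procedure p
    WHERE p.score >= (SELECT AVG(pp.score)
                      FROM medical_procedure pp
                      WHERE pp.name = p.name)
    ORDER BY p.id
""")]

# Uncorrelated: one overall average (71.5 here) is used for every row.
uncorrelated = [row[0] for row in conn.execute("""
    SELECT p.id
    FROM medical_procedure p
    WHERE p.score >= (SELECT AVG(pp.score) FROM medical_procedure pp)
    ORDER BY p.id
""")]

print(correlated)    # ids kept when compared with their own procedure's average
print(uncorrelated)  # ids kept when compared with the overall average
```

The correlated version keeps ids 1, 3, 5 and 6: the colonoscopy with score 44 passes because the colonoscopy average is 40.5. The uncorrelated version keeps ids 3, 4, 5 and 6, dropping every colonoscopy because both scores are below the overall average.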

Get original RANK() value based on row create date

Using MariaDB, I'm trying to see if I can pull the original ranking for each row of a table based on its create date.
For example, imagine a scores table that has different scores for different users and categories (a lower score is better in this case):
id  leaderboardId  userId  score  submittedAt          rankAtSubmit
9   15             555     50.5   2022-01-20 01:00:00  2
8   15             999     58.0   2022-01-19 01:00:00  3
7   15             999     59.1   2022-01-15 01:00:00  3
6   15             123     49.0   2022-01-12 01:00:00  1
5   15             222     51.0   2022-01-10 01:00:00  1
4   14             222     87.0   2022-01-09 01:00:00  1
5   15             555     51.0   2022-01-04 01:00:00  1
The "rankAtSubmit" column is what I'm trying to generate here if possible.
I want to take the best/smallest score of each user+leaderboard and determine what the rank of that score was when it was submitted.
My attempt at this failed because in MySQL you cannot reference outer-level columns more than one level deep in a subquery, which results in an error when referencing t.submittedAt in the following query:
SELECT *, (
SELECT ranking FROM (
SELECT id, RANK() OVER (PARTITION BY leaderboardId ORDER BY score ASC) ranking
FROM scores x
WHERE x.submittedAt <= t.submittedAt
GROUP BY userId, leaderboardId
) ranks
WHERE ranks.id = t.id
) rankAtSubmit
FROM scores t
Instead of using RANK(), I was able to accomplish this with a single subquery that counts the number of users that have a score that is lower than, and was submitted before, the given score.
SELECT id, userId, score, leaderboardId, submittedAt,
(
SELECT COUNT(DISTINCT userId) + 1
FROM scores t2
WHERE t2.userId = t.userId AND
t2.leaderboardId = t.leaderboardId AND
t2.score < t.score AND
t2.submittedAt <= t.submittedAt
) AS rankAtSubmit
FROM scores t
What I understand from your question is you want to know the minimum and maximum rank of each user.
Here is the code:
SELECT userId, leaderboardId, score, min(rankAtSubmit),max(rankAtSubmit)
FROM scores
group BY userId,
leaderboardId,
score

select rows in mysql with latest date for each quiz_id repeated multiple times

I have a table where each quiz ID is repeated multiple times, with a date on each row. For each user and quiz ID, I want to select the entire row with the latest date. The date format is mm/dd/YYYY.
Sample -
USER_ID Quiz_id Name Date Marks .. .. ..
1 2 poly 4/3/2020 27
1 2 poly 4/3/2019 98
1 4 moro 4/3/2020 09
2 5 cat 4/12/2015 87
2 4 moro 4/3/2009 56
2 6 PP 4/3/2011 76
3 2 poly 4/3/2020 12
3 2 poly 5/3/2020 09
3 7 dog 4/3/2011 23
I want the result to look like this:
USER_ID Quiz_id Name Date Marks .. .. ..
1 2 poly 4/3/2020 27
1 4 moro 4/3/2020 09
2 5 cat 4/12/2015 87
2 4 moro 4/3/2009 56
2 6 PP 4/3/2011 76
3 2 poly 5/3/2020 09
3 7 dog 4/3/2011 23
You can use the RANK() function to get the desired result:
SELECT A.* FROM (
SELECT A.*, RANK() OVER(PARTITION BY USER_ID,QUIZ_ID, NAME ORDER BY DATE DESC) RN FROM
Table1 A ORDER BY USER_ID) A WHERE RN = 1 ORDER BY USER_ID, QUIZ_ID;
I don't have MySQL installed, so you will need to test and report back. The general idea is to identify the row of interest using max() and a group by (table t). As the Date column appears to be a text column (MySQL uses the format YYYY-MM-DD for dates), you will need to convert it to a date with str_to_date() so you can use the max() aggregate function. Finally, join with the original table (here table t2, which also does the date conversion), because in table t only the aggregate column and the columns named in the group by are well defined, i.e.:
select USER_ID, Quiz_id, Date, Marks from (
select USER_ID, Quiz_id, max(str_to_date(Date, '%m/%d/%Y')) as Date2 from quiz group by 1, 2
) as t natural join (
select *, str_to_date(Date, '%m/%d/%Y') Date2 from quiz
) as t2;
I don't recall off-hand, but Date might be a reserved word, in which case you will need to quote the column name or, ideally, rename the column.
Also, the original table is not in 3rd normal form, as Name depends on Quiz_id. Quiz_id, as implied, should be a foreign key to a lookup table that holds the Name.
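Since that query was untested, here is a sketch of the same idea that can actually be run. It is only an approximation of the MySQL version: it uses Python's sqlite3 module, and because SQLite has no str_to_date(), a small Python function with that name is registered as a stand-in (an assumption of this sketch, not MySQL's implementation). The table name quiz and the explicit join condition are also my choices.

```python
import sqlite3
from datetime import datetime


def str_to_date(text, fmt):
    """Stand-in for MySQL's STR_TO_DATE: parse text and return YYYY-MM-DD."""
    return datetime.strptime(text, fmt).date().isoformat()


conn = sqlite3.connect(":memory:")
conn.create_function("str_to_date", 2, str_to_date)
conn.execute(
    "CREATE TABLE quiz "
    "(USER_ID INTEGER, Quiz_id INTEGER, Name TEXT, Date TEXT, Marks INTEGER)"
)
conn.executemany(
    "INSERT INTO quiz VALUES (?, ?, ?, ?, ?)",
    [(1, 2, "poly", "4/3/2020", 27), (1, 2, "poly", "4/3/2019", 98),
     (1, 4, "moro", "4/3/2020", 9), (2, 5, "cat", "4/12/2015", 87),
     (2, 4, "moro", "4/3/2009", 56), (2, 6, "PP", "4/3/2011", 76),
     (3, 2, "poly", "4/3/2020", 12), (3, 2, "poly", "5/3/2020", 9),
     (3, 7, "dog", "4/3/2011", 23)],
)

query = """
SELECT q.USER_ID, q.Quiz_id, q.Name, q.Date, q.Marks
FROM quiz q
JOIN (
    -- latest converted date per user and quiz
    SELECT USER_ID, Quiz_id,
           MAX(str_to_date(Date, '%m/%d/%Y')) AS latest
    FROM quiz
    GROUP BY USER_ID, Quiz_id
) t ON t.USER_ID = q.USER_ID
   AND t.Quiz_id = q.Quiz_id
   AND t.latest = str_to_date(q.Date, '%m/%d/%Y')
ORDER BY q.USER_ID, q.Quiz_id
"""
latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
```

This returns the seven rows from the expected result in the question. The conversion matters because the dates are text: comparing mm/dd/YYYY strings directly sorts them character by character, not chronologically.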

Get last updated value SQL

I have the following table structure..
emp_id | base_rate | base_sal | effective_on
1001 26.22 1200 2015-10-12
1001 26.00 1100 2015-11-12
1001 26.00 1100 2015-12-12
1002 18 1200 2015-10-12
1002 19 1100 2015-11-12
I need to get the last updated base_rate, with its effective_on date, for each emp_id.
The output should look like this:
1001 26.00 1100 2015-11-12
1002 19 1100 2015-11-12
Note that for 1001, 2015-11-12 is selected instead of 2015-12-12 (the latest date), because the base_rate is the same on both dates and has therefore been in effect since 2015-11-12.
I have tried.. everything.. not able to find the exact query..
This method is simple and easy to understand.
1) Assign a rank to each effective date, in descending order, partitioned by employee.
2) Select the required fields for the latest effective date (rn = 1) from the inner query and display the result.
SELECT emp_id,base_rate,base_sal
FROM
(
SELECT *,
ROW_NUMBER() OVER ( PARTITION BY emp_id ORDER BY effective_on DESC ) AS rn
FROM table
)
WHERE rn = 1;
One method is to generate a subset of employees with max effective on and join back to the base set..
In the below we generate set "B" with Emp_ID and ME (max effective) and then we join back to the entire data set in the table and use the columns emp_ID and ME to limit the data in the base set and return all columns we care about.
Put in English:
We generate a data set of all the employees with only their max effective date, and then join this data set back to the base set to limit it to the records with each employee's most recent effective_on date.
SELECT A.Emp_ID, A.Base_Rate, A.Base_Sal, min(C.Effective_On)
FROM Table A
INNER JOIN (SELECT emp_ID, Max(Effective_on) ME
FROM Table A
GROUP BY Emp_ID) B
on A.Emp_ID = B.Emp_ID
and A.Effective_ON = B.ME
INNER JOIN TABLE C
on C.Emp_ID = A.Emp_ID
and C.Base_Rate= A.Base_rate
and C.base_Sal = A.Base_Sal
GROUP BY A.Emp_ID, A.Base_Rate, A.Base_Sal
This is more or less database agnostic, whereas row_number() would not work on MySQL versions before 8.0, which do not support window functions.
You can first get the minimum date each base_rate becomes effective on for every employee and then take the max from there. Here is how you can do it using row_number() in Oracle:
with temp(emp_id, base_rate, base_sal, effective_on)
as (select 1001, 26.22, 1200, '2015-10-12' from dual union all
select 1001, 26.00, 1100, '2015-11-12' from dual union all
select 1001, 26.00, 1100, '2015-12-12' from dual union all
select 1002, 18, 1200, '2015-10-12' from dual union all
select 1002, 19, 1100, '2015-11-12' from dual
)
SELECT emp_id,base_rate,base_sal,effective_on FROM(
SELECT temp2.*,
row_number() OVER (PARTITION BY EMP_ID ORDER BY effective_on DESC) AS rn2
FROM
(
SELECT temp.*,
row_number() OVER (PARTITION BY EMP_ID, BASE_RATE ORDER BY effective_on) AS rn
FROM temp
) temp2
WHERE rn = 1
)
WHERE rn2 = 1;
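The same two-step idea (earliest date per rate, then the latest of those per employee) can also be written without window functions. The following is a rough sketch under a few assumptions: it runs through Python's sqlite3 module, the table is called rates (the question does not name it), and the CTE name first_seen is mine. Like the queries above, it groups by rate, so if an employee's rate ever returned to an earlier value, the two periods would be merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rates "
    "(emp_id INTEGER, base_rate REAL, base_sal INTEGER, effective_on TEXT)"
)
conn.executemany(
    "INSERT INTO rates VALUES (?, ?, ?, ?)",
    [(1001, 26.22, 1200, "2015-10-12"), (1001, 26.00, 1100, "2015-11-12"),
     (1001, 26.00, 1100, "2015-12-12"), (1002, 18, 1200, "2015-10-12"),
     (1002, 19, 1100, "2015-11-12")],
)

query = """
WITH first_seen AS (
    -- step 1: earliest effective date of each rate per employee
    SELECT emp_id, base_rate, base_sal, MIN(effective_on) AS effective_on
    FROM rates
    GROUP BY emp_id, base_rate, base_sal
)
-- step 2: of those, keep the most recent one per employee
SELECT f.emp_id, f.base_rate, f.base_sal, f.effective_on
FROM first_seen f
WHERE f.effective_on = (SELECT MAX(f2.effective_on)
                        FROM first_seen f2
                        WHERE f2.emp_id = f.emp_id)
ORDER BY f.emp_id
"""
last_updated = conn.execute(query).fetchall()
for row in last_updated:
    print(row)
```

For the sample data this gives one row per employee, both with effective_on 2015-11-12, which is the output asked for in the question.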

Select distinct ID's

We would like an SQL statement that lists the number of times a unique IP/uniqueID pair has visited on any unique date ordered by the maximum numbers of times that the UniqueID/IP pair has visited.
Here is the table structure:
Column Type
------------------------------
Date Timestamp
NumberofUsers smallint
ipaddress varchar(16)
location varchar(2)
Count bigint(20)
Here is the sql we have been trying:
SELECT
LicenseID,
MAX(Date) AS LatestAccess,
COUNT(DISTINCT Location) AS DifferentCountries,
COUNT(DISTINCT IPAddress) AS DistinctIPCount,
COUNT(DISTINCT Date,IPAddress) AS DistinctDate
FROM
LicenseHistory
WHERE
(LicenseID<>30002)
GROUP BY
LicenseID
ORDER BY
DistinctDate DESC
Here is some sample data from the table in CSV format:
2009-10-08 10:37,30002,8,24.108.64.80,CA,2399
2009-05-27 16:57,24508,50,24.108.64.80,CA,645
2008-11-06 12:04,30,100,24.108.64.80,CA,282
2008-02-04 10:51,24508,30,24.69.19.207,CA,62
2009-10-08 14:52,13136,5,24.108.64.80,CA,285
2013-05-13 13:10,718,10,66.251.68.106,US,23860
2008-02-12 11:10,30002,8,24.69.19.207,CA,36
2008-04-09 17:49,18504,10,70.90.32.57,US,121
2007-07-26 13:38,30002,8,76.226.201.191,US,2
2009-12-03 22:35,30002,8,196.25.255.214,ZA,14
2013-05-13 6:49,20341,4,66.232.201.125,US,2676
2007-07-28 23:57,30002,8,75.81.107.238,US,1
2007-07-29 10:39,30002,8,70.63.54.162,US,1
2007-07-30 3:53,30002,8,121.210.199.31,AU,4
2007-07-30 5:11,30002,8,41.207.67.10,KE,2
Here is some sample results (not correct yet, last column should not match second to last):
uniqueID LatestAccess DifferentCountries DistinctIPCount DistinctDate
--------------------------------------------------------------------------------
20677 2013-05-13 18:20:15 4 162 162
27749 2013-05-14 05:30:59 7 155 155
459 2013-05-13 11:12:47 2 143 143
24965 2013-05-14 13:44:56 6 123 123
25226 2013-05-06 16:11:56 3 104 104
20370 2013-05-14 05:54:04 4 100 100
The problem I think is in the "COUNT(DISTINCT Date,IPAddress) AS DistinctDate" piece.
You need a COUNT DISTINCT. Here's a guess because there's no table structure provided:
SELECT
VisitDate,
COUNT(DISTINCT IPAddress, UniqueID) AS UniqueVisits
FROM MyTable
GROUP BY VisitDate
ORDER BY UniqueVisits DESC
Or if your visit date is a datetime or timestamp, cut out the time part with the DATE function (note the changes on the second and fifth lines):
SELECT
DATE(VisitDate),
COUNT(DISTINCT IPAddress, UniqueID) AS UniqueVisits
FROM MyTable
GROUP BY DATE(VisitDate)
ORDER BY UniqueVisits DESC
Your date format has a time in it. So, I think all the dates are unique. Try this:
SELECT
LicenseID,
MAX(Date) AS LatestAccess,
COUNT(DISTINCT Location) AS DifferentCountries,
COUNT(DISTINCT IPAddress) AS DistinctIPCount,
COUNT(DISTINCT date(Date), IPAddress) AS DistinctDate
FROM
LicenseHistory
WHERE
(LicenseID<>30002)
GROUP BY
LicenseID
ORDER BY
DistinctDate DESC
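To illustrate the effect of stripping the time part, here is a small sketch with made-up rows (the four visits below are hypothetical, not taken from the sample data). It uses Python's sqlite3 module; SQLite does not accept MySQL's multi-column COUNT(DISTINCT a, b), so the two columns are concatenated with a separator as a stand-in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LicenseHistory (Date TEXT, LicenseID INTEGER, IPAddress TEXT)"
)
# Hypothetical visits: one license, two IPs, two calendar days.
conn.executemany(
    "INSERT INTO LicenseHistory VALUES (?, ?, ?)",
    [("2009-05-27 16:57", 24508, "24.108.64.80"),
     ("2009-05-27 18:10", 24508, "24.108.64.80"),
     ("2009-05-28 09:00", 24508, "24.108.64.80"),
     ("2009-05-28 09:30", 24508, "24.69.19.207")],
)

query = """
SELECT LicenseID,
       COUNT(DISTINCT IPAddress)                     AS DistinctIPCount,
       -- full timestamp: every visit is its own pair
       COUNT(DISTINCT Date || '|' || IPAddress)       AS ByTimestamp,
       -- date only: repeat visits on the same day collapse
       COUNT(DISTINCT date(Date) || '|' || IPAddress) AS ByDay
FROM LicenseHistory
GROUP BY LicenseID
"""
counts = conn.execute(query).fetchone()
print(counts)
```

Here the license has 2 distinct IPs, 4 distinct timestamp/IP pairs, and 3 distinct day/IP pairs, so the three counts only differ once the time is removed from the date.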