Get Max "laps" but with minimum "time" from 3 results as join - mysql

I'm at a loss and hoping for some help. I've searched SO, Google, and tried as many things as I can think of, but can't get anything even close to what I'm after (so far off there is no point in posting my attempts).
results table
result_id
wcics_live_id
class_id
main
round_num
results_drivers table
rd_id
result_id
user_id
race_time (ex. 5:06.231, this is minutes:seconds)
laps (ex. 25)
For each class_id a driver will have 3 entries in the results_drivers table, for example:
Luke Pittman 25 laps 5:06.231
Luke Pittman 24 laps 5:00.691
Luke Pittman 25 laps 5:05.914
Additionally, each class will have multiple drivers - could be as many as 40 or 50.
I need to be able to gather a list of all the drivers, in order of the fastest time (highest laps with lowest race_time), but only returning one result for each driver. For example:
Faster Guy 26 laps 5:11.134
Luke Pittman 25 laps 5:05.914
Joe Doe 25 laps 5:06.014
Other Guy 24 laps 5:00.141
... and so on
Normally I would do a group by with a max value (or something similar) on a column, but I have no idea how to make that happen with 2 separate columns.

I'd do a query with MAX(laps) ... GROUP BY user_id, and then reference that as an inline view in another query, to get the minimum race_time
Something like this:
SELECT ft.user_id
     , ft.laps
     , MIN(ft.race_time) AS race_time
  FROM ( -- maximum laps per driver
         SELECT dr.user_id
              , MAX(dr.laps) AS max_laps
           FROM results_drivers dr
          GROUP BY dr.user_id
       ) ml
  JOIN results_drivers ft
    ON ft.user_id = ml.user_id
   AND ft.laps = ml.max_laps
 GROUP BY ft.user_id
        , ft.laps
 ORDER BY ft.laps DESC
        , race_time ASC
(I'm assuming here that the laps and race_time columns are canonical, such that ORDER BY and MIN/MAX will work to get the highest number of laps and fastest time. If these are stored as strings, it won't necessarily work correctly; when comparing strings, '10:23.456' sorts before '8:15.555'.)
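To sanity-check this pattern outside MySQL, here is a minimal sketch using Python's sqlite3 module. The in-memory results_drivers table, driver names, and times are invented for illustration; the times are zero-padded ("05:06.231") so that string MIN() and ordering behave correctly, per the caveat above.

```python
import sqlite3

# Hypothetical in-memory stand-in for the results_drivers table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results_drivers (user_id TEXT, laps INT, race_time TEXT)")
conn.executemany(
    "INSERT INTO results_drivers VALUES (?, ?, ?)",
    [("luke", 25, "05:06.231"), ("luke", 24, "05:00.691"), ("luke", 25, "05:05.914"),
     ("joe",  25, "05:06.014"), ("fast", 26, "05:11.134")],
)

# Inline view for max laps per driver, joined back to find the
# minimum race_time among each driver's max-lap runs.
rows = conn.execute("""
    SELECT ft.user_id, ft.laps, MIN(ft.race_time) AS race_time
      FROM (SELECT user_id, MAX(laps) AS max_laps
              FROM results_drivers
             GROUP BY user_id) ml
      JOIN results_drivers ft
        ON ft.user_id = ml.user_id AND ft.laps = ml.max_laps
     GROUP BY ft.user_id, ft.laps
     ORDER BY ft.laps DESC, race_time ASC
""").fetchall()
print(rows)  # one row per driver: most laps first, ties broken by fastest time
```

Each driver appears once, highest lap count first, with the lowest race_time breaking ties, matching the desired output in the question.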

Related

Count the number of registrations by date

I am stuck with a select I have to write. I have a database where each new claim file is registered in a table called "claims"; every file is registered as follows:
ClaimFileNumber | VehicleNumber | … | OpeningDate
1               | abc           | … | 20170302
2               | bcd           | … | 20170302
3               | efg           | … | 20170301
4               | hij           | … | 20170301
I need a select that finds how many claim files were opened on each day from the start of this year until now, ordered by the top 5 days of each month. For example, in the month of May: 20170506 - 300 claims, 20170511 - 295 claims, 20170509 - 200 claims, etc.
A select that simply gives me the number of claims opened per day, ordered descending, would also be fine.
The problem is that OpeningDate is stored as a numeric column, not as a date; that is the tricky part, at least for me.
I cannot run a select like "select count(OpeningDate) from claim where OpeningDate = 20170302" for each day, because more than 200 days have passed since the year started.
Thank you in advance for your help.
This should do it:
SELECT OpeningDate, COUNT(OpeningDate)
FROM claim
WHERE LEFT(OpeningDate, 4) = '2017'
GROUP BY OpeningDate
ORDER BY COUNT(OpeningDate) DESC
You need group by:
select OpeningDate,count(1) from your_table group by OpeningDate
For the top 5, you need ORDER BY and LIMIT:
select OpeningDate, count(1)
from your_table
group by OpeningDate
order by 2 desc
limit 5
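Because OpeningDate is numeric (YYYYMMDD), plain integer arithmetic can also replace the string functions entirely. A small sqlite3 sketch with invented claim rows, counting claims per day with the busiest days first:

```python
import sqlite3

# Hypothetical data; OpeningDate is a numeric YYYYMMDD, as in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim (ClaimFileNumber INT, OpeningDate INT)")
conn.executemany("INSERT INTO claim VALUES (?, ?)",
                 [(1, 20170302), (2, 20170302), (3, 20170301),
                  (4, 20170301), (5, 20170302), (6, 20170415)])

rows = conn.execute("""
    SELECT OpeningDate, COUNT(*) AS n
      FROM claim
     WHERE OpeningDate / 10000 = 2017   -- integer division extracts the year
     GROUP BY OpeningDate
     ORDER BY n DESC
     LIMIT 5
""").fetchall()
print(rows)  # busiest days first
```

Similarly, OpeningDate / 100 % 100 would extract the month, which is the building block for a per-month top 5.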

ORDER BY and GROUP BY those results in a single query

I am trying to query a dataset from a single table, which contains quiz answers/entries from multiple users. I want to pull out the highest scoring entry from each individual user.
My data looks like the following:
ID  TP_ID              quiz_id  name       num_questions  correct  incorrect  percent  created_at
1   10154312970149546  1        Joe        3              2        1          67       2015-09-20 22:47:10
2   10154312970149546  1        Joe        3              3        0          100      2015-09-21 20:15:20
3   125564674465289    1        Test User  3              1        2          33       2015-09-23 08:07:18
4   10153627558393996  1        Bob        3              3        0          100      2015-09-23 11:27:02
My query looks like the following:
SELECT * FROM `entries`
WHERE `TP_ID` IN('10153627558393996', '10154312970149546')
GROUP BY `TP_ID`
ORDER BY `correct` DESC
In my mind, what that should do is get the two users from the IN clause, order them by the number of correct answers and then group them together, so I should be left with the 2 highest scores from those two users.
In reality it's giving me two results, but the one from Joe gives me the lower of the two values (2), with Bob first with a score of 3. Swapping to ASC ordering keeps the scores the same but places Joe first.
So, how could I achieve what I need?
You're after the groupwise maximum, which can be obtained by joining the grouped results back to the table:
SELECT * FROM entries NATURAL JOIN (
SELECT TP_ID, MAX(correct) correct
FROM entries
WHERE TP_ID IN ('10153627558393996', '10154312970149546')
GROUP BY TP_ID
) t
Of course, if a user has multiple records with the maximal score, it will return all of them; should you only want some subset, you'll need to express the logic for determining which.
MySQL is quite lax when it comes to GROUP BY clauses, but as a rule of thumb you should follow the rule that other DBMSs enforce:
In a GROUP BY query, each selected column should either be part of the GROUP BY clause or be wrapped in an aggregate function.
For your query I would suggest:
SELECT `TP_ID`,`name`,max(`correct`) FROM `entries`
WHERE `TP_ID` IN('10153627558393996', '10154312970149546')
GROUP BY `TP_ID`,`name`
Since your table seems quite denormalized, the name part of the GROUP BY could be omitted, but it might be necessary in other cases.
ORDER BY only specifies the order in which results are returned; it does nothing about which results are returned. So you need to apply the MAX() function to get the highest number of right answers.
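Both answers can be checked quickly with Python's sqlite3 module; the rows below are the sample data from the question (with the percent and date columns trimmed for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (TP_ID TEXT, name TEXT, correct INT)")
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)",
                 [("10154312970149546", "Joe", 2),
                  ("10154312970149546", "Joe", 3),
                  ("125564674465289", "Test User", 1),
                  ("10153627558393996", "Bob", 3)])

# Groupwise maximum: join the per-user MAX(correct) back to the table
# so the full best-scoring row for each user is returned.
rows = conn.execute("""
    SELECT e.TP_ID, e.name, e.correct
      FROM entries e
      JOIN (SELECT TP_ID, MAX(correct) AS correct
              FROM entries
             WHERE TP_ID IN ('10153627558393996', '10154312970149546')
             GROUP BY TP_ID) t
        ON e.TP_ID = t.TP_ID AND e.correct = t.correct
     ORDER BY e.name
""").fetchall()
print(rows)  # one best row per requested user
```

Joe's score of 3 (not 2) comes back, which is exactly what the original GROUP BY + ORDER BY attempt failed to guarantee.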

SELECT the oldest of the most recent lines

I have a table storing the score (with its date) that each player achieved in each game.
Example:
john 154 10/02/2014
mat 178 09/02/2014
eric 270 08/02/2014
mat 410 07/02/2014
john 155 06/02/2014
In this example I want "eric 270 08/02/2014" because this is the oldest of the most recent ones.
What query do I need to retrieve that?
As I understand it, you want the oldest entry among the set containing the most recent one from each user.
In that case, you can use a subquery that finds the last date for each user, then join it in the main query to keep only each user's most recent entry, and finally sort to take the oldest of those.
SELECT scores.*
FROM scores
INNER JOIN
(
SELECT max(date) last, name
FROM scores
GROUP BY name
) last_temp_table
ON scores.name = last_temp_table.name
AND scores.date = last_temp_table.last
ORDER BY scores.date ASC LIMIT 1;
More info in different SO threads such as MySQL order by before group by
The question as worded doesn't make much sense unless you define most recent.
If I assume that you have some criteria like: "Give the oldest event that happened within the last 3 days" then that is a simple matter of ordering and limiting across a date range.
select * from events where ts >= CURDATE() - 3
order by ts asc
limit 1
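The accepted approach can be verified with an in-memory sqlite3 run; the dates are rewritten here as ISO yyyy-mm-dd strings so that string MAX() and ORDER BY behave like real date comparisons:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INT, date TEXT)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)",
                 [("john", 154, "2014-02-10"), ("mat", 178, "2014-02-09"),
                  ("eric", 270, "2014-02-08"), ("mat", 410, "2014-02-07"),
                  ("john", 155, "2014-02-06")])

# Keep each player's most recent game, then take the oldest of those.
row = conn.execute("""
    SELECT s.name, s.score, s.date
      FROM scores s
      JOIN (SELECT name, MAX(date) AS last
              FROM scores GROUP BY name) t
        ON s.name = t.name AND s.date = t.last
     ORDER BY s.date ASC
     LIMIT 1
""").fetchone()
print(row)  # eric's row: the oldest among each player's latest game
```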

INSERT interpolated rows into existing table

I have a MySQL table similar to this simplified example:
orders table
--------------------------------
orderid stockid rem_qty reported
--------------------------------
1000000 100 500 00:01:00
1000000 100 200 01:10:00
1000000 100 200 03:20:00
1000000 100 100 04:30:00
1000000 100 50 11:30:00
:
1000010 100 100 00:01:00
1000010 100 100 01:10:00
1000010 100 20 03:20:00
:
1000020 200 1000 03:20:00
1000020 200 995 08:20:00
1000020 200 995 11:50:00
--------------------------------
The table comes from a 3rd party, weighs in at some 80-100M rows daily, and the format is fixed. It would be good, except it lacks rows showing when rem_qty reaches zero. The good news is, I can estimate them, at least a good upper/lower bound:
The 3rd party scans each distinct stockid at essentially random times throughout the day, and returns one row for each open orderid at that time. For example, stockid = 100 was scanned at (00:01, 01:10, 03:20, 04:30, 11:30). At each time, there will be a row for every current orderid with that stockid. Hence, one can see that orderid = 1000000 was still open at 11:30 (the last scan in our data), but sometime between 03:20 and 04:30, orderid = 1000010 sold out. (The times for stockid = 200 have no bearing on stockid = 100).
So, what I would like to do is INSERT the interpolated rows with rem_qty = 0 for each sold-out order. In this case, we can (only) say that orderid = 1000010 went to 0 at AVG('03:20:00','04:30:00'), so I would like to INSERT the following row:
orders table INSERT
--------------------------------
orderid stockid rem_qty reported
--------------------------------
1000010 100 0 03:55:00
--------------------------------
Trouble is, my SQL is rusty and I've not been able to figure out this complex query. Among other failed attempts, I've tried various JOINs, made a TEMPORARY TABLE stock_report(stockid,last_report), and I can do something like this:
SELECT orders.stockid,
orderid,
MAX(reported),
TIMEDIFF(last_report,MAX(reported)) as timediff
FROM orders
INNER JOIN stock_report
ON orders.stockid = stock_report.stockid
GROUP BY orderid
HAVING timediff > 0
ORDER BY orderid
This would show every sold-out order, along with the HH:MM:SS difference between the last time the orderid was reported and the last time its stockid was reported. It's maybe a good start, but instead of last_report I need to be able to calculate a next_report column (specific to this orderid), which would basically be:
SELECT MIN(reported) AS next_report
FROM orders
WHERE reported > #order_max_reported
ORDER BY reported
LIMIT 1
But that's just a vain attempt to illustrate part of what I'm after. Again, what I really need is a way to INSERT new rows into the orders() table at the AVG() time the order's rem_qty went to 0, as in the orders table INSERT example table, above. Or, maybe the 64,000 GFLOP question: would I be better off moving this logic to my main (application) language? I'm working with 100 million rows/day, so efficiency is a concern.
Apologies for the lengthy description. This really is the best I could do to edit for conciseness! Can anyone offer any helpful suggestions?
Possible to do. Have a subquery that gets the max reported time for each orderid / stockid, and join that against the orders table where the stockid is the same and that latest time is less than the reported time. This gets you all the report times for that stockid that are greater than the latest time for that stockid and orderid.
Use MIN to get the lowest reported time. Convert the 2 times to seconds, add them together and divide by 2, then convert back from seconds to a time.
Something like this:-
SELECT orderid, stockid, 0, SEC_TO_TIME((TIME_TO_SEC(next_poss_order_report) + TIME_TO_SEC(last_order_report)) / 2)
FROM
(
SELECT a.orderid, a.stockid, last_order_report, MIN(b.reported) next_poss_order_report
FROM
(
SELECT orderid, stockid, MAX(reported) last_order_report
FROM orders_table
GROUP BY orderid, stockid
) a
INNER JOIN orders_table b
ON a.stockid = b.stockid
AND a.last_order_report < b.reported
GROUP BY a.orderid, a.stockid, a.last_order_report
) sub0;
SQL fiddle here:-
http://www.sqlfiddle.com/#!2/cf129/17
Possible to simplify this a bit to:-
SELECT a.orderid, a.stockid, 0, SEC_TO_TIME((TIME_TO_SEC(MIN(b.reported)) + TIME_TO_SEC(last_order_report)) / 2)
FROM
(
SELECT orderid, stockid, MAX(reported) last_order_report
FROM orders_table
GROUP BY orderid, stockid
) a
INNER JOIN orders_table b
ON a.stockid = b.stockid
AND a.last_order_report < b.reported
GROUP BY a.orderid, a.stockid, a.last_order_report;
These queries might take a while, but are probably more efficient than running many queries from scripted code.
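The same midpoint logic can be prototyped with Python's sqlite3 module before committing to the pure-SQL version. SQLite lacks TIME_TO_SEC / SEC_TO_TIME, so the seconds conversion is done in Python here; the sample rows are the ones from the question (stockid 100 only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid INT, stockid INT, rem_qty INT, reported TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1000000, 100, 500, "00:01:00"), (1000000, 100, 200, "01:10:00"),
    (1000000, 100, 200, "03:20:00"), (1000000, 100, 100, "04:30:00"),
    (1000000, 100, 50, "11:30:00"),
    (1000010, 100, 100, "00:01:00"), (1000010, 100, 100, "01:10:00"),
    (1000010, 100, 20, "03:20:00"),
])

# Last report per order, and the next stock-wide scan after it (the
# INNER JOIN drops orders still open at the final scan).
pairs = conn.execute("""
    SELECT a.orderid, a.stockid, a.last_report, MIN(b.reported) AS next_report
      FROM (SELECT orderid, stockid, MAX(reported) AS last_report
              FROM orders GROUP BY orderid, stockid) a
      JOIN orders b
        ON a.stockid = b.stockid AND b.reported > a.last_report
     GROUP BY a.orderid, a.stockid, a.last_report
""").fetchall()

def to_sec(t):       # "HH:MM:SS" -> seconds
    h, m, s = map(int, t.split(":"))
    return h * 3600 + m * 60 + s

def to_time(sec):    # seconds -> "HH:MM:SS"
    return "%02d:%02d:%02d" % (sec // 3600, sec % 3600 // 60, sec % 60)

# Insert the interpolated rem_qty = 0 row at the midpoint time.
for orderid, stockid, last, nxt in pairs:
    mid = to_time((to_sec(last) + to_sec(nxt)) // 2)
    conn.execute("INSERT INTO orders VALUES (?, ?, 0, ?)", (orderid, stockid, mid))

zero_rows = conn.execute("SELECT * FROM orders WHERE rem_qty = 0").fetchall()
print(zero_rows)
```

This produces the single interpolated row for orderid 1000010 at 03:55:00; orderid 1000000 was still open at the last scan, so the INNER JOIN finds no later scan and no row is inserted for it.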

MySQL query for items where average price is less than X?

I'm stumped with how to do the following purely in MySQL, and I've resorted to taking my result set and manipulating it in ruby afterwards, which doesn't seem ideal.
Here's the question. With a dataset of 'items' like:
id  state_id  price  issue_date  listed
1   5         450    2011        1
1   5         455    2011        1
1   5         490    2011        1
1   5         510    2012        0
1   5         525    2012        1
...
I'm trying to get something like:
SELECT * FROM items
WHERE ([some conditions], e.g. issue_date >= 2011 and listed=1)
AND state_id = 5
GROUP BY id
HAVING AVG(price) <= 500
ORDER BY price DESC
LIMIT 25
Essentially I want to grab a "group" of items whose average price fall under a certain threshold. I know that my above example "group by" and "having" are not correct since it's just going to give the AVG(price) of that one item, which doesn't really make sense. I'm just trying to illustrate my desired result.
The important thing here is I want all of the individual items in my result set, I don't just want to see one row with the average price, total, etc.
Currently I'm just doing the above query without the HAVING AVG(price) and adding up the individual items one-by-one (in ruby) until I reach the desired average. It would be really great if I could figure out how to do this in SQL. Using subqueries or something clever like joining the table onto itself are certainly acceptable solutions if they work well! Thanks!
UPDATE: In response to Tudor's answer below, here are some clarifications. There is always going to be a target quantity in addition to the target average. And we would always sort the results by price low to high, and by date.
So if we did have 10 items that were all priced at $5 and we wanted to find 5 items with an average < $6, we'd simply return the first 5 items. We wouldn't return the first one only, and we wouldn't return the first 3 grouped with the last 2. That's essentially how my code in ruby is working right now.
I would do almost the inverse of what Jasper provided: start your query with your criteria to explicitly limit the few items that MAY qualify, instead of getting all items and running a sub-select on each entry. The other way could be a larger performance hit (I could be wrong), but here's my offering:
select
i2.*
from
( SELECT i.id
FROM items i
WHERE
i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500 ) PreQualify
JOIN items i2
on PreQualify.id = i2.id
AND i2.issue_date > 2011
AND i2.listed = 1
AND i2.state_id = 5
order by
i2.price desc
limit
25
Not sure about the ORDER BY, especially if you wanted grouping by item. In addition, I would ensure an index on (state_id, listed, id, issue_date).
CLARIFICATION per comments
I think I AM correct on it. Don't confuse the HAVING clause with WHERE. WHERE says whether or not to include a row based on certain conditions. HAVING is checked after all the WHERE clauses and grouping are done: if a group still qualifies, it is included in the result set; otherwise it is thrown out. Try the inner query alone, once WITHOUT the HAVING clause, then again WITH it:
SELECT i.id, avg( i.price )
FROM items i
WHERE i.issue_date > 2011
AND i.listed = 1
AND i.state_id = 5
GROUP BY
i.id
HAVING
AVG( i.price) <= 500
As you get more into writing queries, try the parts individually to see what you are getting vs. what you are thinking; you'll find how and why certain things work. In addition, your updated question now talks about getting multiple IDs and prices at apparently low and high ranges, yet you also apply a LIMIT. If you had 20 items and each had 10 qualifying records, your limit of 25 would show all of the first item and 5 rows into the second, which is NOT what I think you want; you may want 25 of each qualified id. That would wrap this query into yet another level.
What MySQL does makes perfect sense. What you want to do does not:
if you have, say, 4 items, each with a price of 5, and you write HAVING AVG(price) <= 7, are you saying the query should return ALL the permutations, like:
{1} - since item with id 1, can be a group by itself
{1,2}
{1,3}
{1,4}
{1,2,3}
{1,2,4}
...
and so on?
Your algorithm of computing the average in Ruby is also not valid: if you have items with values 5, 1, 7, 10 and seek an average of less than 7, the element with value 10 can only be returned in a group with the element of value 1. But by your algorithm (if I understood correctly), the element with value 1 is returned in the first group.
Update
What you want is something like the Knapsack problem and your approach is using some kind of Greedy Algorithm to solve it. I don't think there are straight, easy and correct ways to implement that in SQL.
After a google search, I found this article which tries to solve the knapsack problem with AI written in SQL.
By treating your item price as a weight, and given the number of items and the desired average, you could compute the maximum total value that fits in the 'knapsack' by multiplying desired_cost by number_of_items.
I'm not entirely sure from your question, but I think this is a solution to your problem:
SELECT * FROM items
WHERE (some "conditions", e.g. issue_date > 2011 and listed=1)
AND state_id = 5
AND id IN (SELECT id
FROM items
GROUP BY id
HAVING AVG(price) <= 500)
ORDER BY price DESC
LIMIT 25
note: This is off the top of my head and I haven't done complex SQL in a while, so it might be wrong. I think this or something like it should work, though.
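For what it's worth, the subquery version does run as intended. Below is an sqlite3 sketch using the question's rows plus a hypothetical second item (id 2) so the HAVING filter has something to exclude; note the inner SELECT repeats the outer conditions, in line with the point that the average should be taken over the same filtered rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INT, state_id INT, price INT, issue_date INT, listed INT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?)", [
    (1, 5, 450, 2011, 1), (1, 5, 455, 2011, 1), (1, 5, 490, 2011, 1),
    (1, 5, 510, 2012, 0), (1, 5, 525, 2012, 1),
    (2, 5, 900, 2012, 1), (2, 5, 950, 2012, 1),   # invented: avg > 500, filtered out
])

# Keep every row of any id whose average listed price is <= 500.
rows = conn.execute("""
    SELECT * FROM items
     WHERE issue_date >= 2011 AND listed = 1 AND state_id = 5
       AND id IN (SELECT id FROM items
                   WHERE issue_date >= 2011 AND listed = 1 AND state_id = 5
                   GROUP BY id
                  HAVING AVG(price) <= 500)
     ORDER BY price DESC
""").fetchall()
print(rows)  # all individual rows for id 1 (avg 480); id 2 (avg 925) excluded
```

Item 1's listed average is (450 + 455 + 490 + 525) / 4 = 480, so all four of its qualifying rows come back individually, while item 2 is excluded as a whole, which matches the asker's requirement of seeing every row rather than one aggregate.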