MySQL - Query to extract all columns from the top N distinct elements

I have designed an event where players register multiple fish, and I want a query that extracts the 3 heaviest fish from different people. In case of a tie, it should be decided by a third parameter: who registered it first. I've tested several approaches I found here on Stack Overflow, but none of them worked the way I needed.
My schema is the following:
id | playerid | playername | itemid | weight | date | received | isCurrent
Where:
id = PK, AUTO_INCREMENT - it's basically an index
playerid = the unique code of the person who registered the fish
playername = name of the person who registered the fish
itemid = the code of the fish
weight = the weight of the fish
date = pre-defined as CURRENT_TIMESTAMP, the exact time the fish was registered
received = pre-defined as 0; it doesn't matter for this analysis
isCurrent = pre-defined as 1; every time this event runs, it sets this field to 0 on the existing rows, meaning those records don't belong to the current version of the event.
Here you can see the data I'm testing with
My problem is: how do I avoid counting the same playerid more than once in this ranking?
Query 1:
SELECT `playerid`, `playername`, `itemid`, `weight`
FROM `event_fishing`
WHERE `isCurrent` = 1 AND `weight` IN (
SELECT * FROM
(SELECT MAX(`weight`) as `fishWeight`
FROM `event_fishing`
WHERE `isCurrent` = 1
GROUP BY `playerid`
LIMIT 3) as t)
ORDER BY `weight` DESC, `date` ASC
LIMIT 3
Query 2:
SELECT * FROM `event_fishing`
INNER JOIN
(SELECT playerid, MAX(`weight`) as `fishWeight`
FROM `event_fishing`
WHERE `isCurrent` = 1
GROUP BY `playerid`
LIMIT 3) as t
ON t.playerid = `event_fishing`.playerid AND t.fishWeight = `event_fishing`.weight
WHERE `isCurrent` = 1
ORDER BY weight DESC, date ASC
LIMIT 3
Keep in mind that I must return at least the fields playerid, playername, itemid and weight; that the version of the event must be the current one (isCurrent = 1); and that there must be one playerid per line, with the heaviest weight that player registered for this version of the event and the date it was registered.
Expected output for the data I've sent:
id |playerid|playername|itemid|weight| date |received| isCurrent
7 | 3734 |Mago Xxx | 7963 | 1850 | 2018-07-26 00:17:41 | 0 | 1
14 | 228 |Night Wolf| 7963 | 1750 | 2018-07-26 19:45:49 | 0 | 1
8 | 3646 |Test Spell| 7159 | 1690 | 2018-07-26 01:16:51 | 0 | 1
Output I'm getting (with both queries):
playerid|playername|itemid|weight
3734 |Mago Xxx | 7963 | 1850
228 |Night Wolf| 7963 | 1750
228 |Night Wolf| 7963 | 1750
Thank you for the attention.
EDIT: I've followed How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?, since my query is very similar to the accepted answer. In the comments I found something that at first glance seemed to solve my problem, but I've found a case where the accepted answer fails. Check http://sqlfiddle.com/#!9/72aeef/1
If you look at the data, you'll notice that id 14 was the first entry of 1750 and therefore should be in second place, but MAX(id) returns the last entry of the same playerid and therefore gives a wrong result.
Although the problems seem alike, mine is more complex, and therefore the suggested queries don't work.
EDIT 2:
I've managed to solve my problem with the following query:
http://sqlfiddle.com/#!9/d711c7/6
But I'll leave this question open for two reasons:
1- I don't know if there's a case where this query might fail
2- Although the first query is already heavily restricted, I still think this can be optimized further, so I'll leave it open to anyone who might know a better way to solve the issue.
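One way to express "heaviest fish per player, earliest registration wins ties" without window functions is a correlated NOT EXISTS that discards any row beaten by another row of the same player. The sketch below is not the query from the fiddle; it runs the SQL through Python's sqlite3 module purely so that it is self-contained, and the sample rows (in particular ids 9, 15 and 16) are invented to be consistent with the expected output in the question. The same SELECT should carry over to MySQL, but that is an assumption worth testing against the real table.

```python
import sqlite3

# Hypothetical sample rows, invented to be consistent with the expected output.
# Columns: id, playerid, playername, itemid, weight, date, received, isCurrent
ROWS = [
    (7, 3734, "Mago Xxx", 7963, 1850, "2018-07-26 00:17:41", 0, 1),
    (8, 3646, "Test Spell", 7159, 1690, "2018-07-26 01:16:51", 0, 1),
    (9, 3734, "Mago Xxx", 7159, 1200, "2018-07-26 02:00:00", 0, 1),
    (14, 228, "Night Wolf", 7963, 1750, "2018-07-26 19:45:49", 0, 1),
    (15, 228, "Night Wolf", 7963, 1750, "2018-07-26 20:00:00", 0, 1),
    (16, 999, "Old Event", 7963, 9999, "2018-07-01 00:00:00", 0, 0),
]

# Keep a row only if no other current row of the same player beats it:
# heavier, or equally heavy but registered earlier (id breaks exact ties).
TOP_FISH_SQL = """
SELECT e.id, e.playerid, e.playername, e.itemid, e.weight, e.date
FROM event_fishing AS e
WHERE e.isCurrent = 1
  AND NOT EXISTS (
        SELECT 1
        FROM event_fishing AS b
        WHERE b.isCurrent = 1
          AND b.playerid = e.playerid
          AND (b.weight > e.weight
               OR (b.weight = e.weight AND b.date < e.date)
               OR (b.weight = e.weight AND b.date = e.date AND b.id < e.id))
  )
ORDER BY e.weight DESC, e.date ASC, e.id ASC
LIMIT ?
"""


def top_fish(conn, n=3):
    """Return the best row of each player, limited to the top n players."""
    return conn.execute(TOP_FISH_SQL, (n,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE event_fishing (
           id INTEGER PRIMARY KEY, playerid INTEGER, playername TEXT,
           itemid INTEGER, weight INTEGER, date TEXT,
           received INTEGER, isCurrent INTEGER)"""
)
conn.executemany("INSERT INTO event_fishing VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

rows = top_fish(conn)
for row in rows:
    print(row)
```

Because each player contributes exactly one row before the LIMIT is applied, a player with two equally heavy fish can no longer occupy two places in the ranking.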

Related

Mysql-> Group after rand()

I have the following table in Mysql
Name Age Group
abel 7 A
joe 6 A
Rick 7 A
Diana 5 B
Billy 6 B
Pat 5 B
I want to randomize the rows, but they should still remain grouped by the Group column.
For example, I want my result to look something like this.
Name Age Group
joe 6 A
abel 7 A
Rick 7 A
Billy 6 B
Pat 5 B
Diana 5 B
What query should I use to get this result? The entire table should be randomized and then grouped by the "Group" column.
What you describe in your question as GROUPing is more correctly described as sorting. This is a particular issue when talking about SQL databases where "GROUP" means something quite different and determines the scope of aggregation operations.
Indeed "group" is a reserved word in SQL, so although mysql and some other SQL databases can work around this, it is a poor choice as an attribute name.
SELECT *
FROM yourtable
ORDER BY `group`
Using random values also has a lot of semantic confusion. A truly random number would have a different value every time it is retrieved - which would make any sorting impossible (and databases do a lot of sorting which is normally invisible to the user). As long as the implementation uses a finite time algorithm such as quicksort that shouldn't be a problem - but a bubble sort would never finish, and a merge sort could get very confused.
There are also degrees of randomness. There are different algorithms for generating random numbers. For encryption it's critical that the random numbers be evenly distributed and completely unpredictable - often these will use hardware events (sometimes even dedicated hardware) but I don't expect you would need that. But do you want the ordering to be repeatable across invocations?
SELECT *
FROM yourtable
ORDER BY `group`, RAND()
...will give different results each time.
On the other hand,
SELECT *
FROM yourtable
ORDER BY `group`, MD5(CONCAT(age, name, `group`))
...would always give the results sorted in the same order, while
SELECT *
FROM yourtable
ORDER BY `group`, MD5(CONCAT(CURDATE(), age, name, `group`))
...will give different results on different days.
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(name VARCHAR(12) NOT NULL
,age INT NOT NULL
,my_group CHAR(1) NOT NULL
);
INSERT INTO my_table VALUES
('Abel',7,'A'),
('Joe',6,'A'),
('Rick',7,'A'),
('Diana',5,'B'),
('Billy',6,'B'),
('Pat',5,'B');
SELECT * FROM my_table ORDER BY my_group,RAND();
+-------+-----+----------+
| name | age | my_group |
+-------+-----+----------+
| Joe | 6 | A |
| Abel | 7 | A |
| Rick | 7 | A |
| Pat | 5 | B |
| Diana | 5 | B |
| Billy | 6 | B |
+-------+-----+----------+
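Randomize first, then sort by the group column. (Note that `Group` is a reserved word and must be quoted, and that MySQL does not guarantee the outer sort keeps the subquery's random order within each group.)
select Name, Age, `Group`
from (
select *
FROM yourtable
order by RAND()
) t
order by `Group`
Try this:
SELECT * FROM yourtable ORDER BY `Group`, RAND()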
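A quick way to convince yourself that sorting by the group column first and a random value second keeps the groups intact is to run it on the sample data. The sketch below uses Python's sqlite3 module as a stand-in so it is self-contained; SQLite spells the function RANDOM() where MySQL uses RAND(), and the table and column names are taken from the my_table example above.

```python
import sqlite3

PEOPLE = [
    ("Abel", 7, "A"), ("Joe", 6, "A"), ("Rick", 7, "A"),
    ("Diana", 5, "B"), ("Billy", 6, "B"), ("Pat", 5, "B"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table ("
    "name TEXT NOT NULL, age INTEGER NOT NULL, my_group TEXT NOT NULL)"
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", PEOPLE)

# Primary sort key keeps each group together; the random secondary key
# shuffles the rows inside each group on every execution.
shuffled = conn.execute(
    "SELECT name, age, my_group FROM my_table ORDER BY my_group, RANDOM()"
).fetchall()

groups = [group for _, _, group in shuffled]
for row in shuffled:
    print(row)
```

The order of names inside each group changes between runs, but the sequence of group values is always A, A, A, B, B, B.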

update rate for unique productId by each userID

I'm implementing a rating method in SQL. I have two tables in MySQL. Each vote updates a row in FirstTable, where the values of rate and countView change, and I'm trying to update both with a single command:
UPDATE FirstTable SET `countView`= `countView`+1,
`rate`=('$MyRate' + (`countView`-1)*`rate`)/`countView`
WHERE `productId`='$productId'
FirstTable:
productId | countView | rate | other column |
------------+-----------+------+-------------------+---
21 | 12 | 4 | anything |
------------+-----------+------+-------------------+---
22 | 18 | 3 | anything |
------------+-----------+------+-------------------+---
But this way, a user can vote as many times as they want. So I tried to create a table with two columns, productId and userID, like below:
SecondTable:
productId | userID |
------------+---------------|
21 | 100001 |
------------+---------------|
22 | 100002 |
------------+---------------|
21 | 100001 |
------------+---------------|
21 | 100003 |
------------+---------------|
Now, as in the example given in SecondTable, a user (100001) has voted twice for the same productId (21). I don't want both of these votes to be recorded.
Problems with this method:
The counter is incremented on every vote.
I cannot properly link SecondTable and FirstTable to manage the update of FirstTable.
Of course, this question may not be completely new, but I searched a lot for the right answer. One of the questions on this site suggested the following method for managing the update of a table:
UPDATE `FirstTable` SET `countView`= `countView`+1,
`rate`=('$MyRate' + (`countView`-1)*`rate`)/`countView`
WHERE `productId`='$productId' IN ( SELECT DISTINCT productId, userID
FROM SecondTable)
But even when I use this command, I get the following error:
1241 - Operand should contain 1 column(s)
I would be very grateful for any guidance. I'm sure my question is not a duplicate. Thank you again.
This fixes your specific syntax problem:
UPDATE FirstTable
SET countView = countView + 1,
rate = ($MyRate + (countView - 1) * rate) / countView
WHERE productId = $productId AND
productId IN (SELECT t2.productId FROM SecondTable t2);
But if two different users vote on the same product, FirstTable will be updated only once. It is unclear if that is intentional behavior or not.
Note that SELECT DISTINCT is not needed in the subquery.
The error is generated because a subquery used with IN can't return 2 columns. Select a single column and use GROUP BY:
Try:
IN ( SELECT DISTINCT productId FROM SecondTable GROUP BY productId, userID)
Here's documentation to look over for mysql group by if you want: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
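A different way to stop repeat votes, not proposed in the answers above, is to let a unique key on (productId, userID) reject the duplicate and only touch FirstTable when the vote row was actually inserted. The sketch below is a minimal illustration under that assumption, using Python's sqlite3 module so it runs on its own; the vote helper is a made-up name. In MySQL the analogous statement would be INSERT IGNORE against a unique key. Note one deliberate difference: SQLite evaluates every SET expression against the old row values, so the average is written in terms of the old countView rather than relying on left-to-right assignment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FirstTable (
        productId INTEGER PRIMARY KEY,
        countView INTEGER NOT NULL,
        rate      REAL    NOT NULL
    );
    -- The composite primary key is what makes a second vote impossible.
    CREATE TABLE SecondTable (
        productId INTEGER NOT NULL,
        userID    INTEGER NOT NULL,
        PRIMARY KEY (productId, userID)
    );
    INSERT INTO FirstTable VALUES (21, 12, 4.0), (22, 18, 3.0);
    """
)


def vote(conn, product_id, user_id, my_rate):
    """Count a vote once per (product, user); return True if it was counted."""
    with conn:  # one transaction: both statements commit or neither does
        cur = conn.execute(
            "INSERT OR IGNORE INTO SecondTable (productId, userID) VALUES (?, ?)",
            (product_id, user_id),
        )
        if cur.rowcount == 0:
            return False  # this user already voted for this product
        conn.execute(
            "UPDATE FirstTable "
            "SET rate = (? + countView * rate) / (countView + 1), "
            "    countView = countView + 1 "
            "WHERE productId = ?",
            (my_rate, product_id),
        )
    return True


first = vote(conn, 21, 100001, 5)   # new vote, counted
second = vote(conn, 21, 100001, 1)  # same user and product, ignored
count_view, rate = conn.execute(
    "SELECT countView, rate FROM FirstTable WHERE productId = 21"
).fetchone()
print(first, second, count_view, rate)
```

With this layout the application never has to ask "has this user voted before?" in a separate query; the key constraint answers it as part of the insert.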

Fewest grouped by distinct - SQL

OK, I think the answer to this is out there somewhere, but I can't find it...
(and even my title is bad)
In short, I want to get the fewest number of groups I can make from part of an association table.
First, keep in mind that this is already the result of a 5-table (1k+ rows) join with filtering and grouping, which I'll have to run many times on a prod server as powerful as a banana...
Second, this is a made-up case that illustrates my problem.
After some querying, I've got this result:
+--------------------+
|id_course|id_teacher|
+--------------------+
| 6 | 1 |
| 6 | 4 |
| 6 | 14 |
| 33 | 1 |
| 33 | 4 |
| 34 | 1 |
| 34 | 4 |
| 34 | 10 |
+--------------------+
As you can see, I've got 3 courses, which are taught by up to 3 teachers. I need to attend each course once, but I want as few different teachers as possible (I'm shy...).
My first query
Should answer: what is the smallest number of teachers I need to cover every unique course?
With this data, it's 1, because Teacher 1 or Teacher 4 alone teaches all 3 courses.
Second query
Now that I've already got these courses, I want to go to two other courses, 32 and 50, with this schedule:
+--------------------+
|id_course|id_teacher|
+--------------------+
| 32 | 1 |
| 32 | 12 |
| 50 | 12 |
+--------------------+
My question is: for id_course N, will I have to get one more teacher?
I want to check course by course, e.g. "check for course 32"; there is no need to check several at the same time.
The best way, I think, is to count an inner join against the list of teachers tied for the best rank in the first query; with our data there are only two: Teacher(1, 4).
For Course 32, Teacher 4 doesn't teach it, but as Teacher 1 teaches Courses(6, 33, 34, 32), I don't have to get another teacher.
For Course 50, the only teacher who teaches it is Teacher 12, so I won't find a match among my chosen teachers, and I'll have to get one more (so two in total with this data).
Here is a base SQLFiddle
Best regards, Blag
You want to get a distinct count of ID_Teachers with the least count then... get a distinct count and limit the results to 1 record.
So perhaps something like...
SELECT count(Distinct ID_Teacher), Group_concat(ID_Teacher) as TeachersIDs
FROM Table
WHERE ID_Course in ('Your List')
ORDER BY count(Distinct ID_Teacher) ASC Limit 1
However, this will pick arbitrarily if a tie exists... so do you want to provide the option to select which group of teachers and classes to use when ties exist? That is, there may be multiple ways to cover all classes with the same number of teachers. For example, teachers A, B and teachers A, C both cover all required classes... should both records be returned in the result, or is 1 sufficient?
So I've finally found a way to do what I want!
For the first query, as my underlying real need was "is there a single teacher who can do everything", I lowered my expectations a bit and went for this one (58 lines in my real case u_u"):
SELECT
(
SELECT count(s.id_teacher) nb
FROM t AS m
INNER JOIN t AS s
ON m.id_teacher = s.id_teacher
GROUP BY m.id_course, m.id_teacher
ORDER BY nb DESC
LIMIT 1
) AS nbMaxBySingleTeacher,
(
SELECT COUNT(DISTINCT id_course) nb
FROM t
) AS nbTotalCourseToDo
SQLFiddle
And I get back two values that answer my question "is one teacher enough?" (it is only if both values are equal):
+--------------------------------------+
|nbMaxBySingleTeacher|nbTotalCourseToDo|
+--------------------------------------+
| 4 | 5 |
+--------------------------------------+
The 2nd query uses the schedule of new courses and takes the id of the one I want to check. It should tell me whether I need one more teacher, or whether my current one(s) will do.
SELECT COUNT(*) nb
FROM (
SELECT
z.id_teacher
FROM z
WHERE
z.id_course = 50
) t1
WHERE
FIND_IN_SET(t1.id_teacher, (
SELECT GROUP_CONCAT(t2.id_teacher) lst
FROM (
SELECT DISTINCT COUNT(s.id_teacher) nb, m.id_teacher
FROM t AS m
INNER JOIN t AS s
ON m.id_teacher = s.id_teacher
GROUP BY m.id_course, m.id_teacher
ORDER BY nb DESC
) t2
GROUP BY t2.nb
ORDER BY nb DESC
LIMIT 1
));
SQLFiddle
This tells me the number of teachers who are able to teach both the courses I already have AND the new one I want. So if it's greater than zero, I don't need a new teacher:
+--+
|nb|
+--+
|1 |
+--+
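The general form of the first question ("what is the smallest set of teachers that covers every course?") is a set-cover problem, which is awkward in plain SQL but easy to brute-force in application code when the number of teachers is small. The sketch below is only an illustration of that idea, not what the queries above do: fewest_teachers is a made-up helper, it tries every combination of 1 teacher, then 2, and so on, and its cost grows exponentially with the number of teachers.

```python
from itertools import combinations


def fewest_teachers(pairs):
    """Return one smallest set of teachers covering every course in pairs.

    pairs is a list of (id_course, id_teacher) tuples. Combinations are
    tried in increasing size, so the first full cover found is minimal.
    """
    courses = {course for course, _ in pairs}
    taught = {}  # id_teacher -> set of courses that teacher gives
    for course, teacher in pairs:
        taught.setdefault(teacher, set()).add(course)

    teachers = sorted(taught)
    for size in range(1, len(teachers) + 1):
        for combo in combinations(teachers, size):
            covered = set().union(*(taught[t] for t in combo))
            if covered == courses:
                return set(combo)
    return set()  # only reached when pairs is empty


# Data from the question: the initial courses, then the two extra ones.
initial = [(6, 1), (6, 4), (6, 14), (33, 1), (33, 4), (34, 1), (34, 4), (34, 10)]
extra = [(32, 1), (32, 12), (50, 12)]

print(fewest_teachers(initial))
print(fewest_teachers(initial + extra))
```

On the question's data this agrees with the reasoning in the post: one teacher is enough for courses 6, 33 and 34, and adding courses 32 and 50 forces a second teacher.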

Return the query when count of a query is greater than a number?

I want to return all rows that have a certain value in a column, but only if that value occurs more than 5 times. For example, if column M has the value 1 in more than 5 rows, the query should return all rows where M is 1.
select *
from tab
where M = 1
group by id -- id is the primary key of the table
having count(M) > 5;
EDIT: Here is my table:
id | M | price
--------+-------------+-------
1 | | 100
2 | 1 | 50
3 | 1 | 30
4 | 2 | 20
5 | 2 | 10
6 | 3 | 20
7 | 1 | 1
8 | 1 | 1
9 | 1 | 1
10 | 1 | 1
11 | 1 | 1
Originally I just wanted to put this into a trigger, so that if the number of rows with M = 1 is greater than 5, an exception is raised. The query I asked for would go into the trigger. END EDIT.
But my result is always empty. Can anyone help me out? Thanks!
Try this :
select *
from tab
where M in (select M from tab where M = 1 group by M having count(id) > 5);
SQL Fiddle Demo
Please try:
select *,count(M) from tab where M=1 group by id having count(M)>5
Since you group on your PK (which seems a futile exercise), you are counting per id, which will indeed always return 1.
As I explain after this code, this query is NOT good, it is NOT the answer, and I also explain WHY. Please do not expect this query to run correctly!
select *
from tab
where M = 1
group by M
having count(*) > 5;
Like this, you group on what you are counting, which makes a lot more sense. At the same time, this will have unexpected behaviour, as you are selecting all kinds of columns that are neither in the GROUP BY nor in any aggregate. I know MySQL is lenient about that, but I don't even want to know what it will produce.
Instead, try a subquery along these lines:
select *
from tab
where M in
(SELECT M
from tab
group by M
having count(*) > 5)
I've built a SQLFiddle demo (I used 'Test' as the table name out of habit) accomplishing this (I don't have MySQL at hand right now to test it).
-- Made up a structure for testing
CREATE TABLE Test (
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
M int
);
SELECT id, M FROM Test
WHERE M IN (
SELECT M
FROM Test
WHERE M = 1
GROUP BY M
HAVING COUNT(M) > 5
)
The sub-query is a common "find the duplicates" kind of query, with the added condition of a specific value for the column M, and it requires more than 5 occurrences.
It returns the values of M that you can then use to query the table, ending up with the rows you need.
You shouldn't use SELECT *; it's bad practice in general. Don't retrieve data you aren't actually using, and if you are using it, take the little time needed to type out the list of fields: you'll likely see faster queries, and the code will be far more readable.
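To see the IN / GROUP BY / HAVING pattern behave as described, here is a small self-contained check against the table from the question. It uses Python's sqlite3 module as a stand-in for MySQL, and rows_if_frequent is a made-up helper name; the SQL itself is the same shape as the subquery answers above.

```python
import sqlite3

# The table from the question; the first row has no value for M.
DATA = [
    (1, None, 100), (2, 1, 50), (3, 1, 30), (4, 2, 20), (5, 2, 10),
    (6, 3, 20), (7, 1, 1), (8, 1, 1), (9, 1, 1), (10, 1, 1), (11, 1, 1),
]

# The subquery yields the value only when it occurs more than the threshold;
# the outer query then returns every row carrying that value.
QUERY = """
SELECT id, M, price
FROM tab
WHERE M IN (
    SELECT M
    FROM tab
    WHERE M = ?
    GROUP BY M
    HAVING COUNT(*) > ?
)
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, M INTEGER, price INTEGER)")
conn.executemany("INSERT INTO tab VALUES (?, ?, ?)", DATA)


def rows_if_frequent(conn, value, threshold):
    """Return all rows with M = value, but only if there are more than threshold."""
    return conn.execute(QUERY, (value, threshold)).fetchall()


frequent = rows_if_frequent(conn, 1, 5)  # M = 1 occurs 7 times
rare = rows_if_frequent(conn, 2, 5)      # M = 2 occurs only twice
print(len(frequent), len(rare))
```

Inside a trigger, the same subquery (or just its COUNT(*)) would be enough to decide whether to raise the exception.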

Query database in weekly interval

I have a database with a created_at column containing the datetime in Y-m-d H:i:s format.
The latest datetime entry is 2011-09-28 00:10:02.
I need the query to be relative to the latest datetime entry:
1. The first value in the query should be the latest datetime entry.
2. The second value in the query should be the entry closest to 7 days from the first value.
3. The third value should be the entry closest to 7 days from the second value.
4. Repeat step 3.
What I mean by "closest to 7 days from":
The following are dates, the interval I desire is a week, in seconds a week is 604800 seconds.
7 days from the first value is equal to 1316578202 (1317183002-604800)
the value closest to 1316578202 (7 days) is... 1316571974
unix timestamp | Y-m-d H:i:s
1317183002 | 2011-09-28 00:10:02 -> appear in query (first value)
1317101233 | 2011-09-27 01:27:13
1317009182 | 2011-09-25 23:53:02
1316916554 | 2011-09-24 22:09:14
1316836656 | 2011-09-23 23:57:36
1316745220 | 2011-09-22 22:33:40
1316659915 | 2011-09-21 22:51:55
1316571974 | 2011-09-20 22:26:14 -> closest to 7 days from 1317183002 (first value)
1316499187 | 2011-09-20 02:13:07
1316064243 | 2011-09-15 01:24:03
1315967707 | 2011-09-13 22:35:07 -> closest to 7 days from 1316571974 (second value)
1315881414 | 2011-09-12 22:36:54
1315794048 | 2011-09-11 22:20:48
1315715786 | 2011-09-11 00:36:26
1315622142 | 2011-09-09 22:35:42
I would really appreciate any help. I have not been able to do this in MySQL, and no online resources seem to deal with relative date manipulation like this. I would like the query to be modular enough that the interval can be changed to weekly, monthly, or yearly. Thanks in advance!
Answer #1 Reply:
SELECT
UNIX_TIMESTAMP(created_at)
AS unix_timestamp,
(
SELECT MIN(UNIX_TIMESTAMP(created_at))
FROM my_table
WHERE created_at >=
(
SELECT max(created_at) - 7
FROM my_table
)
)
AS `random_1`,
(
SELECT MIN(UNIX_TIMESTAMP(created_at))
FROM my_table
WHERE created_at >=
(
SELECT MAX(created_at) - 14
FROM my_table
)
)
AS `random_2`
FROM my_table
WHERE created_at =
(
SELECT MAX(created_at)
FROM my_table
)
Returns:
unix_timestamp | random_1 | random_2
1317183002 | 1317183002 | 1317183002
Answer #2 Reply:
RESULT SET:
This is the result set for a yearly interval:
id | created_at | period_index | period_timestamp
267 | 2010-09-27 22:57:05 | 0 | 1317183002
1 | 2009-12-10 15:08:00 | 1 | 1285554786
I desire this result:
id | created_at | period_index | period_timestamp
626 | 2011-09-28 00:10:02 | 0 | 0
267 | 2010-09-27 22:57:05 | 1 | 1317183002
I hope this makes more sense.
It's not exactly what you asked for, but the following example is pretty close....
Example 1:
select
floor(timestampdiff(SECOND, tbl.time, most_recent.time)/604800) as period_index,
unix_timestamp(max(tbl.time)) as period_timestamp
from
tbl
, (select max(time) as time from tbl) most_recent
group by period_index
gives results:
+--------------+------------------+
| period_index | period_timestamp |
+--------------+------------------+
| 0 | 1317183002 |
| 1 | 1316571974 |
| 2 | 1315967707 |
+--------------+------------------+
This breaks the dataset into groups based on "periods", where (in this example) each period is 7-days (604800 seconds) long. The period_timestamp that is returned for each period is the 'latest' (most recent) timestamp that falls within that period.
The period boundaries are all computed based on the most recent timestamp in the database, rather than computing each period's start and end time individually based on the timestamp of the period before it. The difference is subtle - your question requests the latter (iterative approach), but I'm hoping that the former (approach I've described here) will suffice for your needs, since SQL doesn't lend itself well to implementing iterative algorithms.
If you really do need to determine each period based on the timestamp in the previous period, then your best bet is going to be an iterative approach -- either using a programming language of your choice (like php), or by building a stored procedure that uses a cursor.
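For reference, the iterative version is only a few lines in application code. The sketch below is one possible reading of "closest to 7 days from the previous value": start at the newest timestamp, then repeatedly pick, among strictly older entries, the one nearest to the previous pick minus the interval. pick_periodic and its count parameter are made-up names, and the timestamps are the ones listed in the question.

```python
WEEK = 604800  # seconds in 7 days

# Unix timestamps copied from the question.
TIMESTAMPS = [
    1317183002, 1317101233, 1317009182, 1316916554, 1316836656,
    1316745220, 1316659915, 1316571974, 1316499187, 1316064243,
    1315967707, 1315881414, 1315794048, 1315715786, 1315622142,
]


def pick_periodic(timestamps, interval, count):
    """Return up to count timestamps, newest first, spaced about interval apart.

    Each pick after the first is the strictly older entry closest to
    (previous pick - interval).
    """
    remaining = sorted(timestamps, reverse=True)
    if not remaining or count <= 0:
        return []
    picks = [remaining[0]]
    while len(picks) < count:
        target = picks[-1] - interval
        older = [t for t in remaining if t < picks[-1]]
        if not older:
            break  # ran out of data before reaching count
        picks.append(min(older, key=lambda t: abs(t - target)))
    return picks


picks = pick_periodic(TIMESTAMPS, WEEK, 3)
print(picks)
```

Changing the interval argument (for example to 31536000 for a year) is all that is needed to switch the period length.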
Edit #1
Here's the table structure for the above example.
CREATE TABLE `tbl` (
`id` int(10) unsigned NOT NULL auto_increment PRIMARY KEY,
`time` datetime NOT NULL
)
Edit #2
Ok, first: I've improved the original example query (see revised "Example 1" above). It still works the same way, and gives the same results, but it's cleaner, more efficient, and easier to understand.
Now... the query above is a group-by query, meaning it shows aggregate results for the "period" groups as I described above - not row-by-row results like a "normal" query. With a group-by query, you're limited to using aggregate columns only. Aggregate columns are those columns that are named in the group by clause, or that are computed by an aggregate function like MAX(time). It is not possible to extract meaningful values for non-aggregate columns (like id) from within the projection of a group-by query.
Unfortunately, MySQL doesn't generate an error when you try to do this (unless the ONLY_FULL_GROUP_BY SQL mode is enabled). Instead, it picks an arbitrary value from within the grouped rows and shows that value for the non-aggregate column in the grouped result. This is what's causing the odd behavior the OP reported when trying to use the code from Example #1.
Fortunately, this problem is fairly easy to solve. Just wrap another query around the group query, to select the row-by-row information you're interested in...
Example 2:
SELECT
entries.id,
entries.time,
periods.idx as period_index,
unix_timestamp(periods.time) as period_timestamp
FROM
tbl entries
JOIN
(select
floor(timestampdiff( SECOND, tbl.time, most_recent.time)/31536000) as idx,
max(tbl.time) as time
from
tbl
, (select max(time) as time from tbl) most_recent
group by idx
) periods
ON entries.time = periods.time
Result:
+-----+---------------------+--------------+------------------+
| id | time | period_index | period_timestamp |
+-----+---------------------+--------------+------------------+
| 598 | 2011-09-28 04:10:02 | 0 | 1317183002 |
| 996 | 2010-09-27 22:57:05 | 1 | 1285628225 |
+-----+---------------------+--------------+------------------+
Notes:
Example 2 uses a period length of 31536000 seconds (365 days), while Example 1 (above) uses a period of 604800 seconds (7 days). Other than that, the inner query in Example 2 is the same as the primary query shown in Example 1.
If a matching period_time belongs to more than one entry (i.e. two or more entries have the exact same time, and that time matches one of the selected period_time values), then the above query (Example 2) will include multiple rows for the given period timestamp (one for each match). Whatever code consumes this result set should be prepared to handle such an edge case.
It's also worth noting that these queries will perform much, much better if you define an index on your datetime column. For my example schema, that would look like this:
ALTER TABLE tbl ADD INDEX idx_time ( time )
If you're willing to go for the closest that is after the week is out then this'll work. You can extend it to work out the closest but it'll look so disgusting it's probably not worth it.
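-- unix_tstamp and sql_tstamp are the two columns of my_table
-- a bare "- 7" is numeric arithmetic, so subtract a real interval instead
select unix_tstamp
, ( select min(unix_tstamp)
from my_table
where sql_tstamp >= ( select max(sql_tstamp) - interval 7 day
from my_table )
)
, ( select min(unix_tstamp)
from my_table
where sql_tstamp >= ( select max(sql_tstamp) - interval 14 day
from my_table )
)
from my_table
where sql_tstamp = ( select max(sql_tstamp)
from my_table )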