How to select a random row with a group by clause? - mysql

I have the following table (SQLFiddle).
What I'm attempting to do is select three random images while making sure that no two images have the same object. I tried a GROUP BY along with ORDER BY rand(), but that fails: it always gives me cat1.jpg, dog1.jpg, box1.jpg (all images whose path ends in 1, never the others).
The fiddle includes the query I ran and how it is not working.

What you need is a random aggregate function. Current RDBMSs usually have no such function.
A similar question has been asked before.
So the basic idea is to shuffle the elements, then group by, and then select the first row of every group. If we modify one of the answers provided at the link, we get this:
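-- Shuffle all rows, then let GROUP BY keep one row per object.
-- Note: MySQL does not guarantee which row GROUP BY keeps for
-- non-aggregated columns such as image_path.
select object_id, name, image_path
from
  (SELECT images.image_path AS image_path, objects.id AS object_id, objects.name
   FROM objects LEFT JOIN images ON images.object_id = objects.id
   ORDER BY RAND()) as z
group by z.object_id, z.name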
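The "shuffle, then keep the first row of each group" idea can be sketched in plain Python to make the intended semantics explicit. This only illustrates the logic the query is aiming for; it is not how MySQL executes the query, and MySQL does not promise that GROUP BY keeps the first row of a shuffled derived table. The sample rows and the function name `random_image_per_object` are hypothetical.

```python
import random


def random_image_per_object(rows, rng=random):
    """Return {object_id: image_path}, picking one image per object at random.

    Mirrors 'shuffle, then take the first row per group': shuffle a copy
    of the rows, then keep the first row seen for each object.
    """
    shuffled = list(rows)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)      # the ORDER BY RAND() step
    picked = {}
    for object_id, image_path in shuffled:
        # setdefault keeps only the first row seen per group (the GROUP BY step)
        picked.setdefault(object_id, image_path)
    return picked


# Hypothetical sample data: (object_id, image_path)
images = [
    (1, "cat1.jpg"), (1, "cat2.jpg"), (1, "cat3.jpg"),
    (2, "dog1.jpg"), (2, "dog2.jpg"),
    (3, "box1.jpg"), (3, "box2.jpg"),
]

if __name__ == "__main__":
    print(random_image_per_object(images))
```

Passing a seeded `random.Random(...)` as `rng` makes the pick reproducible for testing.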

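You can't get a random image this way, because for non-aggregated columns MySQL returns whichever row it reads first in its internal order (typically insertion order, "first come, first served"), regardless of the ORDER BY in the subquery.
But you can get a random result using the following approach (fiddle):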
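SELECT images.image_path AS image_path, objects.name
FROM objects
LEFT JOIN
(
    -- One image per object: concatenate each object's paths in random
    -- order, then keep the text before the first comma.
    -- (Breaks if an image_path itself contains a comma.)
    SELECT object_id,
           SUBSTRING_INDEX(GROUP_CONCAT(image_path ORDER BY RAND()), ',', 1) AS image_path
    FROM images
    GROUP BY object_id
) AS images
ON images.object_id = objects.id
GROUP BY objects.name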
If there's a restrictive WHERE condition on the objects table, you might get better performance if you join first and then do the GROUP_CONCAT.
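On MySQL 8.0 or later, a window function avoids relying on GROUP BY or GROUP_CONCAT altogether: number each object's images in random order and keep row number 1. This is a different technique from the answer above, shown here as a sketch. It runs against an in-memory SQLite database from Python (SQLite 3.25+ supports window functions and spells the function RANDOM(), where MySQL uses RAND()). The table layout and sample rows are assumed from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE images (id INTEGER PRIMARY KEY, object_id INTEGER, image_path TEXT);
    INSERT INTO objects VALUES (1, 'cat'), (2, 'dog'), (3, 'box');
    INSERT INTO images (object_id, image_path) VALUES
        (1, 'cat1.jpg'), (1, 'cat2.jpg'), (1, 'cat3.jpg'),
        (2, 'dog1.jpg'), (2, 'dog2.jpg'),
        (3, 'box1.jpg'), (3, 'box2.jpg');
""")

# Number each object's images in random order, keep the first one,
# then pick three objects at random. In MySQL 8.0+ use RAND() instead of RANDOM().
query = """
    SELECT o.name, r.image_path
    FROM objects AS o
    JOIN (
        SELECT object_id, image_path,
               ROW_NUMBER() OVER (PARTITION BY object_id ORDER BY RANDOM()) AS rn
        FROM images
    ) AS r ON r.object_id = o.id
    WHERE r.rn = 1
    ORDER BY RANDOM()
    LIMIT 3
"""
rows = conn.execute(query).fetchall()
print(rows)
```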

I think this should do (in MySQL the function is RAND(); random() is the PostgreSQL/SQLite spelling):
ORDER BY RAND()
LIMIT 1

Related

MySQL GROUP BY ignores the ORDER BY and always returns the 1st row

I have read through tons of similar questions, and none of them answers what is wrong with mine.
I want to select the entire row that includes the maximum value of one of the columns for each group.
SELECT * FROM (
SELECT t1.* FROM `t1` JOIN `t2` ON t2.id=t1.raceId ORDER BY t1.points DESC
) AS new GROUP BY new.athleteId ORDER BY new.points DESC
This works, giving me a single row for each athlete, but the row it shows is just the earliest row in the DB, not the row with the maximum points.
The sub query alone shows all the rows in the correct order, but when I try to group them, it still takes the earliest row and ignores the ordering.
I can retrieve the maximum points for each grouping, but the rest of the row info still comes from the earliest entry.
The GROUP BY clause is meant to be used with aggregate functions.
What is it that you are trying to achieve with the GROUP BY?
Here is one way to achieve what you're after.
As a general rule of thumb, when you use GROUP BY it's wise to define which aggregate functions to use. MySQL allows you to GROUP BY without aggregate functions, but I've found this very confusing unless I'm very specific about what I want to aggregate on. Maybe that's because of my background in SQL Server and Oracle, which do NOT allow you to use GROUP BY this way (MySQL 5.7.5 and later also reject most such queries by default, through the ONLY_FULL_GROUP_BY SQL mode).
Essentially: get the max points for each athlete, then join back to your entire data set to limit by that athlete and points. You may need to do it by race as well if you want athlete by race; I'm unsure whether you want max athlete points by race, but based on your GROUP BY/ORDER BY I'm guessing not.
SELECT t1.*, t2.*
FROM (SELECT t1.athleteId, MAX(t1.points) AS points   -- max points per athlete
      FROM `t1`
      INNER JOIN `t2` ON t2.id = t1.raceId
      GROUP BY t1.athleteId) new
INNER JOIN `t1` ON t1.athleteId = new.athleteId      -- join back for the full row
               AND t1.points = new.points
INNER JOIN `t2` ON t2.id = t1.raceId
ORDER BY new.points DESC
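To sanity-check the join-back approach, here is the same query run against a small in-memory SQLite database from Python. The column names `athleteId`, `raceId`, `points` and `t2.id` follow the question; `t2.name` and the sample rows are hypothetical. If an athlete ties on the maximum in several races, this approach returns one row per tied race.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t2 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t1 (athleteId INTEGER, raceId INTEGER, points INTEGER);
    INSERT INTO t2 VALUES (1, 'race A'), (2, 'race B'), (3, 'race C');
    INSERT INTO t1 VALUES (10, 1, 5), (10, 2, 9), (10, 3, 7),
                          (20, 1, 8), (20, 3, 3);
""")

# Step 1: max points per athlete; step 2: join back to fetch the full row.
query = """
    SELECT t1.athleteId, t1.raceId, t1.points, t2.name
    FROM (SELECT t1.athleteId, MAX(t1.points) AS points
          FROM t1
          INNER JOIN t2 ON t2.id = t1.raceId
          GROUP BY t1.athleteId) AS new
    INNER JOIN t1 ON t1.athleteId = new.athleteId
                 AND t1.points = new.points
    INNER JOIN t2 ON t2.id = t1.raceId
    ORDER BY new.points DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```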
Depending on your MySQL version, another way would be to use analytic (window) functions along with aggregate functions; window functions are available from MySQL 8.0. But without a version number, I won't go into detail.
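For reference, a window-function version might look like the sketch below (MySQL 8.0+ syntax, run here on SQLite from Python with hypothetical rows). It ranks each athlete's rows by points and keeps rank 1. ROW_NUMBER() keeps exactly one row per athlete even on ties; RANK() would keep all tied rows. The join to `t2` is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (athleteId INTEGER, raceId INTEGER, points INTEGER);
    INSERT INTO t1 VALUES (10, 1, 5), (10, 2, 9), (10, 3, 7),
                          (20, 1, 8), (20, 3, 3);
""")

# Rank each athlete's rows by points (highest first) and keep rank 1.
query = """
    SELECT athleteId, raceId, points
    FROM (
        SELECT t1.*,
               ROW_NUMBER() OVER (PARTITION BY athleteId
                                  ORDER BY points DESC) AS rn
        FROM t1
    ) AS ranked
    WHERE rn = 1
    ORDER BY points DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```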

How to improve performance getting recent records to display in list, recent top 5 most

I'm making a sample "recent" screen that displays a list; the table has id as its primary key.
I have written a query that returns the expected result, but with a large amount of data in the table it can be slow.
This is the sample query:
SELECT distinct H.id, -- (Primary Key)
H.partnerid as PartnerId,
H.partnername AS partner, H.accountname AS accountName,
H.accountid as AccountNo
FROM myschema.mytransactionstable H
INNER JOIN (
SELECT S.accountid, S.partnerid, S.accountname,
max(S.transdate) AS maxDate
from myschema.mytransactionstable S
group by S.accountid, S.partnerid, S.accountname
) ms ON H.accountid = ms.accountid
AND H.partnerid = ms.partnerid
AND H.accountname =ms.accountname
AND H.transdate = maxDate
WHERE H.accountid = ms.accountid
AND H.partnerid = ms.partnerid
AND H.accountname = ms.accountname
AND H.transdate = maxDate
GROUP BY H.partnerid,H.accountid, H.accountname
ORDER BY H.id DESC
LIMIT 5
In my case, there are rows with identical values in the selected columns that differ only in their ids.
Below is a link to an image of the raw records, before running the query above (none of them filtered yet):
Sample result query
I only want the 5 most recent rows by id, but the other columns (accountname, accountid, partnerid) can contain identical values.
The query already returns the correct result, but I want to improve its performance. Any suggestions for improving the query?
You can try using row_number() (window functions require MySQL 8.0 or later):
select * from
(
select *,row_number() over(order by transdate desc) as rn
from myschema.mytransactionstable
)A where rn<=5
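The query above ranks the whole table by transdate, so it does not keep one row per account the way the original query does. A variant that partitions by the grouping columns first (latest transaction per account) and then keeps the five highest ids might look like this. It is a sketch run on SQLite from Python: the `myschema.` prefix is dropped, the sample rows are hypothetical, and on MySQL the same SQL needs 8.0+.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytransactionstable (
        id INTEGER PRIMARY KEY, partnerid INTEGER, partnername TEXT,
        accountid INTEGER, accountname TEXT, transdate TEXT)
""")
# Hypothetical rows; accounts 1 and 2 each have an older and a newer transaction.
conn.executemany(
    "INSERT INTO mytransactionstable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, "P", 1, "acct1", "2024-01-01"),
        (2, 100, "P", 1, "acct1", "2024-01-05"),
        (3, 100, "P", 2, "acct2", "2024-01-02"),
        (4, 100, "P", 3, "acct3", "2024-01-03"),
        (5, 100, "P", 2, "acct2", "2024-01-06"),
        (6, 100, "P", 4, "acct4", "2024-01-04"),
        (7, 100, "P", 5, "acct5", "2024-01-04"),
        (8, 100, "P", 6, "acct6", "2024-01-04"),
    ],
)

# rn = 1 marks the latest transaction of each (partner, account) group;
# the outer query then keeps the five highest ids among those rows.
query = """
    SELECT id, partnerid, partnername, accountid, accountname
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY partnerid, accountid, accountname
                   ORDER BY transdate DESC, id DESC
               ) AS rn
        FROM mytransactionstable AS t
    ) AS latest
    WHERE rn = 1
    ORDER BY id DESC
    LIMIT 5
"""
recent = conn.execute(query).fetchall()
print([row[0] for row in recent])
```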
Don't repeat ON and WHERE clauses. Use ON to say how the tables (or subqueries) are "related"; use WHERE for filtering (that is, which rows to keep). In your case, the whole WHERE clause can probably be removed.
Please provide the output of SHOW CREATE TABLE.
This composite index would probably help, because it serves both the subquery and the JOIN:
INDEX(partnerid, accountid, accountname, transdate)
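One way to check whether such a composite index is picked up for the grouped subquery is to look at the query plan. The sketch below does this in SQLite from Python; the index name `idx_acct` is hypothetical, and SQLite's EXPLAIN QUERY PLAN output is not the same as MySQL's EXPLAIN, so on MySQL inspect EXPLAIN instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytransactionstable (
        id INTEGER PRIMARY KEY, partnerid INTEGER, partnername TEXT,
        accountid INTEGER, accountname TEXT, transdate TEXT);
    -- Composite index: the GROUP BY columns first, then transdate.
    CREATE INDEX idx_acct
        ON mytransactionstable (partnerid, accountid, accountname, transdate);
""")

# The grouped subquery from the question: latest transdate per account.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT partnerid, accountid, accountname, MAX(transdate) AS maxDate
    FROM mytransactionstable
    GROUP BY partnerid, accountid, accountname
""").fetchall()

for row in plan:
    print(row[-1])   # the human-readable 'detail' column
```

Because every column the subquery needs is in the index and the GROUP BY matches its leading columns, the planner can read the index in order instead of sorting.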
That would also avoid a separate sort for the GROUP BY.
But then the ORDER BY is different, so it cannot avoid a sort.
This might avoid the sort without changing the result set ordering: ORDER BY partnerid, accountid, accountname, transdate DESC
Please provide EXPLAIN SELECT ... and EXPLAIN FORMAT=JSON SELECT ... if you have further questions.
If we cannot get an index to handle the WHERE, GROUP BY, and ORDER BY, the query will generate all the rows before seeing the LIMIT 5. If the index does work, then the outer query will stop after 5 -- potentially a big savings.

Mysql DISTINCT with more than one column (remove duplicates)

My table is called training_session.
I'm trying to print out some information from my data without any duplicates, but I still get duplicates somehow. Can someone tell me what I'm doing wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if I use only one column, but when I add another, it does not work.
I think you misunderstood the use of DISTINCT.
There is a big difference between using DISTINCT and GROUP BY.
Both have a similar goal, but they serve different purposes.
You use DISTINCT if you want to show a set of columns without ever repeating a combination. That means you don't care about calculations or aggregate (group) functions. DISTINCT applies to all the selected columns together, so if you keep adding columns to your SELECT, you can get more distinct rows back.
You use GROUP BY if you want to show "distinct" values of certain selected columns and use group functions to calculate data related to them. So use GROUP BY when you want to use group functions.
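A small side-by-side comparison, sketched with SQLite from Python on hypothetical rows: DISTINCT only removes repeated combinations, while GROUP BY collapses each athlete into one row and lets group functions such as COUNT and MAX summarize it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER);
    INSERT INTO training_session VALUES (1, 30), (1, 30), (1, 45), (2, 60);
""")

# DISTINCT: removes repeated (athlete_id, duration) combinations, no calculation.
distinct_rows = conn.execute(
    "SELECT DISTINCT athlete_id, duration FROM training_session ORDER BY 1, 2"
).fetchall()

# GROUP BY: one row per athlete, with group functions computed over each group.
grouped_rows = conn.execute("""
    SELECT athlete_id, COUNT(*) AS sessions, MAX(duration) AS longest
    FROM training_session
    GROUP BY athlete_id
    ORDER BY athlete_id
""").fetchall()

print(distinct_rows)
print(grouped_rows)
```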
Please check the group functions you can use at this link:
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
EDIT 1:
It seems like you are trying to get the "latest" duration of each athlete. Assuming the current table has no ID column, here is an alternative solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
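This is a correlated scalar subquery in the SELECT list (what I call an IN-SELECT subquery). With the help of LIMIT 1, it returns only the topmost record according to its ORDER BY. Such a subquery must return at most one row; if it returns more, MySQL raises an error.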
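Here is a runnable sketch of that query on SQLite from Python. The column `session_date` is hypothetical and stands in for `[latest column to sort]`; the rows are made up. One dialect difference: SQLite silently takes the first row of a scalar subquery, whereas MySQL raises an error when it returns more than one row, so the LIMIT 1 matters more on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE training_session (
        athlete_id INTEGER, duration INTEGER, session_date TEXT);
    INSERT INTO training_session VALUES
        (1, 30, '2024-01-01'), (1, 45, '2024-01-03'), (1, 20, '2024-01-02'),
        (2, 60, '2024-01-01');
""")

# session_date is a hypothetical stand-in for [latest column to sort].
query = """
    SELECT a.athlete_id,
           (SELECT b.duration
            FROM training_session AS b
            WHERE b.athlete_id = a.athlete_id   -- correlate with the outer row
            ORDER BY b.session_date DESC
            LIMIT 1) AS last_duration
    FROM training_session AS a
    GROUP BY a.athlete_id
    ORDER BY a.athlete_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```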
MySQL's DISTINCT clause is used to filter out duplicate rows.
If your query were SELECT DISTINCT athlete_id FROM training_session, your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration), DISTINCT applies to the combination of athlete_id and duration, so each row in the result is a unique pair; hence the results you're getting. In other words, the query is working correctly.
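To see this concretely, here is a sketch on SQLite from Python with hypothetical rows: the same athlete appears once with one column, but once per distinct duration when duration is added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER);
    INSERT INTO training_session VALUES (1, 30), (1, 45), (2, 30), (2, 30);
""")

one_col = conn.execute(
    "SELECT DISTINCT athlete_id FROM training_session ORDER BY athlete_id"
).fetchall()
two_cols = conn.execute(
    "SELECT DISTINCT athlete_id, duration FROM training_session ORDER BY 1, 2"
).fetchall()

print(one_col)    # each athlete once
print(two_cols)   # each (athlete_id, duration) pair once
```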

access get top and last rows

Let's say I have a table CData with the columns CName, Amount1, Amount2.
Now I want to use a query to calculate the difference between Amount1 and Amount2 for each distinct CName and, as a result of the query, get the ~1000 rows with the biggest difference and the ~1000 rows with the smallest (or most negative) difference. It doesn't matter if the results come in one table or two.
1) I am aware of TOP, so I could do this with two queries and sort by Difference (once ascending, once descending). Is there a way to do this in one query, though? That would save some time.
2) A general question: when I define a field in my query (in this example "Difference"), can I use it elsewhere, for example to sort the data? Like this (it doesn't work, but it shows what I mean):
SELECT CData.CName, CData.Amount2-CData.Amount1 AS Difference
FROM CData
GROUP BY CData.CName
ORDER BY Difference
Or do I always have to do the following:
...
ORDER BY CData.Amount2-CData.Amount1
Not much of a difference in this example, I just wanted to know if that's possible in general.
Sort the first time ASC (ascending) and the second time DESC (descending):
SELECT TOP 1000
CData.CName,
CData.Amount2 - CData.Amount1 AS Difference
FROM CData
GROUP BY CData.CName
ORDER BY CData.Amount2 - CData.Amount1 ASC
SELECT TOP 1000
CData.CName,
CData.Amount2 - CData.Amount1 AS Difference
FROM CData
GROUP BY CData.CName
ORDER BY CData.Amount2 - CData.Amount1 DESC
Which aggregate function do you want to apply to your differences? Avg? Sum?
SELECT CName, avg(Amount2-Amount1) AS Difference
FROM CData
GROUP BY CName
By the way, to do it in "one" query, you could use a UNION query on two subqueries: one with TOP 1000 sorted ascending and one with TOP 1000 sorted descending.
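The UNION idea can be sketched as follows. Access uses TOP n; this sketch runs on SQLite from Python, where the equivalent is ORDER BY ... LIMIT n, and each ordered half is wrapped in a subquery before the UNION ALL. It assumes SUM as the aggregate, keeps 2 rows per end instead of 1000, and uses hypothetical data. A `part` column marks which half each row came from; if there are fewer than 2n names, a name can appear in both halves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CData (CName TEXT, Amount1 REAL, Amount2 REAL);
    INSERT INTO CData VALUES
        ('a', 10, 50), ('b', 10, 15), ('c', 40, 10),
        ('d', 5, 5),   ('e', 0, 100), ('f', 30, 5);
""")

N = 2  # stands in for 1000

# Biggest differences first half, smallest (most negative) second half.
query = """
    SELECT * FROM (
        SELECT CName, SUM(Amount2 - Amount1) AS Difference, 'top' AS part
        FROM CData GROUP BY CName
        ORDER BY Difference DESC LIMIT ?
    )
    UNION ALL
    SELECT * FROM (
        SELECT CName, SUM(Amount2 - Amount1) AS Difference, 'bottom' AS part
        FROM CData GROUP BY CName
        ORDER BY Difference ASC LIMIT ?
    )
"""
rows = conn.execute(query, (N, N)).fetchall()
for row in rows:
    print(row)
```

SQLite accepts the alias Difference in ORDER BY; in Access you may need to repeat the expression instead.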
It looks like Access doesn't let you use an alias in the ORDER BY clause. If you build the query in the QBE grid and then switch the view from the UI to SQL, you can see that it repeats the calculation in the ORDER BY clause.
Not sure if this will work for you, but you can try something like:
select * from
(SELECT TOP 3
CName, Date_Sale, Sum(Amount) AS SumA, 99999-Sum(Amount) as srt
FROM
Data
GROUP BY
CName, Date_Sale
UNION
SELECT TOP 3
CName, Date_Sale, Sum(Amount) AS SumA, Sum(Amount) as srt
FROM
Data
GROUP BY
CName, Date_Sale) u
order by
srt

sql where condition for getting the first and second maximum values

I am working on SQL and I have the following problem:
select * from(
select tname,teacher.tid,grade from teacher
inner join
_view
on(_view.tid=teacher.tid))as D
group by grade
where // what should I do here to get the rows having the first and the second maxium values?
order by grade desc,tid;
I want to select only the rows that have the highest and the second-highest value. I have tried a lot of things since yesterday, with no success!
When I use something like MAX, COUNT or AND, I get an aggregate-function error. Please help me with this, because I've done all I could!
I believe that you can do:
select tname,teacher.tid,grade
from teacher
inner join _view on _view.tid=teacher.tid
order by grade desc,tid
limit 2;
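LIMIT 2 gets you the first two rows of the list returned by the SELECT. Since you have ORDER BY grade DESC, the two records with the highest grades are returned. Note that this is exactly two rows: if two teachers tie on the top grade, the second-highest grade will not appear.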
From the docs:
The LIMIT clause can be used to constrain the number of rows returned
by the SELECT statement. LIMIT takes one or two numeric arguments,
which must both be nonnegative integer constants (except when using
prepared statements).
You were also using a derived table, but I can't see why you would need it if you aren't doing anything with it. And the GROUP BY shouldn't be necessary either.
Try:
ORDER BY grade DESC LIMIT 2
OK, after a lot of thinking I got this to work correctly. Also, TOP would not work, only LIMIT at the end of the query. Here is my answer:
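select * from (
    select tname, teacher.tid, grade
    from teacher
    inner join _view on (_view.tid = teacher.tid)
) as D
-- MySQL rejects LIMIT directly inside IN (...), so the LIMIT goes in a
-- derived table; DISTINCT stops a tie on the top grade from hiding the
-- second-highest grade.
where grade in (
    select grade from (
        select distinct grade from _view order by grade desc limit 2
    ) as top2
)
order by grade desc, tid;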
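Here is a sketch of the "two highest distinct grades" filter, run on SQLite from Python with hypothetical teacher and _view tables. The sample data includes a tie on the top grade: with DISTINCT in the inner query, both tied teachers and the second-highest grade are returned. The extra derived table `top2` keeps the same SQL valid on MySQL, which does not accept LIMIT directly inside an IN (...) subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (tid INTEGER PRIMARY KEY, tname TEXT);
    CREATE TABLE _view (tid INTEGER, grade INTEGER);
    INSERT INTO teacher VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
    INSERT INTO _view VALUES (1, 90), (2, 90), (3, 80), (4, 70);
""")

# top2 holds the two highest *distinct* grades (90 and 80 here).
query = """
    SELECT teacher.tname, teacher.tid, _view.grade
    FROM teacher
    INNER JOIN _view ON _view.tid = teacher.tid
    WHERE _view.grade IN (
        SELECT grade FROM (
            SELECT DISTINCT grade FROM _view ORDER BY grade DESC LIMIT 2
        ) AS top2
    )
    ORDER BY _view.grade DESC, teacher.tid
"""
rows = conn.execute(query).fetchall()
print(rows)
```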
thanks everybody for your collaboration.