I have a table with a million rows. How do I select the most common value (the value that appears most often in the table) from a field?
You need to group by the interesting column and for each value, select the value itself and the number of rows in which it appears.
Then it's a matter of sorting (to put the most common value first) and limiting the results to only one row.
In query form:
SELECT column, COUNT(*) AS magnitude
FROM table
GROUP BY column
ORDER BY magnitude DESC
LIMIT 1
This thread should shed some light on your issue.
Basically, use COUNT() with a GROUP BY clause:
SELECT foo, COUNT(foo) AS fooCount
FROM table
GROUP BY foo
ORDER BY fooCount DESC
And to get only the first result (most common), add
LIMIT 1
to the end of your query.
In case you don't need to return the frequency of the most common value, you could use:
SELECT foo
FROM table
GROUP BY foo
ORDER BY COUNT(foo) DESC
LIMIT 1
This has the additional benefit of only returning one column and therefore working in subqueries.
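The GROUP BY / COUNT / ORDER BY / LIMIT 1 pattern from the answers above can be tried end to end with a small sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the table name items, its sample rows, and the helper most_common are hypothetical. The extra sort key on the column itself is an assumption added here so that ties are broken the same way on every run.

```python
import sqlite3

def most_common(conn, table, column):
    """Return (value, count) for the most frequent value in `column`."""
    # Identifiers cannot be bound as parameters, so they are interpolated.
    # Only pass trusted table and column names to this helper.
    sql = (
        f'SELECT "{column}", COUNT(*) AS magnitude '
        f'FROM "{table}" '
        f'GROUP BY "{column}" '
        f'ORDER BY magnitude DESC, "{column}" ASC '  # tie-breaker (assumption)
        f'LIMIT 1'
    )
    return conn.execute(sql).fetchone()

# Hypothetical sample data: "b" appears three times, "a" twice, "c" once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (foo TEXT)")
conn.executemany(
    "INSERT INTO items (foo) VALUES (?)",
    [("a",), ("b",), ("b",), ("c",), ("b",), ("a",)],
)

result = most_common(conn, "items", "foo")
print(result)  # ('b', 3)
```

On a table with a million rows, an index on the grouped column would likely help, but that depends on the engine and the data.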
It is a very simple query, but every time I run it I get a different result. Similar things happened when I used TOP 1. I would like a random sub-sample, so this works for me, but am I missing something? Why does it return a different value every time?
SELECT DISTINCT user_id FROM table1
where day_id>="2009-01-09" and day_id<"2011-02-16"
LIMIT 1;
There's no guarantee that you will get a random result with your query. It's quite likely you'll get the same result each time (although the actual result returned will be indeterminate). To guarantee that you get a random, unique user_id, you should SELECT a random value from the list of DISTINCT values:
SELECT user_id
FROM (SELECT DISTINCT user_id
FROM table1
WHERE day_id >= "2009-01-09" AND day_id < "2011-02-16"
) u
ORDER BY RAND()
LIMIT 1
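A runnable sketch of the same idea, picking one random user_id from the DISTINCT list, follows. It uses Python's sqlite3 module rather than MySQL, so RANDOM() replaces RAND(); the sample rows and the helper random_user are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (user_id INTEGER, day_id TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    (1, "2009-05-01"), (1, "2010-01-01"), (2, "2010-06-15"),
    (3, "2011-01-20"),
    (4, "2012-03-03"),  # outside the date range, must never be picked
])

def random_user(conn):
    """Pick one user_id at random among the distinct ids in the date range."""
    row = conn.execute("""
        SELECT user_id
        FROM (SELECT DISTINCT user_id
              FROM table1
              WHERE day_id >= '2009-01-09' AND day_id < '2011-02-16') AS u
        ORDER BY RANDOM()   -- SQLite's counterpart of MySQL's RAND()
        LIMIT 1
    """).fetchone()
    return row[0]

# Repeated calls may return different ids, but only ids inside the range.
picks = {random_user(conn) for _ in range(200)}
print(sorted(picks))
```

Sorting the whole candidate list randomly is simple but touches every distinct id, so on large tables it may be slow.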
SQL result sets are unordered unless you ask for an order, so add an ORDER BY clause, such as:
...
ORDER BY user_id
LIMIT 1
I have the query SELECT id, MAX(value) FROM table1 and it returns the correct value, but it takes the first id of the table instead of the one corresponding to the value returned (id is the primary key).
I've already seen solutions, but they all needed a WHERE clause, which I can't use in my case.
I believe what you're trying to do is return the id of the row with the max value. Is that right?
I'm curious why you can't use a WHERE clause?
But OK, this can be solved under that constraint. I'm going to assume that your table is unique on id (if not, you should really talk to whoever built it and ask why).
SELECT id, value
FROM table1
ORDER BY value DESC
LIMIT 1
This will sort your table by value descending (greatest to least) and then show only the first row (i.e., the row with the largest "value").
If your table is not unique on id, you can still group by id and get the same result:
SELECT id, max(value) as max_value
FROM table1
GROUP BY id
ORDER BY max_value DESC
LIMIT 1
First, to answer why your query is behaving in the way you observe: I suspect you are running without sql_mode = only_full_group_by as your query would likely generate an error otherwise. As you've noticed, this can lead to somewhat odd results.
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want.
In this case, since you have no GROUP BY clause, the entire table is effectively the group.
To get one id associated with the largest value in the table, you can select all the rows, order by the value (descending), and then limit to the first result. There is no need for the aggregation operator (or a WHERE clause):
SELECT id, value FROM table1 ORDER BY value DESC LIMIT 1
Note that if there are multiple ids with the (same) max value, this only returns one of them. In the comments, @RaymondNijland points out that this may give different results each time you run it (for id; the value will always be the maximum), and that you can make it deterministic by ordering by id as well:
SELECT id, value FROM table1 ORDER BY value DESC, id ASC LIMIT 1
Likewise, if there are for some reason multiple values for the same ID, it will still return that ID if one of its rows happens to be the max value -- thankfully this doesn't apply in this case, as you mentioned that id is the primary key.
I think you forgot a GROUP BY clause:
SELECT id, MAX(value) FROM table1 GROUP BY id
EDIT: To answer your need, you could do:
SELECT id, MAX(value)
FROM table1
GROUP BY id
HAVING MAX(value) = (SELECT MAX(value) FROM table1)
This could give you multiple results if you have multiple ids with the max value. In this case you could add "LIMIT 1" to get only one result, but which of the tied ids comes back would then be arbitrary.
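The two approaches from this thread can be compared side by side in a short sketch: ORDER BY with a tie-breaker returns exactly one row, while the HAVING form returns every id that shares the maximum. It runs on Python's sqlite3 module as a stand-in for MySQL, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, value INTEGER)")
# Hypothetical data: ids 2 and 4 tie for the maximum value.
conn.executemany(
    "INSERT INTO table1 (id, value) VALUES (?, ?)",
    [(1, 10), (2, 42), (3, 7), (4, 42)],
)

# One row: sort by value, then break ties on id so the answer is repeatable.
top_row = conn.execute(
    "SELECT id, value FROM table1 ORDER BY value DESC, id ASC LIMIT 1"
).fetchone()

# Every tied row: keep the groups whose maximum equals the table maximum.
tied_rows = conn.execute(
    "SELECT id, MAX(value) FROM table1 GROUP BY id "
    "HAVING MAX(value) = (SELECT MAX(value) FROM table1) "
    "ORDER BY id"
).fetchall()

print(top_row)    # (2, 42)
print(tied_rows)  # [(2, 42), (4, 42)]
```

Neither query needs a WHERE clause in the outer SELECT, which was the constraint in the question.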
Suppose a SELECT query returns 10 rows. Is there a one-line query, such as the one below (which I tried, but it did not work), to select one random row from the results of a SELECT query?
select name from (select * from my_table where age > 10 ORDER BY age ASC AS rows)
ORDER BY RAND() LIMIT 1;
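One approach is to wrap the query in another SELECT statement that sorts with ORDER BY RAND() and keeps one row with LIMIT 1. (LIMIT itself only accepts integer constants, so LIMIT RAND() is not valid.)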
Another approach is to add a new column to the table, initialize it with RAND(), and then select from it, ordering by the random column with LIMIT 1. You may even be able to do this on the fly, by JOINing your original table with another table consisting of a single column that takes values from RAND().
Your logic is correct, you just have the syntax wrong.
select name from (
select *
FROM my_table
where age > 10) AS rows
ORDER BY RAND()
LIMIT 1;
You were missing FROM in the subquery, and the alias for the subquery goes outside the parentheses.
I am working on SQL and I have the following problem:
select * from(
select tname,teacher.tid,grade from teacher
inner join
_view
on(_view.tid=teacher.tid))as D
group by grade
where // what should I do here to get the rows having the first and the second maxium values?
order by grade desc,tid;
I want to select only the rows that have the highest and the second-highest values. I have tried a lot of things since yesterday, but with no success.
When I use something like MAX, COUNT or AND, I get an aggregate function error. Please help me with this, because I have tried everything I could.
I believe that you can do:
select tname,teacher.tid,grade
from teacher
inner join _view on _view.tid=teacher.tid
order by grade desc,tid
limit 2;
LIMIT 2 gets you the first two rows of the list you just got from the SELECT. Since you order by grade desc, the two rows with the highest grades are returned. Note that this is exactly two rows, even if more rows share those grades.
From the docs:
The LIMIT clause can be used to constrain the number of rows returned
by the SELECT statement. LIMIT takes one or two numeric arguments,
which must both be nonnegative integer constants (except when using
prepared statements).
You were also using a derived table (a subquery in FROM), but I can't see why you would need it if you are not doing anything with it. The GROUP BY shouldn't be necessary either.
Try:
ORDER BY grade DESC LIMIT 2
OK, after a lot of thinking I got this to work smoothly. Moreover, TOP would not work, only LIMIT at the end of the query. Here is my answer:
select * from(
select tname,teacher.tid,grade from teacher
inner join
_view
on(_view.tid=teacher.tid)
)as D
where grade in(select grade from _view order by grade desc limit 2)
order by grade desc,tid;
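The difference between "the first two rows" and "all rows with the two highest grades" shows up as soon as grades tie. The sketch below runs on Python's sqlite3 module as a stand-in, with hypothetical sample rows. It joins against a derived table of the two highest DISTINCT grades instead of using IN with LIMIT, since some MySQL versions reject LIMIT inside an IN subquery; the DISTINCT and the alias top2 are additions made here, not part of the posted answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (tid INTEGER PRIMARY KEY, tname TEXT)")
conn.execute("CREATE TABLE _view (tid INTEGER, grade INTEGER)")
conn.executemany("INSERT INTO teacher VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Dee")])
conn.executemany("INSERT INTO _view VALUES (?, ?)",
                 [(1, 90), (2, 95), (3, 90), (4, 80)])

# Plain LIMIT 2: exactly two rows, so one of the teachers graded 90 is dropped.
first_two = conn.execute("""
    SELECT t.tname, t.tid, v.grade
    FROM teacher AS t
    INNER JOIN _view AS v ON v.tid = t.tid
    ORDER BY v.grade DESC, t.tid
    LIMIT 2
""").fetchall()

# Two highest distinct grades: every row carrying 95 or 90 is kept.
top_two_grades = conn.execute("""
    SELECT t.tname, t.tid, v.grade
    FROM teacher AS t
    INNER JOIN _view AS v ON v.tid = t.tid
    INNER JOIN (SELECT DISTINCT grade FROM _view
                ORDER BY grade DESC LIMIT 2) AS top2
            ON top2.grade = v.grade
    ORDER BY v.grade DESC, t.tid
""").fetchall()

print(first_two)       # [('Bob', 2, 95), ('Ann', 1, 90)]
print(top_two_grades)  # [('Bob', 2, 95), ('Ann', 1, 90), ('Cy', 3, 90)]
```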
Thanks, everybody, for your help.
I have this part of a query:
GROUP BY trackid
ORDER BY MAX(history.date) DESC
And I see that, when I group, it returns the row with the maximum date for each group. Why this behaviour? ORDER BY should work on the whole rows, not on the groups, shouldn't it?
EDIT (Example)
This is my whole query:
SELECT tracklist.value, history.date
FROM history JOIN tracklist ON history.trackid=tracklist.trackid
ORDER BY history.date DESC
The result is:
tracklist3 2011-04-27 15:40:36
tracklist2 2011-04-27 13:15:43
tracklist2 2011-04-02 00:30:02
tracklist2 2011-04-01 14:20:12
tracklist1 2011-03-02 14:13:58
tracklist1 2011-03-01 12:11:50
As you can see, all lines are correctly ordered by history.date.
Now I'd like to group them, keeping for each group the row with the MAX history.date.
So the output should be:
tracklist3 2011-04-27 15:40:36
tracklist2 2011-04-27 13:15:43
tracklist1 2011-03-02 14:13:58
I see that:
GROUP BY trackid
ORDER BY MAX(history.date) DESC
works correctly, but I really don't understand why it works :) ORDER BY is for the whole rows, not for the groups...
When you say SELECT trackid, MAX(history.date) ... GROUP BY trackid ORDER BY MAX(history.date) DESC you are really saying: "Show me for each tracklist the most recent history entry and please show me the tracklist first whose most recent history entry is (overall) most recent."
The ORDER BY is applied after the grouping (that's why it comes after the GROUP BY in the SQL).
Note that in your example, you have SELECT tracklist.value, history.date instead of SELECT tracklist.value, MAX(history.date). That is just wrong, and unfortunately MySQL (with ONLY_FULL_GROUP_BY disabled) does not give a warning; it simply chooses an arbitrary history.date at its discretion.
ORDER BY MAX(history.date) DESC is somewhat redundant if all you want to do is order the result set. Ordering applies to the result set.
Consider the results if you remove the ORDER BY clause. Without that, your query would only be grouping on the trackid column, so it wouldn't be valid--you would need to add an aggregate function to the date column or add the date column to the GROUP BY clause. By adding the aggregate function to the ORDER BY clause, you are essentially telling the SQL engine that for each group of trackid, get the maximum date. This seems to be what you want.
It seems you do not fully understand the GROUP BY statement. I would recommend looking up a tutorial on it.
But essentially, the GROUP BY statement combines a number of rows into one. The column names you GROUP BY determine how unique the new combined rows will be. SQL doesn't know how to handle the non-grouped columns, since each new combined row pulls data from a number of rows that contain different values in these columns. That's why you use aggregate functions on the non-grouped columns in the SELECT statement. The aggregate function MAX() looks at all of the values in the history.date column for the rows that are being combined and returns only the largest of them. Likewise, in a grouped query the ORDER BY clause is evaluated once per combined row, so it can only refer to the grouped columns or to aggregates, which is why ORDER BY uses MAX() here.
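To see the grouping and the ordering work together, the sketch below rebuilds the sample rows from the question and selects MAX(history.date) explicitly. It uses Python's sqlite3 module as a stand-in for MySQL; the numeric trackid values and the alias latest are assumptions, since the question does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tracklist (trackid INTEGER PRIMARY KEY, value TEXT)")
conn.execute("CREATE TABLE history (trackid INTEGER, date TEXT)")
conn.executemany("INSERT INTO tracklist VALUES (?, ?)",
                 [(1, "tracklist1"), (2, "tracklist2"), (3, "tracklist3")])
conn.executemany("INSERT INTO history VALUES (?, ?)", [
    (3, "2011-04-27 15:40:36"),
    (2, "2011-04-27 13:15:43"),
    (2, "2011-04-02 00:30:02"),
    (2, "2011-04-01 14:20:12"),
    (1, "2011-03-02 14:13:58"),
    (1, "2011-03-01 12:11:50"),
])

# GROUP BY collapses each track to one row, MAX() picks that row's date,
# and ORDER BY then sorts the already grouped rows by that date.
rows = conn.execute("""
    SELECT tracklist.value, MAX(history.date) AS latest
    FROM history
    JOIN tracklist ON history.trackid = tracklist.trackid
    GROUP BY tracklist.trackid, tracklist.value
    ORDER BY latest DESC
""").fetchall()

for value, latest in rows:
    print(value, latest)
# tracklist3 2011-04-27 15:40:36
# tracklist2 2011-04-27 13:15:43
# tracklist1 2011-03-02 14:13:58
```

Because the date appears in the select list as an aggregate, the query no longer depends on the engine choosing a value for a non-grouped column.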