I've used a UNION to create an artificial row for "All Countries":
SELECT '0', 'all countries', '10000000000000' AS users
UNION
SELECT country_id, country_name, country_internet_users AS users
FROM countries
ORDER BY users DESC
LIMIT 10
The problem comes when I try to sort by country_internet_users.
MySQL doesn't seem to behave as expected. It doesn't sort at all. It wouldn't even run the query until I added the "AS users". But despite that it seems to just ignore the ORDER BY completely.
How do I order by a column, when I've added an artificial row using a UNION?
One option is to add a fourth column to each SELECT in the UNION that ranks the order in which each portion of the UNION should appear. In your case, you want the first portion of the UNION to appear on top, as a sort of table header. Then, in the outer query, select only the three columns you actually want in your report.
SELECT t.country_id, t.country_name, t.users
FROM
(
SELECT '0' AS country_id, 'all countries' AS country_name, 10000000000000 AS users, 1 AS val
UNION ALL
SELECT country_id, country_name, country_internet_users AS users, 0 AS val
FROM countries
) AS t
ORDER BY t.val DESC, t.users DESC
This approach is one way to guarantee that the first half of the UNION will appear on top (assuming that is what you want). As @KubaWyrostek pointed out, if you can rely on the natural ordering of the ID values, then you don't need a subquery.
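To check the sort-key idea end to end, here is a minimal Python sketch that runs the same query against an in-memory SQLite database. SQLite stands in for MySQL here (an assumption; the query itself uses only standard SQL), and the sample countries and user counts are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries ("
    " country_id INTEGER PRIMARY KEY,"
    " country_name TEXT,"
    " country_internet_users INTEGER)"
)
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [(1, "Iceland", 350000), (2, "Brazil", 150000000), (3, "Chile", 16000000)],
)

query = """
SELECT t.country_id, t.country_name, t.users
FROM (
    SELECT '0' AS country_id, 'all countries' AS country_name,
           10000000000000 AS users, 1 AS val   -- val = 1 marks the artificial row
    UNION ALL
    SELECT country_id, country_name, country_internet_users AS users, 0 AS val
    FROM countries
) AS t
ORDER BY t.val DESC, t.users DESC   -- artificial row first, then by user count
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The val column never reaches the output; it only decides which half of the UNION sorts first.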
MySQL doesn't seem to behave as expected. It doesn't sort at all. It wouldn't even run the query until I added the "AS users".
Run the first query individually and you'll see that, because you didn't alias the expressions, MySQL uses the values as the names of the columns it produces in the result set. That is, the first query produces columns named 0, all countries and 10000000000000. A UNION takes its column names from the first SELECT, so of course MySQL cannot sort the rows by column users: there is no users column in the result set.
After you alias the third expression as users, the first query returns the columns 0, all countries and users and now MySQL knows what column you want to use for sorting.
But despite that it seems to just ignore the ORDER BY completely.
No, it doesn't ignore the ORDER BY. I cannot tell for sure, but I think that because the value you generate for column users is a string, the result column of the UNION becomes a string type, and ORDER BY uses string comparison for the values of column users, even though the values extracted from the table are integers (or are they also strings?).
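A quick way to see why a string-typed column seems "unsorted" is to compare the two orderings directly. This Python sketch uses made-up counts and assumes, as the answer suspects, that the UNION result column ends up as a string type:

```python
# Hypothetical user counts, including the artificial "all countries" total.
counts = [9, 250, 1200, 10000000000000]

# Sorting as strings compares character by character, which is what happens
# when the UNION result column is a string type.
as_text = sorted((str(c) for c in counts), reverse=True)

# Sorting as numbers gives the order the question expected.
as_numbers = sorted(counts, reverse=True)

print(as_text)     # ['9', '250', '1200', '10000000000000']
print(as_numbers)  # [10000000000000, 1200, 250, 9]
```

With string comparison, the artificial 10000000000000 row lands last in a descending sort, which looks exactly like an ORDER BY that was ignored.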
If the type of column countries.country_internet_users is a numeric type, all you have to do is use a number in the generated row:
SELECT '0', 'all countries', 10000000000000 AS users
# ... the rest of the query here
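If the type of column countries.country_internet_users is a string type, then you have two options:
cast the value to an integer in the query, e.g. CAST(country_internet_users AS UNSIGNED) AS users in the second SELECT (together with a numeric value in the generated row); this can make the query much slower on a large table;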
change the type of the column to a number type (this is how you should have created it in the first place):
ALTER TABLE countries
MODIFY country_internet_users INT UNSIGNED NOT NULL;
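The CAST option can be sketched with SQLite from Python. This assumes a hypothetical countries table that stores the counts as text; SQLite spells the cast CAST(... AS INTEGER), where MySQL would use CAST(... AS UNSIGNED) or CAST(... AS SIGNED).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table where the user counts were stored as text.
conn.execute(
    "CREATE TABLE countries (country_id INTEGER, country_name TEXT,"
    " country_internet_users TEXT)"
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [(1, "A", "9"), (2, "B", "250"), (3, "C", "1200")],
)

# Sorting the text column directly compares strings.
text_order = [r[0] for r in conn.execute(
    "SELECT country_internet_users FROM countries"
    " ORDER BY country_internet_users DESC")]

# Casting to an integer in ORDER BY restores numeric order, at the cost of
# sorting on a computed expression instead of the stored column.
numeric_order = [r[0] for r in conn.execute(
    "SELECT country_internet_users FROM countries"
    " ORDER BY CAST(country_internet_users AS INTEGER) DESC")]
```

Changing the column type removes the need for the cast altogether, which is why the answer recommends it.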
My table is called training_session.
I'm trying to print some information from my data without any duplicates, but I still get them. Can someone tell me what I'm doing wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if I use only one column, but when I add another, it doesn't work.
I think you misunderstood the use of DISTINCT.
There is a big difference between using DISTINCT and GROUP BY.
They have similar goals, but different purposes.
You use DISTINCT if you want to show a set of columns without any combination repeating. That means you don't care about calculations or aggregate (group) functions. DISTINCT can return more rows as you add more columns to your SELECT, because uniqueness is judged on the whole combination of selected columns.
You use GROUP BY if you want one row per distinct value of certain selected columns and want to use group functions to calculate data related to each group. In other words, use GROUP BY when you need group functions.
See the list of group (aggregate) functions you can use here:
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
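To make the difference concrete, here is a small sketch using Python's sqlite3 module with a hypothetical training_session table (SQLite stands in for MySQL; both behave the same for these two queries):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: athlete 1 trained three times, athlete 2 once.
conn.execute("CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO training_session VALUES (?, ?)",
    [(1, 30), (1, 45), (1, 30), (2, 60)],
)

# DISTINCT removes only rows whose whole (athlete_id, duration) pair repeats.
distinct_rows = conn.execute(
    "SELECT DISTINCT athlete_id, duration FROM training_session"
    " ORDER BY athlete_id, duration"
).fetchall()

# GROUP BY collapses each athlete into one row; the aggregate decides
# what to report for duration.
grouped_rows = conn.execute(
    "SELECT athlete_id, MAX(duration) FROM training_session"
    " GROUP BY athlete_id ORDER BY athlete_id"
).fetchall()
```

Athlete 1 still appears twice under DISTINCT because 30 and 45 form different pairs; GROUP BY reduces it to one row.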
EDIT 1:
It seems you are trying to get the "latest" duration for each athlete. Assuming there is no ID column you could use for that, here is my alternative solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This is a correlated subquery in the SELECT list (a scalar subquery). With the help of LIMIT 1, it returns only the topmost row. A subquery used this way must return at most one row; if it returns more, MySQL raises an error (if it returns none, the value is NULL).
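A runnable version of the subquery, assuming a hypothetical session_date column in place of [latest column to sort] and using SQLite through Python's sqlite3 module as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# session_date is a hypothetical stand-in for "[latest column to sort]".
conn.execute(
    "CREATE TABLE training_session"
    " (athlete_id INTEGER, duration INTEGER, session_date TEXT)"
)
conn.executemany(
    "INSERT INTO training_session VALUES (?, ?, ?)",
    [
        (1, 30, "2024-01-01"),
        (1, 45, "2024-01-05"),
        (2, 60, "2024-01-03"),
        (2, 20, "2024-01-02"),
    ],
)

rows = conn.execute("""
    SELECT a.athlete_id,
           (SELECT b.duration
              FROM training_session AS b
             WHERE b.athlete_id = a.athlete_id   -- correlate with the outer row
             ORDER BY b.session_date DESC        -- newest session first
             LIMIT 1) AS last_duration           -- keep only that one row
      FROM training_session AS a
     GROUP BY a.athlete_id
     ORDER BY a.athlete_id
""").fetchall()
```

ISO-formatted date strings sort chronologically, so ORDER BY ... DESC puts the latest session first.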
MySQL's DISTINCT clause is used to filter out duplicate rows.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration), each row returned is a unique combination of both columns, hence the results you're getting. In other words, the query is working correctly.
I've been trying to learn MySQL, and I'm having some trouble writing a join query that doesn't select duplicates.
Basically, here's where I'm at:
SELECT atable.phonenumber, btable.date
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
However, in my database there can be duplicate values in column atable.phonenumber.
For example (asterisks added for clarity):
phonenumber | date
-------------|-----------
*555-681-2105 | 2015-08-12
555-425-5161 | 2015-08-15
331-484-7784 | 2015-08-17
*555-681-2105 | 2015-08-25
.. and so on.
I tried using SELECT DISTINCT but that doesn't work. I also was looking through other solutions which recommended GROUP BY, but that threw an error, most likely because of my WHERE clause and condition. Not really sure how I can easily accomplish this.
DISTINCT applies to the whole row being returned, essentially saying "I want only unique rows": any value in the row may participate in making the row unique.
You are getting phone numbers duplicated because you're only looking at that column in isolation. The database is looking at the phone number and also the date. The rows you posted have different dates, and those dates make the rows different.
I suggest you do as the commenter recommended and decide what you want to do with the dates. If you want the latest date for a phone number, do this:
SELECT atable.phonenumber, max(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
When you write a query that uses grouping, you get a set of rows in which each combination of values in the GROUP BY list appears only once. In this case, that means unique phone numbers. But because you want other values as well (i.e. the date), you MUST use what's called an aggregate function to specify what to do with all the various values that aren't part of the unique set. Sometimes it will be MAX or MIN, sometimes SUM, COUNT, AVG and so on.
If you're familiar with hash tables or dictionaries from elsewhere in programming, this is what a GROUP BY is: it maps a set of values (a key) to a list of rows that have those key values, and then the aggregate function is applied to the values in the list associated with each key.
The simple rule when using GROUP BY is to write queries like this:
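The hash-table analogy can be sketched in plain Python with the rows from the question: group the rows into a dictionary keyed by phone number, then apply max() to each group's dates, mirroring GROUP BY phonenumber with MAX(date). This illustrates the concept only; it is not how a database actually implements grouping.

```python
from collections import defaultdict

# The example rows from the question: (phonenumber, date).
rows = [
    ("555-681-2105", "2015-08-12"),
    ("555-425-5161", "2015-08-15"),
    ("331-484-7784", "2015-08-17"),
    ("555-681-2105", "2015-08-25"),
]

# GROUP BY phonenumber: map each key to the list of values sharing it.
groups = defaultdict(list)
for phone, date in rows:
    groups[phone].append(date)

# MAX(date): apply the aggregate to each group's list of values.
# ISO-formatted dates compare correctly as strings.
latest = {phone: max(dates) for phone, dates in groups.items()}
```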
SELECT
List,
of,
columns,
you,
want,
in,
unique,
combination,
FN(List),
FN(of),
FN(columns),
FN(you),
FN(want),
FN(aggregating)
FROM table
GROUP BY
List,
of,
columns,
you,
want,
in,
unique,
combination
That is, you can copy and paste from your select list to your GROUP BY list (everything except the aggregate calls). MySQL does not do this for you: if you use one or more aggregate functions such as MAX in your select list but omit the GROUP BY clause, the whole result is treated as a single group, and any non-aggregated columns either cause an error (when ONLY_FULL_GROUP_BY is enabled, the default since MySQL 5.7.5) or take their values from an arbitrary row. Because the GROUP BY list can usually be derived mechanically, whether it is largely redundant is often debated, but there are other things you can do with GROUP BY, such as ROLLUP, CUBE and GROUPING SETS (support varies by database). Also, you can group on a column that is used inside a deterministic function in the select list, without having to group on the result of the deterministic function. Whether there is any point to doing so is a debate for another time :)
You should add a GROUP BY and an aggregate on the date field, something like this:
SELECT atable.phonenumber, MAX(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
This will return the maximum date, that is, the latest date.
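The same query can be tried end to end with SQLite via Python's sqlite3 module. The atable/btable schema below is a hypothetical reconstruction from the question (one id per row, dates stored as ISO text), and SQLite stands in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: btable rows point at atable rows through id.
conn.execute("CREATE TABLE atable (id INTEGER, phonenumber TEXT)")
conn.execute("CREATE TABLE btable (id INTEGER, date TEXT, country_id INTEGER)")
conn.executemany("INSERT INTO atable VALUES (?, ?)", [
    (1, "555-681-2105"), (2, "555-425-5161"),
    (3, "331-484-7784"), (4, "555-681-2105"),
])
conn.executemany("INSERT INTO btable VALUES (?, ?, ?)", [
    (1, "2015-08-12", 4), (2, "2015-08-15", 4),
    (3, "2015-08-17", 4), (4, "2015-08-25", 4),
])

rows = conn.execute("""
    SELECT atable.phonenumber, MAX(btable.date) AS latest
      FROM btable
      LEFT JOIN atable ON btable.id = atable.id
     WHERE btable.country_id = 4
     GROUP BY atable.phonenumber   -- one row per phone number
     ORDER BY atable.phonenumber
""").fetchall()
```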
Below is the data in my table:
TABLE:
abc-ac
abc-dc
aax-i
bcs-o-dc
ddd-o-poe-dc
I need to write a query which will display only the unique entries as a result:
abc-ac
aax-i
bcs-o-dc
ddd-o-poe-dc
So basically, since the first two entries start with "abc", they should be treated as one entry and only one of them displayed.
Thanks.
If you're not picky about which of the two abc-* records it shows, you can use this:
SELECT f1 FROM mytable GROUP BY substring_index(f1, '-', 1)
The SUBSTRING_INDEX() function returns everything in your field before the first -, so essentially your records get grouped by only the first part. This is one of the few times we can take advantage of MySQL's unusual GROUP BY behavior, where it lets you leave non-aggregated fields out of the GROUP BY. (In MySQL 5.7.5 and later, where ONLY_FULL_GROUP_BY is enabled by default, this query is rejected; select MIN(f1) or ANY_VALUE(f1) instead of f1.)
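SQLite has no SUBSTRING_INDEX(), but the same grouping can be sketched with substr() and instr() from Python. Here MIN(f1) replaces the bare f1 so the chosen row is deterministic, and ORDER BY MIN(rowid) keeps the groups in insertion order; both are choices made for this sketch, not part of the original answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (f1 TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)", [
    ("abc-ac",), ("abc-dc",), ("aax-i",), ("bcs-o-dc",), ("ddd-o-poe-dc",),
])

# The part before the first '-' is taken with substr()/instr(); appending
# '-' handles values that contain no dash at all.
rows = [r[0] for r in conn.execute("""
    SELECT MIN(f1)
      FROM mytable
     GROUP BY substr(f1, 1, instr(f1 || '-', '-') - 1)
     ORDER BY MIN(rowid)
""")]
```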
According to another SO post (SQL: How to keep rows order with DISTINCT?), DISTINCT has largely undefined behavior as far as sorting goes.
I have a query:
select col_1 from table order by col_2
This can return values like
3
5
3
2
I need to then select a distinct on these that preserves ordering, meaning I want
select distinct(col_1) from table order by col_2
to return
3
5
2
but not
5
3
2
Here is what I am actually trying to do. col_1 is a user ID, and col_2 is a login timestamp for that user, so the same user (col_1) can have many login times. I am trying to build a historical list of users in the order in which they were first seen in the system. I would like to be able to say "our first user ever was ..., our second user ever was ...", and so on.
That post seems to suggest using a GROUP BY, but GROUP BY is not meant to return an ordering of rows, so I do not see how or why it would be applicable here, since it does not appear that GROUP BY preserves any ordering. In fact, another SO post gives an example where GROUP BY destroys the ordering I am looking for: see "Peter" in "What is the difference between GROUP BY and ORDER BY in SQL". Is there any way to guarantee the first result (3, 5, 2)? The strange thing is, if I were implementing the DISTINCT clause, I would surely do the ORDER BY first, then take the results and do a linear scan of the list, preserving the ordering naturally, so I am not sure why the behavior is so undefined.
EDIT:
Thank you all! I have accepted IMSoP's answer because not only was there an interactive example that I could play around with (thanks for turning me on to SQL Fiddle), but they also explained why several things worked the way they did, instead of simply saying "do this". Specifically, it was unclear to me that GROUP BY does not destroy the values in the columns outside of the GROUP BY (rather, it keeps them in some sort of internal list per group), and that these values can still be examined in an ORDER BY clause through aggregate functions.
This all has to do with the "logical ordering" of SQL statements. Although a DBMS might actually retrieve the data according to all sorts of clever strategies, it has to behave according to some predictable logic. As such, the different parts of an SQL query can be considered to be processed "before" or "after" one another in terms of how that logic behaves.
As it happens, the ORDER BY clause is the very last step in that logical sequence, so it can't change the behaviour of "earlier" steps.
If you use a GROUP BY, the rows have been bundled up into their groups by the time the SELECT clause is run, let alone the ORDER BY, so you can only look at columns which have been grouped by, or "aggregate" values calculated across all the values in a group. (MySQL implements a controversial extension to GROUP BY where you can mention a column in the SELECT that can't logically be there, and it will pick one from an arbitrary row in that group).
If you use a DISTINCT, it is logically processed after the SELECT, but the ORDER BY still comes afterwards. So only once the DISTINCT has thrown away the duplicates will the remaining results be put into a particular order - but the rows that have been thrown away can't be used to determine that order.
As for how to get the result you need, the key is to find a value to sort by which is valid after the GROUP BY/DISTINCT has (logically) been run. Remember that if you use a GROUP BY, any aggregated values are still valid - an aggregate function can look at all the values in a group. This includes MIN() and MAX(), which are ideal for ordering by, because "the lowest number" (MIN) is the same thing as "the first number if I sort them in ascending order", and vice versa for MAX.
So to order a set of distinct foo_number values based on the lowest applicable bar_number for each, you could use this:
SELECT foo_number
FROM some_table
GROUP BY foo_number
ORDER BY MIN(bar_number) ASC
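The GROUP BY plus ORDER BY MIN() technique applied to the login example from the question, as a minimal sketch with Python's sqlite3 module (the logins table name and the small integer timestamps are assumptions made for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# col_1 = user id, col_2 = login timestamp (small integers for brevity).
conn.execute("CREATE TABLE logins (col_1 INTEGER, col_2 INTEGER)")
conn.executemany("INSERT INTO logins VALUES (?, ?)",
                 [(3, 1), (5, 2), (3, 3), (2, 4)])

first_seen = [r[0] for r in conn.execute("""
    SELECT col_1
      FROM logins
     GROUP BY col_1
     ORDER BY MIN(col_2) ASC   -- each user's earliest login decides the order
""")]
```

User 3 logs in twice, but only the earliest login (timestamp 1) counts, so the result is the "first seen" order 3, 5, 2.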
EDIT: In the comments, it was discussed why, if an ordering is applied before the grouping / de-duplication takes place, that order is not applied to the groups. If that were the case, you would still need a strategy for which row was kept in each group: the first, or the last.
As an analogy, picture the original set of rows as a set of playing cards picked from a deck, and then sorted by their face value, low to high. Now go through the sorted deck and deal them into a separate pile for each suit. Which card should "represent" each pile?
If you deal the cards face up, the cards showing at the end will be the ones with the highest face value (a "keep last" strategy); if you deal them face down and then flip each pile, you will reveal the lowest face value (a "keep first" strategy). Both are obeying the original order of the cards, and the instruction to "deal the cards based on suit" doesn't automatically tell the dealer (who represents the DBMS) which strategy was intended.
If the final piles of cards are the groups from a GROUP BY, then MIN() and MAX() represent picking up each pile and looking for the lowest or highest value, regardless of the order they are in. But because you can look inside the groups, you can do other things too, like adding up the total value of each pile (SUM) or how many cards there are (COUNT) etc, making GROUP BY much more powerful than an "ordered DISTINCT" could be.
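The card analogy translates directly into Python. The hand below is made up; each dictionary shows one strategy for reducing a pile to a single value:

```python
from collections import defaultdict

# A hypothetical hand: (face value, suit), sorted low to high.
hand = sorted([(7, "hearts"), (2, "spades"), (11, "hearts"),
               (5, "spades"), (3, "hearts")])

# "Deal" the sorted cards into one pile per suit, preserving their order.
piles = defaultdict(list)
for value, suit in hand:
    piles[suit].append(value)

keep_first = {s: p[0] for s, p in piles.items()}   # dealt face down, then flipped
keep_last = {s: p[-1] for s, p in piles.items()}   # dealt face up
lowest = {s: min(p) for s, p in piles.items()}     # MIN(): ignores pile order
highest = {s: max(p) for s, p in piles.items()}    # MAX(): ignores pile order
total = {s: sum(p) for s, p in piles.items()}      # SUM()
counts = {s: len(p) for s, p in piles.items()}     # COUNT()
```

Because the hand was sorted first, keep-first matches MIN() and keep-last matches MAX() here; the aggregates give the same answer even when the piles are not sorted.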
I would go for something like
select col1
from (
select col1,
rank () over(order by col2) pos
from table
)
group by col1
order by min(pos)
In the subquery I calculate the position, then in the main query I do a group by on col1, using the smallest position to order.
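The window-function version can be tried in SQLite (3.25 or later) from Python, using a small made-up table with the hypothetical name logins in place of table. MySQL 8.0 and later also support RANK() OVER, so the same query shape applies there (a derived table needs an alias in MySQL, hence AS sub):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO logins VALUES (?, ?)",
                 [(3, 1), (5, 2), (3, 3), (2, 4)])

ordered = [row[0] for row in conn.execute("""
    SELECT col1
      FROM (SELECT col1,
                   RANK() OVER (ORDER BY col2) AS pos  -- position of each row by col2
              FROM logins) AS sub
     GROUP BY col1
     ORDER BY MIN(pos)   -- each col1 value sorts by its earliest position
""")]
```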
This was originally written for Oracle; the MySQL version was added later.
Edit for MySQL:
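select col1
from (
  -- @curRank numbers the rows in the order MySQL reads them; this assumes
  -- that order follows col2, which MySQL does not guarantee
  select col1,
         @curRank := @curRank + 1 AS pos
  from table1, (select @curRank := 0) p
) sub
group by col1
order by min(pos)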
The GROUP BY in the referenced answer isn't attempting to perform an ordering... it is simply picking a single associated value for the column that we want to be distinct.
As @bluefeet states, if you want a guaranteed ordering, you must use ORDER BY.
Why can't we specify a value in the ORDER BY that isn't included in the SELECT DISTINCT?
Consider the following values for col_1 and col_2:
create table yourTable (
col_1 int,
col_2 int
);
insert into yourTable (col_1, col_2) values (1, 1);
insert into yourTable (col_1, col_2) values (1, 3);
insert into yourTable (col_1, col_2) values (2, 2);
insert into yourTable (col_1, col_2) values (2, 4);
With this data, what should SELECT DISTINCT col_1 FROM yourTable ORDER BY col_2 return?
That's why you need the GROUP BY and the aggregate function, to decide which of the multiple values for col_2 you should order by... could be MIN(), could be MAX(), maybe even some other function such as AVG() would make sense in some cases; it all depends on the specific scenario, which is why you need to be explicit:
select col_1
from yourTable
group by col_1
order by min(col_2)
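With slightly different (made-up) data, MIN() and MAX() produce opposite orders, which shows why the choice has to be explicit. A sketch using SQLite through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (col_1 INTEGER, col_2 INTEGER)")
# Hypothetical data chosen so that MIN and MAX disagree on the order.
conn.executemany("INSERT INTO yourTable VALUES (?, ?)",
                 [(1, 1), (1, 4), (2, 2), (2, 3)])

def distinct_ordered_by(aggregate):
    """Distinct col_1 values ordered by an aggregate of col_2.

    Only fixed, trusted function names are interpolated into the SQL.
    """
    sql = (f"SELECT col_1 FROM yourTable GROUP BY col_1 "
           f"ORDER BY {aggregate}(col_2)")
    return [r[0] for r in conn.execute(sql)]

by_min = distinct_ordered_by("MIN")   # 1 -> 1, 2 -> 2
by_max = distinct_ordered_by("MAX")   # 1 -> 4, 2 -> 3
```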
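For MySQL only: when you select columns that are not in the GROUP BY, MySQL returns values from some row in each group, in practice usually the first row it reads. You can use this behavior to steer which record is returned from each group, like this (note that MySQL documents these values as indeterminate, that the optimizer may ignore an ORDER BY inside a derived table, and that the query is rejected when ONLY_FULL_GROUP_BY, the default since MySQL 5.7.5, is enabled):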
SELECT foo_number, bar_number
FROM
(
SELECT foo_number, bar_number
FROM some_table
ORDER BY bar_number
) AS t
GROUP BY foo_number
ORDER BY bar_number DESC;
This is more flexible because it allows you to order the records within each group using expressions that are not possible with aggregates - in my case I wanted to return the one with the shortest string in another column.
For completeness, my query looks like this:
SELECT
s.NamespaceId,
s.Symbol,
s.EntityName
FROM
(
SELECT
m.NamespaceId,
i.Symbol,
i.EntityName
FROM ImportedSymbols i
JOIN ExchangeMappings m ON i.ExchangeMappingId = m.ExchangeMappingId
WHERE
i.Symbol NOT IN
(
SELECT Symbol
FROM tmp_EntityNames
WHERE NamespaceId = m.NamespaceId
)
AND
i.EntityName IS NOT NULL
ORDER BY LENGTH(i.RawSymbol), i.RawSymbol
) AS s
GROUP BY s.NamespaceId, s.Symbol;
What this does is return a distinct list of symbols in each namespace, and for duplicated symbols it returns the one with the shortest RawSymbol. When the RawSymbol lengths are the same, it returns the one whose RawSymbol comes first alphabetically.
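On databases with window functions (MySQL 8.0+, SQLite 3.25+), the same "shortest RawSymbol per symbol" pick can be made without relying on non-aggregated GROUP BY columns, by numbering the rows within each group with ROW_NUMBER() and keeping row 1. This is a swapped-in technique, not the answer's query; the simplified symbols table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE symbols (NamespaceId INTEGER, Symbol TEXT, RawSymbol TEXT)"
)
# Hypothetical rows: in namespace 1, two candidates of length 5 tie.
conn.executemany("INSERT INTO symbols VALUES (?, ?, ?)", [
    (1, "ABC", "ABC.XX"),
    (1, "ABC", "ABC.Y"),
    (1, "ABC", "ABC.X"),
    (2, "DEF", "DEF.B"),
    (2, "DEF", "DEF.A"),
])

picked = conn.execute("""
    SELECT NamespaceId, Symbol, RawSymbol
      FROM (SELECT NamespaceId, Symbol, RawSymbol,
                   ROW_NUMBER() OVER (
                       PARTITION BY NamespaceId, Symbol       -- one group per symbol
                       ORDER BY LENGTH(RawSymbol), RawSymbol  -- shortest, then alphabetical
                   ) AS rn
              FROM symbols) AS ranked
     WHERE rn = 1
     ORDER BY NamespaceId, Symbol
""").fetchall()
```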
I want to get the distinct values of a particular column; however, duplicates are not handled properly if more than two columns are selected.
The query is:
SELECT DISTINCT
ShoppingSessionId, userid
FROM
dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId, userid
HAVING
userid = 7
This query produces the correct result, but if we add another column, the result is wrong.
Please help me: I want ShoppingSessionId to stay distinct even when I select all the columns from the table, along with the WHERE clause.
How can I do that?
The DISTINCT keyword applies to the entire row, never to a column.
Presently, DISTINCT is not needed at all, because your query already makes sure that ShoppingSessionId is distinct: it specifies the column in GROUP BY and filters on the other grouping column (userid).
When you add a third column to GROUP BY and it results in duplicated ShoppingSessionId values, it means that some ShoppingSessionId values are associated with several different values of the added column.
If you want ShoppingSessionId to remain distinct after including that third column, you should decide which values of the added column should be kept in the output and which should be discarded. This is called aggregating. You could apply the MAX() function to that column, or MIN(), or any other suitable aggregate function. Note that the column should not be included in GROUP BY in this case.
Here's an illustration of what I'm talking about:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId,
userid
HAVING userid = 7
There's one more note on your query. The HAVING clause is typically used for filtering on aggregated columns. If your filter does not involve aggregated columns, you'll be better off using the WHERE clause instead:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId,
userid
Although both queries produce identical results, their efficiency can differ: logically, the first query has to pull all rows, group/aggregate them, and then discard every group except userid = 7, while the second one discards rows first and only then groups/aggregates the remainder, which is much more efficient. (Some optimizers can push a HAVING filter on a grouping column down automatically, in which case the plans end up the same.)
You could go even further and exclude the userid column from GROUP BY and pull its value with an aggregate function:
SELECT
ShoppingSessionId,
MAX(userid) AS userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId
Since all userid values in your output are supposed to be 7 (because that's in your filter), you can just pick the maximum value for every ShoppingSessionId, knowing that it will always be 7.
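All three variants (HAVING, WHERE, and MAX(userid)) can be checked against each other with a small SQLite sketch from Python. SQLite stands in for SQL Server here (so the dbo. schema prefix is dropped), and YourThirdColumn plus the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_ShoppingCart"
    " (ShoppingSessionId INTEGER, userid INTEGER, YourThirdColumn INTEGER)"
)
conn.executemany("INSERT INTO tbl_ShoppingCart VALUES (?, ?, ?)", [
    (100, 7, 1), (100, 7, 5), (101, 7, 2), (102, 8, 9),
])

# Filter after grouping: HAVING on a grouping column.
having_rows = conn.execute("""
    SELECT ShoppingSessionId, userid, MAX(YourThirdColumn) AS YourThirdColumn
      FROM tbl_ShoppingCart
     GROUP BY ShoppingSessionId, userid
    HAVING userid = 7
     ORDER BY ShoppingSessionId
""").fetchall()

# Filter before grouping: WHERE.
where_rows = conn.execute("""
    SELECT ShoppingSessionId, userid, MAX(YourThirdColumn) AS YourThirdColumn
      FROM tbl_ShoppingCart
     WHERE userid = 7
     GROUP BY ShoppingSessionId, userid
     ORDER BY ShoppingSessionId
""").fetchall()

# userid dropped from GROUP BY and picked with MAX(), always 7 after the filter.
max_user_rows = conn.execute("""
    SELECT ShoppingSessionId, MAX(userid) AS userid,
           MAX(YourThirdColumn) AS YourThirdColumn
      FROM tbl_ShoppingCart
     WHERE tbl_ShoppingCart.userid = 7
     GROUP BY ShoppingSessionId
     ORDER BY ShoppingSessionId
""").fetchall()
```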