Is there a way to go through a FOR LOOP in a SELECT-query? (1)
I am asking because I do not know how to collect, in a single
SELECT query, some data from table t_2 for each row of
table t_1 (please see the UPDATE for an example). Yes, it's true that we can GROUP BY a UNIQUE INDEX, but
what if there isn't one? Or how do we request all rows from t_1, each concatenated with a specific related row from t_2? So it seems that in a perfect world we would be able to loop through a table with a proper SQL command (R). Maybe ANY(...) will help?
Here I've tried to find the maximal count of repetitions in column prop among all values of that column in table t.
I.e. I've tried to carry out something like Pandas'
t['prop'].value_counts().max() in an SQL query (Q1):
SELECT Max(C) FROM (SELECT Count(t_1.prop) AS C
FROM t AS t_1
WHERE t_1.prop = ANY (SELECT prop
FROM t AS t_2));
But it only throws the error:
Every derived table must have its own alias.
I don't understand this error. Why does it happen? (2)
Yes, we can implement Pandas' value_counts(...) much more easily
with SELECT prop, COUNT(*) FROM t GROUP BY prop. But I wanted to do it in a "looping" way, staying in a "single non-grouping SELECT query" mode, for reason (R).
This subquery, which attempts to imitate Pandas' t.value_counts(...) (Q2):
SELECT Count(t_1.prop) AS C FROM t AS t_1 WHERE t_1.prop = ANY(SELECT prop FROM t AS t_2)
returns 6, which is simply the number of rows in t. The result is logical: the ANY clause returns TRUE for every row, and once all rows have been gathered, COUNT(...) returns the number of gathered (i.e. all) rows.
By the way, it seems to me that the "full" query (Q1) should return that same 6.
So the main question is: how do I loop in such a query? Is there
any way to do that?
UPDATE
The answer to question (2) was found here, thanks to
Luuk: I just assigned an alias to the (...) subquery, as in SELECT Max(C) FROM (...) AS sq, and it worked. And of course, I got 6. So question (1) is still open.
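Below is a minimal sketch of the fix. It uses Python's built-in sqlite3 module as a stand-in for MySQL and a hypothetical six-row table t. SQLite has no = ANY (subquery), so the sketch uses the equivalent IN (subquery). With the derived table aliased, (Q1) returns the row count. Counting per value inside a grouped derived table instead gives the maximal number of repetitions.

```python
import sqlite3

# Hypothetical data: six rows; 'a' repeats three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (prop TEXT)")
conn.executemany("INSERT INTO t (prop) VALUES (?)",
                 [("a",), ("a",), ("a",), ("b",), ("b",), ("c",)])

# (Q1) with the derived table aliased as sq. The membership test is true for
# every row, so the inner COUNT is just the number of rows in t.
q1 = conn.execute("""
    SELECT MAX(C) FROM (
        SELECT COUNT(t_1.prop) AS C
        FROM t AS t_1
        WHERE t_1.prop IN (SELECT prop FROM t AS t_2)
    ) AS sq
""").fetchone()[0]

# Counting per value (one row per distinct prop), then taking the maximum.
max_repetitions = conn.execute("""
    SELECT MAX(C) FROM (
        SELECT prop, COUNT(*) AS C FROM t GROUP BY prop
    ) AS sq
""").fetchone()[0]

print(q1)               # 6
print(max_repetitions)  # 3
```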
I've also tried to do an iteration this way (Q3):
SELECT (SELECT prop_2 FROM t_2 WHERE t_2.prop_1 = t_1.prop) AS isq FROM t_1;
Here, in t_2, prop_2 relates to prop_1 (a.k.a. prop in t_1) as many-to-one. So our isq (inner select query) returns several prop_2 values (rows) for each prop value in t_1.
And that is why (Q3) throws the error:
Subquery returns more than 1 row.
Again, logical. So I couldn't create a loop in a single non-grouping SELECT query.
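The correlated subquery in (Q3) already behaves like a per-row loop, because it is evaluated for each row of t_1. The error only means that it must yield a single value. The sketch below shows two common ways around this, under some assumptions: it uses SQLite through Python's sqlite3, with hypothetical tables and data. The first option is a JOIN that returns one row per related pair. The second is an aggregate inside the subquery. In MySQL that aggregate would be GROUP_CONCAT(prop_2 ORDER BY prop_2). SQLite's group_concat does not guarantee an order, so the order of the concatenated values may vary.

```python
import sqlite3

# Hypothetical tables: t_2.prop_2 relates to t_1.prop (via prop_1) as many-to-one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_1 (prop TEXT);
    CREATE TABLE t_2 (prop_1 TEXT, prop_2 INTEGER);
    INSERT INTO t_1 VALUES ('x'), ('y');
    INSERT INTO t_2 VALUES ('x', 1), ('x', 2), ('y', 3);
""")

# Option 1: a JOIN returns one row per (t_1 row, related t_2 row) pair.
joined = conn.execute("""
    SELECT t_1.prop, t_2.prop_2
    FROM t_1 JOIN t_2 ON t_2.prop_1 = t_1.prop
    ORDER BY t_1.prop, t_2.prop_2
""").fetchall()

# Option 2: keep one row per t_1 row by aggregating inside the correlated
# subquery, so it returns a single value instead of several rows.
collapsed = conn.execute("""
    SELECT t_1.prop,
           (SELECT group_concat(t_2.prop_2) FROM t_2
            WHERE t_2.prop_1 = t_1.prop) AS isq
    FROM t_1
    ORDER BY t_1.prop
""").fetchall()

print(joined)  # [('x', 1), ('x', 2), ('y', 3)]
print(collapsed)
```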
This query will return the value for b with the highest count:
SELECT b, count(*)
FROM table1
GROUP BY b
ORDER BY count(*) DESC
LIMIT 1;
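To see the GROUP BY answer in action, here is a sketch with a hypothetical table1(a, b), run through Python's sqlite3. The query text is the same in MySQL. Note that if several values tie for the highest count, LIMIT 1 keeps only one of them.

```python
import sqlite3

# Hypothetical data: 'p' appears three times, 'q' twice, 'r' once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO table1 (a, b) VALUES (?, ?)",
                 [(1, "p"), (2, "p"), (3, "q"), (4, "p"), (5, "q"), (6, "r")])

top = conn.execute("""
    SELECT b, COUNT(*)
    FROM table1
    GROUP BY b
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()

print(top)  # ('p', 3)
```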
see: DBFIDDLE
EDIT: Without GROUP BY
SELECT b,C1
FROM (
SELECT
b,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY A) C1,
ROW_NUMBER() OVER (PARTITION BY B ORDER BY A DESC) C2
FROM table1
) x
WHERE x.C2=1
see: DBFIDDLE
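Here is a sketch of why the window-function version works, using the same hypothetical data. Window functions need MySQL 8.0+ or SQLite 3.25+. Within each partition, the row that comes last in ascending a order has C2 = 1. Its ascending row number C1 therefore equals the size of the group, i.e. the count of that b value. This assumes the a values are distinct within each group.

```python
import sqlite3

# Hypothetical table1(a, b); a is assumed unique within each b group.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b TEXT)")
conn.executemany("INSERT INTO table1 (a, b) VALUES (?, ?)",
                 [(1, "p"), (2, "p"), (3, "q"), (4, "p"), (5, "q"), (6, "r")])

rows = conn.execute("""
    SELECT b, C1
    FROM (
        SELECT b,
               -- position of the row when the group is sorted by a ascending
               ROW_NUMBER() OVER (PARTITION BY b ORDER BY a) AS C1,
               -- the same, but descending: C2 = 1 marks the group's last row
               ROW_NUMBER() OVER (PARTITION BY b ORDER BY a DESC) AS C2
        FROM table1
    ) x
    WHERE x.C2 = 1
""").fetchall()

print(sorted(rows))  # [('p', 3), ('q', 2), ('r', 1)]
```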
I am trying to learn a better way of achieving the desired result of a SELECT query (details below). Thank you in advance.
MySQL version: 5.7
Table:
id int(11)
product_number int(8)
service_group int(4)
datetime datetime
value int(6)
Indexes on all columns except value.
The MySQL table has the following data:
id,product_number,service_group,datetime,value
1,1234,1,2022-02-10 00:00:00,0
2,1234,1,2022-02-10 00:01:30,25
3,1234,1,2022-02-10 00:02:30,11
4,1234,2,2022-02-10 01:00:30,0
5,1234,2,2022-02-10 01:01:30,65
6,1234,2,2022-02-10 01:02:30,55
In essence, the value for each product within a service group is recorded against the wrong row: the correct value for the "current" row is actually recorded against the next row for the same product within the same service group. The correct output should look like this:
id,product_number,service_group,datetime,value
1,1234,1,2022-02-10 00:00:00,25
2,1234,1,2022-02-10 00:01:30,11
3,1234,1,2022-02-10 00:02:30,0
4,1234,2,2022-02-10 01:00:30,65
5,1234,2,2022-02-10 01:01:30,55
6,1234,2,2022-02-10 01:02:30,0
The query below seems to be a hugely inefficient way of returning the correct results. What would be a better way to go about this in MySQL? Thank you.
Select
a.id,
a.product_number,
a.service_group,
a.datetime,
(
Select b.value FROM products b
Where b.product_number=a.product_number AND b.service_group=a.service_group
AND b.datetime>a.datetime
Order by b.datetime ASC
Limit 1
) AS value
FROM products a
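As posted, the subquery yields NULL (not 0) for the last row of each product/service group. The sketch below runs on the sample data, with Python's sqlite3 as a stand-in for MySQL 5.7. It wraps the same correlated subquery in COALESCE(..., 0) to reproduce the expected output exactly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    id INTEGER PRIMARY KEY, product_number INTEGER, service_group INTEGER,
    datetime TEXT, value INTEGER)""")
# Sample data from the question.
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", [
    (1, 1234, 1, "2022-02-10 00:00:00", 0),
    (2, 1234, 1, "2022-02-10 00:01:30", 25),
    (3, 1234, 1, "2022-02-10 00:02:30", 11),
    (4, 1234, 2, "2022-02-10 01:00:30", 0),
    (5, 1234, 2, "2022-02-10 01:01:30", 65),
    (6, 1234, 2, "2022-02-10 01:02:30", 55),
])

fixed = conn.execute("""
    SELECT a.id, a.product_number, a.service_group, a.datetime,
           -- value of the next row for the same product and service group,
           -- or 0 when there is no next row
           COALESCE((
               SELECT b.value FROM products b
               WHERE b.product_number = a.product_number
                 AND b.service_group = a.service_group
                 AND b.datetime > a.datetime
               ORDER BY b.datetime ASC
               LIMIT 1
           ), 0) AS value
    FROM products a
    ORDER BY a.id
""").fetchall()

print([row[4] for row in fixed])  # [25, 11, 0, 65, 55, 0]
```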
If no ids are skipped (the numbers are in sequence), then you could probably use a simple SELECT like one of these:
1.
Select
a.id,
a.product_number,
a.service_group,
a.datetime,
(Select b.value FROM products b Where b.id = a.id+1)
FROM products a
2.
Select
a.id,
a.product_number,
a.service_group,
a.datetime,
b.value
FROM products a
INNER JOIN products b ON b.id = a.id+1
Note that both queries 1 and 2 assume that id is your primary key, since it appears to be an incrementing value.
Either way, you should run EXPLAIN on each query so you can analyze which one is the most efficient.
More importantly, if the data is "wrongly recorded", I suggest fixing it: put your service into maintenance mode and correct the data with an UPDATE query.
Edit:
Based on your comment, "Hi, Gunawan. Thank you for your suggestion. Unfortunately IDs will not be in sequences to support the proposed approach.",
you could alter the subquery in (1) slightly to:
Select b.value
FROM products b
Where b.id > a.id order by id asc limit 1
so it becomes:
Select
a.id,
a.product_number,
a.service_group,
a.datetime,
(Select b.value FROM products b Where b.id > a.id order by b.id asc limit 1)
FROM products a
I think what you need in this case is the window function LEAD(); see here for an example and clarification. Note that window functions require MySQL 8.0 or later, so this will not run on the MySQL 5.7 server mentioned in the question.
In summary, LEAD() looks at the NEXT record for the given column, and LAG() looks at the previous one.
So in this example, I am asking for the LEAD() of the record (the next one in line) and getting the VALUE column of that record. The 1 is the offset, i.e. how many records ahead to look, in this case 1. The last parameter, 0, is what should be returned if no such record exists.
The second half of the expression, the OVER (...) clause with its ORDER BY, defines the order in which records are considered "next", in this case the datetime sequence.
I included both the value and the next value so you can see the results, but you can remove the extra column once you see and understand how it works.
But since each service group stands on its own, and you don't want to carry over the value from another group, you need the PARTITION BY clause as well. I left the unpartitioned version in as a comment, so you can compare the results with and without it and see the impact on the query you need.
select
t.id,
t.product_number,
t.service_group,
t.datetime,
t.value,
-- Unpartitioned version: the next value can come from another service group.
-- lead (t.value, 1, 0)
-- over (order by t.datetime) as NextValue,
-- Partitioned version: each product/service group is handled on its own.
-- The 1, 0 arguments return 0 (instead of NULL) for the last row of each group.
lead (t.value, 1, 0)
over ( PARTITION BY
t.product_number,
t.service_group
order by
t.datetime) as NextValuePerServiceGroup
from
Tmp1 t
order by
t.service_group,
t.datetime
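The sketch below checks the LEAD() approach against the sample data from the question, run through Python's sqlite3. Window functions need SQLite 3.25+ or MySQL 8.0+. The table is named products here instead of Tmp1. The partition includes product_number as well as service_group; this is an assumption based on the question's wording ("the next row for the product within the same service group").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products (
    id INTEGER PRIMARY KEY, product_number INTEGER, service_group INTEGER,
    datetime TEXT, value INTEGER)""")
# Sample data from the question.
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?, ?)", [
    (1, 1234, 1, "2022-02-10 00:00:00", 0),
    (2, 1234, 1, "2022-02-10 00:01:30", 25),
    (3, 1234, 1, "2022-02-10 00:02:30", 11),
    (4, 1234, 2, "2022-02-10 01:00:30", 0),
    (5, 1234, 2, "2022-02-10 01:01:30", 65),
    (6, 1234, 2, "2022-02-10 01:02:30", 55),
])

rows = conn.execute("""
    SELECT t.id, t.value,
           -- next row's value within the same product and service group;
           -- 0 instead of NULL when there is no next row
           LEAD(t.value, 1, 0) OVER (
               PARTITION BY t.product_number, t.service_group
               ORDER BY t.datetime
           ) AS NextValuePerServiceGroup
    FROM products t
    ORDER BY t.service_group, t.datetime
""").fetchall()

for row in rows:
    print(row)
```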
My table is called training_session.
I am trying to print out some information from my data, but I do not want any duplicates. I still get them somehow; can someone tell me what I am doing wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if I use only one column, but when I add another, it does not work.
I think you misunderstood the use of DISTINCT.
There is a big difference between using DISTINCT and GROUP BY.
They have similar goals, but different purposes.
You use DISTINCT if you want to show a set of columns without repeating any combination of their values. That means you don't care about calculations or aggregate (group) functions. DISTINCT applies to the whole combination of selected columns, so it will show different results as you keep adding more columns to your SELECT.
You use GROUP BY if you want one row per distinct value of certain selected columns, and you use aggregate functions to calculate the data related to each group. In other words, you use GROUP BY when you need aggregate functions.
Please check the aggregate functions you can use at this link:
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
EDIT 1:
It seems like you are trying to get the "latest" record of each athlete. I'll assume the current scenario, where there is no ID column.
Here is my alternate solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This is a subquery in the SELECT list (a scalar subquery). With the help of LIMIT 1, it returns only the topmost record. Such a subquery must return at most one row, or else it raises an error.
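Here is a sketch of the scalar-subquery approach, run through Python's sqlite3 with hypothetical data. A hypothetical session_date column stands in for [latest column to sort], because the real column name is not given.

```python
import sqlite3

# Hypothetical data; session_date stands in for "[latest column to sort]".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_session "
             "(athlete_id INTEGER, duration INTEGER, session_date TEXT)")
conn.executemany("INSERT INTO training_session VALUES (?, ?, ?)", [
    (1, 30, "2022-01-01"),
    (1, 45, "2022-01-05"),
    (2, 20, "2022-01-01"),
    (2, 60, "2022-01-02"),
])

rows = conn.execute("""
    SELECT a.athlete_id,
           -- scalar subquery: evaluated per athlete, LIMIT 1 keeps the latest row
           (SELECT b.duration
            FROM training_session AS b
            WHERE b.athlete_id = a.athlete_id
            ORDER BY b.session_date DESC
            LIMIT 1) AS last_duration
    FROM training_session AS a
    GROUP BY a.athlete_id
    ORDER BY a.athlete_id
""").fetchall()

print(rows)  # [(1, 45), (2, 60)]
```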
MySQL's DISTINCT clause is used to filter out duplicate rows.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration), DISTINCT applies to the combination of both columns. Each resulting (athlete_id, duration) pair is unique, but the same athlete_id can appear several times, hence the results you're getting. In other words, the query is working correctly.
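The sketch below uses hypothetical data in Python's sqlite3. It shows that DISTINCT works on the whole selected row. It also shows what the first attempt actually computes: athlete_id AND duration is a single logical expression that evaluates to 1 whenever both values are non-zero, so DISTINCT collapses it to one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER)")
conn.executemany("INSERT INTO training_session VALUES (?, ?)",
                 [(1, 30), (1, 30), (1, 45), (2, 60)])

# One column: one row per athlete.
ids = conn.execute(
    "SELECT DISTINCT athlete_id FROM training_session ORDER BY athlete_id"
).fetchall()

# Two columns: one row per distinct (athlete_id, duration) pair,
# so athlete 1 appears twice.
pairs = conn.execute(
    "SELECT DISTINCT athlete_id, duration FROM training_session "
    "ORDER BY athlete_id, duration"
).fetchall()

# "athlete_id AND duration" is one boolean expression (1 for every row here).
anded = conn.execute(
    "SELECT DISTINCT athlete_id AND duration FROM training_session"
).fetchall()

print(ids)    # [(1,), (2,)]
print(pairs)  # [(1, 30), (1, 45), (2, 60)]
print(anded)  # [(1,)]
```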
I'm trying to add features to a preexisting application and I came across a MySQL view something like this:
SELECT
AVG(table_name.col1),
AVG(table_name.col2),
AVG(table_name.col3),
table_name.personID,
table_name.col4
FROM table_name
GROUP BY table_name.personID;
OK, so there are a few aggregate functions. You can select personID because you're grouping by it. But the view also selects a column that is not in an aggregate function and is not part of the GROUP BY clause. How is this possible? Does it just pick a random value, since the values definitely aren't unique per group?
Where I come from (MSSQL Server), that's an error. Can someone explain this behavior to me and why it's allowed in MySQL?
It's true that this feature permits some ambiguous queries, and silently returns a result set with an arbitrary value picked from that column. In practice, it tends to be the value from the row within the group that is physically stored first.
These queries aren't ambiguous if you only choose columns that are functionally dependent on the column(s) in the GROUP BY criteria. In other words, if there can be only one distinct value of the "ambiguous" column per value that defines the group, there's no problem. This query would be illegal in Microsoft SQL Server (and ANSI SQL), even though it cannot logically result in ambiguity:
SELECT AVG(table1.col1), table1.personID, persons.col4
FROM table1 JOIN persons ON (table1.personID = persons.id)
GROUP BY table1.personID;
Also, MySQL has an SQL mode to make it behave per the standard: ONLY_FULL_GROUP_BY
FWIW, SQLite also permits these ambiguous GROUP BY clauses, but it chooses the value from the last row in the group.†
† At least in the version I tested. What it means to be arbitrary is that either MySQL or SQLite could change their implementation in the future and have some different behavior. You should therefore not rely on the behavior staying the way it is currently in ambiguous cases like this. It's better to rewrite your queries to be deterministic and not ambiguous. That's why MySQL 5.7 now enables ONLY_FULL_GROUP_BY by default.
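One deterministic way to rewrite the view is sketched below with hypothetical data in Python's sqlite3. It wraps col4 in an aggregate, so every selected column is either grouped or aggregated, which also satisfies ONLY_FULL_GROUP_BY. MAX() is an arbitrary but repeatable choice here. In MySQL 5.7+, ANY_VALUE(col4) instead states explicitly that any value from the group is acceptable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name "
             "(personID INTEGER, col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 TEXT)")
# Hypothetical rows: person 1 has two different col4 values.
conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?, ?)", [
    (1, 10, 20, 30, "x"),
    (1, 20, 40, 60, "y"),
    (2, 5, 5, 5, "z"),
])

rows = conn.execute("""
    SELECT AVG(col1), AVG(col2), AVG(col3),
           personID,
           MAX(col4) AS col4  -- aggregated, so the choice is repeatable
    FROM table_name
    GROUP BY personID
    ORDER BY personID
""").fetchall()

print(rows)  # [(15.0, 30.0, 45.0, 1, 'y'), (5.0, 5.0, 5.0, 2, 'z')]
```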
I should have Googled for just a bit longer... It seems I found my answer.
MySQL extends the use of GROUP BY so that you can use nonaggregated columns or calculations in the SELECT list that do not appear in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. For example, you do not need to group on customer.name in the following query
In standard SQL, you would have to add customer.name to the GROUP BY clause. In MySQL, the name is redundant.
Still, that just seems... wrong.
Let's say you have a query like this:
SELECT g, v
FROM t
GROUP BY g;
In this case, for each possible value for g, MySQL picks one of the corresponding values of v.
However, which one is chosen, depends on some circumstances.
I read somewhere that for each group of g, the first value of v is kept, in the order in which the records were inserted into the table t.
This is quite ugly, because the records in a table should be treated as a set, where the order of the elements does not matter. This is so "mysql-ish"...
If you want to determine which value for v to keep, you need to apply a subselect for t like this:
SELECT g, v
FROM (
SELECT *
FROM t
ORDER BY g, v DESC
) q
GROUP BY g;
This way you define the order in which the external query processes the records of the subquery, so you can trust which value of v it will pick for each value of g.
However, if you need WHERE conditions, be very careful. If you add the WHERE condition to the subquery, the behaviour is preserved and it always returns the value you expect:
SELECT g, v
FROM (
SELECT *
FROM t
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
GROUP BY g;
This is what you expect, the subselect filters and orders the table. It keeps the records where g has the given value and the external query returns that g and the first value for v.
However, if you add the same WHERE condition to the outer query then you get a non-deterministic result:
SELECT g, v
FROM (
SELECT *
FROM t
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g;
Surprisingly, you may get different values for v when executing the same query again and again, which is... strange. The expected behaviour is to get all the records in the appropriate order from the subquery, filter them in the outer query and then pick the same value as in the previous example. But it does not.
It picks a value for v seemingly at random. The same query returned different values for v when I executed it repeatedly (~20 times), but the distribution was not uniform.
If instead of adding an outer WHERE, you specify a HAVING condition like this:
SELECT g, v
FROM (
SELECT *
FROM t1
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g
HAVING g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9';
Then you get a consistent behaviour again.
CONCLUSION
I would suggest not relying on this technique at all. If you really want or need to, avoid WHERE conditions in the outer query: put them in the inner query if you can, or use a HAVING clause in the outer query.
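On MySQL 8.0 or later, ROW_NUMBER() is a deterministic alternative to the ORDER BY-in-a-subquery trick. It numbers the rows within each g by v descending, and the query keeps row 1. Because the choice is made explicitly, a WHERE condition in the outer query is safe. The sketch below runs in Python's sqlite3 on the test data from this answer, plus a hypothetical second group.

```python
import sqlite3

G = "737a8783-110c-447e-b4c2-1cbb7c6b72c9"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (v INT, g VARCHAR(36))")
# Test data from this answer, plus a hypothetical second group.
conn.executemany("INSERT INTO t1 VALUES (?, ?)",
                 [(1, G), (2, G), (7, "other"), (3, "other")])

QUERY = """
    SELECT g, v
    FROM (
        SELECT g, v,
               -- explicit choice: rank rows within each g by v, highest first
               ROW_NUMBER() OVER (PARTITION BY g ORDER BY v DESC) AS rn
        FROM t1
    ) q
    WHERE rn = 1
"""

all_groups = conn.execute(QUERY + " ORDER BY g").fetchall()
# An outer WHERE on g does not change which row is chosen.
one_group = conn.execute(QUERY + " AND g = ?", (G,)).fetchall()

print(all_groups)  # [('737a8783-110c-447e-b4c2-1cbb7c6b72c9', 2), ('other', 7)]
print(one_group)   # [('737a8783-110c-447e-b4c2-1cbb7c6b72c9', 2)]
```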
I tested it with this data:
CREATE TABLE t1 (
v INT,
g VARCHAR(36)
);
INSERT INTO t1 VALUES (1, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');
INSERT INTO t1 VALUES (2, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');
in MySQL 5.6.41.
Maybe it is just a bug that has been fixed in newer versions; please give feedback if you have experience with them.
select * from personel where p_id IN(select
min(dbo.personel.p_id)
FROM
personel
GROUP BY dbo.personel.p_adi)
I am working on SQL and I have the following problem:
select * from(
select tname,teacher.tid,grade from teacher
inner join
_view
on(_view.tid=teacher.tid))as D
group by grade
where // what should I do here to get the rows having the first and the second maxium values?
order by grade desc,tid;
I want to select only the rows that have the highest and the second-highest values.
I have tried a lot of things since yesterday, but without success!
When I use something like MAX, COUNT or AND, I get an aggregate-function ERROR. Please help me with that, because I have done all I could!
I believe that you can do:
select tname,teacher.tid,grade
from teacher
inner join _view on _view.tid=teacher.tid
order by grade desc,tid
limit 2;
LIMIT 2 gets you the first two rows of the result of the SELECT. Since you have ORDER BY grade DESC, the two records with the highest grades are returned.
From the docs:
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).
You were also using a derived table, but I can't see why you would need it if you are not doing anything with it. And the GROUP BY shouldn't be necessary.
Try:
ORDER BY grade DESC LIMIT 2
OK, after a lot of thinking I got this to work correctly and smoothly. Moreover, TOP would not work, only LIMIT at the end of the query. Here is my answer:
select * from(
select tname,teacher.tid,grade from teacher
inner join
_view
on(_view.tid=teacher.tid)
)as D
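where grade in (
-- MySQL does not allow LIMIT directly inside IN (...), so it is wrapped in a derived table;
-- DISTINCT makes these the two highest different grades, even when grades tie
select grade from (select distinct grade from _view order by grade desc limit 2) as top2
)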
order by grade desc,tid;
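The sketch below uses hypothetical teachers and grades in Python's sqlite3 to show why filtering matters when grades tie. A plain ORDER BY grade DESC LIMIT 2 can return two rows with the same top grade. Selecting the two highest distinct grades in a derived table, and then filtering on them, returns every teacher holding either grade.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (tid INTEGER, tname TEXT);
    CREATE TABLE _view (tid INTEGER, grade INTEGER);
    INSERT INTO teacher VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
    -- two teachers share the top grade
    INSERT INTO _view VALUES (1, 90), (2, 90), (3, 80), (4, 70);
""")

JOINED = """
    SELECT tname, teacher.tid AS tid, grade
    FROM teacher INNER JOIN _view ON _view.tid = teacher.tid
"""

# Plain LIMIT 2: only the two rows sharing grade 90.
limited = conn.execute(JOINED + " ORDER BY grade DESC, tid LIMIT 2").fetchall()

# Rows whose grade is one of the two highest distinct grades.
top_two = conn.execute("""
    SELECT * FROM (""" + JOINED + """) AS D
    WHERE grade IN (
        SELECT grade FROM (
            SELECT DISTINCT grade FROM _view ORDER BY grade DESC LIMIT 2
        ) AS top2
    )
    ORDER BY grade DESC, tid
""").fetchall()

print(limited)  # [('Ann', 1, 90), ('Bob', 2, 90)]
print(top_two)  # [('Ann', 1, 90), ('Bob', 2, 90), ('Cid', 3, 80)]
```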
thanks everybody for your collaboration.
According to another SO post (SQL: How to keep rows order with DISTINCT?), distinct has pretty undefined behavior as far as sorting.
I have a query:
select col_1 from table order by col_2
This can return values like
3
5
3
2
I need to then select a distinct on these that preserves ordering, meaning I want
select distinct(col_1) from table order by col_2
to return
3
5
2
but not
5
3
2
Here is what I am actually trying to do. Col_1 is a user id, and col_2 is a login timestamp event by that user. So the same user (col_1) can have many login times. I am trying to build a historical list of users in the order in which they were first seen in the system. I would like to be able to say "our first user ever was ..., our second user ever was ...", and so on.
That post seems to suggest using GROUP BY, but GROUP BY is not meant to return an ordering of rows, so I do not see how or why it would be applicable here, since it does not appear that GROUP BY preserves any ordering. In fact, another SO post gives an example where GROUP BY destroys the ordering I am looking for: see "Peter" in what is the difference between GROUP BY and ORDER BY in sql. Is there any way to guarantee the latter result? The strange thing is, if I were implementing the DISTINCT clause, I would surely do the ORDER BY first, then take the results, do a linear scan of the list and preserve the ordering naturally, so I am not sure why the behavior is so undefined.
EDIT:
Thank you all! I have accepted IMSoP's answer because not only was there an interactive example that I could play around with (thanks for turning me on to SQL Fiddle), but they also explained why several things worked the way they worked, instead of simply saying "do this". Specifically, it was unclear to me that GROUP BY does not destroy the values in the other columns outside of the GROUP BY (rather, it keeps them in some sort of internal list), and that these values can still be examined in an ORDER BY clause.
This all has to do with the "logical ordering" of SQL statements. Although a DBMS might actually retrieve the data according to all sorts of clever strategies, it has to behave according to some predictable logic. As such, the different parts of an SQL query can be considered to be processed "before" or "after" one another in terms of how that logic behaves.
As it happens, the ORDER BY clause is the very last step in that logical sequence, so it can't change the behaviour of "earlier" steps.
If you use a GROUP BY, the rows have been bundled up into their groups by the time the SELECT clause is run, let alone the ORDER BY, so you can only look at columns which have been grouped by, or "aggregate" values calculated across all the values in a group. (MySQL implements a controversial extension to GROUP BY where you can mention a column in the SELECT that can't logically be there, and it will pick one from an arbitrary row in that group).
If you use a DISTINCT, it is logically processed after the SELECT, but the ORDER BY still comes afterwards. So only once the DISTINCT has thrown away the duplicates will the remaining results be put into a particular order - but the rows that have been thrown away can't be used to determine that order.
As for how to get the result you need, the key is to find a value to sort by which is valid after the GROUP BY/DISTINCT has (logically) been run. Remember that if you use a GROUP BY, any aggregated values are still valid - an aggregate function can look at all the values in a group. This includes MIN() and MAX(), which are ideal for ordering by, because "the lowest number" (MIN) is the same thing as "the first number if I sort them in ascending order", and vice versa for MAX.
So to order a set of distinct foo_number values based on the lowest applicable bar_number for each, you could use this:
SELECT foo_number
FROM some_table
GROUP BY foo_number
ORDER BY MIN(bar_number) ASC
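Here is a quick check of this query on data shaped like the question's example, using hypothetical values and Python's sqlite3. The foo_number values come back in the order of their first appearance by bar_number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (foo_number INTEGER, bar_number INTEGER)")
# Hypothetical rows: ordered by bar_number, foo_number reads 3, 5, 3, 2.
conn.executemany("INSERT INTO some_table VALUES (?, ?)",
                 [(3, 1), (5, 2), (3, 3), (2, 4)])

ordered = [row[0] for row in conn.execute("""
    SELECT foo_number
    FROM some_table
    GROUP BY foo_number
    ORDER BY MIN(bar_number) ASC
""")]

print(ordered)  # [3, 5, 2]
```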
Here's a live demo with some arbitrary data.
EDIT: In the comments, it was discussed why, if an ordering is applied before the grouping / de-duplication takes place, that order is not applied to the groups. If that were the case, you would still need a strategy for which row was kept in each group: the first, or the last.
As an analogy, picture the original set of rows as a set of playing cards picked from a deck, and then sorted by their face value, low to high. Now go through the sorted deck and deal them into a separate pile for each suit. Which card should "represent" each pile?
If you deal the cards face up, the cards showing at the end will be the ones with the highest face value (a "keep last" strategy); if you deal them face down and then flip each pile, you will reveal the lowest face value (a "keep first" strategy). Both are obeying the original order of the cards, and the instruction to "deal the cards based on suit" doesn't automatically tell the dealer (who represents the DBMS) which strategy was intended.
If the final piles of cards are the groups from a GROUP BY, then MIN() and MAX() represent picking up each pile and looking for the lowest or highest value, regardless of the order they are in. But because you can look inside the groups, you can do other things too, like adding up the total value of each pile (SUM) or how many cards there are (COUNT) etc, making GROUP BY much more powerful than an "ordered DISTINCT" could be.
I would go for something like
select col1
from (
select col1,
rank () over(order by col2) pos
from table
)
group by col1
order by min(pos)
In the subquery I calculate the position, then in the main query I do a group by on col1, using the smallest position to order.
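The window-function version can be checked the same way. The sketch below runs in Python's sqlite3 (SQLite 3.25+ is needed for rank()) with the question's example values as hypothetical data. It returns the same order as grouping by col1 and ordering by MIN(col2). The derived table is given an alias, which MySQL requires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(3, 1), (5, 2), (3, 3), (2, 4)])

by_rank = [r[0] for r in conn.execute("""
    SELECT col1
    FROM (
        -- pos = position of each row when sorted by col2
        SELECT col1, RANK() OVER (ORDER BY col2) AS pos
        FROM table1
    ) sub
    GROUP BY col1
    ORDER BY MIN(pos)
""")]

by_min = [r[0] for r in conn.execute(
    "SELECT col1 FROM table1 GROUP BY col1 ORDER BY MIN(col2)")]

print(by_rank)            # [3, 5, 2]
print(by_rank == by_min)  # True
```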
Here is the demo in SQLFiddle (this was Oracle; the MySQL info was added later).
Edit for MySql:
select col1
from (
select col1,
@curRank := @curRank + 1 AS pos
from table1, (select @curRank := 0) p
-- number the rows in col2 order
order by col2
) sub
group by col1
order by min(pos)
And here the demo for MySql.
The GROUP BY in the referenced answer isn't attempting to perform an ordering... it is simply picking a single associated value for the column that we want to be distinct.
Like @bluefeet states, if you want a guaranteed ordering, you must use ORDER BY.
Why can't we specify a value in the ORDER BY that isn't included in the SELECT DISTINCT?
Consider the following values for col1 and col2:
create table yourTable (
col_1 int,
col_2 int
);
insert into yourTable (col_1, col_2) values (1, 1);
insert into yourTable (col_1, col_2) values (1, 3);
insert into yourTable (col_1, col_2) values (2, 2);
insert into yourTable (col_1, col_2) values (2, 4);
With this data, what should SELECT DISTINCT col_1 FROM yourTable ORDER BY col_2 return?
That's why you need the GROUP BY and the aggregate function, to decide which of the multiple values for col_2 you should order by... could be MIN(), could be MAX(), maybe even some other function such as AVG() would make sense in some cases; it all depends on the specific scenario, which is why you need to be explicit:
select col_1
from yourTable
group by col_1
order by min(col_2)
SQL Fiddle Here
For MySQL only (with ONLY_FULL_GROUP_BY disabled), when you select columns that are not in the GROUP BY, it has in practice returned the values from the first record in each group, although this is not guaranteed. You can use this behavior to select which record is returned from each group like this:
SELECT foo_number, bar_number
FROM
(
SELECT foo_number, bar_number
FROM some_table
ORDER BY bar_number
) AS t
GROUP BY foo_number
ORDER BY bar_number DESC;
This is more flexible because it allows you to order the records within each group using expressions that are not possible with aggregates - in my case I wanted to return the one with the shortest string in another column.
For completeness, my query looks like this:
SELECT
s.NamespaceId,
s.Symbol,
s.EntityName
FROM
(
SELECT
m.NamespaceId,
i.Symbol,
i.EntityName
FROM ImportedSymbols i
JOIN ExchangeMappings m ON i.ExchangeMappingId = m.ExchangeMappingId
WHERE
i.Symbol NOT IN
(
SELECT Symbol
FROM tmp_EntityNames
WHERE NamespaceId = m.NamespaceId
)
AND
i.EntityName IS NOT NULL
ORDER BY LENGTH(i.RawSymbol), i.RawSymbol
) AS s
GROUP BY s.NamespaceId, s.Symbol;
What this does is return a distinct list of symbols in each namespace, and for duplicated symbols it returns the one with the shortest RawSymbol. When the RawSymbol lengths are the same, it returns the one whose RawSymbol comes first alphabetically.