Windowing function in My-Sql does not work as expected - mysql

I have three columns in a MySQL table: id, name and mark. All rows are distinct from each other.
I use the SQL statements below. Neither statement uses order by inside the windowing function; each has only a partition and a rows frame.
Ideally they should give the same results in the column derived from the windowing function, but the first one always gives the maximum mark over the whole window, whereas the second one compares the previous row, the current row and the next row and gives the expected result. The first one is really weird: even though I specify unbounded preceding and current row, it considers the whole window rather than the given frame.
Can someone please help?
Statement-1:
select *
,max(mark) over( partition by name rows between unbounded preceding and current row) as w_f
from ( select * from student order by name, mark asc) a
Statement-2:
select *
,max(mark) over( partition by name rows between 1 preceding and 1 following) as w_f
from ( select * from student order by name, mark asc) a

A rows (or range) frame without an order by clause does not make sense: how do you define which row is preceding or following if you don't specify which column(s) should be used for ordering?
Also note that the order by clause in the subquery probably does not do what you expect it to do. There is no guarantee that the inner sort propagates to the outer query at all.
In the absence of sample data and desired results, it is a bit unclear what you are actually trying to do. Assuming that you have an ordering column id, the first query could be phrased as:
select s.*,
max(mark) over(partition by name order by id) as w_f
from student s
order by name, id
rows between unbounded preceding and current row does not need to be written out here: when the window has an order by and no explicit frame, the default frame is range between unbounded preceding and current row, which is equivalent if you have a unique sorting key.
And the second query would go like:
select s.*,
max(mark) over(partition by name order by id rows between 1 preceding and 1 following) as w_f
from student s
order by name, id
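To make the effect of the two frames concrete, here is a minimal sketch that runs both corrected queries against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for MySQL 8.0 here (both accept this window syntax, and the bundled SQLite must be 3.25 or newer), and the sample rows are hypothetical, since the question did not include any data.

```python
import sqlite3

# Hypothetical sample data; the original question did not include any.
rows = [
    (1, "ann", 50),
    (2, "ann", 70),
    (3, "ann", 60),
    (4, "ann", 90),
    (5, "bob", 80),
    (6, "bob", 40),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, mark INTEGER)")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)

# Running maximum: the frame grows from the partition start to the current row.
running_max = conn.execute(
    """
    SELECT id, MAX(mark) OVER (
        PARTITION BY name ORDER BY id
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS w_f
    FROM student
    ORDER BY name, id
    """
).fetchall()

# Sliding maximum over the previous, current and next row of the partition.
sliding_max = conn.execute(
    """
    SELECT id, MAX(mark) OVER (
        PARTITION BY name ORDER BY id
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS w_f
    FROM student
    ORDER BY name, id
    """
).fetchall()

print(running_max)  # [(1, 50), (2, 70), (3, 70), (4, 90), (5, 80), (6, 80)]
print(sliding_max)  # [(1, 70), (2, 70), (3, 90), (4, 90), (5, 80), (6, 80)]
```

With an explicit order by inside the window, the two frames give different results for id 1, 3 and so on, and neither collapses to the maximum of the whole partition.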

Related

How to get one output for avg function? [duplicate]

My MySQL table contains a field that stores the user’s weight and a date field. I am trying to make a query that shows the average weight for a week.
My query so far looks like this:
SELECT Table.`id`,
Table.`weight`,
Table.`date`
FROM Table
WHERE id=%CURRENT_USER_ID%
ORDER BY date ASC
GROUP BY week(date);
This query generates a list where the results are grouped by week.
But I also need the average weight for each week. Right now it only shows the weight for one entry.
How can I make this query?
Be careful. Your SQL is not valid, even though MySQL can be configured to allow it.
This is an issue of functional dependence. The SELECT list terms must be functionally dependent on the GROUP BY terms.
You've selected id, which is not functionally dependent on week(date). In other words, your schema/logic does not guarantee that there is at most one id value per week(date) group. The same is true of trying to select date, which is also not guaranteed to resolve to just one value per week.
Your WHERE clause is also a problem. % can be used as a pattern in a LIKE expression, but not with the = operator, and was not properly quoted as a literal. I'll leave 'CURRENT_USER_ID' to represent your value, assuming that's the correct type. The table definition wasn't shown in the question.
A corrected version of your original query is:
SELECT week(`date`) AS the_week
, MIN(`weight`) AS min_weight
, MIN(`id`) AS min_id
FROM Table
WHERE id='CURRENT_USER_ID'
GROUP BY week(date)
ORDER BY the_week ASC
;
Note: The above uses the aggregate function MIN in the SELECT list. These expressions are functionally dependent on the GROUP BY terms. The AVG function can also be used, like this:
SELECT week(`date`) AS the_week
, AVG(`weight`) AS avg_weight
FROM Table
WHERE id='CURRENT_USER_ID'
GROUP BY week(date)
ORDER BY the_week ASC
;
And in MySQL, we can use the derived column name / alias in the GROUP BY terms:
SELECT week(`date`) AS the_week
, AVG(`weight`) AS avg_weight
FROM Table
WHERE id='CURRENT_USER_ID'
GROUP BY the_week
ORDER BY the_week ASC
;
That's a start.
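As a rough, runnable illustration of the weekly AVG query, the sketch below uses Python's sqlite3 module. Everything schema-related is an assumption: the table name weights, the column user_id and the sample rows are made up, and strftime('%W', ...) is SQLite's stand-in for MySQL's week(), so the week labels are not guaranteed to match what MySQL would report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "weights" and its columns are hypothetical; the question did not show the table definition.
conn.execute("CREATE TABLE weights (user_id INTEGER, weight REAL, date TEXT)")
conn.executemany(
    "INSERT INTO weights VALUES (?, ?, ?)",
    [
        (1, 80.0, "2024-01-01"),  # a Monday
        (1, 82.0, "2024-01-03"),  # same week
        (1, 79.0, "2024-01-08"),  # the following Monday
        (2, 60.0, "2024-01-02"),  # another user, filtered out by the WHERE clause
    ],
)

weekly_avg = conn.execute(
    """
    SELECT strftime('%W', date) AS the_week,  -- SQLite stand-in for week(date) in MySQL
           AVG(weight)          AS avg_weight
    FROM weights
    WHERE user_id = :uid
    GROUP BY the_week
    ORDER BY the_week ASC
    """,
    {"uid": 1},
).fetchall()

# One row per week, each carrying the average of that week's weights.
for the_week, avg_weight in weekly_avg:
    print(the_week, avg_weight)
```

Passing the user id as a bound parameter also sidesteps the quoting problem with %CURRENT_USER_ID% mentioned earlier.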
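You can make a small change and get your desired output. Note that selecting the non-aggregated id, weight and date columns alongside GROUP BY week(date) only works when ONLY_FULL_GROUP_BY is disabled.
Use this:
SELECT Table.`id`,
Table.`weight`,
Table.`date`,
AVG(Table.`weight`) AS `Average Weight`
FROM Table
WHERE id=%CURRENT_USER_ID%
GROUP BY week(date)
ORDER BY date ASC;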

MySQL MAX Function mixes rows

I have the query SELECT id, MAX(value) FROM table1 and it returns the correct value, but it takes the first id of the table instead of the one corresponding to the value returned (id is primary key).
I've already seen solutions, but they all needed a WHERE clause, which I can't use in my case.
I believe what you're trying to do is return the id of the row with the max value. Is that right?
I'm curious why you can't use a WHERE clause?
But OK, under that constraint this can still be solved. I'm going to assume that your table is unique on id (if not, you should really talk to whoever built it and ask why):
SELECT id, value
FROM table1
ORDER BY value DESC
LIMIT 1
This will sort your table by value descending (greatest to least), and then only show the first row (i.e., the row with the largest "value").
If your table is not unique on id, you can still group by id and get the same result:
SELECT id, max(value) as max_value
FROM table1
GROUP BY id
ORDER BY max_value DESC
LIMIT 1
First, to answer why your query is behaving in the way you observe: I suspect you are running without sql_mode = only_full_group_by as your query would likely generate an error otherwise. As you've noticed, this can lead to somewhat odd results.
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want.
In this case, since you have no GROUP BY clause, the entire table is effectively the group.
To get one id associated with the largest value in the table, you can select all the rows, order by the value (descending), and then just limit to the first result; there is no need for the aggregation operator (or a WHERE clause):
SELECT id, value FROM table1 ORDER BY value DESC LIMIT 1
Note that if there are multiple ids with the (same) max value, this only returns one of them. In the comments, @RaymondNijland points out that this may give different results (for id, the value will always be the maximum) each time you run it, and you can make it more deterministic by ordering by id as well:
SELECT id, value FROM table1 ORDER BY value DESC, id ASC LIMIT 1
Likewise, if there are for some reason multiple values for the same ID, it will still return that ID if one of its rows happens to be the max value -- thankfully this doesn't apply in this case, as you mentioned that id is the primary key.
I think you forgot a group by clause :
SELECT id, MAX(value) FROM table1 GROUP BY id
EDIT : To answer your need you could do
SELECT id, MAX(value)
FROM table1
GROUP BY id
HAVING MAX(value) = (SELECT MAX(value) FROM table1)
This could give you multiple results if you have multiple ids with the max value. In this case you could add "LIMIT 1" to get only one result but that would be quite strange and random.
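The difference between the ORDER BY ... LIMIT 1 approach and the HAVING approach shows up when several ids share the maximum. The sketch below is only an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than on MySQL, and the four rows are hypothetical, chosen so that ids 2 and 3 tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, value INTEGER)")
# Hypothetical rows: ids 2 and 3 tie for the maximum value.
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, 10), (2, 30), (3, 30), (4, 20)])

# Sort by value, break ties on id, keep one row: deterministic single result.
top_row = conn.execute(
    "SELECT id, value FROM table1 ORDER BY value DESC, id ASC LIMIT 1"
).fetchone()

# Keep every id whose value equals the table-wide maximum: all tied rows.
tied_rows = conn.execute(
    """
    SELECT id, MAX(value) AS max_value
    FROM table1
    GROUP BY id
    HAVING MAX(value) = (SELECT MAX(value) FROM table1)
    ORDER BY id
    """
).fetchall()

print(top_row)    # (2, 30)
print(tied_rows)  # [(2, 30), (3, 30)]
```

Neither query needs a WHERE clause, and in both the id comes from the same row as the value it is reported with.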

SQL Query Restricting to 1 return per condition contained within IN clause

I am running MySQL 5.5 and essentially trying to return only 1 record for each of the values in my IN clause. I can't use DISTINCT because there can be multiple distinct records attached to each code from the IN clause (namely, cost will be different). Below is a dummy query of what I was trying to do, but it doesn't work in 5.5 because of the ROW_NUMBER() function.
'1b' may have multiple records with differing cost values. title should always be the same across every record with the same codes value.
Any thoughts?
SELECT codes, name_place, title, cost
FROM (
SELECT *, ROW_NUMBER() OVER(PARTITION BY codes) rn
FROM MyDB.MyTable
)
WHERE codes IN ('1b', '1c', '1d', '1e')
AND rn = 1;

SQL select distinct but "keep first"?

According to another SO post (SQL: How to keep rows order with DISTINCT?), distinct has pretty undefined behavior as far as sorting.
I have a query:
select col_1 from table order by col_2
This can return values like
3
5
3
2
I need to then select a distinct on these that preserves ordering, meaning I want
select distinct(col_1) from table order by col_2
to return
3
5
2
but not
5
3
2
Here is what I am actually trying to do. Col_1 is a user id, and col_2 is a log in timestamp event by that user. So the same user (col_1) can have many login times. I am trying to build a historical list of users in which they were seen in the system. I would like to be able to say "our first user ever was, our second user ever was", and so on.
That post seems to suggest using a group by, but group by is not meant to return an ordering of rows, so I do not see how or why this would be applicable here, since it does not appear that group by will preserve any ordering. In fact, another SO post gives an example where group by will destroy the ordering I am looking for: see "Peter" in what is the difference between GROUP BY and ORDER BY in sql. Is there any way to guarantee the latter result? The strange thing is, if I were implementing the DISTINCT clause, I would surely do the order by first, then take the results and do a linear scan of the list and preserve the ordering naturally, so I am not sure why the behavior is so undefined.
EDIT:
Thank you all! I have accepted IMSoP's answer because not only was there an interactive example that I could play around with (thanks for turning me on to SQL Fiddle), but they also explained why several things worked the way they worked, instead of simply "do this". Specifically, it was unclear that GROUP BY does not destroy values in the other columns outside of the group by (rather, it keeps them in some sort of internal list), and these values can still be examined in an ORDER BY clause.
This all has to do with the "logical ordering" of SQL statements. Although a DBMS might actually retrieve the data according to all sorts of clever strategies, it has to behave according to some predictable logic. As such, the different parts of an SQL query can be considered to be processed "before" or "after" one another in terms of how that logic behaves.
As it happens, the ORDER BY clause is the very last step in that logical sequence, so it can't change the behaviour of "earlier" steps.
If you use a GROUP BY, the rows have been bundled up into their groups by the time the SELECT clause is run, let alone the ORDER BY, so you can only look at columns which have been grouped by, or "aggregate" values calculated across all the values in a group. (MySQL implements a controversial extension to GROUP BY where you can mention a column in the SELECT that can't logically be there, and it will pick one from an arbitrary row in that group).
If you use a DISTINCT, it is logically processed after the SELECT, but the ORDER BY still comes afterwards. So only once the DISTINCT has thrown away the duplicates will the remaining results be put into a particular order - but the rows that have been thrown away can't be used to determine that order.
As for how to get the result you need, the key is to find a value to sort by which is valid after the GROUP BY/DISTINCT has (logically) been run. Remember that if you use a GROUP BY, any aggregated values are still valid - an aggregate function can look at all the values in a group. This includes MIN() and MAX(), which are ideal for ordering by, because "the lowest number" (MIN) is the same thing as "the first number if I sort them in ascending order", and vice versa for MAX.
So to order a set of distinct foo_number values based on the lowest applicable bar_number for each, you could use this:
SELECT foo_number
FROM some_table
GROUP BY foo_number
ORDER BY MIN(bar_number) ASC
Here's a live demo with some arbitrary data.
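For a quick local check of the GROUP BY / ORDER BY MIN() pattern, here is a small sketch using Python's sqlite3 module (SQLite instead of MySQL, which is an assumption of this example). The bar_number values are hypothetical and chosen so that a plain ORDER BY bar_number yields foo_number 3, 5, 3, 2, matching the sequence in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (foo_number INTEGER, bar_number INTEGER)")
# Hypothetical rows: sorted by bar_number, foo_number reads 3, 5, 3, 2.
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [(3, 1), (5, 2), (3, 3), (2, 4)],
)

# "Keep first": order each distinct foo_number by its lowest bar_number.
first_seen = [
    row[0]
    for row in conn.execute(
        "SELECT foo_number FROM some_table "
        "GROUP BY foo_number ORDER BY MIN(bar_number) ASC"
    )
]

# "Keep last": order each distinct foo_number by its highest bar_number.
last_seen = [
    row[0]
    for row in conn.execute(
        "SELECT foo_number FROM some_table "
        "GROUP BY foo_number ORDER BY MAX(bar_number) ASC"
    )
]

print(first_seen)  # [3, 5, 2]
print(last_seen)   # [5, 3, 2]
```

Swapping MIN for MAX is the only change needed to move between the two orderings the question distinguishes.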
EDIT: In the comments, it was discussed why, if an ordering is applied before the grouping / de-duplication takes place, that order is not applied to the groups. If that were the case, you would still need a strategy for which row was kept in each group: the first, or the last.
As an analogy, picture the original set of rows as a set of playing cards picked from a deck, and then sorted by their face value, low to high. Now go through the sorted deck and deal them into a separate pile for each suit. Which card should "represent" each pile?
If you deal the cards face up, the cards showing at the end will be the ones with the highest face value (a "keep last" strategy); if you deal them face down and then flip each pile, you will reveal the lowest face value (a "keep first" strategy). Both are obeying the original order of the cards, and the instruction to "deal the cards based on suit" doesn't automatically tell the dealer (who represents the DBMS) which strategy was intended.
If the final piles of cards are the groups from a GROUP BY, then MIN() and MAX() represent picking up each pile and looking for the lowest or highest value, regardless of the order they are in. But because you can look inside the groups, you can do other things too, like adding up the total value of each pile (SUM) or how many cards there are (COUNT) etc, making GROUP BY much more powerful than an "ordered DISTINCT" could be.
I would go for something like
select col1
from (
select col1,
rank () over(order by col2) pos
from table
)
group by col1
order by min(pos)
In the subquery I calculate the position, then in the main query I do a group by on col1, using the smallest position to order.
Here is the demo in SQLFiddle (this was Oracle; the MySQL info was added later).
Edit for MySql:
select col1
from (
select col1 col1,
@curRank := @curRank + 1 AS pos
from table1, (select @curRank := 0) p
) sub
group by col1
order by min(pos)
And here the demo for MySql.
The GROUP BY in the referenced answer isn't attempting to perform an ordering... it is simply picking a single associated value for the column that we want to be distinct.
As @bluefeet states, if you want a guaranteed ordering, you must use ORDER BY.
Why can't we specify a value in the ORDER BY that isn't included in the SELECT DISTINCT?
Consider the following values for col_1 and col_2:
create table yourTable (
col_1 int,
col_2 int
);
insert into yourTable (col_1, col_2) values (1, 1);
insert into yourTable (col_1, col_2) values (1, 3);
insert into yourTable (col_1, col_2) values (2, 2);
insert into yourTable (col_1, col_2) values (2, 4);
With this data, what should SELECT DISTINCT col_1 FROM yourTable ORDER BY col_2 return?
That's why you need the GROUP BY and the aggregate function, to decide which of the multiple values for col_2 you should order by... could be MIN(), could be MAX(), maybe even some other function such as AVG() would make sense in some cases; it all depends on the specific scenario, which is why you need to be explicit:
select col_1
from yourTable
group by col_1
order by min(col_2)
SQL Fiddle Here
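One way to see the ambiguity is to list the candidate sort keys per group. The sketch below loads the four rows above into an in-memory SQLite database via Python's sqlite3 module (using SQLite rather than MySQL is an assumption of this example) and shows the lowest and highest col_2 for each col_1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (col_1 INTEGER, col_2 INTEGER)")
conn.executemany(
    "INSERT INTO yourTable (col_1, col_2) VALUES (?, ?)",
    [(1, 1), (1, 3), (2, 2), (2, 4)],
)

# Each col_1 has two possible col_2 values it could be sorted by.
candidates = conn.execute(
    "SELECT col_1, MIN(col_2), MAX(col_2) FROM yourTable GROUP BY col_1 ORDER BY col_1"
).fetchall()

# Being explicit (here: MIN) removes the ambiguity.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT col_1 FROM yourTable GROUP BY col_1 ORDER BY MIN(col_2)"
    )
]

print(candidates)  # [(1, 1, 3), (2, 2, 4)]
print(ordered)     # [1, 2]
```

Since col_1 = 1 could sort at 1 or at 3 while col_1 = 2 could sort at 2 or at 4, a bare ORDER BY col_2 has no single correct answer until an aggregate picks one value per group.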
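For MySQL only (with ONLY_FULL_GROUP_BY disabled), when you select columns that are not in the GROUP BY, it returns their values from a single record in the group; in practice this has often been the first record encountered, although the server is documented as free to choose any value from the group, so this is not guaranteed. You can use this behavior to select which record is returned from each group like this: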
SELECT foo_number, bar_number
FROM
(
SELECT foo_number, bar_number
FROM some_table
ORDER BY bar_number
) AS t
GROUP BY foo_number
ORDER BY bar_number DESC;
This is more flexible because it allows you to order the records within each group using expressions that are not possible with aggregates - in my case I wanted to return the one with the shortest string in another column.
For completeness, my query looks like this:
SELECT
s.NamespaceId,
s.Symbol,
s.EntityName
FROM
(
SELECT
m.NamespaceId,
i.Symbol,
i.EntityName
FROM ImportedSymbols i
JOIN ExchangeMappings m ON i.ExchangeMappingId = m.ExchangeMappingId
WHERE
i.Symbol NOT IN
(
SELECT Symbol
FROM tmp_EntityNames
WHERE NamespaceId = m.NamespaceId
)
AND
i.EntityName IS NOT NULL
ORDER BY LENGTH(i.RawSymbol), i.RawSymbol
) AS s
GROUP BY s.NamespaceId, s.Symbol;
What this does is return a distinct list of symbols in each namespace, and for duplicated symbols it returns the one with the shortest RawSymbol. When the RawSymbol lengths are the same, it returns the one whose RawSymbol comes first alphabetically.
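Where window functions are available (MySQL 8.0 and later, or SQLite 3.25 and later as used here), the same "shortest RawSymbol, then alphabetical" pick can be written without relying on which row GROUP BY happens to return. This is only a sketch under assumptions: the single table symbols and its snake_case columns are hypothetical simplifications of the ImportedSymbols / ExchangeMappings join, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, flattened stand-in for the joined tables in the query above.
conn.execute(
    "CREATE TABLE symbols (namespace_id INTEGER, symbol TEXT, raw_symbol TEXT, entity_name TEXT)"
)
conn.executemany(
    "INSERT INTO symbols VALUES (?, ?, ?, ?)",
    [
        (1, "ABC", "ABC.XY", "Long raw"),
        (1, "ABC", "ABC", "Short raw"),
        (1, "DEF", "DEF.B", "Second alphabetically"),
        (1, "DEF", "DEF.A", "First alphabetically"),
    ],
)

# Number the rows inside each (namespace, symbol) group by the desired
# preference, then keep only the first row of every group.
picked = conn.execute(
    """
    SELECT namespace_id, symbol, entity_name
    FROM (
        SELECT namespace_id, symbol, entity_name,
               ROW_NUMBER() OVER (
                   PARTITION BY namespace_id, symbol
                   ORDER BY LENGTH(raw_symbol), raw_symbol
               ) AS rn
        FROM symbols
    ) AS ranked
    WHERE rn = 1
    ORDER BY namespace_id, symbol
    """
).fetchall()

print(picked)  # [(1, 'ABC', 'Short raw'), (1, 'DEF', 'First alphabetically')]
```

The ordering that decides the winner lives inside the OVER clause, so it does not depend on a sort in a derived table surviving into the outer query.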