I am trying to order a query by two keys. The query is built from several subqueries. Besides columns with other data, the table contains two columns, Key and Key_Father. I need to order the results in SQL so that they can be printed in a report. This is an example:
Key Key_Father
4 NULL
1 4
2 4
7 NULL
1 7
2 7
As you can see, it is a father-son structure: a row is a father if Key_Father is NULL, and the Key column starts from one for the sons of each father.
The first subquery returns the data in order, because it is stored in that order in the table, but the second subquery, which uses a GROUP BY, does not. So I tried adding an extra column with ROW_NUMBER in the first subquery to keep that order, but the second subquery still returns the rows out of order.
This is the query:
SELECT Orden,INV_Key,Key_Padre,INV.INV_ID,INV.BOD_Bodega_ID,
CASE WHEN MAX(HIS_Ventas) > 0 OR max(HIS_Disponible) > 0 THEN 1 ELSE 0 END AS Participacion,MAX(ISNULL(HIS_Ventas,0)) AS Ventas
FROM(SELECT ROW_NUMBER() OVER (ORDER BY C.INV_Compra_ID) Orden,C.BOD_Bodega_ID,INV_Key,Key_Padre,CD.INV_ID
FROM dbo.INV_COMPRAS_USADOS C
INNER JOIN dbo.INV_COMPRAS_USADOS_DET CD ON C.INV_Compra_ID = CD.INV_Compra_ID
WHERE C.INV_Compra_ID = #Compra_ID
AND ((Key_Padre IS NULL AND CD.INV_Catalogo_Codigo = ISNULL(#Cod_Catalogo,CD.INV_Catalogo_Codigo)
AND INV_Key IN (SELECT DISTINCT Key_Padre
FROM dbo.INV_COMPRAS_USADOS_DET
WHERE INV_Compra_ID = #Compra_ID AND Key_Padre IS NOT NULL))
OR Key_Padre IN (SELECT DISTINCT INV_Key
FROM dbo.INV_COMPRAS_USADOS_DET
WHERE INV_Compra_ID = #Compra_ID AND (Key_Padre IS NULL AND CD.INV_Catalogo_Codigo = ISNULL(#Cod_Catalogo,CD.INV_Catalogo_Codigo))))) INV
LEFT JOIN DBO.HIS_HISTORICO_DETALLE HD ON INV.INV_ID = HD.INV_ID AND HD.BOD_Bodega_ID = INV.BOD_Bodega_ID
LEFT JOIN DBO.HIS_HISTORICO_INVENTARIO H on H.HIS_Historico_ID= HD.HIS_Historico_ID AND (CONVERT(datetime,(convert(varchar(20),HIS_Historico_Ano) + '/' + convert(varchar(20),HIS_Historico_Mes) + '/01')) BETWEEN #FechaDesde AND #FechaHasta)
WHERE H.HIS_Historico_Mes IS NOT NULL OR INV.INV_ID IS NULL
GROUP BY Orden,INV_Key,Key_Padre,INV.INV_ID,INV.BOD_Bodega_ID,HIS_Historico_Ano,HIS_Historico_Mes
Another interesting thing (well, for me) is that when I replace the #Variables with constant values, the second query keeps the correct order, even though the constant values are the same as the #Variables. This is just a portion of the total query: it is a subquery that feeds another two SELECTs, and I need to keep the order through those SELECTs too.
I hope that someone can help me with this. Thanks!
To order the results you need to place an ORDER BY clause on the outermost SELECT statement. Using ORDER BY in a nested SELECT is generally not permitted but even if you work around it (e.g. by using TOP), you can't rely on the results being ordered in any particular way.
Without an ORDER BY the results may appear to be coming out in the order you want but this cannot be relied upon. Running the same query on a different server or at some point in the future may produce a different order where differences in statistics, server load, etc can affect how the query optimizer actually executes the statement.
The portion of the query you've provided is outputting the following columns. Which are the ones you want to order by?
Orden (although this is just an alias for INV_Compra_ID as far as ordering is concerned)
INV_Key
Key_Padre
INV_ID
BOD_Bodega_ID
Participacion
Ventas
Let's say you want to order by just three of them; then you need to append the following clause to the outermost SELECT:
ORDER BY
Orden,
INV_Key,
Key_Padre
This should do it. I'm not sure if I'm missing an obvious simplification though.
ORDER BY ISNULL(Key_Father,[Key]), ISNULL(Key_Father,-1),[Key]
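To check that this ordering does what is asked, here is a small sketch using Python's built-in sqlite3 module as a stand-in for SQL Server (so IFNULL replaces ISNULL). The table name items and the scrambled insert order are made up for the test, and it assumes keys are never negative, since -1 is used to push each father ahead of its sons:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding only the two columns from the question.
conn.execute('CREATE TABLE items ("Key" INTEGER, Key_Father INTEGER)')
# Insert in scrambled order so the result cannot depend on storage order.
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(2, 7), (1, 4), (7, None), (2, 4), (4, None), (1, 7)],
)

rows = conn.execute(
    '''
    SELECT "Key", Key_Father
    FROM items
    ORDER BY IFNULL(Key_Father, "Key"),  -- family: the father's key
             IFNULL(Key_Father, -1),     -- the father row sorts before its sons
             "Key"                       -- sons in key order
    '''
).fetchall()

for row in rows:
    print(row)
```

The ORDER BY sits on the outermost SELECT, which is the only place where the order of the final result is guaranteed.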
Notes about the database
It was generated using Prisma so unfortunately the column names in the many-to-many tables are named "A" and "B". "A" refers to the table which comes first in the alphabet and "B" the second. For example, in _ReadingToWord, "A" refers to Reading.id and "B" refers to Word.id because "r" comes before "w" in the alphabet.
The problem
I have the below query that uses a limit statement to implement paging.
The problem I am having is that the result order is non-deterministic. (If I execute the query a bunch of times, some of the time the order will be different).
I am ordering by id which is a primary key so I thought that should ensure a consistent order.
Can anyone explain why the ordering is non-deterministic and how to fix it?
select * from (
SELECT w.id,
hiragana,
group_concat( distinct(concat(coalesce(r.downStep, -1) + 1 , "," ,r.katakana)) order by r.downStep SEPARATOR ' ')
from Hiragana a join _HiraganaToWord b on a.id = b.A join
Word w on w.id = b.B join _ReadingToWord rtw on w.id = rtw.B join
Reading r on r.id = rtw.A
WHERE hiragana like "あ%"
group by w.id
)
as groupQuery
order by length(hiragana), hiragana, id asc limit 600,5;
Sample runs
You are experiencing one of the subtle side-effects of disabling only_full_group_by:
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want.
If you enabled that mode, you would get an error like
Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'a.hiragana' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
and searching on stackoverflow for that error message will give you lots and lots of examples for this problem.
So in your query
SELECT w.id, a.hiragana,
...
group by w.id
...
order by hiragana
the values for hiragana are not necessarily deterministic. If, for the same w.id, there are several values for a.hiragana, MySQL can pick any of those. And if you order by that non-deterministically chosen value, you can get different orders. MySQL doesn't actually pick a random row, just doesn't care which one it is, so oftentimes, you get the same (which can make this harder to spot), but not always.
It doesn't have to be the entry with id 31752 for which MySQL has picked a different value for hiragana (it can be any of the previous 600 rows), but I would check that value first - if it has a 2nd value that also starts with "あ" but would be ordered after the value for 47348 (or is longer), it might immediately make things clearer.
You can technically fix this by picking a deterministic value there, e.g. the min or max value:
select * from (
SELECT w.id,
min(hiragana) as hiragana,
...
group by w.id
) as groupQuery
order by length(hiragana), hiragana, id asc limit 600,5;
You have to check if that is what you are actually trying to do (e.g., if there are several choices for hiragana, you don't care which one is chosen, as long as it is a deterministic one) and if this fits your required result. Other choices might be group by w.id, a.hiragana or group by w.id, a.id, or maybe you need to completely rewrite your query (as it may not cover this case).
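As a rough illustration of the min() fix, here is a sketch using Python's sqlite3 module in place of MySQL. The single table word_hiragana and its rows are invented for the example (the real schema joins several tables). The same data is loaded in two different orders, and the aggregated query returns the same result both times:

```python
import sqlite3

def grouped(rows):
    """Load rows into a fresh in-memory table and run the grouped query."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical flattened table: one row per (word, hiragana) pair.
    conn.execute("CREATE TABLE word_hiragana (word_id INTEGER, hiragana TEXT)")
    conn.executemany("INSERT INTO word_hiragana VALUES (?, ?)", rows)
    return conn.execute(
        """
        SELECT * FROM (
            SELECT word_id, MIN(hiragana) AS hiragana
            FROM word_hiragana
            GROUP BY word_id
        ) AS groupQuery
        ORDER BY LENGTH(hiragana), hiragana, word_id
        """
    ).fetchall()

# Word 1 has two candidate hiragana values; MIN() always picks the same one.
data = [(1, "あさ"), (1, "あした"), (2, "あい"), (3, "あめ")]
first = grouped(data)
second = grouped(list(reversed(data)))
print(first)
print(first == second)
```

Because every selected column is either in the GROUP BY or inside an aggregate, the value that the outer ORDER BY sorts on no longer depends on which row the server happens to read first.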
My database is called: (training_session)
I am trying to print some information from my data, but I do not want any duplicates, and I still get them. Can someone tell me what I am doing wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if I use only one column, but when I add another, it does not work.
I think you misunderstood the use of DISTINCT.
There is big difference between using DISTINCT and GROUP BY.
Both have a similar goal, but they serve different purposes.
You use DISTINCT if you want to show a set of columns without repeating any combination of their values. That means you don't care about calculations or aggregate (group) functions. DISTINCT will show different results as you keep adding more columns to your SELECT (if the table has many columns).
You use GROUP BY if you want one row per distinct value of certain selected columns and you use group functions to calculate data related to each of them. In short, you use GROUP BY when you want to use group functions.
Please check group functions you can use in this link.
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
EDIT 1:
It seems like you are trying to get the "latest" of a certain athlete, I'll assume the current scenario if there is no ID.
Here is my alternate solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This is a correlated scalar subquery placed in the SELECT list. With the help of LIMIT 1, it returns only the topmost record. Such a subquery must return at most one row, or else it raises an error.
MySQL's DISTINCT clause is used to filter out duplicate rows.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration), each distinct combination of athlete_id and duration becomes its own row, hence the results you're getting. In other words, the query is working correctly.
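To make the difference concrete, here is a small sketch run against SQLite through Python's sqlite3 module rather than MySQL. The four sample rows are invented; only the table and column names come from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER)")
# Made-up data: athlete 1 has one exact duplicate row and one different duration.
conn.executemany(
    "INSERT INTO training_session VALUES (?, ?)",
    [(1, 30), (1, 30), (1, 45), (2, 60)],
)

# DISTINCT over one column: one row per athlete.
one_column = conn.execute(
    "SELECT DISTINCT athlete_id FROM training_session ORDER BY athlete_id"
).fetchall()

# DISTINCT over two columns: one row per (athlete_id, duration) combination.
two_columns = conn.execute(
    "SELECT DISTINCT athlete_id, duration FROM training_session "
    "ORDER BY athlete_id, duration"
).fetchall()

print(one_column)
print(two_columns)
```

Only the exact duplicate (1, 30) is collapsed in the second query; athlete 1 still appears twice because the two durations differ.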
I've been trying to learn MySQL, and I'm having some trouble creating a join query to not select duplicates.
Basically, here's where I'm at :
SELECT atable.phonenumber, btable.date
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
However, in my database, there is the possibility of having duplicate rows in column atable.phonenumber.
For example (added asterisks for clarity)
phonenumber | date
-------------|-----------
*555-681-2105 | 2015-08-12
555-425-5161 | 2015-08-15
331-484-7784 | 2015-08-17
*555-681-2105 | 2015-08-25
.. and so on.
I tried using SELECT DISTINCT but that doesn't work. I also was looking through other solutions which recommended GROUP BY, but that threw an error, most likely because of my WHERE clause and condition. Not really sure how I can easily accomplish this.
DISTINCT applies to the whole row being returned, essentially saying "I want only unique rows". Any column value may participate in making the row unique.
You are getting phone numbers duplicated because you're only looking at that column in isolation. The database is looking at phone number and also date. The rows you posted have different dates, and these make the rows different.
I suggest you do as the commenter recommended and decide what you want to do with the dates. If you want the latest date for a phone number, do this:
SELECT atable.phonenumber, max(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
When you write a query that uses grouping, you will get a set of rows where there is only one set of value combinations for anything that is in the group by list. In this case, only unique phone numbers. But, because you want other values as well (I.e. Date) you MUST use what's called an aggregate function, to specify what you want to do with all the various values that aren't part of the unique set. Sometimes it will be MAX or MIN, sometimes it will be SUM, COUNT, AVG and so on.
If you're familiar with hash tables or dictionaries from elsewhere in programming, this is what a group by is: it maps a set of values (a key) to a list of rows that have those key values, and then the aggregate function is applied to the values in the list associated with each key.
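The dictionary analogy can be sketched in plain Python. This is only a mental model of what GROUP BY phonenumber with MAX(date) computes, not how a database engine actually implements it; the rows are the sample values from the question:

```python
from collections import defaultdict

# (phonenumber, date) rows as they come out of the join.
rows = [
    ("555-681-2105", "2015-08-12"),
    ("555-425-5161", "2015-08-15"),
    ("331-484-7784", "2015-08-17"),
    ("555-681-2105", "2015-08-25"),
]

# GROUP BY phonenumber: map each key to the list of values that share it.
groups = defaultdict(list)
for phonenumber, date in rows:
    groups[phonenumber].append(date)

# MAX(date): apply the aggregate to each key's list.
# ISO-formatted date strings compare correctly as plain strings.
latest = {phonenumber: max(dates) for phonenumber, dates in groups.items()}

for phonenumber, date in latest.items():
    print(phonenumber, date)
```

Each phone number now appears once, and the aggregate decides which of its dates survives.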
The simple rule when using group by is to write queries thus:
SELECT
List,
of,
columns,
you,
want,
in,
unique,
combination,
FN(List),
FN(of),
FN(columns),
FN(you),
FN(want),
FN(aggregating)
FROM table
GROUP BY
List,
of,
columns,
you,
want,
in,
unique,
combination
I.e. you can copy and paste from your select list to your group list. Note that MySQL does not add the grouping for you: if you use one or more aggregate functions like max in your select list but omit the group by clause, all rows are treated as a single group, and a nonaggregated column next to the aggregate is either rejected (with ONLY_FULL_GROUP_BY enabled) or takes an arbitrary value from that group. Whether group by could simply be inferred from the select list is often debated, but there do exist other things you can do with a group by, such as rollup, cube and grouping sets. Also you can group on a column, if that column is used in a deterministic function, without having to group on the result of the deterministic function. Whether there is any point to doing so is a debate for another time :)
You should add GROUP BY, and an aggregate to the date field, something like this:
SELECT atable.phonenumber, MAX(btable.date)
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber
This will return the maximum date, that is, the latest date for each phone number.
I have this query
SELECT `PR_CODIGO`, `PR_EXIBIR`, `PR_NOME`, `PRC_DETALHES` FROM `PROPRIETARIOS` LEFT JOIN `PROPRIETARIOSCONTATOS` ON `PROPRIETARIOSCONTATOS`.`PRC_COD_CAD` = `PROPRIETARIOS`.`PR_CODIGO` WHERE `PR_EXIBIR` = 'T' LIMIT 20
It runs very fast, less than 1 second.
If I add GROUP BY, it takes several seconds (5+) to run, even though the GROUP BY field is indexed.
I'm using GROUP BY because the query above returns repeated rows (I search for a name and its contacts in another table, and it shows the same name 4 times).
How do I fix this?
With the GROUP BY clause, the LIMIT clause isn't applied until after the rows are collapsed by the group by operation.
To get an understanding of the operations that MySQL is performing and which indexes are being considered and chosen by the optimizer, we use EXPLAIN.
Unstated in the question is what "field" (columns or expressions) are in the GROUP BY clause. So we are only guessing.
Based on the query shown in the question...
SELECT pr.pr_codigo
, pr.pr_exibir
, pr.pr_nome
, prc.prc_detalhes
FROM `PROPRIETARIOS` pr
LEFT
JOIN `PROPRIETARIOSCONTATOS` prc
ON prc.prc_cod_cad = pr.pr_codigo
WHERE pr.pr_exibir = 'T'
LIMIT 20
Our guess at the most appropriate indexes...
... ON PROPRIETARIOSCONTATOS (prc_cod_cad, prc_detalhes)
... ON PROPRIETARIOS (pr_exibir, pr_codigo, pr_nome)
Our guess is going to change depending on what column(s) are listed in the GROUP BY clause. And we might also suggest an alternative query to return an equivalent result.
But without knowing the GROUP BY clause, without knowing if our guesses about which table each column is from are correct, without knowing the column datatypes, without any estimates of cardinality, and without example data and expected output, ... we're flying blind and just making guesses.
I have a table filled with tasting notes written by users, and another table that holds ratings that other users give to each tasting note.
The query that brings up all notes that are written by other users that you have not yet rated looks like this:
SELECT tastingNotes.userID, tastingNotes.beerID, tastingNotes.noteID, tastingNotes.note, COALESCE(sum(tasteNoteRate.Score), 0) as count,
CASE
WHEN tasteNoteRate.userVoting = 1162 THEN 1
ELSE 0
END AS userScored
FROM tastingNotes
left join tasteNoteRate on tastingNotes.noteID = tasteNoteRate.noteID
WHERE tastingNotes.userID != 1162
Group BY tastingNotes.noteID
HAVING userScored < 1
ORDER BY count, userScored
User 1162 has submitted a rating for note 113. In the tasteNoteRate table it shows up as:
noteID | userVoting | score
113 | 1162 | 0
but it is still returned each time the above query is run....
MySQL allows you to use group by in a rather special way without complaining, see the documentation:
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. [...] In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate, which is probably not what you want.
This behaviour was the default behaviour prior to MySQL 5.7.
In your case that means: if there is more than one row in tasteNoteRate for a specific noteID (that is, if anyone else has already voted for that note), then userScored, which uses tasteNoteRate.userVoting without an aggregate function, will be based on an arbitrary row from the group, likely the wrong one.
You can fix that by using an aggregate:
select ...,
max(CASE
WHEN tasteNoteRate.userVoting = 1162 THEN 1
ELSE 0
END) AS userScored
from ...
or, because the result of a comparison (to something other than null) is either 1 or 0, you can also use a shorter version:
select ...,
coalesce(max(tasteNoteRate.userVoting = 1162),0) AS userScored
from ...
To be prepared for an upgrade to MySQL 5.7 (and enabled ONLY_FULL_GROUP_BY), you should also already group by all non-aggregate columns in your select-list: group by tastingNotes.userID, tastingNotes.beerID, tastingNotes.noteID, tastingNotes.note.
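Here is a small sketch of the max(CASE ...) version, run against SQLite via Python's sqlite3 module instead of MySQL. The sample notes and ratings are invented, the table is trimmed to the columns that matter, and the sum is aliased as total rather than count just for readability:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tastingNotes (noteID INTEGER, userID INTEGER, note TEXT);
CREATE TABLE tasteNoteRate (noteID INTEGER, userVoting INTEGER, Score INTEGER);
-- Made-up data: note 113 was rated by user 55 and by user 1162,
-- note 114 only by user 55, note 115 by nobody.
INSERT INTO tastingNotes VALUES (113, 7, 'hoppy'), (114, 8, 'malty'), (115, 9, 'sour');
INSERT INTO tasteNoteRate VALUES (113, 55, 1), (113, 1162, 0), (114, 55, 1);
""")

rows = conn.execute("""
SELECT n.noteID,
       COALESCE(SUM(r.Score), 0) AS total,
       -- 1 if ANY rating row in the group belongs to user 1162, else 0
       MAX(CASE WHEN r.userVoting = 1162 THEN 1 ELSE 0 END) AS userScored
FROM tastingNotes n
LEFT JOIN tasteNoteRate r ON n.noteID = r.noteID
WHERE n.userID != 1162
GROUP BY n.noteID
HAVING userScored < 1
ORDER BY n.noteID
""").fetchall()

for row in rows:
    print(row)
```

Note 113 is filtered out even though another user's rating row exists in the same group, because the aggregate looks at every row of the group instead of an arbitrary one.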
A different way of writing your query (amongst others) would be to do the grouping of tasteNoteRate in a subquery, so you don't have to group by all the columns of tastingNotes. The userScored test goes in the where-clause, because in the on-clause of a left join it would not remove the notes the user has already voted on (they would come back with empty rating columns):
select tastingNotes.*,
coalesce(rates.count, 0) as count,
coalesce(rates.userScored,0) as userScored
from tastingNotes
left join (
select tasteNoteRate.noteID,
sum(tasteNoteRate.Score) as count,
max(tasteNoteRate.userVoting = 1162) as userScored
from tasteNoteRate
group by tasteNoteRate.noteID
) rates
on tastingNotes.noteID = rates.noteID
where tastingNotes.userID != 1162
and coalesce(rates.userScored,0) = 0
order by count;
This also allows you to get the notes the user voted on by changing = 0 in the where-clause to = 1 (or remove that condition to get both).
Change to an inner join.
The tasteNoteRate table is being left joined to the tastingNotes, which means that the full tastingNotes table (matching the where) is returned, and then expanded by the matching fields in the tasteNoteRate table. If tasteNoteRate is not satisfied, it doesn't prevent tastingNotes from returning the matched fields. The inner join will take the intersection.
See here for more explanation of the types of joins:
What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
Make sure to create an index on noteID in both tables or this query and use case will quickly explode.
Note: Based on what you've written as the use case, I'm still not 100% certain that you want to join on noteID. As it is, it will try to give you a joined table on all the notes joined with all the ratings for all users ever. I think the CASE...END is just going to interfere with the query optimizer and turn it into a full scan + join. Why not just add another clause to the where..."and tasteNoteRate.userVoting = 1162"?
If these tables are not 1-1, as it looks like (given the sum() and "group by"), then you will be faced with an exploding problem with the current query. If every note can have 10 different ratings, and there are 10 notes, then there are 100 candidate result rows. If it grows to 1000 and 1000, you will run out of memory fast. Eliminating a few rows that the userID hasn't voted on will remove like what 10 rows from eventually 1,000,000+, and then sum and group them?
The other way you can do it is to reverse the left join:
select ...,sum()... from tasteNoteRate ... left join tastingNotes using (noteID) where userID != xxx group by noteID, that way you only get tastingNotes information for other users' notes.
Maybe that helps, maybe not, but yeah, SCHEMA and specific use cases/example data would be helpful.
With this kind of "ratings of ratings", sometimes it's better to maintain a summary table of the vote totals and just track which notes the user has already voted on, i.e. don't sum them all up in the select query. Instead, update the total in the insert...on duplicate key update (total = total + 1). At least that's how I handle the problem in some user ranking tables. They just grow so big so fast.
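A minimal sketch of that summary-table idea, using SQLite through Python's sqlite3 module. SQLite spells the upsert as INSERT ... ON CONFLICT ... DO UPDATE (available from SQLite 3.24) where MySQL uses INSERT ... ON DUPLICATE KEY UPDATE. The table note_totals and the helper record_vote are hypothetical names for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical summary table: one row per note with its running total.
conn.execute(
    "CREATE TABLE note_totals (noteID INTEGER PRIMARY KEY, total INTEGER NOT NULL)"
)

def record_vote(note_id, score):
    """Add a vote to the running total, creating the row on first vote."""
    conn.execute(
        """
        INSERT INTO note_totals (noteID, total) VALUES (?, ?)
        ON CONFLICT(noteID) DO UPDATE SET total = total + excluded.total
        """,
        (note_id, score),
    )

record_vote(113, 1)
record_vote(113, 1)
record_vote(114, 1)

totals = conn.execute(
    "SELECT noteID, total FROM note_totals ORDER BY noteID"
).fetchall()
print(totals)
```

Reading a total is then a primary-key lookup instead of a join and sum over every rating row.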