I have table data like this:
Column_a | Column_b
a | 5
b | 25
g | 14
t | 13
b | 15
c | 04
g | 15
b | 13
In column_a I have a lot of duplicate values. I want to select all the rows from the table, but if two rows have the same column_a value, I want only the row with the biggest column_b value in the results.
Example of the result that I am looking for:
Column_a | Column_b
a | 5
b | 25
t | 13
c | 04
g | 15
Thank you in advance
**Update of the question**
These are the columns I have in my table:
CRMID | user | ticket_id | | description | date | hour
What I am trying to do is select all the rows from the table, but when two rows have the same ticket_id, I want only the newest one to appear in the results, that is, the row with the latest date and hour.
Sorry for making this so complicated!
I am not a native English speaker and I find it hard to explain the problem well.
If you group by column_a, you can use aggregate functions such as max() on the other columns to get the maximum value within each group:
select column_a,
max(column_b) as column_b
from your_table
group by column_a
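As a quick check of the grouped query above, here is a minimal sketch that runs it against the sample rows from the question. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (this particular query is plain standard SQL), and the table name your_table simply follows the answer.

```python
import sqlite3

# Sample rows copied from the question.
rows = [("a", 5), ("b", 25), ("g", 14), ("t", 13),
        ("b", 15), ("c", 4), ("g", 15), ("b", 13)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (column_a TEXT, column_b INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", rows)

# One output row per column_a, carrying the largest column_b of that group.
result = conn.execute(
    """
    SELECT column_a, MAX(column_b) AS column_b
    FROM your_table
    GROUP BY column_a
    ORDER BY column_a
    """
).fetchall()

print(result)
```

The ORDER BY is only there to make the printed order predictable; it does not change which values are returned.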
SELECT Column_A, MAX(Column_B) FROM your_table
GROUP BY Column_A
You're looking for a Group By clause. Your syntax should look similar to this:
SELECT Column_A, MAX(Column_B)
FROM your_table
GROUP BY Column_A
If you want to get all the columns in the table, then you have a different problem (and one not in the original posting). One reason you should add code into such a question is so you get a broader range of answers. I, for one, ignored the question, thinking it was just a newbie asking about obvious SQL functionality.
In MySQL the best approach is to use not exists:
select t.*
from your_table t
where not exists (select 1
from your_table t2
where t2.column_a = t.column_a and
t2.column_b > t.column_b
);
For optimal performance, you want an index on (column_a, column_b). Also, this can return multiple rows per group if the maximum value is duplicated.
This query is not intuitive. What it says is: "Get me all rows from the table where there is no other row with the same column_a value and a higher column_b value". If you think about it, this is the same as getting the rows with the maximum value. It often performs better than other methods (notably aggregation and join), because MySQL can do a simple index lookup for each row in the table.
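To make the "no other row is bigger" reading concrete, here is a small sketch of the NOT EXISTS pattern. It assumes Python's sqlite3 module instead of MySQL, and the table name tickets, the id column and the index name are hypothetical. The data deliberately contains a tie for group g, so the result shows both tied rows coming back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tickets (id INTEGER PRIMARY KEY, column_a TEXT, column_b INTEGER)"
)
conn.executemany(
    "INSERT INTO tickets (column_a, column_b) VALUES (?, ?)",
    [("a", 5), ("b", 25), ("g", 14), ("b", 15), ("g", 15), ("g", 15)],
)
# Composite index of the kind the answer recommends (hypothetical name).
conn.execute("CREATE INDEX idx_tickets_a_b ON tickets (column_a, column_b)")

# Keep a row only if no row in the same column_a group has a larger column_b.
winners = conn.execute(
    """
    SELECT t.id, t.column_a, t.column_b
    FROM tickets AS t
    WHERE NOT EXISTS (
        SELECT 1
        FROM tickets AS t2
        WHERE t2.column_a = t.column_a
          AND t2.column_b > t.column_b
    )
    ORDER BY t.id
    """
).fetchall()

print(winners)
```

Unlike the GROUP BY version, this returns whole rows (here including id), which is what the updated question asks for.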
select * from (select * from yourtable order by column_b desc)t group by column_a
Related
From the table below as an example, I need to select all fields from a table where the first 3 columns are exactly the same, taking the first time each such combination appears. For example, rows 1, 3 and 4 should be selected, as they have differing values in the first 3 columns. I have been given this data, and there is no unique ID. There are about 25,000 records, so handling this in Python after I have SELECTed the data seems silly; the only methods I can think of are deleting the records that are nearly identical, or using a SELECT statement I have not worked out yet. Would it be better to select the data in small amounts and use Python to pick out the correct bits? While this is messier, I know how to do it this way.
ID | Class | Season | Grade
---|-------|--------|---------
1 | x | 1 | A
1 | x | 1 | A*
1 | y | 1 | A
1 | x | 2 | C
Try using DISTINCT *. It means "select all columns and skip any rows where the values in all columns match some already included row".
So with LIMIT 3 you will have the first 3 unique rows:
SELECT distinct * FROM yourTable LIMIT 3;
You want the first three unique rows. You can actually do this pretty easily if you have an ordering column:
select t.*
from (select t.*,
row_number() over (partition by id, class, season order by <orderingcol>) as seqnum
from t
) t
where seqnum = 1
order by <orderingcol>
limit 3;
Actually, the subquery is not necessary, but the query is a bit more inscrutable without it:
select t.*
from t
order by row_number() over (partition by id, class, season order by <orderingcol>),
<orderingcol>
limit 3;
The one caveat is that this will return duplicates if there are not three unique ones.
Window functions were introduced in MySQL 8.0. In earlier versions of MySQL, this could be phrased as:
select t.*
from t join
(select id, class, season, min(<orderingcol>) as min_oc
from t
group by id, class, season
) tt
using (id, class, season)
where t.<orderingcol> = tt.min_oc
order by tt.min_oc;
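Here is a rough sketch of both variants on the question's four rows, assuming Python's sqlite3 module rather than MySQL. The table name grades and the ordering column seq are hypothetical (the question has no such column, so one has to be added for "first" to mean anything). The window-function query is only run when the bundled SQLite is 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE grades (seq INTEGER, id INTEGER, class TEXT, season INTEGER, grade TEXT)"
)
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "x", 1, "A"), (2, 1, "x", 1, "A*"), (3, 1, "y", 1, "A"), (4, 1, "x", 2, "C")],
)

# Variant without window functions: join each row to the minimum seq of its group.
first_rows_join = conn.execute(
    """
    SELECT g.id, g.class, g.season, g.grade
    FROM grades AS g
    JOIN (SELECT id, class, season, MIN(seq) AS min_seq
          FROM grades
          GROUP BY id, class, season) AS gg
      ON g.id = gg.id AND g.class = gg.class AND g.season = gg.season
    WHERE g.seq = gg.min_seq
    ORDER BY gg.min_seq
    """
).fetchall()

# Window-function variant; skipped on SQLite builds older than 3.25.
first_rows_window = None
if sqlite3.sqlite_version_info >= (3, 25, 0):
    first_rows_window = conn.execute(
        """
        SELECT id, class, season, grade
        FROM (SELECT g.*,
                     ROW_NUMBER() OVER (PARTITION BY id, class, season
                                        ORDER BY seq) AS seqnum
              FROM grades AS g) AS ranked
        WHERE seqnum = 1
        ORDER BY seq
        """
    ).fetchall()

print(first_rows_join)
print(first_rows_window)
```

Both queries keep rows 1, 3 and 4 of the sample, which is what the question asks for.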
I have a MySQL table with entries from my driver's logbook. The table has two columns: start_place and end_place. Sometimes end_place is equal to start_place (I think that sounds logical).
Now I want to select the entries of the table which occur as a tuple (x,y), but not as (y,x).
Example:
id | start_place | end_place
-----------------------------------
0 | New York | San Francisco
-----------------------------------
1 | San Francisco | New York
The row with the id 1 is a duplicate of id 0 in reversed order and should not be part of the result.
Does anyone have an idea? I tried several times with subselects or WHERE conditions like (x,y) != (y,x), but that doesn't work.
This can be done with least and greatest functions with a group by.
select least(start_place,end_place), greatest(start_place,end_place)
from tbl
group by least(start_place,end_place), greatest(start_place,end_place)
having count(*) = 1
To retrieve such rows with other columns, use
select *
from tbl
where (least(start_place,end_place), greatest(start_place,end_place))
in (select least(start_place,end_place), greatest(start_place,end_place)
from tbl
group by least(start_place,end_place), greatest(start_place,end_place)
having count(*) = 1
)
Use LEAST, GREATEST and DISTINCT to get distinct pairs:
select distinct
least(start_place, end_place) as place1,
greatest(start_place, end_place) as place2
from mytable;
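The two answers above behave differently, and a small experiment shows how. This sketch assumes Python's sqlite3 module; SQLite has no LEAST or GREATEST, so they are registered here as user-defined functions (an assumption of this example, not something MySQL needs). The third row (Boston, Chicago) is invented to have one trip without a reverse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-ins for MySQL's LEAST/GREATEST; they do not handle NULL values.
conn.create_function("least", 2, lambda a, b: min(a, b))
conn.create_function("greatest", 2, lambda a, b: max(a, b))

conn.execute("CREATE TABLE tbl (id INTEGER, start_place TEXT, end_place TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [(0, "New York", "San Francisco"),
     (1, "San Francisco", "New York"),
     (2, "Boston", "Chicago")],
)

# DISTINCT keeps one normalized pair per route, whether or not a reverse exists.
distinct_pairs = conn.execute(
    """
    SELECT DISTINCT least(start_place, end_place) AS place1,
                    greatest(start_place, end_place) AS place2
    FROM tbl
    ORDER BY place1
    """
).fetchall()

# HAVING COUNT(*) = 1 keeps only routes that never appear reversed or repeated.
unpaired = conn.execute(
    """
    SELECT least(start_place, end_place) AS place1,
           greatest(start_place, end_place) AS place2
    FROM tbl
    GROUP BY least(start_place, end_place), greatest(start_place, end_place)
    HAVING COUNT(*) = 1
    """
).fetchall()

print(distinct_pairs)
print(unpaired)
```

So if the goal is to keep one row of each reversed pair, the DISTINCT form fits; if the goal is to drop every route that has a reverse, the HAVING form fits.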
I am learning about databases at college, and have an assignment on finding the minimum average exam grade for a college course. I have made two solutions, but I hope you experts here can help me with:
What is the best/most effective solution?
Solution 1:
SELECT courses.name , MIN(avg_grade)
FROM (SELECT courseCode, AVG(grade) as avg_grade
FROM exams
GROUP BY courseCode) avg_grades, courses
WHERE courses.code = avg_grades.courseCode
Solution 2:
SELECT name, min(avg_grade)
FROM (SELECT courses.name, AVG(grade) as avg_grade
FROM courses
LEFT JOIN exams on exams.courseCode = courses.code
GROUP BY courseCode) mytable
I have also been wondering whether JOIN or LEFT JOIN is the correct one to use here.
Your two queries are different, so you can't really compare efficiency: your second query will return records for courses with no exam results.
Assuming that you switch the LEFT JOIN to an INNER JOIN to make the queries comparable, I would expect the first query to be slightly more efficient, since it only has one derived table and the second has two:
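Solution 1:
ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | FILTERED | EXTRA
---|-------------|-------|------|---------------|-----|---------|-----|------|----------|------
1 | PRIMARY | <derived2> | ALL | | | | | 5 | 100 |
1 | PRIMARY | courses | ALL | | | | | 5 | 100 | Using where; Using join buffer
2 | DERIVED | exams | ALL | | | | | 5 | 100 | Using temporary; Using filesort
Solution 2:
ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | FILTERED | EXTRA
---|-------------|-------|------|---------------|-----|---------|-----|------|----------|------
1 | PRIMARY | <derived2> | ALL | | | | | 5 | 100 |
2 | DERIVED | courses | ALL | | | | | 5 | 100 | Using temporary; Using filesort
2 | DERIVED | exams | ALL | | | | | 5 | 100 | Using where; Using join buffer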
I would however check this against your own execution plans as mine was just a quick example on SQL Fiddle.
I would like to take this chance to advise against using the ANSI-89 implicit join syntax; it was replaced over 20 years ago by the explicit join syntax in the ANSI-92 standard. Aaron Bertrand has written a great article on why to switch, so I won't duplicate it here.
Another, much more important point though is that your queries are not deterministic, that is to say you could run the same query twice and get 2 different results even with no underlying change in the data.
Taking your second query as an example (although you will notice both queries are wrong on the SQL-Fiddle), you have a subquery MyTable like so:
SELECT courses.name, AVG(grade) as avg_grade
FROM courses
LEFT JOIN exams on exams.courseCode = courses.code
GROUP BY courseCode
This returned a table like so:
Name | avg_grade
--------+--------------
A | 10
B | 5
C | 6
D | 7
E | 2
You may expect the query as a whole to return:
Name | avg_grade
--------+--------------
E | 2
Since 2 is the lowest average grade, and E is the name that corresponds with that. You would be wrong though: as demonstrated here, this actually returns:
Name | avg_grade
--------+--------------
A | 2
What is essentially happening is that MySQL calculates the minimum avg_grade correctly, but since you have not added any columns to the GROUP BY, you have given MySQL carte blanche to choose any value it likes for Name.
To get the output you want, I think you need:
SELECT courses.name , MIN(avg_grade)
FROM ( SELECT courseCode, AVG(grade) as avg_grade
FROM exams
GROUP BY courseCode
) avg_grades
INNER JOIN courses
ON courses.code = avg_grades.courseCode
GROUP BY courses.Name;
Or if you only want the course with the lowest average grade, use:
SELECT courseCode, AVG(grade) as avg_grade
FROM exams
GROUP BY courseCode
ORDER BY avg_grade
LIMIT 1;
Examples on SQL Fiddle
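For readers without access to the fiddle, here is a minimal sketch of the "lowest average" query. It assumes Python's sqlite3 module and made-up grades; the join to courses is a small variation on the query above, added only so that the name is returned deterministically alongside the average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE courses (code TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE exams (courseCode TEXT, grade INTEGER)")
conn.executemany("INSERT INTO courses VALUES (?, ?)",
                 [("c1", "A"), ("c2", "B"), ("c5", "E")])
# Invented grades: course A averages 10, B averages 5, E averages 2.
conn.executemany("INSERT INTO exams VALUES (?, ?)",
                 [("c1", 10), ("c1", 10), ("c2", 4), ("c2", 6), ("c5", 1), ("c5", 3)])

# Average per course, then keep only the course with the smallest average.
lowest = conn.execute(
    """
    SELECT c.name, AVG(e.grade) AS avg_grade
    FROM exams AS e
    INNER JOIN courses AS c ON c.code = e.courseCode
    GROUP BY c.code, c.name
    ORDER BY avg_grade
    LIMIT 1
    """
).fetchall()

print(lowest)
```

Because name is in the GROUP BY, the name and the average always come from the same course. If two courses tie for the lowest average, LIMIT 1 still returns just one of them.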
Please excuse the laziness of what I am about to do, but I have explained this problem a lot before, and now have a standard response that I post to explain the issue of MySQL grouping. It goes into more detail than the above, and hopefully explains it further.
MySQL Implicit Grouping
I would advise avoiding the implicit grouping offered by MySQL where possible. By this I mean including columns in the select list even though they are not contained in an aggregate function or the GROUP BY clause.
Imagine the following simple table (T):
ID | Column1 | Column2 |
----|---------+----------|
1 | A | X |
2 | A | Y |
In MySQL you can write
SELECT ID, Column1, Column2
FROM T
GROUP BY Column1;
This actually breaks the SQL standard, but it works in MySQL. The trouble is that it is non-deterministic; the result:
ID | Column1 | Column2 |
----|---------+----------|
1 | A | X |
Is no more or less correct than
ID | Column1 | Column2 |
----|---------+----------|
2 | A | Y |
So what you are saying is "give me one row for each distinct value of Column1", which both result sets satisfy, so how do you know which one you will get? Well, you don't. It seems to be a fairly popular misconception that you can add an ORDER BY clause to influence the results, so that, for example, the following query:
SELECT ID, Column1, Column2
FROM T
GROUP BY Column1
ORDER BY ID DESC;
Would ensure that you get the following result:
ID | Column1 | Column2 |
----|---------+----------|
2 | A | Y |
because of the ORDER BY ID DESC. However, this is not true (as demonstrated here).
The MySQL documents state:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause.
So even though you have an ORDER BY, it does not apply until after one row per group has been selected, and this one row is non-deterministic.
The SQL standard does allow columns in the select list that are not contained in the GROUP BY or an aggregate function; however, these columns must be functionally dependent on a column in the GROUP BY. For example, ID in the sample table is the PRIMARY KEY, so we know it is unique in the table. The following query therefore conforms to the SQL standard and would run in MySQL, yet fails in many DBMSs currently (at the time of writing, PostgreSQL is the closest DBMS I know of to correctly implementing the standard):
SELECT ID, Column1, Column2
FROM T
GROUP BY ID;
Since ID is unique for each row, there can only be one value of Column1 and one value of Column2 for each ID, so there is no ambiguity about what to return for each row.
EDIT
From the SQL-2003-Standard (5WD-02-Foundation-2003-09 - page 346) - http://www.wiscorp.com/sql_2003_standard.zip
If T is a grouped table, then let G be the set of grouping columns of T. In each <value expression> contained in <select list>, each column reference that references a column of T shall reference some column C that is functionally dependent on G or shall be contained in an aggregated argument of a <set function specification> whose aggregation query is QS.
Let's say I have a table making a junction between two tables... like this:
id_product | id_category
-----------------
11 | 22
11 | 33
12 | 22
12 | 33
I want to get the (distinct) id_products that match a list of searched category IDs.
If I use the IN() clause, the list of id_categories is combined with a logical OR.
How can I write a SELECT query that applies a logical AND across the submitted list of category IDs?
Example: I want all the id_products belonging to category 22 AND 33 (and possibly 5+ more category IDs)
I found this answer:
Using MySQL IN clause as all inclusive (AND instead of OR)
...but the query is mixing more than 1 table... I only want a query on a single table, the junction one.
Reading your link, I think it would be something like
select id_product
from yourTable
where id_category in (/* your list goes here */)
group by id_product
having count(distinct id_category) = NumberOfElementsOfYourList
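Note that the where clause already restricts the rows to the categories in your list, so the count can never exceed the number of elements in the list; = and >= therefore give the same result here.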
select id_product
from your_table
where id_category in (22, 33)
group by id_product
having count(distinct id_category) = 2
You can add a having clause that counts the id_category values found. If you are looking for 5 IDs, for instance, you have to change the 2 to 5 in the having clause.
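To avoid editing the count by hand, the list size can be passed in as a parameter. The following is a minimal sketch assuming Python's sqlite3 module; the helper name products_in_all, the table name product_category and the extra row for product 13 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_category (id_product INTEGER, id_category INTEGER)")
conn.executemany(
    "INSERT INTO product_category VALUES (?, ?)",
    [(11, 22), (11, 33), (12, 22), (12, 33), (13, 22)],
)

def products_in_all(conn, category_ids):
    """Return the products that belong to every category in category_ids."""
    wanted = sorted(set(category_ids))
    if not wanted:
        return []
    # One "?" placeholder per category, plus one for the required count.
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT id_product
        FROM product_category
        WHERE id_category IN ({placeholders})
        GROUP BY id_product
        HAVING COUNT(DISTINCT id_category) = ?
        ORDER BY id_product
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

print(products_in_all(conn, [22, 33]))
print(products_in_all(conn, [22]))
```

Product 13 only has category 22, so it shows up for the single-category search but not when both 22 and 33 are required.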
Sorry for the ambiguous title, but I don't know how to describe it differently.
I have the following table:
imei | date | time | some more fields
345 | 2012-06-28 | 07:18 | .....
345 | 2012-06-28 | 07:20 | .....
345 | 2012-06-28 | 07:21 | .....
987 | 2012-06-28 | 07:19 | .....
etc etc
I want to get the latest row of every distinct imei, so:
345 | 2012-06-28 | 07:21
987 | 2012-06-28 | 07:19
Using SELECT * FROM t GROUP BY imei results in the first line being used instead of the last one.
Ordering by time obviously orders the result relation instead of the rows within each group.
HAVING is only for stating a condition...
How can I write a query to do this?
As stated in the manual:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values the server chooses.
To obtain the groupwise maximum (as you want), you need to join the result of a grouped subquery with your table:
SELECT * FROM t JOIN (
SELECT imei, MAX(ADDTIME(date, time)) AS dt FROM t GROUP BY imei
) AS u ON t.imei = u.imei AND t.date = DATE(u.dt) AND t.time = TIME(u.dt)
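Here is a rough sketch of the same groupwise-maximum join on the question's sample rows, assuming Python's sqlite3 module. SQLite has no ADDTIME, so this example concatenates the date and time strings instead; that is an assumption of the sketch and only works because both values are stored in a fixed, zero-padded format that sorts correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (imei INTEGER, date TEXT, time TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(345, "2012-06-28", "07:18"),
     (345, "2012-06-28", "07:20"),
     (345, "2012-06-28", "07:21"),
     (987, "2012-06-28", "07:19")],
)

# Subquery u finds the latest timestamp per imei; the join pulls the full row.
latest = conn.execute(
    """
    SELECT t.imei, t.date, t.time
    FROM t
    JOIN (SELECT imei, MAX(date || ' ' || time) AS dt
          FROM t
          GROUP BY imei) AS u
      ON t.imei = u.imei
     AND t.date || ' ' || t.time = u.dt
    ORDER BY t.imei
    """
).fetchall()

print(latest)
```

Selecting t.* instead of the three named columns would bring along the remaining fields of the winning row as well.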
Fahim is right. Separated date and time columns make life unnecessarily difficult. No matter.
Try this.
SELECT IMEI, MAX(STR_TO_DATE(CONCAT(`date`, ' ', `time`), '%Y-%m-%d %H:%i'))
FROM T
GROUP BY IMEI
You may have to muck about to get the MAX/STR/CONCAT expression just right.
This is a summary query that finds the maximum datetime for each of your IMEI items.
Take a look at that post: How to combine date from one field with time from another field - MS SQL Server
To do this, you can simply use the SQL statement:
SELECT imei, (date+time) AS datim, [some more fields] FROM yourTable;
Then you can use max, min, distinct on the virtual field datim.
Maybe you should order by both imei and date, time:
SELECT * FROM t GROUP BY imei ORDER BY imei, date DESC, time DESC