I have a MySQL table with the entries of my driver's logbook. The table has two columns, start_place and end_place. Sometimes end_place is equal to start_place (a round trip), which is fine.
Now I want to select the entries of the table that occur as a tuple (x, y) but not as (y, x).
Example:
id | start_place | end_place
-----------------------------------
0 | New York | San Francisco
-----------------------------------
1 | San Francisco | New York
The row with the id 1 is a duplicate of id 0 in reversed order and should not be part of the result.
Does anyone have an idea? I have tried several times with subselects and with WHERE conditions like (x,y) != (y,x), but that doesn't work.
This can be done with the LEAST and GREATEST functions and a GROUP BY.
select least(start_place,end_place), greatest(start_place,end_place)
from tbl
group by least(start_place,end_place), greatest(start_place,end_place)
having count(*) = 1
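A minimal sketch of what this query returns, run against an in-memory SQLite database from Python rather than MySQL. The rows with id 2 and 3 are made up for illustration, and SQLite's two-argument MIN/MAX are used as stand-ins for LEAST/GREATEST, which SQLite does not have.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, start_place TEXT, end_place TEXT)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (0, "New York", "San Francisco"),
        (1, "San Francisco", "New York"),  # reverse of id 0
        (2, "Boston", "Chicago"),          # made-up row with no reverse trip
        (3, "Denver", "Denver"),           # made-up round trip
    ],
)

# Two-argument MIN/MAX are scalar functions in SQLite and play the role
# of MySQL's LEAST/GREATEST: they normalize (x, y) and (y, x) to one pair.
pairs = conn.execute(
    """
    SELECT MIN(start_place, end_place) AS place1,
           MAX(start_place, end_place) AS place2
    FROM tbl
    GROUP BY MIN(start_place, end_place), MAX(start_place, end_place)
    HAVING COUNT(*) = 1
    ORDER BY place1, place2
    """
).fetchall()

print(pairs)  # [('Boston', 'Chicago'), ('Denver', 'Denver')]
```

Note that the New York / San Francisco pair disappears completely, because HAVING COUNT(*) = 1 drops every pair that occurs in both directions rather than keeping one of the two rows.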
To retrieve such rows with other columns, use
select *
from tbl
where (least(start_place,end_place), greatest(start_place,end_place))
in (select least(start_place,end_place), greatest(start_place,end_place)
from tbl
group by least(start_place,end_place), greatest(start_place,end_place)
having count(*) = 1
)
Use LEAST, GREATEST and DISTINCT to get distinct pairs:
select distinct
least(start_place, end_place) as place1,
greatest(start_place, end_place) as place2
from mytable;
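For comparison, here is a sketch of the DISTINCT variant, again using SQLite from Python as a stand-in for MySQL, with two-argument MIN/MAX in place of LEAST/GREATEST. The Boston and Denver rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, start_place TEXT, end_place TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        (0, "New York", "San Francisco"),
        (1, "San Francisco", "New York"),  # reverse of id 0
        (2, "Boston", "Chicago"),          # made-up row
        (3, "Denver", "Denver"),           # made-up round trip
    ],
)

# DISTINCT collapses the normalized pairs, so a trip and its reverse
# show up once instead of twice.
distinct_pairs = conn.execute(
    """
    SELECT DISTINCT
           MIN(start_place, end_place) AS place1,
           MAX(start_place, end_place) AS place2
    FROM mytable
    ORDER BY place1, place2
    """
).fetchall()

print(distinct_pairs)
```

Here the New York / San Francisco pair is kept once, which seems closer to "row 1 is a duplicate of row 0" than a query that removes both rows.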
Using the table below as an example, I need to select all fields from the rows whose first three columns form a distinct combination, taking the first row in which each combination appears. Here, rows 1, 3 and 4 should be selected, because they differ in the first three columns. I was given this data as is, and there is no unique ID. There are about 25,000 records, so filtering them in Python after the SELECT seems silly. The only approaches I can think of are deleting the nearly identical records or using a SELECT statement I have not worked out yet. Would it be better to select the data in small batches and pick out the right rows in Python? That is messier, but I know how to do it.
ID | Class | Season | Grade
---|-------|--------|---------
1 | x | 1 | A
1 | x | 1 | A*
1 | y | 1 | A
1 | x | 2 | C
Try using DISTINCT *. It means "select all columns and skip any rows where the values in all columns match some already included row".
So with LIMIT 3 you will have the first 3 unique rows:
SELECT distinct * FROM yourTable LIMIT 3;
You want the first three unique rows. You can actually do this pretty easily if you have an ordering column:
select t.*
from (select t.*,
row_number() over (partition by id, class, season order by <orderingcol>) as seqnum
from t
) t
where seqnum = 1
order by <orderingcol>
limit 3;
Actually, the subquery is not necessary, but the query is a bit more inscrutable without it:
select t.*
from t
order by row_number() over (partition by id, class, season order by <orderingcol>),
<orderingcol>
limit 3;
The one caveat is that this will return duplicates if there are not three unique ones.
Window functions are available in MySQL 8.0 and later. In earlier versions of MySQL the query can be phrased as a join instead:
select t.*
from t join
(select id, class, season, min(<orderingcol>) as min_oc
from t
group by id, class, season
) tt
using (id, class, season)
where t.<orderingcol> = tt.min_oc
order by tt.min_oc;
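A small sketch of the join approach on the question's data, run in SQLite from Python as a stand-in for MySQL. The seq column is a hypothetical ordering column added for illustration (the original data has none), and the join is written with ON instead of USING, which should be equivalent here.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER, class TEXT, season INTEGER, grade TEXT, seq INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        (1, "x", 1, "A", 1),
        (1, "x", 1, "A*", 2),  # same (id, class, season) as the first row
        (1, "y", 1, "A", 3),
        (1, "x", 2, "C", 4),
    ],
)

# For each (id, class, season) group, keep the row with the smallest seq.
first_rows = conn.execute(
    """
    SELECT t.id, t.class, t.season, t.grade
    FROM t
    JOIN (SELECT id, class, season, MIN(seq) AS min_oc
          FROM t
          GROUP BY id, class, season) tt
      ON t.id = tt.id AND t.class = tt.class AND t.season = tt.season
    WHERE t.seq = tt.min_oc
    ORDER BY tt.min_oc
    """
).fetchall()

print(first_rows)  # [(1, 'x', 1, 'A'), (1, 'y', 1, 'A'), (1, 'x', 2, 'C')]
```

These are rows 1, 3 and 4 of the question's table. Without some ordering column there is no reliable notion of "first" row, so the approach depends on one existing.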
I have a table t1 with 5 columns and 80000 rows :
+---+--------+-------+--------+------------+
|id |category|groupe |subject | description|
+---+--------+-------+--------+------------+
|1 |categ1 |group1 |subject1| desc1 |
|2 |categ1 |group2 |subject2| desc2 |
|3 |categ1 |group2 |subject5| desc3 |
|4 |categ2 |group1 |subject5| desc4 |
|5 |categ2 |group3 |subject1| desc5 |
|6 |categ2 |group3 |subject2| desc6 |
|7 |categ3 |group1 |subject1| desc7 |
|8 |categ3 |group1 |subject4| desc8 |
+---+--------+-------+--------+------------+
I need to extract the rows whose category value occurs at least 30 times AND whose groupe value occurs at least 30 times AND whose subject value occurs at least 30 times.
That is, if "categ3" appears at least 30 times, I need the rows with categ3, and the same goes for groupe and subject.
But with the query below, the final result can contain fewer than 30 categ3 rows, because filtering by groupe or subject removes some of the rows that have categ3.
You can see an example on db<>fiddle; with a threshold of 10 occurrences, the correct query should return 118 rows.
select
*
from
t1
where
category in (
SELECT
category
FROM
t1
GROUP BY
category
HAVING
COUNT(category) >= 30
)
and
groupe in (
SELECT
groupe
FROM
t1
GROUP BY
groupe
HAVING
COUNT(groupe) >= 30
)
and
subject in (
SELECT
subject
FROM
t1
GROUP BY
subject
HAVING
COUNT(subject) >= 30
)
This query returns the intersection of the rows whose category, groupe and subject values each occur at least 30 times, but taking the intersection reduces the counts.
This means the count of some category values in the result can drop below 30.
In short, I need at least 30 occurrences within the intersection result itself.
I think I need a recursive filter that repeats until the number of output rows equals the number of input rows, but I don't know how to do that. Any ideas?
Thanks 😊
Add some DISTINCTs while grouping on the 3 columns.
select *
from dataset t
where t.category in (SELECT distinct category FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
and t.groupe in (SELECT distinct groupe FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
and t.subject in (SELECT distinct subject FROM dataset GROUP BY category, groupe, subject HAVING COUNT(*) >= 30)
A test on db<>fiddle here
For reference, the following query selects only the rows whose (category, groupe, subject) tuple occurs 30 times or more.
It will naturally return fewer rows than the query above.
SELECT *
FROM dataset
WHERE (category, groupe, subject) IN (
SELECT category, groupe, subject
FROM dataset
GROUP BY category, groupe, subject
HAVING COUNT(*) >= 30
)
Pro tip: This is a case where describing your requirement takes a lot of thought. As you think about it, think of SQL as a processor of sets of rows. It is always worthwhile to describe the requirement as carefully as you can, especially when it is as tricky as this one. Often it's helpful to describe the problem domain, rather than just talking about columns and values.
I guess you need the sets of rows meeting your three different criteria (more than x duplicates). You can use a set of id values for those rows because they are apparently a primary key (unique).
Here's one set of IDs:
SELECT id FROM dataset WHERE category IN (
SELECT category FROM dataset GROUP BY category HAVING COUNT(*) >= 5)
I believe you need all the rows lying in the intersection of those three sets. That is, you want any rows having all three items recurring frequently. You can get that with
id IN set1 AND id IN set2 AND id IN set3
If you need the union of those sets you can use this instead. This gives you the rows with any of the three items recurring frequently.
id IN set1 OR id IN set2 OR id IN set3
So here's the query.
SELECT *
FROM dataset
WHERE id IN (
SELECT id FROM dataset WHERE category IN (
SELECT category FROM dataset GROUP BY category HAVING COUNT(*) >= 5))
AND id IN (
SELECT id FROM dataset WHERE groupe IN (
SELECT groupe FROM dataset GROUP BY groupe HAVING COUNT(*) >= 5))
AND id IN (
SELECT id FROM dataset WHERE subject IN (
SELECT subject FROM dataset GROUP BY subject HAVING COUNT(*) >= 5))
I used 5 for the repeat threshold. You can use another number.
If you want your result set to contain only those rows with at least ten items in the result set, rather than in the dataset, you would use this query.
select d.*
from dataset d
join (
select count(*), groupe, category, subject
from dataset
group by groupe, category, subject
having count(*) >= 10
) e ON d.groupe=e.groupe AND d.category = e.category AND d.subject = e.subject
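The "repeat until the input row count equals the output row count" idea from the question could also be sketched in application code. The following is only an illustration in Python, not a MySQL solution. The function name filter_until_stable and the sample rows are made up, and rows are assumed to be dicts with category, groupe and subject keys.

```python
from collections import Counter

def filter_until_stable(rows, threshold, keys=("category", "groupe", "subject")):
    """Repeatedly drop rows whose value in any key column occurs fewer than
    `threshold` times among the remaining rows, until nothing changes."""
    current = list(rows)
    while True:
        # Count occurrences of each value per column in the current result.
        counts = {k: Counter(r[k] for r in current) for k in keys}
        kept = [r for r in current
                if all(counts[k][r[k]] >= threshold for k in keys)]
        if len(kept) == len(current):  # fixed point reached
            return kept
        current = kept

# Made-up sample: dropping row 4 (rare groupe) makes categ2 rare as well,
# so row 3 only disappears on the second pass.
rows = [
    {"id": 1, "category": "categ1", "groupe": "group1", "subject": "subject1"},
    {"id": 2, "category": "categ1", "groupe": "group1", "subject": "subject1"},
    {"id": 3, "category": "categ2", "groupe": "group1", "subject": "subject1"},
    {"id": 4, "category": "categ2", "groupe": "group2", "subject": "subject1"},
]

stable_ids = [r["id"] for r in filter_until_stable(rows, threshold=2)]
print(stable_ids)  # [1, 2]
```

A single pass over the same data would keep rows 1, 2 and 3, leaving categ2 with only one occurrence in the result, which is exactly the effect the question describes.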
Based on an example already given, I would like to ask my further question.
MySQL: Count occurrences of distinct values
example db
id name
----- ------
1 Mark
2 Mike
3 Paul
4 Mike
5 Mike
6 John
7 Mark
expected result
name count
----- -----
Mark 2
Mike 3
Paul 1
Mike 3
Mike 3
John 1
Mark 2
In my opinion 'GROUP BY' doesn't help.
Thank you very much.
The simplest approach would be to use COUNT() as a window function over a partition by name, but window functions are only available in MySQL 8.0.2 and later.
However, another approach is possible using a Derived Table. In a sub-select query (Derived Table), we will identify the counts for each unique name. Now, we simply need to join this to the main table, to show counts against each name (while not doing a grouping on them):
SELECT
t1.name,
dt.total_count
FROM your_table AS t1
JOIN
(
SELECT name,
COUNT(*) AS total_count
FROM your_table
GROUP BY name
) AS dt ON dt.name = t1.name
ORDER BY t1.id
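As a quick check, the derived-table query can be run on the question's sample data. This sketch uses an in-memory SQLite database from Python as a stand-in for MySQL; the query text itself is unchanged.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT)")
names = ["Mark", "Mike", "Paul", "Mike", "Mike", "John", "Mark"]
conn.executemany(
    "INSERT INTO your_table (id, name) VALUES (?, ?)",
    list(enumerate(names, start=1)),
)

# Join every row to the per-name counts computed in the derived table.
name_counts = conn.execute(
    """
    SELECT t1.name, dt.total_count
    FROM your_table AS t1
    JOIN (
        SELECT name, COUNT(*) AS total_count
        FROM your_table
        GROUP BY name
    ) AS dt ON dt.name = t1.name
    ORDER BY t1.id
    """
).fetchall()

for name, total in name_counts:
    print(name, total)
```

The output has one line per original row (seven in total), each carrying the count for its name, as in the expected result.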
If MySQL 8.0.2+ is available, the solution would be less verbose:
SELECT
name,
COUNT(*) OVER (PARTITION BY name) AS total_count
FROM your_table
I can't seem to find a suitable solution for the following (probably an age old) problem so hoping someone can shed some light. I need to return 1 distinct column along with other non distinct columns in mySQL.
I have the following table in mySQL:
id name destination rating country
----------------------------------------------------
1 James Barbados 5 WI
2 Andrew Antigua 6 WI
3 James Barbados 3 WI
4 Declan Trinidad 2 WI
5 Steve Barbados 4 WI
6 Declan Trinidad 3 WI
I would like an SQL statement that returns each DISTINCT name along with its destination and rating, filtered by country.
id name destination rating country
----------------------------------------------------
1 James Barbados 5 WI
2 Andrew Antigua 6 WI
4 Declan Trinidad 2 WI
5 Steve Barbados 4 WI
As you can see, James and Declan have different ratings, but the same name, so they are returned only once.
The following query returns all rows because the ratings are different. Is there any way I can return the above result set?
SELECT (distinct name), destination, rating
FROM table
WHERE country = 'WI'
ORDER BY id
Using a subquery, you can get the highest id for each name, then select the rest of the rows based on that:
SELECT * FROM table
WHERE id IN (
SELECT MAX(id) FROM table GROUP BY name
)
If you'd prefer, use MIN(id) to get the first record for each name instead of the last.
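A minimal sketch of the MIN(id) variant on the question's data, using SQLite from Python as a stand-in for MySQL. The table name ratings is hypothetical, since table is a reserved word and the real table name is not given.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (id INTEGER PRIMARY KEY, name TEXT, "
    "destination TEXT, rating INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?, ?, ?)",
    [
        (1, "James", "Barbados", 5, "WI"),
        (2, "Andrew", "Antigua", 6, "WI"),
        (3, "James", "Barbados", 3, "WI"),
        (4, "Declan", "Trinidad", 2, "WI"),
        (5, "Steve", "Barbados", 4, "WI"),
        (6, "Declan", "Trinidad", 3, "WI"),
    ],
)

# Keep only the row with the lowest id for each name.
first_per_name = conn.execute(
    """
    SELECT * FROM ratings
    WHERE id IN (SELECT MIN(id) FROM ratings GROUP BY name)
    ORDER BY id
    """
).fetchall()

print([row[0] for row in first_per_name])  # [1, 2, 4, 5]
```

The ids 1, 2, 4 and 5 match the result set requested in the question.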
It can also be done with an INNER JOIN against the subquery. For this purpose the performance should be similar, and sometimes you need to join on two columns from the subquery.
SELECT
table.*
FROM
table
INNER JOIN (
SELECT MAX(id) AS id FROM table GROUP BY name
) maxid ON table.id = maxid.id
The problem is that distinct works across the entire return set and not just the first field. Otherwise MySQL wouldn't know what record to return. So, you want to have some sort of group function on rating, whether MAX, MIN, GROUP_CONCAT, AVG, or several other functions.
Michael has already posted a good answer, so I'm not going to re-write the query.
I agree with @rcdmk. Using a DEPENDENT subquery can kill performance; GROUP BY seems more suitable, provided that the country field is already indexed and only a few rows reach the server. Rewriting the query given by @rcdmk, I added an ORDER BY NULL clause to suppress the implicit ordering by GROUP BY, which makes it a little faster:
SELECT MIN(id) as id, name, destination, AVG(rating) as rating, country
FROM table WHERE country = 'WI'
GROUP BY name, destination ORDER BY NULL
You can do a GROUP BY clause:
SELECT MIN(id) AS id, name, destination, AVG(rating) AS rating, country
FROM TABLE_NAME
GROUP BY name, destination, country
This query would perform better in large datasets than the subquery alternatives and it can be easier to read as well.
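To see what the GROUP BY version returns, here is a sketch on the question's data using SQLite from Python as a stand-in for MySQL. The table name ratings is hypothetical, and an ORDER BY was added so the output order is deterministic.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ratings (id INTEGER PRIMARY KEY, name TEXT, "
    "destination TEXT, rating INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?, ?, ?)",
    [
        (1, "James", "Barbados", 5, "WI"),
        (2, "Andrew", "Antigua", 6, "WI"),
        (3, "James", "Barbados", 3, "WI"),
        (4, "Declan", "Trinidad", 2, "WI"),
        (5, "Steve", "Barbados", 4, "WI"),
        (6, "Declan", "Trinidad", 3, "WI"),
    ],
)

# One row per (name, destination, country); rating is averaged per group.
grouped = conn.execute(
    """
    SELECT MIN(id) AS id, name, destination, AVG(rating) AS rating, country
    FROM ratings
    GROUP BY name, destination, country
    ORDER BY id
    """
).fetchall()

for row in grouped:
    print(row)
```

James comes back with a rating of 4.0 and Declan with 2.5, since AVG combines their two rows. If the rating of the first row is wanted instead (5 and 2 in the question's expected output), the subquery approach on MIN(id) is probably the better fit.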
This is a question about a MySQL SELECT query.
Let me explain it with a simple example.
Consider this query:
SELECT dbCountry from tableCountry
tableCountry has the fields dbCuntryId, dbCountry and dbState.
The result is:
dbCountry
india
america
england
kenya
pakisthan
I need the result as
1 india
2 america
3 england
4 kenya
5 pakisthan
The numbers 1 to 5 must be generated as the data grows; they are not an auto-increment id.
How can I get this? Is it something like a loop?
You can try this:
SELECT dbCountry,
(SELECT COUNT(*) FROM tableCountry t2 WHERE t2.dbCountry <= t1.dbCountry)
AS RowNum
FROM tableCountry t1
ORDER BY dbCountry
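A sketch of the correlated-subquery approach on the question's data, using SQLite from Python as a stand-in for MySQL. The table is simplified to the single dbCountry column for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableCountry (dbCountry TEXT)")
conn.executemany(
    "INSERT INTO tableCountry VALUES (?)",
    [("india",), ("america",), ("england",), ("kenya",), ("pakisthan",)],
)

# For each row, count how many country names sort at or before it.
numbered = conn.execute(
    """
    SELECT dbCountry,
           (SELECT COUNT(*) FROM tableCountry t2
            WHERE t2.dbCountry <= t1.dbCountry) AS RowNum
    FROM tableCountry t1
    ORDER BY dbCountry
    """
).fetchall()

for country, row_num in numbered:
    print(row_num, country)
```

The numbering follows alphabetical order (america is 1, india is 3), not the order shown in the question, and duplicate country names would share the same number. The subquery is also evaluated once per row, so it may be slow on large tables.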
The following should do what you need. It uses a variable that is incremented and returned for each row:
SELECT
@rownum:=@rownum+1 number,
c.dbCountry
FROM
tableCountry c,
(SELECT @rownum:=0) r
If you want the result to always be in the same order, you'll need to add an ORDER BY clause to the query, for example ORDER BY c.dbCountry to order by the country name.