How to find the distinct of one column based on other columns - mysql

I have a data frame like below
col1 col2 col3
A Z 10
A Y 8
A Z 15
B X 11
B Z 7
C Y 10
D Z 11
D Y 14
D L 16
For each distinct col1, I have to select the row with the maximum col3, together with its col2.
Output data frame should look like,
col1 col2 col3
A Z 15
B X 11
C Y 10
D L 16
How can I do this in either R or SQL?
Thanks in advance

We can use data.table. We convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'col1', we subset the data.table (.SD) based on the index of max value of 'col3'
library(data.table)
setDT(df1)[, .SD[which.max(col3)], col1]
# col1 col2 col3
#1: A Z 15
#2: B X 11
#3: C Y 10
#4: D L 16
Or we can use top_n from dplyr after grouping by 'col1'. Naming col3 explicitly avoids relying on top_n's default of ordering by the last column.
library(dplyr)
df1 %>%
group_by(col1) %>%
top_n(1, col3)

SQL answer:
Use NOT EXISTS to return a row only if there is no other row with the same col1 value that has a higher col3 value.
select *
from tablename t1
where not exists (select 1 from tablename t2
where t2.col1 = t1.col1
and t2.col3 > t1.col3)
This will return both rows for a col1 if there is a tie for max(col3).
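To try the NOT EXISTS query without a MySQL server, here is a minimal sketch that runs it through Python's sqlite3 module. SQLite is only a stand-in (an assumption, the question is about MySQL), and the extra row ('A', 'W', 15) is a hypothetical addition used to show the tie behaviour.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [("A", "Z", 10), ("A", "Y", 8), ("A", "Z", 15), ("B", "X", 11), ("B", "Z", 7),
     ("C", "Y", 10), ("D", "Z", 11), ("D", "Y", 14), ("D", "L", 16)],
)

# Keep a row only if no other row in the same col1 group has a higher col3.
QUERY = """
SELECT t1.col1, t1.col2, t1.col3
FROM tablename t1
WHERE NOT EXISTS (SELECT 1 FROM tablename t2
                  WHERE t2.col1 = t1.col1 AND t2.col3 > t1.col3)
ORDER BY t1.col1, t1.col2
"""

result = conn.execute(QUERY).fetchall()

# Hypothetical extra row that ties the maximum of group A.
conn.execute("INSERT INTO tablename VALUES ('A', 'W', 15)")
result_with_tie = conn.execute(QUERY).fetchall()

print(result)
print(result_with_tie)
```

With the tie in place, group A contributes two rows, so add a tie-breaker if exactly one row per group is required.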

Another way of doing it in MySQL is to join the table to its per-group maximum:
SELECT T1.*
FROM
table_name T1
INNER JOIN
(SELECT col1,MAX(col3) AS Max_col3 FROM table_name GROUP BY col1) T2
ON T1.`col1` = T2.`col1` AND T2.`Max_col3` = T1.`col3`
Hope this helps.

Related

Extract tuples with specified common values in another column in SQL

I have a dataset that look like:
Col1 Col2
1 ABC
2 DEF
3 ABC
1 DEF
Expected output:
Col1 Col2
1 ABC
1 DEF
I want to extract only those IDs from Col1 which have both values ABC and DEF in Col2.
I tried the self-join in SQL but that did not give me the expected result.
SELECT DISTINCT Col1
FROM db A, db B
WHERE A.ID <> B.ID
AND A.Col2 = 'ABC'
AND B.Col2 = 'DEF'
GROUP BY A.Col1
Also, I tried to do the same thing in R using the following code:
vc <- c("ABC", "DEF")
data1 <- db[db$Col2 %in% vc,]
Again, I did not get the desired output. Thanks for all the pointers in advance.
In R, you could do
library(dplyr)
df %>%
group_by(Col1) %>%
filter(all(vc %in% Col2))
# Col1 Col2
# <int> <fct>
#1 1 ABC
#2 1 DEF
The Base R equivalent of that would be
df[as.logical(with(df, ave(seq_along(Col2), Col1, FUN = function(i) all(vc %in% Col2[i])))), ]
# Col1 Col2
#1 1 ABC
#4 1 DEF
We select the groups which have all of vc in them.
Here is your current query corrected:
SELECT DISTINCT t1.Col1
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.Col1 = t2.Col1
WHERE t1.Col2 = 'ABC' AND t2.Col2 = 'DEF';
The join condition is that both Col1 values are the same, the first Col2 value is ABC and the second Col2 value is DEF.
But, I would probably use the following canonical approach to this:
SELECT Col1
FROM yourTable
WHERE Col2 IN ('ABC', 'DEF')
GROUP BY Col1
HAVING MIN(Col2) <> MAX(Col2);
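The aggregation approach can be checked with a small sketch run through Python's sqlite3 module (SQLite is a stand-in here, not the asker's database). It swaps the MIN/MAX test for COUNT(DISTINCT Col2), which extends to more than two required values. The helper name ids_having_all is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Col1 INTEGER, Col2 TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(1, "ABC"), (2, "DEF"), (3, "ABC"), (1, "DEF")],
)


def ids_having_all(conn, wanted):
    """Return the Col1 values whose group contains every value in `wanted`."""
    wanted = list(dict.fromkeys(wanted))  # drop duplicates, keep order
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT Col1
        FROM yourTable
        WHERE Col2 IN ({placeholders})
        GROUP BY Col1
        HAVING COUNT(DISTINCT Col2) = ?
        ORDER BY Col1
    """
    return [row[0] for row in conn.execute(sql, [*wanted, len(wanted)])]


print(ids_having_all(conn, ["ABC", "DEF"]))
print(ids_having_all(conn, ["ABC"]))
```

Only the placeholders are interpolated into the SQL text; the values themselves are bound as parameters.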
In R, we can also use data.table
library(data.table)
setDT(df)[, .SD[all(vc %in% Col2)], by = Col1]
Use correlated subquery:
select * from tablename t
where exists (select 1 from tablename t1 where t1.col1=t.col1 and col2 in ('ABC','DEF')
group by col1 having count(distinct col2)=2)
Here's a way using group_concat
select t.Col1,t.col2
from t
join
(
select col1,group_concat(distinct col2 order by col2) gc
from t
group by col1 having gc = 'ABC,DEF'
) s
on s.col1 = t.col1;
+------+------+
| Col1 | col2 |
+------+------+
| 1 | ABC |
| 1 | DEF |
+------+------+
2 rows in set (0.16 sec)
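But you do have to know the order col2 will be concatenated in: the string compared against gc must list the values in the same order as the ORDER BY inside group_concat.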

SQL Select rows where col1 or col2 equals variable

I want to select rows from a table where col1 or col2 equals a variable (X). However, if a row is already selected where col1 equals X and col2 is some other value (Y), then a row where col2 equals X and col1 equals that Y should not be selected as well. Everything should be ordered by the TIME column, descending.
Let's say this is my table:
COL1 COL2 TIME COL4
1 2 0 A
1 2 1 B
2 1 2 C
1 3 3 D
3 1 4 E
4 2 5 F
3 4 6 G
1 2 7 H
4 1 8 I
And let's say that variable X equals to 1, then I want to have these rows:
COL1 COL2 TIME COL4
4 1 8 I
1 2 7 H
3 1 4 E
So it won't show me this row
COL1 COL2 TIME COL4
2 1 2 C
because there is already a combination where col1/col2 is 2/1 or 1/2.
Sorry if I explained it in a bad way, but I can't think of better explanation.
Thank you guys.
Making a couple of key assumptions...
SELECT a.*
FROM my_table a
JOIN
( SELECT MAX(time) time
FROM my_table
WHERE 1 IN (COL1,COL2)
GROUP
BY LEAST(col1,col2)
, GREATEST(col1,col2)
) b
ON b.time = a.time
ORDER BY a.time DESC;
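The grouping trick above treats (1, 2) and (2, 1) as the same pair by grouping on the smaller and the larger value. A minimal sketch of it, run through Python's sqlite3 module, follows. SQLite has no LEAST/GREATEST, so the two-argument scalar MIN/MAX functions are swapped in. As in the answer, it assumes TIME values are unique, since the join back to the table is on time alone. The helper name latest_per_pair is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col1 INTEGER, col2 INTEGER, time INTEGER, col4 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, 2, 0, "A"), (1, 2, 1, "B"), (2, 1, 2, "C"), (1, 3, 3, "D"), (3, 1, 4, "E"),
     (4, 2, 5, "F"), (3, 4, 6, "G"), (1, 2, 7, "H"), (4, 1, 8, "I")],
)


def latest_per_pair(conn, x):
    """Most recent row for each unordered (col1, col2) pair that involves x."""
    sql = """
        SELECT a.col1, a.col2, a.time, a.col4
        FROM my_table a
        JOIN (SELECT MAX(time) AS latest
              FROM my_table
              WHERE ? IN (col1, col2)
              -- two-argument MIN/MAX are scalar in SQLite (LEAST/GREATEST in MySQL)
              GROUP BY MIN(col1, col2), MAX(col1, col2)) b
          ON b.latest = a.time
        ORDER BY a.time DESC
    """
    return conn.execute(sql, (x,)).fetchall()


print(latest_per_pair(conn, 1))
```

If TIME can repeat, the join would need to match on the pair as well as the time.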
EDIT: I posted this answer when it was thought that OP's database was SQL Server. But as it turns out, the database is MySQL, where row_number() is only available from version 8.0 onwards.
I think this query should do it:
select t.col1, t.col2, t.time, t.col4
from (select t.*,
row_number() over (
partition by
case when col1 < col2 then col1 else col2 end,
case when col1 < col2 then col2 else col1 end
order by time desc) as rn
from tbl t
where t.col1 = x or t.col2 = x) t
where t.rn = 1
order by t.time desc
The key part is defining the row_number partition by clause in such a way that (1, 2) is considered equivalent to (2, 1), which is what the case statements do. Once the partitioning works correctly, you just need to keep the first row of every "partition" (where t.rn = 1) to exclude duplicate rows.

select a field values that are not found in another table

I need a query that returns the rows whose field value is not found in another table. Let's say,
table 1:
comp col1 col2 col3 col4
----------------------------------
nam1 1 1 b c
nam2 0 0 abc c
nam3 1 1 a c
nam4 1 1 b c
nam5 0 0 c c
table2:
name col1 col2
----------------------
b 3 f
a 4 f
c 5 f
result:
comp col3 col4
----------------------
nam2 abc c
The result should be based on col3 of the first table, restricted to col1 = 0 and col2 = 0, compared against name in the second table: abc is not found in table2.
thanks in advance.
The question is not very clear, but if I understood you correctly, you want to select all rows from table1 where col1 and col2 are both 0, and col3 is not contained in the name column of table2. If so, your query should be simple enough:
SELECT comp, col3, col4
FROM table1
WHERE col1 = 0
AND col2 = 0
AND col3 NOT IN (SELECT name FROM table2);
As per the explanation the joining key between table1 and table2 is
table1.col3 = table2.name
Using this you can use the following query
select
t1.comp,
t1.col3,
t1.col4
from table1 t1
left join table2 t2 on t2.name = t1.col3
where t2.name is null
and t1.col1 = 0
and t1.col2 = 0
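Both anti-join styles can be compared in a small sketch using Python's sqlite3 module (SQLite is a stand-in for the asker's database). The col1 = 0 and col2 = 0 filters from the question are applied in both queries. The NULL row inserted at the end is a hypothetical addition that shows where the two styles differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (comp TEXT, col1 INTEGER, col2 INTEGER, col3 TEXT, col4 TEXT);
CREATE TABLE table2 (name TEXT, col1 INTEGER, col2 TEXT);
INSERT INTO table1 VALUES
  ('nam1', 1, 1, 'b', 'c'), ('nam2', 0, 0, 'abc', 'c'), ('nam3', 1, 1, 'a', 'c'),
  ('nam4', 1, 1, 'b', 'c'), ('nam5', 0, 0, 'c', 'c');
INSERT INTO table2 VALUES ('b', 3, 'f'), ('a', 4, 'f'), ('c', 5, 'f');
""")

NOT_IN = """
SELECT comp, col3, col4
FROM table1
WHERE col1 = 0 AND col2 = 0
  AND col3 NOT IN (SELECT name FROM table2)
"""

LEFT_JOIN = """
SELECT t1.comp, t1.col3, t1.col4
FROM table1 t1
LEFT JOIN table2 t2 ON t2.name = t1.col3
WHERE t2.name IS NULL AND t1.col1 = 0 AND t1.col2 = 0
"""

not_in_rows = conn.execute(NOT_IN).fetchall()
left_join_rows = conn.execute(LEFT_JOIN).fetchall()

# Hypothetical NULL name: NOT IN now evaluates to NULL for every row,
# while the LEFT JOIN version is unaffected.
conn.execute("INSERT INTO table2 VALUES (NULL, 6, 'f')")
not_in_rows_with_null = conn.execute(NOT_IN).fetchall()
left_join_rows_with_null = conn.execute(LEFT_JOIN).fetchall()

print(not_in_rows, left_join_rows)
print(not_in_rows_with_null, left_join_rows_with_null)
```

If table2.name can be NULL, the LEFT JOIN form (or NOT EXISTS) is the safer choice.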

MySQL select unique records on two columns

How do I select rows where two columns are unique?
Given table
id col1 col2
1 a 222
2 b 223
3 c 224
4 d 224
5 b 225
6 e 226
How do I remove the duplicates in col1 and the duplicates in col2, to get rows unique to the whole table,
So that result is
id col1 col2
1 a 222
6 e 226
Is there a better way than using sub queries?
SELECT * FROM table WHERE id
IN (SELECT id FROM table WHERE col1
IN (SELECT col1 FROM table GROUP BY col1 HAVING(COUNT(col1)=1))
GROUP BY col2 HAVING(COUNT(col2)=1))
This should work using exists:
select *
from yourtable y
where not exists (
select 1
from yourtable y2
where y.id != y2.id
and (y.col1 = y2.col1
or y.col2 = y2.col2))
Here's an alternative solution using an outer join as I've read mysql sometimes doesn't do well with exists:
select y.*
from yourtable y
left join yourtable y2 on y.id != y2.id
and (y.col1 = y2.col1
or y.col2 = y2.col2)
where y2.id is null;
You can also do this by aggregating along each dimension:
select t.*
from table t join
(select col1
from table t
group by col1
having count(*) = 1
) t1
on t.col1 = t1.col1 join
(select col2
from table t
group by col2
having count(*) = 1
) t2
on t.col2 = t2.col2;
This method seems like a very direct translation of the user requirements.
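The NOT EXISTS and per-column aggregation approaches can be run side by side in a short sketch using Python's sqlite3 module (SQLite is a stand-in for MySQL here). The table is named yourtable rather than table, since TABLE is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, "a", 222), (2, "b", 223), (3, "c", 224), (4, "d", 224), (5, "b", 225), (6, "e", 226)],
)

# Keep a row only if no other row shares its col1 or its col2.
not_exists_rows = conn.execute("""
    SELECT y.id, y.col1, y.col2
    FROM yourtable y
    WHERE NOT EXISTS (SELECT 1 FROM yourtable y2
                      WHERE y.id != y2.id
                        AND (y.col1 = y2.col1 OR y.col2 = y2.col2))
    ORDER BY y.id
""").fetchall()

# Join to the col1 values and the col2 values that each occur exactly once.
aggregate_rows = conn.execute("""
    SELECT t.id, t.col1, t.col2
    FROM yourtable t
    JOIN (SELECT col1 FROM yourtable GROUP BY col1 HAVING COUNT(*) = 1) t1
      ON t.col1 = t1.col1
    JOIN (SELECT col2 FROM yourtable GROUP BY col2 HAVING COUNT(*) = 1) t2
      ON t.col2 = t2.col2
    ORDER BY t.id
""").fetchall()

print(not_exists_rows)
print(aggregate_rows)
```

Both queries agree on the sample data. Which one performs better depends on the indexes and the engine, so it is worth measuring on the real table.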

Splitting a result into three columns

Say I have a table 'alphabet'. This is just a basic representation/example.
id word
1 a
2 b
3 c
4 d
5 e
6 f
7 g
8 h
9 i
10 j
11 k
12 l
13 m
Now assume I am restricted to just a single query (with subqueries) due to a language restriction or otherwise.
I want my 'result' to be as follows:
row col1 col2 col3
1 a b c
2 d e f
3 g h i
4 j k l
5 m
Now I've gotten somewhat close to this by emulating a Full Outer Join in MySQL by following the instructions found here: Full Outer Join in MySQL combined with a sub-query on the same table using something along the lines of:
SELECT id,word FROM table WHERE MOD(id,3)=1
This isn't particularly perfect, as it requires me to assume that the ids follow each-other perfectly sequentially, but I haven't been able to think of a better method at the time. Since last I recall, LIMIT and OFFSET do not take sub-queries.
However, following this thought through, results into something along the lines of:
row col1 col2 col3
1 a
2 d
3 g
4 j
5 m
6 b
7 e
8 h
9 k
10 c
11 f
12 i
13 l
13 m
Is there a way to get my desired format?
And note that normally, the desired way to do this is indeed to just do three calls with a limit-offset call based on a count(). But /is this possible/ to be done in a single call?
I can't think of a use case for this, but here is what you asked for:
SELECT
FLOOR((id - 1)/3) + 1 id,
MAX(CASE WHEN MOD(id - 1,3) = 0 THEN word END) col1,
MAX(CASE WHEN MOD(id - 1,3) = 1 THEN word END) col2,
MAX(CASE WHEN MOD(id - 1,3) = 2 THEN word END) col3
FROM tbl
GROUP BY FLOOR((id - 1)/3)
Note that this works only when the ids are sequential and start from 1.
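The conditional-aggregation pivot can be tried with a minimal sketch using Python's sqlite3 module (SQLite is a stand-in for MySQL). Integer division and the % operator replace FLOOR and MOD here, and the column name row_no is a hypothetical choice that avoids the reserved word ROW.

```python
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alphabet (id INTEGER PRIMARY KEY, word TEXT)")
# ids 1..13 paired with the words a..m, as in the question
conn.executemany(
    "INSERT INTO alphabet VALUES (?, ?)",
    list(enumerate(string.ascii_lowercase[:13], start=1)),
)

# (id - 1) / 3 picks the output row, (id - 1) % 3 picks the output column.
pivot = conn.execute("""
    SELECT (id - 1) / 3 + 1 AS row_no,
           MAX(CASE WHEN (id - 1) % 3 = 0 THEN word END) AS col1,
           MAX(CASE WHEN (id - 1) % 3 = 1 THEN word END) AS col2,
           MAX(CASE WHEN (id - 1) % 3 = 2 THEN word END) AS col3
    FROM alphabet
    GROUP BY (id - 1) / 3
    ORDER BY row_no
""").fetchall()

for row in pivot:
    print(row)
```

The last row has empty cells (None in Python, NULL in SQL) because 13 is not a multiple of 3.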
Is this what you need?
SELECT FLOOR((col1.id - 1) / 3 + 1) AS id, col1.word AS col1, col2.word AS col2, col3.word AS col3
FROM alphabet col1
LEFT JOIN alphabet col2 ON col1.id = col2.id - 1
LEFT JOIN alphabet col3 ON col2.id = col3.id - 1
WHERE col1.id % 3 = 1;
How about something like
Select t1.id as `row`, t1.word as col1, t2.word as col2, t3.word as col3
From alphabet t1
left join alphabet t2 on t2.id = t1.id + 5
left join alphabet t3 on t3.id = t1.id + 10
Where t1.id <= 5
Taking Halmet Hakobyan's answer, finishing this off:
SELECT
FLOOR((rnk - 1)/3) + 1 rnk,
MAX(CASE WHEN MOD(rnk - 1,3) = 0 THEN word END) col1,
MAX(CASE WHEN MOD(rnk - 1,3) = 1 THEN word END) col2,
MAX(CASE WHEN MOD(rnk - 1,3) = 2 THEN word END) col3
FROM (SELECT @rn:=@rn+1 AS rnk, `id`,`word` from tbl) as tbl, (SELECT @rn:=0) t2
GROUP BY FLOOR((rnk - 1)/3)
This will work even if the ids are not in sequence.
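The same idea can be sketched without user variables. The version below, run through Python's sqlite3 module, computes each row's rank with a correlated COUNT(*) instead. This is a swapped-in technique rather than the answer's own method, SQLite is a stand-in for MySQL, and the gapped ids are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alphabet (id INTEGER PRIMARY KEY, word TEXT)")
# Hypothetical non-sequential ids
conn.executemany(
    "INSERT INTO alphabet VALUES (?, ?)",
    [(2, "a"), (3, "b"), (5, "c"), (8, "d"), (13, "e"), (21, "f"), (34, "g")],
)

# rnk = number of rows with an id less than or equal to this one (1, 2, 3, ...),
# which then plays the role that id had in the sequential version.
pivot = conn.execute("""
    SELECT (rnk - 1) / 3 + 1 AS row_no,
           MAX(CASE WHEN (rnk - 1) % 3 = 0 THEN word END) AS col1,
           MAX(CASE WHEN (rnk - 1) % 3 = 1 THEN word END) AS col2,
           MAX(CASE WHEN (rnk - 1) % 3 = 2 THEN word END) AS col3
    FROM (SELECT a.word,
                 (SELECT COUNT(*) FROM alphabet b WHERE b.id <= a.id) AS rnk
          FROM alphabet a) ranked
    GROUP BY (rnk - 1) / 3
    ORDER BY row_no
""").fetchall()

for row in pivot:
    print(row)
```

The correlated count is quadratic in the number of rows, so on large tables a window function such as ROW_NUMBER() is likely a better fit where it is available.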