I have a dataset that looks like:
Col1 Col2
1 ABC
2 DEF
3 ABC
1 DEF
Expected output:
Col1 Col2
1 ABC
1 DEF
I want to extract only those IDs from Col1 that have both values, ABC and DEF, in Col2.
I tried the self-join in SQL but that did not give me the expected result.
SELECT DISTINCT Col1
FROM db A, db B
WHERE A.ID <> B.ID
AND A.Col2 = 'ABC'
AND B.Col2 = 'DEF'
GROUP BY A.Col1
I also tried to do the same thing in R using the following code:
vc <- c("ABC", "DEF")
data1 <- db[db$Col2 %in% vc,]
Again, I did not get the desired output. Thanks for all the pointers in advance.
In R, you could do
library(dplyr)
df %>%
group_by(Col1) %>%
filter(all(vc %in% Col2))
# Col1 Col2
# <int> <fct>
#1 1 ABC
#2 1 DEF
The Base R equivalent of that would be
# as.character() keeps this working when Col2 is a factor
df[as.logical(with(df, ave(as.character(Col2), Col1, FUN = function(x) all(vc %in% x)))), ]
# Col1 Col2
#1 1 ABC
#4 1 DEF
We select the groups that have all of vc in them.
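For readers who want the same group-wise filter outside R, here is a minimal Python sketch of the idea. The function name filter_groups_with_all and the tuple layout are my own assumptions for illustration, not part of the answers above.

```python
from collections import defaultdict

def filter_groups_with_all(rows, required):
    """Keep the (col1, col2) rows whose col1 group contains every required col2 value."""
    seen = defaultdict(set)
    for col1, col2 in rows:
        seen[col1].add(col2)          # collect the col2 values present per col1
    wanted = set(required)
    # A row survives only if its whole group covers the wanted set
    return [(col1, col2) for col1, col2 in rows if wanted <= seen[col1]]

rows = [(1, "ABC"), (2, "DEF"), (3, "ABC"), (1, "DEF")]
result = filter_groups_with_all(rows, ["ABC", "DEF"])
print(result)  # [(1, 'ABC'), (1, 'DEF')]
```

This mirrors all(vc %in% Col2) evaluated per group: the subset test plays the role of all(... %in% ...).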
Here is your current query corrected:
SELECT DISTINCT t1.Col1
FROM yourTable t1
INNER JOIN yourTable t2
ON t1.Col1 = t2.Col1
WHERE t1.Col2 = 'ABC' AND t2.Col2 = 'DEF';
The join condition is that both Col1 values are the same, the first Col2 value is ABC and the second Col2 value is DEF.
But, I would probably use the following canonical approach to this:
SELECT Col1
FROM yourTable
WHERE Col2 IN ('ABC', 'DEF')
GROUP BY Col1
HAVING MIN(Col2) <> MAX(Col2);
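The grouped query can be tried quickly with Python's built-in sqlite3 module. This is only a sketch: it assumes SQLite behaves like your database for this query, and the IN wrapper used to get both columns back (as in the expected output) is my addition, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (Col1 INTEGER, Col2 TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(1, "ABC"), (2, "DEF"), (3, "ABC"), (1, "DEF")],
)

# After filtering to the two values, MIN <> MAX means both are present
grouped = """
SELECT Col1
FROM yourTable
WHERE Col2 IN ('ABC', 'DEF')
GROUP BY Col1
HAVING MIN(Col2) <> MAX(Col2)
"""
matching_ids = [row[0] for row in conn.execute(grouped)]
print(matching_ids)  # [1]

# Assumed extension: use the ids as a filter to return the full rows
full_rows = conn.execute(
    f"SELECT Col1, Col2 FROM yourTable WHERE Col1 IN ({grouped}) ORDER BY Col2"
).fetchall()
print(full_rows)  # [(1, 'ABC'), (1, 'DEF')]
```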
In R, we can also use data.table
library(data.table)
setDT(df)[, .SD[all(vc %in% Col2)], by = Col1]
Use a correlated subquery:
select * from tablename t
where exists (select 1 from tablename t1 where t1.col1=t.col1 and col2 in ('ABC','DEF')
group by col1 having count(distinct col2)=2)
Here's a way using group_concat
select t.Col1,t.col2
from t
join
(
select col1,group_concat(distinct col2 order by col2) gc
from t
group by col1 having gc = 'abc,def'
) s
on s.col1 = t.col1;
+------+------+
| Col1 | col2 |
+------+------+
| 1 | ABC |
| 1 | DEF |
+------+------+
2 rows in set (0.16 sec)
But you do have to know the order in which the col2 values are concatenated (here, the ORDER BY col2 inside group_concat).
I have a data frame like below
col1 col2 col3
A Z 10
A Y 8
A Z 15
B X 11
B Z 7
C Y 10
D Z 11
D Y 14
D L 16
For each distinct col1, I have to select the row whose col3 is the maximum, together with its col2.
Output data frame should look like,
col1 col2 col3
A Z 15
B X 11
C Y 10
D L 16
How can I do this in either R or SQL?
Thanks in advance
We can use data.table: convert the 'data.frame' to a 'data.table' (setDT(df1)), then, grouped by 'col1', subset each group (.SD) at the index of the maximum 'col3' value. Note that which.max returns only the first maximum, so ties are dropped.
library(data.table)
setDT(df1)[, .SD[which.max(col3)], col1]
# col1 col2 col3
#1: A Z 15
#2: B X 11
#3: C Y 10
#4: D L 16
Or we can use top_n from dplyr after grouping by 'col1'.
library(dplyr)
df1 %>%
group_by(col1) %>%
top_n(1, col3)
SQL answer:
Use NOT EXISTS to return a row only if there is no other row with the same col1 value that has a higher col3 value.
select *
from tablename t1
where not exists (select 1 from tablename t2
where t2.col1 = t1.col1
and t2.col3 > t1.col3)
This returns both rows for a col1 if there is a tie for max(col3).
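To see the tie behaviour concretely, here is a small sketch that runs the NOT EXISTS query through Python's sqlite3 module. The helper name max_rows, the ORDER BY, and the extra ('C', 'Q', 10) row are assumptions added purely for the demonstration.

```python
import sqlite3

def max_rows(rows):
    """Run the NOT EXISTS query over the given (col1, col2, col3) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (col1 TEXT, col2 TEXT, col3 INTEGER)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?, ?)", rows)
    sql = """
    SELECT *
    FROM tablename t1
    WHERE NOT EXISTS (SELECT 1 FROM tablename t2
                      WHERE t2.col1 = t1.col1
                        AND t2.col3 > t1.col3)
    ORDER BY col1, col2
    """
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

data = [("A", "Z", 10), ("A", "Y", 8), ("A", "Z", 15), ("B", "X", 11), ("B", "Z", 7),
        ("C", "Y", 10), ("D", "Z", 11), ("D", "Y", 14), ("D", "L", 16)]

print(max_rows(data))
# [('A', 'Z', 15), ('B', 'X', 11), ('C', 'Y', 10), ('D', 'L', 16)]

# Hypothetical extra row that ties with ('C', 'Y', 10): both C rows come back
print(max_rows(data + [("C", "Q", 10)]))
```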
Another way of doing this in MySQL:
SELECT T1.*
FROM
table_name T1
INNER JOIN
(SELECT col1,MAX(col3) AS Max_col3 FROM table_name GROUP BY col1) T2
ON T1.`col1` = T2.`col1` AND T2.`Max_col3` = T1.`col3`
Hope this helps.
I need a query that returns the rows whose field value is not found in another table. Let's say:
table 1:
comp col1 col2 col3 col4
----------------------------------
nam1 1 1 b c
nam2 0 0 abc c
nam3 1 1 a c
nam4 1 1 b c
nam5 0 0 c c
table2:
name col1 col2
----------------------
b 3 f
a 4 f
c 5 f
result:
comp col3 col4
----------------------
nam2 abc c
The result should be based on col3 in the first table, restricted to rows with col1 = 0 and col2 = 0, compared against name in the second table: abc is not found in table2.
thanks in advance.
The question is not very clear, but if I understood you correctly, you want to select all rows from table1 where col1 and col2 are both 0, and col3 is not contained in the name column of table2. If so, your query should be simple enough:
SELECT comp, col3, col4
FROM table1
WHERE col1 = 0
AND col2 = 0
AND col3 NOT IN (SELECT name FROM table2);
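As a quick check against the sample data from the question, the query can be run with Python's sqlite3 module. This is a sketch that assumes SQLite is close enough to your database for this statement; the column types are guesses from the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (comp TEXT, col1 INTEGER, col2 INTEGER, col3 TEXT, col4 TEXT)")
conn.execute("CREATE TABLE table2 (name TEXT, col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [("nam1", 1, 1, "b", "c"), ("nam2", 0, 0, "abc", "c"), ("nam3", 1, 1, "a", "c"),
     ("nam4", 1, 1, "b", "c"), ("nam5", 0, 0, "c", "c")],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [("b", 3, "f"), ("a", 4, "f"), ("c", 5, "f")],
)

missing = conn.execute("""
    SELECT comp, col3, col4
    FROM table1
    WHERE col1 = 0
      AND col2 = 0
      AND col3 NOT IN (SELECT name FROM table2)
""").fetchall()
print(missing)  # [('nam2', 'abc', 'c')]
```

nam5 also has col1 = 0 and col2 = 0, but its col3 value c exists in table2, so only nam2 remains.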
As per the explanation the joining key between table1 and table2 is
table1.col3 = table2.name
Using this you can use the following query
select
t1.comp,
t1.col3,
t1.col4
from table1 t1
left join table2 t2 on t2.name = t1.col3
where t2.name is null
and t1.col1 = 0
and t1.col2 = 0
How do I select rows where two columns are unique?
Given table
id col1 col2
1 a 222
2 b 223
3 c 224
4 d 224
5 b 225
6 e 226
How do I remove the duplicates in col1 and the duplicates in col2, to get the rows that are unique across the whole table?
So that result is
id col1 col2
1 a 222
6 e 226
Is there a better way than using sub queries?
SELECT * FROM table WHERE id
IN (SELECT id FROM table WHERE col1
IN (SELECT col1 FROM table GROUP BY col1 HAVING(COUNT(col1)=1))
GROUP BY col2 HAVING(COUNT(col2)=1))
This should work using exists:
select *
from yourtable y
where not exists (
select 1
from yourtable y2
where y.id != y2.id
and (y.col1 = y2.col1
or y.col2 = y2.col2))
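Here is a small sketch that runs this NOT EXISTS query on the sample table using Python's sqlite3 module. The ORDER BY is added only to make the printed result deterministic, and the column types are assumptions based on the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?)",
    [(1, "a", 222), (2, "b", 223), (3, "c", 224), (4, "d", 224), (5, "b", 225), (6, "e", 226)],
)

# A row is kept only if no *other* row shares its col1 or its col2
unique_rows = conn.execute("""
    SELECT *
    FROM yourtable y
    WHERE NOT EXISTS (
        SELECT 1
        FROM yourtable y2
        WHERE y.id != y2.id
          AND (y.col1 = y2.col1 OR y.col2 = y2.col2))
    ORDER BY y.id
""").fetchall()
print(unique_rows)  # [(1, 'a', 222), (6, 'e', 226)]
```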
Here's an alternative solution using an outer join, as I've read that MySQL sometimes doesn't do well with EXISTS:
select y.*
from yourtable y
left join yourtable y2 on y.id != y2.id
and (y.col1 = y2.col1
or y.col2 = y2.col2)
where y2.id is null;
You can also do this by aggregating along each dimension:
select t.*
from table t join
(select col1
from table t
group by col1
having count(*) = 1
) t1
on t.col1 = t1.col1 join
(select col2
from table t
group by col2
having count(*) = 1
) t2
on t.col2 = t2.col2;
This method seems like a very direct translation of the user requirements.
I have a problem: I have two tables
Table1 which has two columns
Col1 Col2
---- ------
a value1
b value1
b value1
And Table2
Col1 Col2
---- ------
1 a,b
2 a,c
3 a,b,c
I want result
Col1 Col2
----- -----
a 1,2,3
b 1,3
c 2,3
WITH C AS
(
SELECT T2.Col1,
S.Item
FROM Table2 AS T2
CROSS APPLY dbo.SplitStrings(T2.Col2, ',') AS S
)
SELECT C1.Item AS Col1,
(
SELECT ','+CAST(C2.Col1 AS VARCHAR(10))
FROM C AS C2
WHERE C1.Item = C2.Item
ORDER BY C2.Col1
FOR XML PATH(''), TYPE
).value('substring(text()[1], 2)', 'VARCHAR(MAX)') AS Col2
FROM C AS C1
GROUP BY C1.Item
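The query above splits each comma-separated list into items and then re-aggregates the ids per item. If it helps to see that logic outside SQL Server, here is a rough Python sketch of the same two steps. The function name invert_csv_mapping is hypothetical, and this is not a replacement for the dbo.SplitStrings function the answer relies on.

```python
from collections import defaultdict

def invert_csv_mapping(table2):
    """Turn (id, 'a,b,c') rows into {item: 'id1,id2,...'} with ids in ascending order."""
    ids_by_item = defaultdict(list)
    for row_id, csv_items in table2:
        # Step 1: split the delimited column into individual items
        for item in csv_items.split(","):
            ids_by_item[item.strip()].append(row_id)
    # Step 2: re-aggregate the ids per item as a comma-separated string
    return {
        item: ",".join(str(i) for i in sorted(ids))
        for item, ids in sorted(ids_by_item.items())
    }

table2 = [(1, "a,b"), (2, "a,c"), (3, "a,b,c")]
inverted = invert_csv_mapping(table2)
print(inverted)  # {'a': '1,2,3', 'b': '1,3', 'c': '2,3'}
```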
Try this:
NOTE: Not tested
select col1, [col2],
(select col1+',' from Table2 where Col2=ID
group by col1 for xml path('')) AS Col2
From Table1
I have a denormalized table where, for each row, I have to count how many of the columns hold the same value as a given column.
I'm using the InfiniDB MySQL storage engine.
This is my Table:
col1 | col2 | col3
------------------
A | B | B
A | B | C
A | A | A
This is what i expect:
col1Values | col2Values | col3Values
------------------------------------
1 | 2 | 2 -- Because B is in Col2 and Col3
1 | 1 | 1
3 | 3 | 3
Is there something like
-- function count_values(needle, haystack1, ...haystackN)
select count_values(col1, col1, col2, col3) as col1values -- col1 is needle
, count_values(col2, col1, col2, col3) as col2values -- col2 is needle
, count_values(col3, col1, col2, col3) as col3values -- col3 is needle
from table
or am I missing something simple that will do the trick? :-)
Thanks in advance
Roman
select
CASE WHEN col1 = col2 and col1=col3 THEN '3'
WHEN col1 = col2 or col1=col3 THEN '2'
WHEN col1 != col2 and col1!=col3 THEN '1'
ELSE '0' END AS col1_values,
CASE WHEN col2 = col1 and col2=col3 THEN '3'
WHEN col2 = col1 or col2=col3 THEN '2'
WHEN col2 != col1 and col2!=col3 THEN '1'
ELSE '0' END AS col2_values,
CASE WHEN col3 = col1 and col3=col2 THEN '3'
WHEN col3 = col1 or col3=col2 THEN '2'
WHEN col3 != col1 and col3!=col2 THEN '1'
ELSE '0' END AS col3_values
FROM table_name
Assuming the table has got a key, you could:
Unpivot the table.
Join the unpivoted dataset back to the original.
For every column in the original, count matches against the unpivoted column.
Here's how the above could be implemented:
SELECT
COUNT(t.col1 = s.col OR NULL) AS col1Values,
COUNT(t.col2 = s.col OR NULL) AS col2Values,
COUNT(t.col3 = s.col OR NULL) AS col3Values
FROM atable t
INNER JOIN (
SELECT
t.id,
CASE colind
WHEN 1 THEN t.col1
WHEN 2 THEN t.col2
WHEN 3 THEN t.col3
END AS col
FROM atable t
CROSS JOIN (SELECT 1 AS colind UNION ALL SELECT 2 UNION ALL SELECT 3) x
) s ON t.id = s.id
GROUP BY t.id
;
The subquery uses a cross join to unpivot the table. The id column is a key column. The OR NULL bit turns a false comparison into NULL, which COUNT() ignores, so only the matches are counted.
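The unpivot-and-count query can be tried on the question's three rows with Python's sqlite3 module. Treat this as a sketch: it assumes the comparison with OR NULL evaluates in SQLite the way it does in MySQL, and the id key column and the ORDER BY are additions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is the assumed key column; SQLite fills it with 1, 2, 3
conn.execute("CREATE TABLE atable (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany(
    "INSERT INTO atable (col1, col2, col3) VALUES (?, ?, ?)",
    [("A", "B", "B"), ("A", "B", "C"), ("A", "A", "A")],
)

sql = """
SELECT
  COUNT(t.col1 = s.col OR NULL) AS col1Values,
  COUNT(t.col2 = s.col OR NULL) AS col2Values,
  COUNT(t.col3 = s.col OR NULL) AS col3Values
FROM atable t
INNER JOIN (
  -- unpivot: one output row per (id, column value)
  SELECT t.id,
         CASE x.colind WHEN 1 THEN t.col1 WHEN 2 THEN t.col2 WHEN 3 THEN t.col3 END AS col
  FROM atable t
  CROSS JOIN (SELECT 1 AS colind UNION ALL SELECT 2 UNION ALL SELECT 3) x
) s ON t.id = s.id
GROUP BY t.id
ORDER BY t.id
"""
counts = conn.execute(sql).fetchall()
print(counts)  # [(1, 2, 2), (1, 1, 1), (3, 3, 3)]
```

The three tuples line up with the expected output in the question, one per source row.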
I have found a different, very very simple solution :-)
select if(col1=col1,1,0) + if(col2=col1,1,0) + if(col3=col1,1,0) as col1values -- col1 is needle
from table