I have a table that has columns Name, Series and Season.
Name series season
abc alpha s1
abc alpha s2
pqr alpha s1
xyz beta s2
xyz gamma s3
abc theta s1
I am trying to extract the number of people who have watched only the series 'alpha' and no other series.
How do I get this count?
With the condition where series = 'alpha', I get the count of people who watched alpha, but not the count of those who watched only alpha. For example, abc has watched alpha as well as theta, but pqr has watched only alpha.
You could use a subquery to get only the names that have watched exactly one distinct series, and then filter for your specific series in the WHERE condition:
select count(distinct yt.name) as only_alpha -- distinct: one person may have several alpha seasons
from yourtable yt
inner join ( select name
from yourtable
group by name
having count(distinct series) = 1
) yt1 on yt.name=yt1.name
where yt.series='alpha';
https://dbfiddle.uk/n0PavP4H
You can use a subquery, like this:
SELECT COUNT(DISTINCT name)
FROM t
WHERE name NOT IN (
SELECT DISTINCT name
FROM t
WHERE series<>'alpha'
) AND series='alpha'
If you really want to get the number of such people only, without their name or any further information, you can use a NOT EXISTS clause like this:
SELECT COUNT(*) AS desiredColumnName
FROM yourtable y
WHERE series = 'alpha'
AND NOT EXISTS
(SELECT 1 FROM yourtable y1 WHERE y.name = y1.name AND series <> 'alpha');
This expresses the condition that the person must not appear in any series other than 'alpha'.
If the same name can occur in several seasons, you can add DISTINCT so that each person is counted only once. Only do this if it is really required, because it can slow down the query:
SELECT COUNT(DISTINCT name) AS desiredColumnName
FROM yourtable y
WHERE series = 'alpha'
AND NOT EXISTS
(SELECT 1 FROM yourtable y1 WHERE y.name = y1.name AND series <> 'alpha');
If your description is incomplete and you also need other information, you might do better with a GROUP BY clause, etc.
Try it out here: db<>fiddle
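A minimal sketch for checking the NOT EXISTS query with Python's built-in sqlite3 module (an assumption: SQLite stands in for your database, which works here because the query is plain standard SQL). The sample rows are hypothetical: they add a person, fgh, who watched alpha in two seasons, which is exactly the case where DISTINCT changes the result.

```python
import sqlite3

# Hypothetical sample rows: (name, series, season).
# fgh watched alpha in two seasons, so a plain COUNT(*) would count fgh twice.
ROWS = [
    ("abc", "alpha", "s1"), ("abc", "alpha", "s2"), ("abc", "theta", "s1"),
    ("fgh", "alpha", "s1"), ("fgh", "alpha", "s2"),
    ("pqr", "alpha", "s1"),
    ("xyz", "beta", "s2"), ("xyz", "gamma", "s3"),
]

QUERY_TEMPLATE = """
SELECT {count_expr}
FROM yourtable y
WHERE series = ?
  AND NOT EXISTS (SELECT 1
                  FROM yourtable y1
                  WHERE y1.name = y.name
                    AND y1.series <> ?)
"""


def count_only_series(series, distinct=True, rows=ROWS):
    """Count people who watched `series` and no other series (NOT EXISTS form)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourtable (name TEXT, series TEXT, season TEXT)")
    con.executemany("INSERT INTO yourtable VALUES (?, ?, ?)", rows)
    count_expr = "COUNT(DISTINCT name)" if distinct else "COUNT(*)"
    (n,) = con.execute(QUERY_TEMPLATE.format(count_expr=count_expr),
                       (series, series)).fetchone()
    con.close()
    return n


print(count_only_series("alpha"))                  # 2 (fgh and pqr)
print(count_only_series("alpha", distinct=False))  # 3 (fgh counted once per season)
```

The series name is passed as a bound parameter rather than pasted into the SQL, so the same function can be reused for any series.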
You can use something like the following:
select sum(Record) as Count from (select count(*) as Record from yourtable where series='alpha'
group by series,name having count(*)=1) as data;
Check the link below:
https://dbfiddle.uk/1Y3WZT23
I added some new cases to your table to see other anomalies that can happen. This is how it looks now:
Name series season
abc alpha s1
abc alpha s2
abc theta s1
fgh alpha s1
fgh alpha s2
klj gamma s1
klj gamma s2
klj gamma s3
pqr alpha s1
xyz beta s2
xyz gamma s3
I may have overcomplicated this, but it gives the correct result for your problem; you just need to COUNT() it. I tested the other SQL queries posted under your question on this table, and not all of them produced the correct figure. I don't recommend using NOT IN (subquery), for the following reasons:
Strange results when running SQL IN and NOT IN using Redshift
Optimization for Large IN Lists
Consider using EXISTS instead of IN with a subquery
Please find my code here:
WITH helper_table as
(
SELECT "Name",
series,
count(1) as seasons_watched
FROM your_table
GROUP BY 1, 2
)
------------------------------------------------------------
SELECT t1."Name"
FROM
(
SELECT "Name",
count(1) as is_watched_at_least_one_series_of_alpha
FROM helper_table
WHERE series = 'alpha'
GROUP BY 1
) t1
INNER JOIN
(
SELECT "Name",
count(1) as watched_exactly_one_series_so_far
FROM helper_table
GROUP BY 1
HAVING count(1) = 1
) t2
ON t2."Name" = t1."Name"
;
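To turn the name list into the requested number, the final SELECT can be wrapped in COUNT(*). The sketch below runs the same CTE against the extended table from this answer using Python's sqlite3 module (an assumption: SQLite stands in for Redshift; it supports WITH and positional GROUP BY 1). The unused count columns from the original query are dropped for brevity.

```python
import sqlite3

# The extended sample table from this answer: ("Name", series, season).
ROWS = [
    ("abc", "alpha", "s1"), ("abc", "alpha", "s2"), ("abc", "theta", "s1"),
    ("fgh", "alpha", "s1"), ("fgh", "alpha", "s2"),
    ("klj", "gamma", "s1"), ("klj", "gamma", "s2"), ("klj", "gamma", "s3"),
    ("pqr", "alpha", "s1"),
    ("xyz", "beta", "s2"), ("xyz", "gamma", "s3"),
]

# One row per (person, series) pair.
CTE = """
WITH helper_table AS (
    SELECT "Name", series, COUNT(1) AS seasons_watched
    FROM your_table
    GROUP BY 1, 2
)
"""

# Names that watched alpha AND watched exactly one series overall.
NAMES = """
SELECT t1."Name"
FROM (SELECT "Name" FROM helper_table WHERE series = 'alpha' GROUP BY 1) t1
INNER JOIN (SELECT "Name" FROM helper_table GROUP BY 1 HAVING COUNT(1) = 1) t2
        ON t2."Name" = t1."Name"
"""


def only_alpha(rows=ROWS):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE your_table ("Name" TEXT, series TEXT, season TEXT)')
    con.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    names = sorted(name for (name,) in con.execute(CTE + NAMES))
    # Same query wrapped in COUNT(*) to get the number of people.
    (total,) = con.execute(CTE + "SELECT COUNT(*) FROM (" + NAMES + ") AS x").fetchone()
    con.close()
    return names, total


print(only_alpha())  # (['fgh', 'pqr'], 2)
```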
Hope it helps!
Related
(This question is specific to MySQL 5.6, which does not include CTEs)
Let's say I have a table like this (the actual table is built from a subquery with several joins and is a fair bit more complex):
ID NAME POWER ACCESS_LEVEL DEPTH
1 Odin poetry 1 0
1 Isis song 2 1
2 Enki water 1 0
2 Zeus storms 2 2
2 Thor hammer 2 3
I want to first group them up by ID (it's actually a double grouping by PRINCIPAL_TYPE, PRINCIPAL_ID if that matters), then select one row from each group, preferring the row with the highest ACCESS_LEVEL, and among rows with the same access, choosing the one with the lowest depth. No two rows will have the same (ID, DEPTH) so we don't need to worry beyond that point.
For example with the above table, we select:
ID NAME POWER ACCESS_LEVEL DEPTH
1 Isis song 2 1
2 Zeus storms 2 2
The groups are (Odin, Isis) and (Enki, Thor, Zeus). In the first group, we prefer Isis over Odin because Isis has a higher ACCESS_LEVEL. In the second group, we take Zeus because Zeus and Thor have higher ACCESS_LEVELs than Enki, and between those two, Zeus has the lower depth.
Worst case Ontario, I can do it at the application level, but doing it at the database level allows using LIMIT and ORDER BY for paging, instead of fetching the whole result set.
Here is one method that uses a correlated subquery:
select t.*
from t
where (access_level, depth) = (select access_level, depth
from t t2
where t2.id = t.id
order by access_level desc, depth asc
limit 1
);
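The same idea can be tried in Python's sqlite3 module with the question's sample rows. This sketch swaps in a slightly simpler variant (an assumption, not the answer's exact query): because the question guarantees that no two rows share (ID, DEPTH), comparing depth alone against the correlated subquery is enough to identify the preferred row, so no row-value comparison is needed.

```python
import sqlite3

# Sample rows from the question: (id, name, power, access_level, depth).
ROWS = [
    (1, "Odin", "poetry", 1, 0),
    (1, "Isis", "song", 2, 1),
    (2, "Enki", "water", 1, 0),
    (2, "Zeus", "storms", 2, 2),
    (2, "Thor", "hammer", 2, 3),
]

# For each row, the correlated subquery finds the depth of the preferred row
# in the same id group (highest access_level first, then lowest depth).
QUERY = """
SELECT t.*
FROM t
WHERE t.depth = (SELECT t2.depth
                 FROM t t2
                 WHERE t2.id = t.id
                 ORDER BY t2.access_level DESC, t2.depth ASC
                 LIMIT 1)
ORDER BY t.id
"""


def preferred_rows(rows=ROWS):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (id INTEGER, name TEXT, power TEXT, "
                "access_level INTEGER, depth INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in preferred_rows():
    print(row)
```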
You could use a grouped derived table based on a subquery:
select m2.*
from my_table m2
inner join (
select m.id, t.level, min(m.depth) depth
from my_table m
inner join (
select id, max(ACCESS_LEVEL) level
from my_table
group by id ) t on t.id = m.id and t.level = m.access_level
group by m.id, t.level ) t2 on t2.id = m2.id
and t2.depth = m2.depth and t2.level = m2.access_level;
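A sketch that runs this derived-table approach against the question's sample rows, using Python's sqlite3 module as a stand-in for MySQL (an assumption). The query below includes the FROM before my_table m2, the comma in GROUP BY m.id, t.level, and table-qualified id columns, all of which MySQL needs for the statement to parse.

```python
import sqlite3

# Sample rows from the question: (id, name, power, access_level, depth).
ROWS = [
    (1, "Odin", "poetry", 1, 0),
    (1, "Isis", "song", 2, 1),
    (2, "Enki", "water", 1, 0),
    (2, "Zeus", "storms", 2, 2),
    (2, "Thor", "hammer", 2, 3),
]

QUERY = """
SELECT m2.*
FROM my_table m2
INNER JOIN (
    -- lowest depth among the rows that have the group's max access_level
    SELECT m.id, t.level, MIN(m.depth) AS depth
    FROM my_table m
    INNER JOIN (
        -- highest access_level per id
        SELECT id, MAX(access_level) AS level
        FROM my_table
        GROUP BY id
    ) t ON t.id = m.id AND t.level = m.access_level
    GROUP BY m.id, t.level
) t2 ON t2.id = m2.id
    AND t2.depth = m2.depth
    AND t2.level = m2.access_level
ORDER BY m2.id
"""


def preferred_rows_grouped(rows=ROWS):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (id INTEGER, name TEXT, power TEXT, "
                "access_level INTEGER, depth INTEGER)")
    con.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(preferred_rows_grouped())
```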
This may work.
select *
from your_table t1
where (t1.access_level, t1.depth) = (
select t2.access_level, min(t2.depth)
from your_table t2
where (t2.access_level, t2.id) = (
select max(t3.access_level), t3.id
from your_table t3
where t3.id = t1.id
)
)
However, I feel that Gordon's solution will probably lead to a better query plan.
Here is my testing table data:
Testing
ID Name Payment_Date Fee Amt
1 BankA 2016-04-01 100 20000
2 BankB 2016-04-02 200 10000
3 BankA 2016-04-03 100 20000
4 BankB 2016-04-04 300 20000
I am trying to compare the Name, Fee and Amt fields of each record to see whether another record has the same values. If it does, I'd like to mark those records with something like 'Y'. Here is the expected result:
ID Name Payment_Date Fee Amt SameDataExistYN
1 BankA 2016-04-01 100 20000 Y
2 BankB 2016-04-02 200 10000 N
3 BankA 2016-04-03 100 20000 Y
4 BankB 2016-04-04 300 20000 N
I have tried the two methods below, but I am looking for other solutions so I can pick the best one for my work.
Method 1.
select t.*, iif((select count(*) from testing where name=t.name and fee=t.fee and amt=t.amt)=1,'N','Y') as SameDataExistYN from testing t
Method 2.
select t.*, case when ((b.Name = t.Name)
and (b.Fee = t.Fee) and (b.Amt = t.Amt)) then 'Y' else 'N' end as SameDataExistYN
from testing t
left join ( select Name, Fee, Amt
from testing
Group By Name, Fee, Amt
Having count(*)>1 ) as b on b.Name = t.Name
and b.Fee = t.Fee
and b.Amt = t.Amt
There are several approaches, with differences in performance characteristics.
One option is to run a correlated subquery. This approach is best suited if you have a suitable index, and you are pulling a relatively small number of rows.
SELECT t.id
, t.name
, t.payment_date
, t.fee
, t.amt
, ( SELECT 'Y'
FROM testing s
WHERE s.name = t.name
AND s.fee = t.fee
AND s.amt = t.amt
AND s.id <> t.id
LIMIT 1
) AS SameDataExist
FROM testing t
WHERE ...
LIMIT ...
The correlated subquery in the SELECT list returns 'Y' when at least one "matching" row is found. If no matching row is found, the SameDataExist column will be NULL. To convert the NULL to an 'N', you could wrap the subquery in an IFNULL() function.
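A minimal sketch of wrapping the correlated subquery in IFNULL() so that records without a match show 'N', run against the question's sample rows in Python's sqlite3 module (an assumption: SQLite also provides IFNULL and allows LIMIT in a scalar subquery). The WHERE ... and LIMIT ... placeholders from the answer are left out.

```python
import sqlite3

# Sample rows from the question: (id, name, payment_date, fee, amt).
ROWS = [
    (1, "BankA", "2016-04-01", 100, 20000),
    (2, "BankB", "2016-04-02", 200, 10000),
    (3, "BankA", "2016-04-03", 100, 20000),
    (4, "BankB", "2016-04-04", 300, 20000),
]

# The scalar subquery yields 'Y' if another row has the same (name, fee, amt),
# otherwise NULL; IFNULL turns that NULL into 'N'.
QUERY = """
SELECT t.id,
       IFNULL((SELECT 'Y'
               FROM testing s
               WHERE s.name = t.name
                 AND s.fee = t.fee
                 AND s.amt = t.amt
                 AND s.id <> t.id
               LIMIT 1), 'N') AS SameDataExist
FROM testing t
ORDER BY t.id
"""


def duplicate_flags(rows=ROWS):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE testing (id INTEGER PRIMARY KEY, name TEXT, "
                "payment_date TEXT, fee INTEGER, amt INTEGER)")
    con.executemany("INSERT INTO testing VALUES (?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(duplicate_flags())  # [(1, 'Y'), (2, 'N'), (3, 'Y'), (4, 'N')]
```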
Your method 2 is a workable approach. The expression in the SELECT list doesn't need to repeat all those comparisons, because they have already been done in the join predicates. All you need to know is whether a matching row was found, and testing one of the columns for NULL/NOT NULL is sufficient.
SELECT t.id
, t.name
, t.payment_date
, t.fee
, t.amt
, IF(s.name IS NOT NULL,'Y','N') AS SameDataExists
FROM testing t
LEFT
JOIN ( -- tuples that occur in more than one row
SELECT r.name, r.fee, r.amt
FROM testing r
GROUP BY r.name, r.fee, r.amt
HAVING COUNT(1) > 1
) s
ON s.name = t.name
AND s.fee = t.fee
AND s.amt = t.amt
WHERE ...
You could also make use of EXISTS (a correlated subquery).
Check this out
Select statement to find duplicates on certain fields
Not sure how to mark this as a dupe...
Here is another method, but you will have to run tests on your data to find out which one is best:
SELECT
t.*,
CASE WHEN EXISTS(
SELECT * FROM testing WHERE id <> t.id AND Name = t.Name AND Fee = t.Fee AND Amt = t.Amt
) THEN 'Y' ELSE 'N' END SameDataExistYN
FROM
testing t
;
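Since testing on your own data is the deciding factor, a small harness like the following (an assumption: Python's sqlite3 module standing in for the real database) can first confirm that two approaches agree. It runs the EXISTS query above and the LEFT JOIN against grouped duplicates from method 2; CASE is used instead of MySQL's IF() so the SQL also runs in SQLite.

```python
import sqlite3

# Sample rows from the question: (id, name, payment_date, fee, amt).
ROWS = [
    (1, "BankA", "2016-04-01", 100, 20000),
    (2, "BankB", "2016-04-02", 200, 10000),
    (3, "BankA", "2016-04-03", 100, 20000),
    (4, "BankB", "2016-04-04", 300, 20000),
]

# CASE WHEN EXISTS (correlated subquery), as in the answer above.
EXISTS_QUERY = """
SELECT t.id,
       CASE WHEN EXISTS (SELECT * FROM testing
                         WHERE id <> t.id AND name = t.name
                           AND fee = t.fee AND amt = t.amt)
            THEN 'Y' ELSE 'N' END AS SameDataExistYN
FROM testing t
ORDER BY t.id
"""

# Method 2: LEFT JOIN to the duplicated tuples, then a single IS NOT NULL test.
JOIN_QUERY = """
SELECT t.id,
       CASE WHEN s.name IS NOT NULL THEN 'Y' ELSE 'N' END AS SameDataExistYN
FROM testing t
LEFT JOIN (SELECT name, fee, amt
           FROM testing
           GROUP BY name, fee, amt
           HAVING COUNT(*) > 1) s
       ON s.name = t.name AND s.fee = t.fee AND s.amt = t.amt
ORDER BY t.id
"""


def compare_methods(rows=ROWS):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE testing (id INTEGER PRIMARY KEY, name TEXT, "
                "payment_date TEXT, fee INTEGER, amt INTEGER)")
    con.executemany("INSERT INTO testing VALUES (?, ?, ?, ?, ?)", rows)
    by_exists = con.execute(EXISTS_QUERY).fetchall()
    by_join = con.execute(JOIN_QUERY).fetchall()
    con.close()
    return by_exists, by_join


by_exists, by_join = compare_methods()
print(by_exists == by_join)  # True
```

Agreement on a small sample is not proof of equivalence, and timings measured in SQLite say nothing about MySQL's query plans, so performance still has to be measured on the real server.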
select t.name, t.fee, t.amt, if(count(*) > 1, 'Y', 'N') as SameDataExistYN from testing t group by t.name, t.fee, t.amt;
I am facing a problem with a MySQL query that is a variant of "ID for row with max value". All my attempts produce either an error or an incorrect result.
Here is the table structure
Row_id
Group_id
Grp_col1
Grp_col2
Field_for_aggregate_func
Another_field_for_row
For all rows with a particular group_id, I want to group by the fields Grp_col1 and Grp_col2, get the max value of Field_for_aggregate_func, and then get the corresponding value of Another_field_for_row.
The query I have tried looks like this:
SELECT c.*
FROM mytable as c left outer join mytable as c1
on (
c.group_id=c1.group_id and
c.Grp_col1 = c1.Grp_col1 and
c.Grp_col2 = c1.Grp_col2 and
c.Field_for_aggregate_func > c1.Field_for_aggregate_func
)
where c.group_id=2
Among the alternative solutions for this problem, I want a high-performance one, as this will be used on a large data set.
EDIT: Here is a sample set of rows and the expected answer:
Group_ID Grp_col1 Grp_col2 Field_for_aggregate_func Another_field_for_row
2 -- N 12/31/2015 35
2 -- N 1/31/2016 15 select 15 from group for max value 1/31/2016
2 -- Y 12/31/2015 5
2 -- Y 1/1/2016 15
2 -- Y 1/2/2016 25
2 -- Y 1/3/2016 30 select 30 from group for max value 1/3/2016
You can use a sub-query to find the maximums, then join that with the original table, along the lines of:
select m1.group_id, m1.grp_col1, m1.grp_col2, m1.another_field_for_row, max_value
from mytable m1, (
select group_id, grp_col1, grp_col2, max(field_for_aggregate_func) as max_value
from mytable
group by group_id, grp_col1, grp_col2) as m2
where m1.group_id=m2.group_id
and m1.grp_col1=m2.grp_col1
and m1.grp_col2=m2.grp_col2
and m1.field_for_aggregate_func=m2.max_value;
Watch out for cases where more than one row has the max_value for a given grouping: you'll get multiple rows for that grouping. Fiddle here.
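A minimal sketch of this join against the question's sample rows, using Python's sqlite3 module (an assumption). One further assumption: the dates are stored as ISO-8601 strings (YYYY-MM-DD). MAX() over text such as 12/31/2015 would compare the strings character by character, so "12/31/2015" would beat "1/31/2016"; in MySQL a DATE column avoids that problem.

```python
import sqlite3

# Sample rows from the question, dates rewritten as ISO strings:
# (group_id, grp_col1, grp_col2, field_for_aggregate_func, another_field_for_row)
ROWS = [
    (2, "--", "N", "2015-12-31", 35),
    (2, "--", "N", "2016-01-31", 15),
    (2, "--", "Y", "2015-12-31", 5),
    (2, "--", "Y", "2016-01-01", 15),
    (2, "--", "Y", "2016-01-02", 25),
    (2, "--", "Y", "2016-01-03", 30),
]

# Join each row to its group's maximum and keep only the rows that hit it.
QUERY = """
SELECT m1.group_id, m1.grp_col1, m1.grp_col2, m1.another_field_for_row, m2.max_value
FROM mytable m1
JOIN (SELECT group_id, grp_col1, grp_col2,
             MAX(field_for_aggregate_func) AS max_value
      FROM mytable
      GROUP BY group_id, grp_col1, grp_col2) AS m2
  ON m1.group_id = m2.group_id
 AND m1.grp_col1 = m2.grp_col1
 AND m1.grp_col2 = m2.grp_col2
 AND m1.field_for_aggregate_func = m2.max_value
WHERE m1.group_id = ?
ORDER BY m1.grp_col2
"""


def latest_per_group(group_id, rows=ROWS):
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mytable (row_id INTEGER PRIMARY KEY, group_id INTEGER, "
        "grp_col1 TEXT, grp_col2 TEXT, field_for_aggregate_func TEXT, "
        "another_field_for_row INTEGER)"
    )
    con.executemany(
        "INSERT INTO mytable (group_id, grp_col1, grp_col2, "
        "field_for_aggregate_func, another_field_for_row) VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    result = con.execute(QUERY, (group_id,)).fetchall()
    con.close()
    return result


print(latest_per_group(2))
```

The expected 15 and 30 from the question come back as another_field_for_row; an index on (group_id, grp_col1, grp_col2, field_for_aggregate_func) would be the natural thing to try for the large-data case, though that is untested here.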
Try this.
See Fiddle demo here
http://sqlfiddle.com/#!9/9a3c26/8
Select t1.* from table1 t1 inner join
(
Select a.group_id,a.grp_col2,
A.Field_for_aggregate_func,
count(*) as rnum from table1 a
Inner join table1 b
On a.group_id=b.group_id
And a.grp_col2=b.grp_col2
And a.Field_for_aggregate_func
<=b.Field_for_aggregate_func
Group by a.group_id,
a.grp_col2,
a.Field_for_aggregate_func) t2
On t1.group_id=t2.group_id
And t1.grp_col2=t2.grp_col2
And t1.Field_for_aggregate_func
=t2.Field_for_aggregate_func
And t2.rnum=1
Here I first assign a row number in descending order of date, then select all the records for that date.
I hope you can help me with this one. I've been looking for ways to set up a MySQL query that selects rows based on the number of times a certain value occurs, but have had no luck so far. I'm pretty sure I need to use count(*) somewhere, but I could only find how to count all values or all distinct values, not how to count the occurrences of each value.
I have a table as such:
info setid
-- --
A 1
B 1
C 2
D 1
E 2
F 3
G 1
H 3
What I need is a query that selects all the rows where the setid occurs a certain number (x) of times.
So using x=2 should give me
C 2
E 2
F 3
H 3
because both setIds 2 and 3 each occur two times. Using x=1 or x = 3 should not give any results, and choosing x=4 should give me
A 1
B 1
D 1
G 1
Because only setid 1 occurs 4 times.
I hope you guys can help me. At this point I've been looking for the answer for so long that I'm not even sure this can be done in MySQL anymore. :)
select * from mytable
where setid in (
select setid from mytable
group by setid
having count(*) = 2
)
You can specify the number of times a setid needs to occur in the table in the having count(*) part of the subquery.
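A minimal sketch that makes x a bound parameter of the HAVING clause, run against the question's sample table in Python's sqlite3 module (an assumption; the SQL itself is the same as above apart from the placeholder and an ORDER BY for stable output).

```python
import sqlite3

# Sample table from the question: (info, setid).
ROWS = [("A", 1), ("B", 1), ("C", 2), ("D", 1),
        ("E", 2), ("F", 3), ("G", 1), ("H", 3)]

# The subquery picks the setids that occur exactly x times.
QUERY = """
SELECT info, setid
FROM mytable
WHERE setid IN (SELECT setid
                FROM mytable
                GROUP BY setid
                HAVING COUNT(*) = ?)
ORDER BY info
"""


def rows_with_setid_count(x, rows=ROWS):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (info TEXT, setid INTEGER)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = con.execute(QUERY, (x,)).fetchall()
    con.close()
    return result


print(rows_with_setid_count(2))  # [('C', 2), ('E', 2), ('F', 3), ('H', 3)]
print(rows_with_setid_count(4))  # [('A', 1), ('B', 1), ('D', 1), ('G', 1)]
```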
Consider the following statement that uses an uncorrelated subquery:
SELECT ... FROM t1 WHERE t1.a IN (SELECT b FROM t2);
The optimizer rewrites the statement to a correlated subquery:
SELECT ... FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.b = t1.a);
If the inner and outer queries return M and N rows, respectively, the execution time becomes on the order of O(M×N), rather than O(M+N) as it would be for an uncorrelated subquery.
But in this case the subquery in Fuzzy Tree's solution is completely superfluous:
SELECT
setid,
GROUP_CONCAT(info ORDER BY info) infos,
COUNT(*) total
FROM
tablename
GROUP BY setid
HAVING COUNT(*) = 2;
I've got a database where each entry is an edge with a source tag, a relationship and a weight. I want to perform a query where given a source tag, I get the top n edges by weight with that source tag per relationship.
For example, given the entries
Id Source Relationship End Weight
-----------------------------------------
1 cat isA feline 56
2 cat isA animal 12
3 cat isA pet 37
4 cat desires food 5
5 cat desires play 88
6 dog isA canine 72
If I queried using "cat" as a source and n=2, the result should be
Id Source Relationship End Weight
-----------------------------------------
1 cat isA feline 56
3 cat isA pet 37
4 cat desires food 5
5 cat desires play 88
I've tried several different approaches based on other questions.
The most successful so far is based on How to SELECT the newest four items per category?
SELECT *
FROM tablename t1
JOIN tablename t2 ON (t1.relationship = t2.relationship)
LEFT OUTER JOIN tablename t3
ON (t1.relationship = t3.relationship AND t2.weight < t3.weight)
WHERE t1.source = "cat"
AND t3.relationship IS NULL
ORDER BY t2.weight DESC;
However, this returns all the edges with source="cat" in sorted order. If I try to add LIMIT, I get the edges with the top weights overall, not per group.
The other thing that I have tried is
SELECT *
FROM tablename t1
WHERE t1.source="cat"
AND (
SELECT COUNT(*)
FROM tablename t2
WHERE t1.relationship = t2.relationship
AND t1.weight <= t2.weight
) <= 2;
This returns
Id Source Relationship End Weight
-----------------------------------------
1 cat isA feline 56
4 cat desires food 5
5 cat desires play 88
Edge 3 is missing because edge 6 has a higher weight for the isA relationship and is counted by the subquery, even though edge 6 itself is excluded from the results because its source is "dog".
I am very new to databases, so if I need to take a completely different approach, let me know. I'm not afraid of starting over.
Doing this with the correlated subquery is indeed inefficient, because MySQL has to run the subquery for every row of the outer query, just to decide if the row in the outer query meets the conditions. That's a lot of overhead.
Here's a method using no subquery:
SELECT t1.*
FROM tablename t1
JOIN tablename t2 ON t1.source = t2.source and t1.relationship = t2.relationship
AND t1.weight <= t2.weight
WHERE t1.source = 'cat'
GROUP BY t1.id
HAVING COUNT(*) <= 2;
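A quick way to check the self-join version is to run it in Python's sqlite3 module with the question's sample edges (an assumption; SQLite accepts selecting t1's columns while grouping by its primary key t1.id). The End column is quoted as "end" because END is an SQL keyword, and n is passed as a parameter instead of the literal 2.

```python
import sqlite3

# Sample edges from the question: (id, source, relationship, end, weight).
ROWS = [
    (1, "cat", "isA", "feline", 56),
    (2, "cat", "isA", "animal", 12),
    (3, "cat", "isA", "pet", 37),
    (4, "cat", "desires", "food", 5),
    (5, "cat", "desires", "play", 88),
    (6, "dog", "isA", "canine", 72),
]

# For each t1 row, COUNT(*) is the number of same-source, same-relationship
# edges weighing at least as much, i.e. its rank; keep ranks <= n.
QUERY = """
SELECT t1.id, t1.source, t1.relationship, t1."end", t1.weight
FROM tablename t1
JOIN tablename t2
  ON t1.source = t2.source
 AND t1.relationship = t2.relationship
 AND t1.weight <= t2.weight
WHERE t1.source = ?
GROUP BY t1.id
HAVING COUNT(*) <= ?
ORDER BY t1.id
"""


def top_n_ids(source, n, rows=ROWS):
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE tablename (id INTEGER PRIMARY KEY, source TEXT, '
                'relationship TEXT, "end" TEXT, weight INTEGER)')
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?)", rows)
    ids = [row[0] for row in con.execute(QUERY, (source, n))]
    con.close()
    return ids


print(top_n_ids("cat", 2))  # [1, 3, 4, 5]
```

Because the join also matches on source, the dog edge no longer inflates the rank of cat's isA edges, which was the problem with the question's second attempt.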
And here's a method using neither subquery, nor joins/group by:
SELECT *
FROM (
SELECT tablename.*, IF(@r = relationship, @n:=@n+1, @n:=1) AS _n,
@r:=relationship AS _r
FROM (SELECT @r:=null, @n:=1) _init, tablename
WHERE source = 'cat'
ORDER BY relationship, weight DESC
) AS _t
WHERE _n <= 2;
These solutions also need some tiebreaker in case there are multiple rows with the same top weights. But that applies to all the solutions.
The simpler solution, which wouldn't require special gymnastics or tiebreakers, is to use SQL window functions like ROW_NUMBER() OVER (PARTITION BY relationship), but MySQL does not support these before version 8.0.
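For comparison, here is a sketch of the window-function approach, run in Python's sqlite3 module (an assumption: this needs SQLite 3.25 or later, and MySQL accepts the same syntax from 8.0). Adding id to the ORDER BY inside OVER() is one possible tiebreaker for equal weights.

```python
import sqlite3

# Sample edges from the question: (id, source, relationship, end, weight).
ROWS = [
    (1, "cat", "isA", "feline", 56),
    (2, "cat", "isA", "animal", 12),
    (3, "cat", "isA", "pet", 37),
    (4, "cat", "desires", "food", 5),
    (5, "cat", "desires", "play", 88),
    (6, "dog", "isA", "canine", 72),
]

# Number the edges within each relationship by descending weight
# (id breaks ties), then keep the first n of each.
QUERY = """
SELECT id, source, relationship, "end", weight
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY t.relationship
                              ORDER BY t.weight DESC, t.id) AS rn
    FROM tablename t
    WHERE t.source = ?
) AS ranked
WHERE rn <= ?
ORDER BY id
"""


def top_n_window(source, n, rows=ROWS):
    if sqlite3.sqlite_version_info < (3, 25, 0):
        raise RuntimeError("window functions need SQLite 3.25 or later")
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE tablename (id INTEGER PRIMARY KEY, source TEXT, '
                'relationship TEXT, "end" TEXT, weight INTEGER)')
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?)", rows)
    ids = [row[0] for row in con.execute(QUERY, (source, n))]
    con.close()
    return ids


print(top_n_window("cat", 2))  # [1, 3, 4, 5]
```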
It won't be too efficient, but MySQL allows you to do something like this:
SELECT t1.*
FROM
tablename t1 INNER JOIN (
SELECT SUBSTRING_INDEX(
GROUP_CONCAT(Id ORDER BY Weight DESC),
',',
2) top_2
FROM tablename
WHERE Source='cat'
GROUP BY Relationship) t2
ON FIND_IN_SET(t1.id, t2.top_2);
Please see fiddle here.