selecting multiple records where count is some value - mysql

There is a huge database with more than 500k values, but with only one table having all the data. I need to extract some of it for a given condition.
Table structure is like this,
column_a | column_b
A | 30
A | 40
A | 70
B | 25
B | 45
C | 10
C | 15
C | 25
I need to extract all the rows whose column_a value occurs exactly three times, i.e. count(column_a) = 3. The catch is that I need to get all three records for each such value, like this:
column_a | column_b
A | 30
A | 40
A | 70
C | 10
C | 15
C | 25
I have tried to do this with a query like this
select column_a, column_b from table group by column_a having count(*)=3;
Here I get the correct values for column_a, but only one record from each group.
Thanks in advance,
Bhashithe

One approach is to INNER JOIN your original table to a subquery which identifies the column_a records which come in groups of exactly 3.
SELECT t1.column_a, t1.column_b
FROM table t1
INNER JOIN
(
SELECT column_a, COUNT(*)
FROM table
GROUP BY column_a
HAVING COUNT(*) = 3
) t2
ON t1.column_a = t2.column_a
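To check the join against the sample data from the question, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name my_table and the helper rows_in_groups_of are assumptions made for the demo (table itself is a reserved word in MySQL and would need backticks).

```python
import sqlite3


def rows_in_groups_of(conn, size):
    """Return all (column_a, column_b) rows whose column_a occurs exactly `size` times."""
    sql = """
        SELECT t1.column_a, t1.column_b
        FROM my_table t1
        INNER JOIN (
            SELECT column_a
            FROM my_table
            GROUP BY column_a
            HAVING COUNT(*) = ?
        ) t2 ON t1.column_a = t2.column_a
        ORDER BY t1.column_a, t1.column_b
    """
    return conn.execute(sql, (size,)).fetchall()


# In-memory database loaded with the rows shown in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column_a TEXT, column_b INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("A", 30), ("A", 40), ("A", 70), ("B", 25), ("B", 45),
     ("C", 10), ("C", 15), ("C", 25)],
)

groups_of_three = rows_in_groups_of(conn, 3)
for row in groups_of_three:
    print(row)
```

The subquery only needs to return column_a; the COUNT(*) lives in HAVING, so it does not have to appear in the subquery's select list.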

You can use a nested query, if you want.
Here, the inner query fetches the column_a values that occur exactly 3 times, and the outer query returns all records with those values using the 'IN' clause.
SELECT t.column_a, t.column_b FROM table t
WHERE t.column_a IN
(
SELECT t1.column_a FROM table t1
GROUP BY t1.column_a
HAVING COUNT(t1.column_a) = 3
)
ORDER BY t.column_a;

Related

MySQL: Retrieve Values and Counts For Each

How can I count the occurrence of the field/column in SQL?
Example dataset:
A
A
A
A
B
B
C
I want:
A | 4
A | 4
A | 4
A | 4
B | 2
B | 2
C | 1
Is there any way to do it without using GROUP BY? So far, every answer I have found makes my query return the following:
A | 4
B | 2
C | 1
select value, count(*) from table group by value
Use HAVING to further reduce the results, e.g. only values that occur more than 3 times:
select value, count(*) from table group by value having count(*) > 3
You could use a nested sub-select for this desired result set.
If the example table name is my_table and the column called col1:
select col1,
(select count(*) from my_table where col1 = t.col1) as Count
from my_table t;
Or, if you want to remove the duplicates, use the DISTINCT keyword. It removes duplicate rows from your result set.
select distinct col1,
(select count(*) from my_table where col1 = t.col1) as Count
from my_table t;
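A quick way to see both variants side by side is the sketch below. It runs the correlated sub-select against the example dataset using Python's sqlite3 module in place of MySQL; the inner alias i and the column alias cnt are assumptions added for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (col1 TEXT)")
# Example dataset from the question: A x4, B x2, C x1.
conn.executemany("INSERT INTO my_table VALUES (?)", [(v,) for v in "AAAABBC"])

# One output row per input row, each carrying the count for its own value.
per_row = conn.execute("""
    SELECT t.col1,
           (SELECT COUNT(*) FROM my_table i WHERE i.col1 = t.col1) AS cnt
    FROM my_table t
    ORDER BY t.col1
""").fetchall()

# Same query with DISTINCT: one output row per distinct value.
per_value = conn.execute("""
    SELECT DISTINCT t.col1,
           (SELECT COUNT(*) FROM my_table i WHERE i.col1 = t.col1) AS cnt
    FROM my_table t
    ORDER BY t.col1
""").fetchall()

print(per_row)
print(per_value)
```

The sub-select is evaluated per outer row, so on a large table an index on col1 is likely to matter.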

Create a mysql function that accepts a result set as parameter?

Is it possible to create a mySQL function that accepts as a parameter the result set from a query?
Basically I have a lot of queries that will return a result set as follows:
id | score
70 | 25
71 | 7
72 | 215
74 | 32
75 | 710
76 | 34
78 | 998
79 | 103
80 | 3
I want to normalize the values such that they come to a range between 0 and 1.
The way I thought I'd do this was by applying calculation:
nscore = (score-min(score))/(max(score) - min(score))
to get following result
id | score
70 | 0.022
71 | 0.004
72 | 0.213
74 | 0.029
75 | 0.710
76 | 0.031
78 | 1.000
79 | 0.100
80 | 0.000
But I'm not able to come up with a query that gets the min and max along with the results, so I thought of using a function (I cannot use a stored procedure), but I couldn't find documentation on how to pass a result set.
Any help appreciated! Thanks!
EDIT:
The score field in result is a computed field. Cannot select it directly.
For eg: Sample query that returns the above result -
select t.id as id, count(*) as score
from tbl t
inner join tbl2 t2 on t.idx = t2.idx
where t2.role in (.....)
just for demo purpose, not actual schema or query
No. MySQL doesn't support defining a function with a resultset as an argument.
Unfortunately, MySQL versions before 8.0 do not support Common Table Expressions (CTEs) or analytic (window) functions; both were added in MySQL 8.0.
On older versions, one way to get this result from a MySQL query is to include the original query as an inline view, two times.
As an example:
SELECT t.id
, (t.score-s.min_score)/(s.max_score-s.min_score) AS normalized_score
FROM (
-- original query here
SELECT id, score FROM ...
) t
CROSS
JOIN ( SELECT MIN(r.score) AS min_score
, MAX(r.score) AS max_score
FROM (
-- original query here
SELECT id, score FROM ...
) r
) s
ORDER BY t.id
EDIT
Based on the query added to the question ...
SELECT q.id
, (q.score-s.min_score)/(s.max_score-s.min_score) AS normalized_score
FROM ( -- original query goes here
-- ------------------------
select t.id as id, count(*) as score
from tbl t
inner join tbl2 t2 on t.idx = t2.idx
where t2.role in (.....)
-- ------------------------
) q
CROSS
JOIN ( SELECT MIN(r.score) AS min_score
, MAX(r.score) AS max_score
FROM ( -- original query goes here
-- ------------------------
select t.id as id, count(*) as score
from tbl t
inner join tbl2 t2 on t.idx = t2.idx
where t2.role in (.....)
-- ------------------------
) r
) s
ORDER BY q.id
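As a rough check of the inline-view approach, the sketch below runs it on the scores from the question using Python's sqlite3 module rather than MySQL. The scores table, the base_query string and the normalized_scores helper are assumptions that stand in for the real computed query.

```python
import sqlite3


def normalized_scores(conn, base_query):
    """Min-max normalize the score column of base_query, which must yield (id, score)."""
    # base_query is pasted in twice, once per inline view, as in the answer.
    # The 1.0 * factor forces floating-point division in SQLite; MySQL's
    # / operator already returns a non-integer result for integer operands.
    sql = f"""
        SELECT q.id,
               1.0 * (q.score - s.min_score) / (s.max_score - s.min_score)
                   AS normalized_score
        FROM ({base_query}) q
        CROSS JOIN (
            SELECT MIN(r.score) AS min_score, MAX(r.score) AS max_score
            FROM ({base_query}) r
        ) s
        ORDER BY q.id
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER, score INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [(70, 25), (71, 7), (72, 215), (74, 32), (75, 710),
     (76, 34), (78, 998), (79, 103), (80, 3)],
)

normalized = normalized_scores(conn, "SELECT id, score FROM scores")
for row_id, nscore in normalized:
    print(row_id, round(nscore, 3))
```

If every score is the same, max - min is zero and the division needs separate handling. The helper builds SQL by string interpolation, so it should only ever receive trusted query text.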

left join two tables where my variable is less than 5

I'm creating a query that selects from two tables and creates a total variable by counting a field in one table.
Example:
Table A:
ID | email
1 | test#test
2 | test2#test
3 | test3#test
Table B
ID | email_id | username_id
1 | 1 | 11
2 | 1 | 22
3 | 2 | 33
My query:
select a.id, a.email, count(c.id) as total
from tableA a
left join tableC c on c.email_id = a.id AND total <= 5
group by a.email LIMIT 1
Output:
Unknown column 'total' in 'on clause'
I need to select the first "a.id" that has total <= 5. How can I do it?
Logically, SELECT is processed after the FROM (including ON) and WHERE clauses, so an alias defined in the SELECT list cannot be referenced in those clauses.
Use HAVING clause
select a.id, a.email, count(c.id) as total
from tableA a
left join tableC c on c.email_id = a.id
group by a.email
Having count(c.id) <= 5
LIMIT 1
I think MySQL also lets you reference the alias in HAVING:
Having total <= 5
Try HAVING Count(c.id) <= 5
Just to make this a bit clearer, since the correct answer has already been provided: you don't have to use the HAVING clause, and the HAVING clause is not always the solution for this problem.
The HAVING clause is usually used to filter on aggregated columns (sum, count, max, min, etc.). When you have a calculated column instead (colA + colB as calc_column, for example), another approach, which should work here as well, is to wrap the query in another SELECT; the new column is then available in the outer WHERE:
SELECT *
FROM (The query here ) s
WHERE s.total <= 5
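Here is a minimal sketch of the wrapped-query variant, run with Python's sqlite3 module instead of MySQL. The tableC rows reuse the "Table B" data from the question, and the ORDER BY s.id, the grouping by both a.id and a.email, and the first_email_with_total_at_most helper are assumptions added so that "first" is well defined.

```python
import sqlite3


def first_email_with_total_at_most(conn, limit):
    """Return the lowest-id (id, email, total) row whose total is <= limit, or None."""
    sql = """
        SELECT s.id, s.email, s.total
        FROM (
            SELECT a.id, a.email, COUNT(c.id) AS total
            FROM tableA a
            LEFT JOIN tableC c ON c.email_id = a.id
            GROUP BY a.id, a.email
        ) s
        WHERE s.total <= ?   -- the alias is visible here because s is a derived table
        ORDER BY s.id
        LIMIT 1
    """
    return conn.execute(sql, (limit,)).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, email TEXT)")
conn.execute("CREATE TABLE tableC (id INTEGER, email_id INTEGER, username_id INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?)",
    [(1, "test#test"), (2, "test2#test"), (3, "test3#test")],
)
conn.executemany(
    "INSERT INTO tableC VALUES (?, ?, ?)",
    [(1, 1, 11), (2, 1, 22), (3, 2, 33)],
)

print(first_email_with_total_at_most(conn, 5))
print(first_email_with_total_at_most(conn, 1))
```

Because of the LEFT JOIN, an email with no matching rows still appears with a total of 0, since COUNT(c.id) ignores the NULLs.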

Remove all query results that have duplicates (NOT DISTINCT)

I have a table with two sets of integer values. Using MySQL I want to display all of the rows that correspond to unique entries in the second column. Basically I can have duplicate A values, but only unique B values. If there are duplicates for a value in B, remove all the results with that value. If I use DISTINCT I will still get one of those duplicates which I do not want. I also want to avoid using COUNT(). Here's an example:
A | B
1 | 2
1 | 3
2 | 2
2 | 4
1 | 4
5 | 5
Will have the following Results (1,3), (5,5). Any value in B that has a duplicate is removed.
Try this
SELECT * FROM TEMP WHERE B IN (
SELECT B FROM TEMP GROUP BY B Having COUNT(B)=1
);
I know you want to avoid using COUNT() but this is the quick solution.
working fiddle here - http://sqlfiddle.com/#!9/29d16/8
Tested and works! You need at least count(*) to count the values:
select * from test where B in (
select B from test group by B having count(*)<2
)
I don't know why you want to avoid using count(), because that's what would do the trick as follows:
Let's say your table is named "mytable"
SELECT t1.A, t1.B
FROM mytable t1
JOIN (
SELECT B, count(*) AS B_INSTANCES
FROM mytable
GROUP BY B
HAVING count(*) = 1
) t2 ON t2.B = t1.B
ORDER BY t1.A, t1.B
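To confirm that the example rows give (1,3) and (5,5), here is a small sketch of the IN plus GROUP BY ... HAVING form, executed with Python's sqlite3 module as a stand-in for MySQL. The table name mytable follows the last answer; the ORDER BY is an assumption added to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (A INTEGER, B INTEGER)")
# Rows from the question's example table.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 2), (2, 4), (1, 4), (5, 5)],
)

# Keep only rows whose B value appears exactly once in the whole table.
unique_b_rows = conn.execute("""
    SELECT A, B
    FROM mytable
    WHERE B IN (
        SELECT B FROM mytable GROUP BY B HAVING COUNT(*) = 1
    )
    ORDER BY A, B
""").fetchall()

print(unique_b_rows)
```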

Using 'GROUP BY' while preferring rows associated in another table

I have a table tbl_entries with the following structure:
+----+------+------+------+
| id | col1 | col2 | col3 |
+----+------+------+------+
| 11 | a | b | c |
| 12 | d | e | a |
| 13 | a | b | c |
| 14 | X | e | 2 |
| 15 | a | b | c |
+----+------+------+------+
And another table tbl_reviewlist with the following structure:
+----+-------+------+------+------+
| id | entid | cola | colb | colc |
+----+-------+------+------+------+
| 1 | 12 | N | Y | Y |
| 2 | 13 | Y | N | Y |
| 3 | 14 | Y | N | N |
+----+-------+------+------+------+
Basically, tbl_reviewlist contains reviews about the entries in tbl_entries. However, for some known reason, the entries in tbl_entries are duplicated. I am extracting the unique records by the following query:
SELECT * FROM `tbl_entries` GROUP BY `col1`, `col2`, `col3`;
However, any one of the duplicate rows from tbl_entries will be returned, regardless of whether it has been reviewed. I want the query to prefer the rows which have been reviewed. How can I do that?
EDIT: I want to prefer rows which have been reviewed but if there are rows which have not been reviewed yet it should return those as well.
Thanks in advance!
Have you actually tried anything?
A hint: The SQL standard requires that every column in the result set of a query with a group by clause must be either
a grouping column
an aggregate function — sum(), count(), etc.,
a constant value/literal, or
an expression derived solely from the above.
Some broken implementations (and I believe MySQL is one of them) allow other columns to be included and offer their own...creative...behavior. If you think about it, group by essentially says to do the following:
Order this table by the grouping expressions
Partition it into subsets based on the group by sequence
Collapse each such partition into a single row computing the aggregate expressions as you go.
Once you've done that, what does it mean to ask for something that isn't uniform across the collapsed group partition?
If you have a table foo containing columns A, B, C, D and E and say something like
select A,B,C,D,E from foo group by A,B,C
per the standard, you should get a compile error. Deviant implementations [usually] treat this sort of query as the [rough] equivalent of
select *
from foo t
join ( select A,B,C
from foo
group by A,B,C
) x on x.A = t.A
and x.B = t.B
and x.C = t.C
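But I wouldn't necessarily count on that without reviewing the documentation for the specific implementation that you are using. MySQL, for one, documents different behavior: with ONLY_FULL_GROUP_BY disabled, it returns one row per group and picks an indeterminate value for each non-grouped column.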
If you want to find just reviewed entries, then something like this:
select *
from tbl_entries t
where exists ( select *
from tbl_reviewlist x
where x.entid = t.id
)
will do you. If, however, you want to find reviewed entries that are duplicated on col1, col2 and col3 then something like this should do you:
select *
from tbl_entries t
join ( select col1,col2,col3
from tbl_entries x
group by col1,col2,col3
having count(*) > 1
) d on d.col1 = t.col1
and d.col2 = t.col2
and d.col3 = t.col3
where exists ( select *
from tbl_reviewlist x
where x.entid = t.id
)
Since your problem statement is rather unclear, another take might be something along these lines:
select t.col1 ,
t.col2 ,
t.col3 ,
t.duplicate_count ,
coalesce(x.review_count,0) as review_count
from ( select col1 ,
col2 ,
col3 ,
count(*) as duplicate_count
from tbl_entries
group by col1 ,
col2 ,
col3
) t
left join ( select cola, colb, colc , count(*) as review_count
from tbl_reviewlist
group by cola, colb, colc
having count(*) > 1
) x on x.cola = t.col1
and x.colb = t.col2
and x.colc = t.col3
order by sign(coalesce(x.review_count,0)) desc ,
t.col1 ,
t.col2 ,
t.col3
This query
summarizes the entries table, developing a count of how many times each col1/2/3 combination exists.
summarizes the review table, developing a count of reviews for each cola/b/c combination
joins them together matching cols a:1, b:2 c:3
orders them
preferring reviewed items to non-reviewed items by placing them first,
then by the col1/2/3 values.
I think there's a way with less repetition, but this should be a start:
select
tbl_entries.ID,
col1,
col2,
col3,
cola -- ... you get the idea; add a comma before any further columns ...
from (
select coalesce(min(entid), min(tbl_entries.ID)) as favID
from tbl_entries left join tbl_reviewlist on entid = tbl_entries.ID
group by col1, col2, col3
) as A join tbl_entries on tbl_entries.ID = favID
left join tbl_reviewlist on entid = tbl_entries.ID
Basically you distill the desired output to a list of core ID's and then re-map back to the data...
SELECT e.col1, e.col2, e.col3,
COALESCE(MIN(r.entid), MIN(e.id)) AS id
FROM tbl_entries AS e
LEFT JOIN tbl_reviewlist AS r
ON r.entid = e.id
GROUP BY e.col1, e.col2, e.col3 ;
Tested at SQL-Fiddle
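To see how the COALESCE(MIN(r.entid), MIN(e.id)) query behaves, here is a sketch run on the question's data with Python's sqlite3 module in place of MySQL. Row 16 (q, r, s) is a hypothetical unreviewed entry added only to show the fallback, and the alias preferred_id and the ORDER BY are assumptions as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_entries (id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT)")
conn.execute(
    "CREATE TABLE tbl_reviewlist "
    "(id INTEGER, entid INTEGER, cola TEXT, colb TEXT, colc TEXT)"
)
conn.executemany(
    "INSERT INTO tbl_entries VALUES (?, ?, ?, ?)",
    [(11, "a", "b", "c"), (12, "d", "e", "a"), (13, "a", "b", "c"),
     (14, "X", "e", "2"), (15, "a", "b", "c"),
     (16, "q", "r", "s")],  # hypothetical entry with no review
)
conn.executemany(
    "INSERT INTO tbl_reviewlist VALUES (?, ?, ?, ?, ?)",
    [(1, 12, "N", "Y", "Y"), (2, 13, "Y", "N", "Y"), (3, 14, "Y", "N", "N")],
)

# MIN(r.entid) is NULL when no row in the group has a review, so COALESCE
# falls back to the smallest entry id in that group.
preferred = conn.execute("""
    SELECT e.col1, e.col2, e.col3,
           COALESCE(MIN(r.entid), MIN(e.id)) AS preferred_id
    FROM tbl_entries AS e
    LEFT JOIN tbl_reviewlist AS r ON r.entid = e.id
    GROUP BY e.col1, e.col2, e.col3
    ORDER BY preferred_id
""").fetchall()

for row in preferred:
    print(row)
```

For the (a, b, c) group, entries 11 and 15 have no review, so MIN(r.entid) skips their NULLs and the reviewed entry 13 is chosen.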