MySQL: Find number of users with identical poll answers

I have a table, poll_response, with three columns: user_id, poll_id, option_id.
Given an arbitrary number of poll/response pairs, how can I determine the number of distinct user_ids that match?
So, suppose the table's data looks like this:
user_id | poll_id | option_id
1       | 1       | 0
1       | 2       | 1
1       | 3       | 0
1       | 4       | 0
2       | 1       | 1
2       | 2       | 1
2       | 3       | 1
2       | 4       | 0
And suppose I want to know how many users have responded "1" to poll 2 and "0" to poll 3.
In this case, only user 1 matches, so the answer is: there is only one distinct user.
But suppose I want to know how many users have responded "1" to poll 2 and "0" to poll 4.
In this case, both user 1 and user 2 match, so the answer is: there are 2 distinct users.
I'm having trouble constructing the MySQL query to make this happen, especially given that there are an arbitrary number of poll/response pairs. Do I just try to chain a bunch of joins together?

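To find how many users have responded "1" to poll 2 and "0" to poll 3, match any of the pairs with OR, group by user, and keep only the users who matched every pair. The 2 in the HAVING clause is the number of poll/response pairs:
select count(user_id) from (
    select user_id from poll_response
    where (poll_id=2 and option_id=1) or (poll_id=3 and option_id=0)
    group by user_id
    having count(user_id)=2
) m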
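For an arbitrary number of poll/response pairs, the same query can be generated rather than chained as joins. The following is only a sketch: it uses Python's sqlite3 module as a stand-in for MySQL, the helper name count_matching_users is hypothetical, and it assumes each user answers a given poll at most once.

```python
import sqlite3

def count_matching_users(conn, pairs):
    """Count users whose answers match every (poll_id, option_id) pair.

    Hypothetical helper; `pairs` is a list of (poll_id, option_id) tuples.
    """
    if not pairs:
        raise ValueError("at least one poll/response pair is required")
    # One "(poll_id = ? AND option_id = ?)" term per pair, joined with OR.
    where = " OR ".join("(poll_id = ? AND option_id = ?)" for _ in pairs)
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT user_id
            FROM poll_response
            WHERE {where}
            GROUP BY user_id
            HAVING COUNT(DISTINCT poll_id) = ?
        ) AS m
    """
    # Flatten the pairs into bind parameters, then add the required match count.
    params = [value for pair in pairs for value in pair]
    params.append(len(pairs))
    return conn.execute(sql, params).fetchone()[0]

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poll_response (user_id INT, poll_id INT, option_id INT)")
conn.executemany(
    "INSERT INTO poll_response VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 2, 1), (1, 3, 0), (1, 4, 0),
     (2, 1, 1), (2, 2, 1), (2, 3, 1), (2, 4, 0)],
)
print(count_matching_users(conn, [(2, 1), (3, 0)]))
print(count_matching_users(conn, [(2, 1), (4, 0)]))
```

Only the placeholders are generated; the values themselves are passed as bind parameters, so the pairs never end up in the SQL text.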

Related

Duplicate or unpredictable results in MySQL

I'm trying to join a few tables in MySQL. Our setup is a little unusual, so I'll try to explain as well as I can.
I have a table 'INVENTORY' that represents the items currently in stock.
These items are stored in a table 'COMPONENT'.
Components are used in installations.
Every user can have multiple installations, and the same component can be used in multiple installations as well.
To uniquely map a component to an installation, it can be assigned to a PRODUCT. A product has a 1-1 relationship with an installation. A component is not directly related to an installation.
To finally assign a product to a specific installation, a mapping table COMPONENT_PRODUCT is used.
Example:
A component is like a part, let's say a screw. This screw is used in a computer. The very same screw can be used in multiple computers. But each computer can only be used in one specific installation.
TABLE COMPONENT_PRODUCT
COMPONENT_ID PRODUCT_ID
1 1
1 2
2 1
2 2
So we have the components C1 and C2 relevant for two installations.
TABLE INVENTORY
COMPONENT_ID INSTALLATION_ID ON_STOCK
1 1 5
1 2 2
What I want to achieve
Now, I want to retrieve the inventory state for all components. However, not every component has an inventory record. In these cases, the ON_STOCK value from the inventory should be NULL.
That means that for this example I'd expect the following results:
COMPONENT_ID PRODUCT_ID ON_STOCK
1 1 5
1 2 2
2 1 NULL
2 2 NULL
But executing this query:
SELECT DISTINCT
COMPONENT_PRODUCT.COMPONENT_ID,
COMPONENT_PRODUCT.PRODUCT_ID,
INVENTORY.ON_STOCK
FROM INVENTORY
RIGHT JOIN COMPONENT_PRODUCT ON COMPONENT_PRODUCT.COMPONENT_ID =
INVENTORY.COMPONENT_ID
returns the following resultset:
COMPONENT_ID PRODUCT_ID ON_STOCK
1 1 5
1 2 5
1 1 2
1 2 2
2 1 (null)
2 2 (null)
Now, my next thought was, "of course, this is how joins behave, okay, I need to group the results". But the way SQL works, the aggregation is not entirely predictable. So when I
GROUP BY COMPONENT_PRODUCT.COMPONENT_ID,COMPONENT_PRODUCT.PRODUCT_ID
I get this result:
COMPONENT_ID PRODUCT_ID ON_STOCK
1 1 5
1 2 5
2 1 (null)
2 2 (null)
I have prepared a Fiddle here: http://sqlfiddle.com/#!9/71ca87
What am I forgetting here? Thanks in advance for any pointers.
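Try this query. It adds a second join condition, so that each inventory row only matches the product that belongs to its own installation (a product has a 1-1 relationship with an installation):
SELECT DISTINCT
    COMPONENT_PRODUCT.COMPONENT_ID,
    COMPONENT_PRODUCT.PRODUCT_ID,
    INVENTORY.ON_STOCK
FROM INVENTORY
RIGHT JOIN COMPONENT_PRODUCT
    ON COMPONENT_PRODUCT.COMPONENT_ID = INVENTORY.COMPONENT_ID
    AND COMPONENT_PRODUCT.PRODUCT_ID = INVENTORY.INSTALLATION_ID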
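To see the effect of the second join condition on the sample data, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. It assumes, as the question describes, that a PRODUCT_ID can be compared directly with an INSTALLATION_ID. The join is written as a LEFT JOIN starting from COMPONENT_PRODUCT, which is equivalent to the RIGHT JOIN above and also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE COMPONENT_PRODUCT (COMPONENT_ID INT, PRODUCT_ID INT);
    CREATE TABLE INVENTORY (COMPONENT_ID INT, INSTALLATION_ID INT, ON_STOCK INT);
    INSERT INTO COMPONENT_PRODUCT VALUES (1, 1), (1, 2), (2, 1), (2, 2);
    INSERT INTO INVENTORY VALUES (1, 1, 5), (1, 2, 2);
""")

# Every COMPONENT_PRODUCT row is kept; an inventory row is attached only when
# both the component and the installation/product match.
rows = conn.execute("""
    SELECT cp.COMPONENT_ID, cp.PRODUCT_ID, i.ON_STOCK
    FROM COMPONENT_PRODUCT AS cp
    LEFT JOIN INVENTORY AS i
           ON i.COMPONENT_ID = cp.COMPONENT_ID
          AND i.INSTALLATION_ID = cp.PRODUCT_ID
    ORDER BY cp.COMPONENT_ID, cp.PRODUCT_ID
""").fetchall()

for row in rows:
    print(row)
```

With both conditions in place each mapping row matches at most one inventory row, so neither DISTINCT nor GROUP BY is needed to remove duplicates.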

How to show multiple values of a single field in a single row in a single query?

Table 1:Domain Link Result
======================================================================
||Column1(words) ||Column2(links) ||Column3(frequency) ||
======================================================================
1 1 Any Number
2 1 Any Number
3 1 Any Number
4 1 Any Number
1 2 Any Number
2 2 Any Number
3 2 Any Number
4 2 Any Number
Table 2:Sub Link Result
======================================================================
||Column1(words) ||Column2(sublinks) ||Column3(frequency) ||
======================================================================
1 a Any Number
2 b Any Number
3 c Any Number
4 d Any Number
1 e Any Number
2 f Any Number
3 g Any Number
4 h Any Number
And so on.
In the above scenario the user entered 4 words and 2 domain links. The frequency of the 4 keywords is calculated on the domain links as well as the sublinks and stored in separate tables as shown above. I want an aggregate result like the one below:
Table 3:Final Result
==================================================================================
||Column1(words) ||Column2(Domain links) ||Column3(Total frequency) ||
==================================================================================
Row1: 1 1 Total of frequency in both tables
2 for word "1"
----------------------------------------------------------------------------------
Row2: 2 1 Total of frequency in both tables
2 for word "2"
----------------------------------------------------------------------------------
Row3: 3 1 Total of frequency in both tables
2 for word "3"
----------------------------------------------------------------------------------
Row4: 4 1 Total of frequency in both tables
2 for word "4"
----------------------------------------------------------------------------------
I tried the following query in MySQL:
SELECT t.`keyword`, t.`link` SUM( t.`frequency` ) AS total
FROM (
SELECT `frequency`
FROM `domain_link_result`
WHERE `keyword` = 'national'
UNION ALL
SELECT `frequency`
FROM `sub_link_result`
WHERE `keyword` = 'national'
)t GROUP BY `keyword`
But in Column 2 of the final result I get only the first link instead of both links for row 1. How can I get both links, or any number of links entered by the user, in a single row?
Words and Links have VARCHAR as type and frequency has INT type.
If you want to collapse several rows into one and still be able to see the information, you have to use GROUP_CONCAT:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
This outputs the collapsed values separated by commas, i.e. as a single string. In your programming language you can split this string if you need the individual values.
Your query would look something like this:
SELECT keyword, GROUP_CONCAT(links), SUM(frequency)
FROM (subquery)
GROUP BY keyword
Which would output something like this:
==================================================================================
||Column1(words) ||Column2(Domain links) ||Column3(Total frequency) ||
==================================================================================
Row1: 1 1,2 sum of freq.
----------------------------------------------------------------------------------
Row2: 2 1,2 sum of freq.
----------------------------------------------------------------------------------
Row3: 3 1,2 sum of freq.
----------------------------------------------------------------------------------
Row4: 4 1,2 sum of freq.
EDIT: Extra help for your query
Your query looks a little bit confusing to me. Try with a JOIN approach:
SELECT domain_link_results.word AS word,
GROUP_CONCAT(domain_link_results.links) AS domain_links,
domain_link_results.frequency + sub_link_results.frequency AS total_frequency
FROM domain_link_results
INNER JOIN sub_link_results
ON domain_link_results.word = sub_link_results.word
WHERE domain_link_results.word = "national"
GROUP BY domain_link_results.word
On the other hand, it might be better to have all the links in the same table, and an extra field to determine if it's a domain link or a sublink. Without knowing more about your system it is hard to say.
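As a rough illustration of the GROUP_CONCAT approach applied to both tables, the sketch below uses Python's sqlite3 module as a stand-in for MySQL (both provide GROUP_CONCAT). The column names keyword, link and sublink and all frequency values are assumptions made up for the example. It sums over a UNION ALL of the two tables, which avoids the row multiplication that a join between two multi-row tables can introduce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed column names and made-up frequencies, for illustration only.
    CREATE TABLE domain_link_result (keyword TEXT, link TEXT, frequency INT);
    CREATE TABLE sub_link_result (keyword TEXT, sublink TEXT, frequency INT);
    INSERT INTO domain_link_result VALUES
        ('national', '1', 3), ('national', '2', 4), ('local', '1', 1);
    INSERT INTO sub_link_result VALUES
        ('national', 'a', 5), ('national', 'e', 6), ('local', 'b', 2);
""")

# Sub-link rows contribute their frequency but a NULL link, and GROUP_CONCAT
# skips NULLs, so only the domain links appear in the concatenated column.
rows = conn.execute("""
    SELECT t.keyword,
           GROUP_CONCAT(t.link) AS domain_links,
           SUM(t.frequency) AS total_frequency
    FROM (
        SELECT keyword, link, frequency FROM domain_link_result
        UNION ALL
        SELECT keyword, NULL, frequency FROM sub_link_result
    ) AS t
    GROUP BY t.keyword
    ORDER BY t.keyword
""").fetchall()

for keyword, domain_links, total_frequency in rows:
    # The order of the concatenated values is not guaranteed, so sort after splitting.
    print(keyword, sorted(domain_links.split(",")), total_frequency)
```

In MySQL the order inside the list can be fixed with GROUP_CONCAT(link ORDER BY link) if it matters.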

Compare rows and get percentage

I found it hard to find a fitting title. For simplicity let's say I have the following table:
cook_id cook_rating
1 2
1 1
1 3
1 4
1 2
1 2
1 1
1 3
1 5
1 4
2 5
2 2
Now I would like to get an output of 'good' cooks. A good cook is someone for whom at least 70% of the ratings are 1, 2 or 3 rather than 4 or 5.
So in my example table, the cook with id 1 has a total of 10 ratings, 7 of which have type 1, 2 and 3. Only three have type 4 or 5. Therefore the cook with id 1 would be a 'good' cook, and the output should be the cook's id with the number of good ratings.
cook_id cook_rating
1 7
The cook with id 2, however, doesn't satisfy my condition, therefore should not be listed at all.
select cook_id, count(cook_rating) - sum(case when cook_rating = 4 OR cook_rating = 5 then 1 else 0 end) as numberOfGoodRatings from cook
where cook_rating in (1,2,3,4,5)
group by cook_id
order by numberOfGoodRatings desc
However, this doesn't take into account the fact that there might be more 4 or 5 than good ratings, resulting in negative outputs. Plus, the requirement of at least 70% is not included.
You can get this with a comparison in your HAVING clause. If you must have just the two columns in the result set, this can be wrapped as a sub-select: select cook_id, positive_ratings FROM (...)
SELECT
    cook_id,
    SUM(cook_rating < 4) AS positive_ratings,
    COUNT(*) AS total_ratings
FROM cook
GROUP BY cook_id
HAVING (positive_ratings / total_ratings) >= 0.70
ORDER BY positive_ratings DESC
Edit: Note that SUM(cook_rating < 4) counts only the rows where the rating is less than 4, because MySQL evaluates a comparison to 1 or 0. COUNT(cook_rating < 4) would not work here: COUNT counts every non-NULL value, and a false comparison is 0 rather than NULL, so it would count every rated row. If you prefer COUNT, wrap the condition as COUNT(IF(cook_rating < 4, 1, NULL)).
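A quick way to check how COUNT and SUM treat a comparison is to run both on the sample data. This sketch uses Python's sqlite3 module as a stand-in for MySQL; both evaluate a comparison to 1 or 0 for non-NULL input. The 70% test is written with integer arithmetic here because SQLite, unlike MySQL, performs integer division on integer operands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cook (cook_id INT, cook_rating INT)")
cook_1_ratings = [2, 1, 3, 4, 2, 2, 1, 3, 5, 4]
conn.executemany(
    "INSERT INTO cook VALUES (?, ?)",
    [(1, r) for r in cook_1_ratings] + [(2, 5), (2, 2)],
)

# COUNT(expr) counts every non-NULL value; a false comparison is 0, not NULL,
# so it is still counted. SUM(expr) adds up the 1s and ignores the 0s.
counts = conn.execute("""
    SELECT cook_id, COUNT(cook_rating < 4), SUM(cook_rating < 4)
    FROM cook
    GROUP BY cook_id
    ORDER BY cook_id
""").fetchall()
print(counts)

# positive / total >= 0.70, rewritten as positive * 10 >= total * 7.
good_cooks = conn.execute("""
    SELECT cook_id, SUM(cook_rating < 4) AS positive_ratings
    FROM cook
    GROUP BY cook_id
    HAVING SUM(cook_rating < 4) * 10 >= COUNT(*) * 7
    ORDER BY positive_ratings DESC
""").fetchall()
print(good_cooks)
```

On this data COUNT(cook_rating < 4) simply returns the total number of ratings per cook, while SUM(cook_rating < 4) returns the number of ratings below 4.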
I suggest changing your schema a little to make this kind of query trivial.
Suppose you add 5 columns to your cook table that simply count the number of ratings of each value:
nb_ratings_1 nb_ratings_2 nb_ratings_3 nb_ratings_4 nb_ratings_5
Updating such a table when a new rating is entered in the DB is trivial, as is recomputing those numbers if having redundancy makes you nervous. And it makes all filtering and sorting fast and easy.

SQL - counting rows with specific value

I have a table that looks somewhat like this:
id value
1 0
1 1
1 2
1 0
1 1
2 2
2 1
2 1
2 0
3 0
3 2
3 0
Now for each id, I want to count the number of occurrences of 0 and 1 and the total number of occurrences for that id (the value can be any integer), so the end result should look something like this:
id n0 n1 total
1 2 2 5
2 1 2 4
3 2 0 3
I managed to get the first and last row with this statement:
SELECT id, COUNT(*) FROM mytable GROUP BY id;
But I'm sort of lost from here. Any pointers on how to achieve this without a huge statement?
With MySQL, you can use SUM(condition):
SELECT id, SUM(value=0) AS n0, SUM(value=1) AS n1, COUNT(*) AS total
FROM mytable
GROUP BY id
See it on sqlfiddle.
As @Zane commented above, the typical method is to use CASE expressions to perform the pivot.
SQL Server now has a PIVOT operator that you might see. DECODE() and IIF() were older approaches on Oracle and Access that you might still find lying around.
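For reference, the CASE-based form of the pivot might look like the sketch below. It runs on the sample data through Python's sqlite3 module, used here only as a convenient stand-in; the query itself is plain SQL and should also work in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT, value INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 0), (1, 1), (1, 2), (1, 0), (1, 1),
     (2, 2), (2, 1), (2, 1), (2, 0),
     (3, 0), (3, 2), (3, 0)],
)

# Each CASE turns a matching row into 1 and any other row into 0, so the SUM
# is the number of matching rows in the group.
rows = conn.execute("""
    SELECT id,
           SUM(CASE WHEN value = 0 THEN 1 ELSE 0 END) AS n0,
           SUM(CASE WHEN value = 1 THEN 1 ELSE 0 END) AS n1,
           COUNT(*) AS total
    FROM mytable
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

This is more verbose than SUM(value=0), but it does not rely on comparisons evaluating to 1 or 0.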

Please explain the functionality of select max(...) ... group by in sql

I appreciate that this may appear to many as a dumb question, but I cannot find a clear explanation anywhere of what effect "group by" has on a select max(...) SQL statement.
I have the following data (there is another column image of type mediumblob which is not shown):
id title test_id
1 bomb 0
2 Soft watch 2
3 Dali 1
4 Narciss 1
5 The Woman In Green 0
6 A summer in Vetheuil 0
7 Artist's Garden 2
8 Beech Forest 2
9 Claude Monet 0
I know if I perform
select max(id) from images
where image is not null;
I get the max value of id i.e.:
max(id)
9
However can someone please explain what is happening when I perform
select max(id), title, test_id
from images
where image is not null
group by id;
I find that the max(id) serves no useful purpose (results shown below)?
max(id) title test_id
1 bomb 0
2 Soft watch 2
3 Dali 1
4 Narciss 1
5 The Woman In Green 0
6 A summer in Vetheuil 0
7 Artist's Garden 2
8 Beech Forest 2
9 Claude Monet 0
In the case of using MAX() the GROUP BY clause essentially tells the query engine how to group the items from which to determine a maximum. In your first example you were selecting only a single column, so there was no need for grouping. But in your second example you had multiple columns. So you need to tell the query engine how to determine which ones are going to be compared to find a maximum.
You told it to group by the id column. Which means that it's going to compare records which have the same id and give you the maximum one for each unique id. Since every record has a different id, you essentially didn't do anything with that clause.
It grouped all records with an id of 1 (which was a single record), and returned the record with the maximum id from that group (which was that record). It did the same for 2, 3, etc.
In the case of the three columns shown here, the only place where it would make sense to group your records would be on the test_id column. Something like this:
SELECT MAX(id), title, test_id
FROM images
WHERE image IS NOT null
GROUP BY test_id
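This would group them by test_id, so the results will include the ids 9 (the maximum id for test_id 0), 4 (the maximum id for test_id 1), and 8 (the maximum id for test_id 2). By splitting the records into those three groups based on the three unique values in the test_id column, it can effectively find a "maximum" id within each group. Note that title is neither aggregated nor part of the GROUP BY, so MySQL does not guarantee that it comes from the row with the maximum id, and with ONLY_FULL_GROUP_BY enabled the query is rejected.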
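If the title belonging to each maximum id is needed as well, one common pattern is to compute the maximum per group in a subquery and join it back to the table. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the image contents are dummy bytes, since the question does not show them, and every row is assumed to have a non-NULL image.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INT, title TEXT, test_id INT, image BLOB)")
sample = [
    (1, "bomb", 0), (2, "Soft watch", 2), (3, "Dali", 1),
    (4, "Narciss", 1), (5, "The Woman In Green", 0),
    (6, "A summer in Vetheuil", 0), (7, "Artist's Garden", 2),
    (8, "Beech Forest", 2), (9, "Claude Monet", 0),
]
# Dummy image bytes so that "image IS NOT NULL" holds for every row.
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?)",
    [(i, title, test_id, b"\x00") for i, title, test_id in sample],
)

# The subquery finds MAX(id) per test_id; the join fetches the whole row
# that carries that id, so title is taken from the matching record.
rows = conn.execute("""
    SELECT i.id, i.title, i.test_id
    FROM images AS i
    JOIN (
        SELECT test_id, MAX(id) AS max_id
        FROM images
        WHERE image IS NOT NULL
        GROUP BY test_id
    ) AS m
      ON m.test_id = i.test_id AND m.max_id = i.id
    ORDER BY i.test_id
""").fetchall()

for row in rows:
    print(row)
```

Because every selected column comes from the joined row, this form does not depend on how MySQL picks values for non-aggregated columns.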
Yes, in your example it serves no useful purpose.
You're grouping by ID then finding the maximum ID. But that doesn't make sense since there's only one of each ID. Normally MAX() is used on quantities, like prices or item counts or such like.
GROUP BY is not meant for this kind of query.
It is used for queries on data like this:
OId OrderDate OrderPrice Customer
1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen
Now, if you want to get the total order price for each of these customers, you use GROUP BY:
SELECT Customer,SUM(OrderPrice) FROM Orders
GROUP BY Customer
Customer SUM(OrderPrice)
Hansen 2000
Nilsen 1700
Jensen 2000
In your case id is unique, so grouping by id does not make any sense.