CASE will ignore duplicate entries - mysql

I have a table which store user votes, something like this:
+----+---------+-------+---------+
| id | post_id | value | user_id |
+----+---------+-------+---------+
| 1 | 103 | 1 | 1 |
| 2 | 105 | 1 | 3 |
| 3 | 106 | 1 | 1 |
| 4 | 108 | 0 | 1 |
| 5 | 108 | 0 | 2 |
| 6 | 105 | 0 | 2 |
| 7 | 105 | 1 | 1 |
+----+---------+-------+---------+
Where id is Vote ID, post_id is Post ID, value is a boolean for like/unlike (1 = Like, 0 = Unlike) and user_id is Voter ID.
I want to get a list which has:
All posts
Each post's like count
Each post's unlike count
Vote ID of where a user (for example, user 1) voted. So I can update the value later
The vote that user gave
So I wrote this query:
SELECT
post_id,
COUNT(
CASE
WHEN value = 1
THEN 1
END
) AS likes,
COUNT(
CASE
WHEN value = 0
THEN 1
END
) AS unlikes,
(CASE
WHEN user_id = 1 -- I choose user_id manually
THEN id
END) AS vote_id,
(CASE
WHEN user_id = 1
THEN value
END) AS user_vote
FROM votes
GROUP BY post_id;
What I expected was something like this:
+---------+-------+---------+---------+-----------+
| post_id | likes | unlikes | vote_id | user_vote |
+---------+-------+---------+---------+-----------+
| 103 | 1 | 0 | 1 | 1 |
| 105 | 2 | 1 | 7 | 1 |
| 106 | 1 | 0 | 3 | 1 |
| 108 | 0 | 2 | 4 | 0 |
+---------+-------+---------+---------+-----------+
But this is the result I have:
+---------+-------+---------+---------+-----------+
| post_id | likes | unlikes | vote_id | user_vote |
+---------+-------+---------+---------+-----------+
| 103 | 1 | 0 | 1 | 1 |
| 105 | 2 | 1 | NULL | NULL |
| 106 | 1 | 0 | 3 | 1 |
| 108 | 0 | 2 | 4 | 0 |
+---------+-------+---------+---------+-----------+
Those NULL values happen because of the last line of my query (GROUP BY post_id). If there's a duplicate post_id in table (for example, if some other user already voted for that post), CASE will ignore it and return NULL.
What should I do?

The problem with your query is the lack of aggregations on the last two columns. Assuming that a user can only vote once for a post, you are safe combining the values using MAX() (or MIN()) because there is at most one matching row:
SELECT v.post_id, SUM(value = 1) AS likes, SUM(value = 0) AS unlikes,
MAX(CASE WHEN user_id = 1 THEN vote_id END) AS user1_voteid,
MAX(CASE WHEN user_id = 1 THEN value END) AS user1_vote
FROM votes v
GROUP BY v.post_id;

Related

How can I get a purchase check in MySQL

I am not able to figure out how I can get the following result with one MySQL Query:
I have two tables:
shop_items
| id | description | price | active |
+----+-------------+-------+--------+
| 1 | product_1 | 5 | 1 |
+----+-------------+-------+--------+
| 2 | product_2 | 10 | 1 |
+----+-------------+-------+--------+
| 3 | product_3 | 15 | 0 |
+----+-------------+-------+--------+
inventory_items (the shop_items a user purchased)
| id | item_id | user_id | active |
+----+---------+---------+--------+
| 1 | 2 | 1 | 1 |
+----+---------+---------+--------+
| 2 | 1 | 1 | 0 |
+----+---------+---------+--------+
I want to see all shop_items where active = 1 including a row called purchased = 0 or 1 based on inventory_items -> matching user_id (where user_id = something) and active = 1
Example output based on the data from above tables -> where user_id = 1:
| item_id | price | description | purchased |
+---------+-------+-------------+-----------+
| 1 | 5 | product_1 | 0 |
+---------+-------+-------------+-----------+
| 2 | 10 | product_2 | 1 |
+---------+-------+-------------+-----------+
What query do I need for this output?
Please note: I only need the result from ONE user_id which I can change within the query :)
Test
SELECT shop_items.*, COALESCE(inventory_items.active, 0) purchased
FROM shop_items
LEFT JOIN inventory_items ON shop_items.id = inventory_items.item_id
AND user_id = 1
WHERE shop_items.active = 1

COUNT counting something on LEFT JOIN that doesn't exist. Why?

I'm having a problem.
I have this table called usersbycourse which shows this information:
+------------+-----------------+--------+-----------+-------+-----------------+-----------------+
| instanceid | shortname | userid | firstname | logid | lastaccessdelta | modulesfinished |
+------------+-----------------+--------+-----------+-------+-----------------+-----------------+
| 2 | PJU | 74 | Robin | 766 | 1662246 | 0 |
| 3 | Fundgest-GRHN1A | 75 | Batman | 867 | 1576725 | 0 |
| 3 | Fundgest-GRHN1A | 77 | Abigobeu | 1004 | 610480 | 0 |
+------------+-----------------+--------+-----------+-------+-----------------+-----------------+
and this SQL:
SELECT
mdl_course.id,
mdl_course.shortname,
COUNT(CASE WHEN usersbycourse.modulesfinished = 1 THEN NULL ELSE 1 END) AS studentcount
FROM mdl_course LEFT JOIN usersbycourse ON mdl_course.id = usersbycourse.instanceid
GROUP BY mdl_course.id;
The results from the SQL are:
+----+-----------------+--------------+
| id | shortname | studentcount |
+----+-----------------+--------------+
| 1 | Unity I | 1 |
| 2 | PJU | 1 |
| 3 | Fundgest-GRHN1A | 2 |
| 4 | asdzxc2 | 1 |
+----+-----------------+--------------+
But why? In inside SQL has no Unity I, and no asdzxc2. How do I produce a result like this:
+----+-----------------+--------------+
| id | shortname | studentcount |
+----+-----------------+--------------+
| 1 | Unity I | 0 |
| 2 | PJU | 1 |
| 3 | Fundgest-GRHN1A | 2 |
| 4 | asdzxc2 | 0 |
+----+-----------------+--------------+
?
EDIT:
I want to count only rows having modulesfinished = 0
What you're looking for is SUM rather than COUNT, that is,
SELECT
mdl_course.id,
mdl_course.shortname,
SUM(CASE WHEN usersbycourse.modulesfinished = 0 THEN 1 ELSE 0 END) AS studentcount
FROM mdl_course LEFT JOIN usersbycourse ON mdl_course.id = usersbycourse.instanceid
GROUP BY mdl_course.id;
The problem is because you are using LEFT JOIN some of the values for usersbycourse.modulesfinished are NULL
Something you need to learn is
NULL == something
Is always unknown, not true, not false, just unknown.
So when you try to compare with = 1 your nulls get the ELSE but not because they aren't 1, is because is all the rest.
So if instead you change the condition to
COUNT(CASE WHEN usersbycourse.modulesfinished = 0 THEN 1 ELSE NULL)
Only the TRUE match will get 1, the FALSE and the UNKNOW part ill get NULL and COUNT doesnt count nulls. And that is what you want.

How can I treat with NULL as minimum value?

I have a table like this:
// notifications
+----+-----------+-------+---------+---------+------+
| id | score | type | post_id | user_id | seen |
+----+-----------+-------+---------+---------+------+
| 1 | 15 | 1 | 2342 | 342 | 1 |
| 2 | 5 | 1 | 2342 | 342 | 1 |
| 3 | NULL | 2 | 5342 | 342 | 1 |
| 4 | -10 | 1 | 2342 | 342 | NULL |
| 5 | 5 | 1 | 2342 | 342 | NULL |
| 6 | NULL | 2 | 8342 | 342 | NULL |
| 7 | -2 | 1 | 2342 | 342 | NULL |
+----+-----------+-------+---------+---------+------+
-- type: 1 means "it is a vote", 2 means "it is a comment (without score)"
Here is my query:
SELECT SUM(score), type, post_id, seen
FROM notifications
WHERE user_id = 342
GROUP BY type, post_id
ORDER BY (seen IS NULL) desc
As you see, there is SUM() function, Also both type and post_id columns are in the GROUP BY statement. Well now I'm talking about seen column. I don't want to put it into GROUP BY statement. So I have to use either MAX() or MIN() for it. Right?
Actually I need to select NULL as seen column (in query above) if there is even one row which has seen = NULL. My current query selects 1 as seen's value, even when I use MIN(seen). So why 1 is minimum when there is NULL?
Also I want to order rows so that all SEEN = NULL be in the top of list. How can I do that?
Expected result:
// notifications
+-----------+-------+---------+------+
| score | type | post_id | seen |
+-----------+-------+---------+------+
| 13 | 1 | 2342 | NULL |
| NULL | 2 | 8342 | NULL |
| NULL | 2 | 5342 | 1 |
+-----------+-------+---------+------+
You could do this
case when sum(seen is null) > 0
then null
else min(seen)
end
You could use the following query:
SELECT SUM(score), type, post_id, min(IFNULL(seen, 0)) as seen
FROM notifications
WHERE user_id = 342
GROUP BY type, post_id
ORDER BY seen desc

Update innerquery result

I have a query and a result as follows.
In the database NULL and 0 represent the same meaning.
Now I want a counter based on Null+0 or 1
Eg:in the following example I want the result like this:
IsVirtual Category counter
NULL+0 3 343+8 = (351 is Total)
Query
select * from
(
Select IsVirtual, Category, count(*) as counter
from [Hardware]
group by IsVirtual, Category
) innercat
Output
+-----------+----------+---------+
| IsVirtual | Category | counter |
+-----------+----------+---------+
| NULL | 3 | 343 |
| 0 | 3 | 8 |
| 1 | 2 | 1 |
| 0 | 1 | 1 |
| NULL | 6 | 119 |
| 0 | 4 | 1 |
| NULL | 1 | 70 |
| 0 | 5 | 9 |
| NULL | 4 | 54 |
| 0 | 2 | 2 |
| NULL | 5 | 41 |
| NULL | 2 | 112 |
| 1 | 1 | 5 |
+-----------+----------+---------+
I think you want this :
SELECT COALESCE(IsVirtual, 0) as [IsVirtual],
Category,
Count(*) as [Counter]
FROM yourtable
GROUP BY COALESCE(IsVirtual, 0),Category
This will give you expected result without using subquery.
try with this
select * from (
Select CASE ISNULL(IsVirtual,0)
WHEN 0 Then 'NULL + 0'
ELSE IsVirtual
END AS IsVirtual, Category, count(*) as counter from [Hardware] group by ISNULL(IsVirtual,0), Category
)innercat
You can also do the same thing by using MAX function
This might help you.
SELECT
max(IsVirtual) as IsVirtual,
Category,
Count(*) as Counter
FROM
yourtable
GROUP BY
Category

Advanced MySQL: Find correlations between poll responses

I've got four MySQL tables:
users (id, name)
polls (id, text)
options (id, poll_id, text)
responses (id, poll_id, option_id, user_id)
Given a particular poll and a particular option, I'd like to generate a table that shows which options from other polls are most strongly correlated.
Suppose this is our data set:
TABLE users:
+------+-------+
| id | name |
+------+-------+
| 1 | Abe |
| 2 | Bob |
| 3 | Che |
| 4 | Den |
+------+-------+
TABLE polls:
+------+-----------------------+
| id | text |
+------+-----------------------+
| 1 | Do you like apples? |
| 2 | What is your gender? |
| 3 | What is your height? |
| 4 | Do you like polls? |
+------+-----------------------+
TABLE options:
+------+----------+---------+
| id | poll_id | text |
+------+----------+---------+
| 1 | 1 | Yes |
| 2 | 1 | No |
| 3 | 2 | Male |
| 4 | 2 | Female |
| 5 | 3 | Short |
| 6 | 3 | Tall |
| 7 | 4 | Yes |
| 8 | 4 | No |
+------+----------+---------+
TABLE responses:
+------+----------+------------+----------+
| id | poll_id | option_id | user_id |
+------+----------+------------+----------+
| 1 | 1 | 1 | 1 |
| 2 | 1 | 2 | 2 |
| 3 | 1 | 2 | 3 |
| 4 | 1 | 2 | 4 |
| 5 | 2 | 3 | 1 |
| 6 | 2 | 3 | 2 |
| 7 | 2 | 3 | 3 |
| 8 | 2 | 4 | 4 |
| 9 | 3 | 5 | 1 |
| 10 | 3 | 6 | 2 |
| 10 | 3 | 5 | 3 |
| 10 | 3 | 6 | 4 |
| 10 | 4 | 7 | 1 |
| 10 | 4 | 7 | 2 |
| 10 | 4 | 7 | 3 |
| 10 | 4 | 7 | 4 |
+------+----------+------------+----------+
Given the poll ID 1 and the option ID 2, the generated table should be something like this:
+----------+------------+-----------------------+
| poll_id | option_id | percent_correlated |
+----------+------------+-----------------------+
| 4 | 7 | 100 |
| 2 | 3 | 66.66 |
| 3 | 6 | 66.66 |
| 2 | 4 | 33.33 |
| 3 | 5 | 33.33 |
| 4 | 8 | 0 |
+----------+------------+-----------------------+
So basically, we're identifying all of the users who responded to poll ID 1 and selected option ID 2, and we're looking through all the other polls to see what percentage of them also selected each other option.
Don't have an instance handy to test, can you see if this gets proper results:
select
poll_id,
option_id,
((psum - (sum1 * sum2 / n)) / sqrt((sum1sq - pow(sum1, 2.0) / n) * (sum2sq - pow(sum2, 2.0) / n))) AS r,
n
from
(
select
poll_id,
option_id,
SUM(score) AS sum1,
SUM(score_rev) AS sum2,
SUM(score * score) AS sum1sq,
SUM(score_rev * score_rev) AS sum2sq,
SUM(score * score_rev) AS psum,
COUNT(*) AS n
from
(
select
responses.poll_id,
responses.option_id,
CASE
WHEN user_resp.user_id IS NULL THEN SELECT 0
ELSE SELECT 1
END CASE as score,
CASE
WHEN user_resp.user_id IS NULL THEN SELECT 1
ELSE SELECT 0
END CASE as score_rev,
from responses left outer join
(
select
user_id
from
responses
where
poll_id = 1 and
option_id = 2
)user_resp
ON (user_resp.user_id = responses.user_id)
) temp1
group by
poll_id,
option_id
)components
After a few hours of trial and error, I managed to put together a query that works correctly:
SELECT poll_id AS p_id,
option_id AS o_id,
COUNT(*) AS optCount,
(SELECT COUNT(*) FROM response WHERE option_id = o_id AND user_id IN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2')) /
(SELECT COUNT(*) FROM response WHERE poll_id = p_id AND user_id IN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2'))
AS percentage
FROM response
INNER JOIN
(SELECT user_id FROM response WHERE poll_id = '1' AND option_id = '2') AS user_ids
ON response.user_id = user_ids.user_id
WHERE poll_id != '1'
GROUP BY option_id DESC
ORDER BY percentage DESC, optCount DESC
Based on a tests with a small data set, this query looks to be reasonably fast, but I'd like to modify it so the "IN" subquery is not repeated three times. Any suggestions?
This seems to give the right results for me:
select poll_stats.poll_id,
option_stats.option_id,
(100 * option_responses / poll_responses) as percent_correlated
from (select response.poll_id,
count(*) as poll_responses
from response selecting_response
join response on response.user_id = selecting_response.user_id
where selecting_response.poll_id = 1 and selecting_response.option_id = 2
group by response.poll_id) poll_stats
join (select options.poll_id,
options.id as option_id,
count(response.id) as option_responses
from options
left join response on response.poll_id = options.poll_id
and response.option_id = options.id
and exists (
select 1 from response selecting_response
where selecting_response.user_id = response.user_id
and selecting_response.poll_id = 1
and selecting_response.option_id = 2)
group by options.poll_id, options.id
) as option_stats
on option_stats.poll_id = poll_stats.poll_id
where poll_stats.poll_id <> 1
order by 3 desc, option_responses desc