SQL SUM() being multiplied with COUNT() - mysql

dictionaries: id (INT) | title (VARCHAR) | created_at
dictionary_ratings: id (INT) | dictionary_id (INT) | rating (TINYINT)
terms: id (INT) | dictionary_id (INT)
I have three SQL tables as shown above (simplified). I want to query all dictionaries and add information about the number of terms per each, as well as the calculated rating of each dictionary.
SELECT dictionaries.title,
SUM(dictionary_ratings.rating) AS rating_count,
COUNT(terms.id) AS term_count
FROM dictionaries
LEFT JOIN dictionary_ratings
ON dictionaries.id = dictionary_ratings.dictionary_id
LEFT JOIN terms
ON dictionaries.id = terms.dictionary_id
GROUP BY dictionaries.id
ORDER BY dictionaries.created_at DESC
The query above works OK except that it multiplies SUM(dictionary_ratings.rating) by COUNT(terms.id). So if I have 5 terms associated with a dictionary, and the total rating is 10, it outputs 50 instead of 10.
How do I fix that?

You need to modify your query to get the expected result. The final query looks like this:
SELECT
dictionaries.title,
sub.total_rating AS rating_count,
COUNT(terms.id) AS term_count
FROM
dictionaries
LEFT JOIN (
SELECT
dictionary_id,
SUM(rating) AS total_rating
FROM
dictionary_ratings
GROUP BY
dictionary_id
) sub ON dictionaries.id = sub.dictionary_id
LEFT JOIN terms ON dictionaries.id = terms.dictionary_id
GROUP BY
dictionaries.id
ORDER BY
dictionaries.created_at DESC
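The original query multiplies the sum because both joins hang off dictionaries:
every rating row is paired with every term row of the same dictionary, so each
rating is counted once per term.
Here the first join is between the dictionaries table and the result of a
subquery, which calculates the total rating for each dictionary by
summing the ratings in the dictionary_ratings table and grouping them
by dictionary_id. The subquery returns at most one row per dictionary,
so the join to terms can no longer duplicate any ratings.
The second join is between the dictionaries table and the terms table.
Finally, the query groups the results by dictionaries.id and orders
them by dictionaries.created_at in descending order.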
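To make the fan-out visible, here is a minimal sketch that runs both queries against an in-memory SQLite database through Python's sqlite3 module. The sample rows (one dictionary, two ratings totalling 10, five terms) are made up for illustration, and SQLite stands in for MySQL, so treat it as a demonstration of the idea rather than a test of MySQL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dictionaries (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT);
CREATE TABLE dictionary_ratings (id INTEGER PRIMARY KEY, dictionary_id INTEGER, rating INTEGER);
CREATE TABLE terms (id INTEGER PRIMARY KEY, dictionary_id INTEGER);
-- Hypothetical sample data: 2 ratings (4 + 6 = 10) and 5 terms.
INSERT INTO dictionaries VALUES (1, 'Slang', '2023-01-01');
INSERT INTO dictionary_ratings (dictionary_id, rating) VALUES (1, 4), (1, 6);
INSERT INTO terms (dictionary_id) VALUES (1), (1), (1), (1), (1);
""")

# Both tables joined directly: 2 ratings x 5 terms = 10 joined rows.
naive = conn.execute("""
SELECT d.title, SUM(r.rating), COUNT(t.id)
FROM dictionaries d
LEFT JOIN dictionary_ratings r ON d.id = r.dictionary_id
LEFT JOIN terms t ON d.id = t.dictionary_id
GROUP BY d.id
""").fetchall()

# Ratings summed in a subquery first, so only one rating row joins per dictionary.
fixed = conn.execute("""
SELECT d.title, sub.total_rating, COUNT(t.id)
FROM dictionaries d
LEFT JOIN (
    SELECT dictionary_id, SUM(rating) AS total_rating
    FROM dictionary_ratings
    GROUP BY dictionary_id
) sub ON d.id = sub.dictionary_id
LEFT JOIN terms t ON d.id = t.dictionary_id
GROUP BY d.id
""").fetchall()

print(naive)  # [('Slang', 50, 10)]
print(fixed)  # [('Slang', 10, 5)]
```

Note that with more than one rating row the naive query inflates the term count as well (10 instead of 5), not just the sum.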

Related

SQL Query to Return Top Field for Each Member

Let's say I have a table with the following fields...
MEMBER_ID (text)
CATEGORY1 (int)
CATEGORY2 (int)
CATEGORY3 (int)
CATEGORY4 (int)
...and let's say I have, like, 30+ more CATEGORY fields, all numbered accordingly. And in each category field, there is a numerical score.
Is there a query that could be used to populate a new table that looks like so...
MEMBER_ID
TOP_CATEGORY (the category name from the previous table with the
highest score for this MEMBER_ID)
SECOND_CATEGORY (the category name from the previous table with the
second-highest score for this MEMBER_ID)
THIRD_CATEGORY (the category name from the previous table with the
third-highest score for this MEMBER_ID)
...I know I could use CASE, but if I have a ton of CATEGORY fields, I assume that would get unwieldy. Do I have any other options?
Your best option in my opinion would be to normalise your database structure. Create a table of members, a table of categories and a table of scores by member & category. Having normalised, the problem you are trying to solve becomes a "top n per group" problem, followed by conditional aggregation. For example, if you follow the template for normalising that I have made in this demo, your query would look something like this:
select member_id,
max(case when `rank` = 1 then name end) as top_category,
max(case when `rank` = 2 then name end) as second_category,
max(case when `rank` = 3 then name end) as third_category
from (select n.member_id, c.name, s1.score, count(s2.score) + 1 as `rank`
from score s1
left join score s2 on s2.member = s1.member and s1.score < s2.score
join new_members n on n.id = s1.member
join category c on c.id = s1.category
group by n.member_id, c.name, s1.score
having count(s2.score) < 3) s
group by member_id
Once your database is normalised, adding new members, categories and scores becomes a lot easier, and queries to get the data out such as the above don't have to change at all.
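As a rough illustration of the self-join ranking above, the following sketch runs it in SQLite from Python. The table and column names (score, new_members, category) are taken from the query; the column types and all sample rows are assumptions, since the linked demo schema is not reproduced here, and the alias is spelled rnk to avoid any keyword clash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed normalised schema; only the names come from the query above.
CREATE TABLE new_members (id INTEGER PRIMARY KEY, member_id TEXT);
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE score (member INTEGER, category INTEGER, score INTEGER);
INSERT INTO new_members VALUES (1, 'M1'), (2, 'M2');
INSERT INTO category VALUES (1, 'CATEGORY1'), (2, 'CATEGORY2'),
                            (3, 'CATEGORY3'), (4, 'CATEGORY4');
-- Hypothetical scores, all distinct within a member.
INSERT INTO score VALUES (1, 1, 10), (1, 2, 40), (1, 3, 30), (1, 4, 20),
                         (2, 1, 5),  (2, 2, 1),  (2, 3, 9),  (2, 4, 7);
""")

top3 = conn.execute("""
SELECT member_id,
       MAX(CASE WHEN rnk = 1 THEN name END) AS top_category,
       MAX(CASE WHEN rnk = 2 THEN name END) AS second_category,
       MAX(CASE WHEN rnk = 3 THEN name END) AS third_category
FROM (SELECT n.member_id, c.name, s1.score,
             COUNT(s2.score) + 1 AS rnk   -- rank = 1 + number of higher scores
      FROM score s1
      LEFT JOIN score s2 ON s2.member = s1.member AND s1.score < s2.score
      JOIN new_members n ON n.id = s1.member
      JOIN category c ON c.id = s1.category
      GROUP BY n.member_id, c.name, s1.score
      HAVING COUNT(s2.score) < 3) s
GROUP BY member_id
ORDER BY member_id
""").fetchall()

print(top3)
# [('M1', 'CATEGORY2', 'CATEGORY3', 'CATEGORY4'),
#  ('M2', 'CATEGORY3', 'CATEGORY4', 'CATEGORY1')]
```

Adding a fifth category only means inserting rows into category and score; the query text stays the same.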

SQL - Counting how many associated records another table has

As I'm an SQL beginner, I can't describe the problem in a simple way, so let me show you an example:
3 Tables:
PRODUCT
id
group_id
person_id
GROUP
id
name
PERSON
id
group_id
As you see, GROUP can have multiple PERSONs and PRODUCT can be connected with GROUP and PERSON.
From this point, I would like to count number of PERSONs having a PRODUCT within a GROUP
I don't really understand the background of IN or using another SELECT within FROM, so if that's the point, then I'm happy that I was one step before it lol.
SELECT
group.name as GROUP_name,
COUNT(DISTINCT person_id) AS PERSON_having_min_one_PRODUCT
FROM products
LEFT JOIN groups ON groups.id = products.group_id
LEFT JOIN persons ON persons.id = products.person_id;
With this data:
GROUP
ExampleGroupName1 has 3 PERSONs, but only 2 of them have >0 PRODUCTs
ExampleGroupName2 has 3 PERSONs and all of them have >0 PRODUCTs
ExampleGroupName3 has 2 PERSONs, but none of them has a PRODUCT
ExampleGroupName4 has 2 PERSONs, but only 1 has >0 PRODUCTs
I would like to have an output like this:
GROUP_name | PERSON_having_min_one_PRODUCT
ExampleGroupName1 | 2
ExampleGroupName2 | 3
ExampleGroupName4 | 1
I would like to count number of PERSONs having a PRODUCT within a GROUP
Note: I will assume the table product does not have the column group_id, since it is redundant and can lead to a lot of errors.
The following query will show you the result you want by joining the tables person and product:
select
count(distinct x.id)
from person x
join product p on p.person_id = x.id
where x.group_id = 123 -- choosing a specific group
and p.id = 456 -- choosing a specific product
A simpler option is to count, per group_id, only the persons whose id appears in product.person_id. The filter has to go in a WHERE clause before grouping, not in HAVING:
SELECT group_id,
COUNT(DISTINCT id) AS "PERSON_WITH_PRODUCT"
FROM person
WHERE id IN (SELECT person_id FROM product)
GROUP BY group_id;
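For comparison, here is a small sketch that reproduces the example data from the question in SQLite and counts the persons with at least one product per group. It follows the first answer's assumption that product has no group_id column; the table name person_group is a stand-in chosen here to avoid the reserved word GROUP, and the individual rows are invented to match the description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_group (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, group_id INTEGER);
CREATE TABLE product (id INTEGER PRIMARY KEY, person_id INTEGER);
INSERT INTO person_group VALUES
    (1, 'ExampleGroupName1'), (2, 'ExampleGroupName2'),
    (3, 'ExampleGroupName3'), (4, 'ExampleGroupName4');
INSERT INTO person VALUES (1, 1), (2, 1), (3, 1),
                          (4, 2), (5, 2), (6, 2),
                          (7, 3), (8, 3),
                          (9, 4), (10, 4);
-- Person 1 owns two products to show why DISTINCT is needed.
INSERT INTO product (person_id) VALUES (1), (1), (2), (4), (5), (6), (9);
""")

# Inner joins drop persons without products and groups without such persons.
by_name = conn.execute("""
SELECT g.name, COUNT(DISTINCT x.id) AS person_having_min_one_product
FROM person x
JOIN product p ON p.person_id = x.id
JOIN person_group g ON g.id = x.group_id
GROUP BY g.id, g.name
ORDER BY g.name
""").fetchall()

# Same counts keyed by group_id, filtering with IN before grouping.
by_id = conn.execute("""
SELECT group_id, COUNT(DISTINCT id)
FROM person
WHERE id IN (SELECT person_id FROM product)
GROUP BY group_id
ORDER BY group_id
""").fetchall()

print(by_name)  # [('ExampleGroupName1', 2), ('ExampleGroupName2', 3), ('ExampleGroupName4', 1)]
print(by_id)    # [(1, 2), (2, 3), (4, 1)]
```

ExampleGroupName3 is absent from both results, which matches the output the question asks for.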

Sql conditional count with join

I cannot find the answer to my problem here on stackoverflow. I have a query that spans 3 tables:
newsitem
+------+----------+----------+----------+--------+----------+
| Guid | Supplier | LastEdit | ShowDate | Title | Contents |
+------+----------+----------+----------+--------+----------+
newsrating
+----+----------+--------+--------+
| Id | NewsGuid | UserId | Rating |
+----+----------+--------+--------+
usernews
+----+----------+--------+----------+
| Id | NewsGuid | UserId | ReadDate |
+----+----------+--------+----------+
Newsitem obviously contains newsitems, newsrating contains ratings that users give to newsitems, and usernews contains the date when a user has read a newsitem.
In my query I want to get every newsitem, including the number of ratings for that newsitem and the average rating, and how many times that newsitem has been read by the current user.
What I have so far is:
select newsitem.guid, supplier, count(newsrating.id) as numberofratings,
avg(newsrating.rating) as rating,
count(case usernews.UserId when 3 then 1 else null end) as numberofreads from newsitem
left join newsrating on newsitem.guid = newsrating.newsguid
left join usernews on newsitem.guid = usernews.newsguid
group by newsitem.guid
I have created an sql fiddle here: http://sqlfiddle.com/#!9/c8add/8
Both count() calls don't return the numbers I want. numberofratings should return the total number of ratings for that newsitem (by all users). numberofreads should return the number of reads for the current user for that newsitem.
So, newsitem with guid d104c330-c319-40e8-8be3-a7c4f549d35c should have 2 ratings and 3 reads for the current user with userid = 3.
I have tried conditional counts and sums, but no success yet. How can this be accomplished?
The main problem that I see is that you're joining in both tables together, which means that you're going to effectively be multiplying out by both numbers, which is why your counts aren't going to be correct. For example, if the Newsitem has been read 3 times by the user and rated by 8 users then you're going to end up getting 24 rows, so it will look like it has been rated 24 times. You can add a DISTINCT to your COUNT of the ratings IDs and that should correct that issue. Average should be unaffected because the average of 1 and 2 is the same as the average of 1, 1, 2, & 2 (for example).
You can then handle the reads by adding the userid to the JOIN condition (since it's an OUTER JOIN it shouldn't cause any loss of results) instead of in a CASE statement for your COUNT, then you can do a COUNT on distinct id values from Usernews. The resulting query would be:
SELECT
I.guid,
I.supplier,
COUNT(DISTINCT R.id) AS number_of_ratings,
AVG(R.rating) AS avg_rating,
COUNT(DISTINCT UN.id) AS number_of_reads
FROM
NewsItem I
LEFT OUTER JOIN NewsRating R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON
UN.newsguid = I.guid AND
UN.userid = @userid
GROUP BY
I.guid,
I.supplier
While that should work, you might get better results from a subquery, as the above needs to explode out the results and then aggregate them, perhaps unnecessarily. Also, some people might find the below to be a little clearer.
SELECT
I.guid,
I.supplier,
R.number_of_ratings,
R.avg_rating,
COUNT(UN.id) AS number_of_reads
FROM
NewsItem I
LEFT OUTER JOIN
(
SELECT
newsguid,
COUNT(*) AS number_of_ratings,
AVG(rating) AS avg_rating
FROM
NewsRating
GROUP BY
newsguid
) R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON UN.newsguid = I.guid AND UN.userid = @userid
GROUP BY
I.guid,
I.supplier,
R.number_of_ratings,
R.avg_rating
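The row multiplication is easy to check on a tiny data set. The sketch below uses SQLite through Python with made-up rows (one news item with guid 'A', two ratings, three reads by user 3 and one read by user 1) and compares the question's query with the COUNT(DISTINCT ...) version. The user id is passed as a bound parameter; SQLite is only a stand-in for MySQL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE newsitem (guid TEXT PRIMARY KEY, supplier TEXT);
CREATE TABLE newsrating (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, rating INTEGER);
CREATE TABLE usernews (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, readdate TEXT);
-- Hypothetical sample data.
INSERT INTO newsitem VALUES ('A', 'supplier1');
INSERT INTO newsrating (newsguid, userid, rating) VALUES ('A', 1, 2), ('A', 2, 4);
INSERT INTO usernews (newsguid, userid, readdate) VALUES
    ('A', 3, '2015-01-01'), ('A', 3, '2015-01-02'),
    ('A', 3, '2015-01-03'), ('A', 1, '2015-01-01');
""")

# Question's query: 2 ratings x 4 reads = 8 joined rows for item 'A'.
naive = conn.execute("""
SELECT I.guid,
       COUNT(R.id),
       AVG(R.rating),
       COUNT(CASE UN.userid WHEN 3 THEN 1 ELSE NULL END)
FROM newsitem I
LEFT JOIN newsrating R ON I.guid = R.newsguid
LEFT JOIN usernews UN ON I.guid = UN.newsguid
GROUP BY I.guid
""").fetchall()

# DISTINCT counts, with the user filter moved into the join condition.
distinct = conn.execute("""
SELECT I.guid,
       COUNT(DISTINCT R.id),
       AVG(R.rating),
       COUNT(DISTINCT UN.id)
FROM newsitem I
LEFT JOIN newsrating R ON R.newsguid = I.guid
LEFT JOIN usernews UN ON UN.newsguid = I.guid AND UN.userid = ?
GROUP BY I.guid
""", (3,)).fetchall()

print(naive)     # [('A', 8, 3.0, 6)]
print(distinct)  # [('A', 2, 3.0, 3)]
```

The average is 3.0 in both cases because every rating is duplicated the same number of times.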
I'm with Tom: you should use a subquery to calculate the user count.
SQL Fiddle Demo
SELECT NI.guid,
NI.supplier,
COUNT(NR.ID) as numberofratings,
AVG(NR.rating) as rating,
user_read as numberofreads
FROM newsitem NI
LEFT JOIN newsrating NR
ON NI.guid = NR.newsguid
LEFT JOIN (SELECT NewsGuid, COUNT(*) user_read
FROM usernews
WHERE UserId = 3 -- use a variable @user_id here
GROUP BY NewsGuid) UR
ON NI.guid = UR.NewsGuid
GROUP BY NI.guid,
NI.supplier,
numberofreads;
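One variation on the subquery idea, sketched here as an assumption rather than as what either answer prescribes, is to pre-aggregate both child tables so that no outer GROUP BY is needed at all, and to use COALESCE so items without ratings or reads show 0 instead of NULL. The sample rows are invented and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE newsitem (guid TEXT PRIMARY KEY, supplier TEXT);
CREATE TABLE newsrating (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, rating INTEGER);
CREATE TABLE usernews (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, readdate TEXT);
-- Hypothetical sample data: item 'B' has neither ratings nor reads.
INSERT INTO newsitem VALUES ('A', 'supplier1'), ('B', 'supplier2');
INSERT INTO newsrating (newsguid, userid, rating) VALUES ('A', 1, 2), ('A', 2, 4);
INSERT INTO usernews (newsguid, userid, readdate) VALUES
    ('A', 3, '2015-01-01'), ('A', 3, '2015-01-02'),
    ('A', 3, '2015-01-03'), ('A', 1, '2015-01-01');
""")

# Each derived table yields at most one row per newsguid, so nothing multiplies.
rows = conn.execute("""
SELECT I.guid,
       COALESCE(R.number_of_ratings, 0) AS number_of_ratings,
       R.avg_rating,
       COALESCE(UR.number_of_reads, 0) AS number_of_reads
FROM newsitem I
LEFT JOIN (SELECT newsguid, COUNT(*) AS number_of_ratings, AVG(rating) AS avg_rating
           FROM newsrating
           GROUP BY newsguid) R ON R.newsguid = I.guid
LEFT JOIN (SELECT newsguid, COUNT(*) AS number_of_reads
           FROM usernews
           WHERE userid = ?
           GROUP BY newsguid) UR ON UR.newsguid = I.guid
ORDER BY I.guid
""", (3,)).fetchall()

print(rows)  # [('A', 2, 3.0, 3), ('B', 0, None, 0)]
```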

Combining SELECTs and filtering by a column in a different table

I have two tables. The first stores orders:
id | date | status
The second table stores revisions (changes to orders):
name | date | product | producer | etc...
I am displaying orders with this query:
SELECT id, date from orders
Then I'm displaying only the last revision to each order with this query:
SELECT name, product, producer from revisions WHERE order_id = id ORDER BY date LIMIT 1
Two questions:
How can I combine these into a single SELECT?
How can I filter all orders by their name in the revisions table?
You want the groupwise maximum:
SELECT o.id, o.date,
r.name, r.product, r.producer
FROM orders o JOIN (revisions r NATURAL JOIN (
SELECT order_id, name, MAX(date) date
FROM revisions
WHERE name = ?
GROUP BY order_id
) t) ON r.order_id = o.id
Note that if multiple revisions of a single order have the same value for revisions.date, then they will all be returned in the results (whereas your previous approach with LIMIT 1 would have returned an indeterminate one). If you want to be more selective, you need to decide the criteria for choosing which result to select.
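Here is a hedged sketch of the same groupwise-maximum idea, run in SQLite from Python. It spells out the join conditions instead of using NATURAL JOIN, which is a substitution made here for readability; the order_id column on revisions is taken from the queries above, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, date TEXT, status TEXT);
CREATE TABLE revisions (order_id INTEGER, name TEXT, date TEXT, product TEXT, producer TEXT);
-- Hypothetical sample data.
INSERT INTO orders VALUES (1, '2013-06-01', 'open'), (2, '2013-06-02', 'open');
INSERT INTO revisions VALUES
    (1, 'alice', '2013-06-03', 'widget', 'acme'),
    (1, 'alice', '2013-06-05', 'gadget', 'acme'),
    (1, 'bob',   '2013-06-07', 'gizmo',  'bolt'),
    (2, 'bob',   '2013-06-04', 'widget', 'acme');
""")

# t holds the latest revision date per order among revisions with the given name;
# joining back to revisions on (order_id, name, date) recovers the full row.
latest = conn.execute("""
SELECT o.id, o.date, r.name, r.product, r.producer
FROM orders o
JOIN (SELECT order_id, MAX(date) AS date
      FROM revisions
      WHERE name = :name
      GROUP BY order_id) t ON t.order_id = o.id
JOIN revisions r ON r.order_id = t.order_id
                AND r.date = t.date
                AND r.name = :name
ORDER BY o.id
""", {"name": "alice"}).fetchall()

print(latest)  # [(1, '2013-06-01', 'alice', 'gadget', 'acme')]
```

Order 2 drops out because it has no revision named alice, and for order 1 the later bob revision is ignored because the name filter is applied before the maximum is taken.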

MySQL Query: Multiple joins on the same field returning multiple categories - HOW?

We have a table with around 40000 IDs in it. Some of these IDs are parents of other IDs (and subsequently some of those are parents of others in a different table). I'd like to use magic, persistence and some joins to work out which categories are related by querying against a field that contains all the child IDs (known as arrange).
So the way this works is Category table:
id name arrange
16 Alarms c,119|c,117|c,4607|c,3366|c,709|c,4204|c,624|c,626|c,625|c,4203|c,4201|c,4202
119 Carbon Monoxide i,21434|i,272|i,274|i,28451
Products table then has the i, items from arrange in
id name
272 Aico EI205ENA
274 AICO EI225EN
Basically I am running a query against a third orders table, and would like to create a table using joins which would be as such:
order date id name id name id name quantity price
13-06-2013 16 Alarms 119 Carbon Monoxide 272 Aico EI205ENA 2 10.00
At the moment I have:
select * from cart c
join prods p on p.id = c.item
where order_status = ''
and date_ordered != '0000-00-00'
order by date_added desc;
Simple join where I now want to add the categories from the first example, how on earth do I query an array to get what I want?
(If it helps we have 3 tables I am interested in cart, product and category).
Thanks in advance!
I have now realised how the tables join together.
What you want to do is possible with something like the following. It will be slow and the main reason I am putting it here is to show how nasty and unreadable it is, as an encouragement to normalise the database.
SELECT *
FROM cart c
INNER JOIN prods p
ON p.id = c.item
INNER JOIN Category z1
ON FIND_IN_SET(CONCAT('i-', p.id), REPLACE(REPLACE(z1.arrange, 'i,', 'i-'), '|', ',')) > 0
INNER JOIN Category z2
ON FIND_IN_SET(CONCAT('c-', z1.id), REPLACE(REPLACE(z2.arrange, 'c,', 'c-'), '|', ',')) > 0
WHERE order_status = ''
AND date_ordered != '0000-00-00'
ORDER BY date_added DESC
FIND_IN_SET is a function that looks for a value in a comma-separated list. The first join takes the delimited arrange list, replaces every occurrence of i, with i- and every | with a comma, then looks for i- concatenated with the product id in the resulting comma-separated list. The second join does the same against Category again, this time with the c, prefix and the id of the category found by the first join, to get the parent category.
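To see what the REPLACE and FIND_IN_SET expression computes for one row, here is a small Python sketch of the same string manipulation. The helper names rewrite_arrange and mysql_find_in_set are made up for this illustration, and the second one only roughly mimics MySQL's function (1-based position, 0 when absent); it is not the real implementation.

```python
def rewrite_arrange(arrange: str, prefix: str) -> str:
    """Turn 'i,272|i,274' into 'i-272,i-274' (same steps as the nested REPLACE calls)."""
    return arrange.replace(f"{prefix},", f"{prefix}-").replace("|", ",")


def mysql_find_in_set(needle: str, csv: str) -> int:
    """Rough mimic of FIND_IN_SET: 1-based position in a comma-separated list, else 0."""
    if not csv:
        return 0
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0


# arrange value of category 119 (Carbon Monoxide) from the question
arrange = "i,21434|i,272|i,274|i,28451"
rewritten = rewrite_arrange(arrange, "i")
position = mysql_find_in_set("i-272", rewritten)

print(rewritten)  # i-21434,i-272,i-274,i-28451
print(position)   # 2, so product 272 joins to category 119
print(mysql_find_in_set("i-27", rewritten))  # 0, partial ids do not match
```

Because the match is on whole list items, product 27 would not be mistaken for product 272, which is the reason for rewriting the delimiters instead of using LIKE.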