I cannot find the answer to my problem here on stackoverflow. I have a query that spans 3 tables:
newsitem
+------+----------+----------+----------+--------+----------+
| Guid | Supplier | LastEdit | ShowDate | Title | Contents |
+------+----------+----------+----------+--------+----------+
newsrating
+----+----------+--------+--------+
| Id | NewsGuid | UserId | Rating |
+----+----------+--------+--------+
usernews
+----+----------+--------+----------+
| Id | NewsGuid | UserId | ReadDate |
+----+----------+--------+----------+
Newsitem obviously contains newsitems, newsrating contains ratings that users give to newsitems, and usernews contains the date when a user has read a newsitem.
In my query I want to get every newsitem, including the number of ratings for that newsitem and the average rating, and how many times that newsitem has been read by the current user.
What I have so far is:
select newsitem.guid, supplier, count(newsrating.id) as numberofratings,
avg(newsrating.rating) as rating,
count(case usernews.UserId when 3 then 1 else null end) as numberofreads from newsitem
left join newsrating on newsitem.guid = newsrating.newsguid
left join usernews on newsitem.guid = usernews.newsguid
group by newsitem.guid
I have created an sql fiddle here: http://sqlfiddle.com/#!9/c8add/8
Both count() calls don't return the numbers I want. numberofratings should return the total number of ratings for that newsitem (by all users). numberofreads should return the number of reads for the current user for that newsitem.
So, newsitem with guid d104c330-c319-40e8-8be3-a7c4f549d35c should have 2 ratings and 3 reads for the current user with userid = 3.
I have tried conditional counts and sums, but no success yet. How can this be accomplished?
The main problem I see is that you're joining both tables in the same query, so the rows multiply: each rating row is paired with each read row, which is why your counts aren't correct. For example, if the news item has been read 3 times by the user and rated by 8 users, you end up with 24 rows, so it looks like it has been rated 24 times. You can add a DISTINCT to your COUNT of the rating IDs and that should correct that issue. The average should be unaffected, because the average of 1 and 2 is the same as the average of 1, 1, 2, and 2 (for example).
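To make the row multiplication concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The table layout follows the question, but the sample rows (one news item, 2 ratings, 3 reads by user 3) are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: one news item with 2 ratings and 3 reads by user 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE newsitem   (guid TEXT PRIMARY KEY, supplier TEXT);
    CREATE TABLE newsrating (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, rating INTEGER);
    CREATE TABLE usernews   (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER);
    INSERT INTO newsitem VALUES ('n1', 'acme');
    INSERT INTO newsrating (newsguid, userid, rating) VALUES ('n1', 1, 4), ('n1', 2, 2);
    INSERT INTO usernews (newsguid, userid) VALUES ('n1', 3), ('n1', 3), ('n1', 3);
""")

# Both child tables joined to the parent: every rating row pairs with every read row.
JOINS = """
    FROM newsitem i
    LEFT JOIN newsrating r ON r.newsguid = i.guid
    LEFT JOIN usernews   u ON u.newsguid = i.guid AND u.userid = 3
"""

joined_rows = conn.execute("SELECT COUNT(*)" + JOINS).fetchone()[0]
naive = conn.execute(
    "SELECT COUNT(r.id), COUNT(u.id), AVG(r.rating)" + JOINS + "GROUP BY i.guid"
).fetchone()
fixed = conn.execute(
    "SELECT COUNT(DISTINCT r.id), COUNT(DISTINCT u.id), AVG(r.rating)" + JOINS + "GROUP BY i.guid"
).fetchone()

print(joined_rows)  # 6 joined rows (2 ratings x 3 reads)
print(naive)        # (6, 6, 3.0): both counts inflated, average unaffected
print(fixed)        # (2, 3, 3.0)
```

The plain counts both report 6 because they count joined rows, while COUNT(DISTINCT ...) counts each rating and each read once.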
You can then handle the reads by adding the userid to the JOIN condition (since it's an OUTER JOIN it shouldn't cause any loss of results) instead of in a CASE statement for your COUNT, then you can do a COUNT on distinct id values from Usernews. The resulting query would be:
SELECT
I.guid,
I.supplier,
COUNT(DISTINCT R.id) AS number_of_ratings,
AVG(R.rating) AS avg_rating,
COUNT(DISTINCT UN.id) AS number_of_reads
FROM
NewsItem I
LEFT OUTER JOIN NewsRating R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON
UN.newsguid = I.guid AND
UN.userid = @userid
GROUP BY
I.guid,
I.supplier
While that should work, you might get better results from a subquery, as the above needs to explode out the results and then aggregate them, perhaps unnecessarily. Also, some people might find the below to be a little clearer.
SELECT
I.guid,
I.supplier,
R.number_of_ratings,
R.avg_rating,
COUNT(UN.id) AS number_of_reads -- COUNT(*) would report 1 for items the user never read
FROM
NewsItem I
LEFT OUTER JOIN
(
SELECT
newsguid,
COUNT(*) AS number_of_ratings,
AVG(rating) AS avg_rating
FROM
NewsRating
GROUP BY
newsguid
) R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON UN.newsguid = I.guid AND UN.userid = @userid
GROUP BY
I.guid,
I.supplier,
R.number_of_ratings,
R.avg_rating
I'm with Tom: you should use a subquery to calculate the per-user read count.
SQL Fiddle Demo
SELECT NI.guid,
NI.supplier,
COUNT(NR.ID) as numberofratings,
AVG(NR.rating) as rating,
user_read as numberofreads
FROM newsitem NI
LEFT JOIN newsrating NR
ON NI.guid = NR.newsguid
LEFT JOIN (SELECT NewsGuid, COUNT(*) user_read
FROM usernews
WHERE UserId = 3 -- use a variable such as @user_id here
GROUP BY NewsGuid) UR
ON NI.guid = UR.NewsGuid
GROUP BY NI.guid,
NI.supplier,
numberofreads;
Related
I just can't figure out how to get the average rating and the comment count from my MySQL database.
I have 3 tables (activity, rating, comments): activity contains the main data (the "activities"), rating holds the ratings, and comments holds, of course, the comments.
activity_table
id | title |short_desc | long_desc | address | lat | long |last_updated
rating_table
id | activityid | userid | rating
comment_table
id | activityid | userid | rating
I'm now trying to get the data from activity plus the comment count and average rating in one query.
SELECT activity.*, AVG(rating.rating) as average_rating, count(comments.activityid) as total_comments
FROM activity LEFT JOIN
rating
ON activity.aid = rating.activityid LEFT JOIN
comments
ON activity.aid = comments.activityid
GROUP BY activity.aid
...doesn't do the job. It gives me the right average_rating, but the wrong amount of comments.
Any ideas?
Thanks a lot!
You are aggregating along two different dimensions. The Cartesian product generated by the joins affects the aggregation.
So, you should aggregate before the joins:
SELECT a.*, r.average_rating, COALESCE(c.total_comments, 0) as total_comments
FROM activity a LEFT JOIN
(SELECT r.activityid, AVG(r.rating) as average_rating
FROM rating r
GROUP BY r.activityid
) r
ON a.aid = r.activityid LEFT JOIN
(SELECT c.activityid, COUNT(*) as total_comments
FROM comments c
GROUP BY c.activityid
) c
ON a.aid = c.activityid;
Notice that the outer GROUP BY is no longer needed.
Working example: http://sqlfiddle.com/#!9/80995/20
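As a self-contained illustration of aggregating before the joins, here is a small sketch using Python's sqlite3 module instead of MySQL. The column names follow the question's query (aid, activityid); the title and body columns and all sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical data: activity 1 has 2 ratings and 3 comments, activity 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (aid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating   (id INTEGER PRIMARY KEY, activityid INTEGER, rating INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, activityid INTEGER, body TEXT);
    INSERT INTO activity VALUES (1, 'Hiking'), (2, 'Kayaking');
    INSERT INTO rating (activityid, rating) VALUES (1, 5), (1, 3);
    INSERT INTO comments (activityid, body) VALUES (1, 'a'), (1, 'b'), (1, 'c');
""")

# Each child table is collapsed to one row per activity BEFORE it is joined,
# so neither aggregate can be multiplied by the other table's rows.
rows = conn.execute("""
    SELECT a.aid, r.average_rating, COALESCE(c.total_comments, 0)
    FROM activity a
    LEFT JOIN (SELECT activityid, AVG(rating) AS average_rating
               FROM rating GROUP BY activityid) r ON a.aid = r.activityid
    LEFT JOIN (SELECT activityid, COUNT(*) AS total_comments
               FROM comments GROUP BY activityid) c ON a.aid = c.activityid
    ORDER BY a.aid
""").fetchall()

print(rows)  # [(1, 4.0, 3), (2, None, 0)]
```

COALESCE turns the missing comment count into 0, while the average is left as NULL for an activity that has no ratings.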
I have three tables, a user table, a user_group table, and a link table.
The link table contains the dates that users were added to user groups. I need a query that returns the count of users currently in each group. The most recent date determines the group that the user is currently in.
SELECT
user_groups.name,
COUNT(l.name) AS ct,
GROUP_CONCAT(l.`name` separator ", ") AS members
FROM user_groups
LEFT JOIN
(SELECT MAX(added), group_id, name FROM link LEFT JOIN users ON users.id = link.user_id GROUP BY user_id) l
ON l.group_id = user_groups.id
GROUP BY user_groups.id
My question is if the query I have written could be optimized, or written better.
Thanks!
Ben
Your actual query is not giving you the answer you want; at least, as far as I understand your question. John actually joined group 2 on 2017-01-05, yet he appears in group 1 (which he joined on 2017-01-01) in your results. Note also that you're missing Group 4.
Using standard SQL, I think the following query is what you're looking for. The comments in the query should clarify what each part is doing:
SELECT
user_groups.name AS group_name,
COUNT(u.name) AS member_count,
group_concat(u.name separator ', ') AS members
FROM
user_groups
LEFT JOIN
(
SELECT * FROM
(-- For each user, find most recent date s/he got into a group
SELECT
user_id AS the_user_id, MAX(added) AS last_added
FROM
link
GROUP BY
the_user_id
) AS u_a
-- Join back to the link table, so that the `group_id` can be retrieved
JOIN link l2 ON l2.user_id = u_a.the_user_id AND l2.added = u_a.last_added
) AS most_recent_group ON most_recent_group.group_id = user_groups.id
-- And get the users...
LEFT JOIN users u ON u.id = most_recent_group.the_user_id
GROUP BY
user_groups.id, user_groups.name
ORDER BY
user_groups.name ;
This can be written in a more compact way in MySQL (abusing the fact that, in older versions of MySQL, it doesn't follow the SQL standard for the GROUP BY restrictions).
That's what you'll get:
group_name | member_count | members
:--------- | -----------: | :-------------
Group 1 | 2 | Mikie, Dominic
Group 2 | 2 | John, Paddy
Group 3 | 0 | null
Group 4 | 1 | Nellie
dbfiddle here
Note that this query can be simplified if you use a database with window functions (such as MariaDB 10.2). Then, you can use:
SELECT
user_groups.name AS group_name,
COUNT(u.name) AS member_count,
group_concat(u.name separator ', ') AS members
FROM
user_groups
LEFT JOIN
(
SELECT DISTINCT
user_id AS the_user_id,
-- newest row first, so first_value() picks the group joined most recently
first_value(group_id) OVER (PARTITION BY user_id ORDER BY added DESC) AS group_id
FROM
link
) AS most_recent_group ON most_recent_group.group_id = user_groups.id
-- And get the users...
LEFT JOIN users u ON u.id = most_recent_group.the_user_id
GROUP BY
user_groups.id, user_groups.name
ORDER BY
user_groups.name ;
dbfiddle here
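For anyone who wants to try the "most recent group per user" idea without a MySQL or MariaDB server, here is a minimal sketch using Python's sqlite3 module. It uses the MAX(added) join-back approach; the sample users, groups and dates are invented for illustration and are not the data from the fiddle.

```python
import sqlite3

# Hypothetical data: John moved from Group 1 to Group 2, Mikie stayed in Group 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users       (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_groups (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE link        (user_id INTEGER, group_id INTEGER, added TEXT);
    INSERT INTO users VALUES (1, 'John'), (2, 'Mikie');
    INSERT INTO user_groups VALUES (1, 'Group 1'), (2, 'Group 2'), (3, 'Group 3');
    INSERT INTO link VALUES (1, 1, '2017-01-01'), (1, 2, '2017-01-05'), (2, 1, '2017-01-02');
""")

current_counts = conn.execute("""
    SELECT g.name, COUNT(u.id) AS member_count
    FROM user_groups g
    LEFT JOIN (
        -- keep only each user's newest link row
        SELECT l.user_id, l.group_id
        FROM link l
        JOIN (SELECT user_id, MAX(added) AS last_added
              FROM link GROUP BY user_id) latest
          ON latest.user_id = l.user_id AND latest.last_added = l.added
    ) cur ON cur.group_id = g.id
    LEFT JOIN users u ON u.id = cur.user_id
    GROUP BY g.id, g.name
    ORDER BY g.name
""").fetchall()

print(current_counts)  # [('Group 1', 1), ('Group 2', 1), ('Group 3', 0)]
```

John is counted only in Group 2, and the empty Group 3 still shows up with a count of 0 because of the LEFT JOINs.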
so I have a problem with my query. I have 2 tables:
courses:
The user_id in this table is the instructor of the course.
-----------------------------------------------------------------------
| course_id | user_id | course_name | other information |
-----------------------------------------------------------------------
| 6 | 1 | My Course 1 | ... |
-----------------------------------------------------------------------
my_courses:
The user_id in this table is a student of the course.
--------------------------------------------------
| user_id | course_id | created_at |
--------------------------------------------------
| 5 | 6 | [UNIX_TIMESTAMP] |
--------------------------------------------------
The my_courses contains the number of people that have joined that course. I want to get all the course info as well as the number of people that have joined a course. Everything returns as expected except the number of people that joined a course. This is the query I'm using:
SELECT
courses.*,
users.name, -- This is the name of the instructor
users.last_name, -- This is the last name of the instructor
COUNT(my_courses.user_id) as count_students
FROM courses
LEFT JOIN users
ON courses.user_id = courses.user_id
LEFT JOIN my_courses
ON courses.course_id = my_courses.course_id
WHERE courses.course_id = '6'
Like I said, this query returns the course info like normal but returns 3 as count_students when it should only return 1. Does anyone know why this is happening? Any help is greatly appreciated.
It's probably because of the JOIN condition for users. It should be:
LEFT JOIN users
ON courses.user_id = users.user_id
You should also add a GROUP BY clause in your query:
SELECT
c.*,
u.name,
u.last_name,
COUNT(mc.user_id) AS count_students
FROM courses c
LEFT JOIN users u
ON c.user_id = u.user_id
LEFT JOIN my_courses mc
ON c.course_id = mc.course_id
WHERE c.course_id = '6'
GROUP BY
<columns not in the aggregate function>
Additionally, alias your tables to improve readability.
JOIN operations cause combinatorial multiplication of rows. You need to summarize the student count from its own table like so.
SELECT course_id, COUNT(*) students
FROM my_courses
GROUP BY course_id
That gives you a result set with either one or zero rows per course_id. You can then join it to the rest of your query.
SELECT courses.*,
users.name, -- This is the name of the instructor
users.last_name, -- This is the last name of the instructor
COALESCE(aggr.students, 0) AS count_students
FROM courses
LEFT JOIN users ON courses.user_id = users.user_id
LEFT JOIN (
SELECT course_id, COUNT(*) students
FROM my_courses
GROUP BY course_id
) aggr ON courses.course_id = aggr.course_id
WHERE courses.course_id = '6'
That way you'll avoid multiple-counting your students for courses with, perhaps, more than one instructor.
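Here is a small reproduction of the problem and of the fix, sketched with Python's sqlite3 module rather than MySQL. The three users and the single enrolment are made-up sample rows; with three rows in users, the always-true condition courses.user_id = courses.user_id pairs the course with every user, which gives the count of 3 described in the question.

```python
import sqlite3

# Hypothetical data: 3 users in total, 1 course taught by user 1, 1 enrolled student.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users      (user_id INTEGER PRIMARY KEY, name TEXT, last_name TEXT);
    CREATE TABLE courses    (course_id INTEGER PRIMARY KEY, user_id INTEGER, course_name TEXT);
    CREATE TABLE my_courses (user_id INTEGER, course_id INTEGER);
    INSERT INTO users VALUES (1, 'Ann', 'Lee'), (5, 'Bob', 'Ray'), (7, 'Cid', 'Poe');
    INSERT INTO courses VALUES (6, 1, 'My Course 1');
    INSERT INTO my_courses VALUES (5, 6);
""")

# Bug: the ON condition compares courses.user_id with itself, so it matches every user.
buggy_count = conn.execute("""
    SELECT COUNT(mc.user_id)
    FROM courses c
    LEFT JOIN users u ON c.user_id = c.user_id
    LEFT JOIN my_courses mc ON c.course_id = mc.course_id
    WHERE c.course_id = 6
""").fetchone()[0]

# Fix: join on the instructor's id and count students in their own subquery.
fixed_row = conn.execute("""
    SELECT c.course_name, u.name, COALESCE(aggr.students, 0)
    FROM courses c
    LEFT JOIN users u ON c.user_id = u.user_id
    LEFT JOIN (SELECT course_id, COUNT(*) AS students
               FROM my_courses GROUP BY course_id) aggr
           ON aggr.course_id = c.course_id
    WHERE c.course_id = 6
""").fetchone()

print(buggy_count)  # 3
print(fixed_row)    # ('My Course 1', 'Ann', 1)
```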
This query suggests friendship based on how many words users have in common. in_common sets this threshold.
I was wondering if it was possible to make this query completely % based.
What I want to do is have user suggested to current user, if 30% of their words match.
current_user total words 100
in_common threshold 30
some_other_user total words 10
3 out of these match current_users list.
Since 3 is 30% of 10, this is a match for the current user.
Possible?
SELECT users.name_surname, users.avatar, t1.qty, GROUP_CONCAT(words_en.word) AS in_common, (users.id) AS friend_request_id
FROM (
SELECT c2.user_id, COUNT(*) AS qty
FROM `connections` c1
JOIN `connections` c2
ON c1.user_id <> c2.user_id
AND c1.word_id = c2.word_id
WHERE c1.user_id = :user_id
GROUP BY c2.user_id
HAVING count(*) >= :in_common) as t1
JOIN users
ON t1.user_id = users.id
JOIN connections
ON connections.user_id = t1.user_id
JOIN words_en
ON words_en.id = connections.word_id
WHERE EXISTS(SELECT *
FROM connections
WHERE connections.user_id = :user_id
AND connections.word_id = words_en.id)
GROUP BY users.id, users.name_surname, users.avatar, t1.qty
ORDER BY t1.qty DESC, users.name_surname ASC
SQL fiddle: http://www.sqlfiddle.com/#!2/c79a6/9
OK, so the issue is that "users in common" is defined as an asymmetric relation. To fix it, let's assume that the in_common percentage threshold is checked against the user with the fewest words.
Try this query (fiddle); it gives you the full list of user pairs with at least 1 word in common, marking friendship suggestions:
SELECT user1_id, user2_id, user1_wc, user2_wc,
count(*) AS common_wc, count(*) / least(user1_wc, user2_wc) AS common_wc_pct,
CASE WHEN count(*) / least(user1_wc, user2_wc) > 0.7 THEN 1 ELSE 0 END AS friendship_suggestion
FROM (
SELECT u1.user_id AS user1_id, u2.user_id AS user2_id,
u1.word_count AS user1_wc, u2.word_count AS user2_wc,
c1.word_id AS word1_id, c2.word_id AS word2_id
FROM connections c1
JOIN connections c2 ON (c1.user_id < c2.user_id AND c1.word_id = c2.word_id)
JOIN (SELECT user_id, count(*) AS word_count
FROM connections
GROUP BY user_id) u1 ON (c1.user_id = u1.user_id)
JOIN (SELECT user_id, count(*) AS word_count
FROM connections
GROUP BY user_id) u2 ON (c2.user_id = u2.user_id)
) AS shared_words
GROUP BY user1_id, user2_id, user1_wc, user2_wc;
friendship_suggestion is in the SELECT list for clarity; you probably need to filter by it, so you may just move the condition to a HAVING clause.
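The "shared words divided by the smaller word count" rule can also be checked outside the database. Below is a minimal sketch in plain Python; the function name overlap_ratio and the word sets are hypothetical, chosen to mirror the question's example of 100 words versus 10 words with 3 in common.

```python
def overlap_ratio(words_a, words_b):
    """Shared words as a fraction of the smaller of the two word sets."""
    smaller = min(len(words_a), len(words_b))
    if smaller == 0:
        return 0.0  # a user without words cannot match anyone
    return len(words_a & words_b) / smaller

# Hypothetical word ids: the current user has 100 words, the other user has 10,
# and exactly 3 of them (1, 2, 3) are shared.
current_user = set(range(1, 101))
other_user = {1, 2, 3} | set(range(200, 207))

ratio = overlap_ratio(current_user, other_user)
suggest = ratio >= 0.3  # 30% threshold from the question

print(ratio, suggest)  # 0.3 True
```

Because the ratio uses the smaller set, the result is the same whichever of the two users is treated as the "current" one.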
I'll throw this option into your consideration... The first part of the FROM clause does nothing but get the one user you are using as the basis for finding all others with common words. Its WHERE clause restricts it to that one user (aliased OnePerson).
Then comes an addition to the FROM clause (WITHOUT A JOIN condition): since OnePerson will always be a single record, we want its total word count available. I didn't actually see how you worked your 100-to-30 threshold if another person only had 10 words and matched 3... I actually think it's bloat and unnecessary, as you'll see later in the WHERE of PreQuery.
So, the next table is the connections table (aliased c2) and that is normal INNER JOIN to the words table for each of the "other" people being considered.
This c2 is then joined again to the connections table again alias OnesWords based on the common word Id -- AND -- the OnesWords user ID is that of the primary user_id being compared against. This OnesWords alias is joined to the words table so IF THERE IS a match to the primary person, we can grab that "common word" as part of the group_concat().
So, now we grab the original single person's total words (still not SURE you need it), a count of ALL the words for the other person, and a count (via sum/case when) of all words that ARE IN COMMON with the original person grouped by the "other" user ID. This gets them all and results as alias "PreQuery".
Now, from that, we can join that to the user's table to get the name and avatar along with respective counts and common words, but apply the WHERE clause based on the total per "other users" available words to the "in common" with the first person's words (see... I didn't think you NEEDED the original query/count as basis of percentage consideration).
SELECT
u.name_surname,
u.avatar,
PreQuery.*
from
( SELECT
c2.user_id,
OnePerson.TotalWords,
COUNT(*) as OtherUserWords,
GROUP_CONCAT(words_en.word) AS InCommonWords,
SUM( case when OnesWords.word_id IS NULL then 0 else 1 end ) as InCommonWithOne
from
( SELECT c1.user_id,
COUNT(*) AS TotalWords
from
`connections` c1
where
c1.user_id = :PrimaryPersonBasis
group by
c1.user_id ) OnePerson
CROSS JOIN `connections` c2
LEFT JOIN `connections` OnesWords
ON c2.word_id = OnesWords.word_id
AND OnesWords.user_id = OnePerson.User_ID
LEFT JOIN words_en
ON OnesWords.word_id = words_en.id
where
c2.user_id <> OnePerson.User_ID
group by
c2.user_id ) PreQuery
JOIN users u
ON PreQuery.user_id = u.id
where
PreQuery.InCommonWithOne >= PreQuery.OtherUserWords * :nPercentToConsider
order by
PreQuery.InCommonWithOne DESC,
u.name_surname
Here's a revised version WITHOUT the need to prequery the total original words of the first person.
SELECT
u.name_surname,
u.avatar,
PreQuery.*
from
( SELECT
c2.user_id,
COUNT(*) as OtherUserWords,
GROUP_CONCAT(words_en.word) AS InCommonWords,
SUM( case when OnesWords.word_id IS NULL then 0 else 1 end ) as InCommonWithOne
from
`connections` c2
LEFT JOIN `connections` OnesWords
ON c2.word_id = OnesWords.word_id
AND OnesWords.user_id = :PrimaryPersonBasis
LEFT JOIN words_en
ON OnesWords.word_id = words_en.id
where
c2.user_id <> :PrimaryPersonBasis
group by
c2.user_id
having
SUM( case when OnesWords.word_id IS NULL then 0 else 1 end ) >=
COUNT(*) * :nPercentToConsider ) PreQuery
JOIN users u
ON PreQuery.user_id = u.id
order by
PreQuery.InCommonWithOne DESC,
u.name_surname
There might be some tweaking needed on the query, but your original query leads me to believe you can easily find simple things like alias or field name typos.
Another option might be to prequery ALL users and how many words each has UP FRONT, then use the primary person's words to compare against anyone else explicitly ON those common words... This might be more efficient, as the multiple joins would work on the smaller result set. What if you have 10,000 users, user A has 30 words, and only 500 other users have one or more of those words in common... why compare against all 10,000? Having a simple up-front summary of each user and their word count should be an almost instant query basis.
SELECT
u.name_surname,
u.avatar,
PreQuery.*
from
( SELECT
OtherUser.User_ID,
AllUsers.EachUserWords,
COUNT(*) as CommonWordsCount,
group_concat( words_en.word ) as InCommonWords
from
`connections` OneUser
JOIN words_en
ON OneUser.word_id = words_en.id
JOIN `connections` OtherUser
ON OneUser.word_id = OtherUser.word_id
AND OneUser.user_id <> OtherUser.user_id
JOIN ( SELECT
c1.user_id,
COUNT(*) as EachUserWords
from
`connections` c1
group by
c1.user_id ) AllUsers
ON OtherUser.user_id = AllUsers.User_ID
where
OneUser.user_id = :nPrimaryUserToConsider
group by
OtherUser.User_id,
AllUsers.EachUserWords ) as PreQuery
JOIN users u
ON PreQuery.User_ID = u.id
where
PreQuery.CommonWordsCount >= PreQuery.EachUserWords * :nPercentToConsider
order by
PreQuery.CommonWordsCount DESC,
u.name_surname
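To check the percentage rule end to end, here is a minimal sketch using Python's sqlite3 module rather than MySQL. It follows the idea of pre-counting every user's words and keeping the other users whose shared words reach the threshold of their own total. The sample rows and the parameter names :user_id and :pct are assumptions, and :pct is passed as a whole number (30) so the comparison stays in integer arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE connections (user_id INTEGER, word_id INTEGER)")

# Hypothetical data (word ids only).
rows = (
    [(1, w) for w in range(1, 11)]        # current user: words 1..10
    + [(2, w) for w in (1, 2, 3)]         # user 2: 3 shared words ...
    + [(2, w) for w in range(11, 18)]     # ... plus 7 others = 10 words
    + [(3, 1)]                            # user 3: 1 shared word ...
    + [(3, w) for w in range(20, 29)]     # ... plus 9 others = 10 words
    + [(4, w) for w in range(30, 35)]     # user 4: nothing shared
)
conn.executemany("INSERT INTO connections VALUES (?, ?)", rows)

suggestions = conn.execute("""
    SELECT other.user_id, totals.total_words, COUNT(*) AS common_words
    FROM connections mine
    JOIN connections other
      ON other.word_id = mine.word_id AND other.user_id <> mine.user_id
    JOIN (SELECT user_id, COUNT(*) AS total_words
          FROM connections GROUP BY user_id) totals
      ON totals.user_id = other.user_id
    WHERE mine.user_id = :user_id
    GROUP BY other.user_id, totals.total_words
    -- shared words must be at least :pct percent of the OTHER user's words
    HAVING COUNT(*) * 100 >= totals.total_words * :pct
    ORDER BY other.user_id
""", {"user_id": 1, "pct": 30}).fetchall()

print(suggestions)  # [(2, 10, 3)]
```

User 2 shares 3 of 10 words (30%) and is suggested; user 3 shares 1 of 10 and is not; user 4 never appears because the inner joins only produce rows for shared words.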
May I suggest a different way to look at your problem?
You might look into a similarity metric, such as Cosine Similarity which will give you a much better measure of similarity between your users based on words. To understand it for your case, consider the following example. You have a vector of words A = {house, car, burger, sun} for a user u1 and another vector B = {flat, car, pizza, burger, cloud} for user u2.
Given these individual vectors you first construct another that positions them together so you can map to each user whether he/she has that word in its vector or not. Like so:
| -- | house | car | burger | sun | flat | pizza | cloud |
----------------------------------------------------------
| A | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
----------------------------------------------------------
| B | 0 | 1 | 1 | 0 | 1 | 1 | 1 |
----------------------------------------------------------
Now you have a vector for each user where each position corresponds to the value of each word to each user. Here it represents a simple count but you can improve it using different metrics based on word frequency if that applies to your case. Take a look at the most common one, called tf-idf.
Having these two vectors, you can compute the cosine similarity between them as follows:
cos(A, B) = sum(A_i * B_i) / ( sqrt(sum(A_i^2)) * sqrt(sum(B_i^2)) )
This is the sum of the products of the corresponding positions of the two vectors (the dot product), divided by the product of their magnitudes. In our example, that is 2 / (2 * sqrt(5)), or about 0.45. For non-negative vectors like these the result ranges between 0 and 1, and the higher it is, the more similar the two vectors are.
If you choose to go this way, you don't need to do this calculation in the database. You compute the similarity in your code and just save the result in the database. There are several libraries that can do that for you. In Python, take a look at the numpy library. In Java, look at Weka and/or Apache Lucene.
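As a rough sketch of that calculation, here is the cosine similarity of the two example word sets in plain Python (standard library only, so no numpy is needed for vectors this small). The function name cosine_similarity is just an illustrative choice, and the vectors are the 0/1 presence vectors from the table above.

```python
import math

def cosine_similarity(words_a, words_b):
    """Cosine similarity of two word sets, using 0/1 presence vectors."""
    vocabulary = sorted(set(words_a) | set(words_b))
    vec_a = [1 if word in words_a else 0 for word in vocabulary]
    vec_b = [1 if word in words_b else 0 for word in vocabulary]

    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty word list is treated as similar to nothing
    return dot / (norm_a * norm_b)

A = {"house", "car", "burger", "sun"}
B = {"flat", "car", "pizza", "burger", "cloud"}

score = cosine_similarity(A, B)
print(round(score, 2))  # 0.45
```

The two shared words (car, burger) give a dot product of 2, and the magnitudes are sqrt(4) and sqrt(5), so the score is 2 / (2 * sqrt(5)).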
I have three tables (MySQL)
forum: each line in this table is a comment in the forum related to the match by static_id and related to the author by user_id
|match_static_id| date | time | comments | user_id |
matches: this table contains matches with all its information
| static_id | localteam_name | visitorteam_name | date | time |.......
iddaa : this table contains a code for each match (some matches do not have codes here)
|match_static_id| iddaa_code |
I make a query like following:
SELECT forum.match_static_id, forum.date, forum.time,
count(forum.comments) 'comments_no', matches.*, users.username, iddaa.iddaa_code
FROM forum
INNER JOIN matches ON forum.match_static_id = matches.static_id
INNER JOIN users on forum.user_id = users.id
LEFT JOIN iddaa on forum.match_static_id = iddaa.match_static_id
GROUP BY forum.match_static_id
ORDER BY forum.date DESC, forum.time DESC
The query works as I want (I get the match information, the iddaa code for the match if there is one, and the author of the last comment).
The problem is in the count function: I should get the number of comments related to the same match, but the query returns double each value.
For example, if I have 5 comments for a match, it returns 10.
I want to know whether all parts of my query are right. Any help would be appreciated.
Maybe it can be wrapped in a subquery? It's hard to say without the table definitions and data.
SELECT Sub.*, iddaa.iddaa_code
FROM
(
-- count the comments before iddaa is joined, so extra iddaa rows cannot inflate the count
SELECT forum.match_static_id, forum.date AS comment_date, forum.time AS comment_time,
COUNT(forum.comments) 'comments_no', matches.*, users.username
FROM forum
INNER JOIN matches ON forum.match_static_id = matches.static_id
INNER JOIN users on forum.user_id = users.id
GROUP BY forum.match_static_id
) Sub
LEFT JOIN iddaa on Sub.match_static_id = iddaa.match_static_id
ORDER BY Sub.comment_date DESC, Sub.comment_time DESC
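One way a count can double is when a joined table holds two rows for the same match. That is only a guess about the asker's data, but it is easy to reproduce. The sketch below uses Python's sqlite3 module with made-up rows: 5 comments for match 100 and, as an assumption, two iddaa codes for that match.

```python
import sqlite3

# Hypothetical data: 5 comments for match 100 and (assumed) two iddaa rows for it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forum (match_static_id INTEGER, comments TEXT, user_id INTEGER);
    CREATE TABLE iddaa (match_static_id INTEGER, iddaa_code TEXT);
    INSERT INTO forum VALUES (100, 'c1', 1), (100, 'c2', 1), (100, 'c3', 2),
                             (100, 'c4', 2), (100, 'c5', 3);
    INSERT INTO iddaa VALUES (100, 'A1'), (100, 'B2');
""")

# Counting after the join: each comment is paired with both iddaa rows.
naive_count = conn.execute("""
    SELECT COUNT(f.comments)
    FROM forum f
    LEFT JOIN iddaa i ON f.match_static_id = i.match_static_id
    GROUP BY f.match_static_id
""").fetchone()[0]

# Counting inside a subquery first: the later join can no longer change the count.
fixed_rows = conn.execute("""
    SELECT sub.match_static_id, sub.comments_no, i.iddaa_code
    FROM (SELECT match_static_id, COUNT(comments) AS comments_no
          FROM forum GROUP BY match_static_id) sub
    LEFT JOIN iddaa i ON sub.match_static_id = i.match_static_id
    ORDER BY i.iddaa_code
""").fetchall()

print(naive_count)  # 10
print(fixed_rows)   # [(100, 5, 'A1'), (100, 5, 'B2')]
```

The match still appears once per iddaa code, but comments_no stays at 5. If the duplicates sit in a different joined table, the same pre-aggregation idea applies there instead.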