I'm looking for help using sum() in my SQL query:
SELECT links.id,
count(DISTINCT stats.id) as clicks,
count(DISTINCT conversions.id) as conversions,
sum(conversions.value) as conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created desc;
I use DISTINCT because I'm doing "group by" and this ensures the same row is not counted more than once.
The problem is that SUM(conversions.value) counts the "value" for each row more than once (due to the group by)
I basically want to do SUM(conversions.value) for each DISTINCT conversions.id.
Is that possible?
I may be wrong but from what I understand
conversions.id is the primary key of your table conversions
stats.id is the primary key of your table stats
Thus for each conversions.id you have at most one links.id impacted.
Your request is a bit like taking the Cartesian product of two sets:
[clicks]
SELECT *
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
[conversions]
SELECT *
FROM links
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
and for each link, you get sizeof([clicks]) x sizeof([conversions]) lines
As you noted the number of unique conversions in your request can be obtained via a
count(distinct conversions.id) = sizeof([conversions])
this distinct manages to remove all the [clicks] lines in the cartesian product
but clearly
sum(conversions.value) = sum([conversions].value) * sizeof([clicks])
In your case, since
count(*) = sizeof([clicks]) x sizeof([conversions])
count(*) = sizeof([clicks]) x count(distinct conversions.id)
you have
sizeof([clicks]) = count(*)/count(distinct conversions.id)
so I would test your request with
SELECT links.id,
count(DISTINCT stats.id) as clicks,
count(DISTINCT conversions.id) as conversions,
sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
ORDER BY links.created desc;
Keep me posted !
Jerome
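To make the multiplication effect concrete, here is a minimal sketch that runs the same double LEFT JOIN on an in-memory SQLite database through Python's sqlite3 module. The table layout mirrors the question, but the sample rows (3 clicks, 2 conversions worth 5 and 2) are invented for illustration, and SQLite is only standing in for MySQL here.

```python
import sqlite3

# Hypothetical miniature of the schema in the question; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE links (id INTEGER PRIMARY KEY, created TEXT);
CREATE TABLE stats (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE conversions (id INTEGER PRIMARY KEY, link_id INTEGER, value REAL);
INSERT INTO links VALUES (1, '2020-01-01');
INSERT INTO stats VALUES (1, 1), (2, 1), (3, 1);          -- 3 clicks
INSERT INTO conversions VALUES (1, 1, 5.0), (2, 1, 2.0);  -- 2 conversions, true total 7
""")

row = conn.execute("""
SELECT count(*)                       AS joined_rows,
       count(DISTINCT stats.id)       AS clicks,
       count(DISTINCT conversions.id) AS conversions,
       sum(conversions.value)         AS naive_sum,
       sum(conversions.value) * count(DISTINCT conversions.id) / count(*) AS corrected_sum
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id
""").fetchone()

joined_rows, clicks, conversions, naive_sum, corrected_sum = row
print(row)
```

With this data the join produces 3 x 2 = 6 rows per link, so the plain sum comes out three times too large, while the rescaled expression recovers the true total. The correction relies on every conversion being repeated the same number of times.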
Jerome's solution is actually wrong and can produce incorrect results!
sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value
let's assume the following table
conversions
id value
1 5
1 5
1 5
2 2
3 1
the correct sum of value for distinct ids would be 8.
Jerome's formula produces:
sum(conversions.value) = 18
count(distinct conversions.id) = 3
count(*) = 5
18*3/5 = 10.8 != 8
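The arithmetic for this table can be checked with a few lines of Python. This is only a sketch of the calculation on the five rows listed above; the variable names are mine.

```python
# The five (id, value) rows from the example table above.
rows = [(1, 5), (1, 5), (1, 5), (2, 2), (3, 1)]

total = sum(value for _, value in rows)       # sum(conversions.value)
distinct_ids = len({i for i, _ in rows})      # count(distinct conversions.id)
row_count = len(rows)                         # count(*)

# Value produced by sum * count(distinct id) / count(*).
formula = total * distinct_ids / row_count

# Sum of value taken once per distinct id.
value_per_id = {i: value for i, value in rows}
correct = sum(value_per_id.values())

print(total, distinct_ids, row_count, formula, correct)
```

The mismatch appears because id 1 is repeated three times while ids 2 and 3 appear once; the rescaling only works when every id is duplicated the same number of times.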
For an explanation of why you were seeing incorrect numbers, read this.
I think that Jerome has a handle on what is causing your error. Bryson's query would work, though having that subquery in the SELECT could be inefficient.
Use the following query:
SELECT links.id
, (
SELECT COUNT(*)
FROM stats
WHERE links.id = stats.parent_id
) AS clicks
, conversions.conversions
, conversions.conversion_value
FROM links
LEFT JOIN (
SELECT link_id
, COUNT(id) AS conversions
, SUM(conversions.value) AS conversion_value
FROM conversions
GROUP BY link_id
) AS conversions ON links.id = conversions.link_id
ORDER BY links.created DESC
I use a subquery to do this. It eliminates the problems with grouping.
So the query would be something like:
SELECT COUNT(DISTINCT conversions.id)
...
(SELECT SUM(conversions.value) FROM ....) AS Vals
How about something like this:
select t.id, max(t.clicks) clicks, count(t.conversions) conversions, sum(t.conversion_value) conversion_value
from (SELECT l.id id, l.created created,
count(DISTINCT s.id) clicks,
c.id conversions,
max(c.value) conversion_value
FROM links l
LEFT JOIN stats s ON l.id = s.parent_id
LEFT JOIN conversions c ON l.id = c.link_id
GROUP BY l.id, l.created, c.id) t
group by t.id, t.created
order by t.created
This will do the trick: just divide each conversion's sum by the number of duplicated rows for that conversion id.
SELECT a.id,
MAX(a.clicks) AS clicks,
SUM(a.conversion_value/a.dup_rows) AS conversion_value,
COUNT(a.conversion_id) AS conversions
FROM (SELECT links.id,
conversions.id AS conversion_id,
COUNT(DISTINCT stats.id) AS clicks,
COUNT(conversions.id) AS dup_rows,
SUM(conversions.value) AS conversion_value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY conversions.id,links.id) AS a
GROUP BY a.id
SELECT x.id, sum(x.value) as conversion_value, max(x.clicks) as clicks, sum(x.conversions) as conversions
FROM
(SELECT links.id, links.created,
count(DISTINCT stats.id) as clicks,
count(DISTINCT conversions.id) as conversions,
conversions.value
FROM links
LEFT OUTER JOIN stats ON links.id = stats.parent_id
LEFT OUTER JOIN conversions ON links.id = conversions.link_id
GROUP BY links.id, links.created, conversions.id, conversions.value) x
GROUP BY x.id, x.created
ORDER BY x.created desc;
I believe this will give you the answer that you are looking for.
Related
I'm currently creating a small application where users can post a text which can be commented and the post can also be voted (+1 or -1).
This is my database:
Now I want to select all information of all posts with status = 1 plus two extra columns: One column containing the count of comments and one column containing the sum (I call it score) of all votes.
I currently use the following query, which correctly adds the count of the comments:
SELECT *, COUNT(comments.fk_commented_post) as comments
FROM posts
LEFT JOIN comments
ON posts.id_post = comments.fk_commented_post
AND comments.status = 1
WHERE posts.status = 1
GROUP BY posts.id_post
Then I tried to additionally add the sum of the votes, using the following query:
SELECT *, COUNT(comments.fk_commented_post) as comments, SUM(votes_posts.type) as score
FROM posts
LEFT JOIN comments
ON posts.id_post = comments.fk_commented_post
AND comments.status = 1
LEFT JOIN votes_posts
ON posts.id_post = votes_posts.fk_voted_post
WHERE posts.status = 1
GROUP BY posts.id_post
The result is no longer correct for either the votes or the comments. Somehow some of the values seem to be getting multiplied...
This is probably simpler using correlated subqueries:
select p.*,
(select count(*)
from comments c
where c.fk_commented_post = p.id_post and c.status = 1
) as num_comments,
(select sum(vp.type)
from votes_posts vp
where vp.fk_voted_post = p.id_post
) as num_score
from posts p
where p.status = 1;
The problem with the join is that the counts get messed up because the two other tables are not related to each other, so you get a Cartesian product.
You want to join the comment count and the vote score to the posts. So, aggregate first, then join.
select
p.*,
coalesce(c.cnt, 0) as comments,
coalesce(v.score, 0) as score
from posts p
left join
(
select fk_commented_post as id_post, count(*) as cnt
from comments
where status = 1
group by fk_commented_post
) c on c.id_post = p.id_post
left join
(
select fk_voted_post as id_post, sum(type) as score
from votes_posts
group by fk_voted_post
) v on v.id_post = p.id_post
where p.status = 1
order by p.id_post;
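Here is a small sketch comparing the double join with the aggregate-then-join form, run on an in-memory SQLite database through Python's sqlite3. The column names follow the question, but the primary key names (id_comment, id_vote) and all sample rows are assumptions of mine, and the vote column is summed as a score as the question asks.

```python
import sqlite3

# Hypothetical schema and rows; only the foreign key and status columns come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id_post INTEGER PRIMARY KEY, status INTEGER);
CREATE TABLE comments (id_comment INTEGER PRIMARY KEY, fk_commented_post INTEGER, status INTEGER);
CREATE TABLE votes_posts (id_vote INTEGER PRIMARY KEY, fk_voted_post INTEGER, type INTEGER);
INSERT INTO posts VALUES (1, 1), (2, 1);
INSERT INTO comments VALUES (1, 1, 1), (2, 1, 1);          -- post 1: 2 comments
INSERT INTO votes_posts VALUES (1, 1, 1), (2, 1, 1), (3, 1, -1);  -- post 1: score 1
""")

# Joining both child tables at once pairs every comment with every vote.
naive = conn.execute("""
SELECT posts.id_post, COUNT(comments.fk_commented_post), SUM(votes_posts.type)
FROM posts
LEFT JOIN comments ON posts.id_post = comments.fk_commented_post AND comments.status = 1
LEFT JOIN votes_posts ON posts.id_post = votes_posts.fk_voted_post
WHERE posts.status = 1
GROUP BY posts.id_post
ORDER BY posts.id_post
""").fetchall()

# Aggregating each child table first keeps one row per post on each side.
fixed = conn.execute("""
SELECT p.id_post, COALESCE(c.cnt, 0), COALESCE(v.score, 0)
FROM posts p
LEFT JOIN (SELECT fk_commented_post AS id_post, COUNT(*) AS cnt
           FROM comments WHERE status = 1 GROUP BY fk_commented_post) c
       ON c.id_post = p.id_post
LEFT JOIN (SELECT fk_voted_post AS id_post, SUM(type) AS score
           FROM votes_posts GROUP BY fk_voted_post) v
       ON v.id_post = p.id_post
WHERE p.status = 1
ORDER BY p.id_post
""").fetchall()

print(naive)
print(fixed)
```

For post 1 the naive query reports 6 comments and a score of 2, because 2 comments x 3 votes gives 6 joined rows; the pre-aggregated version reports 2 comments and a score of 1.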
I am trying to make a query to fetch the newest car for each user:
select * from users
left join
(select cars.* from cars
where cars.userid=users.userid
order by cars.year desc limit 1) as cars
on cars.userid=users.userid
It looks like it says Unknown column "users.userid" in where clause
I tried to remove cars.userid=users.userid part, but then it only fetches 1 newest car, and sticks it on to each user.
Is there any way to accomplish what I'm after? thanks!!
For this purpose, I usually use row_number():
select *
from users u left join
(select c.* , row_number() over (partition by c.userid order by c.year desc) as seqnum
from cars c
) c
on c.userid = u.userid and c.seqnum = 1;
One option is to filter the left join with a subquery:
select * -- better enumerate the columns here
from users u
left join cars c
on c.userid = u.userid
and c.year = (select max(c1.year) from cars c1 where c1.userid = c.userid)
For performance, consider an index on cars(userid, year).
Note that this might return multiple cars per user if you have duplicate (userid, year) in cars. It would be better to have a real date rather than just the year.
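A quick way to try the filtered left join is the sketch below, which uses Python's sqlite3 with an in-memory database. The carid column, the index name and the sample rows are assumptions made up for the example; only userid and year come from the question.

```python
import sqlite3

# Hypothetical tables and rows for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY);
CREATE TABLE cars (carid INTEGER PRIMARY KEY, userid INTEGER, year INTEGER);
CREATE INDEX cars_userid_year ON cars (userid, year);  -- index name is an assumption
INSERT INTO users VALUES (1), (2), (3);
INSERT INTO cars VALUES (1, 1, 2010), (2, 1, 2015), (3, 2, 2012);  -- user 3 has no car
""")

rows = conn.execute("""
SELECT u.userid, c.carid, c.year
FROM users u
LEFT JOIN cars c
       ON c.userid = u.userid
      AND c.year = (SELECT max(c1.year) FROM cars c1 WHERE c1.userid = c.userid)
ORDER BY u.userid
""").fetchall()

print(rows)
```

Users without any car still appear, with NULL (None in Python) in the car columns, because the year filter sits in the ON clause rather than in WHERE.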
There may be better and more efficient ways to query this, but here is my solution:
select users.userid, cars.*
from users
left join (SELECT userid, MAX(year) AS maxYear
FROM cars
GROUP BY userid) as sub on sub.userid = users.userid
left join cars on cars.userid = sub.userid and cars.year = sub.maxYear;
I got my query working but it doesn't count the rows... in my left outer join.
SELECT mUserId,mUserName,COALESCE(x.likeId,0) AS likeCount
FROM likes
LEFT JOIN members ON likes.likeMember = members.mUserId
LEFT OUTER JOIN (SELECT likeId, count(*) n FROM likes WHERE likeMember = likes.likeMember) x ON likes.likeMember = x.likeId
WHERE likeDate > '2014-11-16 07:44:47'
GROUP BY likeMember
ORDER BY `likeCount` DESC
Any suggestions?
This is your query:
SELECT mUserId,mUserName,COALESCE(x.likeId,0) AS likeCount
FROM likes LEFT JOIN
members
ON likes.likeMember = members.mUserId LEFT OUTER JOIN
(SELECT likeId, count(*) n
FROM likes
WHERE likeMember = likes.likeMember
) x
ON likes.likeMember = x.likeId
WHERE likeDate > '2014-11-16 07:44:47'
GROUP BY likeMember
ORDER BY `likeCount` DESC;
It is a bit absurd. Either do an aggregation in the subquery. Or do an aggregation in the outer query. But not both. I suspect you want something more like this:
SELECT m.mUserId, m.mUserName, COUNT(*) AS likeCount
FROM likes l LEFT JOIN
members m
ON l.likeMember = m.mUserId
WHERE l.likeDate > '2014-11-16 07:44:47'
GROUP BY l.likeMember
ORDER BY `likeCount` DESC;
The problem with your subquery is the WHERE clause. You think it is correlated to the outer query. But it is really interpreted as:
WHERE likes.likeMember = likes.likeMember
In other words, the condition is true whenever likes.likeMember is not NULL.
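The name resolution can be seen in a small experiment. The sketch below uses Python's sqlite3 (SQLite resolves an unqualified table name to the innermost scope too); the rows are invented, and the outer table is aliased o so the two scopes are easy to tell apart.

```python
import sqlite3

# Hypothetical likes table; rows invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE likes (likeId INTEGER PRIMARY KEY, likeMember INTEGER);
INSERT INTO likes VALUES (1, 10), (2, 10), (3, 20), (4, NULL);
""")

rows = conn.execute("""
SELECT o.likeId,
       -- Both sides refer to the inner likes table, so this is not correlated.
       (SELECT count(*) FROM likes WHERE likeMember = likes.likeMember) AS n_unaliased,
       -- Distinct aliases make the reference to the outer row explicit.
       (SELECT count(*) FROM likes i WHERE i.likeMember = o.likeMember) AS n_correlated
FROM likes o
ORDER BY o.likeId
""").fetchall()

print(rows)
```

The first subquery returns the same number for every outer row (all rows with a non-NULL likeMember), while the aliased version returns the per-member count.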
I have this php mysql statement
SELECT a.*, p.filename, m.`first name`, m.`last name`, m.`mobile number`, m.`status`, m.`email address`
FROM map a
join members m on a.members_id = m.id
join pictures p on m.pictures_id = p.id
WHERE a.active = 1
GROUP BY a.members_id
order by a.`date added` DESC
limit 1;
However it's not working. The map table has records, and many of them can have the same members_id value. I want to group them by the members_id, then order them by date added, so the most recent is on top of each group, then only get the top row (i.e. get most recent of each group).
Does anyone know what's wrong here?
Thanks
Try:
select * from
(SELECT a.*,
p.filename,
m.`first name`, m.`last name`, m.`mobile number`, m.`status`, m.`email address`
FROM map a
join members m on a.members_id = m.id
join pictures p on m.pictures_id = p.id
WHERE a.active = 1
order by a.members_id, a.`date added` DESC) sq
GROUP BY members_id;
Note that the fact that MySQL returns the first row when grouping is not documented and may change in future releases - so although this query should work with current versions of MySQL, it is not guaranteed to do so in future.
If you want to get one result per map, you have to select it in two steps - so with a subquery. The inner query gets the newest map per member and the outer query gets all the data. Be careful with the indices, otherwise it will be very slow.
I think it will be something like:
SELECT a.*, p.filename, m.`first name`, m.`last name`, m.`mobile number`, m.`status`, m.`email address`
FROM map a
inner join members m on a.members_id = m.id
inner join pictures p on m.pictures_id = p.id
inner join (
select ia.members_id, max(ia.`date added`) as maxdate from map ia where ia.active = 1 group by ia.members_id
) as sub_a on sub_a.members_id = a.members_id and sub_a.maxdate = a.`date added`
WHERE a.active = 1
This assumes each member has a single row with the latest date added; if there are ties, you will need some more tricks.
One approach is to use a where clause to filter out the records you are not interested in:
SELECT a.*, p.filename, m.`first name`, m.`last name`, m.`mobile number`, m.`status`, m.`email address`
FROM map a
join members m on a.members_id = m.id
join pictures p on m.pictures_id = p.id
WHERE a.active = 1 and
a.`date added` = (select max(map.`date added`)
from map
where map.members_id = a.members_id and
map.active = 1
)
GROUP BY a.members_id
order by a.`date added` DESC;
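As an illustration of the where-clause filter, here is a cut-down sketch run with Python's sqlite3 on an in-memory database. It keeps only the map table (the joins to members and pictures are left out), the id column and the sample rows are assumptions, and SQLite happens to accept the MySQL-style backtick quoting.

```python
import sqlite3

# Hypothetical, reduced version of the map table; rows invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE map (id INTEGER PRIMARY KEY, members_id INTEGER, `date added` TEXT, active INTEGER);
INSERT INTO map VALUES
  (1, 7, '2021-01-01', 1),
  (2, 7, '2021-03-01', 1),
  (3, 7, '2021-05-01', 0),   -- newest row for member 7, but inactive
  (4, 8, '2021-02-01', 1);
""")

rows = conn.execute("""
SELECT a.id, a.members_id, a.`date added`
FROM map a
WHERE a.active = 1
  AND a.`date added` = (SELECT max(m2.`date added`)
                        FROM map m2
                        WHERE m2.members_id = a.members_id
                          AND m2.active = 1)
ORDER BY a.`date added` DESC
""").fetchall()

print(rows)
```

Each member contributes only its most recent active row, so in this sketch no GROUP BY is required as long as the latest date is unique per member.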
You cannot do this in one query. Replace the from map part with a subselect
select max(map_id) as map_id, members_id, max(date_added) as date_added from map where active = 1 group by members_id
This will give you all the members with the last dates. I have assumed your map_id existing, and being auto_increment. Use this instead of the original map table, and you will not need the group by, order and limit parts at all.
I'm trying to select from a table with multiple joins, one to count the comments using COUNT and one to get the total vote value using SUM. The problem is that the two joins affect each other. Instead of showing:
3 votes 2 comments
I get 3 * 2 = 6 votes and 2 * 3 comments
This is the query I'm using:
SELECT t.*, COUNT(c.id) as comments, COALESCE(SUM(v.vote), 0) as votes
FROM (topics t)
LEFT JOIN comments c ON c.topic_id = t.id
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9
What you're doing is an SQL antipattern that I call Goldberg Machine. Why make the problem so much harder by forcing it to be done in a single SQL query?
Here is how I would really solve this problem:
SELECT t.*, COUNT(c.id) as comments
FROM topics t
LEFT JOIN comments c ON c.topic_id = t.id
WHERE t.id = 9;
SELECT t.*, SUM(v.vote) as votes
FROM topics t
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9;
As you have found, combining these two into one query results in a Cartesian product. There may be clever and subtle ways to force it to give you the correct answer in one query, but what happens when you need a third statistic? It's much simpler to do it in two queries.
SELECT t.*, COUNT(c.id) as comments, COALESCE(SUM(v.vote), 0) as votes
FROM (topics t)
LEFT JOIN comments c ON c.topic_id = t.id
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9
GROUP BY t.id
or perhaps
SELECT `topics`.*,
(
SELECT COUNT(*)
FROM `comments`
WHERE `topic_id` = `topics`.`id`
) AS `num_comments`,
(
SELECT IFNULL(SUM(`vote`), 0)
FROM `votes`
WHERE `topic_id` = `topics`.`id`
) AS `vote_total`
FROM `topics`
WHERE `id` = 9
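To compare the double join with the subquery form, the sketch below reproduces the "3 votes, 2 comments" case from the question on an in-memory SQLite database via Python's sqlite3. The sample rows are invented and each vote is assumed to be worth 1.

```python
import sqlite3

# Hypothetical data: topic 9 with 2 comments and 3 votes of value 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE topics (id INTEGER PRIMARY KEY);
CREATE TABLE comments (id INTEGER PRIMARY KEY, topic_id INTEGER);
CREATE TABLE votes (id INTEGER PRIMARY KEY, topic_id INTEGER, vote INTEGER);
INSERT INTO topics VALUES (9);
INSERT INTO comments VALUES (1, 9), (2, 9);
INSERT INTO votes VALUES (1, 9, 1), (2, 9, 1), (3, 9, 1);
""")

# Both joins in one query: every comment is paired with every vote.
naive = conn.execute("""
SELECT COUNT(c.id), COALESCE(SUM(v.vote), 0)
FROM topics t
LEFT JOIN comments c ON c.topic_id = t.id
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9
""").fetchone()

# One correlated subquery per statistic: the two tables never meet.
fixed = conn.execute("""
SELECT (SELECT COUNT(*) FROM comments WHERE topic_id = topics.id),
       (SELECT IFNULL(SUM(vote), 0) FROM votes WHERE topic_id = topics.id)
FROM topics
WHERE id = 9
""").fetchone()

print(naive, fixed)
```

The naive query counts 6 comments and 6 votes, while the subquery form gives 2 and 3. Adding a third statistic would just mean adding a third subquery.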
SELECT t.*, COUNT(DISTINCT c.id) as comments, COALESCE(SUM(v.vote), 0) as votes
FROM (topics t)
LEFT JOIN comments c ON c.topic_id = t.id
LEFT JOIN votes v ON v.topic_id = t.id
WHERE t.id = 9