MySQL: Join three tables and group by joined date

I am a MySQL beginner so please be patient with me!
I have three tables
1-Order
Order Id | Order Date
2-users
user Id | Registration date
3-connection
user Id | Order ID
I am trying to calculate the conversion rate grouped by date.
For example, if 300 users registered in January 2013 and 100 of them made one or more purchases, the conversion rate from registered user to purchaser is 33%.
So the final output should be:
date | number of registered users | number of orders | Conversion rate
Jan-2013 | 300 | 100 | 33%
Many thanks for the help!

I think something like this is what you need. Left join users to connection so that a user counts as a purchaser when they have at least one order, then group by registration date:
select u.registrationDate as date,
count(distinct u.id) as number_of_registered_users,
count(distinct c.orderId) as number_of_orders,
round(count(distinct c.usersId)/count(distinct u.id)*100) as conversion_rate
from users u
left join connection c on c.usersId = u.id
group by u.registrationDate;
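Below is a minimal sketch of the conversion-rate calculation. It uses Python's built-in sqlite3 module so it runs without a MySQL server. The column names (registration_date, user_id, order_id) are assumptions based on the question's table descriptions, and the data is made up. SQLite's strftime('%Y-%m', ...) stands in for MySQL's DATE_FORMAT(..., '%Y-%m') so the rows group by month. A user counts as a purchaser if they have at least one row in connection.

```python
import sqlite3

def conversion_by_month(conn):
    """Return (month, registered, orders, purchasers, rate_pct) per registration month."""
    # Start from users and LEFT JOIN so users without purchases still count as registered.
    # COUNT(DISTINCT ...) undoes the duplication caused by users with several orders.
    sql = """
        SELECT strftime('%Y-%m', u.registration_date) AS month,
               COUNT(DISTINCT u.id)       AS registered,
               COUNT(DISTINCT c.order_id) AS orders,
               COUNT(DISTINCT c.user_id)  AS purchasers,
               ROUND(100.0 * COUNT(DISTINCT c.user_id) / COUNT(DISTINCT u.id)) AS rate_pct
        FROM users u
        LEFT JOIN connection c ON c.user_id = u.id
        GROUP BY month
        ORDER BY month
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, registration_date TEXT);
    CREATE TABLE connection (user_id INTEGER, order_id INTEGER);
""")
# Hypothetical data: 3 users registered in Jan 2013, 2 in Feb 2013.
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    (1, "2013-01-05"), (2, "2013-01-10"), (3, "2013-01-20"),
    (4, "2013-02-01"), (5, "2013-02-14"),
])
# User 1 placed two orders, user 4 placed one.
conn.executemany("INSERT INTO connection VALUES (?, ?)", [(1, 100), (1, 101), (4, 102)])

for row in conversion_by_month(conn):
    print(row)  # ('2013-01', 3, 2, 1, 33.0) then ('2013-02', 2, 1, 1, 50.0)
```

January has one purchaser out of three registrations (33%) even though that purchaser placed two orders. This is the reason the rate is based on distinct purchasing users and not on the order count.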


Count comments and get average rating from mysql

I just can't figure out how to get average rating and count comments from my mysql database.
I have 3 tables (activity, rating, comments). activity contains the main data (the "activities"), rating holds the ratings, and comments holds, of course, the comments.
activity_table
id | title |short_desc | long_desc | address | lat | long |last_updated
rating_table
id | activityid | userid | rating
comment_table
id | activityid | userid | rating
I'm now trying to get the data from activity plus the comment count and average rating in one query.
SELECT activity.*, AVG(rating.rating) as average_rating, count(comments.activityid) as total_comments
FROM activity LEFT JOIN
rating
ON activity.aid = rating.activityid LEFT JOIN
comments
ON activity.aid = comments.activityid
GROUP BY activity.aid
...doesn't do the job. It gives me the right average_rating, but the wrong number of comments.
Any ideas?
Thanks a lot!
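You are aggregating along two different dimensions. For each activity, the joins produce one row for every (rating, comment) pair. That Cartesian product multiplies COUNT(comments.activityid) by the number of ratings. The average only survives because every rating is repeated the same number of times.
So, you should aggregate before the joins: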
SELECT a.*, r.average_rating, COALESCE(c.total_comments, 0) as total_comments
FROM activity a LEFT JOIN
(SELECT r.activityid, AVG(r.rating) as average_rating
FROM rating r
GROUP BY r.activityid
) r
ON a.aid = r.activityid LEFT JOIN
(SELECT c.activityid, COUNT(*) as total_comments
FROM comments c
GROUP BY c.activityid
) c
ON a.aid = c.activityid;
Notice that the outer GROUP BY is no longer needed.
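The reproduction below shows the effect described above. It uses Python's built-in sqlite3 module, so it runs on SQLite rather than MySQL, and the schema and data are made up and trimmed to the columns the queries need. One activity has 2 ratings and 3 comments, so the double join produces 2 × 3 = 6 rows and inflates the comment count, while the average stays correct. Aggregating each child table first gives one row per activity on each side of the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (aid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rating   (id INTEGER PRIMARY KEY, activityid INTEGER, rating INTEGER);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, activityid INTEGER);
    INSERT INTO activity VALUES (1, 'Hike');
    -- 2 ratings and 3 comments for the same activity
    INSERT INTO rating (activityid, rating) VALUES (1, 4), (1, 2);
    INSERT INTO comments (activityid) VALUES (1), (1), (1);
""")

# Question's query: every rating row is paired with every comment row.
naive = conn.execute("""
    SELECT a.aid, AVG(r.rating), COUNT(c.activityid)
    FROM activity a
    LEFT JOIN rating r   ON a.aid = r.activityid
    LEFT JOIN comments c ON a.aid = c.activityid
    GROUP BY a.aid
""").fetchone()

# Answer's query: aggregate each child table before joining.
fixed = conn.execute("""
    SELECT a.aid, r.average_rating, COALESCE(c.total_comments, 0)
    FROM activity a
    LEFT JOIN (SELECT activityid, AVG(rating) AS average_rating
               FROM rating GROUP BY activityid) r ON a.aid = r.activityid
    LEFT JOIN (SELECT activityid, COUNT(*) AS total_comments
               FROM comments GROUP BY activityid) c ON a.aid = c.activityid
""").fetchone()

print(naive)  # (1, 3.0, 6)
print(fixed)  # (1, 3.0, 3)
```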

SQL conditional count with join

I cannot find the answer to my problem here on Stack Overflow. I have a query that spans 3 tables:
newsitem
+------+----------+----------+----------+--------+----------+
| Guid | Supplier | LastEdit | ShowDate | Title | Contents |
+------+----------+----------+----------+--------+----------+
newsrating
+----+----------+--------+--------+
| Id | NewsGuid | UserId | Rating |
+----+----------+--------+--------+
usernews
+----+----------+--------+----------+
| Id | NewsGuid | UserId | ReadDate |
+----+----------+--------+----------+
Newsitem obviously contains newsitems, newsrating contains ratings that users give to newsitems, and usernews contains the date when a user has read a newsitem.
In my query I want to get every newsitem, including the number of ratings for that newsitem and the average rating, and how many times that newsitem has been read by the current user.
What I have so far is:
select newsitem.guid, supplier, count(newsrating.id) as numberofratings,
avg(newsrating.rating) as rating,
count(case usernews.UserId when 3 then 1 else null end) as numberofreads from newsitem
left join newsrating on newsitem.guid = newsrating.newsguid
left join usernews on newsitem.guid = usernews.newsguid
group by newsitem.guid
I have created an sql fiddle here: http://sqlfiddle.com/#!9/c8add/8
Both count() calls don't return the numbers I want. numberofratings should return the total number of ratings for that newsitem (by all users). numberofreads should return the number of reads for the current user for that newsitem.
So, newsitem with guid d104c330-c319-40e8-8be3-a7c4f549d35c should have 2 ratings and 3 reads for the current user with userid = 3.
I have tried conditional counts and sums, but no success yet. How can this be accomplished?
The main problem that I see is that you're joining in both tables at once. That effectively multiplies the two row counts together, which is why your counts aren't correct. For example, if the news item has been read 3 times by the user and rated by 8 users, you end up with 24 rows, so it looks like it has been rated 24 times. You can add a DISTINCT to your COUNT of the rating IDs, and that should correct that issue. The average should be unaffected, because every rating is repeated the same number of times (once per read). For example, the average of 1 and 2 is the same as the average of 1, 1, 2 and 2.
You can then handle the reads by adding the userid to the JOIN condition instead of using a CASE expression inside your COUNT. Since it's an OUTER JOIN, this shouldn't cause any loss of results. Then you can do a COUNT on distinct id values from UserNews. The resulting query would be:
SELECT
I.guid,
I.supplier,
COUNT(DISTINCT R.id) AS number_of_ratings,
AVG(R.rating) AS avg_rating,
COUNT(DISTINCT UN.id) AS number_of_reads
FROM
NewsItem I
LEFT OUTER JOIN NewsRating R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON
UN.newsguid = I.guid AND
UN.userid = @userid
GROUP BY
I.guid,
I.supplier
While that should work, a subquery may perform better, because the query above first multiplies out the rows and then aggregates them, perhaps unnecessarily. Some people might also find the version below a little clearer.
SELECT
I.guid,
I.supplier,
R.number_of_ratings,
R.avg_rating,
COUNT(UN.id) AS number_of_reads -- COUNT(*) would return 1 for items the user never read
FROM
NewsItem I
LEFT OUTER JOIN
(
SELECT
newsguid,
COUNT(*) AS number_of_ratings,
AVG(rating) AS avg_rating
FROM
NewsRating
GROUP BY
newsguid
) R ON R.newsguid = I.guid
LEFT OUTER JOIN UserNews UN ON UN.newsguid = I.guid AND UN.userid = @userid
GROUP BY
I.guid,
I.supplier,
R.number_of_ratings,
R.avg_rating
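The reproduction below runs the COUNT(DISTINCT ...) approach from this answer against the question's query. It uses Python's built-in sqlite3 module instead of MySQL, trims the schema to the columns the queries use, and uses made-up data: 8 ratings, 3 reads by user 3 and one read by another user. The current user is passed as a bound ? parameter rather than a MySQL variable.

```python
import sqlite3

CURRENT_USER = 3  # hypothetical "current user", passed as a bound parameter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE newsitem   (guid TEXT PRIMARY KEY, supplier TEXT);
    CREATE TABLE newsrating (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER, rating INTEGER);
    CREATE TABLE usernews   (id INTEGER PRIMARY KEY, newsguid TEXT, userid INTEGER);
    INSERT INTO newsitem VALUES ('n1', 'acme');
""")
# 8 ratings from users 1..8 (values 2,3,4,5,1,2,3,4 -> average 3.0).
conn.executemany(
    "INSERT INTO newsrating (newsguid, userid, rating) VALUES ('n1', ?, ?)",
    [(u, 1 + u % 5) for u in range(1, 9)],
)
# 3 reads by the current user and 1 read by someone else.
conn.executemany(
    "INSERT INTO usernews (newsguid, userid) VALUES ('n1', ?)",
    [(3,), (3,), (3,), (5,)],
)

# Question's approach: 8 ratings x 4 reads = 32 joined rows, so both counts are inflated.
naive = conn.execute("""
    SELECT COUNT(r.id), COUNT(CASE un.userid WHEN ? THEN 1 END)
    FROM newsitem i
    LEFT JOIN newsrating r ON r.newsguid = i.guid
    LEFT JOIN usernews un  ON un.newsguid = i.guid
    GROUP BY i.guid
""", (CURRENT_USER,)).fetchone()

# Answer's approach: filter reads in the ON clause and count DISTINCT ids.
fixed = conn.execute("""
    SELECT COUNT(DISTINCT r.id), AVG(r.rating), COUNT(DISTINCT un.id)
    FROM newsitem i
    LEFT JOIN newsrating r ON r.newsguid = i.guid
    LEFT JOIN usernews un  ON un.newsguid = i.guid AND un.userid = ?
    GROUP BY i.guid, i.supplier
""", (CURRENT_USER,)).fetchone()

print(naive)  # (32, 24)
print(fixed)  # (8, 3.0, 3)
```

The fixed query still builds 8 × 3 = 24 intermediate rows. DISTINCT only hides the duplication, which is the cost the subquery version avoids.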
I'm with Tom: you should use a subquery to calculate the read count for the user.
SELECT NI.guid,
NI.supplier,
COUNT(NR.ID) as numberofratings,
AVG(NR.rating) as rating,
user_read as numberofreads
FROM newsitem NI
LEFT JOIN newsrating NR
ON NI.guid = NR.newsguid
LEFT JOIN (SELECT NewsGuid, COUNT(*) user_read
FROM usernews
WHERE UserId = 3 -- or use a variable such as @user_id here
GROUP BY NewsGuid) UR
ON NI.guid = UR.NewsGuid
GROUP BY NI.guid,
NI.supplier,
numberofreads;

MySQL MAX_JOIN_SIZE error

I have two tables. One is a call history table which logs calls made (starttime, endtime, phone number, user, etc.). The other is an orders table which logs order details (order number, customer info, orderdate, etc.). Orders are not always created when a call is created, so there isn't a guaranteed ID to match them up. Right now, I'm interested in getting totals by day. When I try to run a query to sum calls and join orders by day, I get the following error:
The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay
This is the query I use:
SELECT
DATE_FORMAT(c.date_call_start,'%Y-%m-%d') as date,
COUNT(c.id) as calls,
COUNT(o.id) as orders
FROM tbl_calls c
LEFT OUTER JOIN tbl_orders o
ON DATE_FORMAT(c.date_call_start,'%Y-%m-%d') = DATE_FORMAT(o.created,'%Y-%m-%d')
WHERE c.campaign_id = 1
AND DATE_FORMAT(c.date_call_start,'%Y-%m-%d') = '2013-12-09'
GROUP BY DATE_FORMAT(c.date_call_start,'%Y-%m-%d')
Even when there are only a few calls for a particular day, it still shows the same error, so I'm pretty sure it's my query that needs work.
I have also tried a subquery, but that doesn't roll up the totals from the subquery.
SELECT
DATE_FORMAT(c.date_call_start,'%Y-%m-%d') as date,
count(c.id) as calls,
(select count(DISTINCT o.id)
FROM tbl_orders o
WHERE DATE_FORMAT(o.created,'%Y-%m-%d') = DATE_FORMAT(c.date_call_start,'%Y-%m-%d')
) as orders
FROM tbl_calls c
WHERE c.campaign_id = 1
AND DATE_FORMAT(c.date_call_start,'%Y-%m-%d') BETWEEN '2013-12-09' AND '2013-12-15'
GROUP BY DATE_FORMAT(c.date_call_start,'%Y-%m-%d')
WITH ROLLUP
Any thoughts on how I can get this query to work? Ultimately I'd like a result like below so I can do other calculations like % orders etc.
date | calls | orders
------------------------------------
2013-12-01 | 100| 10
2013-12-02 | 125| 20
NULL | 225| 30
UPDATED:
Based on the answer I did the following:
Added a call_date column of type DATE (not DATETIME) to tbl_calls.
Added a date_order column of type DATE (not DATETIME) to tbl_orders.
Updated each table, setting the new column to date_format(the_date_time_stamp,'%Y-%m-%d') from the same table.
Also added an index to each of the new date fields.
That made the following query work:
SELECT
c.call_date as date,
COUNT(DISTINCT c.id) as calls,
COUNT(DISTINCT o.id) as orders,
ROUND((COUNT(DISTINCT o.id) / COUNT(DISTINCT c.id))*100,2) as conversion
FROM tbl_calls c
JOIN tbl_orders o
ON c.call_date = o.date_order
WHERE c.campaign_id = 1
AND c.call_date BETWEEN '2013-12-09' AND '2013-12-15'
GROUP BY c.call_date
WITH ROLLUP
Which gives me the following result and I can build off of this. Thanks to each of you who provided suggestions. I tried each. All make sense. However, since I ultimately had to create the additional date fields I chose the answer by
date | calls | orders| conversion
-------------------------------------------
2013-12-09 | 151 | 6 | 3.97
2013-12-10 | 164 | 2 | 1.22
2013-12-11 | 165 | 6 | 3.64
2013-12-12 | 189 | 1 | 0.53
2013-12-13 | 116 | 4 | 3.45
null | 785 | 19 | 2.42
First, look at the output of EXPLAIN SELECT ..., where ... is the rest of your select query above.
Since you're performing the join on two fields that have a function applied to them, I'd guess MySQL is performing two full table scans and using join type ALL. The EXPLAIN Output Format section of the MySQL reference manual explains what the join types mean.
DATE_FORMAT(c.date_call_start,'%Y-%m-%d') = DATE_FORMAT(o.created,'%Y-%m-%d')
You'll most likely want to create a separate field in each table that contains just the result of the DATE_FORMAT call. Then create an index for each of these new fields. Then join on these new indexed fields. MySQL should like that much better.
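The sketch below shows the suggested change, using Python's built-in sqlite3 module as a stand-in for MySQL. It derives date-only columns once, indexes them, and then asks the engine for its plan. The column names call_date and date_order come from the asker's update. The index names idx_calls_date and idx_orders_date are hypothetical. SQLite's EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN, so treat this as an illustration of the idea, not a MySQL benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_calls  (id INTEGER PRIMARY KEY, campaign_id INTEGER,
                             date_call_start TEXT, call_date TEXT);
    CREATE TABLE tbl_orders (id INTEGER PRIMARY KEY, created TEXT, date_order TEXT);
    INSERT INTO tbl_calls (campaign_id, date_call_start) VALUES (1, '2013-12-09 09:15:00');
    INSERT INTO tbl_orders (created) VALUES ('2013-12-09 10:00:00');
    -- Fill the new date-only columns once from the full timestamps ...
    UPDATE tbl_calls  SET call_date  = date(date_call_start);
    UPDATE tbl_orders SET date_order = date(created);
    -- ... and index them so the join and the WHERE filter can use an index.
    CREATE INDEX idx_calls_date  ON tbl_calls (call_date);
    CREATE INDEX idx_orders_date ON tbl_orders (date_order);
""")

query = """
    SELECT c.call_date, COUNT(DISTINCT c.id), COUNT(DISTINCT o.id)
    FROM tbl_calls c
    JOIN tbl_orders o ON c.call_date = o.date_order
    WHERE c.campaign_id = 1 AND c.call_date = '2013-12-09'
    GROUP BY c.call_date
"""
# The last column of each EXPLAIN QUERY PLAN row is a human-readable plan step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
print("\n".join(plan))
print(conn.execute(query).fetchall())  # [('2013-12-09', 1, 1)]
```

In MySQL, the corresponding check is to run EXPLAIN on the rewritten query and confirm that its key column names the new indexes.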
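Presumably you want to count the calls and orders for each date. However, that is not what your query does. It joins every call to every order on the same date (a Cartesian product), so on any date that has orders, both counts come out as the number of calls multiplied by the number of orders.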
Instead, summarize the data first by date and then combine the results. This may be what you want:
select c.date, calls, orders
from (select DATE_FORMAT(c.date_call_start, '%Y-%m-%d') as date, count(*) as calls
from tbl_calls c
WHERE c.campaign_id = 1 and
DATE_FORMAT(c.date_call_start, '%Y-%m-%d') = '2013-12-09'
group by DATE_FORMAT(c.date_call_start, '%Y-%m-%d')
) c left outer join
(select DATE_FORMAT(o.created,'%Y-%m-%d') as date, count(*) as orders
from tbl_orders o
group by DATE_FORMAT(o.created, '%Y-%m-%d')
) o
on c.date = o.date;
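The "summarize first, then combine" idea can be checked with Python's built-in sqlite3 module. This runs on SQLite, where date() stands in for MySQL's DATE_FORMAT(..., '%Y-%m-%d'), and the calls and orders are made up. The comparison shows that joining the raw rows on the day inflates both counts, while joining the per-day summaries does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_calls  (id INTEGER PRIMARY KEY, campaign_id INTEGER, date_call_start TEXT);
    CREATE TABLE tbl_orders (id INTEGER PRIMARY KEY, created TEXT);
""")
# Hypothetical data: 3 calls and 2 orders on 2013-12-09, 1 call and no orders on 2013-12-10.
conn.executemany("INSERT INTO tbl_calls (campaign_id, date_call_start) VALUES (1, ?)", [
    ("2013-12-09 09:00:00",), ("2013-12-09 10:30:00",), ("2013-12-09 16:45:00",),
    ("2013-12-10 11:00:00",),
])
conn.executemany("INSERT INTO tbl_orders (created) VALUES (?)", [
    ("2013-12-09 10:40:00",), ("2013-12-09 17:00:00",),
])

# Joining raw rows on the day pairs every call with every order of that day.
naive = conn.execute("""
    SELECT date(c.date_call_start) AS day, COUNT(c.id), COUNT(o.id)
    FROM tbl_calls c
    LEFT JOIN tbl_orders o ON date(c.date_call_start) = date(o.created)
    WHERE c.campaign_id = 1
    GROUP BY day ORDER BY day
""").fetchall()

# Summarize each table per day first, then join the small per-day results.
per_day = conn.execute("""
    SELECT c.day, c.calls, COALESCE(o.orders, 0) AS orders
    FROM (SELECT date(date_call_start) AS day, COUNT(*) AS calls
          FROM tbl_calls WHERE campaign_id = 1 GROUP BY day) c
    LEFT JOIN (SELECT date(created) AS day, COUNT(*) AS orders
               FROM tbl_orders GROUP BY day) o ON c.day = o.day
    ORDER BY c.day
""").fetchall()

print(naive)    # [('2013-12-09', 6, 6), ('2013-12-10', 1, 0)]
print(per_day)  # [('2013-12-09', 3, 2), ('2013-12-10', 1, 0)]
```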
If @Barmar's suggestion does not work, then you may need to split the fields into DATE and TIME.
A different direction is to make two temp tables (giving you three queries):
CREATE TEMPORARY TABLE `tbl_calls_temp` SELECT * FROM tbl_calls c WHERE DATE(c.date_call_start) = '2013-12-09' AND c.campaign_id = 1
Then apply the same restriction to the tbl_orders table:
CREATE TEMPORARY TABLE `tbl_orders_temp` SELECT * FROM tbl_orders o WHERE DATE(o.created) = '2013-12-09'
Finally, query against the two temporary tables. Depending on how much data you get, you may want to add indexes to the temporary tables, but in all likelihood you are facing a full join:
SELECT
DATE_FORMAT(c.date_call_start,'%Y-%m-%d') as date,
COUNT(DISTINCT c.id) as calls,
COUNT(DISTINCT o.id) as orders
FROM tbl_calls_temp c
LEFT OUTER JOIN tbl_orders_temp o
ON DATE_FORMAT(c.date_call_start,'%Y-%m-%d') = DATE_FORMAT(o.created,'%Y-%m-%d')
GROUP BY DATE_FORMAT(c.date_call_start,'%Y-%m-%d')
And that should be much faster... assuming you have any indexes in your initial tables that can be queried.

SELECTing SUM Across Two Tables for Each User

I have an application with a users table, credit_purchase table, and credit_spend table. I am trying to get the amount of credits the user currently has.
The credit tables are as follows:
credit_purchase
+----+---------+-------+---------------------+
| id | user_id | coins | timestamp |
+----+---------+-------+---------------------+
credit_spend
+----+---------+-------+---------------------+
| id | user_id | coins | timestamp |
+----+---------+-------+---------------------+
What I'm trying to do is select the SUM(credit_purchase.coins) - SUM(credit_spend.coins) for each user. Right now I have something like this:
SELECT users.*, COALESCE(SUM(credit_purchase.coins) - SUM(credit_spend.coins), 0)
FROM users
LEFT JOIN credit_purchase ON users.id = credit_purchase.user_id
LEFT JOIN credit_spend ON users.id = credit_spend.user_id
GROUP BY users.id
but the result is not correct. It seems to be SUMming the purchases properly, but multiplying the SUM of the spends by the number of purchases.
What am I doing wrong?
Thanks!
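Aggregate each credit table separately before joining. Each user then contributes at most one purchase row and one spend row, so neither sum gets multiplied. COALESCE each sum on its own as well. Otherwise a user who has purchases but no spends gets NULL, which the original COALESCE turned into 0:
SELECT users.*,
COALESCE(p.purchased, 0) - COALESCE(s.spent, 0) AS balance
FROM users
LEFT JOIN (SELECT user_id, SUM(coins) AS purchased
FROM credit_purchase GROUP BY user_id) p
ON users.id = p.user_id
LEFT JOIN (SELECT user_id, SUM(coins) AS spent
FROM credit_spend GROUP BY user_id) s
ON users.id = s.user_id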
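The quick check below compares the question's query with a pre-aggregated version. It uses Python's sqlite3 module (SQLite instead of MySQL) with made-up users and coin amounts. Besides the multiplied sums, the original expression returns 0 for a user who bought credits but never spent any. For that user SUM(credit_spend.coins) is NULL, a number minus NULL is NULL, and COALESCE then turns it into 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users           (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE credit_purchase (id INTEGER PRIMARY KEY, user_id INTEGER, coins INTEGER);
    CREATE TABLE credit_spend    (id INTEGER PRIMARY KEY, user_id INTEGER, coins INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO credit_purchase (user_id, coins) VALUES (1, 10), (1, 20), (2, 50);
    INSERT INTO credit_spend    (user_id, coins) VALUES (1, 5), (1, 5);
""")
# Expected balances: ann 30 - 10 = 20, bob 50 - 0 = 50, cy 0.

NAIVE_SQL = """
    SELECT u.id, COALESCE(SUM(p.coins) - SUM(s.coins), 0)
    FROM users u
    LEFT JOIN credit_purchase p ON u.id = p.user_id
    LEFT JOIN credit_spend s    ON u.id = s.user_id
    GROUP BY u.id ORDER BY u.id
"""

# Each subquery returns at most one row per user, so the outer joins cannot multiply sums.
BALANCE_SQL = """
    SELECT u.id,
           COALESCE(p.bought, 0) - COALESCE(s.spent, 0) AS balance
    FROM users u
    LEFT JOIN (SELECT user_id, SUM(coins) AS bought
               FROM credit_purchase GROUP BY user_id) p ON p.user_id = u.id
    LEFT JOIN (SELECT user_id, SUM(coins) AS spent
               FROM credit_spend GROUP BY user_id) s ON s.user_id = u.id
    ORDER BY u.id
"""

print(conn.execute(NAIVE_SQL).fetchall())    # [(1, 40), (2, 0), (3, 0)]
print(conn.execute(BALANCE_SQL).fetchall())  # [(1, 20), (2, 50), (3, 0)]
```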

Getting results from one table, grouped by user and date, joined on the user's name and account info from another table?

Let's say I have three tables (with the columns in each):
USERS: id, name, account_id
ACCOUNTS: id, name
EVENTS: event_id, user_id, date
And I want to get the total count of events from EVENTS for each user, within a date range, along with the user's name and account name.
I can use GROUP BY to group the results by user_id, but how do I then join that to the users and accounts to get that info in each row? I'm trying to get an output like:
+-------+---------------------+--------+
| name | account_name |count(*)|
+-------+---------------------+--------+
| Joe | XYZ, Inc. | 10 |
| Bob | Vandalay Industries | 21 |
| Mary | Account Name Here | 32 |
+-------+---------------------+--------+
where the third column is the total number of events in the EVENTS table for that user_id in a specified date range.
Sorry, I can never get the hang of joins like this.
Assuming that you have an id column on the users and accounts tables:
SELECT
users.name,
accounts.name,
count(events.event_id)
FROM
users
INNER JOIN events ON events.user_id = users.id
INNER JOIN accounts ON accounts.id = users.account_id
WHERE
events.date between <startdate> AND <enddate>
GROUP BY
users.name,
accounts.name
The obvious ways of doing it would be:
1) group the results of a join query:
SELECT users.name, accounts.name, COUNT(*)
FROM users, accounts, events
WHERE users.account_id=accounts.id
AND users.id=events.user_id
AND events.date>$START_DATE
AND events.date<$END_DATE
GROUP BY users.name, accounts.name;
2) Alternatively, you could use the consolidated query on just events as a data source and join to that:
SELECT users.name, accounts.name, ilv.event_count
FROM (SELECT user_id, count(*) AS event_count
FROM events
WHERE events.date>$START_DATE
AND events.date<$END_DATE
GROUP BY user_id) as ilv,
users, accounts
WHERE users.id=ilv.user_id
AND users.account_id=accounts.id;
HTH
C.
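The approaches in the last two answers (join everything then group, or count events per user first and then join the names) can be tried side by side with Python's sqlite3 module. This runs on SQLite, not MySQL. The names come from the question's expected output, and the dates are made up. The date range is passed as bound parameters instead of the $START_DATE/$END_DATE placeholders and is treated as inclusive with BETWEEN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT, account_id INTEGER);
    CREATE TABLE events   (event_id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
    INSERT INTO accounts VALUES (1, 'XYZ, Inc.'), (2, 'Vandalay Industries');
    INSERT INTO users    VALUES (1, 'Joe', 1), (2, 'Bob', 2);
    INSERT INTO events (user_id, date) VALUES
        (1, '2013-01-05'), (1, '2013-01-20'), (1, '2013-03-01'),
        (2, '2013-01-10');
""")
START, END = "2013-01-01", "2013-01-31"  # hypothetical inclusive date range

# Approach 1: join everything, then group (grouping by u.id keeps same-named users apart).
join_then_group = conn.execute("""
    SELECT u.name, a.name, COUNT(e.event_id)
    FROM users u
    JOIN events e   ON e.user_id = u.id
    JOIN accounts a ON a.id = u.account_id
    WHERE e.date BETWEEN ? AND ?
    GROUP BY u.id, u.name, a.name
    ORDER BY u.name
""", (START, END)).fetchall()

# Approach 2: count events per user first (inline view), then join the names.
group_then_join = conn.execute("""
    SELECT u.name, a.name, ilv.event_count
    FROM (SELECT user_id, COUNT(*) AS event_count
          FROM events WHERE events.date BETWEEN ? AND ?
          GROUP BY user_id) ilv
    JOIN users u    ON u.id = ilv.user_id
    JOIN accounts a ON a.id = u.account_id
    ORDER BY u.name
""", (START, END)).fetchall()

print(join_then_group)  # [('Bob', 'Vandalay Industries', 1), ('Joe', 'XYZ, Inc.', 2)]
print(group_then_join)  # same rows
```

Both versions use inner joins, as the answers do, so users with no events in the range do not appear at all.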