Trying to divide 2 separate queries in MySQL

I've seen a couple of posts on dividing two separate queries that seemed helpful, but I am still having trouble dividing these two queries. I wrote different subqueries and followed some examples, but I keep getting errors, since the example queries were more straightforward (no joins).
Here is the first query:
SELECT
YEAR(s.created_at) AS year,
COUNT(*) AS pre_sub_buys
FROM subscription_users s
INNER JOIN users u
ON s.user_id = u.uid
LEFT JOIN canvases c
ON u.email = c.ref_email
WHERE c.is_paid=1 AND c.date_created < s.created_at
GROUP BY year;
And I am trying to divide this by:
SELECT
YEAR(s.created_at) AS year,
COUNT(s.created_at) AS subscribers
FROM subscription_users s
LEFT JOIN canvases c
ON c.entries_updated_at = s.updated_at
GROUP BY year;
Essentially, I am looking for the yearly ratio of pre-subscription purchases to subscribers.
Can anyone point me in the right direction on how to do this properly?
Thank you so much,
Jonathan

You can approach this using conditional aggregation, which computes both counts in a single pass over the joined rows and divides them per year:
SELECT
YEAR(s.created_at) AS year,
COUNT(CASE WHEN c.is_paid=1 AND c.date_created < s.created_at THEN 1
ELSE NULL
END) AS pre_sub_buys,
COUNT(s.created_at) AS subscribers,
COUNT(CASE WHEN c.is_paid=1 AND c.date_created < s.created_at THEN 1
ELSE NULL
END) / COUNT(s.created_at) AS pre_sub_buys_divided_by_subscribers
FROM subscription_users s
INNER JOIN users u
ON s.user_id = u.uid
LEFT JOIN canvases c
ON u.email = c.ref_email
GROUP BY year;
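One caveat worth checking: because canvases is joined by email, a subscriber with several canvases contributes several rows, so COUNT(s.created_at) in the query above counts joined rows rather than subscribers. A more literal way to "divide two separate queries" is to group each query on its own and join the two results on year. The sketch below runs both variants on a tiny made-up dataset, using Python's built-in sqlite3 module as a stand-in for MySQL: YEAR() becomes strftime('%Y', ...), and the division is multiplied by 1.0 to avoid SQLite's integer division (MySQL's / already returns a decimal). As a simplifying assumption, the denominator is a plain count of subscription_users rows per year; the asker's join to canvases on entries_updated_at is left out.

```python
import sqlite3


def build_demo_db():
    """Tiny in-memory dataset shaped like the question's tables (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (uid INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE subscription_users (user_id INTEGER, created_at TEXT);
        CREATE TABLE canvases (ref_email TEXT, is_paid INTEGER, date_created TEXT);
        INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
        INSERT INTO subscription_users VALUES (1, '2020-03-01'), (2, '2020-05-01');
        -- user 1 bought two paid canvases before subscribing; user 2's canvas is unpaid
        INSERT INTO canvases VALUES
            ('a@example.com', 1, '2020-01-01'),
            ('a@example.com', 1, '2020-02-01'),
            ('b@example.com', 0, '2020-01-15');
    """)
    return conn


def conditional_aggregation(conn):
    """Single-pass version: the subscriber count is taken over joined rows."""
    return conn.execute("""
        SELECT strftime('%Y', s.created_at) AS year,
               COUNT(CASE WHEN c.is_paid = 1 AND c.date_created < s.created_at
                          THEN 1 END) AS pre_sub_buys,
               COUNT(s.created_at) AS subscribers
        FROM subscription_users s
        JOIN users u ON s.user_id = u.uid
        LEFT JOIN canvases c ON u.email = c.ref_email
        GROUP BY year
    """).fetchall()


def yearly_ratio(conn):
    """Group each query separately, then join the two results on year and divide."""
    return conn.execute("""
        SELECT subs.year,
               COALESCE(pre.pre_sub_buys, 0) AS pre_sub_buys,
               subs.subscribers,
               -- 1.0 * forces real division in SQLite
               1.0 * COALESCE(pre.pre_sub_buys, 0) / subs.subscribers AS ratio
        FROM (
            SELECT strftime('%Y', created_at) AS year, COUNT(*) AS subscribers
            FROM subscription_users
            GROUP BY year
        ) AS subs
        LEFT JOIN (
            SELECT strftime('%Y', s.created_at) AS year, COUNT(*) AS pre_sub_buys
            FROM subscription_users s
            JOIN users u ON s.user_id = u.uid
            JOIN canvases c ON u.email = c.ref_email
            WHERE c.is_paid = 1 AND c.date_created < s.created_at
            GROUP BY year
        ) AS pre ON pre.year = subs.year
        ORDER BY subs.year
    """).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(conditional_aggregation(conn))  # subscribers counts user 1 once per canvas
    print(yearly_ratio(conn))             # subscribers counts each subscription once
```

Starting from the subscriber counts and LEFT JOINing the purchase counts keeps years that have subscribers but no pre-subscription purchases; their ratio comes out as 0.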

Related

SQL query not returning the things I want

I am confused about why my query is not returning what I want. Can someone please give me a hand with this?
Tables: Trips and Users.
Query(CTE):
WITH cancel AS(
SELECT t.Request_at AS day, IFNULL(COUNT(t.Status),0) AS cancelled
FROM Trips t
LEFT JOIN Users u
ON t.Client_Id = u.Users_Id
WHERE (t.Status = "cancelled_by_driver" or t.Status = "cancelled_by_client")
AND t.Request_at BETWEEN "2013-10-01" AND "2013-10-03"
AND u.Banned = "No"
GROUP BY t.Request_at)
What I want is for the CTE above to return the number of trips cancelled by unbanned users or by the driver between Oct 1, 2013 and Oct 3, 2013. My query returns the correct count for the dates that have cancellations, but it does not return 0 for a date with no cancellations. I can't figure out why, since I am already using IFNULL along with a LEFT JOIN.
The WHERE clause evicts dates that have no cancelled trip before they ever reach the result set. IFNULL cannot bring them back: COUNT() never returns NULL, and a group that does not exist produces no row at all.
If every date is present in the table (regardless of status), you can simply move the status condition inside the aggregate:
select t.request_at,
sum(t.status in ('cancelled_by_driver', 'cancelled_by_client')) as cnt_cancelled
from trips t
inner join users u on u.users_id = t.client_id
where u.banned = 'No' and t.request_at between '2013-10-01' and '2013-10-03'
group by t.request_at
A more generic approach uses a calendar table to hold the dates, then brings in the trips table with a LEFT JOIN. If you don't have such a table, you can generate one on the fly with a recursive query (available since MySQL 8.0):
with recursive cal as (
select date '2013-10-01' as dt
union all
select dt + interval 1 day from cal where dt < '2013-10-03'
)
-- count users, not trips: a trip by a banned client finds no unbanned user, so u.users_id is NULL
select c.dt, count(u.users_id) as cnt_cancelled
from cal c
left join trips t on t.request_at = c.dt and t.status in ('cancelled_by_driver', 'cancelled_by_client')
left join users u on u.users_id = t.client_id and u.banned = 'No'
group by c.dt
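Here is a runnable sketch of the calendar approach, again using Python's sqlite3 as a stand-in for MySQL (SQLite's date(dt, '+1 day') replaces dt + interval 1 day). The lowercase table and column names are simplified from the question and the rows are made up. The query counts u.users_id rather than t.id: a cancelled trip by a banned client finds no match in the users join, so counting the users column is what keeps it out of the total.

```python
import sqlite3


def build_demo_db():
    """Made-up rows; lowercase names simplified from the question's Trips/Users."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (users_id INTEGER PRIMARY KEY, banned TEXT);
        CREATE TABLE trips (id INTEGER PRIMARY KEY, client_id INTEGER,
                            status TEXT, request_at TEXT);
        INSERT INTO users VALUES (1, 'No'), (2, 'Yes');
        INSERT INTO trips VALUES
            (1, 1, 'cancelled_by_client', '2013-10-01'),
            (2, 1, 'completed',           '2013-10-02'),
            (3, 2, 'cancelled_by_driver', '2013-10-02');  -- client 2 is banned
    """)
    return conn


def cancelled_per_day(conn, start, end):
    """One row per calendar day in [start, end], with 0 for days without matches."""
    return conn.execute("""
        WITH RECURSIVE cal(dt) AS (
            SELECT date(?)
            UNION ALL
            SELECT date(dt, '+1 day') FROM cal WHERE dt < date(?)
        )
        SELECT c.dt, COUNT(u.users_id) AS cnt_cancelled  -- NULL users are not counted
        FROM cal c
        LEFT JOIN trips t
               ON t.request_at = c.dt
              AND t.status IN ('cancelled_by_driver', 'cancelled_by_client')
        LEFT JOIN users u
               ON u.users_id = t.client_id AND u.banned = 'No'
        GROUP BY c.dt
        ORDER BY c.dt
    """, (start, end)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for day, cnt in cancelled_per_day(conn, "2013-10-01", "2013-10-03"):
        print(day, cnt)
```

Because every filter on trips and users sits in an ON clause, the calendar rows are never discarded, and days without a qualifying cancellation show 0.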

Restricting query data in mysql

I am getting back into MySQL after a couple of years and have run into a problem. I have a query that works, but I am lost on how to optimize it.
Here is the query:
select
u.id as 'User',
count(distinct tr.id) as Trips,
count(distinct ti.id) as 'Trip Items'
from
users u
inner join
user_emails ue on u.id = ue.user_id
inner join
trips tr on tr.user_id = u.id
inner join
trip_items ti on ti.trip_id = tr.id
where
ue.verified = true and ue.is_primary = true
and
tr.created_at between '2017-02-01 00:00:00' and '2017-02-01 00:59:59'
group by 1
having Trips < 30
I essentially need a count of all trips and trip items, but only for users who have fewer than 30 trips in the given date range. Right now I do that by grouping the results by user and then applying HAVING. I'm looking at millions of rows, filtered on a non-indexed column (created_at). Ideally I'd get just 1 row back with the total trips and total trip items, while still applying the "users with fewer than 30 trips" condition during the query. Is this possible? :)
Just a quick edit: I've tried looking around at other solutions, but I am a bit lost on what I should be looking for. I'm not looking for a full solution, perhaps just a "go check this out and try that".
COUNT(DISTINCT) can be expensive. Try aggregating before doing the join. I think the following works (this assumes that items are not shared among different trips):
select u.id as `User`, tr.Trips, tr.items
from users u inner join
user_emails ue
on u.id = ue.user_id inner join
(select tr.user_id, count(*) as Trips, sum(items) as items
from trips tr join
(select ti.trip_id, count(*) as items
from trip_items ti
group by ti.trip_id
) ti
on ti.trip_id = tr.id
where tr.created_at >= '2017-02-01' and tr.created_at < '2017-02-01 01:00:00'
group by tr.user_id
having trips < 30
) tr
on tr.user_id = u.id
where ue.verified = true and ue.is_primary = true
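To get the single row the question asks for, one option is to wrap the per-user aggregation in an outer query that sums it; the HAVING filter still runs per user inside the derived table. This is a sketch in Python's sqlite3 standing in for MySQL, with a hypothetical minimal schema (user_emails, trips and trip_items with only the columns the queries use), made-up rows, and the assumption that each user has at most one verified primary email. The threshold is a parameter (max_trips, a name introduced here) so the filter is easy to exercise.

```python
import sqlite3


def build_demo_db():
    """Hypothetical minimal schema with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user_emails (user_id INTEGER, verified INTEGER, is_primary INTEGER);
        CREATE TABLE trips (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
        CREATE TABLE trip_items (id INTEGER PRIMARY KEY, trip_id INTEGER);
        INSERT INTO user_emails VALUES (1, 1, 1), (2, 1, 1), (3, 0, 1);
        INSERT INTO trips VALUES
            (1, 1, '2017-02-01 00:10:00'), (2, 1, '2017-02-01 00:20:00'),
            (3, 2, '2017-02-01 00:05:00'), (4, 2, '2017-02-01 00:06:00'),
            (5, 2, '2017-02-01 00:07:00'), (6, 3, '2017-02-01 00:30:00'),
            (7, 1, '2017-02-01 02:00:00');  -- outside the one-hour window
        INSERT INTO trip_items (trip_id) VALUES (1), (1), (2), (3), (4), (5), (6), (7);
    """)
    return conn


def totals_for_light_users(conn, start, end, max_trips=30):
    """One row: number of qualifying users, their total trips and total trip items."""
    return conn.execute("""
        SELECT COUNT(*) AS users,
               COALESCE(SUM(per_user.trips), 0) AS total_trips,
               COALESCE(SUM(per_user.items), 0) AS total_items
        FROM (
            -- per-user counts, aggregated before joining (as in the answer above)
            SELECT tr.user_id, COUNT(*) AS trips, SUM(ti.items) AS items
            FROM trips tr
            JOIN (SELECT trip_id, COUNT(*) AS items
                  FROM trip_items GROUP BY trip_id) ti
              ON ti.trip_id = tr.id
            WHERE tr.created_at >= ? AND tr.created_at < ?
            GROUP BY tr.user_id
            HAVING COUNT(*) < ?
        ) AS per_user
        JOIN user_emails ue
          ON ue.user_id = per_user.user_id
         AND ue.verified = 1 AND ue.is_primary = 1
    """, (start, end, max_trips)).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    print(totals_for_light_users(conn, "2017-02-01 00:00:00", "2017-02-01 01:00:00"))
```

The half-open range (>= start, < end) mirrors the answer's rewrite of the BETWEEN ... 00:59:59 condition and avoids missing timestamps with fractional seconds.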

Merging 2 sql statements with where clause

I have 2 tables: users (13068 rows) and invitations (211343 rows).
users.fbuid corresponds to invitations.inviter.
I'm trying to export these 2 tables to an Excel sheet that should look like this:
u.name, u.adress, u.fbuid ...., COUNT(i.id)
So far I've tried:
SELECT u.*,(SELECT COUNT(i.id) FROM invitations i WHERE i.isaccepted = 1 and i.inviter = u.fbuid) as chance FROM users u WHERE u.datecreated BETWEEN '2013-01-01' AND '2014-01-01' LIMIT 0,50
and
SELECT *,COUNT(i.id) as chance FROM users u LEFT JOIN invitations i ON u.fbuid = i.inviter WHERE u.datecreated BETWEEN '$startdate' AND '$enddate' and i.isaccepted=1 GROUP BY fbuid
The problem is that the LEFT JOIN returns only users with invitations, but only about 2000 users have invited anyone, and I need to list all of them.
The first query, even with LIMIT 50, took 36 seconds; I can't imagine how long it would take for all records. Other than a join, what else can I do? Or what is the correct way?
This is the query with the left join:
SELECT *, COUNT(i.id) as chance
FROM users u LEFT JOIN
invitations i
ON u.fbuid = i.inviter
WHERE u.datecreated BETWEEN '$startdate' AND '$enddate' and i.isaccepted=1
GROUP BY fbuid;
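The problem is that you are filtering on the i table in the WHERE clause. Because of the LEFT JOIN, i.isaccepted is NULL for users without invitations, so i.isaccepted=1 discards those rows and effectively turns the LEFT JOIN into an inner join. Move that condition to the ON clause: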
SELECT u.*, COUNT(i.id) as chance
FROM users u LEFT JOIN
invitations i
ON u.fbuid = i.inviter and i.isaccepted = 1
WHERE u.datecreated BETWEEN '$startdate' AND '$enddate'
GROUP BY fbuid;
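The effect of moving the filter from WHERE to ON can be checked on a few made-up rows. This sketch uses Python's sqlite3 as a stand-in for MySQL; the two query strings differ only in where i.isaccepted = 1 appears, and the table contents are invented for illustration.

```python
import sqlite3

WHERE_FILTER = """
    SELECT u.fbuid, COUNT(i.id) AS chance
    FROM users u
    LEFT JOIN invitations i ON u.fbuid = i.inviter
    WHERE u.datecreated BETWEEN ? AND ? AND i.isaccepted = 1  -- rejects NULL rows
    GROUP BY u.fbuid
    ORDER BY u.fbuid
"""

ON_FILTER = """
    SELECT u.fbuid, COUNT(i.id) AS chance
    FROM users u
    LEFT JOIN invitations i ON u.fbuid = i.inviter AND i.isaccepted = 1
    WHERE u.datecreated BETWEEN ? AND ?
    GROUP BY u.fbuid
    ORDER BY u.fbuid
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (fbuid TEXT PRIMARY KEY, datecreated TEXT);
        CREATE TABLE invitations (id INTEGER PRIMARY KEY, inviter TEXT, isaccepted INTEGER);
        INSERT INTO users VALUES ('100', '2013-03-01'), ('200', '2013-04-01'), ('300', '2013-05-01');
        INSERT INTO invitations VALUES (1, '100', 1), (2, '100', 1), (3, '100', 0), (4, '200', 0);
    """)
    return conn


def run(conn, sql):
    return conn.execute(sql, ("2013-01-01", "2014-01-01")).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(run(conn, WHERE_FILTER))  # only users with at least one accepted invitation
    print(run(conn, ON_FILTER))     # every user in the date range, 0 when none
```

With the condition in ON, an unmatched user still produces one row whose i.id is NULL, and COUNT(i.id) turns that into 0.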

LEFT JOIN after GROUP BY?

I have a table of "Songs", "Songs_Tags" (relating songs with tags) and "Songs_Votes" (relating songs with boolean like/dislike).
I need to retrieve the songs with a GROUP_CONCAT() of their tags and also the number of likes (true) and dislikes (false).
My query is something like this:
SELECT
s.*,
GROUP_CONCAT(st.id_tag) AS tags_ids,
COUNT(CASE WHEN v.vote=1 THEN 1 ELSE NULL END) as votesUp,
COUNT(CASE WHEN v.vote=0 THEN 1 ELSE NULL END) as votesDown
FROM Songs s
LEFT JOIN Songs_Tags st ON (s.id = st.id_song)
LEFT JOIN Votes v ON (s.id=v.id_song)
GROUP BY s.id
ORDER BY id DESC
The problem is that when a song has more than 1 tag, the join returns its vote rows more than once, so COUNT() returns inflated results.
The best solution I can think of would be to do the last LEFT JOIN after the GROUP BY (so there would be only one row per song). Then I'd need another GROUP BY s.id.
Is there a way to accomplish that? Do I need to use a subquery?
There've been some good answers so far, but I would adopt a slightly different method, quite similar to what you described originally:
SELECT
songsWithTags.*,
COALESCE(SUM(v.vote),0) AS votesUp,
COALESCE(SUM(1-v.vote),0) AS votesDown
FROM (
SELECT
s.*,
COALESCE(GROUP_CONCAT(st.id_tag),'') AS tags_ids
FROM Songs s
LEFT JOIN Songs_Tags st
ON st.id_song = s.id
GROUP BY s.id
) AS songsWithTags
LEFT JOIN Votes v
ON songsWithTags.id = v.id_song
GROUP BY songsWithTags.id
ORDER BY songsWithTags.id DESC
Here the subquery collapses songs and their tags into one row per song, which is then joined to Votes. I also opted to simply sum the v.vote column, since you indicated it is 1 or 0: SUM(v.vote) adds up 1+1+1+0+0 = 3, so 3 out of 5 are upvotes, while SUM(1-v.vote) adds up 0+0+0+1+1 = 2, so 2 out of 5 are downvotes. COALESCE turns the NULL sums of songs without votes into 0.
If you had an index on Votes with the columns (id_song, vote), that index would cover this query, so the table itself wouldn't even be read. Likewise, an index on Songs_Tags with (id_song, id_tag) would cover the subquery.
Edit: added a solution using COUNT:
SELECT
songsWithTags.*,
COUNT(CASE WHEN v.vote=1 THEN 1 END) as votesUp,
COUNT(CASE WHEN v.vote=0 THEN 1 END) as votesDown
FROM (
SELECT
s.*,
COALESCE(GROUP_CONCAT(st.id_tag),'') AS tags_ids
FROM Songs s
LEFT JOIN Songs_Tags st
ON st.id_song = s.id
GROUP BY s.id
) AS songsWithTags
LEFT JOIN Votes v
ON songsWithTags.id = v.id_song
GROUP BY songsWithTags.id
ORDER BY songsWithTags.id DESC
Try this:
SELECT
s.*,
GROUP_CONCAT(DISTINCT st.id_tag) AS tags_ids,
COUNT(DISTINCT CASE WHEN v.vote=1 THEN id_vote ELSE NULL END) AS votesUp,
COUNT(DISTINCT CASE WHEN v.vote=0 THEN id_vote ELSE NULL END) AS votesDown
FROM Songs s
LEFT JOIN Songs_Tags st ON (s.id = st.id_song)
LEFT JOIN Votes v ON (s.id=v.id_song)
GROUP BY s.id
ORDER BY id DESC
Your code results in a mini-Cartesian product: you are joining two one-to-many relationships with the "one" table (Songs) on the same side of both joins, so each song produces (number of tags) × (number of votes) rows.
Convert them to 2 subqueries with their own grouping, and then join:
SELECT
s.*,
COALESCE(st.tags_ids, '') AS tags_ids,
COALESCE(v.votesUp, 0) AS votesUp,
COALESCE(v.votesDown, 0) AS votesDown
FROM
Songs AS s
LEFT JOIN
( SELECT
id_song,
GROUP_CONCAT(id_tag) AS tags_ids
FROM Songs_Tags
GROUP BY id_song
) AS st
ON s.id = st.id_song
LEFT JOIN
( SELECT
id_song,
COUNT(CASE WHEN v.vote=1 THEN id_vote END) AS votesUp,
COUNT(CASE WHEN v.vote=0 THEN id_vote END) AS votesDown
FROM Votes AS v
GROUP BY id_song
) AS v
ON s.id = v.id_song
ORDER BY s.id DESC
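To see the fan-out described above, here is a small sketch (Python's sqlite3 standing in for MySQL, made-up rows) that runs the question's two-join query and the pre-aggregated version side by side. Song 1 has two tags and three votes, so the two-join query counts each of its votes twice. SQLite does not guarantee GROUP_CONCAT ordering, so the check compares the tags as a set.

```python
import sqlite3

NAIVE = """
    SELECT s.id,
           COUNT(CASE WHEN v.vote = 1 THEN 1 END) AS votesUp,
           COUNT(CASE WHEN v.vote = 0 THEN 1 END) AS votesDown
    FROM Songs s
    LEFT JOIN Songs_Tags st ON s.id = st.id_song
    LEFT JOIN Votes v ON s.id = v.id_song
    GROUP BY s.id
    ORDER BY s.id DESC
"""

PRE_AGGREGATED = """
    SELECT s.id,
           COALESCE(st.tags_ids, '') AS tags_ids,
           COALESCE(v.votesUp, 0) AS votesUp,
           COALESCE(v.votesDown, 0) AS votesDown
    FROM Songs AS s
    LEFT JOIN (SELECT id_song, GROUP_CONCAT(id_tag) AS tags_ids
               FROM Songs_Tags GROUP BY id_song) AS st
           ON s.id = st.id_song
    LEFT JOIN (SELECT id_song,
                      COUNT(CASE WHEN vote = 1 THEN id_vote END) AS votesUp,
                      COUNT(CASE WHEN vote = 0 THEN id_vote END) AS votesDown
               FROM Votes GROUP BY id_song) AS v
           ON s.id = v.id_song
    ORDER BY s.id DESC
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Songs (id INTEGER PRIMARY KEY);
        CREATE TABLE Songs_Tags (id_song INTEGER, id_tag INTEGER);
        CREATE TABLE Votes (id_vote INTEGER PRIMARY KEY, id_song INTEGER, vote INTEGER);
        INSERT INTO Songs VALUES (1), (2);
        INSERT INTO Songs_Tags VALUES (1, 10), (1, 20);  -- song 2 has no tags
        INSERT INTO Votes VALUES (1, 1, 1), (2, 1, 1), (3, 1, 0), (4, 2, 0);
    """)
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(conn.execute(NAIVE).fetchall())           # song 1's votes are doubled
    print(conn.execute(PRE_AGGREGATED).fetchall())  # one row per song, correct counts
```

Each subquery is grouped down to one row per song before the join, so the tag and vote rows can no longer multiply each other.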

SQL Query Help for Chart Data

Here is our goal:
We want to select each month and the revenue gained for each month, for a particular user.
Here are our tables
product, purchase, user, months
user 1..* product, product 1..* purchase
product has a column 'user_id' and purchase has a column 'product_id'
months is just a table that contains each month's name as a string. Currently, we use it for the LEFT JOIN, as you can see below.
SELECT months.name, IFNULL(sum(purchase.price), 0) as revenue
FROM months
LEFT JOIN purchase
ON DATE_FORMAT(purchase.purchase_date, '%M') = months.name
AND DATE_FORMAT(purchase_date, '%Y') = DATE_FORMAT(CURRENT_DATE, '%Y')
AND purchase.status = 2
GROUP BY name
ORDER BY months.id ASC;
This works great for ALL of the users and ALL of the purchases made this year (a purchase status of 2 means complete). The next part is: how do I filter this by user id? The product table has the user_id we're looking for, and the purchase table has a product_id. Every time I try something, it either does nothing or removes all of the months with NULL revenue, which we don't want.
ATTEMPT #NathanialWools
The WHERE clause removes all of the rows with a revenue of zero (that is, the NULL rows before IFNULL replaces them). This is what I tried:
SELECT name, IFNULL(sum(purchase.price), 0) as revenue
FROM months
LEFT JOIN purchase
ON DATE_FORMAT(purchase.purchase_date, '%M') = months.name
AND DATE_FORMAT(purchase_date, '%Y') = DATE_FORMAT(CURRENT_DATE, '%Y')
AND purchase.status = 2
LEFT JOIN product
ON product.id = purchase.product_id
WHERE product.user_id = 1
GROUP BY name
ORDER BY months.id ASC;
If you have a list of users, you could start with that, cross join to months, then join to product. (If you don't, just start with product.)
SELECT u.user_id, m.name, IFNULL(sum(p.price), 0) as revenue
FROM user u
CROSS JOIN months m
LEFT JOIN product d
ON u.user_id = d.user_id
LEFT JOIN purchase p
ON DATE_FORMAT(p.purchase_date, '%M') = m.name
AND DATE_FORMAT(p.purchase_date, '%Y') = DATE_FORMAT(CURRENT_DATE, '%Y')
AND p.product_id = d.id
AND p.status = 2
WHERE u.user_id = <user you care about>
GROUP BY u.user_id, m.name
ORDER BY m.id ASC;
LEFT JOIN to product, and add a WHERE clause that keeps rows where user_id is NULL or equal to your value.
What did you try?
SELECT months.name, u.user_id, sum(IFNULL(pu.price, 0)) as revenue
FROM months
LEFT JOIN purchase pu
ON DATE_FORMAT(pu.purchase_date, '%M') = months.name
AND DATE_FORMAT(pu.purchase_date, '%Y') = DATE_FORMAT(CURRENT_DATE, '%Y')
AND pu.status = 2
LEFT JOIN product pr
ON pr.id = pu.product_id
LEFT JOIN user u
ON u.user_id = pr.user_id
GROUP BY months.name, u.user_id
ORDER BY months.id ASC;
An inner SELECT (a derived table) seems to do the trick. The LEFT JOIN on purchase alone did not filter enough. Here is the solution that works for me:
SELECT m.name, IFNULL(sum(b.price), 0) as revenue
FROM months as m
LEFT JOIN (SELECT purchase.price, purchase.purchase_date, purchase.status
FROM purchase
INNER JOIN product ON purchase.product_id = product.id
WHERE product.user_id = 1) as b
ON DATE_FORMAT(b.purchase_date, '%M') = m.name
AND DATE_FORMAT(b.purchase_date, '%Y') = DATE_FORMAT(CURRENT_TIMESTAMP, '%Y')
AND b.status = 2
GROUP BY m.name
ORDER BY m.id ASC
Change the 1 to the id of whichever user you want to look at.
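The derived-table idea can be sketched as below, using Python's sqlite3 as a stand-in for MySQL with made-up rows. Two assumptions go beyond the question: months.id holds the month number (1 to 12), so the join uses strftime('%m') instead of matching month names, and the year is passed in explicitly rather than taken from the current date, which keeps the result reproducible. In MySQL, the strftime calls would become MONTH()/YEAR() or the DATE_FORMAT calls used above.

```python
import sqlite3

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE months (id INTEGER PRIMARY KEY, name TEXT);  -- id = month number (assumed)
        CREATE TABLE product (id INTEGER PRIMARY KEY, user_id INTEGER);
        CREATE TABLE purchase (product_id INTEGER, price REAL, purchase_date TEXT, status INTEGER);
        INSERT INTO product VALUES (1, 1), (2, 2);
        INSERT INTO purchase VALUES
            (1, 10.0, '2024-01-15', 2),
            (1,  5.0, '2024-01-20', 2),
            (1,  7.0, '2024-03-02', 1),   -- not complete
            (2, 99.0, '2024-02-10', 2),   -- another user's product
            (1,  3.0, '2023-01-05', 2);   -- different year
    """)
    conn.executemany("INSERT INTO months VALUES (?, ?)",
                     list(enumerate(MONTHS, start=1)))
    return conn


def monthly_revenue(conn, user_id, year):
    """Revenue per month for one user; months without purchases come back as 0."""
    return conn.execute("""
        SELECT m.name, COALESCE(SUM(b.price), 0) AS revenue
        FROM months AS m
        LEFT JOIN (
            -- filter by user and status before the outer join
            SELECT pu.price, pu.purchase_date
            FROM purchase pu
            JOIN product pr ON pr.id = pu.product_id
            WHERE pr.user_id = ? AND pu.status = 2
        ) AS b
          ON CAST(strftime('%m', b.purchase_date) AS INTEGER) = m.id
         AND strftime('%Y', b.purchase_date) = ?
        GROUP BY m.id, m.name
        ORDER BY m.id
    """, (user_id, year)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for name, revenue in monthly_revenue(conn, 1, "2024"):
        print(name, revenue)
```

Because the user filter lives inside the derived table rather than in an outer WHERE, the LEFT JOIN from months still returns all twelve months.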
Thanks, everyone, for pitching in; it made debugging faster.