I am getting back into mysql after a couple years and have run into a problem. I have a query that works, but I am lost on how to optimize it better.
Here is the query:
select
u.id as 'User',
count(distinct tr.id) as Trips,
count(distinct ti.id) as 'Trip Items'
from
users u
inner join
user_emails ue on u.id = ue.user_id
inner join
trips tr on tr.user_id = u.id
inner join
trip_items ti on ti.trip_id = tr.id
where
ue.verified = true and ue.is_primary = true
and
tr.created_at between '2017-02-01 00:00:00' and '2017-02-01 00:59:59'
group by 1
having Trips < 30
I essentially need to get a count of all trips and trip items, but only for those users who have 30 or fewer trips in the given date range. Right now I am accomplishing that by grouping the results by User and then applying a 'having'. I'm looking at millions of rows on a non-indexed field (created_at). Ideally I'd like to get just 1 row back that has total trips and total trip items, while still applying the "users with fewer than 30 trips" filter during the query. Is this possible? :)
Just a quick edit: I've tried looking around at other solutions, but I am a bit lost on what I should be looking for. I'm not looking for a solution, perhaps just a "go check this out and try that".
count(distinct) can be expensive. Try aggregating before doing the join. I think the following works (this assumes that items are not shared among different trips):
select u.id as `User`, tr.Trips, tr.items
from users u inner join
     user_emails ue
     on u.id = ue.user_id inner join
     (select tr.user_id, count(*) as Trips, sum(ti.items) as items
      from trips tr join
           (select ti.trip_id, count(*) as items
            from trip_items ti
            group by ti.trip_id
           ) ti
           on ti.trip_id = tr.id
      where tr.created_at >= '2017-02-01' and tr.created_at < '2017-02-01 01:00:00'
      group by tr.user_id
      having Trips < 30
     ) tr
     on tr.user_id = u.id
where ue.verified = true and ue.is_primary = true
group by 1
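The question also asks for a single row of totals. One way to get that, sketched here under assumptions, is to wrap the per-user aggregate in an outer query and sum it. The sketch runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL; the sample rows are invented, the user_emails filter is left out for brevity, and the threshold is lowered from 30 to 3 so a tiny data set can show a user being excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT);
CREATE TABLE trip_items (id INTEGER PRIMARY KEY, trip_id INTEGER);
-- Invented sample data: user 1 has 2 trips in the window, user 2 has 3.
INSERT INTO trips VALUES
  (1, 1, '2017-02-01 00:10:00'),
  (2, 1, '2017-02-01 00:20:00'),
  (3, 2, '2017-02-01 00:30:00'),
  (4, 2, '2017-02-01 00:40:00'),
  (5, 2, '2017-02-01 00:50:00'),
  (6, 1, '2017-02-01 02:00:00');  -- outside the one-hour window
INSERT INTO trip_items (trip_id) VALUES (1), (1), (2), (3), (4), (5), (5), (6);
""")

MAX_TRIPS = 3  # assumption: stands in for the 30 used in the question

totals = conn.execute("""
SELECT SUM(per_user.trips) AS total_trips,
       SUM(per_user.items) AS total_items
FROM (
    -- one row per user, already filtered by the trip-count threshold
    SELECT tr.user_id, COUNT(*) AS trips, SUM(ti.items) AS items
    FROM trips tr
    JOIN (SELECT trip_id, COUNT(*) AS items
          FROM trip_items
          GROUP BY trip_id) ti
      ON ti.trip_id = tr.id
    WHERE tr.created_at >= '2017-02-01 00:00:00'
      AND tr.created_at <  '2017-02-01 01:00:00'
    GROUP BY tr.user_id
    HAVING COUNT(*) < ?
) per_user
""", (MAX_TRIPS,)).fetchone()

print(totals)  # only user 1 passes the threshold: 2 trips, 3 items
```

The inner query is the same shape as the derived table in the answer; only the outer SUM is new. Whether this is fast enough on millions of rows would still depend on an index covering created_at, which this sketch does not measure.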
I've seen a couple posts on dividing 2 separate queries that seemed helpful but I am still having trouble dividing these two queries. I wrote different sub queries and followed some examples, but I just keep getting errors as the example queries seemed more straight forward (no Joins).
Here is the first query:
SELECT
YEAR(s.created_at) AS year,
COUNT(*) AS pre_sub_buys
FROM subscription_users s
INNER JOIN users u
ON s.user_id = u.uid
LEFT JOIN canvases c
ON u.email = c.ref_email
WHERE c.is_paid=1 AND c.date_created < s.created_at
GROUP BY year;
And I am trying to divide this by:
SELECT
YEAR(s.created_at) AS year,
COUNT(s.created_at) AS subscribers
FROM subscription_users s
LEFT JOIN canvases c
ON c.entries_updated_at = s.updated_at
GROUP BY year;
Essentially, I am looking to find the yearly average between presubscription purchases and subscribers.
Can anyone direct me in the right direction on how to properly do this?
Thank you so much,
Jonathan
You can approach this using Conditional Aggregation:
SELECT
YEAR(s.created_at) AS year,
COUNT(CASE WHEN c.is_paid=1 AND c.date_created < s.created_at THEN 1
ELSE NULL
END) AS pre_sub_buys,
COUNT(s.created_at) AS subscribers,
COUNT(CASE WHEN c.is_paid=1 AND c.date_created < s.created_at THEN 1
ELSE NULL
END) / COUNT(s.created_at) AS pre_sub_buys_divided_by_subscribers
FROM subscription_users s
INNER JOIN users u
ON s.user_id = u.uid
LEFT JOIN canvases c
ON u.email = c.ref_email
GROUP BY year;
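To see the conditional aggregation at work, here is a small sketch on invented rows. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, so YEAR() is replaced by strftime('%Y', ...) and the division is forced to floating point with 1.0 *; both substitutions are assumptions of the sketch, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (uid INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE subscription_users (user_id INTEGER, created_at TEXT);
CREATE TABLE canvases (ref_email TEXT, is_paid INTEGER, date_created TEXT);
-- Invented sample data
INSERT INTO users VALUES (1, 'a@x'), (2, 'b@x'), (3, 'c@x');
INSERT INTO subscription_users VALUES
  (1, '2016-05-01'), (2, '2016-06-01'), (3, '2017-01-15');
INSERT INTO canvases VALUES
  ('a@x', 1, '2016-01-01'),   -- paid before subscribing: counted
  ('b@x', 1, '2016-12-01');   -- paid after subscribing: not counted
""")

rows = conn.execute("""
SELECT strftime('%Y', s.created_at) AS year,
       COUNT(CASE WHEN c.is_paid = 1 AND c.date_created < s.created_at
                  THEN 1 END) AS pre_sub_buys,
       COUNT(s.created_at) AS subscribers,
       1.0 * COUNT(CASE WHEN c.is_paid = 1 AND c.date_created < s.created_at
                        THEN 1 END) / COUNT(s.created_at) AS ratio
FROM subscription_users s
INNER JOIN users u ON s.user_id = u.uid
LEFT JOIN canvases c ON u.email = c.ref_email
GROUP BY year
ORDER BY year
""").fetchall()

for row in rows:
    print(row)
```

Note that COUNT(s.created_at) counts joined rows, so a subscriber with several canvases would be counted once per canvas; the sample data avoids that case on purpose.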
Query in text: "Display all active users and the completed orders entered by them + the completed orders they are assigned to for the specified date range".
Here is the query I managed to create with only one count:
SELECT u.firstname, u.lastname, COUNT(l.id) AS totalCompleted
FROM users u
LEFT JOIN orders l
ON l.idDispatcher = u.id
WHERE u.disabled = '0'
AND l.smallStatus='1'
AND l.dateAdded >= :from
AND l.dateAdded <= :to
GROUP BY u.firstname;
This gives me all the orders where the user is assigned to an order:
LEFT JOIN orders l
ON l.idDispatcher = u.id
I need to combine this query with another one where the COUNT(l.id) is based on:
LEFT JOIN orders l
ON l.addedById= u.id
When I try this:
LEFT JOIN orders l
ON l.idDispatcher = u.id AND l.addedById= u.id
The COUNT(l.id) combines the result for assigned orders and orders added by the user, whereas I need two separate numbers. I also tried putting a condition inside the COUNT, with no success.
Not sure I understand exactly, but if you got it working with two separate queries and you need two counts, then why not just union the results together?
But if you need one count of ALL matches based on two separate conditions, use an "OR" in your join instead of "AND":
SELECT u.firstname, u.lastname, COUNT(l.id) AS totalCompleted
FROM users u
LEFT JOIN orders l
ON l.idDispatcher = u.id
or l.addedById= u.id
WHERE u.disabled = '0'
AND l.smallStatus='1'
AND l.dateAdded >= :from
AND l.dateAdded <= :to
GROUP BY u.firstname, u.lastname;
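If the two numbers are needed side by side rather than combined, a condition inside each COUNT can split them. The following is only a sketch on invented rows, run on SQLite through Python's sqlite3 module as a stand-in for MySQL; the column aliases assignedCompleted and addedCompleted and the parameter names :date_from and :date_to are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT,
                    disabled TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, idDispatcher INTEGER,
                     addedById INTEGER, smallStatus TEXT, dateAdded TEXT);
-- Invented sample data
INSERT INTO users VALUES
  (1, 'Ann', 'Lee', '0'), (2, 'Bob', 'Ray', '0'),
  (3, 'Cy',  'Dow', '1'), (4, 'Dee', 'Fox', '0');
INSERT INTO orders VALUES
  (1, 1, 2, '1', '2020-01-05'),
  (2, 1, 1, '1', '2020-01-06'),
  (3, 2, 1, '0', '2020-01-07'),   -- not completed
  (4, 2, 2, '1', '2020-03-01');   -- outside the date range
""")

rows = conn.execute("""
SELECT u.firstname, u.lastname,
       COUNT(CASE WHEN l.idDispatcher = u.id THEN 1 END) AS assignedCompleted,
       COUNT(CASE WHEN l.addedById    = u.id THEN 1 END) AS addedCompleted
FROM users u
LEFT JOIN orders l
  ON (l.idDispatcher = u.id OR l.addedById = u.id)
 AND l.smallStatus = '1'
 AND l.dateAdded >= :date_from
 AND l.dateAdded <= :date_to
WHERE u.disabled = '0'
GROUP BY u.id, u.firstname, u.lastname
ORDER BY u.id
""", {"date_from": "2020-01-01", "date_to": "2020-01-31"}).fetchall()

for row in rows:
    print(row)
```

The order filters sit in the ON clause here, so an active user with no matching orders (Dee in the sample) still shows up with two zeros. With those filters in the WHERE clause, such users would drop out of the result.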
I have a very slow MySQL query that I would like to optimise.
The query is taking 66.2070 seconds to return 5 results from tables containing around 200 rows.
The database tables store users, experiments (A/B tests), goals (page URLs), visits (page visits) and conversions (clicks a goal's URL). The visit and conversion tables both have a combination column that records if version A or B of a page was visited or a conversion came from version A or B. Combinations are stored in the db as 1 or 2.
I'm trying to get a list of a user's experiments with the number of visits and conversions for each combination.
For some relationships I'm using composite primary keys, which does make the joins more complicated. I doubt it but could this be the cause of the problem?
How can I rewrite this query to make it run in a reasonable time, at least less than a second?
Here's my database schema:
and here's my query:
SELECT e.id AS id,
e.name AS name,
e.status AS status,
e.created AS created,
Count(DISTINCT v1.id) AS visits1,
Count(DISTINCT v2.id) AS visits2,
Count(DISTINCT c1.id) AS conversions1,
Count(DISTINCT c2.id) AS conversions2
FROM experiment e
LEFT JOIN visit v1
ON ( v1.experiment_id = e.id
AND v1.user_id = e.user_id
AND v1.combination = 1 )
LEFT JOIN visit v2
ON ( v2.experiment_id = e.id
AND v2.user_id = e.user_id
AND v2.combination = 2 )
LEFT JOIN goal g
ON ( g.experiment_id = e.id
AND g.user_id = e.user_id
AND g.principal = 1 )
LEFT JOIN conversion c1
ON ( c1.experiment_id = e.id
AND c1.user_id = e.user_id
AND c1.goal_id = g.id
AND c1.combination = 1 )
LEFT JOIN conversion c2
ON ( c2.experiment_id = e.id
AND c2.user_id = e.user_id
AND c2.goal_id = g.id
AND c2.combination = 2 )
WHERE e.user_id = 25
GROUP BY e.id
ORDER BY e.created DESC
LIMIT 5
The resulting table should look something like this:
You should do the aggregations before doing the joins, to avoid getting large intermediate results. I think the logic is
SELECT e.id, e.name, e.status, e.created,
       v.visits1, v.visits2, g.conversions1, g.conversions2
FROM experiment e LEFT JOIN
     (SELECT experiment_id, user_id,
             SUM(combination = 1) as visits1,
             SUM(combination = 2) as visits2
      FROM visit
      WHERE combination IN (1, 2)
      GROUP BY experiment_id, user_id
     ) v
     ON v.experiment_id = e.id AND
        v.user_id = e.user_id LEFT JOIN
     (SELECT g.experiment_id, g.user_id,
             SUM(c.combination = 1) as conversions1,
             SUM(c.combination = 2) as conversions2
      FROM goal g LEFT JOIN
           conversion c
           ON c.experiment_id = g.experiment_id AND
              c.user_id = g.user_id AND
              c.goal_id = g.id
      WHERE g.principal = 1
      GROUP BY g.experiment_id, g.user_id
     ) g
     ON g.experiment_id = e.id AND
        g.user_id = e.user_id
WHERE e.user_id = 25
ORDER BY e.created DESC
LIMIT 5 ;
There are further optimizations for this. For instance, an index on experiment(user_id, created, id).
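The "large intermediate results" point can be made concrete with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for MySQL, invented rows, and only the experiment and visit tables (goal and conversion are left out for brevity). Joining the same table twice multiplies the rows per experiment, while aggregating first reads each visit once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE visit (id INTEGER PRIMARY KEY, experiment_id INTEGER,
                    user_id INTEGER, combination INTEGER);
-- Invented sample data: 3 visits for combination 1, 4 for combination 2
INSERT INTO experiment VALUES (1, 25);
INSERT INTO visit (experiment_id, user_id, combination) VALUES
  (1, 25, 1), (1, 25, 1), (1, 25, 1),
  (1, 25, 2), (1, 25, 2), (1, 25, 2), (1, 25, 2);
""")

# Rows produced by the double self-join before any grouping: 3 * 4
fanout_rows = conn.execute("""
SELECT COUNT(*)
FROM experiment e
LEFT JOIN visit v1 ON v1.experiment_id = e.id AND v1.user_id = e.user_id
                  AND v1.combination = 1
LEFT JOIN visit v2 ON v2.experiment_id = e.id AND v2.user_id = e.user_id
                  AND v2.combination = 2
WHERE e.user_id = 25
""").fetchone()[0]

# Same counts from a single pass over visit, aggregated before joining
pre_aggregated = conn.execute("""
SELECT SUM(combination = 1) AS visits1, SUM(combination = 2) AS visits2
FROM visit
WHERE combination IN (1, 2)
GROUP BY experiment_id, user_id
""").fetchone()

print(fanout_rows)     # intermediate rows the original query has to de-duplicate
print(pre_aggregated)  # the counts themselves
```

With conversions joined in as well, each extra join multiplies the intermediate row count again, which is why COUNT(DISTINCT ...) over the joined result gets slow even on small tables.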
For your question about the drawback of using composite keys I found this:
Drawback of composite keys
I can't currently test your database, but use the EXPLAIN syntax in MySQL to find out what is wrong with the performance of your query:
MySQL docs about EXPLAIN and optimizing your query with EXPLAIN
I'm struggling a little bit with a query and hope you can help.
I have two tables: one with all the users and one with information from submitted forms.
Both contain the user ID.
What I need to find out is which users from the users table do not appear in the report table.
This is what I have so far:
SELECT u.ID, u.display_name, u.user_email, r.user_id
FROM users AS u
LEFT JOIN report AS r ON u.ID = r.user_id
WHERE NOT EXISTS(
SELECT *
FROM report AS rr
WHERE u.ID = rr.user_id
)
This seems to be fine for the users who absolutely have never submitted the form.
But the report table also contains a date column, and I was wondering how I can get this grouped by day.
In the front end I will then hopefully have a table which shows:
date: user:
2015-01-01 user a
2015-01-01 user f
2015-01-02 user g
2015-01-02 user a
2015-01-03 user z
2015-01-03 user x
Where the users are those who have not submitted the form that day.
Hope you can help. Thanks in advance!
If you want to get a list of users that don't have any rows in the report table for a given date, you can generate a set that is the Cartesian product of the users and the dates that are present in the report table, then do a left join with that set and check for null.
The Cartesian set formed by the cross join will contain all possible combinations of dates and users; that is what the report table would contain if all users had added reports on all available dates.
select distinct r.date, u.id
from report r
cross join users u
left join (select r.date, r.user_id from users as u join report as r on u.id = r.user_id)
a on a.date = r.date and a.user_id = u.id
where a.date is null
Sample SQL Fiddle
With most other databases this could have been done with a set difference operator (minus or except) instead of a left join.
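Both approaches can be compared on a toy data set. The sketch below uses SQLite through Python's sqlite3 module (SQLite does support EXCEPT) with invented rows, and assumes the column names users.ID, report.user_id and report.date from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (ID INTEGER PRIMARY KEY, display_name TEXT);
CREATE TABLE report (user_id INTEGER, date TEXT);
-- Invented sample data; user 1 reported twice on 2015-01-02
INSERT INTO users VALUES (1, 'user a'), (2, 'user b'), (3, 'user c');
INSERT INTO report VALUES
  (1, '2015-01-01'), (2, '2015-01-01'),
  (1, '2015-01-02'), (1, '2015-01-02');
""")

# Cross join of users and distinct dates, then keep pairs with no report
via_left_join = conn.execute("""
SELECT d.date, u.ID
FROM users u
CROSS JOIN (SELECT DISTINCT date FROM report) d
LEFT JOIN report r ON r.user_id = u.ID AND r.date = d.date
WHERE r.user_id IS NULL
ORDER BY d.date, u.ID
""").fetchall()

# Same result as a set difference: all pairs minus the pairs that exist
via_except = conn.execute("""
SELECT d.date, u.ID
FROM users u
CROSS JOIN (SELECT DISTINCT date FROM report) d
EXCEPT
SELECT date, user_id FROM report
ORDER BY 1, 2
""").fetchall()

print(via_left_join)
print(via_except)
```

Taking the dates from a DISTINCT subquery keeps the duplicate report on 2015-01-02 from producing duplicate output rows. Days on which nobody reported at all never appear, because the dates come from the report table itself.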
I'm making assumptions about column names in your report table for this answer:
SELECT x.report_date, u.user_id, u.display_name
FROM users u
JOIN (
SELECT DISTINCT report_date
FROM reports
) x
LEFT JOIN reports r
ON r.user_id = u.user_id
AND r.report_date = x.report_date
WHERE r.report_date IS NULL
ORDER BY x.report_date, u.user_id
Check out this fiddle: http://sqlfiddle.com/#!9/407ac/5
Left outer join with where clause...
Here is a good link ...
http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/
SELECT * FROM `users`
LEFT OUTER JOIN `report`
ON `users`.`ID` = `report`.`user_id`
WHERE `report`.`user_id` IS null
ORDER BY `report`.`Date`
Surely you could just pass in the date you wanted to check?
So something like this (using :reportDate as the parameter), with the date condition in the ON clause so that users without a report on that date are kept:
SELECT * FROM users
LEFT OUTER JOIN report
ON users.ID = report.user_id
AND report.Date = :reportDate
WHERE report.user_id IS NULL
You can get the pairs of users/dates without reports. Generate all possible rows using a cross join and then filter out the ones that exist:
select u.*, d.date
from users u cross join
     (select distinct date from reports r) d left join
     reports r
     on u.id = r.user_id and d.date = r.date
where r.user_id is null;
I have 2 tables: the first one is users (13068 rows), the other one is invitations (211343 rows).
fbuid on users is the same as inviter on invitations.
I'm trying to export these 2 tables to an Excel file, which should look like this:
u.name, u.adress, u.fbuid ...., COUNT(i.id)
So far I've tried:
SELECT u.*,(SELECT COUNT(i.id) FROM invitations i WHERE i.isaccepted = 1 and i.inviter = u.fbuid) as chance FROM users u WHERE u.datecreated BETWEEN '2013-01-01' AND '2014-01-01' LIMIT 0,50
and
SELECT *,COUNT(i.id) as chance FROM users u LEFT JOIN invitations i ON u.fbuid = i.inviter WHERE u.datecreated BETWEEN '$startdate' AND '$enddate' and i.isaccepted=1 GROUP BY fbuid
The problem is that the left join returns only users with invitations, but only about 2000 users have invited anyone, and I need to list all users.
The first one, with limit 50, took 36 seconds. I can't imagine how long it would take for all records. Other than a join, what else can I do? Or what would be the correct way?
This is the query with the left join:
SELECT *, COUNT(i.id) as chance
FROM users u LEFT JOIN
invitations i
ON u.fbuid = i.inviter
WHERE u.datecreated BETWEEN '$startdate' AND '$enddate' and i.isaccepted=1
GROUP BY fbuid;
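The problem is that you are filtering on the i table in the where clause. Because of the left join, i.isaccepted is NULL for users without invitations, so the condition i.isaccepted = 1 removes those users and the query behaves like an inner join. Move that condition to the on clause: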
SELECT u.*, COUNT(i.id) as chance
FROM users u LEFT JOIN
invitations i
ON u.fbuid = i.inviter and i.isaccepted = 1
WHERE u.datecreated BETWEEN '$startdate' AND '$enddate'
GROUP BY fbuid;
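The difference between the two placements is easy to see on a few invented rows. This sketch runs on SQLite through Python's sqlite3 module as a stand-in for MySQL and leaves out the datecreated filter; only fbuid and the count are selected to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (fbuid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invitations (id INTEGER PRIMARY KEY, inviter INTEGER,
                          isaccepted INTEGER);
-- Invented sample data: Ann has 2 accepted invitations, Bob has only a
-- declined one, Cy has none at all
INSERT INTO users VALUES (100, 'Ann'), (200, 'Bob'), (300, 'Cy');
INSERT INTO invitations VALUES (1, 100, 1), (2, 100, 1), (3, 100, 0), (4, 200, 0);
""")

# Filter in WHERE: rows where i.isaccepted is NULL or 0 are discarded,
# so users without accepted invitations vanish
filter_in_where = conn.execute("""
SELECT u.fbuid, COUNT(i.id) AS chance
FROM users u
LEFT JOIN invitations i ON u.fbuid = i.inviter
WHERE i.isaccepted = 1
GROUP BY u.fbuid
ORDER BY u.fbuid
""").fetchall()

# Filter in ON: every user is kept, unmatched users get a count of 0
filter_in_on = conn.execute("""
SELECT u.fbuid, COUNT(i.id) AS chance
FROM users u
LEFT JOIN invitations i ON u.fbuid = i.inviter AND i.isaccepted = 1
GROUP BY u.fbuid
ORDER BY u.fbuid
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

COUNT(i.id) rather than COUNT(*) matters here: it skips the NULL id of the padded rows, so users without accepted invitations get 0 instead of 1.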