I'm using a simple LEFT JOIN query to fetch data from two separate tables. Both hold a common column named domain, and I join them on this column to calculate a value from one table's visits and the other table's earnings.
SELECT t1.`domain` AS `domain`,
(SUM(earnings)/SUM(visits)) AS `rpv`
FROM hat_adsense_stats t1
LEFT JOIN hat_analytics_stats t4 ON t4.`domain`=t1.`domain`
WHERE(t1.`hat_analytics_id`='91' OR t1.`hat_analytics_id`='92')
AND t1.`date`>='2013-02-18'
AND t4.`date`>='2013-02-18'
GROUP BY t1.`domain`
ORDER BY rpv DESC
LIMIT 10;
This is the query I run, and it takes 9.060 sec to execute.
The hat_adsense_stats table contains 60887 records.
The hat_analytics_stats table contains 190780 records.
Grouped by domain, it returns 186 rows of data that need comparing.
Any suggestions about inefficient code, or a better way to solve this, will be appreciated!
Thanks, raheel, for opening the door. This is what worked in the end, with an execution time of 0.051 sec. :)
SELECT
t1.`domain` AS `domain`,
SUM(earnings)/visits AS `rpv`
FROM hat_adsense_stats t1
INNER JOIN (SELECT
domain,
SUM(visits) AS visits
FROM hat_analytics_stats
WHERE `date` >= "2013-02-18"
GROUP BY domain) AS t4
ON t4.domain = t1.domain
WHERE t1.`hat_analytics_id` IN('91','92')
AND t1.`date`>='2013-02-18'
GROUP BY t1.`domain`
ORDER BY rpv DESC
LIMIT 10
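The rewrite does more than speed things up. In the original query, each hat_adsense_stats row is joined to every matching hat_analytics_stats row. That multiplies SUM(earnings) by the number of analytics rows per domain and SUM(visits) by the number of adsense rows, which skews rpv whenever those two counts differ.

Below is a minimal sketch of the two query shapes. It assumes Python's built-in sqlite3 module as a stand-in for MySQL and uses made-up toy rows for a single domain. It uses a plain JOIN because the WHERE test on t4.date already makes the original LEFT JOIN behave like one.

```python
import sqlite3

# Hypothetical toy data: one domain, two adsense rows, three analytics rows.
SETUP = """
CREATE TABLE hat_adsense_stats (domain TEXT, date TEXT, earnings REAL, hat_analytics_id TEXT);
CREATE TABLE hat_analytics_stats (domain TEXT, date TEXT, visits INTEGER);
INSERT INTO hat_adsense_stats VALUES
  ('a.com', '2013-02-18', 10.0, '91'),
  ('a.com', '2013-02-19', 20.0, '91');
INSERT INTO hat_analytics_stats VALUES
  ('a.com', '2013-02-18', 100),
  ('a.com', '2013-02-19', 100),
  ('a.com', '2013-02-20', 100);
"""

# Join first, aggregate later: 2 x 3 = 6 joined rows, so earnings are
# counted 3 times and visits 2 times.
JOIN_THEN_SUM = """
SELECT t1.domain, SUM(earnings) / SUM(visits)
FROM hat_adsense_stats t1
JOIN hat_analytics_stats t4 ON t4.domain = t1.domain
WHERE t1.hat_analytics_id IN ('91', '92')
  AND t1.date >= '2013-02-18' AND t4.date >= '2013-02-18'
GROUP BY t1.domain
"""

# Aggregate visits per domain first, then join exactly one row per domain.
SUM_THEN_JOIN = """
SELECT t1.domain, SUM(earnings) / t4.visits
FROM hat_adsense_stats t1
JOIN (SELECT domain, SUM(visits) AS visits
      FROM hat_analytics_stats
      WHERE date >= '2013-02-18'
      GROUP BY domain) AS t4 ON t4.domain = t1.domain
WHERE t1.hat_analytics_id IN ('91', '92')
  AND t1.date >= '2013-02-18'
GROUP BY t1.domain
"""

def rpv(query):
    """Run one query against a fresh in-memory database; return {domain: rpv}."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    result = dict(con.execute(query).fetchall())
    con.close()
    return result

print(rpv(JOIN_THEN_SUM))  # {'a.com': 0.15}
print(rpv(SUM_THEN_JOIN))  # {'a.com': 0.1}
```

With these rows, the join-first query reports 90/600 = 0.15. The pre-aggregated query reports the intended 30/300 = 0.1.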
Change your query like this:
SELECT
t1.`domain` AS `domain`,
t2.earnings/t2.visits AS `rpv`
FROM hat_adsense_stats t1
INNER JOIN (SELECT
domain,
sum(earnings) AS earnings,
SUM(visits) AS visits
FROM hat_adsense_stats
GROUP BY domain) AS t2
on t2.domain = t1.domain
LEFT JOIN hat_analytics_stats t4
ON t4.`domain` = t1.`domain`
WHERE t1.`hat_analytics_id` IN('91','92')
AND t1.`date` >= '2013-02-18'
AND t4.`date` >= '2013-02-18'
GROUP BY t1.`domain`
ORDER BY rpv DESC
LIMIT 10;
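The LEFT JOIN is unnecessary because the WHERE clause tests a column from the right-hand table (t4.`date`). That test discards the NULL-extended rows that make a LEFT JOIN different from an INNER JOIN. An INNER JOIN gives the same result here and might well be quicker.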
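Here is a minimal sketch of that equivalence. It uses Python's sqlite3 module as a stand-in for MySQL, and the domains are made-up toy rows. It also shows the usual alternative: if unmatched rows should survive, the date filter belongs in the ON clause rather than in WHERE.

```python
import sqlite3

SETUP = """
CREATE TABLE hat_adsense_stats (domain TEXT);
CREATE TABLE hat_analytics_stats (domain TEXT, date TEXT);
INSERT INTO hat_adsense_stats VALUES ('a.com'), ('b.com'), ('c.com');
-- b.com has no analytics row; c.com only has one before the cutoff.
INSERT INTO hat_analytics_stats VALUES ('a.com', '2013-02-20'), ('c.com', '2013-01-01');
"""

QUERIES = {
    # The WHERE test on t4 rejects the NULL-extended rows the LEFT JOIN adds...
    "left_where": """SELECT t1.domain FROM hat_adsense_stats t1
                     LEFT JOIN hat_analytics_stats t4 ON t4.domain = t1.domain
                     WHERE t4.date >= '2013-02-18' ORDER BY t1.domain""",
    # ...so the result is exactly what an INNER JOIN returns.
    "inner_where": """SELECT t1.domain FROM hat_adsense_stats t1
                      JOIN hat_analytics_stats t4 ON t4.domain = t1.domain
                      WHERE t4.date >= '2013-02-18' ORDER BY t1.domain""",
    # Filtering inside ON keeps every t1 row, as a LEFT JOIN normally would.
    "left_on": """SELECT t1.domain FROM hat_adsense_stats t1
                  LEFT JOIN hat_analytics_stats t4
                    ON t4.domain = t1.domain AND t4.date >= '2013-02-18'
                  ORDER BY t1.domain""",
}

def run(name):
    """Run the named query on fresh toy data and return the domains it yields."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    domains = [row[0] for row in con.execute(QUERIES[name])]
    con.close()
    return domains

for name in QUERIES:
    print(name, run(name))
```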
I have 14000 records in my SQL table, with columns ID, test_subject_id and date_created. I want to fetch all records that were created within 3 minutes of each other (difference in date_created values) and that share the same test_subject_id.
You should use a self-join; I assume an inner join is what will work for you:
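SELECT a.ID, a.date_created, b.ID, b.date_created
FROM accounts a
INNER JOIN accounts b
ON a.test_subject_id = b.test_subject_id
-- a.ID < b.ID skips self-matches and reports each pair only once
AND a.ID < b.ID
-- pairs created within 3 minutes (180 seconds) of each other, in either order
AND ABS(TIMESTAMPDIFF(SECOND, a.date_created, b.date_created)) <= 180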
Note: TIMESTAMPDIFF is used on the assumption that date_created has type DATETIME; see the MySQL date and time functions documentation for details.
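A test of the form TIMESTAMPDIFF(MINUTE, a.date_created, b.date_created) = 3 keeps only pairs that are exactly three whole minutes apart. The question asks for pairs *within* three minutes.

The sketch below compares the two conditions. It uses Python's sqlite3 module as a stand-in for MySQL, and the rows are made-up toy data. SQLite has no TIMESTAMPDIFF, so the sketch assumes strftime('%s', ...) (seconds since the epoch) as a substitute. Integer division by 60 then roughly mimics whole-minute truncation.

```python
import sqlite3

# Toy rows: (ID, test_subject_id, date_created).
ROWS = [
    (1, 7, "2024-01-01 10:00:00"),
    (2, 7, "2024-01-01 10:02:30"),  # 150 s after ID 1
    (3, 7, "2024-01-01 10:10:00"),  # far from IDs 1 and 2
    (4, 8, "2024-01-01 10:01:00"),
    (5, 8, "2024-01-01 10:03:00"),  # 120 s after ID 4
    (6, 9, "2024-01-01 10:00:00"),
    (7, 9, "2024-01-01 10:03:00"),  # exactly 180 s after ID 6
]

# Signed difference b - a in seconds (stand-in for TIMESTAMPDIFF(SECOND, a, b)).
DIFF = ("(CAST(strftime('%s', b.date_created) AS INTEGER)"
        " - CAST(strftime('%s', a.date_created) AS INTEGER))")

EXACTLY_3_MIN = f"{DIFF} / 60 = 3"                       # whole minutes == 3
WITHIN_3_MIN = f"a.ID < b.ID AND ABS({DIFF}) <= 180"     # within 3 minutes

def pairs(condition):
    """Self-join accounts on test_subject_id plus the given extra condition."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE accounts (ID INTEGER, test_subject_id INTEGER, date_created TEXT)")
    con.executemany("INSERT INTO accounts VALUES (?, ?, ?)", ROWS)
    sql = f"""SELECT a.ID, b.ID FROM accounts a
              JOIN accounts b ON a.test_subject_id = b.test_subject_id
              AND {condition}
              ORDER BY a.ID, b.ID"""
    result = con.execute(sql).fetchall()
    con.close()
    return result

print(pairs(EXACTLY_3_MIN))  # [(6, 7)]
print(pairs(WITHIN_3_MIN))   # [(1, 2), (4, 5), (6, 7)]
```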
You can use EXISTS:
SELECT t1.*
FROM tablename t1
WHERE EXISTS (
SELECT 1
FROM tablename t2
WHERE t2.test_subject_id = t1.test_subject_id
AND t2.ID <> t1.ID -- a row must not count as its own match
AND ABS(TIMESTAMPDIFF(SECOND, t1.date_created, t2.date_created)) <= 180
)
ORDER BY t1.test_subject_id, t1.date_created;
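In this EXISTS pattern, the inner query also sees the outer row itself, whose time difference is 0. Unless that row is excluded, every record qualifies.

This minimal sketch shows the effect. It uses Python's sqlite3 module in place of MySQL, made-up toy rows, and strftime('%s', ...) as an assumed substitute for TIMESTAMPDIFF(SECOND, ...).

```python
import sqlite3

# Toy rows: (ID, test_subject_id, date_created). Subject 8 has a single row.
ROWS = [
    (1, 7, "2024-01-01 10:00:00"),
    (2, 7, "2024-01-01 10:02:30"),
    (3, 7, "2024-01-01 10:10:00"),
    (4, 8, "2024-01-01 10:01:00"),
]

# Absolute difference in seconds between t1 and t2.
ABS_DIFF = ("ABS(CAST(strftime('%s', t1.date_created) AS INTEGER)"
            " - CAST(strftime('%s', t2.date_created) AS INTEGER))")

def matching_ids(exclude_self):
    """IDs that have a same-subject record within 180 s, via EXISTS."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tablename (ID INTEGER, test_subject_id INTEGER, date_created TEXT)")
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?)", ROWS)
    self_filter = "AND t2.ID <> t1.ID" if exclude_self else ""
    sql = f"""SELECT t1.ID FROM tablename t1
              WHERE EXISTS (SELECT 1 FROM tablename t2
                            WHERE t2.test_subject_id = t1.test_subject_id
                              {self_filter}
                              AND {ABS_DIFF} <= 180)
              ORDER BY t1.ID"""
    ids = [row[0] for row in con.execute(sql)]
    con.close()
    return ids

print(matching_ids(exclude_self=False))  # [1, 2, 3, 4]: every row matches itself
print(matching_ids(exclude_self=True))   # [1, 2]
```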
Right now it is taking a very long time to run.
The query is:
select count(id), variety_id, name
from tblItem
where order_id IN (
select order_id
from tblItem
where variety_id=4005
order by order_id DESC)
AND variety_id != 4005
GROUP BY variety_id
order by count(id) DESC
LIMIT 5;
I have indexes on variety_id and order_id. I'm basically trying to build a recommendation engine: the query looks for the top 5 items people buy when they also bought variety_id 4005. But, as I said, it takes way too long to run.
Does anyone have a way to optimize this query?
Try this:
select count(t1.id), t1.variety_id, t1.name
from tblItem t1
inner join tblItem t2 ON t2.order_id = t1.order_id and t2.variety_id = 4005
where t1.variety_id != 4005
GROUP BY t1.variety_id, t1.name
ORDER BY count(t1.id) DESC
LIMIT 5;
I've often found that MySQL optimizes WHERE ... IN (SELECT ...) poorly, and JOIN works better; I've read that recent MySQL versions are better, so it may be version-dependent. Also, you should use COUNT(*) unless the column can be NULL and you need to ignore the null values in the count.
SELECT COUNT(*) count, variety_id, name
FROM tblItem AS t1
JOIN (SELECT DISTINCT order_id
FROM tblItem
WHERE variety_id = 4005) AS t2
ON t1.order_id = t2.order_id
WHERE t1.variety_id != 4005
GROUP BY variety_id, name
ORDER BY count DESC
LIMIT 5
The subquery with DISTINCT is needed to prevent multiplied counts. Without it, an order that contains variety_id 4005 more than once would pair each of its other items with every one of those rows.
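Here is a minimal sketch that compares the original IN form, a plain self-join, and the DISTINCT derived table. It uses Python's sqlite3 module as a stand-in for MySQL. In the made-up toy data, order 1 contains variety 4005 twice, which is exactly the case the DISTINCT guards against.

```python
import sqlite3

SETUP = """
CREATE TABLE tblItem (id INTEGER PRIMARY KEY, order_id INTEGER, variety_id INTEGER);
-- Order 1 contains 4005 twice; order 3 does not contain 4005 at all.
INSERT INTO tblItem VALUES
  (1, 1, 4005), (2, 1, 4005), (3, 1, 100), (4, 1, 200),
  (5, 2, 4005), (6, 2, 100),
  (7, 3, 100), (8, 3, 300);
"""

IN_SUBQUERY = """
SELECT variety_id, COUNT(*) AS cnt FROM tblItem
WHERE order_id IN (SELECT order_id FROM tblItem WHERE variety_id = 4005)
  AND variety_id != 4005
GROUP BY variety_id ORDER BY cnt DESC, variety_id LIMIT 5
"""

# Each item in order 1 joins to BOTH 4005 rows, so it is counted twice.
PLAIN_JOIN = """
SELECT t1.variety_id, COUNT(*) AS cnt FROM tblItem t1
JOIN tblItem t2 ON t2.order_id = t1.order_id AND t2.variety_id = 4005
WHERE t1.variety_id != 4005
GROUP BY t1.variety_id ORDER BY cnt DESC, t1.variety_id LIMIT 5
"""

# One row per qualifying order, so each item is counted once per order.
DISTINCT_JOIN = """
SELECT t1.variety_id, COUNT(*) AS cnt FROM tblItem t1
JOIN (SELECT DISTINCT order_id FROM tblItem WHERE variety_id = 4005) t2
  ON t1.order_id = t2.order_id
WHERE t1.variety_id != 4005
GROUP BY t1.variety_id ORDER BY cnt DESC, t1.variety_id LIMIT 5
"""

def run(query):
    """Return (variety_id, count) rows for the query on fresh toy data."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(query).fetchall()
    con.close()
    return rows

print(run(IN_SUBQUERY))    # [(100, 2), (200, 1)]
print(run(PLAIN_JOIN))     # [(100, 3), (200, 2)]
print(run(DISTINCT_JOIN))  # [(100, 2), (200, 1)]
```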
I have two tables: one holds user info (id, name, etc.) and the other holds user tickets and ticket status (ticket_id, user_id, ticket_status, etc.).
I want to produce a list of ALL the users, for example: (SELECT * FROM user_table)
And for each user I need a count of their tickets, for example:
(SELECT t1.user_id, COUNT(*) FROM user_tickets t1 WHERE t1.ticket_status = 15 GROUP BY t1.ticket_status, t1.user_id )
I can get what I'm looking for with the query below, but it takes 5 sec. to run on 50000 tickets, while each query on its own takes only a fraction of a second.
SELECT t1.user_id, COUNT(*)
FROM user_tickets t1
LEFT JOIN user_table t2 ON t1.user_id = t2.id
WHERE t2.group_id = 20 AND t1.status_id = 15
GROUP BY t1.status_id, user_id
Any idea how to write the query to get same performance as each separately?
Adding an index on the columns used in the WHERE clause fixed the problem.
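Below is a minimal sketch of both halves of the question: listing ALL users with their ticket counts, and indexing the filter columns. It uses Python's sqlite3 module as a stand-in for MySQL.

Several parts are assumptions: the index names are hypothetical, the status column is called status_id (the question uses both ticket_status and status_id), and the rows are made-up toy data. The query starts from user_table and LEFT JOINs a pre-aggregated count, so users with no matching tickets still appear with 0. EXPLAIN QUERY PLAN is SQLite's counterpart to MySQL's EXPLAIN.

```python
import sqlite3

SETUP = """
CREATE TABLE user_table (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
CREATE TABLE user_tickets (ticket_id INTEGER PRIMARY KEY, user_id INTEGER, status_id INTEGER);
-- Hypothetical index names: the composite index serves the status filter and the grouping.
CREATE INDEX idx_tickets_status_user ON user_tickets (status_id, user_id);
CREATE INDEX idx_users_group ON user_table (group_id);
INSERT INTO user_table VALUES (1, 'ann', 20), (2, 'bob', 20), (3, 'cy', 30);
INSERT INTO user_tickets VALUES (1, 1, 15), (2, 1, 15), (3, 1, 10), (4, 3, 15);
"""

# Count tickets once per user in a subquery, then attach the counts to users.
QUERY = """
SELECT u.id, COALESCE(c.n, 0) AS tickets
FROM user_table u
LEFT JOIN (SELECT user_id, COUNT(*) AS n
           FROM user_tickets
           WHERE status_id = 15
           GROUP BY user_id) AS c ON c.user_id = u.id
WHERE u.group_id = 20
ORDER BY u.id
"""

def ticket_counts():
    """Return the (user id, ticket count) rows and the plan's detail strings."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY).fetchall()
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    plan = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + QUERY)]
    con.close()
    return rows, plan

rows, plan = ticket_counts()
print(rows)  # [(1, 2), (2, 0)]
for line in plan:
    print(line)
```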
Is it possible to pull two values from one subquery in an SQL statement?
I have:
SELECT
(SELECT bid FROM auction_bids WHERE itemID=a.id ORDER BY bid DESC LIMIT 1) as topbid,
a.* FROM auction_items a ORDER BY a.date DESC LIMIT 15
Where it returns topbid, I'd like it to pull not only bid (as topbid) but also date (as topdate). How can I do that? Do I need another subquery, or can one subquery return both?
A dependent subquery (one that depends on values from the outer query, like a.id in your case) is not a very efficient way to find maximum values in subsets.
Instead use a subquery with GROUP BY:
SELECT b.topbid, b.topdate, a.*
FROM auction_items a
LEFT JOIN
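( SELECT itemID, MAX(bid) as topbid, MAX(date) as topdate
-- MAX(date) is the latest bid date; it is the top bid's date only
-- if an item's bids always increase over time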
FROM auction_bids
GROUP BY itemID ) b
ON a.id = b.itemID
ORDER BY a.date DESC
LIMIT 15
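Taking MAX(bid) and MAX(date) independently can pair the top bid with a later, lower bid's date. A common alternative is the groupwise-maximum pattern: find each item's top bid, then join back to that bid row to read its date.

Here is a minimal sketch of both versions. It uses Python's sqlite3 module as a stand-in for MySQL, and the bids are made-up toy data. One caveat applies: if two bids tie for the maximum, the groupwise version returns one row per tied bid.

```python
import sqlite3

SETUP = """
CREATE TABLE auction_items (id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE auction_bids (itemID INTEGER, bid INTEGER, date TEXT);
INSERT INTO auction_items VALUES (1, '2024-01-05'), (2, '2024-01-04');
-- Item 1's latest bid is not its highest; item 2 has no bids.
INSERT INTO auction_bids VALUES
  (1, 50, '2024-01-01'), (1, 80, '2024-01-02'), (1, 70, '2024-01-03');
"""

# Independent MAXes: topdate is the latest bid's date, not the top bid's.
INDEPENDENT_MAX = """
SELECT a.id, b.topbid, b.topdate
FROM auction_items a
LEFT JOIN (SELECT itemID, MAX(bid) AS topbid, MAX(date) AS topdate
           FROM auction_bids GROUP BY itemID) b ON a.id = b.itemID
ORDER BY a.date DESC LIMIT 15
"""

# Groupwise max: per-item MAX(bid), joined back to the matching bid row.
TOP_BID_ROW = """
SELECT a.id, b.topbid, b.topdate
FROM auction_items a
LEFT JOIN (SELECT ab.itemID, ab.bid AS topbid, ab.date AS topdate
           FROM auction_bids ab
           JOIN (SELECT itemID, MAX(bid) AS maxbid
                 FROM auction_bids GROUP BY itemID) m
             ON m.itemID = ab.itemID AND m.maxbid = ab.bid) b
  ON a.id = b.itemID
ORDER BY a.date DESC LIMIT 15
"""

def run(query):
    """Return (item id, topbid, topdate) rows for the query on fresh toy data."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(query).fetchall()
    con.close()
    return rows

print(run(INDEPENDENT_MAX))  # [(1, 80, '2024-01-03'), (2, None, None)]
print(run(TOP_BID_ROW))      # [(1, 80, '2024-01-02'), (2, None, None)]
```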
I have this query in MySQL, and since it takes almost 20 seconds to execute, I want to use SELECTs with LIMITs instead of INNER JOINs to make it run faster.
SELECT t1.order_id, CONCAT(t3.first_name,' ',t3.last_name),
buyer_first_name, buyer_last_name,
max(product_quantity) as product_quantity, order_status,
order_value, t5.first_name staff_firstnamelogin,
t5.last_name staff_lastnamelogin, t6.day_name
FROM t_seller_order t0
INNER JOIN t_orders t1
ON t0.event_id = t1.event_id
AND t1.seller_order_token = t0.seller_order_token
INNER JOIN t_tickets t2
ON t1.order_id = t2.order_id
INNER JOIN t_login t3
ON t3.login_id = t1.login_id
INNER JOIN t_login t5
ON t0.login_id = t5.login_id
INNER JOIN t_event_days t6
ON t2.product_id = t6.event_day_id
WHERE t0.event_id = 35
group by t1.order_id
order by order_id desc;
There are many things about the schema that prevent speeding up the query. Let's see what can or cannot be done...
Since the WHERE and GROUP BY hit different tables, no single index is useful for both. The best you can do is INDEX(event_id) on t0.
Indexes for JOINs: t2..t6 need indexes (or PKs) on order_id, login_id, event_day_id. t1 needs INDEX(event_id, seller_order_token) in either order.
The GROUP BY and ORDER BY are the 'same', so that will take only one sort, not two.
A potential speedup is to finish the GROUP BY before doing some of the JOINs. The current structure is "inflate-deflate", wherein the JOINs conspire to create a huge temp table, then the GROUP BY deflates the results. So...
See if you can write a SELECT like this:
SELECT t0.id, t1.id -- I need the PRIMARY KEYs for these two tables
FROM t_seller_order AS t0
JOIN t_orders AS t1
ON t1.event_id = t0.event_id
AND t1.seller_order_token = t0.seller_order_token
WHERE t0.event_id = 35
GROUP BY t1.order_id
How fast is that? Hopefully we can build the rest of the query around this, but without taking too much more time. There are two approaches; I don't know which will be better.
Plan A: Use subqueries (when possible) instead of JOINs. For example, instead of JOINing to t3, plan on this being one item in the SELECT:
( SELECT CONCAT(first_name,' ',last_name)
FROM t_login WHERE login_id = t1.login_id
) AS login_name
(Ditto for any other columns in the SELECT that touch a table only once. As it stands, t5 is touched twice, so this approach may be impractical.)
Plan B: JOIN after the GROUP BY, that is, after the "deflate".
SELECT ...
FROM ( SELECT t0.id, t1.id ... GROUP BY... ) AS x -- as discussed above
JOIN y ON y.foo = x.foo
JOIN z ON z.bar = x.bar
-- the GROUP BY is avoided
ORDER BY x.order_id desc; -- The ORDER BY is still necessary
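Here is a minimal sketch of "deflate first" on a cut-down schema. It uses Python's sqlite3 module as a stand-in for MySQL. It keeps only t_orders, t_tickets and t_login from the question, with simplified columns, and the rows are made-up toy data.

PLAN_B joins the lookup table after the GROUP BY. PLAN_A here applies the scalar-subquery idea to the already-deflated rows, which is one possible mixture of the two plans. Both should return the same rows as the inflate-deflate form.

```python
import sqlite3

SETUP = """
CREATE TABLE t_orders (order_id INTEGER PRIMARY KEY, event_id INTEGER, login_id INTEGER);
CREATE TABLE t_tickets (ticket_id INTEGER PRIMARY KEY, order_id INTEGER, product_quantity INTEGER);
CREATE TABLE t_login (login_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
INSERT INTO t_orders VALUES (101, 35, 1), (102, 35, 2), (103, 36, 1);
INSERT INTO t_tickets VALUES (1, 101, 2), (2, 101, 5), (3, 102, 1), (4, 103, 9);
INSERT INTO t_login VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim');
"""

# Inflate-deflate: join everything (one row per ticket), then GROUP BY.
INFLATE_DEFLATE = """
SELECT t1.order_id, MAX(t2.product_quantity), t3.first_name || ' ' || t3.last_name
FROM t_orders t1
JOIN t_tickets t2 ON t2.order_id = t1.order_id
JOIN t_login t3 ON t3.login_id = t1.login_id
WHERE t1.event_id = 35
GROUP BY t1.order_id
ORDER BY t1.order_id DESC
"""

# Deflate: one row per order, keeping the keys needed for later lookups.
DEFLATE = """
SELECT t1.order_id, t1.login_id, MAX(t2.product_quantity) AS qty
FROM t_orders t1
JOIN t_tickets t2 ON t2.order_id = t1.order_id
WHERE t1.event_id = 35
GROUP BY t1.order_id, t1.login_id
"""

# Plan A (mixed): a scalar subquery per deflated row fetches the name.
PLAN_A = f"""
SELECT x.order_id, x.qty,
       (SELECT first_name || ' ' || last_name FROM t_login WHERE login_id = x.login_id)
FROM ({DEFLATE}) AS x
ORDER BY x.order_id DESC
"""

# Plan B: JOIN after the GROUP BY; only the ORDER BY remains.
PLAN_B = f"""
SELECT x.order_id, x.qty, l.first_name || ' ' || l.last_name
FROM ({DEFLATE}) AS x
JOIN t_login l ON l.login_id = x.login_id
ORDER BY x.order_id DESC
"""

def run(query):
    """Return the query's rows on fresh toy data."""
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(query).fetchall()
    con.close()
    return rows

for q in (INFLATE_DEFLATE, PLAN_A, PLAN_B):
    print(run(q))  # [(102, 1, 'Bo Kim'), (101, 5, 'Ann Lee')]
```

On real data the payoff is that the lookup joins run once per order instead of once per ticket. This toy example only checks that the results match.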
In your example, I lean toward Plan B, but a mixture of both 'Plans' may be desirable.
Further notes: LEFT JOIN and LIMIT add wrinkles to the above discussion. Since you did not have either, I will not clutter this discussion with them.