sql nested join inner and left join - mysql

Hi, I want to filter logs in MySQL using a list of trackings.
Every log belongs to a server.
Every tracking belongs to a server and has 0..N patterns.
Every pattern belongs to a tracking.
I have 3 tables:
logs : | id | ip | url | server_id | ...
tracking : | id | server_id | name | other fields...
pattern : | id | tracking_id | pattern |
I want to count the logs that match each tracking for a specific server. My problem is that my query mixes up trackings that have patterns and those that don't.
SQL Fiddle: http://sqlfiddle.com/#!2/f11b1/2
SELECT COUNT(DISTINCT logs.ip), tr.name
FROM `logs`
INNER JOIN `trackings` as tr ON
( tr.server_id = logs.server_id )
-- AND ... other conditions between log and tracking
LEFT JOIN `patterns` as pt ON
( pt.tracking_id = tr.id )
AND (logs.url LIKE pt.pattern )
GROUP BY tr.id
My problem is the second join: if I use INNER JOIN patterns as pt ON, I get correct results, but only for trackings that have patterns.
If I use LEFT JOIN patterns as pt ON, I get all trackings, but with a wrong count (I get the result of SELECT COUNT(DISTINCT logs.ip) FROM logs).
EDIT
I can get the correct result with a field in trackings that indicates whether the tracking has patterns, plus a UNION:
(
SELECT COUNT(DISTINCT lg.ip), tr.name
FROM `logs` as lg
INNER JOIN `trackings` as tr ON
( tr.server_id = lg.server_id )
AND (tr.hasPatterns = 1)
-- AND ... other conditions
INNER JOIN `patterns` as pt ON
( pt.tracking_id = tr.id )
AND (lg.url LIKE pt.pattern )
-- WHERE ...
GROUP BY tr.id
)
UNION
(
SELECT COUNT(DISTINCT lg.ip), tr.name
FROM `logs` as lg
INNER JOIN `trackings` as tr ON
( tr.server_id = lg.server_id )
AND (tr.hasPatterns = 0)
-- WHERE ...
GROUP BY tr.id, lg.date
)
But I suspect there is a way to do this without a UNION...

You can put a conditional inside COUNT(), so I think the following does what you want:
SELECT COUNT(DISTINCT (case when tr.hasPatterns = 1 and pt.tracking_id is not null
then lg.ip
when tr.hasPatterns = 0
then lg.ip
end)), tr.name
FROM `logs` as lg
INNER JOIN `trackings` as tr ON
( tr.server_id = lg.server_id )
-- AND ... other conditions
LEFT JOIN `patterns` as pt ON
( pt.tracking_id = tr.id )
AND (lg.url LIKE pt.pattern )
-- WHERE ...
GROUP BY tr.id
EDIT:
This is returning what you want:
SELECT COUNT(DISTINCT (case when tr.size = 0 and pt.tracking_id is not null
then lg.ip
when tr.size > 0 and lg.size > tr.size
then lg.ip
end)), tr.name
FROM `logs` as lg
INNER JOIN `trackings` as tr ON
( tr.server_id = lg.server_id )
LEFT JOIN `patterns` as pt ON
( pt.tracking_id = tr.id )
AND (lg.url LIKE pt.pattern )
GROUP BY tr.id;
Your SQL Fiddle has the additional condition lg.size > tr.size, which is not in the original question.
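To see why the conditional COUNT fixes the LEFT JOIN miscount, here is a small sketch that runs the queries against SQLite from Python rather than MySQL. The rows, the tracking names only_a and everything, and the EXISTS-based variant (which derives "has patterns" from the patterns table instead of reading the hasPatterns flag) are illustrative assumptions, not the contents of the question's fiddle; the lg.size > tr.size condition is left out.

```python
import sqlite3

# Hypothetical miniature data shaped like the question's three tables.
SCHEMA = """
CREATE TABLE logs (id INTEGER PRIMARY KEY, ip TEXT, url TEXT, server_id INTEGER);
CREATE TABLE trackings (id INTEGER PRIMARY KEY, server_id INTEGER, name TEXT,
                        hasPatterns INTEGER);
CREATE TABLE patterns (id INTEGER PRIMARY KEY, tracking_id INTEGER, pattern TEXT);
INSERT INTO logs VALUES (1, '1.1.1.1', '/a/x', 1), (2, '2.2.2.2', '/b/y', 1),
                        (3, '3.3.3.3', '/a/z', 1);
INSERT INTO trackings VALUES (1, 1, 'only_a', 1), (2, 1, 'everything', 0);
INSERT INTO patterns VALUES (1, 1, '/a/%');
"""

# The question's version: the LEFT JOIN keeps unmatched log rows, so every IP counts.
NAIVE_QUERY = """
SELECT tr.name, COUNT(DISTINCT lg.ip)
FROM logs AS lg
JOIN trackings AS tr ON tr.server_id = lg.server_id
LEFT JOIN patterns AS pt ON pt.tracking_id = tr.id AND lg.url LIKE pt.pattern
GROUP BY tr.id, tr.name
"""

# Conditional COUNT: a tracking with patterns counts only IPs whose row found a
# matching pattern; a tracking without patterns counts every IP.
FLAG_QUERY = """
SELECT tr.name,
       COUNT(DISTINCT CASE
                        WHEN tr.hasPatterns = 1 AND pt.tracking_id IS NOT NULL THEN lg.ip
                        WHEN tr.hasPatterns = 0 THEN lg.ip
                      END)
FROM logs AS lg
JOIN trackings AS tr ON tr.server_id = lg.server_id
LEFT JOIN patterns AS pt ON pt.tracking_id = tr.id AND lg.url LIKE pt.pattern
GROUP BY tr.id, tr.name
"""

# Same idea without the stored flag: "no patterns at all" is checked with NOT EXISTS.
EXISTS_QUERY = """
SELECT tr.name,
       COUNT(DISTINCT CASE
                        WHEN pt.tracking_id IS NOT NULL THEN lg.ip
                        WHEN NOT EXISTS (SELECT 1 FROM patterns AS p2
                                         WHERE p2.tracking_id = tr.id) THEN lg.ip
                      END)
FROM logs AS lg
JOIN trackings AS tr ON tr.server_id = lg.server_id
LEFT JOIN patterns AS pt ON pt.tracking_id = tr.id AND lg.url LIKE pt.pattern
GROUP BY tr.id, tr.name
"""

def run(query):
    """Return {tracking name: distinct IP count} for one query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    counts = dict(conn.execute(query).fetchall())
    conn.close()
    return counts

if __name__ == "__main__":
    print(run(NAIVE_QUERY))  # both trackings report 3
    print(run(FLAG_QUERY))   # only_a: 2, everything: 3
```

The EXISTS variant avoids keeping hasPatterns in sync with the patterns table, at the cost of a correlated subquery per row; which is cheaper in MySQL would need to be checked with EXPLAIN on real data.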

Related

I want to use an INNER JOIN and a WHERE condition with 3 tables

SELECT
*
FROM
retailer
WHERE
states = 1
AND newsletter.status = 1
AND (
company_id IN (
SELECT
id
FROM
retailer
WHERE
muli_ret_id = 0
)
)
ORDER BY
email
I'm not sure that your exact current query can be expressed using an inner join instead of the WHERE clause. You might try:
SELECT DISTINCT r1.*
FROM retailer r1
INNER JOIN retailer r2
ON r1.company_id = r2.id
WHERE
r1.states = 1 AND
r1.newsletter.status = 1 AND
r2.muli_ret_id = 0
ORDER BY
r1.email;
I use SELECT DISTINCT to remove possible duplicates, which could arise from the self-join. But honestly, your current approach is fine, though I might use EXISTS here:
SELECT r1.*
FROM retailer r1
WHERE
r1.states = 1 AND
r1.newsletter.status = 1 AND
EXISTS (SELECT 1 FROM retailer r2
WHERE r2.id = r1.company_id AND r2.muli_ret_id = 0)
ORDER BY
r1.email;
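A quick way to convince yourself that the IN, self-join, and EXISTS forms return the same rows is to run all three on a toy table. This sketch uses SQLite from Python; the rows are hypothetical, and the newsletter.status filter is left out because the newsletter table is never joined in the original query.

```python
import sqlite3

# Hypothetical retailer rows: 1 is a parent (muli_ret_id = 0), 2 has muli_ret_id 5.
SCHEMA = """
CREATE TABLE retailer (id INTEGER PRIMARY KEY, company_id INTEGER,
                       states INTEGER, muli_ret_id INTEGER, email TEXT);
INSERT INTO retailer VALUES
  (1, NULL, 1, 0, 'a@example.com'),
  (2, 1,    1, 5, 'b@example.com'),
  (3, 1,    1, 0, 'c@example.com'),
  (4, 2,    1, 0, 'd@example.com'),
  (5, 3,    0, 0, 'e@example.com');
"""

# Original shape: uncorrelated IN subquery.
IN_QUERY = """
SELECT id, email FROM retailer
WHERE states = 1
  AND company_id IN (SELECT id FROM retailer WHERE muli_ret_id = 0)
ORDER BY email
"""

# Self-join; DISTINCT guards against duplicates if r2.id were not unique.
JOIN_QUERY = """
SELECT DISTINCT r1.id, r1.email
FROM retailer AS r1
JOIN retailer AS r2 ON r1.company_id = r2.id
WHERE r1.states = 1 AND r2.muli_ret_id = 0
ORDER BY r1.email
"""

# Correlated EXISTS: never produces duplicates, so no DISTINCT is needed.
EXISTS_QUERY = """
SELECT r1.id, r1.email
FROM retailer AS r1
WHERE r1.states = 1
  AND EXISTS (SELECT 1 FROM retailer AS r2
              WHERE r2.id = r1.company_id AND r2.muli_ret_id = 0)
ORDER BY r1.email
"""

def run(query):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    for q in (IN_QUERY, JOIN_QUERY, EXISTS_QUERY):
        print(run(q))  # [(2, 'b@example.com'), (3, 'c@example.com')] each time
```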

How to optimize the PrestaShop category product query with random ordering

This is the PrestaShop 1.7 query that gets the products of a category. If I use random ordering, it is very slow. How can I optimize it?
SELECT
cp.id_category,
p.*,
product_shop.*,
stock.out_of_stock,
IFNULL( stock.quantity, 0 ) AS quantity,
IFNULL( product_attribute_shop.id_product_attribute, 0 ) AS id_product_attribute,
product_attribute_shop.minimal_quantity AS product_attribute_minimal_quantity,
pl.`description`,
pl.`description_short`,
pl.`available_now`,
pl.`available_later`,
pl.`link_rewrite`,
pl.`meta_description`,
pl.`meta_keywords`,
pl.`meta_title`,
pl.`name`,
image_shop.`id_image` id_image,
il.`legend` AS legend,
m.`name` AS manufacturer_name,
cl.`name` AS category_default,
DATEDIFF(
product_shop.`date_add`,
DATE_SUB( "2019-11-30 00:00:00", INTERVAL 7 DAY )) > 0 AS new,
product_shop.price AS orderprice
FROM
`ps_category_product` cp
LEFT JOIN `ps_product` p ON p.`id_product` = cp.`id_product`
INNER JOIN ps_product_shop product_shop ON ( product_shop.id_product = p.id_product AND product_shop.id_shop = 1 )
LEFT JOIN `ps_product_attribute_shop` product_attribute_shop ON ( p.`id_product` = product_attribute_shop.`id_product` AND product_attribute_shop.`default_on` = 1 AND product_attribute_shop.id_shop = 1 )
LEFT JOIN ps_stock_available stock ON ( stock.id_product = `p`.id_product AND stock.id_product_attribute = 0 AND stock.id_shop = 1 AND stock.id_shop_group = 0 )
LEFT JOIN `ps_category_lang` cl ON ( product_shop.`id_category_default` = cl.`id_category` AND cl.`id_lang` = 11 AND cl.id_shop = 1 )
LEFT JOIN `ps_product_lang` pl ON ( p.`id_product` = pl.`id_product` AND pl.`id_lang` = 11 AND pl.id_shop = 1 )
LEFT JOIN `ps_image_shop` image_shop ON ( image_shop.`id_product` = p.`id_product` AND image_shop.cover = 1 AND image_shop.id_shop = 1 )
LEFT JOIN `ps_image_lang` il ON ( image_shop.`id_image` = il.`id_image` AND il.`id_lang` = 11 )
LEFT JOIN `ps_manufacturer` m ON m.`id_manufacturer` = p.`id_manufacturer`
WHERE
product_shop.`id_shop` = 1
AND cp.`id_category` = 12
AND product_shop.`active` = 1
AND product_shop.`visibility` IN ( "both", "catalog" )
ORDER BY
RAND()
LIMIT 50
Please provide SHOW CREATE TABLE for each table. Meanwhile, ...
Let's start by optimizing the joins.
LEFT JOIN `ps_product_lang` pl ON ( p.`id_product` = pl.`id_product`
AND pl.`id_lang` = 11
AND pl.id_shop = 1 )
That needs INDEX(id_product, id_lang, id_shop) (The columns may be in any order.)
Don't use LEFT unless you really need to fetch a row from the right-hand table as NULLs when it does not exist. In particular,
LEFT JOIN `ps_product` p
is probably getting in the way of optimization.
WHERE product_shop.`id_shop` = 1
AND product_shop.`active` = 1
AND product_shop.`visibility` IN ( "both", "catalog" )
would probably benefit from these indexes on ps_product_shop:
INDEX(id_shop, active, visibility, id_product)
INDEX(id_product, id_shop, active, visibility)
ps_category_product needs
INDEX(id_category, id_product) -- in this order.
In general many-to-many mapping tables need to follow the tips here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
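One way to check that an index like INDEX(id_category, id_product) is actually usable for the category filter is to look at the query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN; the table definition (including a position column) and the index name idx_cat_prod are assumptions, and MySQL's optimizer may choose differently on real data.

```python
import sqlite3

def category_lookup_plan(with_index):
    """Return SQLite's query-plan text for the WHERE id_category = ... filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ps_category_product ("
        "id_category INTEGER, id_product INTEGER, position INTEGER)"
    )
    if with_index:
        # Filter column first, then the column the query returns.
        conn.execute(
            "CREATE INDEX idx_cat_prod "
            "ON ps_category_product (id_category, id_product)"
        )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT id_product FROM ps_category_product WHERE id_category = 12"
    ).fetchall()
    conn.close()
    # The human-readable description is the last column of each plan row.
    return " | ".join(str(row[-1]) for row in rows)

if __name__ == "__main__":
    print(category_lookup_plan(False))  # a full SCAN of the table
    print(category_lookup_plan(True))   # a SEARCH using the covering index
```

Because the index contains both the filtered and the selected column, the lookup never has to touch the table rows, which is the point of the "in this order" advice for mapping tables.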
The query has the "explode-implode" syndrome: it first does the JOINs, collecting a lot of data, then throws most of it away because of, in your case, the LIMIT 50. It can probably be cured by turning the query inside-out. The general idea is to start with a derived table that gets the 50 rows desired, then reach into the other tables for the rest of the desired columns. This "reaching" needs to happen only 50 times, not however many times the JOINs currently require.
SELECT ...
FROM ( SELECT <<primary key columns from cp, p, and product_shop>>
FROM cp
JOIN p ON ...
JOIN product_shop ON ...
ORDER BY RAND()
LIMIT 50 ) AS x
JOIN <<p, product_shop ON their PKs>> -- to get p.*, product_shop.*
[LEFT] JOIN <<each of the other tables>> -- to get the other tables' columns
You should start by testing the subquery (a "derived" table) to verify that it is noticeably faster than the original query.
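A runnable miniature of the inside-out shape, using SQLite from Python (so RANDOM() stands in for MySQL's RAND()). The three trimmed-down tables, their columns, and the 100 generated products are hypothetical; the point is only that the random sort and LIMIT act on narrow key rows, and the wide language columns are fetched afterwards for the surviving rows.

```python
import sqlite3

# Hypothetical, heavily trimmed versions of the PrestaShop tables.
SCHEMA = """
CREATE TABLE ps_category_product (id_category INTEGER, id_product INTEGER,
                                  PRIMARY KEY (id_category, id_product));
CREATE TABLE ps_product_shop (id_product INTEGER, id_shop INTEGER, active INTEGER,
                              PRIMARY KEY (id_product, id_shop));
CREATE TABLE ps_product_lang (id_product INTEGER, id_lang INTEGER, name TEXT,
                              PRIMARY KEY (id_product, id_lang));
"""

PICK_QUERY = """
SELECT x.id_product, pl.name
FROM (
    -- Step 1: shuffle only the narrow key rows and keep N of them.
    SELECT cp.id_product
    FROM ps_category_product AS cp
    JOIN ps_product_shop AS ps
      ON ps.id_product = cp.id_product AND ps.id_shop = 1
    WHERE cp.id_category = ? AND ps.active = 1
    ORDER BY RANDOM()
    LIMIT ?
) AS x
-- Step 2: fetch the wide columns only for the rows that survived the LIMIT.
LEFT JOIN ps_product_lang AS pl
  ON pl.id_product = x.id_product AND pl.id_lang = 11
"""

def random_products(conn, id_category, n):
    return conn.execute(PICK_QUERY, (id_category, n)).fetchall()

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for pid in range(1, 101):
        conn.execute("INSERT INTO ps_category_product VALUES (12, ?)", (pid,))
        # Every third product is inactive and must never be picked.
        conn.execute("INSERT INTO ps_product_shop VALUES (?, 1, ?)",
                     (pid, 0 if pid % 3 == 0 else 1))
        conn.execute("INSERT INTO ps_product_lang VALUES (?, 11, ?)",
                     (pid, f"product {pid}"))
    return conn

if __name__ == "__main__":
    print(random_products(build_demo(), 12, 10))
```

The derived table still sorts every active product in the category, so this mainly saves the cost of carrying the wide joined columns through the sort.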

How to Make This SQL Query More Efficient?

I'm not sure how to make the following SQL query more efficient. Right now, the query takes 8 to 12 seconds on a pretty fast server, which is not nearly fast enough for a website where users load a page with this code on it. It looks through tables with many rows; for instance, the "Post" table has 717,873 rows. Basically, the query lists all Posts related to what the user is following (newest to oldest).
Is there a way to make it faster by getting only the last 20 results in total, based on PostTimeOrder?
Any help or insight into what can be done to improve this situation would be much appreciated. Thank you.
Here's the full SQL query (lots of nesting):
SELECT DISTINCT p.Id, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime, p.Content AS Content, p.Bu AS Bu, p.Se AS Se, UNIX_TIMESTAMP(p.PostCreationTime) AS PostTimeOrder
FROM Post p
WHERE (p.Id IN (SELECT pc.PostId
FROM PostCreator pc
WHERE (pc.UserId IN (SELECT uf.FollowedId
FROM UserFollowing uf
WHERE uf.FollowingId = '100')
OR pc.UserId = '100')
))
OR (p.Id IN (SELECT pum.PostId
FROM PostUserMentions pum
WHERE (pum.UserId IN (SELECT uf.FollowedId
FROM UserFollowing uf
WHERE uf.FollowingId = '100')
OR pum.UserId = '100')
))
OR (p.Id IN (SELECT ssp.PostId
FROM SStreamPost ssp
WHERE (ssp.SStreamId IN (SELECT ssf.SStreamId
FROM SStreamFollowing ssf
WHERE ssf.UserId = '100'))
))
OR (p.Id IN (SELECT psm.PostId
FROM PostSMentions psm
WHERE (psm.StockId IN (SELECT sf.StockId
FROM StockFollowing sf
WHERE sf.UserId = '100' ))
))
UNION ALL
SELECT DISTINCT p.Id AS Id, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime, p.Content AS Content, p.Bu AS Bu, p.Se AS Se, UNIX_TIMESTAMP(upe.PostEchoTime) AS PostTimeOrder
FROM Post p
INNER JOIN UserPostE upe
on p.Id = upe.PostId
INNER JOIN UserFollowing uf
on (upe.UserId = uf.FollowedId AND (uf.FollowingId = '100' OR upe.UserId = '100'))
ORDER BY PostTimeOrder DESC;
Changing your p.Id IN (...) predicates to EXISTS predicates with correlated subqueries may help. Also, since both halves of your UNION ALL query pull from the Post table and may return nearly identical records, you might be able to combine them into one query by LEFT OUTER JOINing to UserPostE and adding upe.PostId IS NOT NULL as an OR condition in the WHERE clause. UserFollowing will still INNER JOIN to UserPostE. If you want the same Post record twice, once with upe.PostEchoTime and once with p.PostCreationTime as the PostTimeOrder, you'll need to keep the UNION ALL.
SELECT
DISTINCT -- <<=- May not be needed
p.Id
, UNIX_TIMESTAMP(p.PostCreationTime) AS PostCreationTime
, p.Content AS Content
, p.Bu AS Bu
, p.Se AS Se
, UNIX_TIMESTAMP(coalesce( upe.PostEchoTime
, p.PostCreationTime)) AS PostTimeOrder
FROM Post p
LEFT JOIN (UserPostE upe
INNER JOIN UserFollowing uf
on (upe.UserId = uf.FollowedId AND
(uf.FollowingId = '100' OR
upe.UserId = '100')))
on p.Id = upe.PostId
WHERE upe.PostID is not null
or exists (SELECT 1
FROM PostCreator pc
WHERE pc.PostId = p.ID
and (pc.UserId = '100'
or exists (SELECT 1
FROM UserFollowing uf
WHERE uf.FollowedId = pc.UserID
and uf.FollowingId = '100'))
)
OR exists (SELECT 1
FROM PostUserMentions pum
WHERE pum.PostId = p.ID
and (pum.UserId = '100'
or exists (SELECT 1
FROM UserFollowing uf
WHERE uf.FollowedId = pum.UserId
and uf.FollowingId = '100'))
)
OR exists (SELECT 1
FROM SStreamPost ssp
WHERE ssp.PostId = p.ID
and exists (SELECT 1
FROM SStreamFollowing ssf
WHERE ssf.SStreamId = ssp.SStreamId
and ssf.UserId = '100')
)
OR exists (SELECT 1
FROM PostSMentions psm
WHERE psm.PostId = p.ID
and exists (SELECT 1
FROM StockFollowing sf
WHERE sf.StockId = psm.StockId
and sf.UserId = '100' )
)
ORDER BY PostTimeOrder DESC
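Inside these correlated subqueries, AND binds tighter than OR, so whether the "user 100 or a followed user" test is parenthesized decides whether the subquery stays tied to the outer post. The sketch below compares both groupings in SQLite from Python; the three posts and the follow relationship are hypothetical data chosen to expose the difference.

```python
import sqlite3

# Hypothetical data: user 100 follows user 7; post 3 is by unfollowed user 8.
SCHEMA = """
CREATE TABLE Post (Id INTEGER PRIMARY KEY);
CREATE TABLE PostCreator (PostId INTEGER, UserId INTEGER);
CREATE TABLE UserFollowing (FollowingId INTEGER, FollowedId INTEGER);
INSERT INTO Post VALUES (1), (2), (3);
INSERT INTO PostCreator VALUES (1, 100), (2, 7), (3, 8);
INSERT INTO UserFollowing VALUES (100, 7);
"""

# Without parentheses this reads as
# (pc.PostId = p.Id AND pc.UserId = 100) OR EXISTS(...),
# so any followed creator of ANY post satisfies the subquery.
UNGROUPED = """
SELECT p.Id FROM Post AS p
WHERE EXISTS (SELECT 1 FROM PostCreator AS pc
              WHERE pc.PostId = p.Id
                AND pc.UserId = 100
                 OR EXISTS (SELECT 1 FROM UserFollowing AS uf
                            WHERE uf.FollowedId = pc.UserId
                              AND uf.FollowingId = 100))
ORDER BY p.Id
"""

# With parentheses, every candidate row must first belong to post p.
GROUPED = """
SELECT p.Id FROM Post AS p
WHERE EXISTS (SELECT 1 FROM PostCreator AS pc
              WHERE pc.PostId = p.Id
                AND (pc.UserId = 100
                     OR EXISTS (SELECT 1 FROM UserFollowing AS uf
                                WHERE uf.FollowedId = pc.UserId
                                  AND uf.FollowingId = 100)))
ORDER BY p.Id
"""

def post_ids(query):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids

if __name__ == "__main__":
    print(post_ids(UNGROUPED))  # [1, 2, 3]: post 3 leaks in
    print(post_ids(GROUPED))    # [1, 2]
```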
The FROM clause could alternatively be rewritten to use an EXISTS clause with a correlated subquery as well:
FROM Post p
LEFT JOIN UserPostE upe
on p.Id = upe.PostId
and ( upe.UserId = '100'
or exists (select 1
from UserFollowing uf
where uf.FollowedId = upe.UserID
and uf.FollowingId = '100'))
Turn IN ( SELECT ... ) into a JOIN .. ON ... (see below).
Turn OR into UNION (see below).
Are some of the tables many:many mappings, such as SStreamFollowing? If so, follow the tips in http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
Example of IN:
SELECT ssp.PostId
FROM SStreamPost ssp
WHERE (ssp.SStreamId IN (
SELECT ssf.SStreamId
FROM SStreamFollowing ssf
WHERE ssf.UserId = '100' ))
-->
SELECT ssp.PostId
FROM SStreamPost ssp
JOIN SStreamFollowing ssf ON ssp.SStreamId = ssf.SStreamId
WHERE ssf.UserId = '100'
The big WHERE with all the INs becomes something like
JOIN ( ( SELECT pc.PostId AS id ... )
UNION ( SELECT pum.PostId ... )
UNION ( SELECT ssp.PostId ... )
UNION ( SELECT psm.PostId ... ) )
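Filled in, that UNION-of-ids shape might look like the sketch below, run in SQLite from Python with a LIMIT for the "last 20" requirement. Each OR branch (including the "followed user or user 100" ORs inside the first two) becomes its own JOIN-based SELECT. The miniature data is hypothetical, integer timestamps stand in for PostCreationTime, and the UserPostE echo half of the original query is left out.

```python
import sqlite3

# Hypothetical data: user 100 follows user 7, stream 50 and stock 9.
SCHEMA = """
CREATE TABLE Post (Id INTEGER PRIMARY KEY, PostCreationTime INTEGER);
CREATE TABLE PostCreator (PostId INTEGER, UserId INTEGER);
CREATE TABLE UserFollowing (FollowingId INTEGER, FollowedId INTEGER);
CREATE TABLE PostUserMentions (PostId INTEGER, UserId INTEGER);
CREATE TABLE SStreamPost (PostId INTEGER, SStreamId INTEGER);
CREATE TABLE SStreamFollowing (SStreamId INTEGER, UserId INTEGER);
CREATE TABLE PostSMentions (PostId INTEGER, StockId INTEGER);
CREATE TABLE StockFollowing (StockId INTEGER, UserId INTEGER);
INSERT INTO Post VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6);
INSERT INTO UserFollowing VALUES (100, 7);
INSERT INTO PostCreator VALUES (1, 100), (2, 7), (3, 8);
INSERT INTO PostUserMentions VALUES (4, 7), (2, 100);
INSERT INTO SStreamPost VALUES (5, 50);
INSERT INTO SStreamFollowing VALUES (50, 100);
INSERT INTO PostSMentions VALUES (6, 9);
INSERT INTO StockFollowing VALUES (9, 100);
"""

# UNION removes duplicate post ids before the single JOIN back to Post.
FEED_QUERY = """
SELECT p.Id
FROM Post AS p
JOIN (
    SELECT pc.PostId AS Id FROM PostCreator AS pc WHERE pc.UserId = :uid
    UNION
    SELECT pc.PostId FROM PostCreator AS pc
    JOIN UserFollowing AS uf ON uf.FollowedId = pc.UserId
    WHERE uf.FollowingId = :uid
    UNION
    SELECT pum.PostId FROM PostUserMentions AS pum WHERE pum.UserId = :uid
    UNION
    SELECT pum.PostId FROM PostUserMentions AS pum
    JOIN UserFollowing AS uf ON uf.FollowedId = pum.UserId
    WHERE uf.FollowingId = :uid
    UNION
    SELECT ssp.PostId FROM SStreamPost AS ssp
    JOIN SStreamFollowing AS ssf ON ssf.SStreamId = ssp.SStreamId
    WHERE ssf.UserId = :uid
    UNION
    SELECT psm.PostId FROM PostSMentions AS psm
    JOIN StockFollowing AS sf ON sf.StockId = psm.StockId
    WHERE sf.UserId = :uid
) AS ids ON ids.Id = p.Id
ORDER BY p.PostCreationTime DESC
LIMIT :n
"""

def feed(conn, user_id, n=20):
    rows = conn.execute(FEED_QUERY, {"uid": user_id, "n": n}).fetchall()
    return [post_id for (post_id,) in rows]

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

if __name__ == "__main__":
    print(feed(build_demo(), 100))  # [6, 5, 4, 2, 1]
```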
Get done what you can of those suggestions, then come back for more advice if you still need it. And bring SHOW CREATE TABLE with you.

Getting the latest date for an id

I ran the SQL statement below and got this result (screenshot: http://i1093.photobucket.com/albums/i422/walkgirl_1993/asd-1_zps5506632e.jpg). I'm trying to display only the latest date, as you can see for cases 3 and 4. For CaseID 3, it should display only the latest row, which is 2012-12-20 16:12:36.000. I tried using GROUP BY and ORDER BY. Some websites I found on Google said to use RANK, but I'm not sure about it because I don't really understand RANK. Any suggestions?
select [Case].CaseID, Agent.AgentName, Assignment.Description, A.AgentName as EditedBy, A.DateEdited
from Agent
inner join [Case-Agent] on [Case-Agent].AgentID = Agent.AgentID
inner join [Assignment] on Assignment.AssignmentID = [Case-Agent].AssignmentID
inner join [Case] on [Case].CaseID = [Case-Agent].CaseID
inner join (select EditedCase.CaseID, [EditedCase].DateEdited, [Agent].AgentName
from EditedCase
inner join [Agent] on [Agent].AgentID = [EditedCase].AgentID) A on A.CaseID = [Case].CaseID
where [Assignment].AssignmentID = 0
To do it using RANK, you just need to add RANK to the subquery to rank DateEdited for each CaseID and Agent, and then, in the main query, add a WHERE clause that selects only the rows where the rank is 1. I think I have the partition clause right; it's a bit hard to tell without seeing your data.
Like this:
SELECT
[Case].CaseID
,Agent.AgentName
,Assignment.Description
,A.AgentName AS EditedBy
,A.DateEdited
FROM Agent
INNER JOIN [Case-Agent] ON [Case-Agent].AgentID = Agent.AgentID
INNER JOIN [Assignment] ON Assignment.AssignmentID = [Case-Agent].AssignmentID
INNER JOIN [Case] ON [Case].CaseID = [Case-Agent].CaseID
INNER JOIN (SELECT
EditedCase.CaseID
,[EditedCase].DateEdited
,[Agent].AgentName
,RANK ( ) OVER (PARTITION BY EditedCase.CaseID, [Agent].AgentName
ORDER BY [EditedCase].DateEdited DESC ) AS pos
FROM EditedCase
INNER JOIN [Agent] on [Agent].AgentID = [EditedCase].AgentID) A on A.CaseID = [Case].CaseID
WHERE [Assignment].AssignmentID = 0
AND pos = 1
You could also change the sub query into an aggregate query that brings back the MAX date like this:
SELECT
[Case].CaseID
,Agent.AgentName
,Assignment.Description
,A.AgentName AS EditedBy
,A.DateEdited
FROM Agent
INNER JOIN [Case-Agent] ON [Case-Agent].AgentID = Agent.AgentID
INNER JOIN [Assignment] ON Assignment.AssignmentID = [Case-Agent].AssignmentID
INNER JOIN [Case] ON [Case].CaseID = [Case-Agent].CaseID
INNER JOIN (SELECT
EditedCase.CaseID
,MAX([EditedCase].DateEdited) AS DateEdited
,[Agent].AgentName
FROM EditedCase
INNER JOIN [Agent] on [Agent].AgentID = [EditedCase].AgentID
GROUP BY
EditedCase.CaseID
,[Agent].AgentName) A on A.CaseID = [Case].CaseID
WHERE [Assignment].AssignmentID = 0
You were on the right track; you need to use a ranking function here, for example row_number():
with LatestCase as
(
select [Case].CaseID
, Agent.AgentName
, Assignment.Description
, A.AgentName as EditedBy
, A.DateEdited
, caseRank = row_number() over (partition by [Case].CaseID order by A.DateEdited desc)
from Agent
inner join [Case-Agent] on [Case-Agent].AgentID = Agent.AgentID
inner join [Assignment] on Assignment.AssignmentID = [Case-Agent].AssignmentID
inner join [Case] on [Case].CaseID = [Case-Agent].CaseID
inner join
(
select EditedCase.CaseID
, [EditedCase].DateEdited
, [Agent].AgentName
from EditedCase
inner join [Agent] on [Agent].AgentID = [EditedCase].AgentID
) A on A.CaseID = [Case].CaseID where [Assignment].AssignmentID = 0
)
select *
from LatestCase
where caseRank = 1
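The difference between the two answers' partitions is easy to miss: partitioning by CaseID alone keeps one row per case, while partitioning by CaseID and AgentName keeps the latest edit per agent on each case. This sketch runs ROW_NUMBER() in SQLite from Python (window functions need SQLite 3.25 or newer) on hypothetical EditedCase rows; only the 2012-12-20 16:12:36 date comes from the question.

```python
import sqlite3

# Hypothetical edit history; ISO-formatted text dates sort chronologically.
SCHEMA = """
CREATE TABLE EditedCase (CaseID INTEGER, AgentName TEXT, DateEdited TEXT);
INSERT INTO EditedCase VALUES
  (3, 'Ann', '2012-12-18 09:00:00'),
  (3, 'Bob', '2012-12-20 16:12:36'),
  (3, 'Ann', '2012-12-19 10:30:00'),
  (4, 'Bob', '2012-12-21 08:00:00'),
  (4, 'Bob', '2012-12-22 11:45:00');
"""

# {partition} is filled from fixed strings in this script, never from user input.
LATEST_QUERY = """
SELECT CaseID, AgentName, DateEdited
FROM (
    SELECT CaseID, AgentName, DateEdited,
           ROW_NUMBER() OVER (PARTITION BY {partition}
                              ORDER BY DateEdited DESC) AS rn
    FROM EditedCase
) AS ranked
WHERE rn = 1
ORDER BY CaseID, AgentName
"""

def latest(partition):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(LATEST_QUERY.format(partition=partition)).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(latest("CaseID"))             # one row per case
    print(latest("CaseID, AgentName"))  # one row per (case, agent)
```

ROW_NUMBER() always picks exactly one row per partition; RANK() would return several rows if two edits shared the same DateEdited.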

Sum of rows with join

This is the current table layout.
There are 3 legs
Each leg has 2 points: is_start = 1 marks the start of the leg, and is_start = 0 marks the end of the leg.
When a user checks in at a point, an entry is created in points_users.
In this application there are multiple legs, each with 2 points: one marks the start of the leg and the other marks the end. So the time for User 2 on Leg 1 is points_users.created for the row where points_users.leg_id = 1, points_users.user_id = 2 and is_start = 0, minus points_users.created for the row where is_start = 1 (with the other parameters unchanged). And that's for just one leg.
What I would like is to sum the time differences over all legs for each user, so that I get data like this:
| User.id | User.name | total_time |
| 1 | John | 129934 |
Does anyone know how I can join these tables and sum it up grouped by user?
(No, this is not homework)
This is as far as I got:
SELECT
( `end_time` - `start_time` ) AS `diff`
FROM
(
SELECT SUM(UNIX_TIMESTAMP(`p1`.`created`)) AS `start_time`
FROM `points_users` AS `pu1`
LEFT JOIN `points` AS `p1` ON `pu1`.`point_id` = `p1`.`id`
WHERE `p1`.`is_start` = 1
) AS `start_time`,
(
SELECT SUM(UNIX_TIMESTAMP(`pu2`.`created`)) AS `end_time`
FROM `points_users` AS `pu2`
LEFT JOIN `points` AS `p2` ON `pu2`.`point_id` = `p2`.`id`
WHERE `p2`.`is_start` = 0
) AS `end_time`
Try this:
select users.user_id,
users.user_name,
SUM(timeDuration) totalTime
from users
join (
select
pStart.User_id,
pStart.leg_id,
(UNIX_TIMESTAMP(pEnd.created) - UNIX_TIMESTAMP(pStart.created)) timeDuration
from (select pu.user_id, pu.leg_id, pu.created
from points_users pu
join points p on pu.point_id = p.id and pu.leg_id = p.leg_id
where p.is_start = 1 ) pStart
join (select pu.user_id, pu.leg_id, pu.created
from points_users pu
join points p on pu.point_id = p.id and pu.leg_id = p.leg_id
where p.is_start = 0 ) pEnd
on pStart.user_id = pEnd.user_id
and pStart.leg_id = pEnd.leg_id
) tt
on users.user_id = tt.user_id
group by users.user_id, users.user_name
The subquery gets the time duration for each user/leg, and the main query then sums them over all the legs of each user.
EDIT: Added the points table now that I can see your attempt at a query.
The simplest way is to join points_users to itself:
select leg_start.user_id, sum(UNIX_TIMESTAMP(leg_end.created) - UNIX_TIMESTAMP(leg_start.created)) AS total_time
from points_users leg_start
join points_users leg_end on leg_start.user_id = leg_end.user_id
and leg_start.leg_id = leg_end.leg_id
join points point_start on leg_start.point_id = point_start.id
join points point_end on leg_end.point_id = point_end.id
where point_start.is_start = 1 and point_end.is_start = 0
group by leg_start.user_id
Some people prefer to put those is_start filters in the join condition, but since it's an inner join that's mainly just a point of style. If it were an outer join, then moving them from the WHERE to the JOIN could have an effect on the results.
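Here is the self-join approach as a runnable sketch in SQLite from Python, with the is_start filters placed in the join conditions and a join to users for the name column. strftime('%s', ...) plays the role of MySQL's UNIX_TIMESTAMP() so the difference is in seconds. The users/points/points_users columns (including points_users.leg_id) follow the answers' assumptions, and the rows are hypothetical.

```python
import sqlite3

# Hypothetical data: John walks legs 1 and 2, Jane walks leg 1.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE points (id INTEGER PRIMARY KEY, leg_id INTEGER, is_start INTEGER);
CREATE TABLE points_users (user_id INTEGER, leg_id INTEGER, point_id INTEGER,
                           created TEXT);
INSERT INTO users VALUES (1, 'John'), (2, 'Jane');
INSERT INTO points VALUES (1, 1, 1), (2, 1, 0), (3, 2, 1), (4, 2, 0);
INSERT INTO points_users VALUES
  (1, 1, 1, '2013-01-01 10:00:00'), (1, 1, 2, '2013-01-01 10:30:00'),
  (1, 2, 3, '2013-01-01 11:00:00'), (1, 2, 4, '2013-01-01 11:10:00'),
  (2, 1, 1, '2013-01-02 09:00:00'), (2, 1, 2, '2013-01-02 09:00:45');
"""

# Pair each user's start check-in with the end check-in of the same leg,
# convert both to epoch seconds, and sum the differences per user.
TOTAL_QUERY = """
SELECT u.id, u.name,
       SUM(CAST(strftime('%s', leg_end.created) AS INTEGER)
           - CAST(strftime('%s', leg_start.created) AS INTEGER)) AS total_time
FROM points_users AS leg_start
JOIN points AS point_start
  ON point_start.id = leg_start.point_id AND point_start.is_start = 1
JOIN points_users AS leg_end
  ON leg_end.user_id = leg_start.user_id AND leg_end.leg_id = leg_start.leg_id
JOIN points AS point_end
  ON point_end.id = leg_end.point_id AND point_end.is_start = 0
JOIN users AS u ON u.id = leg_start.user_id
GROUP BY u.id, u.name
ORDER BY u.id
"""

def totals():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = conn.execute(TOTAL_QUERY).fetchall()
    conn.close()
    return rows

if __name__ == "__main__":
    print(totals())  # [(1, 'John', 2400), (2, 'Jane', 45)]
```

Converting to epoch seconds before subtracting matters in MySQL too: subtracting two DATETIME values directly compares them as YYYYMMDDhhmmss numbers, not as seconds.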