Optimization of MySQL query have millions of records - mysql

OBJECTIVE: Need query to count all "distinct" leads outside of current company that do not exist in current company. The query needs to account for millions of records between multiple tables (lead_details, domains, company)
EXAMPLE:
company 1 -> domain 1 -> lead 1 lead_details records exists.
company 2 -> domain 2 -> lead 1 lead_details records exists.
company 2 -> domain 2 -> lead 2 lead_details records exists.
company 3 -> domain 3 -> lead 2 lead_details records exists.
company 3 -> domain 3 -> lead 3 lead_details records exists.
RESULT: If I run the query for the data above on company 1, the result should be a count of (2) since lead 2 & lead 3 is unique and does not exist in company 1
domain_id domain_name company_id company_name lead_id lead_count
"2" "D2" "2" "C2" "2" "2"
"3" "D3" "3" "C3" "3" "1"
Here is my Query, Please let me know if anyone has any better suggestion.
SELECT al.*
FROM (
SELECT
d.id AS domain_id,
d.name AS domain_name,
c.id AS company_id,
c.name AS company_name,
ld.lead_id,
count(ld.lead_id) as lead_count
FROM domains d
INNER JOIN company c
ON (c.id = d.company_id AND c.id != 1)
INNER JOIN lead_details ld
ON (ld.domain_id = d.id)
GROUP BY ld.lead_id
) al
LEFT JOIN (
SELECT ld.lead_id FROM domains d
INNER JOIN company c
ON (c.id = d.company_id AND c.id = 1)
INNER JOIN lead_details ld
ON (ld.domain_id = d.id)
) ccl
ON al.lead_id = ccl.lead_id
WHERE ccl.lead_id IS NULL;
I have almost million rows, so need to figure out better solution..

Plan A
The pattern
FROM ( SELECT ... )
JOIN ( SELECT ... ) ON ...
is inefficient, especially in older versions of MySQL. This is because neither of the subqueries has any indexes, so (in older versions) a repeated full table scan is needed of one of the subqueries.
The better method is to try to reformulate as
FROM t1 ...
JOIN t2 ... ON ...
JOIN t3 ... ON ...
LEFT JOIN t4 ... ON ...
LEFT JOIN t5 ... ON ...
Plan B
This is closer to what you have...
CREATE TEMPORARY TABLE ccl
( INDEX(lead_id) )
SELECT ... -- the stuff that is after LEFT JOIN
Then replace that subquery with just ccl. This provides the index that is missing from the original query.
Plan C
Summary Table. (This may or may not be practical for your query, since you are looking distinct and do not exist.) Every month (or week or whatever) calculate subtotals for the last month and store it into another table. Then the query against this other table will be much faster.

Related

MySQL - Marking duplicates from several table fields, as well as data from another table

I have two tables - one shows user purchases, and one shows a product id with it's corresponding product type.
My client wants to make duplicate users inactive based on last name and email address, but wants to run the query by product type (based on what type of product they purchased), and only wants to include user_ids who haven't purchased paint (product ids 5 and 6). So the query will be run multiple times - once for all people who have purchased lawnmowers, and then for all people who have purchased leafblowers etc (and there will be some overlap between these two). No user_id that has purchased paint should be made inactive.
In terms of who should stay active among the duplicates, the one to stay active will be the one with the highest product id purchased (as products are released annually). If they have multiple records with the same product id, the record to stay active will be the one with most recent d_modified and t_modified.
I also want to shift the current value of 'inactive' to the 'previously_inactive' column, so that this can be easily reversed if need be.
Here is some sample table data
If the query was run by leafblower purchases, rows 5, 6, and 7 would be made inactive. This is the expected output:
If the query was run by lawnmower purchases, rows 1 and 2 would be made inactive. This would be the expected output:
If row 4 was not the most recent, it would still not be made inactive, as user_id 888 had bought paint (and we want to exclude these user_ids from being made inactive).
This is an un-optimised version of the query for 'leafblower' purchases (it is working, but will probably be too slow in the interface):
UPDATE test.user_purchases
SET inactive = 1
WHERE id IN (
SELECT z.id
FROM (SELECT * FROM test.user_purchases) z
WHERE z.product_id IN (
SELECT product_id
FROM test.products
WHERE product_type IN ("leafblower")
)
AND id NOT IN (
SELECT a.id
FROM (SELECT * FROM test.user_purchases) a
INNER JOIN (
SELECT r.surname, r.email
FROM (SELECT * FROM test.user_purchases) r
JOIN test.products s on r.product_id = s.product_id
WHERE s.product_type IN ("paint")
) b
WHERE a.surname = b.surname
AND a.email = b.email
)
AND id NOT IN (
SELECT MAX(z.id)
FROM (SELECT * FROM test.user_purchases) z
WHERE z.product_id IN (
SELECT product_id
FROM test.products
WHERE product_type IN ("leafblower")
)
AND id NOT IN (
SELECT a.id
FROM (SELECT * FROM test.user_purchases) a
INNER JOIN (
SELECT r.surname, r.email
FROM (SELECT * FROM test.user_purchases) r
JOIN test.products s on r.product_id = s.product_id
WHERE s.product_type IN ("paint")
) b
WHERE a.surname = b.surname
AND a.email = b.email
)
GROUP BY surname, email
)
)
Any suggestions on how I can streamline this query and optimise the speed of it would be much appreciated.

Sql query join on not eqal

So i have this relational model for hospital (not made by me).
Patient (has an adress and an id), hospital (has id and address), and also there's a table for relationship representing placement in the hospital (hospital.id, patient.id) (also there's other tables, but they don't matter in this query);
The purpose of the query is to find hospitals where is no placed patients from from other cities than hospital's one (on condition that address only contains city).
The problem that i have is theoretical, i don't really know if to use full outer join with a or b null, or something else in the query that finds hospitals containing "foreign" patients, (like join hospital with its placement and then full outer join with a or b table record null, but that leads to a question will i get results in the query? Because i need cities that don't match but all the explanations of that join are about .
Thanks to all who embraced my utterly imperfect english and understood it.
Upd.
Patient:
id=1, city =A;
id=2, city =B;
id=3, city =B;
id=4, city =A;
id=5, city =C;
Hospital:
Id =1, city=A
id =2, city=B;
Placement:
h.id p.id
1 1
1 4
2 2
2 3
2 5
Expected results is "1", id of the first hospital (where's no patients from other city) and others with that "feature"
my query is like
select id from hospital where id not in
(select id,address from hospital inner join placement on h.id=placement.h.id as b inner join patient on placement.p.id=p.id where hospital.address<>patient.address )
Sorry for the delay
Is shawn's query correct?
Can i use h.id instead 1? Idk if our teacher would accept that, because he's never showed us something like that and in 10 years he hasn't managed to create an example of that database for students to test queries on.
select * from hospitals h
where not exists (
select 1 -- dummy value, use h.id if you prefer
from patients p inner join placement pl on pl.pid = p.id
where pl.hid = h.id and p.city <> h.city
)
or
select h.id
from hospitals h
left outer join
placement pl inner join patients p on p.id = pl.pid
on pl.hid = h.id
group by h.id
-- this won't count nulls resulting from zero placements for that hospital
-- as long as standard sql null comparisons are active
having count(case when h.city <> p.city then 1 end) = 0
Looks like it works to me: http://rextester.com/BTJB59061

Mutual friends sql

I've seen multiple SO posts on mutual friends but I've structured my friends table in my db so that there are no duplicates e.g. (1,2) and not (2,1)
Create Table Friends(
user1_id int,
user2_id int
);
and then a constraint to make sure user1 id is always smaller than user2 id e.g 4 < 5
Mutual friends sql with join (Mysql)
I see suggestions that to find mutual friends it can be found using a join, so this is what I have but I think it's wrong because if I count the data in my db with the actual result from the query I get different results
select f1.user1_id as user1, f2.user1_id as user2, count(f1.user2_id) as
mutual_count from Friends f1 JOIN Friends f2 ON
f1.user2_id = f2.user2_id AND f1.user1_id <> f2.user1_id GROUP BY
f1.user1_id, f2.user1_id order by mutual_count desc
There are three join scenarios that I can see.
1 -> 2 -> 3 (mutual friend id between other IDs)
2 -> 3 -> 1 (mutual friend id > other IDs)
2 -> 1 -> 3 (mutual friend id < other IDs)
This can be resolved with this predicate...
ON f1.user1_id IN (f2.user1_id, f2.user2_id)
OR f1.user2_id IN (f2.user1_id, f2.user2_id)
AND <not joining the row to Itself>
But that will totally mess up the optimiser's ability to use indexes.
So, I'd union multiple queries.
(pseudo code as I'm on a phone)
SELECT u1, u2, COUNT(*) FROM
(
SELECT f1.u1, f2.u2 FROM f1 INNER JOIN f2 ON f1.u2 = f2.u1 AND f1.u1 <> f2.u2
UNION ALL
SELECT f1.u1, f2.u1 FROM f1 INNER JOIN f2 ON f1.u2 = f2.u2 AND f1.u1 <> f2.u1
UNION ALL
SELECT f1.u2, f2.u2 FROM f1 INNER JOIN f2 ON f1.u1 = f2.u1 AND f1.u2 <> f2.u2
) all_combinations
GROUP BY u1, u2
Each individual query will then be able to fully utilise indexes. (Put one index on u1 and another index on u2)
The result should be less esoteric code (with fairly long CASE statements) and a much lower costed execution plan.

Optimize complex MySQL selects for reporting

I'm building an application in which,
a) each librarian can create a campaign
b) actions carried out as part of that campaign are tracked in campaign_actions, actions being page loads
In order to report on the number of actions made in each campaign, I wrote this SQL query (for MySQL) for the following database structure, with the intention of tracking the number of actions undertaken by a librarian for each campaign:
LIBRARIANS
id | status
CAMPAIGNS
id | librarian_id
CAMPAIGN_ACTIONS
id | campaign_id | name
The problems I am having are:
a) I have to specify the fields I want to count in the correlated subselects
b) The query will be quite expensive as a result
My question is, since there are multiple actions for a campaign, how can I effectively count the number of actions per campaign in a more efficient manner?
Less complex queries amount to returning a result set like so:
librarians.id | librarians.status | campaign_actions.name
1 3 pageX
1 3 pageY
1 3 pageZ
1 3 pageA
1 3 pageB
2 3 pageX
which means i'd have to parse the result set in application code row by row, which is likely to be more expensive.
I appreciate any thoughts you may have on this problem.
Breaking the task into smaller tasks (views):
--- campaings per librarian
CREATE VIEW count_librarian_campaigns
AS ( SELECT lib.id AS lib_id
, COUNT(c.id)
AS num_campaigns
FROM librarians lib
LEFT JOIN campaigns c
ON c.librarian_id = lib.id
GROUP BY lib.id
)
--- campaign actions per campaign
CREATE VIEW count_campaign_actions
AS ( SELECT c.id AS c_id
, COUNT(ca.campaign_id)
AS num_actions
FROM campaigns c
LEFT JOIN campaign_actions ca
ON ca.campaign_id = c.id
GROUP BY c.id
)
So, you could have queries like this:
SELECT lib.id AS lib_id
, countlibc.num_campaigns
, c.id AS c_id
, countca.num_actions
FROM librarians lib
JOIN count_librarian_campaigns countlibc
ON countlibc.lib_id = lib.id
LEFT JOIN campaigns c
ON c.librarian_id = lib.id
JOIN count_campaign_actions countca
ON countca.c_id = c.id
Does something like this work for your purposes?
SELECT librarians_id,COUNT(librarians_id) FROM (
SELECT librarians.id as librarians_id,
librarians.status as librarians_status,
campaign_actions.name as campaign_actions_name
FROM campaign_actions
INNER JOIN campaigns
ON campaign_actions.campaign_id = campaigns.id
INNER JOIN librarians
ON campaigns.librarian_id = librarians.id
GROUP BY campaign_actions.name,librarians.id,librarians.status ) as a
Maybe I misunderstood what you're after. The inner query seems like it would return the table you represented above.

Mysql query in drupal database - groupwise maximum with duplicate data

I'm working on a mysql query in a Drupal database that pulls together users and two different cck content types. I know people ask for help with groupwise maximum queries all the time... I've done my best but I need help.
This is what I have so far:
# the artists
SELECT
users.uid,
users.name AS username,
n1.title AS artist_name
FROM users
LEFT JOIN users_roles ur
ON users.uid=ur.uid
INNER JOIN role r
ON ur.rid=r.rid
AND r.name='artist'
LEFT JOIN node n1
ON n1.uid = users.uid
AND n1.type = 'submission'
WHERE users.status = 1
ORDER BY users.name;
This gives me data that looks like:
uid username artist_name
1 foo Joe the Plumber
2 bar Jane Doe
3 baz The Tooth Fairy
Also, I've got this query:
# artwork
SELECT
n.nid,
n.uid,
a.field_order_value
FROM node n
LEFT JOIN content_type_artwork a
ON n.nid = a.nid
WHERE n.type = 'artwork'
ORDER BY n.uid, a.field_order_value;
Which gives me data like this:
nid uid field_order_value
1 1 1
2 1 3
3 1 2
4 2 NULL
5 3 1
6 3 1
Additional relevant info:
nid is the primary key for an Artwork
every Artist has one or more Artworks
valid data for field_order_value is NULL, 1, 2, 3, or 4
field_order_value is not necessarily unique per Artist - an Artist could have 4 Artworks all with field_order_value = 1.
What I want is the row with the minimum field_order_value from my second query joined with the artist information from the first query. In cases where the field_order_value is not valuable information (either because the Artist has used duplicate values among their Artworks or left that field NULL), I would like the row with the minimum nid from the second query.
The Solution
Using divide and conquer as a strategy and mysql views as a technique, and referencing this article about groupwise maximum queries, I solved my problem.
Create the View
# artists and artworks all in one table
CREATE VIEW artists_artwork AS
SELECT
users.uid,
users.name AS artist,
COALESCE(n1.title, 'Not Yet Entered') AS artist_name,
n2.nid,
a.field_image_fid,
COALESCE(a.field_order_value, 1) AS field_order_value
FROM users
LEFT JOIN users_roles ur
ON users.uid=ur.uid
INNER JOIN role r
ON ur.rid=r.rid
AND r.name='artist'
LEFT JOIN node n1
ON n1.uid = users.uid
AND n1.type = 'submission'
LEFT JOIN node n2
ON n2.uid = users.uid
AND n2.type = 'artwork'
LEFT JOIN content_type_artwork a ON n2.nid = a.nid
WHERE users.status = 1;
Query the View
SELECT
a2.uid,
a2.artist,
a2.artist_name,
a2.nid,
a2.field_image_fid,
a2.field_order_value
FROM (
SELECT
uid,
MIN(field_order_value) AS field_order_value
FROM artists_artwork
GROUP BY uid
) a1
JOIN artists_artwork a2
ON a2.nid = (
SELECT
nid
FROM artists_artwork a
WHERE a.uid = a1.uid
AND a.field_order_value = a1.field_order_value
ORDER BY
uid ASC, field_order_value ASC, nid ASC
LIMIT 1
)
ORDER BY artist;
A simple solution to this can be to create views in your database that can then be joined together. This is especially useful if you often want to see the intermediate data in the same way in some other place. While it is possible to mash together the one huge query, I just take the divide and conquer approach sometimes.