Index with JOIN and GROUP BY

Index with JOIN and GROUP BY - mysql

I have the following select in my application:
SELECT psd.type, count(*) as total
FROM question_answer_nps qan
JOIN question_answer qa ON
qan.question_answer_id = qa.id
JOIN post_sale_action psa ON
psa.id = qa.post_sale_action_id
JOIN post_sale_dispatch psd ON
psd.id = psa.post_sale_dispatch_id
JOIN post_sale_lot psl ON
psl.id = psd.post_sale_lot_id
WHERE qan.nps_answer is not null
AND psd.status != 0
AND psa.status in (2, 3)
AND psl.status != 0
GROUP BY psd.type
All joins happen with FK (of course). But I can not form an index to optimize this query.
The explain this select can be viewed at: https://mariadb.org/explain_analyzer/analyze/5mhc6
What would be the index for this select?

Related

Mysql subquery for sum all columns in inner Join

I am attempting to get the sum of 12 columns (in same table) in a subquery in an inner join.
Here is a link to my schema :
SqlFiddle
The query I am attempting to use is this:
SELECT
`inventory`.`part_number`,
`inventory`.`qty`,
`inventory`.`description`,
`reorder`.`reorder_point` * '1' `point`,
`inventory`.`cost`,
`vendor`.`name` AS `vendor_name`, SELECT (SUM(`saleshistory`.`Sales_1_Month_Prior`)+SUM(`saleshistory`.`Sales_2_Month_Prior`)+SUM(`saleshistory`.`Sales_3_Month_Prior`)+SUM(`saleshistory`.`Sales_4_Month_Prior`)+SUM(`saleshistory`.`Sales_5_Month_Prior`)+SUM(`saleshistory`.`Sales_6_Month_Prior`)+SUM(`saleshistory`.`Sales_7_Month_Prior`)+SUM(`saleshistory`.`Sales_8_Month_Prior`)+SUM(`saleshistory`.`Sales_9_Month_Prior`)+SUM(`saleshistory`.`Sales_10_Month_Prior`)+SUM(`saleshistory`.`Sales_11_Month_Prior`)+SUM(`saleshistory`.`Sales_12_Month_Prior`) AS TTL
FROM `inventory`
LEFT JOIN `reorder` ON `inventory`.`part_number` = `reorder`.`part_number`
LEFT JOIN `vendor` ON `inventory`.`vendor` = `vendor`.`vendor_id`
INNER JOIN `saleshistory` ON `saleshistory`.`location` = `inventory`.`location` AND `saleshistory`.`part_number` = `inventory`.`part_number`
WHERE `inventory`.`qty` <= `reorder`.`reorder_point`
AND `inventory`.`location` = '99'
AND `reorder`.`reorder_point` != '0'
GROUP BY `inventory`.`part_number`
ORDER BY `vendor`.`name` ASC
When using this query, it returns all the values for all the records not just the rows.

sql nested join inner and left join

Hi I want to filter logs using MySQL with a list of trackings.
Every log belongs to a server,
Every tracking belongs to a server and have 0..N patterns
Every pattern belongs to a tracking
I have 3 tables :
logs : | id | ip | url | server_id | ...
tracking : | id | server_id | name | other fields...
pattern : | id | tracking_id | pattern |
I want to count logs that match tracking for a specific server my problem is that my query mix up tracking that have pattern and those that don't.
SQL Fiddle : http://sqlfiddle.com/#!2/f11b1/2
SELECT COUNT(DISTINCT logs.ip), tr.name
FROM `logs`
INNER JOIN `trackings` as tr ON
( tr.server_id = logs.server_id )
AND -- OTHER conditions between log and tracking
LEFT JOIN `patterns` as pt ON
( pt.tracking_id = tr.id )
AND (logs.url LIKE pt.pattern )
GROUP BY tr.id
My problem is on the second join, if I use INNER JOIN patterns as pt ON I get correct results but only on trackings that have some patterns,
If I use LEFT JOIN patterns as pt ON I get all tracking but with a false count (I get the result of SELECT COUNT(DISTINCT logs.ip) FROM logs )
EDIT
I Can get the correct result with a field in tracking that indicates if the tracking has patterns and a UNION :
(
SELECT COUNT(DISTINCT lg.ip), tr.name
FROM `logs` as lg
INNER JOIN `trackings` as tr ON
( tr.server_id = lg.server_id )
AND (tr.hasPatterns = 1)
AND -- Other conditions
INNER JOIN `patterns` as pt ON
( pt.tracking_id = tr.id )
AND (lg.url LIKE pt.pattern )
WHERE
GROUP BY tr.id
)
UNION
(
SELECT COUNT(DISTINCT lg.ip), tr.name
FROM `logs` as lg
INNER JOIN `trackings` as tr ON
( tr.server_id = lg.server_id )
AND (tr.hasPatterns = 0)
WHERE
GROUP BY tr.id, lg.date
)
But I guess there is a way to do that without using Union...

You can put a conditional inside count, so I think the following does what you want:
SELECT COUNT(DISTINCT (case when tr.hasPatterns = 1 and pt.tracking_id is not null
then lg.ip
when tr.hasPatterns = 0
then lg.ip
end)), tr.name
FROM `logs` as lg
INNER JOIN `trackings` as tr ON
( tr.server_id = lg.server_id )
AND -- Other conditions
LEFT JOIN `patterns` as pt ON
( pt.tracking_id = tr.id )
AND (lg.url LIKE pt.pattern )
WHERE
GROUP BY tr.id
EDIT:
This is returning what you want:
SELECT COUNT(DISTINCT (case when tr.size = 0 and pt.tracking_id is not null
then lg.ip
when tr.size > 0 and lg.size > tr.size
then lg.ip
end)), tr.name
FROM `logs` as lg
INNER JOIN `trackings` as tr ON
( tr.server_id = lg.server_id )
LEFT JOIN `patterns` as pt ON
( pt.tracking_id = tr.id )
AND (lg.url LIKE pt.pattern )
GROUP BY tr.id;
Your SQL Fiddle has the additional condition lg.size > tr.size which is not in the original question.

SQL: return user table with calculated column for match percentage?

I'm currently writing a webapp that matches users based on answered question. I've realized my matching algorithm in just one query and tuned it so far that it takes 8.2ms to calculate the match percentage between 2 users. But my webapp has to take a list of users and iterate through the list performing this query. For 5000 users it took 50sec on my local machine. Is it possible to put everything in one query that returns one column with the user_id and one column with the calculated match? Or is a stored procedure an option?
I'm currently working with MySQL but willing to switch databases if needed.
For anyone interested in the schema and data, I've created a SQLFiddle: http://sqlfiddle.com/#!2/84233/1
and my matching query:
SELECT COALESCE(SQRT( (100.0*as1.actual_score/ps1.possible_score) * (100.0*as2.actual_score/ps2.possible_score) ) - (100/ps1.commonquestions), 0) AS perc
FROM (SELECT SUM(imp.value) AS actual_score
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 101
AND (uq1.accans1 = uq2.answer_id
OR uq1.accans2 = uq2.answer_id
OR uq1.accans3 = uq2.answer_id
OR uq1.accans4 = uq2.answer_id)
WHERE uq1.user_id = 1) AS as1,
(SELECT SUM(value) AS possible_score, COUNT(*) AS commonquestions
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 101
WHERE uq1.user_id = 1) AS ps1,
(SELECT SUM(imp.value) AS actual_score
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id AND uq2.user_id = 1
AND (uq1.accans1 = uq2.answer_id
OR uq1.accans2 = uq2.answer_id
OR uq1.accans3 = uq2.answer_id
OR uq1.accans4 = uq2.answer_id)
WHERE uq1.user_id = 101) AS as2,
(SELECT SUM(value) AS possible_score
FROM user_questions AS uq1
INNER JOIN importances ON importances.id = uq1.importance
INNER JOIN user_questions uq2 ON uq1.question_id = uq2.question_id AND uq2.user_id = 1
WHERE uq1.user_id = 101) AS ps2

I was bored, so: Here's a rewritten version of your query - based on a PostgreSQL port of your schema - that calculates the matches for all user pairings at once:
http://sqlfiddle.com/#!12/30524/6
I've checked and it produces the same results for the user pair (1,5).
WITH
userids(uid) AS (
select distinct user_id from user_questions
),
users(u1,u2) AS (
SELECT u1.uid, u2.uid FROM userids u1 CROSS JOIN userids u2 WHERE u1 <> u2
),
scores AS (
SELECT
sum(CASE WHEN uq2.answer_id IN (uq1.accans1, uq1.accans2, uq1.accans3, uq1.accans4) THEN imp.value ELSE 0 END) AS actual_score,
sum(imp.value) AS potential_score,
count(1) AS common_questions,
users.u1,
users.u2
FROM user_questions AS uq1
INNER JOIN importances imp ON imp.id = uq1.importance
INNER JOIN user_questions uq2 ON uq2.question_id = uq1.question_id
INNER JOIN users ON (uq1.user_id=users.u1 AND uq2.user_id=users.u2)
GROUP BY u1, u2
),
score_pairs(u1,u2,u1_actual,u2_actual,u1_potential,u2_potential,common_questions) AS (
SELECT s1.u1, s1.u2, s1.actual_score, s2.actual_score, s1.potential_score, s2.potential_score, s1.common_questions
FROM scores s1 INNER JOIN scores s2 ON (s1.u1 = s2.u2 AND s1.u2 = s2.u1)
WHERE s1.u1 < s1.u2
)
SELECT
u1, u2,
COALESCE(SQRT( (100.0*u1_actual/u1_potential) * (100.0*u2_actual/u2_potential) ) - (100/common_questions), 0) AS "match"
FROM score_pairs;
There's no reason you couldn't port this back to MySQL, as the CTE is only there for readability and doesn't do anything you can't do with FROM (SELECT ...). There's no WITH RECURSIVE clause and no CTE is referenced from more than one other CTE. You'd have a bit of a scary nested query, but that's just a formatting challenge.
Changes:
Generate a set of distinct users
Self-join that set of distinct users to create a set of user pairings
and then join on that list of pairings in the score query to produce a table of scores
Produce the scores table by combining the largely duplicate queries for possiblescore1 and possiblescore2, actualscore1 and actualscore2.
then summarize it in the final outer query
I haven't optimised the query; as written it runs in 5ms on my system. On bigger data it's possible you may need to restructure some of it or use tricks like converting some CTE clauses into SELECT ... INTO TEMPORARY TABLE temp table creation statements that you then index before querying.
It's also possible that you'll want to move the generation of the users rowset out of the CTE and into a FROM subquery clause of scores. That's because WITH is required to behave as an optimisation fence between clauses, so the database must materialize rows and can't use tricks like pushing clauses up or down.

SQL - Multiple many-to-many relations filtering SELECT

These are my tables:
Cadastros (id, nome)
Convenios (id, nome)
Especialidades (id, nome)
Facilidades (id, nome)
And the join tables:
cadastros_convenios
cadastros_especialidades
cadastros_facilidades
The table I'm querying for: Cadastros
I'm using MySQL.
The system will allow the user to select multiple "Convenios", "Especialidades" and "Facilidades". Think of each of these tables as a different type of "tag". The user will be able to select multiple "tags" of each type.
What I want is to select only the results in Cadastros table that are related with ALL the "tags" from the 3 different tables provided. Please note it's not an "OR" relation. It should only return the row from Cadastros if it has a matching link table row for EVERY "tag" provided.
Here is what I have so far:
SELECT Cadastro.*, Convenio.* FROM Cadastros AS Cadastro
INNER JOIN cadastros_convenios AS CadastrosConvenio ON(Cadastro.id = CadastrosConvenio.cadastro_id)
INNER JOIN Convenios AS Convenio ON (CadastrosConvenio.convenio_id = Convenio.id AND Convenio.id IN(2,3))
INNER JOIN cadastros_especialidades AS CadastrosEspecialidade ON (Cadastro.id = CadastrosEspecialidade.cadastro_id)
INNER JOIN Especialidades AS Especialidade ON(CadastrosEspecialidade.especialidade_id = Especialidade.id AND Especialidade.id IN(1))
INNER JOIN cadastros_facilidades AS CadastrosFacilidade ON (Cadastro.id = CadastrosFacilidade.cadastro_id)
INNER JOIN Facilidades AS Facilidade ON(CadastrosFacilidade.facilidade_id = Facilidade.id AND Facilidade.id IN(1,2))
GROUP BY Cadastro.id
HAVING COUNT(*) = 5;
I'm using the HAVING clause to try to filter the results based on the number of times it shows (meaning the number of times it has been successfully "INNER JOINED"). So in every case, the count should be equal to the number of different filters I added. So if I add 3 different "tags", the count should be 3. If I add 5 different tags, the count should be 5 and so on. It works fine for a single relation (a single pair of inner joins). When I add the other 2 relations it starts to lose control.
EDIT
Here is something that I believe is working (thanks #Tomalak for pointing out the solution with sub-queries):
SELECT Cadastro.*, Convenio.*, Especialidade.*, Facilidade.* FROM Cadastros AS Cadastro
INNER JOIN cadastros_convenios AS CadastrosConvenio ON(Cadastro.id = CadastrosConvenio.cadastro_id)
INNER JOIN Convenios AS Convenio ON (CadastrosConvenio.convenio_id = Convenio.id)
INNER JOIN cadastros_especialidades AS CadastrosEspecialidade ON (Cadastro.id = CadastrosEspecialidade.cadastro_id)
INNER JOIN Especialidades AS Especialidade ON(CadastrosEspecialidade.especialidade_id = Especialidade.id)
INNER JOIN cadastros_facilidades AS CadastrosFacilidade ON (Cadastro.id = CadastrosFacilidade.cadastro_id)
INNER JOIN Facilidades AS Facilidade ON(CadastrosFacilidade.facilidade_id = Facilidade.id)
WHERE
(SELECT COUNT(*) FROM cadastros_convenios WHERE cadastro_id = Cadastro.id AND convenio_id IN(1, 2, 3)) = 3
AND
(SELECT COUNT(*) FROM cadastros_especialidades WHERE cadastro_id = Cadastro.id AND especialidade_id IN(3)) = 1
AND
(SELECT COUNT(*) FROM cadastros_facilidades WHERE cadastro_id = Cadastro.id AND facilidade_id IN(2, 3)) = 2
GROUP BY Cadastro.id
But I'm concerned about performance. It looks like these 3 sub-queries in the WHERE clause are gonna be over-executed...
Another solution
It joins subsequent tables only if the previous joins were a success (if no rows match one of the joins, the next joins are gonna be joining an empty result-set) (thanks #DRapp for this one)
SELECT STRAIGHT_JOIN
Cadastro.*
FROM
( SELECT Qualify1.cadastro_id
from
( SELECT cc1.cadastro_id
FROM cadastros_convenios cc1
WHERE cc1.convenio_id IN (1, 2, 3)
GROUP by cc1.cadastro_id
having COUNT(*) = 3 ) Qualify1
JOIN
( SELECT ce1.cadastro_id
FROM cadastros_especialidades ce1
WHERE ce1.especialidade_id IN( 3 )
GROUP by ce1.cadastro_id
having COUNT(*) = 1 ) Qualify2
ON (Qualify1.cadastro_id = Qualify2.cadastro_id)
JOIN
( SELECT cf1.cadastro_id
FROM cadastros_facilidades cf1
WHERE cf1.facilidade_id IN (2, 3)
GROUP BY cf1.cadastro_id
having COUNT(*) = 2 ) Qualify3
ON (Qualify2.cadastro_id = Qualify3.cadastro_id) ) FullSet
JOIN Cadastros AS Cadastro
ON FullSet.cadastro_id = Cadastro.id
INNER JOIN cadastros_convenios AS CC
ON (Cadastro.id = CC.cadastro_id)
INNER JOIN Convenios AS Convenio
ON (CC.convenio_id = Convenio.id)
INNER JOIN cadastros_especialidades AS CE
ON (Cadastro.id = CE.cadastro_id)
INNER JOIN Especialidades AS Especialidade
ON (CE.especialidade_id = Especialidade.id)
INNER JOIN cadastros_facilidades AS CF
ON (Cadastro.id = CF.cadastro_id)
INNER JOIN Facilidades AS Facilidade
ON (CF.facilidade_id = Facilidade.id)
GROUP BY Cadastro.id

Emphasis mine
"It should only return the row from Cadastros if it has a matching row for EVERY "tag" provided."
"where there is a matching row"-problems are easily solved with EXISTS.
EDIT After some clarification, I see that using EXISTS is not enough. Comparing the actual row counts is necessary:
SELECT
*
FROM
Cadastros c
WHERE
(SELECT COUNT(*) FROM cadastros_facilidades WHERE cadastro_id = c.id AND id IN (2,3)) = 2
AND
(SELECT COUNT(*) FROM cadastros_especialidades WHERE cadastro_id = c.id AND id IN (1)) = 1
AND
(SELECT COUNT(*) FROM cadastros_facilidades WHERE cadastro_id = c.id AND id IN (1,2)) = 2
The indexes on the link tables should be (cadastro_id, id) for this query.

Depending on the size of the tables (records), WHERE-based subqueries, running a test on every row CAN SIGNIFICANTLY hit performance. I have restructured it which MIGHT better help, but only you would be able to confirm. The premise here is to have the first table based on getting distinct IDs that meet the criteria, join THAT set to the next qualifier criteria... joined to the FINAL set. Once that has been determined, use THAT to join to your main table and its subsequent links to get the details you are expecting. You also had an overall group by by the ID which will eliminate all other nested entries as found in the support details table.
All that said, lets take a look at this scenario. Start with the table that would be EXPECTED TO HAVE THE LOWEST RESULT SET to join to the next and next. if cadastros_convenios has IDs that match all the criteria include IDs 1-100, great, we know at MOST, we'll have 100 ids.
Now, these 100 entries are immediately JOINED to the 2nd qualifying criteria... of which, say it only matches ever other... for simplicity, we are now matched on 50 of the 100.
Finally, JOIN to the 3rd qualifier based on the 50 that qualified and you get 30 entries. So, within these 3 queries you are now filtered down to 30 entries with all the qualifying criteria handled up front. NOW, join to the Cadastros and then subsequent tables for the details based ONLY on the 30 that qualified.
Since your original query would eventually TRY EVERY "ID" for the criteria, why not pre-qualify it up front with ONE query and get just those that hit, then move on.
SELECT STRAIGHT_JOIN
Cadastro.*,
Convenio.*,
Especialidade.*,
Facilidade.*
FROM
( SELECT Qualify1.cadastro_id
from
( SELECT cc1.cadastro_id
FROM cadastros_convenios cc1
WHERE cc1.convenio_id IN (1, 2, 3)
GROUP by cc1.cadastro_id
having COUNT(*) = 3 ) Qualify1
JOIN
( SELECT ce1.cadastro_id
FROM cadastros_especialidades ce1
WHERE ce1.especialidade_id IN( 3 )
GROUP by ce1.cadastro_id
having COUNT(*) = 1 ) Qualify2
ON Qualify1.cadastro_id = Qualify2.cadastro_id
JOIN
( SELECT cf1.cadastro_id
FROM cadastros_facilidades cf1
WHERE cf1.facilidade_id IN (2, 3)
GROUP BY cf1.cadastro_id
having COUNT(*) = 2 ) Qualify3
ON Qualify2.cadastro_id = Qualify3.cadastro_id ) FullSet
JOIN Cadastros AS Cadastro
ON FullSet.Cadastro_id = Cadastro.Cadastro_id
INNER JOIN cadastros_convenios AS CC
ON Cadastro.id = CC.cadastro_id
INNER JOIN Convenios AS C
ON CC.convenio_id = C.id
INNER JOIN cadastros_especialidades AS CE
ON Cadastro.id = CE.cadastro_id
INNER JOIN Especialidades AS E
ON CE.especialidade_id = E.id
INNER JOIN cadastros_facilidades AS CF
ON Cadastro.id = CF.cadastro_id
INNER JOIN Facilidades AS F
ON CF.facilidade_id = F.id

MySQL Query Optimisation

Looking for some help with optimising the query below. Seems to be two bottlenecks at the moment which cause it to take around 90s to complete the query. There's only 5000 products so it's not exactly a massive database/table. The bottlenecks are SQL_CALC_FOUND_ROWS and the ORDER BY statement - If I remove both of these it takes around a second to run the query.
I've tried removing SQL_CALC_FOUND_ROWS and running a count() statement, but that takes a long time as well..
Is the best thing going to be to use INNER JOIN's (which I'm not too familiar with) as per the following Stackoverflow post? Slow query when using ORDER BY
SELECT SQL_CALC_FOUND_ROWS *
FROM tbl_products
LEFT JOIN tbl_link_products_categories ON lpc_p_id = p_id
LEFT JOIN tbl_link_products_brands ON lpb_p_id = p_id
LEFT JOIN tbl_link_products_authors ON lpa_p_id = p_id
LEFT JOIN tbl_link_products_narrators ON lpn_p_id = p_id
LEFT JOIN tbl_linkfiles ON lf_id = p_id
AND (
lf_table = 'tbl_products'
OR lf_table IS NULL
)
LEFT JOIN tbl_files ON lf_file_id = file_id
AND (
file_nameid = 'p_main_image_'
OR file_nameid IS NULL
)
WHERE p_live = 'y'
ORDER BY p_title_clean ASC, p_title ASC
LIMIT 0 , 10

You could try reducing the size of the joins by using a derived table to retrieve the filtered and ordered products before joining. This assumes that p_live, p_title_clean and p_title are fields in your tbl_products table -
SELECT *
FROM (SELECT *
FROM tbl_products
WHERE p_live = 'y'
ORDER BY p_title_clean ASC, p_title ASC
LIMIT 0 , 10
) AS tbl_products
LEFT JOIN tbl_link_products_categories
ON lpc_p_id = p_id
LEFT JOIN tbl_link_products_brands
ON lpb_p_id = p_id
LEFT JOIN tbl_link_products_authors
ON lpa_p_id = p_id
LEFT JOIN tbl_link_products_narrators
ON lpn_p_id = p_id
LEFT JOIN tbl_linkfiles
ON lf_id = p_id
AND (
lf_table = 'tbl_products'
OR lf_table IS NULL
)
LEFT JOIN tbl_files
ON lf_file_id = file_id
AND (
file_nameid = 'p_main_image_'
OR file_nameid IS NULL
)
This is a "stab in the dark" as there is not enough detail in your question.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

Index with JOIN and GROUP BY - mysql

Related

Mysql subquery for sum all columns in inner Join

sql nested join inner and left join

SQL: return user table with calculated column for match percentage?

SQL - Multiple many-to-many relations filtering SELECT

MySQL Query Optimisation

Categories

Resources