I have a database table, dictionary, that has two strings and a language id.
The columns are:
id
product_id
key
translation
language_id
Another table, dictionary_versions, just has the dictionary_id and version of translation
id
dictionary_id
version_id
Most keys have translations in two languages, but some keys have only one. I am looking for a way to get all translations for one language and, where they exist, the matching translations for another language_id.
I want all translations for one language_id, one product_id and one version. I tried creating a temporary table with all the values I am comparing against, then left joining the temporary table to the dictionary table to get all translated values of the desired language and, if there are any translations in another language, include those.
The problem is that it gets slow when there are 10,000 translations for two languages. Is there a better way to join a table onto itself using the dictionary_versions table as a where clause on both?
SELECT
d.*
FROM
dictionary d
LEFT JOIN
dictionary_versions dv ON dv.dictionary_id = d.id
LEFT JOIN
dictionary d2
LEFT JOIN
dictionary_versions dv2 ON d2.id = dv2.dictionary_id
ON
d2.key = d.key
WHERE
d.product_id = 1 AND dv.version_id = 3
AND
d.language_id = 1
LIMIT
0,10
This was one of the queries I tried. However, if there are multiple versions, it returns every version of the d2 rows, which makes the results inaccurate.
The other approach I tried uses a temporary table. It works, but it is slow. It is done in two queries:
CREATE TEMPORARY TABLE IF NOT EXISTS dictionary_temp_1
AS (
SELECT d.* FROM `dictionary` AS `d`
LEFT JOIN `dictionary_versions` AS `dv` ON dv.dictionary_id = d.id
WHERE d.product_id = 1 AND dv.version_id = 3 AND d.language_id = 1
ORDER BY d.key ASC LIMIT 0,10 )
Second:
SELECT
d.key,
d2.key AS toKey,
d.translation AS `1`,
d2.translation AS `2`
FROM
`dictionary` AS `d`
LEFT JOIN
`dictionary_versions` AS `dv` ON d.id = dv.dictionary_id
LEFT JOIN
`dictionary_temp_1` AS `d2` ON d2.key = d.key
WHERE
d.product_id = 1 AND dv.version_id = 3 AND d.language_id = 1
ORDER BY
d.key ASC LIMIT 0,10
I managed to find a solution, thanks to @philipxy, who pointed me to an answer that I used as a reference.
SELECT d.key, d2.key AS toKey, d.translation AS `1`, d2.translation AS `2`
FROM dictionary d
INNER JOIN dictionary_versions dv ON dv.dictionary_id = d.id
LEFT JOIN (
SELECT d.*
FROM `dictionary` AS `d`
INNER JOIN `dictionary_versions` AS `dv` ON d.id = dv.dictionary_id
WHERE d.product_id = 1 AND dv.version_id = 3 AND d.language_id = 2
) d2 ON d2.key = d.key
WHERE d.product_id = 1 AND dv.version_id = 3 AND d.language_id = 1
This creates a subquery before joining it to the table itself. The where condition is applied to both tables, therefore getting the translations for one language_id, and matching any for a different language_id, if they exist.
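As a minimal sketch of the pattern above, the following script runs the same query shape against an in-memory SQLite database. SQLite is used only so the example is self-contained (the original is MySQL), and the sample rows and the lang_1/lang_2 aliases are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dictionary (
    id INTEGER PRIMARY KEY,
    product_id INTEGER,
    key TEXT,
    translation TEXT,
    language_id INTEGER
);
CREATE TABLE dictionary_versions (
    id INTEGER PRIMARY KEY,
    dictionary_id INTEGER,
    version_id INTEGER
);
-- Invented sample data: 'hello' exists in both languages, 'bye' only in language 1.
INSERT INTO dictionary VALUES
    (1, 1, 'hello', 'Hello', 1),
    (2, 1, 'hello', 'Hallo', 2),
    (3, 1, 'bye', 'Goodbye', 1),
    (4, 1, 'hello', 'Hallo (old)', 2);
-- Row 4 belongs to an older version and must not leak into the result.
INSERT INTO dictionary_versions VALUES (1, 1, 3), (2, 2, 3), (3, 3, 3), (4, 4, 2);
""")

# The derived table d2 is filtered on product, version and language
# before it is left joined to the language 1 rows.
rows = conn.execute("""
SELECT d.key, d.translation AS lang_1, d2.translation AS lang_2
FROM dictionary d
INNER JOIN dictionary_versions dv ON dv.dictionary_id = d.id
LEFT JOIN (
    SELECT d.key, d.translation
    FROM dictionary d
    INNER JOIN dictionary_versions dv ON dv.dictionary_id = d.id
    WHERE d.product_id = 1 AND dv.version_id = 3 AND d.language_id = 2
) d2 ON d2.key = d.key
WHERE d.product_id = 1 AND dv.version_id = 3 AND d.language_id = 1
ORDER BY d.key
""").fetchall()

print(rows)
```

The key without a language 2 translation still appears, with NULL (None in Python) in the second translation column, and the older version of 'hello' is ignored because the version filter is applied inside the derived table.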
I have a query that gives the result below. How can I replace duplicate values with NULL?
Query:
SELECT
word.lemma,
synset.definition,
synset.pos,
sampletable.sample
FROM
word
LEFT JOIN
sense ON word.wordid = sense.wordid
LEFT JOIN
synset ON sense.synsetid = synset.synsetid
LEFT JOIN
sampletable ON synset.synsetid = sampletable.synsetid
WHERE
word.lemma = 'good'
Result:
Required Result: all the greyed out results as NULL
First, this is the type of transformation that is generally better done at the application level. The reason is that it presupposes that the result set is in a particular order -- and you seem to be assuming this even with no order by clause.
Second, it is often simpler in the application.
However, in MySQL 8+, it is not that hard. You can do:
SELECT w.lemma,
(CASE WHEN ROW_NUMBER() OVER (PARTITION BY w.lemma, ss.definition ORDER BY st.sample) = 1
THEN ss.definition
END) as definition,
ss.pos,
st.sample
FROM word w LEFT JOIN
sense s
ON w.wordid = s.wordid LEFT JOIN
synset ss
ON s.synsetid = ss.synsetid LEFT JOIN
sampletable st
ON ss.synsetid = st.synsetid
WHERE w.lemma = 'good'
ORDER BY w.lemma, ss.definition, st.sample;
For this to work reliably, the outer ORDER BY clause needs to be compatible with the ORDER BY for the window function.
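A small sketch of the ROW_NUMBER() approach, run against in-memory SQLite (assumed to be version 3.25 or later, which supports window functions) instead of MySQL 8. The single table named joined and its three rows are invented stand-ins for the output of the four-table join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened result of the word/sense/synset/sampletable join.
CREATE TABLE joined (lemma TEXT, definition TEXT, sample TEXT);
INSERT INTO joined VALUES
    ('good', 'having desirable qualities', 'good news'),
    ('good', 'having desirable qualities', 'a good report card'),
    ('good', 'morally admirable', 'a good person');
""")

# Only the first row of each (lemma, definition) group keeps its definition.
# The outer ORDER BY uses the qualified column j.definition, not the
# blanked-out alias, so it stays compatible with the window ordering.
rows = conn.execute("""
SELECT j.lemma,
       CASE WHEN ROW_NUMBER() OVER (PARTITION BY j.lemma, j.definition
                                    ORDER BY j.sample) = 1
            THEN j.definition
       END AS definition,
       j.sample
FROM joined j
ORDER BY j.lemma, j.definition, j.sample
""").fetchall()

for row in rows:
    print(row)
```

If the outer ORDER BY sorted on the blanked-out alias instead of the underlying column, the NULL rows would no longer sit directly under the row that carries their definition.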
If you are using MySQL 8, try RANK(). I didn't have your tables or data, so I couldn't test this query.
SELECT
t1.lemma
,CASE WHEN t1.r = 1 THEN t1.definition ELSE NULL END AS definition
,t1.pos
,t1.sample
FROM
(
SELECT
t.lemma
,t.definition
,t.pos
,t.sample
-- Order by sample (not definition) so only the first row of each definition gets rank 1
,RANK() OVER (PARTITION BY t.definition ORDER BY t.sample) r
FROM
(
SELECT
word.lemma,
synset.definition,
synset.pos,
sampletable.sample
FROM
word
LEFT JOIN
sense ON word.wordid = sense.wordid
LEFT JOIN
synset ON sense.synsetid = synset.synsetid
LEFT JOIN
sampletable ON synset.synsetid = sampletable.synsetid
WHERE
word.lemma = 'good'
) t
) t1
ORDER BY t1.lemma, t1.definition, t1.sample;
I would like to ask, how can I optimize this query:
select
h.jmeno hrac,
n1.url hrac_url,
t.nazev tym,
n2.url tym_url,
ss.pocet_zapasu zapasy,
ss.pocet_minut minuty,
s.celkem_golu goly,
s.zk,
s.ck
from
hraci h
left join
(
select
hrac_id,
tym_id,
count(minut_celkem) pocet_zapasu,
sum(minut_celkem) pocet_minut
from
statistiky_stridani ss
join
zapasy z
on z.id = ss.zapas_id
join
souteze s
on s.id = z.soutez_id
join
souteze_nazev sn
on sn.id = s.soutez_id
where
s.rocnik_id = 2
group by
hrac_id
) ss on ss.hrac_id = h.id
left join
(
select
hrac_id,
tym_id,
sum(typ_id = 1 or typ_id = 3) as celkem_golu,
sum(typ_id = 4) as zk,
sum(typ_id = 5) as ck
from
statistiky st
join
zapasy z
on z.id = st.zapas_id
join
souteze s
on s.id = z.soutez_id
join
souteze_nazev sn
on sn.id = s.soutez_id
where
s.rocnik_id = 2
group by
hrac_id
) s on s.hrac_id = h.id
join
navigace n1
on n1.id = h.nav_id
join
tymy t
on t.id = ss.tym_id
join
navigace n2
on n2.id = t.nav_id
order by
s.celkem_golu desc
limit
10
The query takes about 1.5 to 2 seconds. For scale, the table statistiky_stridani has about 500,000 rows and statistiky about 250,000 rows.
This returns EXPLAIN:
Thank you for your help
Don't use LEFT JOIN instead of JOIN unless you really need the empty rows.
Try to reformulate because JOIN ( SELECT ... ) JOIN ( SELECT ... ) optimizes poorly.
Please do not use the same alias (s) for two different tables; it confuses the reader.
Add the composite index INDEX(rocnik_id, soutez_id) to souteze.
LEFT JOIN ... JOIN ... -- Please add parentheses to show whether the JOIN should be before doing the LEFT JOIN or after:
either
FROM ...
LEFT JOIN ( ... JOIN ... )
or
FROM ( ... LEFT JOIN ... )
JOIN ...
It may make a big difference in how the Optimizer performs the query, which may change the speed.
There may be more suggestions; work through those and ask again (if it is still "too slow").
Situation
I have a database which heavily makes use of joins due to the various situations in which each entity is used. Here is a simplified diagram:
Goal
I would like to be able to get details of all modules and the "name" fields regardless of whether the "fk_chapter_id" within user_has_module is set or not.
In the case where "user_has_module.fk_chapter_id" is null, the system can return details of the module and then null chapter.
In the case where there is a user_has_module, I would like to get the status
Issue
Whenever I run my SQL statements, the results are only partially returned. For example, if I have 4 module records in total and the user has an entry in "user_has_module" for two of them, I get those two records in full and then 2 null records for the other modules.
Update based on feedback, almost there
Now, the only problem is I get duplicates. Using some test data
SELECT DISTINCT
chapter_id,
chapter_name,
module_id,
module_name,
(null ) AS user_module_progress,
(SELECT COUNT(fk_chapter_id) FROM module_has_chapter WHERE fk_module_id = m.module_id) AS chapter_count
FROM
module as m
LEFT JOIN
module_has_chapter as mhc ON m.module_id = mhc.fk_module_id
LEFT JOIN
chapter as c ON mhc.fk_chapter_id = c.chapter_id
group by m.module_id
UNION
SELECT DISTINCT
chapter_id,
chapter_name,
module_id,
module_name,
user_module_progress,
(SELECT COUNT(fk_chapter_id) FROM module_has_chapter WHERE fk_module_id = m.module_id) AS chapter_count
FROM
module as m
LEFT JOIN
user_has_module as uhm ON m.module_id = uhm.fk_module_id
LEFT JOIN
user as u ON uhm.fk_user_id = u.user_id
LEFT JOIN
chapter as c ON uhm.fk_latest_chapter_id = c.chapter_id
WHERE u.user_id = 2
group by m.module_id;
I got there in the end but, not particularly happy about it. This works but, it's a bloody mess...Does anyone have a better solution please?
SELECT DISTINCT
(null) AS chapter_id,
(null) AS chapter_name,
module_id,
module_name,
(null ) AS user_module_progress,
(SELECT COUNT(fk_chapter_id) FROM module_has_chapter WHERE fk_module_id = m.module_id) AS chapter_count
FROM
module as m
LEFT JOIN
user_has_module as uhm ON m.module_id = uhm.fk_module_id
WHERE
uhm.fk_user_id IS NULL
UNION ALL
SELECT DISTINCT
chapter_id,
chapter_name,
module_id,
module_name,
user_module_progress,
(SELECT COUNT(fk_chapter_id) FROM module_has_chapter WHERE fk_module_id = m.module_id) AS chapter_count
FROM
module as m
LEFT JOIN
user_has_module as uhm ON m.module_id = uhm.fk_module_id
INNER JOIN
user as u ON uhm.fk_user_id = u.user_id
INNER JOIN
chapter as c ON uhm.fk_latest_chapter_id = c.chapter_id
WHERE
u.user_id = 2;
These are my tables:
Cadastros (id, nome)
Convenios (id, nome)
Especialidades (id, nome)
Facilidades (id, nome)
And the join tables:
cadastros_convenios
cadastros_especialidades
cadastros_facilidades
The table I'm querying for: Cadastros
I'm using MySQL.
The system will allow the user to select multiple "Convenios", "Especialidades" and "Facilidades". Think of each of these tables as a different type of "tag". The user will be able to select multiple "tags" of each type.
What I want is to select only the results in Cadastros table that are related with ALL the "tags" from the 3 different tables provided. Please note it's not an "OR" relation. It should only return the row from Cadastros if it has a matching link table row for EVERY "tag" provided.
Here is what I have so far:
SELECT Cadastro.*, Convenio.* FROM Cadastros AS Cadastro
INNER JOIN cadastros_convenios AS CadastrosConvenio ON(Cadastro.id = CadastrosConvenio.cadastro_id)
INNER JOIN Convenios AS Convenio ON (CadastrosConvenio.convenio_id = Convenio.id AND Convenio.id IN(2,3))
INNER JOIN cadastros_especialidades AS CadastrosEspecialidade ON (Cadastro.id = CadastrosEspecialidade.cadastro_id)
INNER JOIN Especialidades AS Especialidade ON(CadastrosEspecialidade.especialidade_id = Especialidade.id AND Especialidade.id IN(1))
INNER JOIN cadastros_facilidades AS CadastrosFacilidade ON (Cadastro.id = CadastrosFacilidade.cadastro_id)
INNER JOIN Facilidades AS Facilidade ON(CadastrosFacilidade.facilidade_id = Facilidade.id AND Facilidade.id IN(1,2))
GROUP BY Cadastro.id
HAVING COUNT(*) = 5;
I'm using the HAVING clause to try to filter the results based on the number of times it shows (meaning the number of times it has been successfully "INNER JOINED"). So in every case, the count should be equal to the number of different filters I added. So if I add 3 different "tags", the count should be 3. If I add 5 different tags, the count should be 5 and so on. It works fine for a single relation (a single pair of inner joins). When I add the other 2 relations it starts to lose control.
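The reason the count stops matching once there are several relations is that the joins multiply rows: with 2 convenios, 1 especialidade and 2 facilidades, each Cadastro produces 2 x 1 x 2 = 4 joined rows, not 2 + 1 + 2 = 5. A minimal sketch of that arithmetic, using in-memory SQLite as a stand-in for MySQL and invented link rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cadastros_convenios (cadastro_id INTEGER, convenio_id INTEGER);
CREATE TABLE cadastros_especialidades (cadastro_id INTEGER, especialidade_id INTEGER);
CREATE TABLE cadastros_facilidades (cadastro_id INTEGER, facilidade_id INTEGER);
-- Invented data: cadastro 1 has every requested tag.
INSERT INTO cadastros_convenios VALUES (1, 2), (1, 3);
INSERT INTO cadastros_especialidades VALUES (1, 1);
INSERT INTO cadastros_facilidades VALUES (1, 1), (1, 2);
""")

# COUNT(*) counts joined rows (a product), while the sum of the
# per-relation distinct counts gives the number of matched tags.
joined_rows, distinct_tags = conn.execute("""
SELECT COUNT(*),
       COUNT(DISTINCT cc.convenio_id)
     + COUNT(DISTINCT ce.especialidade_id)
     + COUNT(DISTINCT cf.facilidade_id)
FROM cadastros_convenios cc
JOIN cadastros_especialidades ce ON ce.cadastro_id = cc.cadastro_id
JOIN cadastros_facilidades cf ON cf.cadastro_id = cc.cadastro_id
WHERE cc.convenio_id IN (2, 3)
  AND ce.especialidade_id IN (1)
  AND cf.facilidade_id IN (1, 2)
GROUP BY cc.cadastro_id
""").fetchone()

print(joined_rows, distinct_tags)
```

So HAVING COUNT(*) = 5 rejects a Cadastro that actually has all five tags, which is why the per-relation counts have to be checked separately.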
EDIT
Here is something that I believe is working (thanks to @Tomalak for pointing out the solution with sub-queries):
SELECT Cadastro.*, Convenio.*, Especialidade.*, Facilidade.* FROM Cadastros AS Cadastro
INNER JOIN cadastros_convenios AS CadastrosConvenio ON(Cadastro.id = CadastrosConvenio.cadastro_id)
INNER JOIN Convenios AS Convenio ON (CadastrosConvenio.convenio_id = Convenio.id)
INNER JOIN cadastros_especialidades AS CadastrosEspecialidade ON (Cadastro.id = CadastrosEspecialidade.cadastro_id)
INNER JOIN Especialidades AS Especialidade ON(CadastrosEspecialidade.especialidade_id = Especialidade.id)
INNER JOIN cadastros_facilidades AS CadastrosFacilidade ON (Cadastro.id = CadastrosFacilidade.cadastro_id)
INNER JOIN Facilidades AS Facilidade ON(CadastrosFacilidade.facilidade_id = Facilidade.id)
WHERE
(SELECT COUNT(*) FROM cadastros_convenios WHERE cadastro_id = Cadastro.id AND convenio_id IN(1, 2, 3)) = 3
AND
(SELECT COUNT(*) FROM cadastros_especialidades WHERE cadastro_id = Cadastro.id AND especialidade_id IN(3)) = 1
AND
(SELECT COUNT(*) FROM cadastros_facilidades WHERE cadastro_id = Cadastro.id AND facilidade_id IN(2, 3)) = 2
GROUP BY Cadastro.id
But I'm concerned about performance. It looks like these 3 sub-queries in the WHERE clause are gonna be over-executed...
Another solution
It joins subsequent tables only if the previous joins succeeded (if no rows match one of the joins, the following joins operate on an empty result set). Thanks to @DRapp for this one.
SELECT STRAIGHT_JOIN
Cadastro.*
FROM
( SELECT Qualify1.cadastro_id
from
( SELECT cc1.cadastro_id
FROM cadastros_convenios cc1
WHERE cc1.convenio_id IN (1, 2, 3)
GROUP by cc1.cadastro_id
having COUNT(*) = 3 ) Qualify1
JOIN
( SELECT ce1.cadastro_id
FROM cadastros_especialidades ce1
WHERE ce1.especialidade_id IN( 3 )
GROUP by ce1.cadastro_id
having COUNT(*) = 1 ) Qualify2
ON (Qualify1.cadastro_id = Qualify2.cadastro_id)
JOIN
( SELECT cf1.cadastro_id
FROM cadastros_facilidades cf1
WHERE cf1.facilidade_id IN (2, 3)
GROUP BY cf1.cadastro_id
having COUNT(*) = 2 ) Qualify3
ON (Qualify2.cadastro_id = Qualify3.cadastro_id) ) FullSet
JOIN Cadastros AS Cadastro
ON FullSet.cadastro_id = Cadastro.id
INNER JOIN cadastros_convenios AS CC
ON (Cadastro.id = CC.cadastro_id)
INNER JOIN Convenios AS Convenio
ON (CC.convenio_id = Convenio.id)
INNER JOIN cadastros_especialidades AS CE
ON (Cadastro.id = CE.cadastro_id)
INNER JOIN Especialidades AS Especialidade
ON (CE.especialidade_id = Especialidade.id)
INNER JOIN cadastros_facilidades AS CF
ON (Cadastro.id = CF.cadastro_id)
INNER JOIN Facilidades AS Facilidade
ON (CF.facilidade_id = Facilidade.id)
GROUP BY Cadastro.id
Emphasis mine
"It should only return the row from Cadastros if it has a matching row for EVERY "tag" provided."
"where there is a matching row"-problems are easily solved with EXISTS.
EDIT After some clarification, I see that using EXISTS is not enough. Comparing the actual row counts is necessary:
SELECT
*
FROM
Cadastros c
WHERE
(SELECT COUNT(*) FROM cadastros_convenios WHERE cadastro_id = c.id AND convenio_id IN (2,3)) = 2
AND
(SELECT COUNT(*) FROM cadastros_especialidades WHERE cadastro_id = c.id AND especialidade_id IN (1)) = 1
AND
(SELECT COUNT(*) FROM cadastros_facilidades WHERE cadastro_id = c.id AND facilidade_id IN (1,2)) = 2
The indexes on the link tables should be (cadastro_id, tag id column), for example (cadastro_id, convenio_id) on cadastros_convenios, for this query.
Depending on the size of the tables (records), WHERE-based subqueries that run a test on every row CAN SIGNIFICANTLY hit performance. I have restructured the query in a way that MIGHT help, but only you would be able to confirm. The premise here is to have the first derived table get the distinct IDs that meet the first criterion, join THAT set to the next qualifier criterion, and join the result to the FINAL set. Once that has been determined, use THAT to join to your main table and its subsequent links to get the details you are expecting. You also had an overall GROUP BY on the ID, which will eliminate all the other nested entries found in the supporting detail tables.
All that said, let's take a look at this scenario. Start with the table that would be EXPECTED TO HAVE THE LOWEST RESULT SET and join it to the next and the next. If the IDs in cadastros_convenios that match all the criteria are IDs 1-100, great: we know that at MOST we'll have 100 IDs.
Now, these 100 entries are immediately JOINED to the 2nd qualifying criterion, which, say, only matches every other one. For simplicity, we are now matched on 50 of the 100.
Finally, JOIN to the 3rd qualifier based on the 50 that qualified and you get 30 entries. So, within these 3 queries you are now filtered down to 30 entries with all the qualifying criteria handled up front. NOW, join to Cadastros and then the subsequent tables for the details based ONLY on the 30 that qualified.
Since your original query would eventually TRY EVERY "ID" for the criteria, why not pre-qualify up front with ONE query, get just those that hit, and then move on?
SELECT STRAIGHT_JOIN
Cadastro.*,
C.*,
E.*,
F.*
FROM
( SELECT Qualify1.cadastro_id
from
( SELECT cc1.cadastro_id
FROM cadastros_convenios cc1
WHERE cc1.convenio_id IN (1, 2, 3)
GROUP by cc1.cadastro_id
having COUNT(*) = 3 ) Qualify1
JOIN
( SELECT ce1.cadastro_id
FROM cadastros_especialidades ce1
WHERE ce1.especialidade_id IN( 3 )
GROUP by ce1.cadastro_id
having COUNT(*) = 1 ) Qualify2
ON Qualify1.cadastro_id = Qualify2.cadastro_id
JOIN
( SELECT cf1.cadastro_id
FROM cadastros_facilidades cf1
WHERE cf1.facilidade_id IN (2, 3)
GROUP BY cf1.cadastro_id
having COUNT(*) = 2 ) Qualify3
ON Qualify2.cadastro_id = Qualify3.cadastro_id ) FullSet
JOIN Cadastros AS Cadastro
ON FullSet.cadastro_id = Cadastro.id
INNER JOIN cadastros_convenios AS CC
ON Cadastro.id = CC.cadastro_id
INNER JOIN Convenios AS C
ON CC.convenio_id = C.id
INNER JOIN cadastros_especialidades AS CE
ON Cadastro.id = CE.cadastro_id
INNER JOIN Especialidades AS E
ON CE.especialidade_id = E.id
INNER JOIN cadastros_facilidades AS CF
ON Cadastro.id = CF.cadastro_id
INNER JOIN Facilidades AS F
ON CF.facilidade_id = F.id
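To see the pre-qualification idea in isolation, here is a reduced sketch run against in-memory SQLite rather than MySQL (so STRAIGHT_JOIN, which is MySQL-specific, is left out). The two Cadastros rows and their link rows are invented, and the sketch assumes each link table holds no duplicate (cadastro_id, tag) pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Cadastros (id INTEGER PRIMARY KEY, nome TEXT);
CREATE TABLE cadastros_convenios (cadastro_id INTEGER, convenio_id INTEGER);
CREATE TABLE cadastros_especialidades (cadastro_id INTEGER, especialidade_id INTEGER);
CREATE TABLE cadastros_facilidades (cadastro_id INTEGER, facilidade_id INTEGER);
-- Invented data: cadastro 2 lacks convenio 3, so it must be filtered out.
INSERT INTO Cadastros VALUES (1, 'has every tag'), (2, 'missing one convenio');
INSERT INTO cadastros_convenios VALUES (1, 1), (1, 2), (1, 3), (2, 1), (2, 2);
INSERT INTO cadastros_especialidades VALUES (1, 3), (2, 3);
INSERT INTO cadastros_facilidades VALUES (1, 2), (1, 3), (2, 2), (2, 3);
""")

# Each derived table keeps only the IDs that carry ALL requested tags of
# one type; joining the three sets leaves the IDs that pass every filter.
qualified = conn.execute("""
SELECT c.id, c.nome
FROM (SELECT cadastro_id FROM cadastros_convenios
      WHERE convenio_id IN (1, 2, 3)
      GROUP BY cadastro_id HAVING COUNT(*) = 3) q1
JOIN (SELECT cadastro_id FROM cadastros_especialidades
      WHERE especialidade_id IN (3)
      GROUP BY cadastro_id HAVING COUNT(*) = 1) q2
  ON q2.cadastro_id = q1.cadastro_id
JOIN (SELECT cadastro_id FROM cadastros_facilidades
      WHERE facilidade_id IN (2, 3)
      GROUP BY cadastro_id HAVING COUNT(*) = 2) q3
  ON q3.cadastro_id = q2.cadastro_id
JOIN Cadastros c ON c.id = q1.cadastro_id
ORDER BY c.id
""").fetchall()

print(qualified)
```

If a link table can contain the same pair twice, COUNT(DISTINCT tag column) would be the safer comparison in each HAVING clause.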
I'm looking for some help with optimising the query below. There seem to be two bottlenecks at the moment, which cause the query to take around 90s to complete. There are only 5,000 products, so it's not exactly a massive database/table. The bottlenecks are SQL_CALC_FOUND_ROWS and the ORDER BY clause: if I remove both of these, the query takes around a second to run.
I've tried removing SQL_CALC_FOUND_ROWS and running a count() statement, but that takes a long time as well.
Would the best approach be to use INNER JOINs (which I'm not too familiar with), as per the Stack Overflow post "Slow query when using ORDER BY"?
SELECT SQL_CALC_FOUND_ROWS *
FROM tbl_products
LEFT JOIN tbl_link_products_categories ON lpc_p_id = p_id
LEFT JOIN tbl_link_products_brands ON lpb_p_id = p_id
LEFT JOIN tbl_link_products_authors ON lpa_p_id = p_id
LEFT JOIN tbl_link_products_narrators ON lpn_p_id = p_id
LEFT JOIN tbl_linkfiles ON lf_id = p_id
AND (
lf_table = 'tbl_products'
OR lf_table IS NULL
)
LEFT JOIN tbl_files ON lf_file_id = file_id
AND (
file_nameid = 'p_main_image_'
OR file_nameid IS NULL
)
WHERE p_live = 'y'
ORDER BY p_title_clean ASC, p_title ASC
LIMIT 0 , 10
You could try reducing the size of the joins by using a derived table to retrieve the filtered and ordered products before joining. This assumes that p_live, p_title_clean and p_title are fields in your tbl_products table -
SELECT *
FROM (SELECT *
FROM tbl_products
WHERE p_live = 'y'
ORDER BY p_title_clean ASC, p_title ASC
LIMIT 0 , 10
) AS tbl_products
LEFT JOIN tbl_link_products_categories
ON lpc_p_id = p_id
LEFT JOIN tbl_link_products_brands
ON lpb_p_id = p_id
LEFT JOIN tbl_link_products_authors
ON lpa_p_id = p_id
LEFT JOIN tbl_link_products_narrators
ON lpn_p_id = p_id
LEFT JOIN tbl_linkfiles
ON lf_id = p_id
AND (
lf_table = 'tbl_products'
OR lf_table IS NULL
)
LEFT JOIN tbl_files
ON lf_file_id = file_id
AND (
file_nameid = 'p_main_image_'
OR file_nameid IS NULL
)
This is a "stab in the dark" as there is not enough detail in your question.
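A reduced sketch of the derived-table idea, run against in-memory SQLite instead of MySQL. Only two of the tables are modelled, the rows are invented, and lpc_c_id is a hypothetical column name for the category id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_products (p_id INTEGER PRIMARY KEY, p_title TEXT, p_live TEXT);
-- lpc_c_id is a hypothetical name for the linked category id.
CREATE TABLE tbl_link_products_categories (lpc_p_id INTEGER, lpc_c_id INTEGER);
INSERT INTO tbl_products VALUES
    (1, 'Alpha', 'y'), (2, 'Beta', 'n'), (3, 'Gamma', 'y'), (4, 'Delta', 'y');
INSERT INTO tbl_link_products_categories VALUES (1, 10), (1, 11), (4, 10);
""")

# Filter, sort and limit the products first, then join only those rows.
rows = conn.execute("""
SELECT p.p_id, p.p_title, lpc.lpc_c_id
FROM (SELECT *
      FROM tbl_products
      WHERE p_live = 'y'
      ORDER BY p_title ASC
      LIMIT 2) AS p
LEFT JOIN tbl_link_products_categories lpc ON lpc.lpc_p_id = p.p_id
ORDER BY p.p_title, lpc.lpc_c_id
""").fetchall()

for row in rows:
    print(row)
```

Note that the LIMIT now counts products rather than joined rows, so the result can contain more rows than the limit when a product has several linked rows. That differs from the original query, where the limit applied after the joins.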