Query with CONCAT inside JOIN is slow - mysql

I need to get different values from 3 different tables (table1, table2, table3) where the common value is a reference number. This number appears in all 3 tables, except that in table3 it is split across three columns. I tried a LEFT OUTER JOIN that concatenates these three columns to rebuild the whole reference number, but the query becomes significantly slower. This is the part of the query where the issue is found:
SELECT t1.type AS type, t2.client AS client, t3.somenumber AS somenumber, t4.anothernumber AS anothernumber
FROM table1 t1 JOIN table2 t2 ON t1.somevalue = t2.somevalue
JOIN table4 t4 ON t4.reference_number = t1.reference_number -- Some validation I need to make on another table
-- Here's the problem. table3's values 1 through 3 make up the reference number found in the other tables.
-- The CONCAT makes the query significantly slower.
LEFT OUTER JOIN table3 t3 ON CONCAT(t3.value1, t3.value2, t3.value3) = t1.reference_number
WHERE t1.date BETWEEN '2022-04-01' AND '2022-05-01'
AND t1.client IN ('client1', 'client2', 'client3', 'client4', 'client5')
GROUP BY t1.reference_number -- Group by the reference number
I tried making a view to create a column where the reference number is 'stored', but the query still takes a long time to run. Is there a way to optimize this?
Running on 10.3.32-MariaDB

The GROUP BY does not need CONCAT:
GROUP BY t3.value1, t3.value2, t3.value3
I don't understand why you tacked on t1.reference_number; it is either similar info or NULL. The NULL case might lead to extra groups, but it seems like a waste. (Add it on if necessary.)
Indexes:
t1: INDEX(date)
t1: INDEX(client, date)
t2: INDEX(somevalue, client)
t3: INDEX(value1, value2, value3)
t4: INDEX(reference_number)
Was t3.value a typo for t3.value3?
Consider getting rid of t4; you are not using any values from it. The only thing it is doing is to verify that table4 has a matching row.
What version of MySQL are you using?
It may be useful to have a VIRTUAL or PERSISTENT (generated) column defined as CONCAT(value1, value2, value3) and to index it.
(And I agree with the Commenters that the "reference number" is ambiguous.)
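The generated-column suggestion could look roughly like the sketch below on MariaDB 10.3. The column name ref_full, the index name, and the VARCHAR(64) length are assumptions made for illustration; the real type and collation should match t1.reference_number, otherwise the index may not be usable for the join.

```sql
-- Assumption: value1..value3 concatenate to at most 64 characters.
-- ref_full and idx_t3_ref_full are hypothetical names.
ALTER TABLE table3
    ADD COLUMN ref_full VARCHAR(64)
        AS (CONCAT(value1, value2, value3)) PERSISTENT;

CREATE INDEX idx_t3_ref_full ON table3 (ref_full);

-- The join then compares two plain columns instead of calling CONCAT per row.
SELECT t1.type AS type, t3.somenumber AS somenumber
FROM table1 t1
LEFT OUTER JOIN table3 t3 ON t3.ref_full = t1.reference_number
WHERE t1.date BETWEEN '2022-04-01' AND '2022-05-01';
```

With the expression wrapped around the t3 columns in the ON clause, an index on (value1, value2, value3) generally cannot be used for the lookup, which is why the comparison is moved onto a stored, indexed column here.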

Related

Mysql select optimization (huge db)

I have a SELECT query in MySQL that takes between 25 and 30s, which is extremely long, and I was wondering if you could help me speed it up.
CREATE TEMPORARY TABLE results(
id VARCHAR(30),
secondid VARCHAR(5),
allele VARCHAR(30),
translation VARCHAR(10),
level VARCHAR(20),
subgroup VARCHAR(20),
subgroup2 VARCHAR(20)
);
INSERT INTO results(id, secondid, allele, level) SELECT DISTINCT t1.id, t1.secondid, t1.texte, t3.texte
FROM database t1
JOIN database t2 ON t1.id=t2.id
JOIN database t3 ON t1.id=t3.id AND t1.secondid=t3.secondid
WHERE (t1.qualifier,t2.qualifier) = ("allele","organism") AND t3.qualifier = "level_length" AND t3.texte NOT REGEXP "X" AND t3.texte IS NOT NULL
AND t2.texte = ? AND t1.texte REGEXP ?
GROUP BY t1.texte;
UPDATE results SET translation = (SELECT t1.qualifier
FROM database t1
JOIN database t2 ON t1.id=t2.id AND t1.secondid=t2.secondid
JOIN database t3 ON t1.id=t3.id AND t1.secondid=t3.secondid
WHERE t1.qualifier IN ("protein","ncRNA","rRNA") AND t2.texte=results.allele AND t3.texte=results.level LIMIT 1);
UPDATE results SET subgroup = (SELECT t2.subgrp
FROM alleledb.alleleSubgroups t1
JOIN alleledb.subgroups t2 ON t1.subgroup=t2.subgroup
WHERE t1.gene=SUBSTRING_INDEX(results.allele, "*", 1) AND t1.species=? LIMIT 1);
ALTER TABLE results DROP id, DROP secondid;
SELECT * FROM results ORDER BY subgroup ASC, level ASC;
DROP TABLE results;
I need to go through many databases to do the join (on the same id). The databases are huge, but the results to extract are small (less than 1% of the whole database). Most of the results are stored in the same database, in different rows (with the same id and secondid). However, id and secondid are not individually unique to the rows I need to select; only the combination of the two is.
Thank you.
I would start by having a proper composite index on your database table
First on
(qualifier, id, secondid, texte)
This will help your joins and the WHERE tests, and the engine will NOT have to go back to the actual table rows for the records, since the index already holds the data you are interested in.
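As a sketch, the composite index above could be created as follows. The index name is hypothetical, and the statement assumes texte is a VARCHAR; if it is a TEXT or BLOB column, MySQL requires a prefix length such as texte(50), and the index would then no longer fully cover the query.

```sql
-- `database` is the placeholder table name from the question. DATABASE is a
-- reserved word in MySQL, so it is quoted with backticks here.
CREATE INDEX idx_qualifier_id_secondid_texte
    ON `database` (qualifier, id, secondid, texte);
```

An index created this way can be removed again later with DROP INDEX idx_qualifier_id_secondid_texte ON `database`; so trying it out is reversible, apart from the build time and disk space it needs while it exists.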
Next, I would adjust the query/joins. Since you are specifically looking for the "allele" and "organism" from t1 and t2 respectively, make them as such.
I have no idea what you are doing with your REGEXP "X" or "?" values for texte, but you'll figure that out after.
Here is how I would revise the queries
insert into ...
SELECT DISTINCT
t1.id,
t1.secondid,
t1.texte,
t3.texte
FROM
database t1
JOIN database t2
ON t1.id = t2.id
AND t2.qualifier = 'organism'
JOIN database t3 ON
t1.id = t3.id
AND t1.secondid = t3.secondid
AND t3.qualifier = 'level_length'
WHERE
t1.qualifier = 'allele'
AND t1.texte REGEXP ?
-- I would move these t2 and t3 into the respective JOINs above directly.
AND t3.texte NOT REGEXP "X"
AND t3.texte IS NOT NULL
AND t2.texte = ?
GROUP BY
t1.texte;
As for your UPDATE commands, having a second index on (id, secondid) will help on the join to t2 and t3 since there is no qualifier context to the join.
As for your UPDATE commands, as Rick mentioned, without some context of an ORDER BY clause, you have no guarantee WHICH record is returned back by the LIMIT 1.
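A sketch of both points, using the first UPDATE from the question: a second index on (id, secondid), and an ORDER BY inside the subquery so that LIMIT 1 picks a predictable row. The index name is hypothetical, and ordering by t1.qualifier is only an assumed tie-breaker; the right ordering column depends on which row is actually wanted.

```sql
-- Hypothetical index name; supports the joins that have no qualifier filter.
CREATE INDEX idx_id_secondid ON `database` (id, secondid);

UPDATE results
SET translation = (
    SELECT t1.qualifier
    FROM `database` t1
    JOIN `database` t2 ON t1.id = t2.id AND t1.secondid = t2.secondid
    JOIN `database` t3 ON t1.id = t3.id AND t1.secondid = t3.secondid
    WHERE t1.qualifier IN ('protein', 'ncRNA', 'rRNA')
      AND t2.texte = results.allele
      AND t3.texte = results.level
    ORDER BY t1.qualifier   -- assumption: any deterministic ordering
    LIMIT 1
);
```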
First of all, thank you for all your help.
My first table (the one used by the INSERT and the first UPDATE, named database) looks like this:
I want everything in red. In other words, I need some parameters that have the same id and secondid as the "level", which is unique within the id, whereas other parameters may be repeated within the same id (but with a different secondid).
I am filtering by allele name (ECK in EC locus) with the REGEXP, and by species. For example, all alleles from the EC locus of human.
Then (last UPDATE), I take one parameter (allele), take a substring of it, and look it up in a database that gives me one id (one row -> one id). I then use this id on another database that gives me one or two rows (one subgroup, or rarely two subgroups). Since my example only has one group, the absence of ORDER BY went unnoticed. But yes, I want to order (get the subgroup that contains the allele first). I don't know how to do that.
Finally, I can try to create an index, but given the size of the db I'm wondering how long it would take and how large it would be. Would it significantly improve the time, and can I remove it afterwards?
The REGEXP "X" removes every match that is not relevant for this parameter (I don't want them).
The ? is user input (the species, which occurs twice, and the locus).
The operations on the first database take 30s; the last operation on the two databases takes 1-2s. The others (DROP, SELECT) take <20ms (not the problem).

Natural join works but not with all values

I can't understand what's happening...
I use two SQL queries that do not return the same thing...
this one :
SELECT * FROM table1 t1 JOIN table1 t2 on t1.attribute1 = t2.attribute1
I get 10 rows
this other :
SELECT * FROM table1 NATURAL JOIN table1
I get 8 rows
With the NATURAL JOIN, 2 rows aren't returned... I looked for the missing rows, and they have the same value in the attribute1 column...
It's impossible for me.
If anyone has an answer I could sleep better ^^
Best regards
Max
As was pointed out in the comments, the reason you are getting a different row count is that the natural join is connecting your self join using all columns. All columns are being compared because the same table appears on both sides of the join. To test this hypothesis, just check the column values from both tables, which should all match.
The moral of the story here is to avoid natural joins. Besides not being clear as to the join condition, the logic of the join could easily change should table structure change, e.g. if a new column gets added.
Follow the link below for a small demo which tried to reproduce your current results. In a table of 8 records, the natural join returns 8 records, whereas the inner join on one attribute returns 10 records due to some duplicate matching.
Demo
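A small self-contained reproduction of the effect described above might look like this. The data is hypothetical: 8 distinct rows with no NULLs, where one attribute1 value occurs twice. Aliases are added to the self join because MySQL rejects the same unaliased table name on both sides.

```sql
-- Hypothetical sample data; 'g' is the only repeated attribute1 value.
CREATE TABLE table1 (
    attribute1 VARCHAR(10) NOT NULL,
    attribute2 INT NOT NULL
);

INSERT INTO table1 (attribute1, attribute2) VALUES
    ('a', 1), ('b', 2), ('c', 3), ('d', 4),
    ('e', 5), ('f', 6), ('g', 7), ('g', 8);

-- Join on attribute1 only: the six unique values give 6 rows and the two
-- 'g' rows pair up in 2 x 2 = 4 ways, so this counts 10.
SELECT COUNT(*)
FROM table1 t1
JOIN table1 t2 ON t1.attribute1 = t2.attribute1;

-- NATURAL JOIN compares every common column (attribute1 AND attribute2),
-- so each row matches only itself and this counts 8.
SELECT COUNT(*)
FROM table1 t1
NATURAL JOIN table1 t2;
```

The two extra rows in the first query are the cross pairings of the 'g' rows, which is consistent with the missing rows sharing the same attribute1 value.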
You need to 'project away' the attribute you don't want used in the join e.g. in a derived table (dt):
SELECT *
FROM table1
NATURAL JOIN ( SELECT attribute1 FROM table1 ) dt;

Join from multiple tables using WHERE IN

I'm having a hard time finding anything on Google related to this problem.
What I'm trying to do is query multiple tables with an unknown number of values using an IN clause, like so...
SELECT * FROM table_1 t1 WHERE t1.t1_id IN ('12345223', '2343374') JOIN table_2 t2 WHERE t2.t2_id IN ('2164158194', '3232422423')
The code above demonstrates what I am trying to achieve. I'm not an SQL guru, so I'm not entirely sure whether what I'm after can be accomplished this way or whether there is a much better way to do it. Any help is much appreciated.
Update your query like this:
SELECT *
FROM table_1 t1
JOIN table_2 t2
ON t1.t1_id = t2.reft1_id
WHERE t1.t1_id IN ('12345223', '2343374')
AND t2.t2_id IN ('2164158194', '3232422423')
The "ON" clause will have to contain the two columns that are linked in the two tables.
You got the order of your clauses mixed up. You should go 1.) SELECT, 2.) FROM with JOINs, 3.) WHERE
Like this:
SELECT *
FROM table_1 t1
JOIN table_2 t2
WHERE t1.t1_id IN ('12345223', '2343374')
AND t2.t2_id IN ('2164158194', '3232422423')
But your statement also seems to be missing a JOIN condition, so it will either result in an error (it does in Oracle) or (assuming t1_id and t2_id are primary keys) give you 4 result rows (it seems to do so in MySQL):
t1_id t2_id
12345223 2164158194
12345223 3232422423
2343374 2164158194
2343374 3232422423
A JOIN without condition is almost never what you really want and if so it should be explicit in the statement and use CROSS JOIN.
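If the full combination of both ID lists really is what is wanted, a sketch of the explicit form would be the following. It reuses the table and column names from the question and assumes no linking column exists between the two tables.

```sql
-- Explicit cross product: every selected table_1 row is paired with every
-- selected table_2 row.
SELECT t1.t1_id, t2.t2_id
FROM table_1 t1
CROSS JOIN table_2 t2
WHERE t1.t1_id IN ('12345223', '2343374')
  AND t2.t2_id IN ('2164158194', '3232422423');
```

Writing CROSS JOIN makes it clear to a later reader that the missing ON clause is intentional rather than an oversight.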

What is better way to join in mysql?

I want to join 3 or more tables:
table1 - 1 thousand records
table2 - 100 thousand records
table3 - 10 million records
Which of the following is best (in terms of speed)?
Note: pk and fk are the primary and foreign keys of the respective tables, and FILTER_CONDITION1 and FILTER_CONDITION2 are the row-restricting conditions normally found in the WHERE clause.
Case 1: taking the smaller tables first and joining the larger one later
Select table1.*,table2.*,table3.*
from table1
join table2
on table1.fk = table2.pk and FILTER_CONDITION1
join table3
on table2.fk = table3.pk and FILTER_CONDITION2
Case 2
Select table1.*,table2.*,table3.*
from table3
join table2
on table2.fk = table3.pk and FILTER_CONDITION2
join table1
on table1.fk = table2.pk and FILTER_CONDITION1
Case 3
Select table1.*,table2.*,table3.*
from table3
join table2
on table2.fk = table3.pk
join table1
on table1.fk = table2.pk
where FILTER_CONDITION1 and FILTER_CONDITION2
The cases you show are equivalent. What you are describing is in the end the same query and will be seen by the database as such: the database will make a query plan.
The best thing you can do is use EXPLAIN and check what your query actually does: this way you can see that they will probably be run the same way, and whether there is a bottleneck in there.
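As a sketch, EXPLAIN is simply prefixed to the query. The status and created_at columns below are hypothetical stand-ins for FILTER_CONDITION1 and FILTER_CONDITION2, which the question does not spell out.

```sql
EXPLAIN
SELECT table1.*, table2.*, table3.*
FROM table1
JOIN table2 ON table1.fk = table2.pk
JOIN table3 ON table2.fk = table3.pk
WHERE table1.status = 'active'            -- stands in for FILTER_CONDITION1
  AND table3.created_at >= '2020-01-01';  -- stands in for FILTER_CONDITION2
```

The output lists the tables in the order the optimizer intends to read them, so running EXPLAIN on each of the three cases shows whether they end up with the same plan. The type, key, and rows columns are the usual places to look for a full scan or a missing index.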
As @Nanne said in his answer, MySQL normally works out the right join order on its own, but in rare cases it can read the tables in the wrong order and kill query performance. In that case you can try the approach below.
If you can filter the data from your bulky tables such as table2 and table3 (suppose only 500 records remain after joining these tables and applying the filter), then filter that data first and join the filtered result with your small table. This can improve performance, but there are various possible combinations, so you have to check which join lets you filter the most. EXPLAIN will help you find out, and indexes will help you get the filtered data.
After that, you can tell MySQL to use the join order written in your query with the syntax "SELECT STRAIGHT_JOIN .....". Similarly, MySQL sometimes does not pick the proper index, and then we have to use FORCE INDEX.
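A rough sketch of both hints together, assuming the table3 filter is on a hypothetical created_at column with a hypothetical index, and a hypothetical status column on table1. These hints override the optimizer, so it is worth comparing the EXPLAIN output with and without them before keeping them.

```sql
-- Hypothetical index on the column used to filter the largest table.
CREATE INDEX idx_t3_created_at ON table3 (created_at);

-- STRAIGHT_JOIN makes MySQL read the tables in the order they are written;
-- FORCE INDEX tells it to use the named index on table3.
SELECT STRAIGHT_JOIN table1.*, table2.*, table3.*
FROM table3 FORCE INDEX (idx_t3_created_at)
JOIN table2 ON table2.fk = table3.pk
JOIN table1 ON table1.fk = table2.pk
WHERE table3.created_at >= '2020-01-01'
  AND table1.status = 'active';
```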

Vertically Merge Multiple Tables in MySQL by Joint Primary Key

I've got 3 MySQL MyISAM tables: table1, table2 and table3.
Each table has an ID column (ID, ID2, ID3 respectively), and different data columns.
For example table1 has [ID, Name, Birthday, Status, ...] columns,
table2 has [ID2, Country, Zip, ...],
table3 has [ID3, Source, Phone, ...]
you get the idea.
The ID, ID2, ID3 columns are common to all three tables... if there's an ID value in table1 it will also appear in table2 and table3. The number of rows in these tables is identical, about 10m rows in each table.
What I'd like to do is create a new table that contains (most of) the columns of all three tables and merge them into it.
The dates, for instance, must be converted because right now they're in VARCHAR YYYYMMDD format. Reading the MySQL manual I figured STR_TO_DATE() would do the job, but I don't know how to write the query itself in the first place so I have no idea how to integrate the date conversion.
So basically, after I create the new table (which I do know how to do), how can I merge the three tables into it, integrating into the query the date conversion?
Update:
The only thing that's unclear to me is how I can convert the dates within the query.
As far as I understand the query should be something like that:
INSERT INTO [new table]
SELECT table1.ID, table1.Name, table1.Birthday, table2.Country, table3.Phone
FROM table1
INNER JOIN table2 ON table1.ID = table2.ID2
INNER JOIN table3 ON table1.ID = table3.ID3;
...but how can I convert the dates within it? Or for that matter, apply any function to a field before it's inserted? For instance how can I convert the Birthday field before inserting it using STR_TO_DATE()? Where do I put it?
STR_TO_DATE(table1.Birthday, '%Y%m%d')
[Err I figured just replace "table1.Birthday" with "STR_TO_DATE(table1.Birthday, ...)"? Is that correct?]
Looks like you want an INSERT SELECT query along the lines of:
INSERT INTO [new table]
SELECT [values]
FROM table1
INNER JOIN table2 on table1.ID = table2.ID2
INNER JOIN table3 ON table1.ID = table3.ID3;
Where you fill in [new table] as the name of the new table and [values] as the values you want in the new table.
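For the date conversion asked about in the question, the function call can go directly into the select list in place of the raw column. In this sketch new_table is a hypothetical name for the target table, and its Birthday column is assumed to be of type DATE.

```sql
INSERT INTO new_table (ID, Name, Birthday, Country, Phone)
SELECT
    table1.ID,
    table1.Name,
    STR_TO_DATE(table1.Birthday, '%Y%m%d'),  -- VARCHAR 'YYYYMMDD' to DATE
    table2.Country,
    table3.Phone
FROM table1
INNER JOIN table2 ON table1.ID = table2.ID2
INNER JOIN table3 ON table1.ID = table3.ID3;
```

Any expression can be used in the select list this way. Values that do not parse as dates may come through as NULL or raise an error depending on the SQL mode, so checking a sample of the source column first is a reasonable precaution.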
Here are the relevant parts of the manual for more details.
INSERT...SELECT syntax - for details of the INSERT SELECT statement
JOIN syntax - for details on JOINing tables in queries