I have a SELECT query in MySQL that takes 25-30 s, which is extremely long, and I was wondering if you could help me speed it up.
CREATE TEMPORARY TABLE results(
id VARCHAR(30),
secondid VARCHAR(5),
allele VARCHAR(30),
translation VARCHAR(10),
level VARCHAR(20),
subgroup VARCHAR(20),
subgroup2 VARCHAR(20)
);
INSERT INTO results(id, secondid, allele, level) SELECT DISTINCT t1.id, t1.secondid, t1.texte, t3.texte
FROM database t1
JOIN database t2 ON t1.id=t2.id
JOIN database t3 ON t1.id=t3.id AND t1.secondid=t3.secondid
WHERE (t1.qualifier,t2.qualifier) = ("allele","organism") AND t3.qualifier = "level_length" AND t3.texte NOT REGEXP "X" AND t3.texte IS NOT NULL
AND t2.texte = ? AND t1.texte REGEXP ?
GROUP BY t1.texte;
UPDATE results SET translation = (SELECT t1.qualifier
FROM database t1
JOIN database t2 ON t1.id=t2.id AND t1.secondid=t2.secondid
JOIN database t3 ON t1.id=t3.id AND t1.secondid=t3.secondid
WHERE t1.qualifier IN ("protein","ncRNA","rRNA") AND t2.texte=results.allele AND t3.texte=results.level LIMIT 1);
UPDATE results SET subgroup = (SELECT t2.subgrp
FROM alleledb.alleleSubgroups t1
JOIN alleledb.subgroups t2 ON t1.subgroup=t2.subgroup
WHERE t1.gene=SUBSTRING_INDEX(results.allele, "*", 1) AND t1.species=? LIMIT 1);
ALTER TABLE results DROP id, DROP secondid;
SELECT * FROM results ORDER BY subgroup ASC, level ASC;
DROP TABLE results;
I need to join across several databases on the same id. The databases are huge, but the results to extract are few (less than 1% of all the data). Most of the results are stored in the same database, in different rows (with the same id and secondid). However, neither id nor secondid alone is unique to the rows I need to select; only the combination of the two is.
Thank you.
I would start by creating a proper composite index on your database table.
First, on
(qualifier, id, secondid, texte)
This will help your joins and the WHERE tests, and the engine will NOT have to go back to the raw table data for the records, since the index already contains the columns you are interested in.
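To illustrate the covering-index idea above, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for MySQL. That substitution is an assumption made only so the example runs anywhere; MySQL's planner and EXPLAIN output differ. The table name db_rows, the index name idx_qualifier_cover and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE db_rows (id TEXT, secondid TEXT, qualifier TEXT, texte TEXT)"
)
conn.executemany(
    "INSERT INTO db_rows VALUES (?, ?, ?, ?)",
    [
        ("A1", "1", "allele", "ECK*01"),
        ("A1", "1", "level_length", "full"),
        ("A1", "0", "organism", "human"),
    ],
)

# Composite index with the filter column first, then the join/output columns.
conn.execute(
    "CREATE INDEX idx_qualifier_cover "
    "ON db_rows (qualifier, id, secondid, texte)"
)

# Ask the planner how it would run a query that only touches indexed columns.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id, secondid, texte FROM db_rows WHERE qualifier = 'allele'"
).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)

# An index can be removed again at any time without touching the data.
conn.execute("DROP INDEX idx_qualifier_cover")
```

In MySQL the equivalent statements would be CREATE INDEX ... ON ... and DROP INDEX ... ON ..., and EXPLAIN would be the tool to confirm whether the index is actually used.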
Next, I would adjust the query/joins. Since you are specifically looking for the "allele" and "organism" from t1 and t2 respectively, make them as such.
I have no idea what you are doing with your REGEXP "X" or "?" values for texte, but you'll figure that out after.
Here is how I would revise the queries
insert into ...
SELECT DISTINCT
t1.id,
t1.secondid,
t1.texte,
t3.texte
FROM
database t1
JOIN database t2
ON t1.id = t2.id
AND t2.qualifier = 'organism'
JOIN database t3 ON
t1.id = t3.id
AND t1.secondid = t3.secondid
AND t3.qualifier = 'level_length'
WHERE
t1.qualifier = 'allele'
AND t1.texte REGEXP ?
-- I would move these t2 and t3 into the respective JOINs above directly.
AND t3.texte NOT REGEXP "X"
AND t3.texte IS NOT NULL
AND t2.texte = ?
GROUP BY
t1.texte;
As for your UPDATE commands, a second index on (id, secondid) will help the joins to t2 and t3, since those joins have no qualifier context.
Also, as Rick mentioned, without an ORDER BY clause you have no guarantee WHICH record is returned by the LIMIT 1.
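A minimal sketch of making the LIMIT 1 deterministic by adding an ORDER BY inside the correlated subquery. It uses Python's sqlite3 as a stand-in for MySQL (an assumption), and the gene_subgroups table, its columns and the sort key are hypothetical; the real ordering rule depends on which subgroup you want to win.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE results (allele TEXT, subgroup TEXT);
CREATE TABLE gene_subgroups (gene TEXT, subgrp TEXT);
INSERT INTO results (allele) VALUES ('ECK*01');
-- Two candidate subgroups for the same gene, inserted in "wrong" order.
INSERT INTO gene_subgroups VALUES ('ECK', 'group_b'), ('ECK', 'group_a');
""")

# The ORDER BY decides which of the candidate rows LIMIT 1 keeps.
# substr/instr play the role of MySQL's SUBSTRING_INDEX(allele, '*', 1).
conn.execute("""
UPDATE results SET subgroup = (
    SELECT g.subgrp
    FROM gene_subgroups g
    WHERE g.gene = substr(results.allele, 1, instr(results.allele, '*') - 1)
    ORDER BY g.subgrp ASC
    LIMIT 1
)
""")

chosen = conn.execute("SELECT subgroup FROM results").fetchone()[0]
print(chosen)
```

Without the ORDER BY, either group_a or group_b could legally be returned; with it, the alphabetically first subgroup is always chosen.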
First of all, thank you for all your help.
My first table (the one named database, used by the INSERT and the first UPDATE) looks like this:
I want everything in red. In other words, I need some parameters that have the same id and secondid as the "level", which is unique within the id, whereas other parameters may be repeated within the same id (but with a different secondid).
I am filtering on the allele name (ECK in the EC locus) with the REGEXP, and on species. For example, all alleles from the EC locus of human.
Then (last update), I take one parameter (allele), take a substring of it, and go to a database that gives me one id (one row -> one id). I then use this id on another database that gives me one or two rows (one subgroup, or rarely two). Since my example has only one group, the absence of ORDER BY went unnoticed. But yes, I want to order (get the subgroup that contains the allele first). I don't know how to do that.
Finally, I can try to create an index, but given the size of the db, I'm wondering how long it would take to build and how large it would be. Would it significantly improve the time, and can I remove it afterwards?
The REGEXP "X" removes every match that is not relevant for this parameter (I don't want them).
The ? is user input (the species, which occurs twice, and the locus).
The operations on the first database take 30s; the last operation on the two databases takes 1-2s. The others (drop, select) are <20ms (not the problem).
I need to check two tables and find inconsistencies, i.e. rows where the value in table T1 is not present in the italy_cities table. I'll explain:
T1: Includes personal data (with place of birth)
italy_cities: Includes all the municipalities of Italy.
Table T1 has about 9000 tuples.
italy_cities has 7,903 tuples.
Using "NOT IN" the query takes approximately 16 seconds to execute.
Here is the query:
SELECT
`T1`.*
FROM
T1
WHERE
(
`T1`.place NOT IN ( SELECT municipality FROM italy_cities )
)
MY QUESTION
What is the best and fastest way to check for inconsistencies, i.e. to find all the "incorrect" municipalities that do not exist in the official database?
Thanks in advance
I generally recommend NOT EXISTS for this purpose:
SELECT T1.*
FROM T1
WHERE NOT EXISTS (SELECT 1
FROM italy_cities ic
WHERE T1.place = ic.municipality
);
Why? There are two reasons:
NOT IN does not do what you expect if the subquery returns any NULL values. If even one value is NULL all rows end up being filtered out.
This version of the query can take advantage of an index on italy_cities(municipality) which seems like a reasonable index on the table.
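The NULL pitfall can be reproduced with a small sketch. It uses Python's sqlite3 as a stand-in for MySQL (an assumption; both follow standard three-valued logic here), and the sample names and the misspelled place are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (name TEXT, place TEXT);
CREATE TABLE italy_cities (municipality TEXT);
INSERT INTO T1 VALUES ('Anna', 'Roma'), ('Luca', 'Rmoa');
-- One NULL municipality is enough to break NOT IN.
INSERT INTO italy_cities VALUES ('Roma'), ('Milano'), (NULL);
CREATE INDEX idx_municipality ON italy_cities (municipality);
""")

# 'Rmoa' NOT IN ('Roma', 'Milano', NULL) evaluates to NULL, not TRUE,
# so the row is filtered out.
not_in_rows = conn.execute(
    "SELECT name FROM T1 "
    "WHERE place NOT IN (SELECT municipality FROM italy_cities)"
).fetchall()

# NOT EXISTS only asks whether a matching row exists, so NULLs are harmless.
not_exists_rows = conn.execute(
    "SELECT name FROM T1 WHERE NOT EXISTS "
    "(SELECT 1 FROM italy_cities ic WHERE ic.municipality = T1.place)"
).fetchall()

print(not_in_rows)
print(not_exists_rows)
```

The NOT IN query returns nothing, while NOT EXISTS reports the row with the misspelled place.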
NOT EXISTS can perform better, but there is also another way, a LEFT JOIN, as follows:
SELECT T1.*
FROM T1
LEFT JOIN italy_cities I ON I.municipality = T1.PLACE
WHERE I.municipality IS NULL;
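A minimal sketch of the LEFT JOIN ... IS NULL anti-join, again with Python's sqlite3 standing in for MySQL (an assumption) and made-up sample rows. Unmatched T1 rows come back with NULL in the joined columns, and the WHERE clause keeps only those.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T1 (name TEXT, place TEXT);
CREATE TABLE italy_cities (municipality TEXT);
INSERT INTO T1 VALUES ('Anna', 'Roma'), ('Luca', 'Rmoa'), ('Sara', 'Milano');
INSERT INTO italy_cities VALUES ('Roma'), ('Milano');
""")

# Rows of T1 with no partner in italy_cities have I.municipality = NULL.
left_join_rows = conn.execute(
    "SELECT T1.name FROM T1 "
    "LEFT JOIN italy_cities I ON I.municipality = T1.place "
    "WHERE I.municipality IS NULL "
    "ORDER BY T1.name"
).fetchall()
print(left_join_rows)
```

One caveat of this form: if italy_cities contained duplicate municipalities, matched rows would be duplicated by the join, but since only unmatched rows are kept, the result is unaffected.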
I'm trying to learn sql better, views more specifically but I can't get the following to work out for me.
I've put a slimmed down version of it here. There's more joins I have to do based on foreign keys from the tbl2 matches.
Since it's a view, I can't create temp tables.
I can't rely on stored procedures in this case.
I could do outer apply, but only to get specific references (row 1, 2...) and that would be by doing a Select * from Table2 where.... and that would mean 1 index scan per time I use it.
I could create the view using "WITH tbl2 (FK_TABLE1...) AS (SELECT FK_TABLE1 FROM dbo.TABLE2)", but that doesn't seem to be helpful. Each reference to it does a sort or a scan, so no gain there.
Is there some way I'm able to create some type of list that I can reuse so I can simply just run 1 index scan to get the matching ones from Table2?
Or is there another way to think about this?
Table1 (PK, XX, YY)
Table2 (PK, FK_TABLE1, Type, Progress, ZZ, FK_Status)
Create View MyView
as
Select
Table1.PK
,Table1.XX
,Table1.YY
---- I want to present data from the first 3 matches
,(SELECT ZZ from tbl2 where tbl2.FK_TABLE1 = Table1.PK ORDER BY Type ASC OFFSET(0) ROWS FETCH NEXT (1) ROWS ONLY) ZZ1
,(SELECT ZZ from tbl2 where tbl2.FK_TABLE1 = Table1.PK ORDER BY Type ASC OFFSET(1) ROWS FETCH NEXT (1) ROWS ONLY) ZZ2
,(SELECT ZZ from tbl2 where tbl2.FK_TABLE1 = Table1.PK ORDER BY Type ASC OFFSET(2) ROWS FETCH NEXT (1) ROWS ONLY) ZZ3
,sts.StatusName CurrentStatus
From Table1
LEFT OUTER JOIN Table2 AS tbl2 ON (tbl2.FK_TABLE1= Table1.PK) ---- Here I want to make some sort of join so I get all matching rows from the other table
LEFT OUTER JOIN STATUS AS sts ON (sts.PK = [tbl2 ordered by type, if last elements status = X take that, else status of first).FK_STATUS) ---- Here I'm a bit puzzled, since I have to order by, but also have a fallback value if last element isn't matching.
I am relatively new to the SQL language. I can do a basic select, but to improve performance, I'd love to know if it is possible to merge the two queries I am doing at the moment into one.
Scenario: There are two tables. Table one has a few columns; one of them is a VARCHAR(45) named 'user', and another is an INT called 'gid'. In the second table, there is a primary key column called 'gid' (INT) and a column called 'permissions', which is a TEXT column containing values separated by ';'.
I have a user name, and want the text in the permissions column. The current way I do it is by fetching the gid of the first table, then doing a second query with the gid to get the permissions.
I've heard there are other ways to do this, and I have searched on Google, but I'm not sure what I should do.
EDIT:
Like this:
select t2.permissions
from table1 t1, table2 t2
where t1.user = '<SPECIFIED NAME>'
and t1.gid = t2.gid;
or you could use INNER JOIN syntax:
select t2.permissions
from table1 t1
inner join table2 t2 on t1.gid = t2.gid
where t1.user = '<SPECIFIED VALUE>'
To do this you use a JOIN. A join connects two tables in a select statement.
Like this
select *
from usertable u
join permissiontable p on u.gid = p.gid
This will give you all the columns from both tables with the id column joined. You can treat the joined table just like any table (eg select a sub-set of columns in the select list, add a where clause, etc).
You can read more about joins in any intro sql book or doing a google search.
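As a runnable illustration of replacing the two lookups with one joined query, here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL (an assumption). The table and column names follow the question; the sample users and permission strings are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (user TEXT, gid INTEGER);
CREATE TABLE table2 (gid INTEGER PRIMARY KEY, permissions TEXT);
INSERT INTO table2 VALUES (1, 'read;write'), (2, 'read');
INSERT INTO table1 VALUES ('alice', 1), ('bob', 2);
""")

# One round trip: look the user up and follow gid to the permissions row.
row = conn.execute(
    "SELECT t2.permissions "
    "FROM table1 t1 "
    "INNER JOIN table2 t2 ON t1.gid = t2.gid "
    "WHERE t1.user = ?",
    ("alice",),
).fetchone()

# The permissions column holds ';'-separated values, so split it client-side.
permissions = row[0].split(";") if row else []
print(permissions)
```

Passing the user name as a bound parameter (the ? placeholder) rather than concatenating it into the SQL string also avoids quoting problems.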
I have a pretty complicated query. I have a table of car parts (parts_list, about 600 rows) which holds some information about each part, like its name and whether it is motor dependent or not. I have two different queries depending on motor dependency; this is the one for no motor dependency (0, I store it as a boolean). Most parts can be disassembled into more parts, so I store the parts as a tree, and in this query I take only the parts that can't be disassembled (the tree leaves). This table represents only the list of possible parts.
For each car model I save a row in another table (parts), linking parts_list_id and model_id together with the price and quantity. When I run the query, it successfully generates about 500 rows (only the leaf parts) in the table "parts" for the model id, which is what I need.
But sometimes I generate a row for another model for a specific part, and then the query doesn't create the remaining 499 rows. It only works if the SELECT inside WHERE NOT EXISTS returns 0 rows. If even one row exists, it doesn't insert the rest. That doesn't make sense to me: shouldn't it check against different values for each row, like a loop?
INSERT INTO parts (parts_list_id, model_id, motor_id)
SELECT orig1.id, '" . $this->model_id . "', '0'
FROM parts_list AS orig1
LEFT JOIN parts_list AS orig2 ON ( orig1.id = orig2.parent_id )
WHERE orig2.id IS NULL
AND orig1.motor_dependent = '0'
AND NOT EXISTS (
SELECT t1.id
FROM parts_list AS t1
LEFT JOIN parts_list AS t2 ON ( t1.id = t2.parent_id )
LEFT JOIN parts ON ( parts.parts_list_id = t1.id )
WHERE t2.id IS NULL
AND t1.motor_dependent = '0'
AND parts.parts_list_id = t1.id
AND parts.model_id = :model_id
)
Well, the SQL statement itself is valid. Because the subquery does not reference the outer row (orig1), it returns the same result for every candidate row. If it returns even one row, NOT EXISTS evaluates to false, which is the correct behavior, so no rows are inserted. Perhaps you want a different check, using orig1.id NOT IN (your subquery) instead of NOT EXISTS.
NOT EXISTS expresses a condition on the set as a whole, while NOT IN tests whether a particular value is absent from the set.
I hope I got it right. It is a little difficult to understand your domain from a single statement.
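The difference can be seen in a small sketch, written with Python's sqlite3 as a stand-in for MySQL (an assumption) and a simplified, hypothetical schema without the tree and motor_dependent columns. It compares an uncorrelated NOT EXISTS, a correlated NOT EXISTS, and NOT IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parts_list (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE parts (parts_list_id INTEGER, model_id INTEGER);
INSERT INTO parts_list VALUES (1, 'bolt'), (2, 'nut'), (3, 'washer');
-- One part already exists for model 7.
INSERT INTO parts VALUES (2, 7);
""")
model_id = 7

# Uncorrelated: the subquery ignores the outer row, so a single existing
# row for the model makes NOT EXISTS false for every candidate.
uncorrelated = conn.execute(
    "SELECT pl.id FROM parts_list pl WHERE NOT EXISTS "
    "(SELECT 1 FROM parts p WHERE p.model_id = ?) ORDER BY pl.id",
    (model_id,),
).fetchall()

# Correlated: the subquery is tied to the outer row via pl.id,
# so it is evaluated per candidate part.
correlated = conn.execute(
    "SELECT pl.id FROM parts_list pl WHERE NOT EXISTS "
    "(SELECT 1 FROM parts p "
    " WHERE p.parts_list_id = pl.id AND p.model_id = ?) ORDER BY pl.id",
    (model_id,),
).fetchall()

# NOT IN: checks each candidate id against the set of existing ids.
not_in = conn.execute(
    "SELECT pl.id FROM parts_list pl WHERE pl.id NOT IN "
    "(SELECT p.parts_list_id FROM parts p WHERE p.model_id = ?) "
    "ORDER BY pl.id",
    (model_id,),
).fetchall()

print(uncorrelated, correlated, not_in)
```

Only the uncorrelated form returns nothing; the other two return the parts that are still missing for the model. NOT IN is safe here only as long as parts_list_id is never NULL.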
I currently have a table which has these columns:
id (INT)
parent_id (INT)
col0
col1
col2
As an example there are the following entries saved in this table:
id  parent_id  col0  col1  col2
1   NULL       abc   def   NULL
2   1          test  NULL  NULL
3   1          NULL  NULL  xyz
Now I'd like to search in all rows A which haven't got any rows B which are pointing to them (B.parent_id = A.id). In addition the row values should be either the ones that are present in the current row or if there is a NULL, the values of the parent should be considered.
To illustrate my requirements I'd like to show some examples:
SEARCH(col0=test) => #2 (#1 has some children, #3.col0 = abc (inherited from #1))
SEARCH(col1=def) => #2, #3 (#1 has some children)
SEARCH(col2=xyz) => #3 (#1 has some children, #2.col2 = NULL (inherited from #1))
Does anyone know how to implement such a search in MySQL?
SELECT
t1.id,
# if the row itself has no value (NULL), use the parent's value
COALESCE(t1.col0, t2.col0) AS virtcol0,
COALESCE(t1.col1, t2.col1) AS virtcol1,
COALESCE(t1.col2, t2.col2) AS virtcol2
FROM `table` AS t1
LEFT JOIN `table` AS t2 ON t1.parent_id = t2.id
LEFT JOIN `table` AS t3 ON t1.id = t3.parent_id
# t3 would be children of t1. We don't want t1 to procreate. :)
WHERE t3.id IS NULL
# Your actual search goes here. Column aliases cannot be used in WHERE,
# so filter on virtcol0, virtcol1 or virtcol2 in HAVING:
HAVING virtcol0 = 'test'
Fast? No. The best index use you can get out of this is for the joins on id/parent_id.
If you have a lot of data and small result sets, you can query the columns directly on an index and then run checks for parents and children in separate queries. That'd be a lot faster than running the above query on a huge table.
The cleanest solution would probably be to create a view with the parent and child columns merged:
CREATE VIEW foo
AS SELECT
c.id AS id,
COALESCE(c.col0, p.col0) AS col0,
COALESCE(c.col1, p.col1) AS col1,
COALESCE(c.col2, p.col2) AS col2
FROM table AS c
LEFT JOIN table AS p ON p.id = c.parent_id
WHERE NOT EXISTS
(SELECT * FROM table AS x WHERE c.id = x.parent_id)
Then you can write queries against this view as if it were a normal table.
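A minimal sketch of this view, checked against the three example searches from the question. It uses Python's sqlite3 as a stand-in for MySQL (an assumption); the table name items, the view name merged_leaves and the search helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id INTEGER PRIMARY KEY, parent_id INTEGER,
    col0 TEXT, col1 TEXT, col2 TEXT
);
INSERT INTO items VALUES
    (1, NULL, 'abc',  'def', NULL),
    (2, 1,    'test', NULL,  NULL),
    (3, 1,    NULL,   NULL,  'xyz');

-- Leaf rows only, with NULL columns filled in from the parent row.
CREATE VIEW merged_leaves AS
SELECT c.id AS id,
       COALESCE(c.col0, p.col0) AS col0,
       COALESCE(c.col1, p.col1) AS col1,
       COALESCE(c.col2, p.col2) AS col2
FROM items AS c
LEFT JOIN items AS p ON p.id = c.parent_id
WHERE NOT EXISTS (SELECT 1 FROM items AS x WHERE x.parent_id = c.id);
""")


def search(column, value):
    """Return the ids of leaf rows whose merged column equals value."""
    # The column name is interpolated into the SQL, so only allow known names.
    if column not in ("col0", "col1", "col2"):
        raise ValueError(f"unknown column: {column}")
    rows = conn.execute(
        f"SELECT id FROM merged_leaves WHERE {column} = ? ORDER BY id",
        (value,),
    )
    return [row[0] for row in rows]


print(search("col0", "test"))
print(search("col1", "def"))
print(search("col2", "xyz"))
```

The three calls reproduce the expected results from the question: row 2, rows 2 and 3, and row 3.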
However, as Mantriur notes, this will not be very efficient. If the table doesn't change very often, you could use CREATE TABLE ... SELECT instead of CREATE VIEW to create an actual table containing the merged data and create some indexes on it so that it can be efficiently queried. However, such a table won't track changes to the original table the way a view does.
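The trade-off of a materialized copy can be sketched as follows, using Python's sqlite3 as a stand-in for MySQL (an assumption; SQLite's CREATE TABLE ... AS SELECT plays the role of MySQL's CREATE TABLE ... SELECT). The names items, merged_snapshot and idx_snapshot_col1 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id INTEGER PRIMARY KEY, parent_id INTEGER,
    col0 TEXT, col1 TEXT, col2 TEXT
);
INSERT INTO items VALUES
    (1, NULL, 'abc',  'def', NULL),
    (2, 1,    'test', NULL,  NULL),
    (3, 1,    NULL,   NULL,  'xyz');

-- Materialize the merged leaf rows once and index the searched column.
CREATE TABLE merged_snapshot AS
SELECT c.id AS id,
       COALESCE(c.col0, p.col0) AS col0,
       COALESCE(c.col1, p.col1) AS col1,
       COALESCE(c.col2, p.col2) AS col2
FROM items AS c
LEFT JOIN items AS p ON p.id = c.parent_id
WHERE NOT EXISTS (SELECT 1 FROM items AS x WHERE x.parent_id = c.id);
CREATE INDEX idx_snapshot_col1 ON merged_snapshot (col1);
""")

# Change the parent row after the snapshot was taken.
conn.execute("UPDATE items SET col1 = 'changed' WHERE id = 1")

# The snapshot still answers with the old inherited value ...
snapshot_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM merged_snapshot WHERE col1 = 'def' ORDER BY id"
    )
]

# ... while a live query over the base table sees the change.
live_ids = [
    row[0]
    for row in conn.execute(
        "SELECT c.id FROM items AS c "
        "LEFT JOIN items AS p ON p.id = c.parent_id "
        "WHERE NOT EXISTS (SELECT 1 FROM items AS x WHERE x.parent_id = c.id) "
        "AND COALESCE(c.col1, p.col1) = 'def' ORDER BY c.id"
    )
]

print(snapshot_ids, live_ids)
```

The snapshot keeps returning rows 2 and 3 for col1 = 'def' even though the live data no longer matches, which is why such a table has to be rebuilt or kept in sync whenever the original changes.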
In principle, you could use triggers (or application logic) to update the merged table in real time as the underlying table changes, but this can easily get complicated and prone to errors. Unfortunately, while some other RDBMSes do support materialized views, which are essentially a way to do this automatically, MySQL currently does not.
(Well, not natively, anyway. There is Flexviews, although I haven't tried it myself.)