(Fast) searching in spite of row inheritance

(Fast) searching in spite of row inheritance - mysql

I currently have a table which has these columns:
id (INT)
parent_id (INT)
col0
col1
col2
As an example there are the following entries saved in this table:
1 NULL abc def NULL
2 1 test NULL NULL
3 1 NULL NULL xyz
Now I'd like to search in all rows A which haven't got any rows B which are pointing to them (B.parent_id = A.id). In addition the row values should be either the ones that are present in the current row or if there is a NULL, the values of the parent should be considered.
To illustrate my requirements I'd like to show some examples:
SEARCH(col0=test) => #2 (#1 has some children, #3.col0 = abc (inherited from #1))
SEARCH(col1=def) => #2, #3 (#1 has some children)
SEARCH(col2=xyz) => #3 (#1 has some children, #2.col2 = NULL (inherited from #1))
Does anyone know how to implement such a search in MySQL?

SELECT
# if first table has no value, use parent table
IF(t1.col0, t1.col0, t2.col0) as virtcol0,
IF(t1.col1, t1.col1, t2.col1) as virtcol1,
IF(t1.col2, t1.col2, t2.col2) as virtcol2
FROM table as t1
LEFT JOIN table as t2 ON t1.parent_id = t2.id
LEFT JOIN table as t3 ON t1.id = t3.parent_id
# t3 would be children of t1. We don't want t1 to procreate. :)
WHERE t3.id IS NULL
# Your actual search goes here:
AND virtcol0/1/2 = whatever
Fast? No. Best index use you can get out of this are the joins on id/parent_id.
If you have a lot of data and small result sets, you can query the columns directly on an index and then run checks for parents and children in separate queries. That'd be a lot faster than running the above query on a huge table.

The cleanest solution would probably be to create a view with the parent and child columns merged:
CREATE VIEW foo
AS SELECT
c.id AS id,
COALESCE(c.col0, p.col0) AS col0,
COALESCE(c.col1, p.col1) AS col1,
COALESCE(c.col2, p.col2) AS col2
FROM table AS c
LEFT JOIN table AS p ON p.id = c.parent_id
WHERE NOT EXISTS
(SELECT * FROM table AS x WHERE c.id = x.parent_id)
Then you can write queries against this view as if it were a normal table.
However, as Mantriur notes, this will not be very efficient. If the table doesn't change very often, you could use CREATE TABLE ... SELECT instead of CREATE VIEW to create an actual table containing the merged data and create some indexes on it so that it can be efficiently queried. However, such a table won't track changes to the original table the way a view does.
In principle, you could use triggers (or application logic) to update the merged table in real time as the underlying table changes, but this can easily get complicated and prone to errors. Unfortunately, while some other RDBMSes do support materialized views, which are essentially a way to do this automatically, MySQL currently does not.
(Well, not natively, anyway. There is Flexviews, although I haven't tried it myself.)

Related

Mysql select optimization (huge db)

I have a select request in MySQL that takes between 25-30s, which is extremely long and I was wondering if you could help me fasten it.
CREATE TEMPORARY TABLE results(
id VARCHAR(30),
secondid VARCHAR(5),
allele VARCHAR(30),
translation VARCHAR(10),
level VARCHAR(20),
subgroup VARCHAR(20),
subgroup2 VARCHAR(20)
);
INSERT INTO results(id, secondid, allele, level) SELECT DISTINCT t1.id, t1.secondid, t1.texte, t3.texte
FROM database t1
JOIN database t2 ON t1.id=t2.id
JOIN database t3 ON t1.id=t3.id AND t1.secondid=t3.secondid
WHERE (t1.qualifier,t2.qualifier) = ("allele","organism") AND t3.qualifier = "level_length" AND t3.texte NOT REGEXP "X" AND t3.texte IS NOT NULL
AND t2.texte = ? AND t1.texte REGEXP ?
GROUP BY t1.texte;
UPDATE results SET translation = (SELECT t1.qualifier
FROM database t1
JOIN database t2 ON t1.id=t2.id AND t1.secondid=t2.secondid
JOIN database t3 ON t1.id=t3.id AND t1.secondid=t3.secondid
WHERE t1.qualifier IN ("protein","ncRNA","rRNA") AND t2.texte=results.allele AND t3.texte=results.level LIMIT 1);
UPDATE results SET subgroup = (SELECT t2.subgrp
FROM alleledb.alleleSubgroups t1
JOIN alleledb.subgroups t2 ON t1.subgroup=t2.subgroup
WHERE t1.gene=SUBSTRING_INDEX(results.allele, "*", 1) AND t1.species=? LIMIT 1);
ALTER TABLE results DROP id, DROP secondid;
SELECT * FROM results ORDER BY subgroup ASC, level ASC;
DROP TABLE results;
I need to go through many dbs to get join (same id), database are huge but results to extract are quite low (less than 1% of all the database). The majority of the results are stored in the same db, in different rows (with the same id and secondid). However, id and secondid are not unique to the rows I need to select, only the combinaison of two is.
Thank you.

I would start by having a proper composite index on your database table
First on
(qualifier, id, secondid, texte)
This will help your joins, the where testing and NOT have to go back to the actual raw data tables for the records as the index has the data you are interested in.
Next, I would adjust the query/joins. Since you are specifically looking for the "allele" and "organism" from t1 and t2 respectively, make them as such.
I have no idea what you are doing with your REGEXP "X" or "?" values for texte, but you'll figure that out after.
Here is how I would revise the queries
insert into ...
SELECT DISTINCT
t1.id,
t1.secondid,
t1.texte,
t3.texte
FROM
database t1
JOIN database t2
ON t1.id = t2.id
AND t2.qualifier = 'organism'
JOIN database t3 ON
t1.id = t3.id
AND t1.secondid = t3.secondid
AND t3.qualifier = 'level_length'
WHERE
t1.qualifier = 'allele'
AND t1.texte REGEXP ?
-- I would move these t2 and t3 into the respective JOINs above directly.
AND t3.texte NOT REGEXP "X"
AND t3.texte IS NOT NULL
AND t2.texte = ?
GROUP BY
t1.texte;
As for your UPDATE commands, having a second index on (id, secondid) will help on the join to t2 and t3 since there is no qualifier context to the join.
As for your UPDATE commands, as Rick mentioned, without some context of an ORDER BY clause, you have no guarantee WHICH record is returned back by the LIMIT 1.

First of all, thank you for all your help.
My first table (The insert to and the first update, database named) looks like this :
I want all things in red. In others words, I need some parameters which has the same id and secondid as the "level" which is unique among the id. Whereas others parameters may be repeated within the same id (but different second id).
I am filtering using the allele name (ECK in EC locus) with thé REGEXP and species. For example, all allèles from EC locus of human.
Then (last update), I take one parameter (allele), substring it and go to a database that gives me one id (one row -> one id). And I use this id on annoter database that gives me one or two rows (one subgroup or two subgroups/rare). So as in my example I only has one group, the absence of ORDER BY was not seen. But yes I want to order (get the subgroup that contains the allele in first). I don't know how to do that.
Finally, I can try to make an index but due to the size of the db, I'm wondering the time and the size of such an index. Would it significally improve time and can I remove it ?
The REGEXP "X" is to remove every matches that are not relevant regarding this parameter (I don't want them).
The ? is user input (for the species/2 occurrences this one and the locus).
The operations on the first database takes 30s, last operation on the two databases lasts 1-2s. Others (drop , select) are <20ms (not the problem).

Alternative to NOT IN?

I need to check two tables and find inconsistencies, ie where the value of table T1 is not present in the italy_cities table. I'll explain:
T1: Includes personal data (with place of birth)
italy_city: Includes all the municipalities of Italy.
Table T1 has about 9000 tuples.
T2 has 7,903 tuples.
Using "NOT IN" the query takes approximately 16 seconds to execute.
Here is the query:
SELECT
`T1`.*
FROM
T1
WHERE
(
`T1`.place NOT IN ( SELECT municipality FROM italy_cities )
)
MY QUESTION
what is the best and fast option to check for inconsistencies? to check all the "incorrect" municipalities that do not exist in the official database?
Thanks in advance

I generally recommend NOT EXISTS for this purpose:
SELECT T1.*
FROM T1
WHERE NOT EXISTS (SELECT 1
FROM italy_cities ic
WHERE t1.place = ic.municipality
);
Why? There are two reasons:
NOT IN does not do what you expect if the subquery returns any NULL values. If even one value is NULL all rows end up being filtered out.
This version of the query can take advantage of an index on italy_cities(municipality) which seems like a reasonable index on the table.

Not exists can perform better but there is also another way which is left join as follows:
SELECT T1.*
FROM T1
LEFT JOIN italy_cities I ON I.municipality = T1.PLACE
WHERE I.municipality IS NULL;

MySQL INNER JOIN - how to add extra IF statement?

I just learned how to use JOINS (so please be gentle ;) ), and wrote this query:
SELECT 1
FROM `T1`
INNER JOIN `T2` ON `T1`.`T1_ID` = `T2`.`REQUIREMENT`
WHERE `T2`.`T1_ID` = XXX
AND `T1`.`STATEMENT` = YYY
AND `T2`.`REQUIREMENT` != 0 //last row does not work as intended!
It works perfectly without the last condition: if T1_ID from T1 does match REQUIREMENT from T2 for given T1_ID (XXX) - it does work. I wanted additional STATEMENT from T1 to be match (YYY) - and it still does work.
But then I realized, that I need to exclude one cause: when T2.REQUIREMENT is equal to 0, I want this query to return 1 regardless of the JOIN formula. The problem is, that if T2.REQUIREMENT = 0, I know for sure that there will not be any T1.T1_ID entry that will match the JOIN requirements. So I understand, that this last condition has no right to work like I'd wish it was.
What I need is some kind of IF statement. Something that would work like:
SELECT 1
IF (`T2`.`REQUIREMENT`!=0) //if true, don't even go to join, and return 1
OR (my previous join query)
The thing is, that I have no idea how to implement such IF statement into mysql.
Any ideas? Thanks.
Sample data:
T1:
id STATEMENT T1_ID
1 irrelevant 1
2 irrelevant 5
T2:
id T1_ID REQUIREMENT
1 1 0
2 2 0
3 3 1
4 4 3
5 5 4
6 6 5
7 7 6
Such setup should return 1 for T1_ID equal 1, 2, 3, 6.
In addition, if it's even possible in single query, I'd like it to return 1 as well even if T1 was empty, for all T2.REQUIREMENT=0 - in this case T1_ID equal 1, 2.

Just FYI, good start on your post and example... tableName.columnName references (or alias.columnName) should always be provided to prevent ambiguity that others don't know your table structure. Also, you only really need the tick marks for things like reserved words or column names that have spaces (never like these anyhow).
From my reading your question and sample tables T1 and T2... T1 appears to be some Lookup table and has IDs and descriptions associated to said IDs. Your T2 table appears to be your detail/transaction based table and it may or not have an actual requirement hence your desire to always include those records without a specific requirement.
If this is the case, it sounds like you want "all detail records that have some condition REGARDLESS of a matched requirement ID as found in the lookup table." If this is accurate, you would be looking for
Select
T2.T1_ID,
coalesce( T1.Statement, '' ) StatementFromT1Table
from
SomeMainTable T2
LEFT JOIN SomeLookupTable T1
on T2.T1_ID = T1.T1_ID
where
T2.T1_ID = SomeIdParameterValue
AND ( T1.T1_ID is NULL
OR T1.Statement = SomeOtherParameterValue )
The join between tables I always try to list the left-side table first, then indent to the right-side table and have my ON condition show the left.column = right.column so you always see the relationship and how table A gets to table B (and nested more as other joins come into play).
The different between (INNER) JOIN and LEFT JOIN is that (INNER) JOIN REQUIRES a record to always match on both tables. LEFT JOIN means I want everything from the table on the left side REGARDLESS of an actual match in the right table.
So at this point, I get all from T2 alias table first regardless of the answer in T1 alias. Now, how to deal with the zero remarks id value. If zero indicates you KNOW there wont be a match in T1, then you can just say I want all records on the T2 side if the T1 side IS NULL... But you also care for a specific statement, hence the OR within the parenthesis test.
The first part of the where is if you were specifically looking for all things of a T1_ID = some value, so that is a primary parameter and only applicable to the T2 side... you are qualifying the T1 side via the AND ( null or other equality test ) condition.
If you have some confidential data, it is ok to randomize / provide sample data, but having real table name reference / context will help us better understand what you are trying to accomplish. Ambiguity in table names and columns does not help us mentally understand and might have better query solutions having a better understanding.
If I am close and you need additional clarification, please advise and/or edit your original post with additional sample data and final output... such as parameters being filtered for too.
POST CLARIFICATION.
Per your comments, here are the clarifications...
The "SomeMainTable T2" is actually a breakdown of the actual table name within your database and "T2" is the ALIAS reference. Imagine your table name is "SomethingReallyLongDetail". Would you prefer to write your query something like
select
SomethingReallyLongDetail.Column1,
SomethingReallyLongDetail.Column7,
SomethingReallyLongDetail.Column20
from
SomethingReallyLongDetail
OR...
select
SRL.Column1,
SRL.Column7,
SRL.Column20
from
SomethingReallyLongDetail SRL
In this case, I used an alias "SRL" to more easily correlate to the table name as an acronym / abbreviation vs having to type the long value over and over, then have more chance of typing mistakes. Simply for readability providing the "alias" reference within the query. So, I did not know your ACTUAL table name, so I made it up but using the "T1" and "T2" references to stay in-line with your original post.
Next, COALESCE(). Since this query does a LEFT-JOIN, The right-side table may (or not) actually have a record match on the ID as you know might not always exist. Since I was trying to pull the "Statement" column from that second table (alias T1), that description could be NULL which you probably would not want to show in any sort of output. To prevent that, COALESCE() says, give me the value from the first parameter in the list... If that value is null, give me the second value. In this case the second value is just an empty string.
Parameters in the query. Your original query had reference to XXX and YYY such as you knew of a specific T1.ID value you wanted to narrow down to pulling out, but a different value YYY as being part of the statement. So the place where you had an "XXX", I just put a place-holder here for you to apply/put any value you were specifically looking for. Similarly for your "YYY" value, another place-holder for that. Just substitute whatever criteria you were looking for.
Finally that AND part of the where clause. This is for the condition of the LEFT-JOIN. Since you KNOW that not all records will have a match in the "T1" secondary table, with the LEFT JOIN, the ID will be found and HAVE a value, or there will not be a value and thus NULL.
If there is no matching record, you would never be able to compare some string, int, date, whatever to a column as it would be null. So I am doing
(T1.T1_ID IS NULL -- as in there was no match
OR T1.Statement = SomeOtherParameterValue ) -- there WAS a match, and I only want where the statement equals a given value.
Per your comments and example results, your query SHOULD be simplified to...
Select
T2.ID,
T2.T1_ID,
T2.Requirement,
coalesce( T1.Statement, '' ) StatementFromT1Table
from
T2
LEFT JOIN T1
on T2.Requirement = T1.T1_ID
where
T2.Requirement = 0
OR T1.T1_ID IS NOT NULL
In your case, the final answer is... I want all records where there is no requirement (thus = 0) OR the record DOES have a match in the T1 table (thus T1.T1_ID IS NOT NULL)

I am thinking that you want a LEFT JOIN:
SELECT 1
FROM `T2` LEFT JOIN
`T1`
ON `T1`.`T1_ID` = `T2`.`REQUIREMENT`
WHERE (`T2`.`REQUIREMENT` <> 0) OR
(`T1`.`STATEMENT` = YYY AND `T2`.`T1_ID` = XXX);
This returns rows for all T2 values where REQUIREMENT != 0. It also returns the rows generated by the JOIN. Of course 1 is not very descriptive, so you want be able to tell which rows are which.
Your question would be much easier to follow with sample data and desired results.

if T2.REQUIREMENT=0 I know for sure, that there will not be any
T1.T1_ID entry that will match the JOIN requirements
So in order to get 1 returned when T2.REQUIREMENT=0, the join must match this condition too:
SELECT 1
FROM `T1` INNER JOIN `T2`
ON `T1`.`T1_ID` = `T2`.`REQUIREMENT` OR `T2`.`REQUIREMENT`=0
WHERE `T2`.`T1_ID`=XXX
AND `T1`.`STATEMENT`=YYY
Edit:
or just append 1s with UNION for all rows that have T2.REQUIREMENT=0:
SELECT 1
FROM `T1` INNER JOIN `T2`
ON `T1`.`T1_ID` = `T2`.`REQUIREMENT`
WHERE `T2`.`T1_ID`=XXX
AND `T1`.`STATEMENT`=YYY
UNION ALL
SELECT 1
FROM `T2`
WHERE `T2.REQUIREMENT=0`
this will work even if T1 is empty.

Mysql view multiple join from other table, without several lookups

I'm trying to learn sql better, views more specifically but I can't get the following to work out for me.
I've put a slimmed down version of it here. There's more joins I have to do based on foreign keys from the tbl2 matches.
Since it's a view, I can't create temp tables.
I can't rely on stored procedures in this case.
I could do outer apply, but only to get specific references (row 1, 2...) and that would be by doing a Select * from Table2 where.... and that would mean 1 index scan per time I use it.
I could create the view using "With tbl2 (FK_TABLE1...) as SELECT FK_TABLE1 from dbo.TABLE2) but that doesn't seem to be helpful. Each reference to it does a sort or a scan so no gain there.
Is there some way I'm able to create some type of list that I can reuse so I can simply just run 1 index scan to get the matching ones from Table2?
Or is there another way to think about this?
Table1 (PK, XX, YY)
Table2 (PK, FK_TABLE1, Type, Progress, ZZ, FK_Status)
Create View MyView
as
Select
Table1.PK
,Table1.XX
,Table1.YY
---- I want to present data from the first 3 matches
,(SELECT ZZ from tbl2 where tbl2.FK_TABLE1 = FK_TABLE1.PK ORDER BY Type ASC OFFSET(0) ROWS FETCH NEXT (1) ROWS ONLY) ZZ1
,(SELECT ZZ from tbl2 where tbl2.FK_TABLE1 = FK_TABLE1.PK ORDER BY Type ASC OFFSET(1) ROWS FETCH NEXT (1) ROWS ONLY) ZZ2
,(SELECT ZZ from tbl2 where tbl2.FK_TABLE1 = FK_TABLE1.PK ORDER BY Type ASC OFFSET(2) ROWS FETCH NEXT (1) ROWS ONLY) ZZ3
,sts.StatusName CurrentStatus
From Table1
LEFT OUTER JOIN Table2 AS tbl2 ON (tbl2.FK_TABLE1= Table1.PK) ---- Here I want to make some sort of join so I get all matching rows from the other table
LEFT OUTER JOIN STATUS AS sts ON (sts.PK = [tbl2 ordered by type, if last elements status = X take that, else status of first).FK_STATUS) ---- Here I'm a bit puzzled, since I have to order by, but also have a fallback value if last element isn't matching.

MySQL Insert where not exists query won't insert if even one row exists

I am having a pretty complicated query. What I have is a table with car parts (parts_list, about 600 rows) which contains some information about the part like it's name and if it's motor dependent or not. For motor dependency I am having two different queries, so this is the one with no motor dependency (0, I save it as a boolean). Most of the parts can be disassembled and broken into more parts, that's why I am saving the parts as a tree and in that query I take only the parts that can't be disassembled (tree leaves). This table represents only the list of possible parts. Now for each car model I save a row in another table (parts) and I stick the parts_list_id and model_id together and then the price and quantity. Now if I run the query it will successfully generate about 500 rows(taking only the leaf parts) in the table "parts" and it will do what I need. About 500 (leaf) parts for the model id. But sometimes I generate a row for another model for a specific part. And then the query doesn't make the rest 499 rows. It only works if WHERE NOT EXISTS select query gives back 0. If even one row exists it doesn't insert the rest. But it doesn't make sense to me, because shouldn't it check with different values like a loop?
INSERT INTO parts (parts_list_id, model_id, motor_id)
SELECT orig1.id, '" . $this->model_id . "', '0'
FROM parts_list AS orig1
LEFT JOIN parts_list AS orig2 ON ( orig1.id = orig2.parent_id )
WHERE orig2.id IS NULL
AND orig1.motor_dependent = '0'
AND NOT EXISTS (
SELECT t1.id
FROM parts_list AS t1
LEFT JOIN parts_list AS t2 ON ( t1.id = t2.parent_id )
LEFT JOIN parts ON ( parts.parts_list_id = t1.id )
WHERE t2.id IS NULL
AND t1.motor_dependent = '0'
AND parts.parts_list_id = t1.id
AND parts.model_id = :model_id
)

Well, the sql statement seems fine. If there is one row, NOT EXISTS returns false. And that is correct. No one row is inserted. Perhaps you want to put some different checks using parts_list_id NOT IN (your subquery) instead of NOT EXISTS.
NOT EXISTS express a condition for the set as a whole, NOT IN is used to determine if the set has the right items.
I hope to get it right. It is a little bit difficult to understand your domain just from a single statement.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008