Will a key in sql still stay a key in a view - mysql

Let's say I have a mysql table called FISH with fields A, B and C.
I run SELECT * FROM FISH. This gets me a view with all fields. So, if A was a key in the original table, is it also a key in the view? Meaning, if I have a table FISH2, and I ran
SELECT * FROM (SELECT * FROM FISH) D, (SELECT * FROM FISH2) E WHERE D.A = E.A
Will the relevant fields still be keys?
Now, let's take this 1 step further. If I run
SELECT * FROM (SELECT CONCAT(A,B) AS DUCK, C FROM FISH) D, (SELECT CONCAT(A,B) AS DUCK2, C FROM FISH2) E WHERE D.DUCK = E.DUCK2
If A and B were keys in the original tables, will their concatenation also be a key?
Thanks :)

If A is a key in fish, any projection on fish alone will produce a result set in which A is still unique.
A join between table fish and any table with 1:1 relation (such as fish_type) will produce a result set where A is unique.
A join with another table that has a 1:M or M:M relation from fish (such as fish_baits) will NOT produce a result where A is unique, unless you provide a filter predicate on the "other" side (such as bait='Dynamite').
SELECT * FROM (SELECT * FROM FISH) D, (SELECT * FROM FISH2) E WHERE D.A = E.A
...is logically equivalent to the following statement, and most databases (including MySQL) will perform the transformation:
select *
from fish
join fish2 on(fish.a = fish2.a)
Whether A is still unique in the resultset depends on the key of fish2 and their relation (see above).
Concatenation does not preserve uniqueness. Consider the following case:
concat("10", "10") => "1010"
concat("101", "0") => "1010"
Therefore, your final query...
SELECT *
FROM (SELECT CONCAT(A,B) AS DUCK, C FROM FISH) D
,(SELECT CONCAT(A,B) AS DUCK2, C FROM FISH2) E
WHERE D.DUCK = E.DUCK2
...won't (necessarily) produce the same result as
select *
from fish
join fish2 on(
fish.a = fish2.a
and fish.b = fish2.b
)
I wrote "necessarily" because the collisions depend on the actual values. I hunted down a bug some time ago where the root cause was exactly this. The code had worked for several years before the bug manifested itself.
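A minimal sketch of that collision, with assumptions stated up front: it uses Python's built-in sqlite3 as a stand-in for MySQL, SQLite's || operator in place of CONCAT, and made-up rows. It joins the same two tables once on the concatenated value and once on the two columns.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; a || b plays the role of CONCAT(a, b).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fish  (a TEXT, b TEXT, c TEXT, PRIMARY KEY (a, b));
    CREATE TABLE fish2 (a TEXT, b TEXT, c TEXT, PRIMARY KEY (a, b));
    INSERT INTO fish  VALUES ('10',  '10', 'trout');
    INSERT INTO fish2 VALUES ('101', '0',  'pike');
""")

# Join on the concatenated value: '10' || '10' and '101' || '0' are both '1010'.
concat_rows = conn.execute("""
    SELECT d.duck, d.c, e.c
    FROM (SELECT a || b AS duck,  c FROM fish)  d
    JOIN (SELECT a || b AS duck2, c FROM fish2) e ON d.duck = e.duck2
""").fetchall()

# Join on the two columns separately: the (a, b) pairs differ, so nothing matches.
column_rows = conn.execute("""
    SELECT fish.a, fish.b
    FROM fish
    JOIN fish2 ON fish.a = fish2.a AND fish.b = fish2.b
""").fetchall()

print(concat_rows)
print(column_rows)
```

The concatenated join reports a match between two rows whose (a, b) pairs are different, while the column-wise join returns nothing.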

If by "key" you mean "unique", yes, tuples of a cartesian product over unique values will be unique.
(One can prove it by reductio ad absurdum.)

For step 1, think of a view as a subquery containing everything in the AS clause when CREATE VIEW was executed.
For example, if view v is created as SELECT a, b, c FROM t, then when you execute...
SELECT * FROM v WHERE a = some_value
...it's conceptually treated as...
SELECT * FROM (SELECT a, b, c FROM t) WHERE a = some_value
Any database with a decent optimizer will notice that column a is passed straight into the results and that it can take advantage of the indexing in t (if there is any) by moving the condition into the subquery:
SELECT * FROM (SELECT a, b, c FROM t WHERE a = some_value)
This all happens behind the scenes and is not an optimization you need to do yourself. Obviously, it can't do that for every condition in the WHERE clause, but understanding where you can is part of the art of writing a good optimizer.
For step 2, the concatenated keys will be part of intermediate results, and whether or not the database decides they need indexing is an implementation detail. Also note fche's comment about duplication.
If your database has a query plan explainer, running it and learning to interpret the results will give you a lot of insight about what makes your queries run fast and what slows them down.
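As a rough illustration of asking the database for its plan, here is a sketch that uses Python's built-in sqlite3 rather than MySQL (an assumption made so the example is self-contained). SQLite's statement is EXPLAIN QUERY PLAN, MySQL's is EXPLAIN, and the output formats differ. The table t, view v and sample rows are made up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the schema and rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER PRIMARY KEY, b TEXT, c TEXT);
    INSERT INTO t VALUES (1, 'x', 'y'), (2, 'p', 'q'), (3, 'm', 'n');
    CREATE VIEW v AS SELECT a, b, c FROM t;
""")

# Filtering through the view and filtering the base table give the same rows.
through_view = conn.execute("SELECT * FROM v WHERE a = ?", (2,)).fetchall()
through_table = conn.execute("SELECT a, b, c FROM t WHERE a = ?", (2,)).fetchall()

# Ask the engine how it intends to run the query against the view.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM v WHERE a = 2").fetchall()
for step in plan:
    print(step)
```

The exact plan text depends on the engine and version, so read it rather than rely on a fixed wording. What to look for is whether the step on t is a keyed lookup or a full scan.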

Related

How to structure a database schema to allow for the "1 in a million" case?

Among all the tables in my database, I have two which currently have a Many-to-Many join. However, the actual data population being captured nearly always has a One-to-Many association.
Considering that I want database look-ups (doctrine queries) to be as unencumbered as possible, should I instead:
Create two associations between the tables (where the second is only populated in these exceptional cases)?
Change the datatype for the association (eg to a text/tinyblob) to record a mini array of the 2 (or technically even 3) associated records?
This is what I currently have (although TableB-> JoinTable is usually just one-to-one):
TableA.id --< a_id.JoinTable.b_id >-- TableB.id
So, I am looking to see if I can capture the 'exceptions'. Is the below the correct way to go about it?
TableA.id TableB.id
+----< TableB.A_id1
+----- TableB.A_id2
+----- TableB.A_id3
You seem to be interested in:
-- a and b are related by the association of interest
Foo(a, b)
-- foo(a, b) but not foo(a2, b) for some a2 <> a
Boring(a, b)
unique(b)
FK (a, b) references Foo
-- foo(a, b) and foo(a2, b) for some a2 <> a
Rare(a, b)
FK (a, b) references foo
If you want queries to be unencumbered, just define Foo. You can query it for Rare.
Rare = select * from Foo f join Foo f2
where f.a <> f2.a and f.b = f2.b
Any other design suffers from update complexity in keeping the database consistent.
You have some fuzzy concern about Rare being much smaller than Foo. But what is your requirement re only n in a million Foo records being many:many by which you would choose some other design?
The next level of complexity is to have Foo and Rare. Updates have to keep the above equation true.
It seems extremely unlikely that there is a benefit in reducing the 2-or-3-in-a-million redundancy of Foo + Rare by only having Boring + Rare and reconstructing Foo from them. But it may be of benefit to define a unique index (b) for Boring which will maintain that a b in it has only one a. When you need Foo:
Foo = select * from Boring union select * from Rare
But your updates must maintain that
not exists (select * from Boring b join Rare r where b.b = r.b)
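To make the "just define Foo and query it for Rare" option concrete, here is a small sketch. It assumes SQLite (via Python's sqlite3) in place of MySQL, a two-column foo table, and made-up rows. The Boring query is my own phrasing of the definition above, not something the design requires you to store.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; foo(a, b) and its rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (a INTEGER, b INTEGER, PRIMARY KEY (a, b));
    INSERT INTO foo VALUES (1, 10), (2, 20), (3, 30), (4, 30);
""")

# Rare: rows whose b is also associated with some other a.
rare = conn.execute("""
    SELECT DISTINCT f.a, f.b
    FROM foo f
    JOIN foo f2 ON f.b = f2.b AND f.a <> f2.a
    ORDER BY f.a
""").fetchall()

# Boring: rows whose b is associated with exactly one a.
boring = conn.execute("""
    SELECT f.a, f.b
    FROM foo f
    WHERE NOT EXISTS (
        SELECT 1 FROM foo f2 WHERE f2.b = f.b AND f2.a <> f.a
    )
    ORDER BY f.a
""").fetchall()

print(rare)
print(boring)
```

Because both sets are derived from foo at query time, there is nothing extra to keep consistent on update.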
Change the datatype for the association (eg to a text/tinyblob) ?
Please don't do that. If you do, the people maintaining your database will curse your name unto the thousandth generation. No joke.
Your best bet here is to rig a one-to-many association. Let's say your table a has an integer primary key a_id.
Then, put that a_id as a foreign key column in your second table b.
You can retrieve your information as follows. This will always give you one row in your result set for each row in a.
SELECT a.this, a.that, GROUP_CONCAT(b.value) value
FROM a
LEFT JOIN b ON a.a_id = b.a_id
GROUP BY a.this, a.that
If you don't mind the extra row for your one-in-a-million case it's even easier.
SELECT a.this, a.that, b.value
FROM a
LEFT JOIN b ON a.a_id = b.a_id
The LEFT JOIN operation allows for the case where your a row has no corresponding b row.
Put an index on b.a_id.
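Here is a runnable sketch of the one-to-many layout and the grouped query. The assumptions: SQLite via Python's sqlite3 instead of MySQL (both have GROUP_CONCAT), a hypothetical b_id key on b, and made-up rows.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; b_id, the index name and the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, this TEXT, that TEXT);
    CREATE TABLE b (b_id INTEGER PRIMARY KEY, a_id INTEGER REFERENCES a(a_id), value TEXT);
    CREATE INDEX idx_b_a_id ON b (a_id);
    INSERT INTO a VALUES (1, 'one', 'x'), (2, 'two', 'y'), (3, 'three', 'z');
    INSERT INTO b VALUES (1, 1, 'p'), (2, 2, 'q'), (3, 2, 'r');
""")

# One result row per row in a, whether it has zero, one, or several rows in b.
rows = conn.execute("""
    SELECT a.a_id, a.this, COUNT(b.b_id) AS n, GROUP_CONCAT(b.value) AS vals
    FROM a
    LEFT JOIN b ON a.a_id = b.a_id
    GROUP BY a.a_id, a.this
    ORDER BY a.a_id
""").fetchall()

for row in rows:
    print(row)
```

The row for a_id 3 comes back with a count of 0 and a NULL list, which is the LEFT JOIN doing its job. The order of values inside the concatenated list is not guaranteed unless you ask for one.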

merging tables which consist of 17 million records

I have 3 tables, of which 2 tables have 200 000 records and the other has 1 800 000 records. I merge these 3 tables using 2 constraints, OCN and TIMESTAMP (month, year). The first two tables have a single column for month and year, Monthx (which includes month, date and year), and the other table has separate columns for month and year. I ran this query:
mysql--> insert into trail
select * from A,B,C
where A.OCN=B.OCN
and B.OCN=C.OCN
and C.OCN=A.OCN
and date_format(A.Monthx,'%b')=date_format(B.Monthx,'%b')
and date_format(A.Monthx,'%b')=C.IMonth
and date_format(B.Monthx,'%b')=C.month
and year(A.Monthx)=year(B.Monthx)
and year(B.Monthx)=C.Iyear
and year(A.Monthx)=C.Iyear
I started this query 4 days ago and it is still running. Could you tell me whether this query is correct, and provide a corrected query? (I used '%b' because my C table has a column that stores months in the form JAN, MAR.)
Please don't use implicit where joins, bury it in 1989, where it belongs. Use explicit joins instead
select * from a inner join b on (a.ocn = b.ocn and
date_format(A.Monthx,'%b')=date_format(B.Monthx,'%b') ....
This select part of the query (had to rewrite it because I refuse to deal with '89 syntax)
select * from A
inner join B on (
A.OCN=B.OCN
and date_format(A.Monthx,'%b')=date_format(B.Monthx,'%b')
and year(A.Monthx)=year(B.Monthx)
)
inner join C on (
C.OCN=A.OCN
and date_format(A.Monthx,'%b')=C.IMonth
and date_format(B.Monthx,'%b')=C.month
and year(B.Monthx)=C.Iyear
and year(A.Monthx)=C.Iyear
)
Has a lot of problems.
using a function on a field will kill any opportunity to use an index on that field.
you are doing a lot of duplicate tests. if (A = B) and (B = C) then it logically follows that (A = C)
the translations of the date fields take a lot of time
I would suggest you rewrite your tables to use fields that don't need translating (using functions), but can be compared directly.
A field like yearmonth : char(6) e.g. 201006 can be indexed and compared much faster.
If the tables A, B and C all have a field called ym for short, then your query can be:
INSERT INTO TRAIL
SELECT a.*, b.*, c.* FROM a
INNER JOIN b ON (
a.ocn = b.ocn
AND a.ym = b.ym
)
INNER JOIN c ON (
a.ocn = c.ocn
AND a.ym = c.ym
);
If you put indexes on ocn (primary index probably) and ym the query should run about a million rows a second (or more).
To test if your query is ok, import a small subset of records from A, B and C into a temporary database and test it there.
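One way to fill such a ym column is to compute it once while loading or migrating the data. The helper names below are hypothetical, and the sketch assumes that C stores English three-letter month abbreviations (JAN, MAR, ...) as described in the question.

```python
from datetime import date

# Assumption: months in table C are English three-letter abbreviations.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def ym_from_date(d: date) -> str:
    """Build a 'YYYYMM' key from a full date (tables A and B, column Monthx)."""
    return f"{d.year:04d}{d.month:02d}"

def ym_from_parts(month_abbr: str, year: int) -> str:
    """Build the same key from a month abbreviation and a year (table C)."""
    month = MONTHS.index(month_abbr.strip().upper()) + 1
    return f"{year:04d}{month:02d}"

print(ym_from_date(date(2010, 6, 15)))
print(ym_from_parts("JUN", 2010))
```

Both calls produce the same key for June 2010, so the join can compare plain indexed values instead of calling date functions on every row.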
You have redundancies in your implicit JOIN because you are joining A.OCN with B.OCN, B.OCN with C.OCN and then C.OCN with A.OCN; one of those can be deleted. If A.OCN = B.OCN and B.OCN = C.OCN, then A.OCN = C.OCN is implied. Further, I guess you have redundancies in your date comparisons.

Explain SQL and Query optimization

Explain SQL (in phpmyadmin) of a query that is taking more than 5 seconds is giving me the above. I read that we can study the Explain SQL to optimize a query. Can anyone tell if this Explain SQL telling anything as such?
Thanks guys.
Edit:
The query itself:
SELECT
a.`depart` , a.user,
m.civ, m.prenom, m.nom,
CAST( GROUP_CONCAT( DISTINCT concat( c.id, '~', c.prenom, ' ', c.nom ) ) AS char ) AS coordinateur,
z.dr
FROM `0_activite` AS a
JOIN `0_member` AS m ON a.user = m.id
LEFT JOIN `0_depart` AS d ON ( m.depart = d.depart AND d.rank = 'mod' AND d.user_sec =2 )
LEFT JOIN `0_member` AS c ON d.user_id = c.id
LEFT JOIN `zone_base` AS z ON m.depart = z.deprt_num
GROUP BY a.user
Edit 2:
Structures of the two tables a and d. Top: a and bottom: d
Edit 3:
What I want in this query?
I first want to get the value of 'depart' and 'user' (which is an id) from the table 0_activite. Next, I want to get name of the person (civ, prenom and name) from 0_member whose id I am getting from 0_activite via 'user', by matching 0_activite.user with 0_member.id. Here depart is short of department which is also an id.
So at this point, I have depart, id, civ, nom and prenom of a person from two tables, 0_activite and 0_member.
Next, I want to know which dr is related with this depart, and this I get from zone_base. The value of depart is same in both 0_activite and 0_member.
Then comes the trickier part. A person from 0_member can be associated with multiple departs, and this is stored in 0_depart. Also, every user has a level, one of which is 'mod', which stands for moderator. Now I want to get all the people who are moderators in the depart that the first user is from, and then get those moderators' names from 0_member again. I also have a variable user_sec, but this is probably less important in this context, though I cannot overlook it.
This is what makes the query a tricky one. 0_member is storing id, name of users, + one depart, 0_depart is storing all departs of users, one line for each depart, and 0_activite is storing some other stuffs and I want to relate those through userid of 0_activite and the rest.
Hope I have been clear. If I am not, please let me know and I will try again to edit this post.
Many many thanks again.
Aside from the few answers provided by the others here, it might help to better understand the "what do I want" from the query. As you've accepted a rather recent answer from me in another of your questions, you have filters applied by department information.
Your query is doing a LEFT join at the Department table by rank = 'mod' and user_sec = 2. Is your overall intent to show ALL records in the 0_activite table REGARDLESS of a valid join to the 0_Depart table... and if there IS a match to the 0_Depart table, you only care about the 'mod' and 2 values?
If you only care about those people specifically associated with the 0_depart with 'mod' and 2 conditions, I would reverse the query starting with THIS table first, then join to the rest.
Having keys on tables via relationship or criteria is always a performance benefit (vs not having the indexes).
Start your query with whatever would be your smallest set FIRST, then join to other tables.
From clarification in your question... I would start with the inner-most... Who it is and what departments are they associated with... THEN get the moderators (from department where condition)... Then get actual moderator's name info... and finally out to your zone_base for the dr based on the department of the MODERATOR...
select STRAIGHT_JOIN
DeptPerMember.*,
Moderator.Civ as ModCiv,
Moderator.Prenom as ModPrenom,
Moderator.Nom as ModNom,
z.dr
from
( select
m.ID,
m.Depart,
m.Civ,
m.Prenom,
m.Nom
from
0_Activite as a
join 0_member m
on a.User = m.ID
join 0_Depart as d
on m.depart = d.depart ) DeptPerMember
join 0_Depart as DeptForMod
on DeptPerMember.Depart = DeptForMod.Depart
and DeptForMod.rank = 'mod'
and DeptForMod.user_sec = 2
join 0_Member as Moderator
on DeptForMod.user_id = Moderator.ID
join zone_base z
on Moderator.depart = z.deprt_num
Notice how I tier'd the query to get each part and joined to the next and next and next. I'm building the chain based on the results of the previous with clear "alias" references for clarification of content. Now, you can get whatever respective elements from any of the levels via their distinct "alias" references...
The output from EXPLAIN is showing us that the first and third tables listed (a & d) are not having any indexes utilised by the database engine in executing this query. The key column is NULL for both - which is a shame since both are 'large' tables (OK, they're not really large, but compared to the rest of the tables they're the big 'uns).
Judging from the query, an index on user on 0_activite and an index on (depart, rank, user_sec) on 0_depart would go some way to improving performance.
You can see that the key and key_len columns are NULL for tables a and d. This means no key from the possible_keys column is being used, so both tables are scanning all rows. (Check the larger numbers in the rows column; you want these smaller.)
To deal with 0_depart:
Make sure you have a key on (d.depart, d.rank,d.user_sec) which are part of the join of 0_depart.
To deal with 0_activite:
I'm not positive, but a GROUP BY column should be indexed too, so you need a key on a.user.

Selecting multiple columns/fields in MySQL subquery

Basically, there is an attribute table and translation table - many translations for one attribute.
I need to select id and value from translation for each attribute in a specified language, even if there is no translation record in that language. Either I am missing some join technique or join (without involving language table) is not working here since the following do not return attributes with non-existing translations in the specified language.
select a.attribute, at.id, at.translation
from attribute a left join attributeTranslation at on a.id=at.attribute
where at.language=1;
So I am using subqueries like this, problem here is making two subqueries to the same table with the same parameters (feels like performance drain unless MySQL groups those, which I doubt since it makes you do many similar subqueries)
select attribute,
(select id from attributeTranslation where attribute=a.id and language=1),
(select translation from attributeTranslation where attribute=a.id and language=1)
from attribute a;
I would like to be able to get id and translation from one query, so I concat columns and get the id from string later, which is at least making single subquery but still not looking right.
select attribute,
(select concat(id,';',title)
from offerAttribute_language
where offerAttribute=a.id and _language=1
)
from offerAttribute a
So the question part.
Is there a way to get multiple columns from a single subquery or should I use two subqueries (MySQL is smart enough to group them?) or is joining the following way to go:
[[attribute to language] to translation] (joining 3 tables seems like a worse performance than subquery).
Yes, you can do this. The knack you need is the concept that there are two ways of getting tables out of the table server. One way is ..
FROM TABLE A
The other way is
FROM (SELECT col as name1, col2 as name2 FROM ...) B
Notice that the select clause and the parentheses around it are a table, a virtual table.
So, using your second code example (I am guessing at the columns you are hoping to retrieve here):
SELECT a.attr, b.id, b.trans, b.lang
FROM attribute a
JOIN (
SELECT at.id AS id, at.translation AS trans, at.language AS lang, at.attribute
FROM attributeTranslation at
) b ON (a.id = b.attribute AND b.lang = 1)
Notice that your real table attribute is the first table in this join, and that this virtual table I've called b is the second table.
This technique comes in especially handy when the virtual table is a summary table of some kind. e.g.
SELECT a.attr, b.id, b.trans, b.lang, c.langcount
FROM attribute a
JOIN (
SELECT at.id AS id, at.translation AS trans, at.language AS lang, at.attribute
FROM attributeTranslation at
) b ON (a.id = b.attribute AND b.lang = 1)
JOIN (
SELECT count(*) AS langcount, at.attribute
FROM attributeTranslation at
GROUP BY at.attribute
) c ON (a.id = c.attribute)
See how that goes? You've generated a virtual table c containing two columns, joined it to the other two, used one of the columns for the ON clause, and returned the other as a column in your result set.
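A runnable sketch of the derived-table join, with two assumptions worth flagging: it runs on SQLite through Python's sqlite3 rather than MySQL, and it uses LEFT JOIN instead of JOIN, because the question asks for attributes to appear even when they have no translation in the chosen language. The sample rows are made up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute (id INTEGER PRIMARY KEY, attribute TEXT);
    CREATE TABLE attributeTranslation (
        id INTEGER PRIMARY KEY, attribute INTEGER, language INTEGER, translation TEXT
    );
    INSERT INTO attribute VALUES (1, 'color'), (2, 'size');
    INSERT INTO attributeTranslation VALUES
        (10, 1, 1, 'couleur'), (11, 1, 2, 'Farbe'), (12, 2, 2, 'Groesse');
""")

# The language test sits in the ON clause, so attributes without a
# translation in language 1 are kept and get NULLs for id and trans.
rows = conn.execute("""
    SELECT a.attribute, b.id, b.trans
    FROM attribute a
    LEFT JOIN (
        SELECT at.id AS id, at.translation AS trans, at.language AS lang, at.attribute
        FROM attributeTranslation at
    ) b ON a.id = b.attribute AND b.lang = 1
    ORDER BY a.id
""").fetchall()

for row in rows:
    print(row)
```

Both columns come from a single pass over attributeTranslation, and 'size' still shows up with empty translation fields. Moving the language test into a WHERE clause would filter that row out again.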

Help me figure out a MySQL query

These are tables I have:
Class
- id
- name
Order
- id
- name
- class_id (FK)
Family
- id
- order_id (FK)
- name
Genus
- id
- family_id (FK)
- name
Species
- id
- genus_id (FK)
- name
I'm trying to make a query to get a list of Class, Order, and Family names that does not have any Species under them. You can see that the table has some form of hierarchy from Order all the way down to Species. Each table has Foreign Key (FK) that relates to the immediate table above itself on the hierarchy.
Trying to get this at work, but I am not doing so well.
Any help would be appreciated!
Meta-answer (comment on the two previous answers):
Using IN tends to degrade to something very like an OR (a disjunction) of all terms in the IN. Bad performance.
Doing a left join and looking for null is an improvement, but it's obscurantist. If we can say what we mean, let's say it in a way that's closest to how we'd say it in natural language:
select f.name
from family f left join genus g on f.id = g.family_id
WHERE NOT EXISTS (select * from species c where c.genus_id = g.id);
We want where something doesn't exist, so if we can say "where not exists" all the better. And, the select * in the subquery doesn't mean it's really bringing back a whole row, so it's not an "optimization" to replace select * with select 1, at least not on any modern RDBMS.
Further, where a family has many genera (and in biology, most families do), we're going to get one row per (family, genus) when all we care about is the family. So let's get one row per family:
select DISTINCT f.name
from family f left join genus g on f.id = g.family_id
WHERE NOT EXISTS (select * from species c where c.genus_id = g.id);
This is still not optimal. Why? Well it fulfills the OP's requirement, in that it finds "empty" genera, but it fails to find families that have no genera, "empty" families. Can we make it do that too?
select f.name
from family f
WHERE NOT EXISTS (
select * from genus g
join species c on c.genus_id = g.id
where g.family_id = f.id);
We can even get rid of the distinct, because we're not joining family to anything. And that is an optimization.
Comment from OP:
That was a very lucid explanation. However, I'm curious as to why using IN or disjunctions is bad for performance. Can you elaborate on that or point me to a resource where I can learn more about the relative performance cost of different DB operations?
Think of it this way. Say that there was no IN operator in SQL. How would you fake an IN?
By a series of ORs:
where foo in (1, 2, 3)
is equivalent to
where ( foo = 1 ) or ( foo = 2 ) or (foo = 3 )
Ok, you say, but that still doesn't tell me why it's bad. It's bad because there's often no decent way to use a key or index to look this up. So what you get is either a) a table scan, where for each disjunction (or'd predicate or element of an IN list), the row gets tested, until a test is true or the list is exhausted. Or b) you get a table scan for each of these disjunctions. The second case (b) may actually be better, which is why you sometimes see a select with an OR turned into one select for each leg of the OR union'd together:
select * from table where x = 1 or x = 3 ;
select * from table where x = 1
union select * from table where x = 3 ;
Now this is not to say you can never use an OR or an IN list. And in some cases, the query optimizer is smart enough to turn an IN list into a join -- and the other answers you were given are precisely the cases where that's most likely.
But if we can explicitly turn our query into a join, well, we don't have to wonder if the query optimizer is smart. And in general, joins are what the database is best at doing.
Well, just giving this a quick and dirty shot, I'd write something like this. I spend most of my time using Firebird so the MySQL syntax may be a little different, but the idea should be clear
select f.name
from family f left join genus g on f.id = g.family_id
left join species s on g.id = s.genus_id
where ( s.id is null )
if you want to enforce there being a genus then you just remove the "left" portion of the join from family to genus.
I hope I'm not misunderstanding the question and thus, leading you down the wrong path. Good luck!
edit: Actually, re-reading this I think this will just catch families where there's no species within a genus. You could add a " and ( g.id is null )" too, I think.
Sub-select to the rescue...
select f.name from family as f, genus as g
where
f.id = g.family_id and
g.id not in (select genus_id from species);
SELECT f.name
FROM family f
WHERE NOT EXISTS (
SELECT 1
FROM genus g
JOIN species s
ON g.id = s.genus_id
WHERE g.family_id = f.id
)
Note that, unlike pure LEFT JOIN solutions, this is more efficient.
It does not select ALL rows filtering out those with NOT NULL values, but instead selects at most one row from genus and species.
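To see what the NOT EXISTS form returns next to the LEFT JOIN ... IS NULL form, here is a sketch on made-up data. It assumes SQLite through Python's sqlite3 in place of MySQL, and the family and genus rows are invented so that one family has a mix of empty and non-empty genera.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE family  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE genus   (id INTEGER PRIMARY KEY, family_id INTEGER, name TEXT);
    CREATE TABLE species (id INTEGER PRIMARY KEY, genus_id INTEGER, name TEXT);
    INSERT INTO family  VALUES (1, 'Felidae'), (2, 'EmptyGenusFamily'), (3, 'NoGenusFamily');
    INSERT INTO genus   VALUES (1, 1, 'Panthera'), (2, 1, 'Felis'), (3, 2, 'LonelyGenus');
    INSERT INTO species VALUES (1, 1, 'Panthera leo');
""")

# Families with no species anywhere beneath them.
not_exists_names = [name for (_id, name) in conn.execute("""
    SELECT f.id, f.name
    FROM family f
    WHERE NOT EXISTS (
        SELECT 1
        FROM genus g
        JOIN species s ON g.id = s.genus_id
        WHERE g.family_id = f.id
    )
    ORDER BY f.id
""")]

# Families with at least one (family, genus) row that has no species.
left_join_names = [name for (_id, name) in conn.execute("""
    SELECT DISTINCT f.id, f.name
    FROM family f
    LEFT JOIN genus g ON f.id = g.family_id
    LEFT JOIN species s ON g.id = s.genus_id
    WHERE s.id IS NULL
    ORDER BY f.id
""")]

print(not_exists_names)
print(left_join_names)
```

On this data the LEFT JOIN version also lists Felidae, because one of its genera is empty even though another has a species, so the two forms answer slightly different questions.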