I'm trying to write a query which combines data from two tables. Both tables contain information about certain objects. An object is uniquely defined by a triple (x, y, z) (x, y and z each have a separate column in T1) as well as by its name (the primary key in T1). In T2 the primary key is name2. There's also a column name1, which indicates the object to which object name2 is linked, and a column w. So T1 contains the objects' names and their triples, while T2 tells us which objects are linked ('linked' is not symmetric: if object1 is linked to object2, then object2 is not linked to object1).
We are given x, y and t, and want the triples of the objects that are linked to objects whose first two coordinates are x and y, with w > t.
I tried to write a query for the case where we match an object with a specific name:
SELECT
(SELECT x FROM T1 WHERE name=name2),
(SELECT y FROM T1 where name=name2) AS k,
w
FROM T2
WHERE name1=nn3
AND w>t
ORDER BY k;
but I can't work out how to write it when there may be more than one matching object in T1.
I don't have access to the database; I only know the columns of the tables, so I'm having a hard time solving this without trying it out. I'm confused by going from T1 to T2 and back to T1 again.
From your question I read the table definitions to be:
T1 -- a list of objects
x,
y,
z,
name -- primary key, hence unique
T2 -- a list of objects linked to an object in T1
name2, -- primary key
name1, -- foreign key to T1.name
w
If we have three objects A, B and C, all three will appear in T1. If B and C are linked to A, T2 will contain (inter alia)
Name2 Name1
B A
C A
We are given x, y and t and want the triples of the object that are linked to objects with x and y on their first coordinate with w>t.
As you point out, T1 contains information about the objects and T2 about how objects are linked. To retrieve matching information from more than one table you use the JOIN syntax. Since there are two objects in a linkage, you need a separate JOIN for each. It is OK to reference a table more than once in a query. You should use aliases to clarify what role a table is fulfilling each time it is mentioned. This gives something like
from T2 as linkage
inner join T1 as linked_from
on linked_from.name = linkage.name1 -- note different columns in T2
inner join T1 as linked_to
on linked_to.name = linkage.name2 -- note different columns in T2
Personally I don't like linked_from and linked_to as they are vague and generic. You should use whatever is specific and meaningful in the context of your problem.
You don't say whether the given x and y refer to the linked_from or the linked_to object. This will be very significant to the output of the query, but it is trivial to change in the SQL - just use the other alias - so I'll assume it is the parent object. The where clause is
where linked_from.x = #x_given_value
and linked_from.y = #y_given_value
and linkage.w > #t_given_value
You do not need additional sub-queries in the SELECT clause, since you have already referenced all the tables and table roles you need in the FROM clause. So it becomes
SELECT
linked_to.x,
linked_to.y as k,
linkage.w
I'm confident you can work out the ordering.
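A runnable sketch of the assembled query, using Python's built-in sqlite3 module with SQLite standing in for whatever DBMS you use (none is named). The sample rows are made up, `?` placeholders replace the #..._given_value markers, and ORDER BY k is only one possible ordering. Objects A and D share x and y but differ in z, which shows how supplying only x and y can return several rows.

```python
import sqlite3

def linked_objects(conn, x, y, t):
    """Return (x, k, w) for objects linked to a parent whose x and y match."""
    sql = """
        SELECT linked_to.x,
               linked_to.y AS k,
               linkage.w
        FROM T2 AS linkage
        INNER JOIN T1 AS linked_from
                ON linked_from.name = linkage.name1  -- the parent object
        INNER JOIN T1 AS linked_to
                ON linked_to.name = linkage.name2    -- the object linked to it
        WHERE linked_from.x = ?
          AND linked_from.y = ?
          AND linkage.w > ?
        ORDER BY k
    """
    return conn.execute(sql, (x, y, t)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (name TEXT PRIMARY KEY, x INTEGER, y INTEGER, z INTEGER,
                     UNIQUE (x, y, z));
    CREATE TABLE T2 (name2 TEXT PRIMARY KEY REFERENCES T1(name),
                     name1 TEXT REFERENCES T1(name), w INTEGER);
    -- A and D both have x = 1, y = 2 but different z
    INSERT INTO T1 VALUES ('A', 1, 2, 3), ('B', 4, 5, 6), ('C', 7, 8, 9),
                          ('D', 1, 2, 0), ('E', 0, 1, 1);
    -- B and C are linked to A, E to D; the link to C fails w > t for t = 5
    INSERT INTO T2 VALUES ('B', 'A', 10), ('C', 'A', 3), ('E', 'D', 7);
""")
print(linked_objects(conn, 1, 2, 5))  # [(0, 1, 7), (4, 5, 10)]
```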
Be aware of the cardinalities involved in your database, i.e. how many rows match each condition. You state that x, y and z together are unique. Since only x and y are supplied, you may well get multiple rows returned. Similarly, there may well be many rows with a value of w greater than the given value of t for the same name1 value, since there may be many rows in T2 for each row in T1.
I realise this is an example and that you have probably "simplified" the names for the question, but any time you find yourself naming columns with a numeric suffix, you're doing something wrong. Maybe you haven't normalised sufficiently, or maybe you haven't understood the relationships between the data items.
I'm trying to learn SQL better, views more specifically, but I can't get the following to work.
I've put a slimmed-down version of it here. There are more joins I have to do based on foreign keys from the tbl2 matches.
Since it's a view, I can't create temp tables.
I can't rely on stored procedures in this case.
I could use OUTER APPLY, but only to get specific rows (row 1, 2, ...), which would mean a SELECT * FROM Table2 WHERE ... each time, i.e. one index scan every time I use it.
I could create the view using a CTE, WITH tbl2 (FK_TABLE1...) AS (SELECT FK_TABLE1 FROM dbo.TABLE2), but that doesn't seem to help: each reference to it does a sort or a scan, so there's no gain.
Is there some way to create some kind of list that I can reuse, so that I only need one index scan to get the matching rows from Table2?
Or is there another way to think about this?
Table1 (PK, XX, YY)
Table2 (PK, FK_TABLE1, Type, Progress, ZZ, FK_Status)
Create View MyView
as
Select
Table1.PK
,Table1.XX
,Table1.YY
---- I want to present data from the first 3 matches
,(SELECT ZZ FROM Table2 AS t2 WHERE t2.FK_TABLE1 = Table1.PK ORDER BY Type ASC OFFSET(0) ROWS FETCH NEXT (1) ROWS ONLY) ZZ1
,(SELECT ZZ FROM Table2 AS t2 WHERE t2.FK_TABLE1 = Table1.PK ORDER BY Type ASC OFFSET(1) ROWS FETCH NEXT (1) ROWS ONLY) ZZ2
,(SELECT ZZ FROM Table2 AS t2 WHERE t2.FK_TABLE1 = Table1.PK ORDER BY Type ASC OFFSET(2) ROWS FETCH NEXT (1) ROWS ONLY) ZZ3
,sts.StatusName CurrentStatus
From Table1
LEFT OUTER JOIN Table2 AS tbl2 ON (tbl2.FK_TABLE1= Table1.PK) ---- Here I want to make some sort of join so I get all matching rows from the other table
LEFT OUTER JOIN STATUS AS sts ON (sts.PK = [tbl2 ordered by type, if last elements status = X take that, else status of first).FK_STATUS) ---- Here I'm a bit puzzled, since I have to order by, but also have a fallback value if last element isn't matching.
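One common way to avoid the three correlated subqueries, offered as a sketch rather than a tested SQL Server answer, is to number each group's Table2 rows once with ROW_NUMBER() and then pivot matches 1 to 3 into columns with conditional aggregation. Numbering in both directions also gives the first and last status for the fallback. The idea runs below on SQLite (3.25 or later, for window functions) via Python's sqlite3. The status key 2 is a hypothetical stand-in for the unspecified "status X", and the sample data is invented. Whether the engine actually reads Table2 only once depends on its optimizer, and ties in Type make the numbering nondeterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (PK INTEGER PRIMARY KEY, XX TEXT, YY TEXT);
    CREATE TABLE Table2 (PK INTEGER PRIMARY KEY, FK_TABLE1 INTEGER, Type INTEGER,
                         Progress INTEGER, ZZ TEXT, FK_Status INTEGER);
    CREATE TABLE STATUS (PK INTEGER PRIMARY KEY, StatusName TEXT);

    CREATE VIEW MyView AS
    WITH ranked AS (
        -- Number each Table1 row's matches by Type, in both directions
        SELECT FK_TABLE1, ZZ, FK_Status,
               ROW_NUMBER() OVER (PARTITION BY FK_TABLE1 ORDER BY Type ASC)  AS rn_first,
               ROW_NUMBER() OVER (PARTITION BY FK_TABLE1 ORDER BY Type DESC) AS rn_last
        FROM Table2
    ),
    pivoted AS (
        -- Conditional aggregation turns matches 1..3 into columns ZZ1..ZZ3
        SELECT FK_TABLE1,
               MAX(CASE WHEN rn_first = 1 THEN ZZ END)        AS ZZ1,
               MAX(CASE WHEN rn_first = 2 THEN ZZ END)        AS ZZ2,
               MAX(CASE WHEN rn_first = 3 THEN ZZ END)        AS ZZ3,
               MAX(CASE WHEN rn_first = 1 THEN FK_Status END) AS first_status,
               MAX(CASE WHEN rn_last  = 1 THEN FK_Status END) AS last_status
        FROM ranked
        GROUP BY FK_TABLE1
    )
    SELECT Table1.PK, Table1.XX, Table1.YY,
           p.ZZ1, p.ZZ2, p.ZZ3,
           sts.StatusName AS CurrentStatus
    FROM Table1
    LEFT JOIN pivoted AS p ON p.FK_TABLE1 = Table1.PK
    -- 2 is a placeholder for "status X": take the last match's status if it is X,
    -- otherwise fall back to the first match's status
    LEFT JOIN STATUS AS sts
           ON sts.PK = CASE WHEN p.last_status = 2 THEN p.last_status
                            ELSE p.first_status END;

    INSERT INTO STATUS VALUES (1, 'New'), (2, 'Done'), (3, 'Blocked');
    INSERT INTO Table1 VALUES (1, 'x1', 'y1'), (2, 'x2', 'y2'), (3, 'x3', 'y3');
    INSERT INTO Table2 VALUES
        (1, 1, 1, 0, 'a', 1), (2, 1, 2, 0, 'b', 1), (3, 1, 3, 0, 'c', 3),
        (4, 1, 4, 0, 'd', 2), (5, 2, 1, 0, 'e', 1), (6, 2, 2, 0, 'f', 3);
""")
rows = conn.execute("SELECT * FROM MyView ORDER BY PK").fetchall()
for row in rows:
    print(row)
```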
Among all the tables in my database, I have two which currently have a Many-to-Many join. However, the actual data population being captured nearly always has a One-to-Many association.
Considering that I want database look-ups (Doctrine queries) to be as unencumbered as possible, should I instead:
Create two associations between the tables (where the second is only populated in these exceptional cases)?
Change the datatype for the association (e.g. to a text/tinyblob) to record a mini array of the 2 (or technically even 3) associated records?
This is what I currently have (although TableB-> JoinTable is usually just one-to-one):
TableA.id --< a_id.JoinTable.b_id >-- TableB.id
So, I am looking to see if I can capture the 'exceptions'. Is the below the correct way to go about it?
TableA.id TableB.id
+----< TableB.A_id1
+----- TableB.A_id2
+----- TableB.A_id3
You seem to be interested in:
-- a and b are related by the association of interest
Foo(a, b)
-- foo(a, b) but not foo(a2, b) for some a2 <> a
Boring(a, b)
unique(b)
FK (a, b) references Foo
-- foo(a, b) and foo(a2, b) for some a2 <> a
Rare(a, b)
FK (a, b) references Foo
If you want queries to be unencumbered, just define Foo. You can query it for Rare.
Rare = select distinct f.a, f.b from Foo f join Foo f2
on f.b = f2.b and f.a <> f2.a
Any other design suffers from update complexity in keeping the database consistent.
You have some fuzzy concern about Rare being much smaller than Foo. But what exactly is your requirement, given that only n in a million Foo records are many:many, that would make you choose some other design?
The next level of complexity is to have Foo and Rare. Updates have to keep the above equation true.
It seems extremely unlikely that there is a benefit in reducing the 2-or-3-in-a-million redundancy of Foo + Rare by only having Boring + Rare and reconstructing Foo from them. But it may be of benefit to define a unique index (b) for Boring which will maintain that a b in it has only one a. When you need Foo:
Foo = select * from Boring union select * from Rare
But your updates must maintain that
not exists (select * from Boring b join Rare r on b.b = r.b)
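A small check of these definitions on SQLite (through Python's sqlite3), with Boring and Rare written as views over Foo. The integer columns and sample pairs are hypothetical; b = 40 plays the rare many:many case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Foo(a, b): a and b are related by the association of interest
    CREATE TABLE Foo (a INTEGER, b INTEGER, PRIMARY KEY (a, b));
    INSERT INTO Foo VALUES (1, 10), (2, 20), (3, 30), (1, 40), (2, 40);

    -- Rare: foo(a, b) and foo(a2, b) for some a2 <> a
    CREATE VIEW Rare AS
        SELECT DISTINCT f.a, f.b
        FROM Foo f JOIN Foo f2 ON f.b = f2.b AND f.a <> f2.a;

    -- Boring: foo(a, b) but not foo(a2, b) for any a2 <> a
    CREATE VIEW Boring AS
        SELECT f.a, f.b FROM Foo f
        WHERE NOT EXISTS (SELECT 1 FROM Foo f2 WHERE f2.b = f.b AND f2.a <> f.a);
""")

rare = conn.execute("SELECT a, b FROM Rare ORDER BY a, b").fetchall()
boring = conn.execute("SELECT a, b FROM Boring ORDER BY a, b").fetchall()
# Foo = Boring UNION Rare
rebuilt = conn.execute(
    "SELECT a, b FROM Boring UNION SELECT a, b FROM Rare ORDER BY a, b").fetchall()
foo = conn.execute("SELECT a, b FROM Foo ORDER BY a, b").fetchall()
# The condition updates must maintain: no b appears in both Boring and Rare
overlap = conn.execute(
    "SELECT COUNT(*) FROM Boring bo JOIN Rare r ON bo.b = r.b").fetchone()[0]

print(rare)                     # [(1, 40), (2, 40)]
print(boring)                   # [(1, 10), (2, 20), (3, 30)]
print(rebuilt == foo, overlap)  # True 0
```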
Change the datatype for the association (e.g. to a text/tinyblob)?
Please don't do that. If you do the people maintaining your database will curse your name unto the thousandth generation. No joke.
Your best bet here is to rig a one-to-many association. Let's say your table a has an integer primary key a_id.
Then, put that a_id as a foreign key column in your second table b.
You can retrieve your information as follows. This will always give you one row in your result set for each row in a.
SELECT a.this, a.that, GROUP_CONCAT(b.value) value
FROM a
LEFT JOIN b ON a.a_id = b.a_id
GROUP BY a.a_id, a.this, a.that
If you don't mind the extra row for your one-in-a-million case it's even easier.
SELECT a.this, a.that, b.value
FROM a
LEFT JOIN b ON a.a_id = b.a_id
The LEFT JOIN operation allows for the case where your a row has no corresponding b row.
Put an index on b.a_id.
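A sketch of this one-to-many layout and both queries on SQLite (via Python's sqlite3), which also provides GROUP_CONCAT with ',' as the default separator. Table and column names follow the answer; the rows are hypothetical. The grouped query also groups on a.a_id, so two a rows with identical this/that values are not merged. The order of values inside GROUP_CONCAT is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (a_id INTEGER PRIMARY KEY, this TEXT, that TEXT);
    CREATE TABLE b (b_id INTEGER PRIMARY KEY,
                    a_id INTEGER REFERENCES a(a_id),  -- the one-to-many link
                    value TEXT);
    CREATE INDEX b_a_id ON b(a_id);  -- the suggested index on b.a_id
    INSERT INTO a VALUES (1, 'p', 'q'), (2, 'r', 's'), (3, 't', 'u');
    -- a_id 1: the usual single b row; 2: the rare extra row; 3: no b row at all
    INSERT INTO b (a_id, value) VALUES (1, 'v1'), (2, 'v2'), (2, 'v3');
""")

# One row per a row; GROUP_CONCAT collapses the matching b values
grouped = conn.execute("""
    SELECT a.a_id, a.this, a.that, GROUP_CONCAT(b.value) AS value
    FROM a
    LEFT JOIN b ON a.a_id = b.a_id
    GROUP BY a.a_id, a.this, a.that
    ORDER BY a.a_id
""").fetchall()

# Without grouping, the rare case simply produces an extra row
plain = conn.execute("""
    SELECT a.this, a.that, b.value
    FROM a
    LEFT JOIN b ON a.a_id = b.a_id
""").fetchall()

print(grouped)
print(len(plain))  # 4
```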
I came across this question in an interview and I have been wondering if what I did was right. Let's say I have a table 'A' with the following attributes:
R S T
-----------
a1 b1 c1
a1 b2 c2
a1 b3 c3
a4 b4 c4
and let's say I need to evaluate the relational algebra expression $B = \big(\pi_{R,S}(A) \bowtie \pi_{S,T}(A)\big) \bowtie \pi_{R,T}(A)$.
What would be the result?
This is what I tried:
-We know (A) NATURAL JOIN (A) = A
-I did the first join, inside the square brackets. Since S is the common attribute, I took the result to be a table (R S T) with the same 4 tuples.
-Finally, I joined (R S T) with the third projection, where attributes 'R' and 'T' are common, which I assumed would yield R S T again with 4 tuples.
Meaning, with the way I did it, I ended up getting B = A.
I did not consider the tuples at all, I just did a natural join based on the common attributes between two projections.
I know that's very stupid, but I am trying to execute it in MySQL, and for some reason I get errors when I try to execute such a query:
select A,B from dbt2.relationalalgebra as r1 NATURAL JOIN (select B, C from dbt2.relationalalgebra as r2);
I get an error saying every derived table must have its own alias!
Please help me understand how a natural join works on the same table.
Thanks in advance for any help.
What you did is correct. And it's correct that you obtained B = A -- given that content for A.
This is a question about the functional dependencies between the values in the data. (So you might not get B = A, if the data was different.)
For attributes S and T, there's a different value in each tuple. In other words, given a value for S (or T), you know which row it's from, so you know the values of the other two attributes in that tuple. The functional dependencies are S -> R, T and T -> R, S. (You might say that S and T are each keys for A.)
The pairs of attributes in the projections you give each include at least one key, so uniquely determine which 'missing' attribute gets joined. You're seeing a lossless join decomposition, as per Heath's Theorem. http://en.wikipedia.org/wiki/Functional_dependency
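To check B = A concretely, the sketch below evaluates the expression on SQLite (via Python's sqlite3) with the interview's data. Each projection is a derived table, so each gets its own alias (p1, p2, p3); MySQL's "Every derived table must have its own alias" error is about exactly this. DISTINCT mimics relational-algebra projection, which removes duplicate tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (R TEXT, S TEXT, T TEXT);
    INSERT INTO A VALUES ('a1', 'b1', 'c1'), ('a1', 'b2', 'c2'),
                         ('a1', 'b3', 'c3'), ('a4', 'b4', 'c4');
""")
# (pi_RS(A) NATURAL JOIN pi_ST(A)) NATURAL JOIN pi_RT(A), one alias per projection
B_SQL = """
    SELECT p1.R, p1.S, p2.T
    FROM (SELECT DISTINCT R, S FROM A) AS p1
    NATURAL JOIN (SELECT DISTINCT S, T FROM A) AS p2
    NATURAL JOIN (SELECT DISTINCT R, T FROM A) AS p3
    ORDER BY 1, 2, 3
"""
b = conn.execute(B_SQL).fetchall()
a = conn.execute("SELECT R, S, T FROM A ORDER BY R, S, T").fetchall()
print(b == a)  # True
```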
A natural join is a shorthand for joining two tables (or subqueries) on all columns that have the same name.
A natural join of a table to itself could have several consequences. The most common would be the table itself -- if none of the values are NULL and the rows are unique. If each row had a NULL value in some column, then the natural join would return no rows at all. If rows are duplicated, then multiple rows might appear.
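A quick illustration of those three cases on SQLite (via Python's sqlite3), using a hypothetical two-column table t: the normal row matches itself once, the row containing NULL never matches (NULL = NULL is not true), and each copy of the duplicated row matches both copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (k INTEGER, v TEXT);
    -- one normal row, one row with a NULL, and a duplicated row
    INSERT INTO t VALUES (1, 'x'), (2, NULL), (3, 'z'), (3, 'z');
""")
rows = conn.execute(
    "SELECT t1.k, t1.v FROM t AS t1 NATURAL JOIN t AS t2 ORDER BY t1.k"
).fetchall()
print(rows)  # [(1, 'x'), (3, 'z'), (3, 'z'), (3, 'z'), (3, 'z')]
```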
I do not recommend ever using natural join. A small change to an underlying table structure could break a query.
Let's say I have a MySQL table called FISH with fields A, B and C.
I run SELECT * FROM FISH. This gets me a view with all fields. So, if A was a key in the original table, is it also a key in the view? Meaning, if I have a table FISH2, and I ran
SELECT * FROM (SELECT * FROM FISH) D, (SELECT * FROM FISH2) E WHERE D.A = E.A
Will the relevant fields still be keys?
Now, let's take this one step further. If I run
SELECT * FROM (SELECT CONCAT(A,B) AS DUCK, C FROM FISH) D, (SELECT CONCAT(A,B) AS DUCK2, C FROM FISH2) E WHERE D.DUCK = E.DUCK2
If A and B were keys in the original tables, will their concatenation also be a key?
Thanks :)
If A is a key in fish, any projection of fish alone that includes A will produce a result set where A is still unique.
A join between table fish and any table with 1:1 relation (such as fish_type) will produce a result set where A is unique.
A join with another table that has a 1:M or M:M relation from fish (such as fish_baits) will NOT produce a result where A is unique, unless you provide a filter predicate on the "other" side (such as bait='Dynamite').
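A tiny illustration on SQLite (via Python's sqlite3) with hypothetical fish and fish_baits tables, where fish_baits is keyed on (a, bait). Joining to the 1:M side repeats a, and the bait filter restores uniqueness here only because (a, bait) is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fish (a INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical 1:M child table: each fish can have many baits
    CREATE TABLE fish_baits (a INTEGER REFERENCES fish(a), bait TEXT,
                             PRIMARY KEY (a, bait));
    INSERT INTO fish VALUES (1, 'pike'), (2, 'trout');
    INSERT INTO fish_baits VALUES (1, 'worm'), (1, 'Dynamite'), (2, 'fly');
""")

join_sql = "SELECT f.a FROM fish f JOIN fish_baits fb ON fb.a = f.a"
# Unfiltered 1:M join: a = 1 appears once per bait
unfiltered = conn.execute(join_sql + " ORDER BY f.a").fetchall()
# Filtered on the "other" side: at most one row per a, since (a, bait) is the key
filtered = conn.execute(join_sql + " WHERE fb.bait = 'Dynamite'").fetchall()
print(unfiltered)  # [(1,), (1,), (2,)]
print(filtered)    # [(1,)]
```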
SELECT * FROM (SELECT * FROM FISH) D, (SELECT * FROM FISH2) E WHERE D.A = E.A
...is logically equivalent to the following statement, and most databases (including MySQL) will perform the transformation:
select *
from fish
join fish2 on(fish.a = fish2.a)
Whether A is still unique in the resultset depends on the key of fish2 and their relation (see above).
Concatenation does not preserve uniqueness. Consider the following case:
concat("10", "10") => "1010"
concat("101", "0") => "1010"
Therefore, your final query...
SELECT *
FROM (SELECT CONCAT(A,B) AS DUCK, C FROM FISH) D
,(SELECT CONCAT(A,B) AS DUCK2, C FROM FISH2) E
WHERE D.DUCK = E.DUCK2
...won't (necessarily) produce the same result as
select *
from fish
join fish2 on(
fish.a = fish2.a
and fish.b = fish2.b
)
I wrote "necessarily" because the collisions depend on the actual values. Some time ago I hunted down a bug whose root cause was exactly this. The code had worked for several years before the bug manifested itself.
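The collision can be reproduced on SQLite (via Python's sqlite3), where CONCAT(A, B) is spelled A || B. The two single-row tables are hypothetical. The concatenated keys collide on '1010', while joining on the columns themselves correctly finds nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FISH  (A TEXT, B TEXT, C TEXT, PRIMARY KEY (A, B));
    CREATE TABLE FISH2 (A TEXT, B TEXT, C TEXT, PRIMARY KEY (A, B));
    INSERT INTO FISH  VALUES ('10', '10', 'from FISH');
    INSERT INTO FISH2 VALUES ('101', '0', 'from FISH2');
""")
# Join on the concatenation: '10' || '10' and '101' || '0' are both '1010'
concat_join = conn.execute("""
    SELECT D.C, E.C
    FROM (SELECT A || B AS DUCK, C FROM FISH) AS D,
         (SELECT A || B AS DUCK2, C FROM FISH2) AS E
    WHERE D.DUCK = E.DUCK2
""").fetchall()
# Join on the columns themselves: no match, as intended
column_join = conn.execute("""
    SELECT FISH.C, FISH2.C
    FROM FISH JOIN FISH2 ON FISH.A = FISH2.A AND FISH.B = FISH2.B
""").fetchall()
print(concat_join)  # [('from FISH', 'from FISH2')]
print(column_join)  # []
```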
If by "key" you mean "unique", yes, tuples of a cartesian product over unique values will be unique.
(One can prove it by reductio ad absurdum.)
For step 1, think of a view as a subquery containing everything in the AS clause when CREATE VIEW was executed.
For example, if view v is created as SELECT a, b, c FROM t, then when you execute...
SELECT * FROM v WHERE a = some_value
...it's conceptually treated as...
SELECT * FROM (SELECT a, b, c FROM t) WHERE a = some_value
Any database with a decent optimizer will notice that column a is passed straight into the results and that it can take advantage of the indexing in t (if there is any) by moving the condition into the subquery:
SELECT * FROM (SELECT a, b, c FROM t WHERE a = some_value)
This all happens behind the scenes and is not an optimization you need to do yourself. Obviously, it can't do that for every condition in the WHERE clause, but understanding where you can is part of the art of writing a good optimizer.
For step 2, the concatenated keys will be part of intermediate results, and whether or not the database decides they need indexing is an implementation detail. Also note fche's comment about duplication.
If your database has a query plan explainer, running it and learning to interpret the results will give you a lot of insight about what makes your queries run fast and what slows them down.
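For example, SQLite's EXPLAIN QUERY PLAN (run here through Python's sqlite3) shows that SQLite flattens a simple view like v into the outer query, so the filter on a can use an index on t. The table, the index name t_a and the view are hypothetical. The plan text's exact wording differs between SQLite versions, so treat the printed lines as indicative; MySQL's counterpart is EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER);
    CREATE INDEX t_a ON t(a);
    CREATE VIEW v AS SELECT a, b, c FROM t;
""")
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM v WHERE a = ?", (42,)).fetchall()
# The last column of each plan row is the human-readable detail text
details = [row[-1] for row in plan]
for line in details:
    print(line)
```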
I have the following problem to solve:
Let's say there is a table A that contains an number of elements, this table is copied to become table B. In table B some of the original elements are lost and some are added. A reference table AB keeps track of these changes. Then table B is copied to be table C and again some of the existing elements get lost and some are added. Another reference Table BC keeps track of these relations. ... etc.
There are n such tables and n-1 reference tables.
If I want to know which of the elements of my choice in table C were already present in A, I can do that with something like:
SELECT AB.oldID
FROM AB
JOIN BC ON BC.oldID = AB.newID
WHERE BC.newID IN (x, y, z)
Now, since the number of reference tables can vary, the number of JOIN lines can vary.
Should I concatenate the query by looping over the steps and adding JOIN lines, or should I rather write a recursive function that selects only the members of the next step and then let the function call itself until I have the end result?
Or is there another, even better way to do something like that?
Since your table names vary, you'll need to build some kind of a dynamical query.
If you do the recursive function approach, you'll need to pass the resultsets between the function calls somehow.
MySQL has no array datatype, and storing the results in a temp table takes far too long.
Conclusion: use joins.
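A sketch of the "use joins" approach: build the JOIN chain in application code from the ordered list of reference-table names, then run it once. It assumes every reference table has oldID and newID columns, as in the question's query, and that elements added in a copy simply have no row in the reference table. Table names can't be bound as query parameters, so the hypothetical build_lineage_query helper only accepts plain identifiers, while the IDs are bound normally. SQLite (via Python's sqlite3) stands in for MySQL.

```python
import sqlite3

def build_lineage_query(ref_tables, new_ids):
    """Build SQL tracing newIDs in the last copy back to oldIDs in the first.

    ref_tables: reference tables in copy order, e.g. ["AB", "BC"].
    Returns (sql, params); only elements that survived every copy come back.
    """
    for name in ref_tables:
        # Table names are spliced into the SQL text, so reject anything odd
        if not name.isidentifier():
            raise ValueError(f"unexpected table name: {name!r}")
    first, last = ref_tables[0], ref_tables[-1]
    lines = [f"SELECT {first}.oldID, {last}.newID", f"FROM {first}"]
    for prev, cur in zip(ref_tables, ref_tables[1:]):
        # Each copy's newID is the next copy's oldID
        lines.append(f"JOIN {cur} ON {cur}.oldID = {prev}.newID")
    placeholders = ", ".join("?" for _ in new_ids)
    lines.append(f"WHERE {last}.newID IN ({placeholders})")
    return "\n".join(lines), list(new_ids)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AB (oldID INTEGER, newID INTEGER);
    CREATE TABLE BC (oldID INTEGER, newID INTEGER);
    -- A -> B: element 1 is lost, 2 and 3 survive as 20 and 30; 40 is new in B
    INSERT INTO AB VALUES (2, 20), (3, 30);
    -- B -> C: 30 is lost, 20 and 40 survive as 200 and 400
    INSERT INTO BC VALUES (20, 200), (40, 400);
""")
sql, params = build_lineage_query(["AB", "BC"], [200, 400])
print(sql)
print(conn.execute(sql, params).fetchall())  # [(2, 200)]
```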
Update:
Here's a sample query which returns the entries that persisted from revision A through revision M (with a one-table design):
SELECT *
FROM entries e
WHERE NOT EXISTS
(
SELECT *
FROM revisions r
JOIN revision_changes rc
ON rc.revision_id = r.id
WHERE rc.entry_id = e.id
AND rc.deleted
AND r.revision_id BETWEEN 'A' AND 'M'
)
This way, you just fill the added and deleted fields of revision_changes for the revisions where the entry was added or deleted.
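A sketch of this design on SQLite (via Python's sqlite3). The answer doesn't give the schema, so the one below is an assumption: revisions has a surrogate id plus a revision_id label compared with BETWEEN, and revision_changes holds integer 0/1 added and deleted flags. Like the query itself, it only looks at deletions, so every entry is assumed to exist at revision A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE revisions (id INTEGER PRIMARY KEY, revision_id TEXT);
    CREATE TABLE revision_changes (
        revision_id INTEGER REFERENCES revisions(id),
        entry_id    INTEGER REFERENCES entries(id),
        added       INTEGER NOT NULL DEFAULT 0,
        deleted     INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO entries VALUES (1, 'kept'), (2, 'deleted in F'), (3, 'deleted in Q');
    INSERT INTO revisions VALUES (1, 'A'), (2, 'F'), (3, 'M'), (4, 'Q');
    -- Only the revisions where something was deleted get a row
    INSERT INTO revision_changes (revision_id, entry_id, deleted)
        VALUES (2, 2, 1), (4, 3, 1);
""")
survivors = conn.execute("""
    SELECT e.id
    FROM entries e
    WHERE NOT EXISTS (
        SELECT *
        FROM revisions r
        JOIN revision_changes rc ON rc.revision_id = r.id
        WHERE rc.entry_id = e.id
          AND rc.deleted
          AND r.revision_id BETWEEN 'A' AND 'M'
    )
    ORDER BY e.id
""").fetchall()
print(survivors)  # [(1,), (3,)]
```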