I need help with a data extraction. I'm an SQL noob and I think I have a serious issue with my data design skills. The DB system is MySQL running on Linux.
Table A is structured like this one:
TYPE SUBTYPE ID
-------------------
xyz aaa 0001
xyz aab 0001
xyz aac 0001
xyz aad 0001
xyz aaa 0002
xyz aaj 0002
xyz aac 0002
xyz aav 0002
Table B is:
TYPE1 SUBTYPE1 TYPE2 SUBTYPE2
-------------------------------------
xyz aaa xyz aab
xyz aac xyz aad
Looking at the whole of table A, I need to extract all rows whose type and subtype appear together with another row's type and subtype in a single table B row. Of course a plain join never meets this condition, since A.subtype can't be equal to B.subtype1 and B.subtype2 at the same time.
In the example the result set for id should be:
xyz aaa 0001
xyz aab 0001
xyz aac 0001
xyz aad 0001
I'm trying to use a join with two AND conditions, but of course I get an empty set.
EDIT:
@Barmar thank you for your support. It seems that I'm really near the final solution. Just to keep things clear, I opened this thread with a shortened and simplified data structure, just to highlight the point where I was stuck.
I thought about your solution, and it is acceptable to have both results on a single row. Now I need to reduce execution time.
The first join takes about 2 minutes to complete, and it produces around 23 million rows. The second join (table B) is probably taking longer.
In fact, I need 3 hours to get the final set of 10 million rows. How can we improve things a bit? I noticed that the MySQL engine is not threaded, and the query is only using a single CPU. I indexed all fields used by the joins, but I'm not sure it's the right thing to do, since I'm not a DBA.
I also suppose that relying on VARCHAR comparisons for such a big join is not the best solution. I should probably rewrite things using numerical IDs, which should be faster.
Splitting things into different queries will probably help parallelism. Thanks for any feedback.
You can join Table A with itself to find all combinations of types and subtypes with the same ID, then compare them with the values in Table B.
SELECT t1.type AS type1, t1.subtype AS subtype1, t2.type AS type2, t2.subtype AS subtype2, t1.id
FROM TableA AS t1
JOIN TableA AS t2 ON t1.id = t2.id AND NOT (t1.type = t2.type AND t1.subtype = t2.subtype)
JOIN TableB AS b ON t1.type = b.type1 AND t1.subtype = b.subtype1 AND t2.type = b.type2 AND t2.subtype = b.subtype2
This returns the two rows from Table A as a single row in the result, rather than as separate rows. I hope that's OK. If you need to split them up, you can move this into a subquery and join it back with the original Table A to return each row.
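SELECT a.*
FROM TableA AS a
JOIN (
SELECT t1.type AS type1, t1.subtype AS subtype1, t2.type AS type2, t2.subtype AS subtype2, t1.id
FROM TableA AS t1
JOIN TableA AS t2 ON t1.id = t2.id AND NOT (t1.type = t2.type AND t1.subtype = t2.subtype)
JOIN TableB AS b ON t1.type = b.type1 AND t1.subtype = b.subtype1 AND t2.type = b.type2 AND t2.subtype = b.subtype2
) AS x
ON a.id = x.id AND
((a.type = x.type1 AND a.subtype = x.subtype1)
OR
(a.type = x.type2 AND a.subtype = x.subtype2))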
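As a quick sanity check of the self-join idea, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database (a stand-in for MySQL, which is an assumption of this example) and runs the pairing query. Only the subtypes and the id are selected to keep the output short.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the join logic is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (type TEXT, subtype TEXT, id TEXT);
CREATE TABLE TableB (type1 TEXT, subtype1 TEXT, type2 TEXT, subtype2 TEXT);
INSERT INTO TableA VALUES
  ('xyz','aaa','0001'), ('xyz','aab','0001'), ('xyz','aac','0001'), ('xyz','aad','0001'),
  ('xyz','aaa','0002'), ('xyz','aaj','0002'), ('xyz','aac','0002'), ('xyz','aav','0002');
INSERT INTO TableB VALUES ('xyz','aaa','xyz','aab'), ('xyz','aac','xyz','aad');
""")

# Pair up rows of TableA that share an id, then keep only the pairs
# that match both halves of a single TableB row.
pairs = conn.execute("""
SELECT t1.subtype, t2.subtype, t1.id
FROM TableA AS t1
JOIN TableA AS t2
  ON t1.id = t2.id AND NOT (t1.type = t2.type AND t1.subtype = t2.subtype)
JOIN TableB AS b
  ON t1.type = b.type1 AND t1.subtype = b.subtype1
 AND t2.type = b.type2 AND t2.subtype = b.subtype2
ORDER BY t1.id, t1.subtype
""").fetchall()

print(pairs)  # [('aaa', 'aab', '0001'), ('aac', 'aad', '0001')]
```

Id 0002 produces no pairs because it has aaa and aac but neither aab nor aad, which matches the expected result set in the question.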
You can use EXISTS:
SELECT a.*
FROM TableA a
WHERE EXISTS(
SELECT 1
FROM TableB b
WHERE
(b.Type1 = a.Type AND b.SubType1 = a.SubType)
OR (b.Type2 = a.Type AND b.SubType2 = a.SubType)
)
AND a.ID = '0001'
You can use a join like this:
Select A.Type, A.SubType, A.ID from a_table A JOIN b_table B
ON (A.Type = B.Type1 AND A.SubType = B.SubType1) OR
(A.Type = B.Type2 AND A.SubType = B.SubType2)
But I think there is a problem in your design: the same values appear in Table A under different IDs, and there is no condition on ID at all!
Instead of storing Type and SubType in Table B, you can store a unique ID for each record of Table A in Table B, and then think about better ways to get the results you want.
Edit :
With a UNION of two joins you can get that result:
Select A.Type, A.SubType, A.ID from A_table A
JOIN b_table B1 ON A.Type = B1.Type1 AND A.SubType = B1.SubType1
WHERE (B1.Type2, B1.SubType2) IN (SELECT Type, SubType FROM A_table) AND ID = '0001'
UNION
Select A.Type, A.SubType, A.ID from A_table A
JOIN b_table B2 ON A.Type = B2.Type2 AND A.SubType = B2.SubType2
WHERE (B2.Type1, B2.SubType1) IN (SELECT Type, SubType FROM A_table) AND ID = '0001'
But as I said, I think there is a design problem. It seems better for each type and subtype combination to have a unique ID in Table A, and to work with this ID in Table B.
I'm trying to implement some sort of localization in my DB.
For example, I have 3 tables (img 1). The Languages table contains the different languages. The Localization table has 3 fields: 'id' - the id of the string, 'language' - the language of the string (id and language together are my primary key), 'value' - the localized string. tableOne has 'id', 'Col1' and 'Col2' - these fields contain IDs of the localizable strings.
So after localizing I expect to get one of the green tables instead of the original (depending on a language parameter).
I've made it this way and it works, but I'd like to know whether there is a better way to do it, because now I have to create an INNER JOIN block for each column that must be localized. I'm just scared that it will be very slow.
I tried to create a temporary table holding all records of the required language and then do the same inner joins, so that searches are performed only among the records of one language. But it's not working, because I would still have to join that temp table multiple times, which is impossible.
SELECT
`One`.`id` AS 'id',
`loc1`.`value` AS 'Col1',
`loc2`.`value` AS 'Col2'
FROM
`tableOne` AS `One`
INNER JOIN
`localization` AS `loc1`
ON `loc1`.`id` = `One`.`Col1`
AND `loc1`.`language` = 'en'
INNER JOIN
`localization` AS `loc2`
ON `loc2`.`id` = `One`.`Col2`
AND `loc2`.`language` = 'en'
img 1
If you want to reduce the number of JOINS needed, try displaying the values in rows instead of columns. You could do so like this:
SET @lang := 'en';
SELECT 1, tmp.value
FROM(
SELECT value
FROM localization
WHERE language = @lang AND id IN(543, 345)) tmp;
I first set a language parameter, and then I pull all values for that language from the localization table, using the ids inside an IN operator. You'll get results like this:
| 1 | one |
| 1 | two |
If you have to use the format given in the first table, try doing one inner join where you pull for the specific language and ids like this:
SELECT t1.id, t1.col1, t1.col2,
CASE WHEN l.id = t1.col1 THEN l.value ELSE null END AS col1Value,
CASE WHEN l.id = t1.col2 THEN l.value ELSE null END AS col2Value
FROM firstTable t1
JOIN localization l ON l.id IN (t1.col1, t1.col2) AND l.language = @lang;
Unfortunately, this won't give you the final solution. It will give you values like:
| 1 | 543 | 345 | one | null |
| 1 | 543 | 345 | null | two |
To collapse those into one row and remove the nulls, just wrap each CASE in MAX() and group by id.
This still runs a CASE expression for each column you have, but it needs only one JOIN and looks a little more manageable:
SELECT t1.id,
MAX(CASE WHEN l.id = t1.col1 THEN l.value ELSE null END) AS col1Value,
MAX(CASE WHEN l.id = t1.col2 THEN l.value ELSE null END) AS col2Value
FROM firstTable t1
JOIN localization l ON l.id IN (t1.col1, t1.col2) AND l.language = @lang
GROUP BY t1.id;
Here is an SQL Fiddle example. I don't think the CASE blocks will bog you down too much, but let me know how this performs against your actual database.
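To see the single-join pivot end to end, here is a small sketch using in-memory SQLite in place of MySQL. The ids 543 and 345 and the English values come from the example above; the German rows and the helper name `localized` are made up for illustration, and the language is passed as a bound parameter instead of a session variable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE localization (id INTEGER, language TEXT, value TEXT,
                           PRIMARY KEY (id, language));
CREATE TABLE firstTable (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER);
-- The 'de' rows are hypothetical sample data, not from the question.
INSERT INTO localization VALUES
  (543, 'en', 'one'),  (345, 'en', 'two'),
  (543, 'de', 'eins'), (345, 'de', 'zwei');
INSERT INTO firstTable VALUES (1, 543, 345);
""")

def localized(lang):
    """Return one row per firstTable id with both columns localized."""
    return conn.execute("""
        SELECT t1.id,
               MAX(CASE WHEN l.id = t1.col1 THEN l.value END) AS col1Value,
               MAX(CASE WHEN l.id = t1.col2 THEN l.value END) AS col2Value
        FROM firstTable t1
        JOIN localization l
          ON l.id IN (t1.col1, t1.col2) AND l.language = ?
        GROUP BY t1.id
    """, (lang,)).fetchall()

print(localized("en"))  # [(1, 'one', 'two')]
print(localized("de"))  # [(1, 'eins', 'zwei')]
```

MAX() ignores NULLs, which is why the two half-filled rows collapse into one complete row per id.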
I want to extract all the rows from a database table, where the rows cross-reference each other.
My table contains 2 reference columns: ref1 & ref2
Table example:
ID ref1 ref2
01 23 83
02 77 55
03 83 23
04 13 45
In this case, I want my query to return only rows 01 and 03, because they cross-reference each other.
Is this possible using a single query, or will I need to iterate the entire table manually?
I'm using MySQL.
A simple JOIN can do that in a straightforward manner:
SELECT DISTINCT a.*
FROM mytable a
JOIN mytable b
ON a.ref1 = b.ref2 AND a.ref2 = b.ref1;
An SQLfiddle to test with.
select
*
from
tbl t1
where
exists (
select
'x'
from
tbl t2
where
t1.ref1 = t2.ref2 and
t1.ref2 = t2.ref1
)
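For reference, a minimal sketch that runs the EXISTS variant against the four sample rows from the question, with in-memory SQLite standing in for MySQL. The table name `mytable` is borrowed from the JOIN answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id TEXT PRIMARY KEY, ref1 INTEGER, ref2 INTEGER);
INSERT INTO mytable VALUES
  ('01', 23, 83), ('02', 77, 55), ('03', 83, 23), ('04', 13, 45);
""")

# Keep a row only if some row holds the same two references swapped.
cross_refs = conn.execute("""
SELECT a.id, a.ref1, a.ref2
FROM mytable a
WHERE EXISTS (
    SELECT 1 FROM mytable b
    WHERE a.ref1 = b.ref2 AND a.ref2 = b.ref1
)
ORDER BY a.id
""").fetchall()

print(cross_refs)  # [('01', 23, 83), ('03', 83, 23)]
```

Note that a row whose ref1 equals its ref2 would match itself under this condition, so add `a.id <> b.id` if such rows should be excluded.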
Let me first present the solution in a context of a table that is meant for representing trees but faces this issue (resulting in a cross reference that is not anymore part of the tree).
Note: If your root tree record(s) reference themselves then you need to filter them out ( a.id!=b.id) as below, to keep only cross-referencing records.
-- case of a tree(s)
select
a.id currentId,
b.id parentId,
b.parent_id grandParentId
from my_table a
join my_table b on
a.parent_id=b.id
and b.parent_id=a.id
and a.id!=b.id;
Now in your case, the above query can be written as (Again considering that records referencing themselves are allowed we add and a.ref1!=b.ref2):
-- case of a graph(s)
select
a.ref1 theCurrent,
b.ref1 theRef,
b.ref2 theRefsRef
from my_table a
join my_table b on
a.ref2=b.ref1
and b.ref2=a.ref1
and a.ref1!=b.ref2;
Suppose I have a table where materials are assigned different characteristics. A material can have one or more characteristics. For a given material, I would like to find similar materials, meaning that at least 2 characteristics should match. In this example, comparing with A should find material C, and D should find B. Is there any solution in SQL?
material | character
----------------------
A | 2
A | 5
B | 1
B | 3
B | 4
C | 2
C | 5
D | 3
D | 1
This is an Entity-Attribute-Value table, and it is notoriously painful to search. (In this case, the value is implied as being TRUE for "has this attribute".)
It involves comparing everything against everything, grouping the results, and checking whether the groups match, with virtually no use of indexes or intelligence of any kind.
SELECT
material_a.material AS material_a,
material_b.material AS material_b
FROM
material AS material_a
INNER JOIN
material AS material_b
ON material_a.character = material_b.character
AND material_a.material <> material_b.material
GROUP BY
material_a.material,
material_b.material
HAVING
COUNT(*) = (SELECT COUNT(*) FROM material AS m WHERE m.material = material_a.material)
This gives every material_b that has all of the characteristics that material_a has.
- The HAVING clause checks that the number of shared characteristics equals the total number of material a's characteristics, so that none of them is missing from material b. This assumes each (material, character) pair appears only once.
Changing the HAVING clause to a simple count gives the materials that share at least two characteristics.
SELECT
material_a.material AS material_a,
material_b.material AS material_b
FROM
material AS material_a
INNER JOIN
material AS material_b
ON material_a.character = material_b.character
AND material_a.material <> material_b.material
GROUP BY
material_a.material,
material_b.material
HAVING
COUNT(*) >= 2
Either way, you are still joining the whole table against the whole table, then filtering out the failures. With 100 materials, that's 9,900 material-material comparisons. Imagine having 1,000 materials and 999,000 comparisons. Or 1 million materials...
You could use something like the following grouped query to find all pairs of materials with at least 2 characteristics in common:
SELECT
t1.material AS material
, t2.material AS similarMaterial
FROM
tableName t1
INNER JOIN tableName t2 ON t1.character = t2.character AND NOT(t1.material = t2.material)
GROUP BY t1.material, t2.material
HAVING
COUNT(*) >= 2
Yes, you can find all pairs of similar materials with SQL similar to this:
SELECT c1.material, c2.material, COUNT(*) as characterCount
FROM charateristics c1
CROSS JOIN charateristics c2
WHERE c1.material > c2.material AND c1.character = c2.character
GROUP BY c1.material, c2.material
HAVING characterCount >= 2;
This would give you the results based on a material input:
SELECT b.material
FROM table1 a
INNER JOIN table1 b
ON a.character = b.character AND a.material <> b.material
WHERE a.material = 'A' -- Your input
GROUP BY b.material
HAVING COUNT(*) > 1;
Or do this to give you the pairs:
SELECT a.material as LEFT_MATERIAL ,b.material AS RIGHT_MATERIAL
FROM table1 a
INNER JOIN table1 b ON a.character = b.character AND a.material <> b.material
GROUP BY a.material,b.material
HAVING COUNT(*) > 1;
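Here is a small sketch of both variants run against the sample data from the question, using in-memory SQLite as a stand-in database. The column is named `characteristic` rather than `character` in this sketch because CHARACTER is a reserved word in MySQL; the helper name `similar_to` is also just an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (material TEXT, characteristic INTEGER);
INSERT INTO table1 VALUES
  ('A', 2), ('A', 5),
  ('B', 1), ('B', 3), ('B', 4),
  ('C', 2), ('C', 5),
  ('D', 3), ('D', 1);
""")

def similar_to(material, min_shared=2):
    """Materials sharing at least min_shared characteristics with `material`."""
    rows = conn.execute("""
        SELECT b.material
        FROM table1 a
        JOIN table1 b
          ON a.characteristic = b.characteristic AND a.material <> b.material
        WHERE a.material = ?
        GROUP BY b.material
        HAVING COUNT(*) >= ?
        ORDER BY b.material
    """, (material, min_shared)).fetchall()
    return [r[0] for r in rows]

# All pairs, each listed once thanks to the a.material < b.material condition.
pairs = conn.execute("""
    SELECT a.material, b.material, COUNT(*) AS shared
    FROM table1 a
    JOIN table1 b
      ON a.characteristic = b.characteristic AND a.material < b.material
    GROUP BY a.material, b.material
    HAVING COUNT(*) >= 2
    ORDER BY a.material, b.material
""").fetchall()

print(similar_to("A"))  # ['C']
print(similar_to("D"))  # ['B']
print(pairs)            # [('A', 'C', 2), ('B', 'D', 2)]
```

Using `<` instead of `<>` in the pair query halves the output, since each pair would otherwise appear in both orders.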
I have a table in MySQL as follows.
Id Designation Years Employee
1 Soft.Egr 2000-2005 A
2 Soft.Egr 2000-2005 B
3 Soft.Egr 2000-2005 C
4 Sr.Soft.Egr 2005-2010 A
5 Sr.Soft.Egr 2005-2010 B
6 Pro.Mgr 2010-2012 A
I need to get the Employees who worked as Soft.Egr and Sr.Soft.Egr and Pro.Mgr. It is not possible to use IN or Multiple ANDs in the query. How to do this??
One way:
select Employee
from job_history
where Designation in ('Soft.Egr','Sr.Soft.Egr','Pro.Mgr')
group by Employee
having count(distinct Designation) = 3
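A minimal sketch of this counting approach on the sample rows from the question, with in-memory SQLite standing in for MySQL. The table name `job_history` follows the query above; building the placeholder list from a Python list is just one way to keep the IN list and the expected count in sync.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_history (Id INTEGER PRIMARY KEY, Designation TEXT,
                          Years TEXT, Employee TEXT);
INSERT INTO job_history VALUES
  (1, 'Soft.Egr',    '2000-2005', 'A'),
  (2, 'Soft.Egr',    '2000-2005', 'B'),
  (3, 'Soft.Egr',    '2000-2005', 'C'),
  (4, 'Sr.Soft.Egr', '2005-2010', 'A'),
  (5, 'Sr.Soft.Egr', '2005-2010', 'B'),
  (6, 'Pro.Mgr',     '2010-2012', 'A');
""")

wanted = ["Soft.Egr", "Sr.Soft.Egr", "Pro.Mgr"]
placeholders = ",".join("?" * len(wanted))  # "?,?,?"

# An employee qualifies only if the number of distinct matching
# designations equals the number of designations asked for.
rows = conn.execute(f"""
    SELECT Employee
    FROM job_history
    WHERE Designation IN ({placeholders})
    GROUP BY Employee
    HAVING COUNT(DISTINCT Designation) = ?
    ORDER BY Employee
""", (*wanted, len(wanted))).fetchall()
employees = [r[0] for r in rows]

print(employees)  # ['A']
```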
What you might actually be looking for is relational division, even if your exercise requirements forbid using AND (for whatever reason?). This is tricky, but possible to express correctly in SQL.
Relational division in prose means: find those employees who have a record in the Employees table for every existing designation. Or in SQL:
SELECT DISTINCT E1.Employee FROM Employees E1
WHERE NOT EXISTS (
SELECT 1 FROM Employees E2
WHERE NOT EXISTS (
SELECT 1 FROM Employees E3
WHERE E3.Employee = E1.Employee
AND E3.Designation = E2.Designation
)
)
To see the above query in action, consider this SQLFiddle
A good resource explaining relational division can be found here:
http://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division
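One property of the double NOT EXISTS form is worth seeing in action: the divisor is every designation present in the table, not a fixed list. The sketch below uses in-memory SQLite with the sample rows from the question; the extra 'CTO' row for employee 'D' and the helper name are hypothetical additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (Id INTEGER PRIMARY KEY, Designation TEXT,
                        Years TEXT, Employee TEXT);
INSERT INTO Employees VALUES
  (1, 'Soft.Egr',    '2000-2005', 'A'),
  (2, 'Soft.Egr',    '2000-2005', 'B'),
  (3, 'Soft.Egr',    '2000-2005', 'C'),
  (4, 'Sr.Soft.Egr', '2005-2010', 'A'),
  (5, 'Sr.Soft.Egr', '2005-2010', 'B'),
  (6, 'Pro.Mgr',     '2010-2012', 'A');
""")

def employees_with_every_designation():
    """Employees for whom no designation in the table is missing."""
    rows = conn.execute("""
        SELECT DISTINCT E1.Employee
        FROM Employees E1
        WHERE NOT EXISTS (
            SELECT 1 FROM Employees E2
            WHERE NOT EXISTS (
                SELECT 1 FROM Employees E3
                WHERE E3.Employee = E1.Employee
                  AND E3.Designation = E2.Designation
            )
        )
        ORDER BY E1.Employee
    """).fetchall()
    return [r[0] for r in rows]

before = employees_with_every_designation()

# Hypothetical extra row: a fourth designation held only by a new employee.
conn.execute("INSERT INTO Employees VALUES (7, 'CTO', '2012-2015', 'D')")
after = employees_with_every_designation()

print(before)  # ['A']
print(after)   # []
```

If the requirement is a fixed set of three designations, restrict the E2 subquery to that set so that unrelated designations do not change the result.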
If you need to get additional information back about each of the roles (like the dates) then joining back to your original table for each of the additional designations is a possible solution:
SELECT t.Employee, t.Designation, t.Years, t2.Designation, t2.Years, t3.Designation, t3.Years
FROM Employees t
INNER JOIN Employees t2 ON (t2.Employee = t.Employee AND t2.Designation = 'Sr.Soft.Egr')
INNER JOIN Employees t3 ON (t3.Employee = t.Employee AND t3.Designation = 'Soft.Egr')
WHERE t.Designation = 'Pro.Mgr';
Why not the following (for postgresql)?
SELECT employee FROM Employees WHERE Designation ='Sr.Soft.Egr'
INTERSECT
SELECT employee FROM Employees WHERE Designation ='Soft.Egr'
INTERSECT
SELECT employee FROM Employees WHERE Designation ='Pro.Mgr'
Link to SQLfiddle
I know this might not be optimized, but I find it much easier to understand and modify.
Try this query:
SELECT DISTINCT t1.employee,
t1.designation
FROM tempEmployees t1, tempEmployees t2, tempEmployees t3
WHERE t1.employee = t2.employee AND
t2.employee = t3.employee AND
t3.employee = t1.employee AND
t1.designation != t2.designation AND
t2.designation != t3.designation AND
t3.designation != t1.designation
I have this data in a table, for instance,
id name parent parent_id
1 add self 100
2 manage null 100
3 add 10 200
4 manage null 200
5 add 20 300
6 manage null 300
How can I left join or inner join this table itself so I get this result below?
id name parent
2 manage self
4 manage 10
6 manage 20
As you can see, I just want to query the rows with the keyword 'manage', but in the result I want the parent value from the matching add row to appear in the manage row.
Is it possible?
EDIT:
the simplified version of my actual table - system,
system_id parent_id type function_name name main_parent make_accessible sort
31 30 left main Main NULL 0 1
32 31 left page_main_add Add self 0 1
33 31 left page_main_manage Manage NULL 0 2
my actual query and it is quite messy already...
SELECT
a.system_id,
a.main_parent,
b.name,
b.make_accessible,
b.sort
FROM system AS a
INNER JOIN -- self --
(
SELECT system_id, name, make_accessible, sort
FROM system AS s2
LEFT JOIN -- search --
(
SELECT system_id AS parent_id
FROM system AS s1
WHERE s1.function_name = 'page'
) AS s1
ON s1.parent_id = s2.parent_id
WHERE s2.parent_id = s1.parent_id
AND s2.system_id != s1.parent_id
ORDER BY s2.sort ASC
) b
ON b.system_id = a.parent_id
WHERE a.function_name LIKE '%manage%'
ORDER BY b.sort ASC
result I get currently,
system_id main_parent name make_accessible sort
33 NULL Main 0 1
but I am after this,
system_id main_parent name make_accessible sort
33 self Main 0 1
You just need to reference the table twice:
select t1.id, t1.name, t2.id, t2.name
from TableA t1
inner join TableA t2
on t1.parent_id = t2.Id
Replace inner with left join if you want to see roots in the list.
UPDATE:
I misread your question. It seems to me that you always have two rows, manage one and add one. To get to "Add" from manage:
select system.*, (select parent
from system s2
where s2.parent_id = system.parent_id
and s2.name = 'add')
AS parent
from system
where name = 'manage'
Or, you might split the table into two derived tables and join them by parent_id:
select *
from system
inner join
(
select * from system where name = 'add'
) s2
on system.parent_id = s2.parent_id
where system.name = 'manage'
This will allow you to use all the columns from s2.
Your data does not conform to a child-parent hierarchical structure. For example, your column parent holds the value 10, which is not the value of any id, so a child-parent association is not possible.
In other words, there's nothing that relates the record 2,manage,null to the record 1,add,self, or the record 4,manage,null to 3,add,10, as you intend to do in your query.
To represent hierarchical data, you usually need a table that has a foreign key referencing its own primary key. So your column parent must reference the column id; then you can express a child-parent relationship between manage and add. Currently, that's not possible.
UPDATED: Joining by parent_id, try:
select m.id, m.name, a.parent
from myTable m
join myTable a on m.parent_id = a.parent_id and a.name = 'add'
where m.name = 'manage'
Change the inner join to a left join if there may not be a corresponding add row.
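A quick sketch of the parent_id join on the six sample rows from the question, with in-memory SQLite in place of MySQL. It uses the LEFT JOIN form so that a manage row without a matching add row would still appear, with a NULL parent; the table name `myTable` follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE myTable (id INTEGER PRIMARY KEY, name TEXT,
                      parent TEXT, parent_id INTEGER);
INSERT INTO myTable VALUES
  (1, 'add',    'self', 100),
  (2, 'manage', NULL,   100),
  (3, 'add',    '10',   200),
  (4, 'manage', NULL,   200),
  (5, 'add',    '20',   300),
  (6, 'manage', NULL,   300);
""")

# Each manage row borrows `parent` from the add row with the same parent_id.
result = conn.execute("""
    SELECT m.id, m.name, a.parent
    FROM myTable m
    LEFT JOIN myTable a
      ON m.parent_id = a.parent_id AND a.name = 'add'
    WHERE m.name = 'manage'
    ORDER BY m.id
""").fetchall()

print(result)  # [(2, 'manage', 'self'), (4, 'manage', '10'), (6, 'manage', '20')]
```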