I have 3 tables A, B and C:
Table A is small (~1000 rows).
Table B has ~200,000 rows.
Table C has ~2.2 million rows.
I'm running a query like this:
SELECT A.Id FROM A, B, C
WHERE A.Id = B.SomeId OR (A.Id = C.SomeId AND C.SomeValue = 'X')
INTO OUTFILE '/tmp/result.txt';
A.Id is the primary key of table A
B.SomeId has an index set up
Edit: C.SomeId has an index set up
C.SomeValue has an index set up too, but it's a VARCHAR(1) with only two possible values
I thought this would only have to iterate over each Id in table A (1000 rows) and then potentially query across the other tables (depending on whether MySQL short circuits, I don't know if it does).
But the query seems to hang, or at least it's taking a very long time. Much longer than I would have expected if it only had to iterate 1000 rows. 10 minutes in and the output file is still empty. Let me know if I can provide any more information.
my#laptop$ mysql --version
mysql Ver 14.14 Distrib 5.5.37, for debian-linux-gnu (i686) using readline 6.3
Edit:
The result I'm looking for is: "Give me all the Ids in table A where the Id matches B.SomeId, OR ELSE the Id matches C.SomeId AND C.SomeValue equals 'X'."
OR expressions often make it difficult for MySQL to use indexes. Try changing to a UNION:
SELECT A.id
FROM A
JOIN B ON A.id = B.SomeID
UNION
SELECT A.id
FROM A
JOIN C ON A.id = C.SomeID
WHERE C.SomeValue = 'X'
From the documentation:
Minimize the OR keywords in your WHERE clauses. If there is no index that helps to locate the values on both sides of the OR, any row could potentially be part of the result set, so all rows must be tested, and that requires a full table scan. If you have one index that helps to optimize one side of an OR query, and a different index that helps to optimize the other side, use a UNION operator to run separate fast queries and merge the results afterward.
Your query is described by the last sentence: you have different indexes for each side of the OR query.
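As a rough check of the difference between the two shapes, here is a minimal sketch. It uses SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and the toy rows are invented for illustration, so it shows the result shapes only and says nothing about MySQL's actual query plan.

```python
import sqlite3

# SQLite stands in for MySQL here; the toy rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Id INTEGER PRIMARY KEY);
    CREATE TABLE B (SomeId INTEGER);
    CREATE TABLE C (SomeId INTEGER, SomeValue TEXT);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1), (9);
    INSERT INTO C VALUES (1, 'X'), (2, 'X'), (3, 'Y');
""")

# Original shape: three tables, one OR, no join condition per table.
# Every A row is paired with every B row and every C row before filtering.
or_rows = conn.execute("""
    SELECT A.Id FROM A, B, C
    WHERE A.Id = B.SomeId OR (A.Id = C.SomeId AND C.SomeValue = 'X')
""").fetchall()

# UNION shape: two independent joins; UNION removes duplicate ids.
union_rows = conn.execute("""
    SELECT A.Id FROM A JOIN B ON A.Id = B.SomeId
    UNION
    SELECT A.Id FROM A JOIN C ON A.Id = C.SomeId
    WHERE C.SomeValue = 'X'
    ORDER BY 1
""").fetchall()

print(len(or_rows), sorted(set(or_rows)))  # many repeated ids
print(union_rows)                          # each matching id once
```

Both forms find the same distinct ids on this data, but the OR form returns each id once per matching combination of B and C rows, which is what blows up on tables with hundreds of thousands of rows.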
Let's go even smaller. Let's say that your tables look like this:
A.ID
1
2
B.SomeID
1
3
C.SomeID | C.SomeValue
1 | X
2 | X
Now, let's see what your query will do. First, we check whether A.ID and B.SomeID match. In the case of A.ID = 1 and B.SomeID = 1, we have a match! SQL short circuits. This means that if the first part of your OR is true, SQL doesn't evaluate the second part of your OR. Now, we still have to join with table C. Since there is no join condition for table C, SQL pairs A.ID with every row in table C.
Now we need to compare A.ID with the next row in B. Well, 1 <> 3. So, we move on to the second part of the or. When C.SomeID = 1, the row is included. When C.SomeID = 2, the row is not included. Your results for A.ID = 1 are:
A.ID | B.SomeID | C.SomeID | C.SomeValue
1 | 1 | 1 | X
1 | 1 | 2 | X
1 | 3 | 1 | X
This is clearly not the results table that you are looking for. Since you are going to join A with either table B or table C, you should use a union instead of an OR:
SELECT A.Id FROM A, B
WHERE A.Id = B.SomeId
Union All
Select A.ID From A, C
Where A.Id = C.SomeId AND C.SomeValue = 'X'
UNION ALL puts the results from the first query into the same results table as the results from the second query. Now, your question says that you only want the A.IDs that are in one table but not the other ("or else"). There are several ways to do this. In this case, I am going to use a HAVING clause and a subquery. You could also use NOT EXISTS, but I believe that HAVING is going to use fewer resources.
Select T.ID
From
(SELECT A.Id FROM A, B
WHERE A.Id = B.SomeId
Union All
Select A.ID From A, C
Where A.Id = C.SomeId AND C.SomeValue = 'X') T
Group By T.ID
Having count(1) = 1
We only want the Ids that show up exactly one time. This will only work if the id is not repeated in B or C, so keep that in mind. Since the condition is based on the aggregate function, count, this stipulation must be in the having.
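The caveat about repeated ids can be seen in a small sketch. SQLite (via Python's sqlite3) stands in for MySQL, and the rows are invented: id 3 appears twice in B and never in C. The SELECT DISTINCT variant is one possible workaround I am assuming here, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (Id INTEGER PRIMARY KEY);
    CREATE TABLE B (SomeId INTEGER);
    CREATE TABLE C (SomeId INTEGER, SomeValue TEXT);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1), (3), (3);       -- id 3 is repeated in B
    INSERT INTO C VALUES (1, 'X'), (2, 'X');
""")

# HAVING COUNT(1) = 1 exactly as in the answer: id 3 is counted twice
# because of the duplicate in B, so it is wrongly dropped.
naive = [r[0] for r in conn.execute("""
    SELECT T.Id FROM (
        SELECT A.Id FROM A JOIN B ON A.Id = B.SomeId
        UNION ALL
        SELECT A.Id FROM A JOIN C ON A.Id = C.SomeId WHERE C.SomeValue = 'X'
    ) T
    GROUP BY T.Id HAVING COUNT(1) = 1 ORDER BY T.Id
""")]

# Assumed workaround: de-duplicate each branch first, so every id
# contributes at most one row per source table.
dedup = [r[0] for r in conn.execute("""
    SELECT T.Id FROM (
        SELECT DISTINCT A.Id FROM A JOIN B ON A.Id = B.SomeId
        UNION ALL
        SELECT DISTINCT A.Id FROM A JOIN C ON A.Id = C.SomeId WHERE C.SomeValue = 'X'
    ) T
    GROUP BY T.Id HAVING COUNT(1) = 1 ORDER BY T.Id
""")]

print(naive)
print(dedup)
```

On this data the plain version keeps only id 2, while the de-duplicated version also keeps id 3, which matches B but not C.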
I'm not strong in MySQL, but I think this would work better:
SELECT Id FROM
( SELECT A.Id FROM A, B
WHERE A.Id = B.SomeId
UNION
SELECT A.Id FROM A, C
WHERE A.Id = C.SomeId AND C.SomeValue = 'X'
) X
INTO OUTFILE '/tmp/result.txt';
Related
We have a table with two columns, ID and Value. The ID is the index of the table row, and the Value consists of a fixed string followed by a key (a hexadecimal number), stored as a string in the database. Take 00001810010 as an example: the fixed string is 0000181 and the second part is the key, 0010.
Table
ID Value
0 00001810000
1 00001810010
2 00001810500
3 00001810900
4 0000181090a
What I want to get from the above table is the number interval between consecutive rows (in hexadecimal). For the above table the result is
[1, F], [11, 4FF], [501, 8FF], [901, 909]
I can read all the records into memory and handle them via C++, but is it possible to implement it through MySQL statements only? How?
I would be tempted to match up a row with the previous row with something like this:-
SELECT sub1.id AS this_row_id,
sub1.value AS this_row_value,
z.id AS prev_row_id,
z.value AS prev_row_value
FROM
(
SELECT a.id, a.value, MAX(b.id) AS bid
FROM some_table a
INNER JOIN some_table b
ON a.id > b.id
GROUP BY a.id, a.value
) sub1
INNER JOIN some_table z
ON z.id = sub1.bid
You might want to use LEFT OUTER JOINs rather than INNER JOINs depending on what you want for the first record (where there is no previous record to match on).
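To see how the previous-row pairing leads to the intervals, here is a minimal sketch. It runs the same join in SQLite (via Python's sqlite3, as a stand-in for MySQL) and does the hexadecimal arithmetic in Python rather than in SQL. The table name some_table comes from the query above; PREFIX_LEN is a name I am introducing for the 7-character fixed string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_table (id INTEGER PRIMARY KEY, value TEXT);
    INSERT INTO some_table VALUES
        (0, '00001810000'), (1, '00001810010'), (2, '00001810500'),
        (3, '00001810900'), (4, '0000181090a');
""")

# Pair every row with the row that has the largest smaller id.
pairs = conn.execute("""
    SELECT z.value AS prev_value, sub1.value AS this_value
    FROM (
        SELECT a.id, a.value, MAX(b.id) AS bid
        FROM some_table a
        INNER JOIN some_table b ON a.id > b.id
        GROUP BY a.id, a.value
    ) sub1
    INNER JOIN some_table z ON z.id = sub1.bid
    ORDER BY sub1.id
""").fetchall()

PREFIX_LEN = 7  # hypothetical constant: length of the fixed string '0000181'

intervals = []
for prev_value, this_value in pairs:
    low = int(prev_value[PREFIX_LEN:], 16) + 1   # first unused key
    high = int(this_value[PREFIX_LEN:], 16) - 1  # last unused key
    if low <= high:  # adjacent keys leave no gap, so skip them
        intervals.append((format(low, "X"), format(high, "X")))

print(intervals)
```

Note that the first gap ends at F, since 0x10 - 1 is 0xF. In MySQL itself the string-to-number step could probably be done with the CONV() function, but that variant is not shown or tested here.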
I've seen people recommending cross joining a table on itself by doing this:
SELECT *
FROM tbl AS A, tbl AS B
WHERE A.col1 = 1 AND B.col1 = 1
But here, the engine needs to iterate through all of the rows in tbl twice to match the two queries to the results of A and B, despite the fact that the queries (and therefore the results) are the same.
Assuming that the WHERE on A and B will always be identical for the two, this is a waste. Is there any way to query for something once, and then cross join the result of that query on itself? I'd like to avoid temp tables, which would require disk writing instead of performing this entire thing in RAM.
I am using MySQL, although any SQL answer would help a lot.
EXAMPLE:
Suppose that tbl looks as follows:
COL1 COL2
1 A
1 B
1 C
2 D
2 E
When I run my where clause of col1 = 1, it returns the first three rows from the above table. What I want is the following table, but with only one execution of the where statement, since the two tables A and B are identical:
A.COL1 A.COL2 B.COL1 B.COL2
1 A 1 A
1 A 1 B
1 A 1 C
1 B 1 A
1 B 1 B
1 B 1 C
1 C 1 A
1 C 1 B
1 C 1 C
You are basically asking for an intentional Cartesian join
select
a.col1,
a.col2,
b.col1,
b.col2
from
tbl a
join tbl b
on a.col1 = b.col1
where
a.col1 = 1
order by
a.col2,
b.col2
To exactly hit your output order sequence, you need to order by the "a" column 2, then the "b" column 2.
I really recommend avoiding that JOIN syntax... it can be very difficult to read.
Your explanation of what you are trying to do is a bit cryptic. The query as written offers no value for a JOIN operation. Generally speaking, when you want to JOIN a table to itself, it's on different columns:
select *
from tbl as a
inner join
tbl as b
on a.col1 = b.col2
where
a.col1 = 1;
This allows you to query against the table, and also collect related information organized in a hierarchical fashion in the same table. For example:
create table tbl (
person_id int,
parent_id int
);
In this case, a parent is a person too. If you wanted to get a list of the parents related to the person with an ID of 1, you could write:
select
person.person_id as OriginalPerson,
parent.person_id as Parent
from
tbl as person
inner join
tbl as parent
on parent.person_id = person.parent_id
where
person.person_id = 1;
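A minimal sketch of this hierarchical self-join, run in SQLite through Python's sqlite3 as a stand-in for MySQL. The rows are invented, and the sketch assumes that parent_id holds the person_id of the parent row, so the join condition links person.parent_id to parent.person_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (person_id INTEGER, parent_id INTEGER);
    -- invented rows: person 1 and person 2 share parent 10
    INSERT INTO tbl VALUES (1, 10), (2, 10), (10, 100), (100, NULL);
""")

# Each alias plays a different role: 'person' is the row we start from,
# 'parent' is the row that person.parent_id points to.
rows = conn.execute("""
    SELECT person.person_id AS OriginalPerson,
           parent.person_id AS Parent
    FROM tbl AS person
    INNER JOIN tbl AS parent
        ON parent.person_id = person.parent_id
    WHERE person.person_id = 1
""").fetchall()

print(rows)
```

Joining on the same column on both sides (person_id to person_id) would only match each row with itself, which is why the two sides of the ON clause use different columns.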
UPDATE Upon reading your further explanation, you want a cartesian product:
select a.*, b.*
from tbl as a
inner join tbl as b
on 1=1
where a.col1 = 1
and b.col1 = 1
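As a quick sanity check, the sketch below runs the three formulations from this thread (the comma join, the join on col1, and the join on 1=1) against the example table and compares their output. It uses SQLite through Python's sqlite3 as a stand-in for MySQL, so it only compares results and says nothing about how many times MySQL scans the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl (col1 INTEGER, col2 TEXT);
    INSERT INTO tbl VALUES (1,'A'), (1,'B'), (1,'C'), (2,'D'), (2,'E');
""")

# The FROM/WHERE part of each formulation seen in this thread.
variants = {
    "comma join": "FROM tbl AS a, tbl AS b WHERE a.col1 = 1 AND b.col1 = 1",
    "join on col1": "FROM tbl a JOIN tbl b ON a.col1 = b.col1 WHERE a.col1 = 1",
    "join on 1=1": "FROM tbl AS a INNER JOIN tbl AS b ON 1=1 "
                   "WHERE a.col1 = 1 AND b.col1 = 1",
}

results = {
    name: conn.execute(
        f"SELECT a.col1, a.col2, b.col1, b.col2 {tail} ORDER BY a.col2, b.col2"
    ).fetchall()
    for name, tail in variants.items()
}

for name, rows in results.items():
    print(name, len(rows))
```

All three return the same nine rows here, so the choice between them is mostly about readability.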
Given the following 2 tables
table_a
id name num_one num_two
------------------------------------
1 Foo 5 10
2 Bar 4 -1
table_b
name table_a_id
--------------------
Fooa 1
Foob 1
Suppose I want to use either num_one or num_two in a where clause depending on whether another table has joined rows or not.
The best thing I can come up with is this:
SELECT a.* FROM table_a a
JOIN table_b b on b.table_a_id = a.id
WHERE if(count(b.*) > 0, a.num_one, a.num_two) > 0
group by a.id
Ideally it would check if 5 > 0 on the first row and -1 > 0 on the 2nd because the 2nd row has no joined rows from table B.
But it errors with invalid use of group by. I know about "having" but not sure how I could use it in this situation.
Any ideas? Thanks!!
This can be done with an IF statement, OR statement or a subquery. None of which will be very efficient on a large table.
The only real modification to your original statement was the use of NULL instead of count(*).
SELECT DISTINCT a.*
FROM table_a a
LEFT JOIN table_b b on b.table_a_id = a.id
WHERE (b.table_a_id IS NOT NULL AND a.num_one > 0)
OR (b.table_a_id IS NULL AND a.num_two > 0)
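Here is a minimal sketch of the LEFT JOIN approach against the question's sample data, run in SQLite through Python's sqlite3 as a stand-in for MySQL. It assumes the intent stated in the question: test num_one when joined rows exist and num_two when they do not. The third row (Baz) is invented so that the no-joined-rows branch also has a passing case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, name TEXT, num_one INTEGER, num_two INTEGER);
    CREATE TABLE table_b (name TEXT, table_a_id INTEGER);
    INSERT INTO table_a VALUES (1, 'Foo', 5, 10), (2, 'Bar', 4, -1);
    INSERT INTO table_a VALUES (3, 'Baz', -2, 7);  -- invented extra row
    INSERT INTO table_b VALUES ('Fooa', 1), ('Foob', 1);
""")

# After a LEFT JOIN, b.table_a_id is NULL exactly when no table_b row
# matched, so it can replace the count(...) > 0 test.
rows = conn.execute("""
    SELECT DISTINCT a.id, a.name
    FROM table_a a
    LEFT JOIN table_b b ON b.table_a_id = a.id
    WHERE (b.table_a_id IS NOT NULL AND a.num_one > 0)
       OR (b.table_a_id IS NULL AND a.num_two > 0)
    ORDER BY a.id
""").fetchall()

print(rows)
```

Foo passes on num_one (5 > 0) because it has joined rows, Bar fails on num_two (-1 > 0 is false) because it has none, and Baz passes on num_two. DISTINCT collapses the two joined rows for Foo into one.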
table a
_______________________________
id col1 col2 col3 ...........col20
1 ............................
2 ............................
3 ............................
table b
_______________________________
id colA colB colC colD ...... colZ
query
________________________________
select a.*, b.* from a left join b on b.id = a.col20 where a.id = 1;
In this query, tables a and b have the same column names.
And I need both of them.
select a.id as a_id .. b.id as b_id .. from a left join b on b.id = a.col20 where a.id = 1;
How can I avoid typing all the column names?
As far as I know there is not an easy way to select * and exclude columns, and it requires putting the full column list, but I'm sure there are other possibilities.
One way of doing this which would make those a.*, b.* type queries work, but requires some initial setup, is to create a view for the table which aliases all of the columns.
The view of a would be a select query with all of the column names aliased:
create view aview as
select id as a_id,
col1 as a_col1,
col2 as a_col2,
...
...
from a
Then anywhere else you could do something like this:
select a.*, b.*
from aview a
left join bview b on b.b_id = a.a_col20
where a.a_id = 1
If the example were that simple and you really only had 2 tables, it would be sufficient to just make a view for one of them.
Hackish, maybe.. I'd probably look to permanently change the column names on the base tables.
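Another option, not mentioned above, is to generate the aliased column list from the schema instead of typing it. The sketch below is only an illustration of that idea: it uses SQLite through Python's sqlite3, the helper name aliased_columns is hypothetical, and PRAGMA table_info is SQLite-specific (in MySQL the column names would come from information_schema.columns instead). The tables are cut down to a few columns for brevity.

```python
import sqlite3

def aliased_columns(conn, table, prefix):
    """Build 'prefix.col AS prefix_col' for every column of `table`.

    Hypothetical helper. PRAGMA table_info is SQLite-specific; the second
    field of each returned row is the column name.
    """
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return ", ".join(f"{prefix}.{c} AS {prefix}_{c}" for c in cols)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, col1 TEXT, col20 INTEGER);
    CREATE TABLE b (id INTEGER, colA TEXT);
    INSERT INTO a VALUES (1, 'x', 7);
    INSERT INTO b VALUES (7, 'y');
""")

# Assemble the select list once, then reuse it in the join query.
select_list = f"{aliased_columns(conn, 'a', 'a')}, {aliased_columns(conn, 'b', 'b')}"
cur = conn.execute(
    f"SELECT {select_list} FROM a LEFT JOIN b ON b.id = a.col20 WHERE a.id = 1"
)
names = [d[0] for d in cur.description]  # result column names
row = cur.fetchone()

print(names)
print(row)
```

The generated list can also be pasted into a CREATE VIEW statement, which gives the same result as the hand-written views without the typing.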
Sorry about the title. I'm not sure how to properly describe the problem.
I have four tables, tables A, B, X and D. A and B have a many-to-many relationship so I'm using X as the link table.
Here's the structure:
Assuming all I have is an ID corresponding to a row in table A, I want to select the rows in table B that match up with that ID plus a count of all rows in table D that have the same b_id. Eh, I suck at explaining in words.
Here's what I would like (all I have to search with is an ID which corresponds to a row in table A -- let's just say I have an "A"):
-------------------------------------------------------------
| b.id | (A COUNT of how many rows in D have a b_id = b.id) |
-------------------------------------------------------------
| 1 | 20 |
-------------------------------------------------------------
| 4 | 12 |
-------------------------------------------------------------
So, according to the above results, this particular "A" has two "B"s. One of those "B"s has 20 "D"s and the other has 12 "D"s.
How can I write a single query to give me the results I'm after (again, all I'm searching with is an ID in table A)?
Try this:
SELECT b.id, COUNT(1)
FROM a, x, b, d
WHERE a.id = <YOUR_ID_FOR_A>
AND a.id = x.a_id
AND x.b_id = b.id
AND b.id = d.b_id
GROUP BY b.id
If every b_id in table x is guaranteed to exist in table b, then you can bypass one join and use the query below:
SELECT x.b_id, COUNT(1)
FROM a, x, d
WHERE a.id = <YOUR_ID_FOR_A>
AND a.id = x.a_id
AND x.b_id = d.b_id
GROUP BY x.b_id
EDIT: Corrected the typo, changed . to , as column separator.
Try:
SELECT A.id, B.id, COUNT(B.id) AS cnt
FROM A
INNER JOIN X ON A.id = X.a_id
INNER JOIN B ON X.b_id = B.id
INNER JOIN D ON B.id = D.b_id
WHERE A.id = <YOUR_ID_FOR_A>
GROUP BY A.id, B.id
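A minimal sketch of the counting query, run in SQLite through Python's sqlite3 as a stand-in for MySQL. The schema is an assumption, since the structure is not shown in the question: X links A and B through a_id and b_id, and D carries a b_id column. The rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: the question's structure diagram is not available.
    CREATE TABLE A (id INTEGER PRIMARY KEY);
    CREATE TABLE B (id INTEGER PRIMARY KEY);
    CREATE TABLE X (a_id INTEGER, b_id INTEGER);
    CREATE TABLE D (id INTEGER PRIMARY KEY, b_id INTEGER);
    INSERT INTO A VALUES (1), (2);
    INSERT INTO B VALUES (1), (4), (5);
    INSERT INTO X VALUES (1, 1), (1, 4), (2, 5);
    INSERT INTO D (b_id) VALUES (1), (1), (1), (4), (4), (5);
""")

a_id = 1  # the only input: an id from table A

# Filter the link table by the A id, then count D rows per B.
counts = conn.execute("""
    SELECT B.id, COUNT(D.id) AS cnt
    FROM X
    INNER JOIN B ON X.b_id = B.id
    INNER JOIN D ON D.b_id = B.id
    WHERE X.a_id = ?
    GROUP BY B.id
    ORDER BY B.id
""", (a_id,)).fetchall()

print(counts)
```

With inner joins, a B that has no rows in D is left out of the result entirely. If those should appear with a count of 0, a LEFT JOIN to D would be the usual change.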