I have two tables containing 6M rows each. I'm trying to join the two using an inner join, but the query ran for 2 days without finishing. The join is as follows (note: I've used count(*) just to let me run an EXPLAIN; I'm actually using the join in a CTAS):
SELECT count(*)
FROM table1 t1,
table2 t2
WHERE t1.col1 = t2.colA
AND t1.col2 = t2.colB;
After a bit of investigation I've found the below query runs fine:
SELECT count(*)
FROM
(SELECT *
FROM table1) t1,
(SELECT *
FROM table2) t2
WHERE t1.col1 = t2.colA
AND t1.col2 = t2.colB;
The only difference is that instead of referencing each table directly, I use the sub-query SELECT * FROM table.
Running the explain plans shows that the latter query builds up an index when it selects table2, whereas the first query uses a join buffer (Block Nested Loop).
Surely MySQL is clever enough to work out that the two queries are practically identical and treat them the same? I don't see why an index should be needed, because a full scan is required for both tables anyway. These are temporary/transitory tables, so if I did put an index on them, it would be purely to perform this join.
Is there a way to fix this via MySQL configuration?
You NEED an index on at least ONE of the tables, even something as simple as
create index Temp1 on Table2 ( colA, colB )
Your query joins Table 1 to Table 2, so even if Table 1 is read with a full scan, you need a way to quickly find the record(s) that match in Table 2. If NEITHER table has an index, think of it this way: for every record in Table 1, scan through ALL records in Table 2 and grab every record that matches on ColA, ColB. Now go back to Table 1 for the SECOND record and scan ALL of Table 2 again for its matches.
With 6M records, that will practically choke a cow (so to speak) on performance. With an index, even just on the SECOND table, when the query is processing a record from the first table it can jump straight to the rows matching ColA, ColB, and as soon as those A/B records are done, it goes back to the first table.
Now, for other overhead efficiencies: if you have BOTH tables indexed on their respective (Col1, Col2) and (ColA, ColB), the engine can keep a whole block of records for each common range in memory/cache and doesn't have to keep going back to the raw data pages repeatedly.
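For example, a minimal sketch of adding the composite indexes on both sides (the index names here are just made up) could be:
CREATE INDEX idx_table1_col1_col2 ON table1 (col1, col2);
CREATE INDEX idx_table2_colA_colB ON table2 (colA, colB);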
So, even though you think it might not be practical, it is still good practice for large table queries. Also, if the first table has multiple records with the same values for Col1, Col2 (but different values in its other columns), and similarly the second table has multiple records for the same ColA, ColB, you will get a Cartesian result. Consider the following scenario:
Table1
Col1 Col2 OtherColumn
X Y blah1
X Y blah2
X Y blah3
Table2
ColA ColB OtherColumn
X Y second blah1
X Y second blah2
X Y second blah3
A simple query like the one you have:
SELECT count(*)
FROM table1 t1,
table2 t2
WHERE t1.col1 = t2.colA
AND t1.col2 = t2.colB;
would result in a count of 9. You have 6M records and a possible Cartesian result? Hopefully this clarifies some problems you may be encountering.
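If those duplicate (col1, col2) pairs aren't actually needed in the table you're creating, one way to sidestep the fan-out (a minimal sketch reusing the question's column names) is to collapse each side before joining:
SELECT count(*)
FROM (SELECT DISTINCT col1, col2 FROM table1) t1
JOIN (SELECT DISTINCT colA, colB FROM table2) t2
  ON t1.col1 = t2.colA
 AND t1.col2 = t2.colB;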
I have about 20 tables. These tables have only id (primary key) and description (varchar) columns. There is a fair amount of data, about 400 rows per table.
Right now I have to get data from at least 15 tables at a time.
Currently I am querying them one by one, which means that in one session I am making 15 calls. This is making my process slow.
Can anyone suggest a better way to get the results from the database?
I am using a MySQL database with Java Spring on the server side. Would creating a view that combines them all help me?
The application is becoming slow because of this issue and I need a solution that will make my process faster.
It sounds like your schema isn't so great. 20 tables of id/varchar sounds like a fragmented EAV design, which is generally considered broken to begin with. Just the same, I think a UNION query will help out. This would be the "view" to create in the database so you can just SELECT * FROM thisviewyoumade and let it worry about hitting all the tables.
A UNION query works by having multiple SELECT statements "stacked" on top of one another. It's important that each SELECT statement has the same number, order, and types of fields so that when the results are stacked, everything matches up.
In your case, it makes sense to manufacture an extra field so you know which table each row came from. Something like the following:
SELECT 'table1' as tablename, id, col2 FROM table1
UNION ALL
SELECT 'table2', id, col2 FROM table2
UNION ALL
SELECT 'table3', id, col2 FROM table3
... and on and on
The names or aliases of the fields in the first SELECT statement are the field names used in the returned result set, so there's no need for a bunch of AS aliases in the subsequent SELECT statements.
The real question is whether this union query will perform faster than 15 individual calls on such a tiny tiny tiny amount of data. I think the better option would be to change your schema so this stuff is already stored in one table just like this UNION query outputs. Then you would need a single select statement against a single table. And 400x20=8000 is still a dinky little table to query.
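If you do go the view route, a minimal sketch of that idea (the view name combined_lookup is hypothetical; id and description are the columns described in the question) might look like:
CREATE VIEW combined_lookup AS
    SELECT 'table1' AS tablename, id, description FROM table1
    UNION ALL
    SELECT 'table2', id, description FROM table2
    UNION ALL
    -- ... one UNION ALL branch per remaining table ...
    SELECT 'table20', id, description FROM table20;

-- the application then makes a single call:
SELECT tablename, id, description FROM combined_lookup;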
To get a row of all the descriptions into application code in a single round trip, send a query along the lines of:
select t1.description, ... t15.description
from t -- this should contain all needed ids
join table1 t1 on t1.id = t.t1id
...
join table15 t15 on t15.id = t.t15id
This may not be exactly what you need, but here is a way to merge all of those table values into a single table:
CREATE TABLE table_name AS (
    SELECT t1.ID,
           t1.description AS description1,
           t2.description AS description2
           -- ... one aliased description per table (SELECT * would raise a
           --     duplicate-column error because every table has an ID column)
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.ID = t2.ID
    -- ... and so on for each remaining table ...
    LEFT JOIN tableN tN ON tN.ID = t1.ID
);
I have two tables that have almost identical columns. The first table contains the "current" state of a particular record and the second table contains all the previous states of that record (it's a history table). The second table has a FK to the first table.
I'd like to query both tables so I get the entire record's history, including its current state, in one result. I don't think a JOIN is what I'm trying to do, as that "joins" multiple tables "horizontally" (one or more columns of one table combined with one or more columns of another table to produce a result that includes columns from both tables). Rather, I'm trying to "join"(???) the tables "vertically" (meaning no columns are added to the result, just that the rows from both tables fall under the same columns in the result set).
Not exactly sure if what I'm expressing makes sense, or if it's possible in MySQL.
To accomplish this, you could use a UNION between two SELECT statements. I would also suggest selecting from a derived table in the following manner so that you can sort by columns in your result set. Suppose we wanted to combine results from the following two queries:
SELECT FieldA, FieldB FROM table1;
SELECT FieldX, FieldY FROM table2;
We could combine these with a UNION as follows:
SELECT Field1, Field2 FROM (
SELECT FieldA AS `Field1`, FieldB AS `Field2` FROM table1
UNION SELECT FieldX AS `Field1`, FieldY AS `Field2` FROM table2)
AS `derived_table`
ORDER BY Field1 ASC, Field2 DESC
In this example, I have selected from table1 and table2 fields which are similar, but not identically named, sharing the same data type. They are matched up using aliases (e.g., FieldA in table1 and FieldX in table2 both map to Field1 in the result set, etc.).
If each table has the same column names, field aliasing is not required, and the query becomes simpler.
Note: In MySQL it is necessary to name derived tables, even if the name given is not intended to be used.
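Applied to the current/history scenario from the question, a minimal sketch (current_record, record_history, and the column names are hypothetical) might be:
SELECT id, status, updated_at FROM current_record
UNION ALL   -- UNION ALL keeps every history row; plain UNION would remove duplicate rows
SELECT id, status, updated_at FROM record_history
ORDER BY id, updated_at;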
UNION.
Select colA, colB From TblA
UNION
Select colA, colB From TblB
You're after a LEFT JOIN on the first table. That will make the right side's id either a number (exists in both tables) or NULL (exists only in the left table).
You want
select lhs.* , rhs.id from lhs left join rhs using(Id)
I have a MySQL JOIN consisting of 4 tables:
Direct chaining
SELECT col1, col2, col3... col12 FROM
(((tbl1 LEFT JOIN tbl2...) LEFT JOIN tbl3 ...) LEFT JOIN tbl4);
Sub-SELECT
(SELECT col10 .. col12 FROM
(SELECT col7 .. col9 FROM
(SELECT col1, ... col6 FROM tbl1
LEFT JOIN tbl2) AS J1
LEFT JOIN tbl3) AS J2
LEFT JOIN tbl4...)
Is there an efficiency difference between the two methods? My gut feeling is that sub-selects discard unnecessary rows and columns with the SELECT ... WHERE clause and make the JOINs faster and less memory intensive. Any advice? How about other databases?
It will depend on your table sizes and how much data the queries filter out.
Condition 1:
If your table sizes are modest (say all tables have approximately 5,000 rows) and you are fetching data without any filtering, there should not be much difference between the two queries; if anything, the first query may perform better.
Condition 2:
If your tables hold bulky data (say billions of rows) but the filtered data set ends up at, say, fewer than roughly 100 rows, then the second query can be better.
There is no hard and fast rule; you have to measure your query performance in various ways according to your table sizes and requirements. The rule of thumb is: if you can reduce the amount of data going into the joins, performance will improve.
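As a rough illustration of that rule of thumb (the join columns and the filter value here are hypothetical), filtering inside a derived table first keeps the inputs to the later joins small:
SELECT j.col1, j.col4, t3.col7, t4.col10
FROM (
    SELECT t1.id, t1.col1, t2.col4
    FROM tbl1 t1
    LEFT JOIN tbl2 t2 ON t2.tbl1_id = t1.id
    WHERE t1.col1 = 'needed_value'   -- filter early so the later joins see fewer rows
) AS j
LEFT JOIN tbl3 t3 ON t3.tbl1_id = j.id
LEFT JOIN tbl4 t4 ON t4.tbl1_id = j.id;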
It will depend on the size of the tables. Normally your first query will be faster than the second, because less evaluation work is needed compared to the second.
I have two identical tables. I want to compare these two tables and get a result from them. The conditions are:
each record in TABLE1, grouped by TID, will be compared to all records in TABLE2, also grouped by TID.
if a grouped record in TABLE1 is found in TABLE2 (among TABLE2's records grouped by TID) at least N times (N is a user input variable), then that record will be inserted into a new table.
For example, as in the screenshot below, ITEM C-F-A grouped by TID 2 has 3 occurrences in table2, so those rows will be inserted into the new table:
I've already tried writing code for this and it worked (VB.NET), but it takes a ridiculous amount of time to complete. The main cause is that I'm processing a huge database.
The method I used in the program is to load the two tables into 2D arrays, then assign values to the arrays while comparing the elements with if clauses.
Below is the 2d array that I've created:
But this method is really expensive: in my real database (pictured above) the first 2D array has 2k records and the second 2D array has 800 records, and when I estimated the time to completion it came out to a staggering number, about 16 hours.
So I was wondering whether this problem can be solved with a MySQL query,
or some other method that is more effective than what I have done?
INSERT INTO tbl3
SELECT tbl1.TID, tbl1.ITEM
FROM tbl1
JOIN tbl2 ON tbl2.TID = tbl1.TID AND tbl2.ITEM = tbl1.ITEM
This will insert a record into tbl3 for each record in tbl1 that has a corresponding record in tbl2 identified by TID and ITEM.
This assumes that TID/ITEM is a unique index in both tbl1 and tbl2.
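If you also need the "at least N occurrences" condition from the question (N = 3 in the screenshot example), one possible reading of it, as a sketch only, is to group the matches and filter with HAVING (this assumes each (TID, ITEM) pair appears at most once in tbl1):
INSERT INTO tbl3
SELECT tbl1.TID, tbl1.ITEM
FROM tbl1
JOIN tbl2 ON tbl2.TID = tbl1.TID AND tbl2.ITEM = tbl1.ITEM
GROUP BY tbl1.TID, tbl1.ITEM
HAVING COUNT(*) >= 3;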
Ok, here's a wild, untested, guess (WUG).
The approach goes like this:
You need a list of TIDs from table1, so you build a distinct list (innermost query).
You use that list in a where clause when selecting from table2, so that you only get rows that have TIDs in table1. You group that query, and use HAVING to then limit the rows to only those with a count > X.
Now you have a list of TIDs that match those in table1 and have more than X entries in table2. You select those rows.
Those are used as the source of an insert statement into table1.
The SQL might look something like:
insert into table1
select * from table2
where tid in
    (select tid
     from table2
     where tid in (select distinct tid from table1)
     group by tid
     having count(*) > 10);
The syntax may not be exactly right (I can't remember the exact syntax for an insert from a select), and I make no claim it will work off the bat, but it's what my first shot would be if I wanted to do it all in one query.
I'm trying to find the (set) intersection between two columns in the same table in MySQL. I basically want to find the rows that have either a col1 element that is in the table's col2, or a col2 element that is in the table's col1.
Initially I tried:
SELECT * FROM table WHERE col1 IN (SELECT col2 FROM table)
which was syntactically valid, however the run-time is far too high. The number of rows in the table is ~300,000 and the two columns in question are not indexed. I assume the run time is either n^2 or n^3 depending on whether MySQL executes the subquery again for each element of the table or if it stores the result of the subquery temporarily.
Next I thought of taking the union of the two columns and keeping only the elements that appear more than once, because if an element shows up more than once in that union it must have been present in both columns (assuming each column contains only distinct elements).
Is there a more elegant (i.e. faster) way to find the set intersection between two columns of the same table?
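For reference, the duplicate-counting idea described above might be sketched like this (keeping the question's placeholder table name and assuming each column's values are distinct within that column):
SELECT val
FROM (
    SELECT col1 AS val FROM table
    UNION ALL
    SELECT col2 FROM table
) AS combined
GROUP BY val
HAVING COUNT(*) > 1;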
SELECT t1.*
FROM table t1
INNER JOIN table t2
ON t1.col1 = t2.col2
Creating indexes on col1 and col2 would go a long way to help this query as well.
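For example, something along these lines (the index names are made up):
CREATE INDEX idx_col1 ON table (col1);
CREATE INDEX idx_col2 ON table (col2);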
If you only want the values, and you are on MySQL 8.0.31 or later (earlier versions do not support it), try the INTERSECT operator:
(SELECT col1 FROM table) INTERSECT (SELECT col2 FROM table)