The following MySQL query often takes 2-3 minutes to execute. The objective is to select records from a table where the values of two of its columns are also contained in another, previously created temporary table. That temporary table has only one column. It is created instead of using two subqueries because four tables have to be joined to get the values.
The temporary table usually holds around 40,000 records, the values are of type varchar(32) COLLATE 'utf8mb4_bin', and table1 has 45,000 records.
table1
a | varchar(32)
b | varchar(32)
temp
name | varchar(32)
CREATE TEMPORARY TABLE IF NOT EXISTS temp AS SELECT name FROM names ...;
SELECT a, b
FROM table1
WHERE a IN (SELECT name FROM temp)
AND b IN (SELECT name FROM temp);
The a and b columns of table1 are indexed.
How can I improve the execution speed? Is there a more efficient way of doing this?
Add an index to the temp table:
ALTER TABLE temp ADD INDEX (name);
Also use JOIN rather than IN. MySQL generally optimizes this better.
SELECT DISTINCT a, b
FROM table1 AS t1
JOIN temp AS t2 ON t1.a = t2.name
JOIN temp AS t3 ON t1.b = t3.name
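As a rough check that the JOIN form returns the same rows as the IN form, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for MySQL here, the sample rows are invented, and the lookup table is called temp_names (a hypothetical name) to avoid clashing with SQLite's own temp schema. It says nothing about MySQL timings; it only compares results.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (a TEXT, b TEXT);
    CREATE TABLE temp_names (name TEXT);              -- plays the role of temp
    CREATE INDEX idx_temp_names ON temp_names (name); -- the suggested index
""")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [("x", "y"), ("x", "z"), ("q", "y"), ("y", "x")])
# 'x' is inserted twice on purpose: the lookup table may contain duplicates.
conn.executemany("INSERT INTO temp_names VALUES (?)", [("x",), ("y",), ("x",)])

# Original form: two IN subqueries.
in_rows = conn.execute("""
    SELECT a, b FROM table1
    WHERE a IN (SELECT name FROM temp_names)
      AND b IN (SELECT name FROM temp_names)
    ORDER BY a, b
""").fetchall()

# Suggested form: two joins; DISTINCT removes rows multiplied by duplicate names.
join_rows = conn.execute("""
    SELECT DISTINCT a, b
    FROM table1 AS t1
    JOIN temp_names AS t2 ON t1.a = t2.name
    JOIN temp_names AS t3 ON t1.b = t3.name
    ORDER BY a, b
""").fetchall()

print(in_rows)
print(join_rows)
```

One difference to keep in mind: DISTINCT also collapses identical (a, b) pairs that occur more than once in table1, which the IN version would return as separate rows.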
I have three tables with content, and I want to read them and insert the content into a new table, but I get this SQL error: "Column count doesn't match value count at row 1".
Here is the SQL query:
insert into compare_year(yeara,yearb,yearc,data)
SELECT yeara
FROM table_1
UNION ALL
SELECT yearb, data
FROM table_2
UNION ALL
SELECT yearc
FROM table_3
Below is how I created the tables:
create table table_1(id int primary key auto_increment,yeara varchar(100));
create table table_2(id int primary key auto_increment,yearb varchar(100),data varchar(100));
create table table_3(id int primary key auto_increment,yearc varchar(100));
My new table is:
create table compare_year(id int primary key auto_increment,yeara varchar(100),yearb varchar(100),yearc varchar(100),data varchar(100))
Can someone please help me? Thanks.
Note: when you UNION SELECT queries, each SELECT must return the same number of columns.
Also, a UNION stacks rows, so it cannot place columns from several SELECTs into a single row of another table.
My solution would be like this:
If the three tables contain the same id, you can do it like this:
insert into compare_year(yeara,yearb,yearc,data)
SELECT T1.yeara,T2.yearb,T3.yearc,T2.data
FROM table_1 T1
left Join table_2 T2 on T2.Id = T1.Id
left Join table_3 T3 on T3.Id = T1.Id
It looks like what you want is a JOIN rather than a UNION. When you union two select statements, they must have the same number of fields in the SELECT. For example,
insert into compare_year(yeara)
SELECT yeara
FROM table_1
UNION ALL
SELECT yearb AS yeara
FROM table_2
UNION ALL
SELECT yearc AS yeara
FROM table_3
would be acceptable syntactically. If you want to join the tables,
INSERT INTO compare_year(yeara, yearb, yearc, data)
SELECT table_1.yeara, table_2.yearb, table_3.yearc, table_2.data
FROM table_1, table_2, table_3
but note that this is the full Cartesian product of the tables. It's likely you want some conditions in a WHERE clause as well. It's also worth noting that the order of the columns in the SELECT clause is what matters for the INSERT, not the field names.
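To make the difference concrete, here is a small sketch in Python with the built-in sqlite3 module (SQLite standing in for MySQL; the sample values and the assumption that the three tables share id values are mine, not the asker's). It shows the mismatched UNION being rejected and the comma join with a WHERE condition producing one combined row per id.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER PRIMARY KEY, yeara TEXT);
    CREATE TABLE table_2 (id INTEGER PRIMARY KEY, yearb TEXT, data TEXT);
    CREATE TABLE table_3 (id INTEGER PRIMARY KEY, yearc TEXT);
    CREATE TABLE compare_year (id INTEGER PRIMARY KEY,
                               yeara TEXT, yearb TEXT, yearc TEXT, data TEXT);
    INSERT INTO table_1 VALUES (1, '2010'), (2, '2011');
    INSERT INTO table_2 VALUES (1, '2012', 'd1'), (2, '2013', 'd2');
    INSERT INTO table_3 VALUES (1, '2014'), (2, '2015');
""")

# A UNION whose SELECTs return different column counts is rejected.
try:
    conn.execute("SELECT yeara FROM table_1 "
                 "UNION ALL SELECT yearb, data FROM table_2")
    union_failed = False
except sqlite3.OperationalError:
    union_failed = True

# Joining with a WHERE condition on the shared id (assumed here) avoids
# the Cartesian product and yields one combined row per id.
conn.execute("""
    INSERT INTO compare_year (yeara, yearb, yearc, data)
    SELECT t1.yeara, t2.yearb, t3.yearc, t2.data
    FROM table_1 AS t1, table_2 AS t2, table_3 AS t3
    WHERE t2.id = t1.id AND t3.id = t1.id
""")
rows = conn.execute(
    "SELECT yeara, yearb, yearc, data FROM compare_year ORDER BY yeara"
).fetchall()
print(union_failed, rows)
```

Without the WHERE clause the same INSERT would add 2 x 2 x 2 = 8 rows for this sample data.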
I'm trying to compare two tables in different databases, and I'm looking for the best way to do this.
Table in database one:
id int(11)
lastmod int(11)
Table in database two:
id int(11)
timestamp int(11)
Both tables have matching ids (id is not unique in db1; the relationship is one (db2) to many (db1)) and timestamps, but the other columns differ. Over time, records in database two will be updated (the data in one unimportant column changes). I need to compare the timestamps, matching on id, to find which records I need to update in database one.
Performance is also a problem, because both tables have more than 5,000,000 records.
What is the most efficient way to find the records that need to be updated?
Assuming that id is a primary key in both tables, then the following should be efficient:
select *
from db1.table t1 join
db2.table t2
on t1.id = t2.id and
t1.lastmod <> t2.timestamp
Note that this assumes two things. First, the id is unique in each table and second that the timestamp column is not NULL.
EDIT:
If the situation is that you have multiple modifications in t1 and are trying to compare the results to t2, which has only one row, then aggregate t1 first to get the most recent modification date and proceed from there:
select *
from (select t1.id, max(t1.lastmod) as lastmod
from db1.table t1
group by t1.id
) t1 join
db2.table t2
on t1.id = t2.id and
t1.lastmod <> t2.timestamp
If you are really looking for a record with more than one modification in t1, then add a having count(*) > 1 to the subquery.
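A minimal sketch of the aggregate-then-compare query, run with Python's sqlite3 module. The two databases are modeled as two tables in one in-memory SQLite database (db1_table and db2_table are hypothetical names), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- db1_table / db2_table are hypothetical stand-ins for db1.table / db2.table.
    CREATE TABLE db1_table (id INTEGER, lastmod INTEGER);              -- many rows per id
    CREATE TABLE db2_table (id INTEGER PRIMARY KEY, timestamp INTEGER); -- one row per id
    INSERT INTO db1_table VALUES (1, 100), (1, 150), (2, 200), (3, 300);
    INSERT INTO db2_table VALUES (1, 150), (2, 250), (3, 300);
""")

# Reduce db1 to its most recent lastmod per id, then keep only the ids
# whose latest lastmod differs from the timestamp stored in db2.
stale = conn.execute("""
    SELECT t1.id, t1.lastmod, t2.timestamp
    FROM (SELECT id, MAX(lastmod) AS lastmod
          FROM db1_table
          GROUP BY id) AS t1
    JOIN db2_table AS t2
      ON t1.id = t2.id AND t1.lastmod <> t2.timestamp
    ORDER BY t1.id
""").fetchall()
print(stale)
```

Here only id 2 is reported, because its latest lastmod (200) differs from the db2 timestamp (250); id 1 matches once the older modification (100) is aggregated away.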
IN operator alternative in MySQL
I have nearly 25,000 ids and I am using the IN operator on them. I then get a StackOverflow exception. Is there an alternative to the IN operator in MySQL?
Thanks in advance.
If the IDs are in another table:
SELECT * FROM table1 WHERE id IN (SELECT id FROM table2);
then you can use a join instead:
SELECT table1.* FROM table1 INNER JOIN table2 ON table1.id = table2.id;
You could do the following:
1 - Create a MySQL Temporary Table
CREATE TEMPORARY TABLE tempIdTable (id int unsigned not null primary key);
2 - Insert All Your ids into the Temporary Table
For every id in your list:
insert ignore into tempIdTable (id) values (anId);
(this will have the added bonus of de-duplicating your list of ids ready for the final step)
3 - Join Against the Temporary Table
SELECT t1.* FROM myTable1 t1 INNER JOIN tempIdTable tt ON t1.id = tt.id;
The temporary table will disappear as soon as your connection is dropped, so you don't have to worry about dropping it before you create it next time.
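The three steps can be tried end to end with Python's sqlite3 module. This is only a sketch: SQLite stands in for MySQL, INSERT OR IGNORE is SQLite's spelling of MySQL's INSERT IGNORE, and the id list and the label column are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample data: myTable1 holds ids 1..5 (the label column is hypothetical).
conn.execute("CREATE TABLE myTable1 (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO myTable1 VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(1, 6)])

# The long id list that would otherwise go into IN (...); note the duplicates.
ids = [3, 1, 3, 7, 1]

# Step 1: temporary table with a primary key on id.
conn.execute("CREATE TEMPORARY TABLE tempIdTable (id INTEGER NOT NULL PRIMARY KEY)")

# Step 2: insert every id; duplicates are silently skipped.
conn.executemany("INSERT OR IGNORE INTO tempIdTable (id) VALUES (?)",
                 [(i,) for i in ids])
distinct_ids = conn.execute("SELECT COUNT(*) FROM tempIdTable").fetchone()[0]

# Step 3: join against the temporary table instead of using IN.
matched = conn.execute("""
    SELECT t1.* FROM myTable1 t1
    INNER JOIN tempIdTable tt ON t1.id = tt.id
    ORDER BY t1.id
""").fetchall()
print(distinct_ids, matched)
```

Batching the inserts (executemany here, or multi-row INSERT statements in MySQL) keeps the number of round trips small when the list has tens of thousands of ids.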
We are working with large-volume data (row counts given below):
Table 1 : 708408568 rows -- 708 million
Table 2 : 1416817136 rows -- 1.4 billion
Table 1 Schema:
----------------
ID - Int PK
column2 - Int
Table 2 Schema
----------------
Table1ID - Int FK
SomeColumn - Int
SomeColumn - Int
Table1 has the PK, which serves as the FK for Table 2.
Index details :
Table1 :
PK Clustered Index on Id
Non Clustered (Non Unique) on column2
Table 2 :
Table1ID (FK) Clustered Index
Below is the query which needs to be executed :
SELECT t1.[id]
,t1.[column2]
FROM Table1 t1
inner join Table2 t2
on t1.[id] = t2.[Table1ID]
WHERE t1.[column2] in (select [id] from ConvertCsvToTable('1,2,3,4,5.......10000')) -- 10,000 comma-separated Ids
To summarize, the inner join on ID should be handled by the clustered indexes on the PK and FK columns,
and for the "huge" WHERE condition on column2 we have a nonclustered index.
However, the query takes 4 minutes for a small subset of 100 Ids, and we need to pass 10,000 Ids.
Is there a better design for this, or would table partitioning help?
I am looking for ways to handle a huge-volume SELECT with an INNER JOIN and WHERE ... IN.
Note: ConvertCsvToTable is a split function that has already been determined to perform optimally.
Thanks !
This is what I would try:
Create a temp table with the structure of the return from the function. Make sure to set the column ID as primary key so that the optimizer takes it into consideration...
CREATE TABLE #temp
(id int not null
...
,PRIMARY KEY (id) )
then call the function
insert into #temp select * from ConvertCsvToTable('1,2,3,4,5.......10000')
then use the temp table directly joined in the query
SELECT t1.[id], t1.[column2]
FROM Table1 t1, Table2 t2, #temp
where t1.id = t2.Table1ID
and t1.[column2] = #temp.id
Bring the condition into the join.
It gives the optimizer a chance to filter by t1.[column2] first.
Also try different join hints (hash, for example).
SELECT t1.[id], t1.[column2]
FROM Table1 t1 with (nolock)
inner join Table2 t2 with (nolock)
on t1.[id] = t2.[Table1ID]
and t1.[column2] in (select [id] from ConvertCsvToTable('1,2,3,4,5.......10000'))
You may need to tell it to use the index on column2.
But give it a chance to do the right thing first.
With the condition in the WHERE clause, you were not giving it that chance.
If you go with #temp, then try the query below
(and declare a PK on the temp table, as Rodolfo stated, +1).
This will pretty much force it to start with the small table.
It could still do something stupid and join on T2 first, but I doubt it.
SELECT t1.[id], t1.[column2]
FROM #temp
JOIN Table1 t1 with (nolock)
on t1.[column2] = #temp.ID
join Table2 t2 with (nolock)
on t2.Table1ID = t1.ID
I don't know why I am confused by this query.
I have two tables: Table A with 900 records and Table B with 800 records. Both tables need to contain the same data, but there is some mismatch.
I need to write a MySQL query to insert the 100 missing records from Table A into Table B.
In the end, Table A and Table B should be identical.
I do not want to truncate all the entries first and then do an insert from the other table. Any help is appreciated.
Thank you.
It is also possible to use LEFT OUTER JOIN for that. This will avoid subquery overhead (when system might execute subquery one time for each record of outer query) like in John Woo's answer, and will avoid doing unnecessary work overwriting already existing 800 records like in user2340435's one:
INSERT INTO b
SELECT a.* FROM a
LEFT OUTER JOIN b ON b.id = a.id
WHERE b.id IS NULL;
This first selects all rows from tables A and B, including all columns from both tables, but for rows that exist in A and do not exist in B, all columns of B will be NULL.
Then it keeps only those latter rows (WHERE b.id IS NULL),
and finally it inserts them into table B.
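The LEFT OUTER JOIN insert can be tried on a tiny data set with Python's sqlite3 module. This is just a sketch: SQLite stands in for MySQL, and the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO a VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO b VALUES (1, 'one'), (3, 'three');   -- id 2 is missing from b
""")

# Rows of a with no partner in b come out of the join with b.id = NULL;
# only those rows are inserted, so existing rows in b are left untouched.
cur = conn.execute("""
    INSERT INTO b
    SELECT a.* FROM a
    LEFT OUTER JOIN b ON b.id = a.id
    WHERE b.id IS NULL
""")
inserted = cur.rowcount

a_rows = conn.execute("SELECT * FROM a ORDER BY id").fetchall()
b_rows = conn.execute("SELECT * FROM b ORDER BY id").fetchall()
print(inserted, b_rows)
```

Because matching is done on id only, a row that exists in both tables with the same id but different values in other columns is not touched by this statement.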
I think you can use IN for this (this is a simplification of your query):
INSERT INTO table2 (id, name)
SELECT id, name
FROM table1
WHERE (id,name) NOT IN
(SELECT id, name
FROM table2);
SQLFiddle Demo
As you can see in the demonstration, table2 had only 1 record, but after executing the query, 2 records were inserted into table2.
If it's mysql and the tables are identical, then this should work:
REPLACE INTO table1 SELECT * FROM table2;
This will insert the missing records into Table1
INSERT INTO Table2
(Col1, Col2....)
(
SELECT Col1, Col2,... FROM Table1
EXCEPT
SELECT Col1, Col2,... FROM Table2
)
You can then run an update query to match the records that differ.
UPDATE Table2
SET
Col1 = T1.Col1,
Col2 = T1.Col2
FROM
Table1 T1
INNER JOIN
Table2 T2
ON
T1.Col1 = T2.Col1
The code also works when GROUP BY and HAVING clauses are used. Tested on SQL Server 2012 (11.0.5058). Tab1 is the source with new records; Tab2 is the destination to be updated. Tab2 also has an identity column. (Yes folks, the real world is not as neat and clean as the lab assignments.)
INSERT INTO Tab2
SELECT a.T1,a.T2,a.T3,a.T4,a.Val1,a.Val2,a.Val3,a.Val4,-9,-9,-9,-9,MIN(hits) MinHit,MAX(hits) MaxHit,SUM(count) SumCnt, count(distinct(week)) WkCnt
FROM Tab1 a
LEFT OUTER JOIN Tab2 b ON b.t1 = a.t1 and b.t2 = a.t2 and b.t3 = a.t3 and b.t4 = a.t4 and b.val1 = a.val1 and b.val2 = a.val2 and b.val3 = a.val3 and b.val4 = a.val4
WHERE b.t1 IS NULL or b.Val1 is NULL
group by a.T1,a.T2,a.T3,a.T4,a.Val1,a.Val2,a.Val3,a.Val4 having MAX(returns)<4 and COUNT(distinct(week))>2 ;