I'm trying to add a FULLTEXT index to my table. The table contains 3 million records, and adding the index with an ALTER TABLE statement or a CREATE INDEX statement has proven very difficult. So the easiest way seems to be to create a new table, add the index first, and then load the data. How can I load the existing table's data into the newly created table? I'm using the XAMPP MySQL database.
I don't know why creating a full text index on an existing table would be difficult. You just do:
create fulltext index idx_table_col on `table`(col)
Usually, it is faster to add indexes to already loaded tables than to load data into an empty table that has indexes pre-defined.
EDIT:
You can do the load by using insert. The following will insert the first 100,000 rows:
insert into newtable
select *
from oldtable
order by id
limit 0, 100000;
You can put this in a loop (via a stored procedure in MySQL, or at the application level), changing the offset value in LIMIT on each run; perhaps this will return faster than one giant insert.
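A minimal sketch of such a batching procedure, assuming the newtable/oldtable names above and MySQL 5.5+ (which allows variables in LIMIT inside stored programs):
delimiter //
create procedure copy_in_batches()
begin
  declare v_offset int default 0;   -- where the next batch starts
  declare v_rows int default 1;     -- rows copied by the previous batch
  while v_rows > 0 do
    insert into newtable
    select * from oldtable
    order by id
    limit v_offset, 100000;
    set v_rows = row_count();       -- drops to 0 once the source is exhausted
    set v_offset = v_offset + 100000;
  end while;
end //
delimiter ;
call copy_in_batches();
Note that a growing offset makes MySQL re-scan the skipped rows each time, so on very large tables seeking on the key instead (WHERE id > last_copied_id ORDER BY id LIMIT 100000) scales better.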
I would expect that the overall time for creating an index would be less than using insert, but for your purposes, you might find this more convenient.
INSERT INTO newTable SELECT * FROM oldTable;
Run this after your new table and the index on it have been created. This assumes you want to copy all columns; you can select specific columns as well.
Related
I messed up when trying to create a test database and accidentally duplicated everything inside a certain table. Basically, there are now two of every entry where there was once one. Is there a simple way to fix this? (Using InnoDB tables.)
Yet another good reason to use auto incrementing primary keys. That way, the rows wouldn't be total duplicates.
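For illustration only, a hedged sketch of that idea (the id column and the compared columns are assumptions, not from the original post): with an auto-increment key you can delete every duplicate except the row with the lowest id.
delete t1
from test t1
join test t2
  on t1.col1 = t2.col1
 and t1.col2 = t2.col2   -- compare every non-id column
 and t1.id > t2.id;      -- keep the lowest id in each duplicate group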
Probably the fastest way is to copy the data into another table, truncate the first table, and re-insert it:
create temporary table tmp as
select distinct *
from test;
truncate table test;
insert into test
select *
from tmp;
As a little note: in almost all cases, I recommend using the complete column list in an insert statement. This is the one case where it is optional: after all, you are putting all the columns into another table and just putting them back a statement later.
I wish to duplicate a selection of records in a MySQL table.
The PK of the table is an auto-incremented int.
I want to do this with one set of MySQL queries (for performance reasons).
It seems like the fastest way to do this is to put the results of the selection into a temporary table, make any changes needed, and re-insert the records back into the original table, like this:
CREATE TEMPORARY TABLE temp1234 ENGINE=MEMORY SELECT * FROM a_table WHERE `column`='my selection';
# do updates in temp1234; (altering FK's mainly)
INSERT INTO a_table SELECT * FROM temp1234;
But when I try to do this, I get an error for duplicate PKs.
Now, I realise that I could alter the INSERT ... SELECT query to exclude the PK/ID column, but as I am procedurally generating these queries across multiple tables for a large data-copying function, I want to avoid having to supply column names.
What is the best way around this problem?
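One possible workaround, offered as a hedged sketch rather than a definitive answer: by default MySQL treats an explicit 0 in an AUTO_INCREMENT column as a request for a fresh value, so zeroing the copied PK lets you keep SELECT * and still avoid the duplicate-key error. This assumes the PK column is named id and that sql_mode does not include NO_AUTO_VALUE_ON_ZERO.
CREATE TEMPORARY TABLE temp1234 ENGINE=MEMORY
SELECT * FROM a_table WHERE `column`='my selection';
UPDATE temp1234 SET id = 0;  -- 0 makes MySQL assign a new auto-increment value on insert
# do the other updates in temp1234 (altering FKs mainly)
INSERT INTO a_table SELECT * FROM temp1234;
DROP TEMPORARY TABLE temp1234;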
Right now I have the following construct to find items with similar keywords:
CREATE TEMPORARY TABLE tmp (FULLTEXT INDEX (keywords)) ENGINE=MyISAM
SELECT object_id, keywords FROM object_search_de;
SELECT object_id
FROM tmp
WHERE MATCH (keywords) AGAINST ('foo,bar') > 1.045;
DROP TEMPORARY TABLE tmp;
So, depending on the overall number of records and the average size of the keywords field, this can get really slow (over 60 seconds of execution time). My goal is to be within 1 second for this task.
As an alternative to comma-separated keywords in a TEXT field, I also have an atomic keyword table (meaning two columns, keyword and object_id, directly associating one keyword with an item).
Are there any alternatives or smooth solutions to achieving the same effect without resorting to a MyISAM mirror table?
First of all, do not create the table each time. You can create it once and use triggers to insert/update/delete records, or, if you don't want to use triggers, periodically (every hour, for example) truncate and re-insert the records. A sketch of the trigger approach follows.
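A minimal sketch of the trigger approach, assuming a persistent mirror table named object_search_mirror and the object_id/keywords columns from the question (the column types are assumptions):
CREATE TABLE object_search_mirror (
  object_id INT NOT NULL PRIMARY KEY,
  keywords TEXT,
  FULLTEXT INDEX (keywords)
) ENGINE=MyISAM;
CREATE TRIGGER object_search_ai AFTER INSERT ON object_search_de
FOR EACH ROW
  INSERT INTO object_search_mirror (object_id, keywords)
  VALUES (NEW.object_id, NEW.keywords);
CREATE TRIGGER object_search_au AFTER UPDATE ON object_search_de
FOR EACH ROW
  UPDATE object_search_mirror SET keywords = NEW.keywords
  WHERE object_id = NEW.object_id;
CREATE TRIGGER object_search_ad AFTER DELETE ON object_search_de
FOR EACH ROW
  DELETE FROM object_search_mirror WHERE object_id = OLD.object_id;
With the mirror kept in sync, the MATCH ... AGAINST query runs against object_search_mirror directly, with no per-query table build.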
Alternatively, you can offload this task from MySQL and use Lucene/Solr or Sphinx.
I have a table containing about 500,000 rows. Once a day, I will try to synchronize this table with an external API. Most of the time there are few or no changes since the last update. My question is basically: how should I construct my MySQL query for best performance? I have thought about using INSERT IGNORE, but it doesn't feel like the best way to go, since only a few rows will be inserted and MySQL must check every row in the table. I have also thought about using LOAD DATA INFILE to insert all rows into a temporary table, then selecting the rows not already in my original table, and then removing the temporary table. Maybe someone else has a better suggestion?
Thank you in advance!
I usually use a temporary table and the LOAD DATA INFILE bulk loader. The bulk loader is much more efficient than trying to insert records using a dynamically created query.
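A sketch of that bulk-load step; the file path, CSV layout, and column list are assumptions for illustration:
CREATE TEMPORARY TABLE tmp_keywords LIKE keywords;
-- LOCAL requires local_infile to be enabled on both client and server
LOAD DATA LOCAL INFILE '/tmp/keywords.csv'
INTO TABLE tmp_keywords
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(api_adgroup_id, api_keyword_id, keyword_text, match_type, status);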
If you index your permanent tables with appropriate unique keys that relate to the keys in the API, then you should find that the INSERT and UPDATE statements work pretty fast. An example of the type of INSERT query I use is as follows:
INSERT INTO keywords(api_adgroup_id, api_keyword_id, keyword_text, match_type, status)
SELECT a.api_adgroup_id, a.api_keyword_id, a.keyword_text, a.match_type, a.status
FROM tmp_keywords a LEFT JOIN keywords b ON a.api_adgroup_id = b.api_adgroup_id AND a.api_keyword_id = b.api_keyword_id
WHERE b.api_keyword_id IS NULL
In this example, I perform an OUTER JOIN against the keywords table to check whether each row already exists. Only new rows in the temporary table that have no match in the main table (where b.api_keyword_id IS NULL) are inserted.
Also note that in this example I need to use both the ad group id AND the keyword id to uniquely identify the keyword because the AdWords API gives the same keyword/match type combination the same id when it exists in more than one ad group.
I have a table that has 170,002,225 rows, with about 35 columns and two indexes. I want to add a column. The ALTER TABLE command took about 10 hours. The processor did not seem busy during that time, nor were there excessive I/O waits. This is on a 4-way high-performance box with tons of memory.
Is this the best I can do? Is there something I can look at in tuning the DB to optimize adding a column?
I faced a very similar situation in the past, and I improved the performance of the operation this way:
Create a new table (using the structure of the current table) with the new column(s) included.
Execute INSERT INTO new_table (column1, ...columnN) SELECT column1, ...columnN FROM current_table;
Rename the current table.
Rename the new table using the name of the current table (see the sketch below for doing both renames in one statement).
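The two rename steps can be combined into one atomic RENAME TABLE statement, so there is no window in which the table is missing (the table names here are illustrative):
RENAME TABLE current_table TO current_table_old,
             new_table TO current_table;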
ALTER TABLE in MySQL is actually going to create a new table with the new schema, re-insert all the data, and delete the old table. You might save some time by creating the new table, loading the data, and then renaming the table.
From "High Performance MySQL book" (the percona guys):
The usual trick for loading MyISAM table efficiently is to disable keys, load the data and renalbe the keys:
mysql> ALTER TABLE test.load_data DISABLE KEYS;
-- load data
mysql> ALTER TABLE test.load_data ENABLE KEYS;
Well, I would recommend using the latest Percona MySQL builds. Also, note the following from the MySQL manual:
"In other cases, MySQL creates a temporary table, even if the data wouldn't strictly need to be copied. For MyISAM tables, you can speed up the index re-creation operation (which is the slowest part of the alteration process) by setting the myisam_sort_buffer_size system variable to a high value."
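For example (the buffer size and the session scope are illustrative choices, not a recommendation from the manual):
SET SESSION myisam_sort_buffer_size = 256 * 1024 * 1024;  -- 256 MB for this session only
ALTER TABLE test.load_data ADD COLUMN new_col INT NULL;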
You can do ALTER TABLE ... DISABLE KEYS first, then add the column, and then ALTER TABLE ... ENABLE KEYS. Beyond that, I don't see anything else that can be done here.
BTW, can't you go with MongoDB? It doesn't rebuild anything when you add a column.
Maybe you can remove the indexes before altering the table, since the indexes are what take most of the time to build?
Combining some of the comments on the other answers, this was the solution that worked for me (MySQL 5.6):
create table mytablenew like mytable;
alter table mytablenew add column col4a varchar(12) not null after col4;
alter table mytablenew drop index index1, drop index index2,...drop index indexN;
insert into mytablenew (col1,col2,...colN) select col1,col2,...colN from mytable;
alter table mytablenew add index index1 (col1), add index index2 (col2),...add index indexN (colN);
rename table mytable to mytableold, mytablenew to mytable;
On a 75M row table, dropping the indexes before the insert caused the query to complete in 24 minutes rather than 43 minutes.
Other answers/comments have insert into mytablenew (col1) select (col1) from mytable, but this results in ERROR 1241 (21000): Operand should contain 1 column(s) if you have the parentheses in the select query.
Other answers/comments have insert into mytablenew select * from mytable;, but this results in ERROR 1136 (21S01): Column count doesn't match value count at row 1 if you've already added a column.