Insert 100 million records into a MySQL database - mysql

I'm starting to learn MySQL and I need a database with 100 million records. I'm trying to use a basic loop, but that just takes too long. Can anyone show me how it could be done? Each record can just be a number or a bit, but there have to be 100 million of them.

1. DROP TABLE IF EXISTS my_table;
2. CREATE TABLE my_table(id SERIAL PRIMARY KEY);
3. INSERT INTO my_table VALUES(NULL);
4. INSERT INTO my_table SELECT NULL FROM my_table;
5. Repeat line 4 another 26 times (each run doubles the row count, so the single row from line 3 becomes 2^27 ≈ 134 million rows, comfortably over 100 million).
Failing that, see http://datacharmer.blogspot.com/2006/06/filling-test-tables-quickly.html
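If pasting line 4 by hand gets tedious, the same doubling can be wrapped in a stored procedure. A minimal sketch against the table above (the procedure name fill_rows is made up for illustration):

DELIMITER //
CREATE PROCEDURE fill_rows()
BEGIN
  DECLARE n BIGINT DEFAULT 1;  -- line 3 inserted the first row
  WHILE n < 100000000 DO
    INSERT INTO my_table SELECT NULL FROM my_table;  -- each pass doubles the table
    SET n = n * 2;
  END WHILE;
END//
DELIMITER ;

CALL fill_rows();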

Related

Remove duplicated data from a very large table

I have a table containing more than 500 million records in a MySQL database,
and I need to remove the duplicates from it.
I tried this query on a table containing 20 million records and it was OK, but for the 500 million it takes a very long time:
-- Create temporary table
CREATE TABLE temp_table LIKE names_tbles;
-- Add constraint
ALTER TABLE temp_table ADD UNIQUE(name , family);
-- Copy data
INSERT IGNORE INTO temp_table SELECT * FROM names_tbles;
Is there a better solution?
One option is aggregation rather than INSERT IGNORE. That way, there is no need for the database to manage rejected records:
insert into temp_table(id, name, family)
select min(id), name, family
from names_tbles
group by name, family;
I would take it one step further and suggest adding the unique constraint only after the table is populated, so there is no need for the database to check for duplicates while inserting (the query guarantees uniqueness already), which should speed up the insert statement.
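Putting both suggestions together, a minimal sketch (assuming names_tbles has only the id, name, and family columns):

CREATE TABLE temp_table LIKE names_tbles;

-- populate first; GROUP BY already guarantees one row per (name, family)
INSERT INTO temp_table (id, name, family)
SELECT MIN(id), name, family
FROM names_tbles
GROUP BY name, family;

-- add the constraint afterwards, so the index is built once
-- instead of being checked on every insert
ALTER TABLE temp_table ADD UNIQUE (name, family);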

Can resetting the initial/start auto-increment value affect the speed of queries?

I have a 6 million record table with an auto-increment ID PK. Due to various operations over the last several weeks, my starting ID is 2 million. Updates and other queries take a long time, and I'm wondering if having an ID range from 2 million to 8 million, versus 1 to 6 million, could be responsible. I've noticed anecdotally that selects/updates using a range of, say, ID > 1000000 AND ID < 1001000 seem to be slower than ID > 1 AND ID < 1000.
Is it worth it to remove the existing PK and add a new one starting at 1? I know I can do
ALTER TABLE tablename AUTO_INCREMENT = 1;
but I cannot do this here with 6 million existing records and auto-increment IDs already assigned.
Clearly I could try it and test, but for various reasons, including the time it would take given the size of the table, indexes, etc., I'd prefer to ask before spending the time and effort, in case anyone knows the answer definitively.
Update:
For now I did the following:
CREATE TABLE table_new LIKE `table`;
to duplicate the table, indexes and all.
Then:
ALTER TABLE table_new AUTO_INCREMENT = 1;
so the empty duplicate table resets its counter to 1.
Then I inserted from the original table to the duped table:
INSERT INTO table_new (FieldA, FieldB, FieldC)
SELECT FieldA, FieldB, FieldC FROM `table`;
to insert all the records minus the ID field, so that an auto-increment ID is assigned to each inserted record, starting at 1 as the reset specified. And finally, of course:
RENAME TABLE `table` TO table_old;
RENAME TABLE table_new TO `table`;
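Worth noting: both renames can be combined into one statement, which MySQL executes atomically, so no query ever catches the table missing in between:

RENAME TABLE `table` TO table_old, table_new TO `table`;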

After copying with INSERT...SELECT, I have more records than before

I have a strange occurrence: I copied ~4.7m records from one table to another in MySQL 5.6.14, using INSERT INTO tabl1 (col1, ...) SELECT col2, ... FROM tbl2 ..., and I ended up with more records than before. 640 more, to be exact.
I checked by doing a SELECT COUNT(*) on both tables and subtracting the new table's count from the old table's (which gave me the -640).
Any ideas? I'd like to know where the extra 640 records came from.
Both are InnoDB; the old table is latin1 charset, the new one is utf8. I doubt that's part of the equation, but maybe someone with much more experience with MySQL would know.
SQL statement example:
INSERT INTO `table1` (`col1`,`col2`,`col3`) SELECT `colA`,`colB`,`colC` FROM `table2`;
The table receiving the records is new: it has 0 records in it and never had any. Also, it's not a production environment, so nothing should be adding records to it except this one statement.
Try this:
First remove the table if it already exists in the database. This way you know for sure that tabl1 won't have any extra data.
DROP TABLE IF EXISTS tabl1;
Then recreate the copy with a CREATE TABLE ... SELECT statement, which copies everything from tabl2 into tabl1 in one step:
CREATE TABLE tabl1 SELECT * FROM tabl2;
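To verify the copy afterwards, the row counts of both tables can be compared in a single query (a sketch using the table names from the question):

SELECT
  (SELECT COUNT(*) FROM tabl1) AS new_count,
  (SELECT COUNT(*) FROM tabl2) AS old_count;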

MySQL table 30 million records insert into another

I have a 30 million record MySQL table.
It has about 20 columns, of which I will use 15 to insert into another table.
Now, I can't use PHP to load this large dataset (selecting 30 million rows and loading them into memory isn't feasible), so what would be the best method of loading all these records? MySQL 5.X.
I'm using EMS to connect to the database.
What about doing an INSERT INTO MySmallerTable SELECT Col1, col2, col3... FROM MyBiggerTable
It might be worth breaking it into multiple INSERT statements.
Like:
INSERT INTO ... SELECT ... WHERE ID between 1 and 100000;
INSERT INTO ... SELECT ... WHERE ID between 100001 and 200000;
etc.
You can do this:
INSERT INTO new_table (`col1`,`col2`,`col3`) SELECT `oldcol1`,`oldcol2`,`oldcol3`
FROM old_table LIMIT 0,100000
and repeat it in a PHP loop (changing the LIMIT start value each time).
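If you'd rather keep the loop inside MySQL instead of PHP, the ID-range batching from the earlier answer can be driven by a stored procedure. A sketch, assuming old_table has an integer auto-increment id column (the procedure name copy_in_batches is made up for illustration):

DELIMITER //
CREATE PROCEDURE copy_in_batches()
BEGIN
  DECLARE max_id BIGINT;
  DECLARE lo BIGINT DEFAULT 0;
  SELECT MAX(id) INTO max_id FROM old_table;
  -- copy 100,000-row ID ranges until the whole table is covered
  WHILE lo < max_id DO
    INSERT INTO new_table (`col1`,`col2`,`col3`)
    SELECT `oldcol1`,`oldcol2`,`oldcol3`
    FROM old_table
    WHERE id > lo AND id <= lo + 100000;
    SET lo = lo + 100000;
  END WHILE;
END//
DELIMITER ;

CALL copy_in_batches();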
There are a few ways you can do this, including the one user M_M provided above. I have not used EMS and I'm not sure what it can and can't do, but I have used Workbench extensively.
A.
Create the new destination table
Create a view on the source table with the columns of interest
Insert into the destination from the source with a simple INSERT INTO destination_table SELECT * FROM source_view
B.
Use mysqldump
Load the dump into the new table
Alter the table by dropping the columns you don't need

Remove duplicates from TWO columns

Good morning Stackoverflowians,
I have a very big table with duplicates across two columns. That means if the numbers on row a are duplicated in col1 and col2 on row b, I should keep only row a:
## table_1
col1 col2
1 10
1 10
1 10
1 11
1 11
1 12
2 20
2 20
2 21
2 21
# should return this table without duplicates
col1 col2
1 10
1 11
1 12
2 20
2 21
My previous code accounted only for col1, and I don't know how to query this on two columns:
CREATE TABLE temp LIKE db.table_1;
INSERT INTO temp SELECT * FROM table_1 WHERE 1 GROUP BY col1;
DROP TABLE table_1;
ALTER TABLE temp RENAME table_1;
So I thought about this:
CREATE TABLE temp LIKE db.table_1;
INSERT INTO temp(col1,col2)
SELECT DISTINCT col1,col2 FROM table_1;
then drop and rename.
But I'm not sure it's going to work, and MySQL tends to be unstable; if it takes too long I will have to stop the query, and that may crash the server again. T.T
We have 200,000,000 rows, and all of them have at least one duplicate.
Any suggestion of code? :)
Also, how long would it take? Minutes or hours?
You already know quite a few ways :)
you can try this also
Use INSERT IGNORE rather than INSERT. If a record doesn't duplicate an existing record, MySQL inserts it as usual. If the record is a duplicate, the IGNORE keyword tells MySQL to discard it silently without generating an error.
Read from the existing table and then write to a new table using INSERT IGNORE. This way you can control the insert process depending on your resource usage.
Note that when you use INSERT IGNORE and there are key violations, MySQL does NOT raise an error; the duplicates are dropped silently (they show up only as warnings).
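A minimal sketch of that INSERT IGNORE route, using the table names from the question (the UNIQUE key on (col1, col2) is what makes IGNORE drop the duplicates):

CREATE TABLE temp LIKE db.table_1;
ALTER TABLE temp ADD UNIQUE (col1, col2);

INSERT IGNORE INTO temp (col1, col2)
SELECT col1, col2 FROM table_1;
-- then drop table_1 and rename temp, as in the question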
The DISTINCT clause is the way to go, but it will take a while to run on that many records. I'd add an ID column that is auto-increment and is your PK. Then you can run the deduplication in stages that won't time out, as sketched below.
Good luck and HTH
-- Joe
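A sketch of one such stage (assuming the auto-increment id column has been added, and that temp keeps the UNIQUE (col1, col2) key from the earlier answer so cross-stage duplicates are still dropped; the 10-million-row range is illustrative):

INSERT IGNORE INTO temp (col1, col2)
SELECT col1, col2
FROM table_1
WHERE id BETWEEN 1 AND 10000000;
-- repeat with the next ID range until MAX(id) is covered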