Best Method for Quickly Uploading MySQL Data with Normalization? - mysql

Here is my situation. Roughly every 2 weeks I receive a raw CSV data set like the (imaginary) one below, containing about 1M lines on average.
PROJECT, MD5SUM_VALUE, USAGE_NAME
A,132412341324asdf,Apple
B,13404892340asdf9,Banana
...
I have MySQL tables for
PROJECT_TABLE (id, value),
MD5SUM_VALUE (id, value)
USAGE_NAME (id, name)
RECORD_TABLE (id, project_id, MD5SUM_id, USAGE_id)
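Roughly, the schema looks like this, with auto-increment ids and unique keys on the value columns (exact types are an approximation):
CREATE TABLE PROJECT_TABLE (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  value VARCHAR(255) NOT NULL UNIQUE
);
-- MD5SUM_VALUE (value CHAR(32) UNIQUE) and USAGE_NAME (name VARCHAR(255) UNIQUE)
-- follow the same pattern
CREATE TABLE RECORD_TABLE (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  project_id INT UNSIGNED NOT NULL,
  MD5SUM_id INT UNSIGNED NOT NULL,
  USAGE_id INT UNSIGNED NOT NULL
);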
I have been using scripts to quickly bulk-load the PROJECT values into PROJECT_TABLE (using INSERT IGNORE), with a similar strategy for MD5SUM_VALUE and USAGE_NAME.
As it stands today, I have about
17,115,235 unique row entries for USAGE_NAME table
3,001,675 unique row entries for MD5SUM_VALUE table
200 unique row entries for PROJECT table
59m+ rows for RECORD_TABLE table
Loading RECORD_TABLE is the slow part: I need to run a query to identify the ids (project_id, MD5SUM_id, USAGE_id), and I do this 1 million+ times per raw CSV file.
Is there a better way to upload the data? Is there any way to make this part fast? Should I structure things differently?
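Per CSV row, the id lookup amounts to something like this (a simplified sketch, not the exact script):
SELECT id INTO @pid FROM PROJECT_TABLE WHERE value = 'A';
SELECT id INTO @mid FROM MD5SUM_VALUE WHERE value = '132412341324asdf';
SELECT id INTO @uid FROM USAGE_NAME WHERE name = 'Apple';
INSERT INTO RECORD_TABLE (project_id, MD5SUM_id, USAGE_id) VALUES (@pid, @mid, @uid);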

I would create a temporary table and use it to generate/look up ids, then insert into your record_table. Assuming you have auto-increment ids on project_table, md5sum_table, and usage_name, and unique keys on the non-id columns:
create temporary table record_table_load (
  project varchar(255),
  project_id int,
  md5sum_value varchar(32),
  md5sum_id int,
  usage_name varchar(255),
  usage_id int
);
load data local infile 'foo.csv'
into table record_table_load
fields terminated by ',' enclosed by '"'
lines terminated by '\n'
ignore 1 lines
(project, md5sum_value, usage_name);
insert ignore into project_table (value) select distinct project from record_table_load;
insert ignore into md5sum_table (value) select distinct md5sum_value from record_table_load;
insert ignore into usage_name (name) select distinct usage_name from record_table_load;
update record_table_load
  join project_table on project_table.value = record_table_load.project
  set record_table_load.project_id = project_table.id;
update record_table_load
  join md5sum_table on md5sum_table.value = record_table_load.md5sum_value
  set record_table_load.md5sum_id = md5sum_table.id;
update record_table_load
  join usage_name on usage_name.name = record_table_load.usage_name
  set record_table_load.usage_id = usage_name.id;
insert into record_table (project_id, md5sum_id, usage_id)
  select project_id, md5sum_id, usage_id from record_table_load;
drop temporary table record_table_load;
If you want to avoid using insert ignore, or you don't have unique constraints on those values, do the lookup, then insert any values not found, then do another lookup.
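For example, for project_table that lookup / insert-missing / second-lookup sequence might look like this (a sketch reusing the record_table_load table from above):
update record_table_load
  join project_table on project_table.value = record_table_load.project
  set record_table_load.project_id = project_table.id;
insert into project_table (value)
  select distinct project from record_table_load
  where project_id is null;
update record_table_load
  join project_table on project_table.value = record_table_load.project
  set record_table_load.project_id = project_table.id
  where record_table_load.project_id is null;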

Related

Finding ID and inserting it into another table

I have a table with two columns, ID and WORD. I've used the following query to insert several files into this table:
LOAD DATA LOCAL INFILE 'c:/xad' IGNORE INTO TABLE words LINES TERMINATED BY '\n' (@col1) SET word = @col1;
Now I'd like to find specific values and insert them into another table. I know based on this question that I can do the following
insert into tab2 (id_customers, value)
values ((select id from tab1 where customers='john'), 'alfa');
But I'd like to do this based on the files. For example:
Loop through each line of file xad and pass its value to a query like the following:
insert into othertable (word_id)
values ((select id from firsttable where word='VALUE FROM CURRENT LINE OF FILE'));
I can write a Java app to do this line by line but I figured it'd be faster to make MySQL do the work if possible. Is there a way to make MySQL loop over each line, find the ID, and insert it into othertable?
Plan A: A TRIGGER could be used to conditionally copy the id to another table when encountered in whatever loading process is used (LOAD DATA / INSERT .. SELECT .. / etc).
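A minimal sketch of Plan A, using the words and othertable names from the question (the filter condition is made up for illustration):
DELIMITER //
CREATE TRIGGER words_ai AFTER INSERT ON words
FOR EACH ROW
BEGIN
  -- copy the new id only for rows matching the condition of interest
  IF NEW.word LIKE 'abc%' THEN
    INSERT INTO othertable (word_id) VALUES (NEW.id);
  END IF;
END//
DELIMITER ;
LOAD DATA fires INSERT triggers, so this also works during the bulk load.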
Plan B: Simply load the table, then copy over the ids that you desire.
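A sketch of Plan B with the same tables (the target words are placeholders):
INSERT INTO othertable (word_id)
SELECT id FROM words
WHERE word IN ('target1', 'target2');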
Notes:
The syntax for this
insert into tab2 (id_customers, value)
values ((select id from tab1 where customers='john'), 'alfa');
is more like
insert into tab2 (id_customers, value)
SELECT id, 'alfa'
FROM tab1
WHERE customers = 'john'

Update table from file with mysql [duplicate]

I have a table in a database, and I'd like to update a column with values that I have offline in a local file. The file itself has two columns:
an ID which corresponds to an ID column in the table, and
the actual value.
I've been able to create new rows using
LOAD DATA INFILE 'file.txt' INTO TABLE table
FIELDS TERMINATED BY ','
But I'm not sure how I can specifically insert values in such a way that the ID column in the file is joined to the ID column in the table. Can someone help with the SQL syntax?
I suggest you load your data into a temporary table, then use an INSERT ... SELECT ... ON DUPLICATE KEY UPDATE; for example:
CREATE TEMPORARY TABLE temptable (
id INT UNSIGNED NOT NULL,
val INT,
PRIMARY KEY (id)
) ENGINE = MEMORY;
LOAD DATA LOCAL INFILE '/path/to/file.txt' INTO TABLE temptable FIELDS TERMINATED BY ',';
INSERT INTO my_table
SELECT id, val FROM temptable
ON DUPLICATE KEY UPDATE val = VALUES(val);
DROP TEMPORARY TABLE temptable;
Another way could be this:
Since you already know the table name and have both the ID and the actual value, you can write the update statements directly into a file, like
update mytable set value_col = <value1> where ID_col = <id1>;
update mytable set value_col = <value2> where ID_col = <id2>;
update mytable set value_col = <value3> where ID_col = <id3>;
...
Save the file with a .sql extension, e.g. updatescript.sql, and then execute that script directly:
mysql -h <hostname> -u root -p <your_db_name> < "E:/scripts/sql/updatescript.sql"
It depends on the number of rows.
If it is in the hundreds, make a script of UPDATE statements and run it; but if it is a large volume, import the file into a new table, update your table with a join, and then drop the staging table (see the sketch below).
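For the large-volume case, the join-based approach might look like this sketch, where stage is a hypothetical staging table matching the file's two columns:
CREATE TABLE stage (id INT UNSIGNED NOT NULL PRIMARY KEY, val INT);
LOAD DATA LOCAL INFILE '/path/to/file.txt' INTO TABLE stage FIELDS TERMINATED BY ',';
UPDATE mytable JOIN stage ON stage.id = mytable.ID_col SET mytable.value_col = stage.val;
DROP TABLE stage;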

SQL Insert from table to table prevented by duplicate primary key from source table

I am trying to populate a products table in MySQL with the latest products, which are retrieved and stored in a products_temp table.
So the method for this is straightforward: simply do an INSERT into products from products_temp, as such:
INSERT INTO products ( select products_temp.* FROM products_temp )
Problem is, it results in a duplicate primary key error, because of the id from products_temp clashing with the id in products.
Can someone tell me how to fix this please?
I tried declaring the fields in the select statement without the id, but that results in "Column count doesn't match value count at row 1"
Any help would be appreciated.
Thanks!
You'll need to declare the columns except the ID on both the INSERT and the SELECT, since the number of fields needs to match, and id (as you noticed) can't be inserted as-is into the destination table.
INSERT INTO DestTable (field1, field2, field3)
SELECT field1, field2, field3 FROM SourceTable;
EDIT: You could do it in a slightly hackier way to simplify the insert. You can create a trigger that simply forces the primary key to NULL on insert.
CREATE TRIGGER t_DT BEFORE INSERT ON DestTable
FOR EACH ROW
SET NEW.id = NULL;
Then a copy from table to table can be done as simply as:
INSERT INTO DestTable SELECT * FROM SourceTable;
How about something like:
INSERT INTO products
  SELECT products_temp.* FROM products_temp
  WHERE `key` NOT IN (SELECT `key` FROM products);
(key is a reserved word in MySQL, so it needs the backticks if that is really the column name.)


Fast LOAD DATA from a file split into two tables connected by id

I have a MySQL database using InnoDB and foreign keys...
I need to import 100MiB of data from a huge CSV file and split it across two tables, where the records have to end up as follows:
Table1
id|data|data2
Table2
id|table1_id|data3
Where Table2.table1_id is a foreign key referencing Table1.id.
The MySQL sequence for one instance would look like this:
Load the file into a temporary table
After that, do an insert from the temporary table into Table1
Get the last insert ID
Do the last group of inserts into Table2 using this reference id...
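Concretely, that per-record sequence is something like this (values are illustrative):
INSERT INTO Table1 (data, data2) VALUES ('a', 'b');
SET @t1_id = LAST_INSERT_ID();
INSERT INTO Table2 (table1_id, data3) VALUES (@t1_id, 'c');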
That is utterly slow...
How do I do this using LOAD DATA INFILE? Any real ideas for a high-speed result?
You could temporarily add a data3 column to Table1 (I also add a done column to distinguish records originating from the CSV from those that already exist or originate elsewhere):
ALTER TABLE Table1
ADD COLUMN data3 TEXT,
ADD COLUMN done BOOLEAN DEFAULT TRUE;
LOAD DATA
INFILE '/path/to/csv'
INTO TABLE Table1 (data, data2, data3)
SET done = FALSE;
INSERT
INTO Table2 (table1_id, data3)
SELECT id, data3 FROM Table1 WHERE NOT done;
ALTER TABLE Table1
DROP COLUMN data3,
DROP COLUMN done;