MySQL: read parameters from a file for a SELECT statement

I have a select query as follows:
select * from abc where id IN ($list);
The problem is the length of the variable $list: it can run to 4000-5000 characters, so the actually executed query becomes very long and gets pretty slow.
Is there a way to store the values of $list in a file and have MySQL read them from that file, similar to LOAD DATA INFILE 'file_name' for inserting into a table?

Yes, you can (TM)!
First step: use CREATE TEMPORARY TABLE temp (id ... PRIMARY KEY) and LOAD DATA INFILE ... to create and fill a temporary table holding your value list.
Second step: run SELECT abc.id FROM abc INNER JOIN temp ON abc.id = temp.id.
I have the strong impression this only checks out as a win if you reuse the same value list quite a few times.
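A minimal sketch of those two steps (all names are placeholders; it assumes integer ids, one per line in the file):
CREATE TEMPORARY TABLE temp (id INT UNSIGNED NOT NULL PRIMARY KEY);

LOAD DATA INFILE '/tmp/id_list.txt'
INTO TABLE temp
LINES TERMINATED BY '\n'
(id);

SELECT abc.*
FROM abc
INNER JOIN temp ON abc.id = temp.id;
Keep in mind that LOAD DATA INFILE reads from the server's filesystem and needs the FILE privilege; use LOAD DATA LOCAL INFILE instead if the file lives on the client.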

Related

How to get row ids when using LOAD DATA LOCAL INFILE?

I have a MySQL database with a table into which I insert data from multiple files using the
LOAD DATA LOCAL INFILE ... statement. The PRIMARY KEY ID is set to auto_increment. The problem arises when I want to update only part of the table.
Say I've inserted file_1, file_2, file_3 in the past and now I want to update only file_2. I imagine the process as this pseudo-workflow:
delete old data related to file_2
insert new data from file_2
However, it is hard to determine which data originally came from file_2. In order to find out, I've come up with this idea:
When I insert the data, I will note the ids of the rows I've inserted; since I am using auto_increment, I can note something like from_id, to_id for each of the files. Then, when I want to update only file_x, I will delete only the data with from_id <= id <= to_id (where from_id, to_id relate to file_x).
After a bit of searching, I found out about @@identity and last_insert_id(); however, when I use select last_insert_id() after LOAD DATA LOCAL INFILE I get only one id, and it is not the maximal id corresponding to the data but the id generated for the first inserted row (which is how it is defined for multi-row inserts). I am connecting to the database from Python using mysql.connector:
cur.execute("select last_insert_id();")
print(cur.fetchall())
# gives
# [(<some_number>,)]
So, is there a way, how to retrieve all (or at least the minimal and maximal) ids which were assigned to the data imported using the LOAD DATA LOCAL INFILE... statement as mentioned above?
If you need to remember the source of each record in the table, then you had better store that information in a column.
I would add a new column (src) of type TINYINT UNSIGNED to the table and store the ID of the source (1 for file_1, 2 for file_2, and so on). I assume there won't be more than 255 sources; otherwise use SMALLINT for its type.
Then, when you need to update the records imported from file_2 you have two options:
delete all the records having src = 2, then load the new records from the file into the table; this is not quite an update, it is a replacement;
load the new records from the file into a new table, then copy from it the values you need to update the existing records.
Option #1
Deletion is an easy job:
DELETE FROM table_1 WHERE src = 2
Loading the new data and setting the value of src to 2 is also easy (it is explained in the documentation):
LOAD DATA INFILE 'file.txt'
INTO TABLE table_1
(column1, column2, column42) # Put all the column names here
# in the same order the values appear in the file
SET src = 2 # Set values for other columns too
If there are columns in the file that you don't need, load their values into variables and simply ignore the variables. For example, if the third column of the file doesn't contain useful information you can use:
INTO TABLE table_1 (column1, column2, @unused, column42, ...)
A single variable (I called it @unused, but it can have any name) can be used to absorb the values of all the columns you want to ignore.
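Putting Option #1 together, a sketch of the full load statement that skips the file's third column and stamps the source (column names as above):
LOAD DATA INFILE 'file.txt'
INTO TABLE table_1
(column1, column2, @unused, column42)
SET src = 2;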
Option #2
The second option requires the creation of a working table, but it's more flexible: it allows updating only some of the rows, based on the usual WHERE conditions. However, it can be used only if the records can be identified using the values loaded from the file (with or without the src column).
The working table (let's name it table_w) has the columns you want to load from the file and is created in advance.
When it's time to update the rows imported from file_2, you do something like this:
truncate the working table (just to be sure it doesn't contain any leftovers from a previous import);
load the data from file into the working table;
join the working table and table_1 and update the records of table_1 as needed;
truncate the working table (cleanup of the current import).
The code:
# 1
TRUNCATE table_w;
# 2
LOAD DATA INFILE 'file.txt'
INTO TABLE table_w
(column_1, column_2, column_42); # etc.
# 3
UPDATE table_1 t
INNER JOIN table_w w
ON t.column_1 = w.column_1
# AND t.src = 2 # only if column_1 is not enough
SET t.column_2 = w.column_2,
t.column_42 = w.column_42
# WHERE ... you can add extra conditions here, if needed
# 4
TRUNCATE TABLE table_w;

How to do something like SELECT "all columns except .."?

I have a MySQL table with data in it, called current. I import new data into a table called temp. Both these tables have auto_increment ID columns.
The table structure is not known in advance for the data import (there are various file structures that I need to import), even though the structure of current and temp will be the same.
Because the column configuration of the import files is unknown (tables are created on the fly for each different file configuration), I cannot select specific columns; I have to select all columns except the ID column from table temp and import the result into table current.
I need to import into temp first because the files can be large and I need to process the data before saving it, so I do not want to touch the current table until the separate file has been fully imported.
The ID column from the temp table prevents the insert into the current table due to a duplicate key.
So I need something like this:
INSERT INTO `current`
(SELECT **ALL COLUMNS EXCEPT ID** FROM `temp`)
Any ideas on how to write the section ALL COLUMNS EXCEPT ID? Is this even possible?
There's no * except foo. You'll have to list all of the columns, except the ones you don't want.
SELECT field1, field2, ..., fieldN ...
You could do it via dynamic scripting: query information_schema for the field names, build up the field list as a string, prepare that string as a query, execute it, etc.
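A minimal sketch of that dynamic-SQL approach, assuming both tables live in the current schema and the auto-increment column is named ID (raise group_concat_max_len first, since its default of 1024 bytes can truncate long column lists):
SET SESSION group_concat_max_len = 8192;

SELECT GROUP_CONCAT(CONCAT('`', COLUMN_NAME, '`') ORDER BY ORDINAL_POSITION) INTO @cols
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'temp'
  AND COLUMN_NAME <> 'ID';

SET @sql = CONCAT('INSERT INTO `current` (', @cols, ') SELECT ', @cols, ' FROM `temp`');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;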

LOAD DATA INFILE with a SELECT statement

I have two related tables: bmt_transcripts stores a gene_id that references the id column of bmt_genes.
I also have this large CSV file that I want to insert into bmt_transcripts:
Ensembl Gene ID Ensembl Transcript ID
ENSG00000261657 ENST00000566782
ENSG00000261657 ENST00000562780
ENSG00000261657 ENST00000569579
ENSG00000261657 ENST00000568242
The problem is that I can't insert the Ensembl Gene ID as a string; I need to look up its id in the bmt_genes table, so I came up with this code:
LOAD DATA INFILE 'filename.csv'
INTO TABLE `bmt_transcripts`
(@gene_ensembl, ensembl_id)
SET gene_id = (SELECT id FROM bmt_genes WHERE ensembl_id = @gene_ensembl);
However, this takes over 30 minutes to load a 7 MB CSV, which is far too long. I assume it's running a table-wide query for every row it inserts, which is horribly inefficient. I know I could load the data into a temporary table and SELECT from that (which, yes, runs in some 5 seconds), but this CSV may grow to some 20 columns, which would become unwieldy to write a SELECT statement for.
How can I fix my LOAD DATA INFILE query (which runs a SELECT on another table) to run in a reasonable length of time?
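One hedged guess, since the schema isn't shown: if bmt_genes.ensembl_id carries no index, the SET subquery scans the whole bmt_genes table once per CSV line. Indexing the lookup column usually makes the per-row lookup cheap (the index name below is arbitrary):
CREATE INDEX idx_bmt_genes_ensembl_id ON bmt_genes (ensembl_id);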

MySQL: reorder rows from file association

A MySQL photo gallery script requires that I provide the display order of my gallery by pairing each image title with a number representing the desired order.
I have a list of correctly ordered data called pairs_list.txt that looks like this:
# title correct data in list
-- -------
1 kmmal
2 bub14
3 ili2
4 sver2
5 ell5
6 ello1
...
So, the kmmal image will be displayed first, then the bub14 image, etc.
My MySQL table called title_order has the same titles above, but they are not paired with the right numbers:
# title bad data in MySQL
-- -------
14 kmmal
100 bub14
31 ili2
47 sver2
32 ell5
1 ello1
...
How can I make a MySQL script that will look at the correct number-title pairings from pairs_list.txt and go through each row of title_order, replacing each row with the correct number? In other words, how can I make the order of the MySQL table look like that of the text file?
In pseudo-code, it might look like something like this:
Get MySQL row title
Search pairs_list.txt for this title
Get the correct number-title pair in list
Replace the MySQL number with the correct number
Repeat for all rows
Thank you for any help!
If this is not a one-time task but a frequently repeated operation, then maybe you can use the following scenario:
create a temp table and insert all the values from pairs_list.txt into it using MySQL's LOAD DATA INFILE;
create a procedure (or an insert trigger, maybe?) on that temp table which would update your main table according to whatever was inserted;
in that procedure (or insert trigger), I would have a cursor fetching all values from the temp table and, for each value, update the matching title in your main table (a set-based sketch of this update follows below);
delete all rows from that temp table.
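A sketch of that update step with hypothetical names (pairs_temp for the temp table, order_num for the number column); a single set-based join update does the cursor's per-row work in one statement:
UPDATE title_order t
INNER JOIN pairs_temp p ON p.title = t.title
SET t.order_num = p.order_num;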
I'd suggest you do it this simple way:
1. Remove all primary and unique keys from the title_order table, and create a unique index (or primary key) on the title field:
ALTER TABLE title_order
ADD UNIQUE INDEX UK_title_order_title (title);
2. Use LOAD DATA INFILE with the REPLACE option to load the data from the file, replacing existing rows:
LOAD DATA INFILE 'pairs_list.txt'
REPLACE
INTO TABLE title_order
FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\r\n'
IGNORE 2 LINES
(@col1, @col2)
SET order_number_field = @col1, title = TRIM(@col2);
...and specify whatever other properties you need in the LOAD DATA INFILE command.

Fast way to populate a relational database in MySQL using JDBC?

I am trying to implement a simple program in Java that will be used to populate a MySQL database from a CSV source file. For each row in the CSV file, I need to execute the following sequence of SQL statements (example in pseudo-code):
execute("INSERT INTO table_1 VALUES(?, ?)");
String id1 = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_2 VALUES(?, ?)");
String id2 = execute("SELECT LAST_INSERT_ID()");
execute("INSERT INTO table_3 values("some value", id1, id2)");
execute("INSERT INTO table_3 values("some value2", id1, id2)");
...
There are three basic problems:
1. The database is not on localhost, so every single INSERT/SELECT incurs network latency; this is the basic problem.
2. The CSV file contains millions of rows (around 15,000,000), so it takes too long.
3. I cannot modify the database structure (add extra tables, disable keys, etc.).
I was wondering how I can speed up the INSERT/SELECT process; currently 80% of the execution time is spent on communication.
I already tried grouping the above statements and executing them as a batch, but because of LAST_INSERT_ID this does not work. In any other case it takes too long (see point 1).
The fastest way is to let MySQL parse the CSV and load the records into the table. For that, you can use LOAD DATA INFILE:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
It works even better if you can transfer the file to the server or keep it in a shared directory that is accessible to the server.
Once that is done, you can have a column that indicates whether a record has been processed or not. Its value should be false by default.
Once the data is loaded, you can pick up all records where processed = false.
For all such records you can populate tables 2 and 3.
Since all these operations happen on the server, the server <-> client latency does not come into the picture.
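A sketch of that flow with hypothetical names (staging stands in for the landing table, with a processed flag defaulting to false):
LOAD DATA INFILE '/path/on/server/data.csv'
INTO TABLE staging
FIELDS TERMINATED BY ','
(col_a, col_b); -- processed keeps its default of FALSE

INSERT INTO table_2 (col_a, col_b)
SELECT col_a, col_b FROM staging WHERE processed = FALSE;

UPDATE staging SET processed = TRUE WHERE processed = FALSE;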
Feed the data into a blackhole
CREATE TABLE `test`.`blackhole` (
`t1_f1` int(10) unsigned NOT NULL,
`t1_f2` int(10) unsigned NOT NULL,
`t2_f1` int(10) unsigned NOT NULL -- ...and so on for all the tables and all the fields
) ENGINE=BLACKHOLE DEFAULT CHARSET=latin1;
Note that this is a BLACKHOLE table, so the data itself goes nowhere.
However, you can create a trigger on the blackhole table, something like this.
And pass it on using a trigger
delimiter $$
create trigger ai_blackhole_each after insert on blackhole for each row
begin
declare lastid_t1 integer;
declare lastid_t2 integer;
insert into table1 values(new.t1_f1, new.t1_f2);
select last_insert_id() into lastid_t1;
insert into table2 values(new.t2_f1, new.t2_f2); -- t2_f2: another blackhole column, per the elided CREATE TABLE
select last_insert_id() into lastid_t2;
insert into table3 values(new.t3_f1, lastid_t1, lastid_t2); -- mirrors the question's table_3(value, id1, id2)
-- ...and so on for the remaining tables
end$$
delimiter ;
Now you can feed the blackhole table with a single insert statement at full speed and even insert multiple rows in one go.
insert into blackhole values(a,b,c,d,e,f,g,h),(....),(...)...
Disable index updates to speed things up
ALTER TABLE $tbl_name DISABLE KEYS;
-- ... lots of inserts ...
ALTER TABLE $tbl_name ENABLE KEYS;
This will disable all non-unique index updates and speed up the inserts. (An auto-increment key is unique, so it is not affected.)
If you have any unique keys and you don't want MySQL to check them during the mass insert, drop the unique key with an ALTER TABLE and add it back afterwards; a sketch follows below.
Note that the ALTER TABLE that puts the unique key back will take a long time.
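A sketch of that drop-and-recreate step, assuming a hypothetical unique key uk_email on an email column of table_1:
ALTER TABLE table_1 DROP INDEX uk_email;
-- ... mass insert ...
ALTER TABLE table_1 ADD UNIQUE INDEX uk_email (email);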