MySQL: LOADing a SHA1 hash into a BINARY(20) column

I'm going to be loading a billion rows into a MySQL table; one column, a BINARY(20), is the SHA1 hash of several other columns concatenated. Offhand I don't see how to use the LOAD command to load binary values, because it seems to rely upon delimiters.
Obviously, speed is important here, which is why I want to use LOAD. Does anyone know how to load a fixed-length binary value with LOAD? Is this perhaps a job for a trigger? (I've never used triggers before.) Or can I invoke a function (e.g. UNHEX) in the LOAD command?
(Since it seems to be a common question: no, I don't want to store it in base64 or hex notation. BINARY(20) is a requirement.)

Binary data and LOAD DATA INFILE are not friends. The file format specifiers need a delimiter, and arbitrary binary data is length delimited, not field delimited.
Your best bet is to use large multi-INSERT statements and tough it out. These can handle having hex-encoded strings decoded and dropped into BINARY columns automatically.
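A sketch of that multi-INSERT approach in Python, with hypothetical table and column names (mytable, mysha1, a, b, c) and assuming trusted values that need no quoting; the BINARY(20) value is sent as an x'...' hex literal that MySQL decodes server-side:

```python
import hashlib

def build_insert_batch(rows, table="mytable"):
    """Build one multi-row INSERT statement; each BINARY(20) value is
    sent as an x'...' hex literal that MySQL decodes server-side.
    NOTE: assumes trusted values with no quotes to escape."""
    values = []
    for a, b, c in rows:
        digest = hashlib.sha1("|".join((a, b, c)).encode()).hexdigest()
        values.append("(x'%s', '%s', '%s', '%s')" % (digest, a, b, c))
    return "INSERT INTO %s (mysha1, a, b, c) VALUES %s;" % (
        table, ", ".join(values))

sql = build_insert_batch([("abel", "baker", "charlie"), ("dog", "easy", "fox")])
```

Batching many rows per statement amortizes the per-statement overhead, which is what makes this bearable at scale.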
I'm not sure why anyone would wish this misery upon themselves, though. Saving twenty bytes a row versus standard hex notation is not worth the trouble.
If you really need to load in kajillions of rows, maybe MySQL is not the best platform to do it on. What you should be doing is either sharding that data into multiple tables or databases, or using a NoSQL store to split it up more effectively.

This seems to be a reasonable approach: use the SET form of LOAD, reading the fields into variables and applying functions such as UNHEX and CONCAT.
For example:
Suppose mytable has four columns:
mysha1 BINARY(20)
a VARCHAR(20)
b VARCHAR(20)
c VARCHAR(20)
Column mysha1 is the sha1 hash of a, b, and c concatenated with '|' as a separator.
And suppose the input file is tab-delimited text lines of three fields apiece:
abel\tbaker\tcharlie\n
dog\teasy\tfor\n
etc\tetc\tetc\n
Here's how I'd load the table:
LOAD DATA INFILE '/foo/bar/input.txt' INTO TABLE mytable
FIELDS TERMINATED BY '\t' ESCAPED BY '\\' LINES TERMINATED BY '\n'
(@f1, @f2, @f3) SET mysha1 = UNHEX(SHA1(CONCAT_WS('|', @f1, @f2, @f3))),
a=@f1, b=@f2, c=@f3;
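To sanity-check what ends up in the column, the same value can be computed outside MySQL; UNHEX(SHA1(CONCAT_WS('|', ...))) is just the raw SHA-1 digest of the joined fields. A Python sketch:

```python
import hashlib

# UNHEX(SHA1(CONCAT_WS('|', a, b, c))) is the raw 20-byte SHA-1 digest
# of the three fields joined with '|'.
fields = ("abel", "baker", "charlie")
digest = hashlib.sha1("|".join(fields).encode()).digest()
assert len(digest) == 20  # exactly fills BINARY(20)
```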
UPDATE: in the general case, for an arbitrary binary value that can't be computed with a built-in function such as SHA1, the binary value must be expressed in the INFILE as a displayable hex string, read into an @variable, and then converted to binary with the UNHEX function. E.g.:
mytable:
mybin8 BINARY(8)
a VARCHAR(20)
b VARCHAR(20)
c VARCHAR(20)
input file:
abel\tbaker\tcharlie\t0123456789abcdef\n
dog\teasy\tfox\t02468ace13579bdf\n
etc\tetc\tetc\t0000000000000000\n
load command:
LOAD DATA INFILE '/foo/bar/input.txt' INTO TABLE mytable
FIELDS TERMINATED BY '\t' ESCAPED BY '\\' LINES TERMINATED BY '\n'
(a, b, c, @myhex) SET mybin8 = UNHEX(@myhex);
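Generating such an input file is straightforward; a Python sketch, assuming the file layout above with the binary value hex-encoded in the last field:

```python
import io

def write_row(out, a, b, c, blob):
    """Emit one tab-delimited line; the BINARY(8) value is written as
    16 hex digits so LOAD DATA can read it into a variable for UNHEX()."""
    assert len(blob) == 8
    out.write("%s\t%s\t%s\t%s\n" % (a, b, c, blob.hex()))

buf = io.StringIO()
write_row(buf, "abel", "baker", "charlie", bytes.fromhex("0123456789abcdef"))
```

Hex doubles the on-disk size of the binary field, but the file stays safely delimiter-free.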

Related

How to convert string "3.82384E+11" to BIGINT with MySQL?

I'm trying to save some ID values from a CSV that Excel automatically converts to exponent notation,
like 382383816413 becoming 3.82384E+11. So I'm doing a full import into my MySQL database with:
LOAD DATA LOCAL INFILE
'file.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ';'
ENCLOSED BY '"'
ESCAPED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@`item_id`,
@`colum2`,
@`colum3`)
SET
item_id = @`item_id`;
I've tried using cast like:
CAST('3.82384E+11' as UNSIGNED) and it gives me just 3.
CAST('3.82384E+11' as BIGINT) and it doesn't work.
CAST('3.82384E+11' as UNSIGNED BIGINT) and gives me 3 again.
So, what's the better way to convert string exponent numbers to real big integers in MySQL?
Set the column format to text instead of number in Excel. Refer to the link below.
PHPExcel - set cell type before writing a value in it
My option was to convert the column containing 3.82384E+11 back to a number in the Excel file, so it got back to the original value. Then I exported to CSV and imported it fine with a SQL query.
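For what it's worth, the underlying problem can be seen outside MySQL too: once the value has been rendered as the string 3.82384E+11, only six significant digits survive, so no cast can recover the original ID. A Python sketch:

```python
# Excel's exponent rendering is lossy: parsing the string back yields a
# rounded value, so the original ID cannot be recovered by any cast.
original = 382383816413
recovered = int(float("3.82384E+11"))
assert recovered == 382384000000
assert recovered != original
```

That is why the fix has to happen in Excel (keep the column as text, or as a full-precision number) rather than in the database.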

Load data from text file to DB

Data:
1|\N|"First\Line"
2|\N|"Second\Line"
3|100|\N
\N represents NULL in MySQL and MariaDB.
I'm trying to load above data using LOAD DATA LOCAL INFILE method into a table named ID_OPR.
Table structure:
CREATE TABLE ID_OPR (
idnt decimal(4),
age decimal(3),
comment varchar(100)
);
My code looks like below:
LOAD DATA LOCAL INFILE <DATA FILE LOCATION> INTO TABLE <TABLE_NAME> FIELDS TERMINATED BY '|' ESCAPED BY '' OPTIONALLY ENCLOSED BY '\"' LINES TERMINATED BY '\n';
The problem with this code is that it aborts with the error Incorrect decimal value: '\\N' for column <Column name>.
Question:
How can I load this data with NULL values in the second decimal column, and without losing the \ (backslash) from the third string column?
I'm trying this in MariaDB, which is similar to MySQL in most cases.
Update:
The error I mentioned appears as a warning, and the data actually gets loaded into the table. But the catch here is with the text data.
For example: in the case of the third record above, it is loaded as the literal string \N into the string column. But I want it to be NULL.
Is there any way to make the software recognize this null value? Something like DECODE in Oracle?
You can't have it both ways: either \ is an escape character or it is not. From the MySQL docs:
If the FIELDS ESCAPED BY character is empty, no characters are escaped and NULL is output as NULL, not \N. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.
So, I'd suggest a consistently formatted input file, however that was generated:
use \\ if you want to keep the backslash in the strings
make \ an escape character in your load command
OR
make strings always, not optionally, enclosed in quotes
leave escape character empty, as is
use NULL for nulls, not \N
BTW, this also explains the warnings you were seeing when loading \N into your decimal field.
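A sketch of the first option in Python: keep \ as the escape character, write \N for NULL, and backslash-escape anything structural inside the data (separator and field values as in the question):

```python
def escape_field(value, sep="|"):
    r"""Format one field for LOAD DATA ... FIELDS TERMINATED BY '|'
    ESCAPED BY '\\': None becomes \N; backslashes, separators and
    newlines inside the data get a backslash prefix."""
    if value is None:
        return r"\N"
    s = str(value)
    for ch in ("\\", sep, "\n"):
        s = s.replace(ch, "\\" + ch)
    return s

line = "|".join(escape_field(f) for f in [1, None, "First\\Line"])
# line is now: 1|\N|First\\Line
```

With the file written this way, the load command can keep ESCAPED BY '\\' and both the NULLs and the literal backslashes survive.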
Deal with nulls by leaving the field blank; that should fix it:
1||"First\Line"
2||"Second\Line"
3|100|
That's how nulls are handled in CSVs and TSVs. And don't expect the decimal datatype to go NULL, as it stays 0; use INT or BIGINT instead if needed. You can forget about ESCAPED BY; as long as string data is enclosed by "" that deals with the escaping problem.
We need three text files and one batch file for this load.
Suppose your file location is 'D:\loaddata'
and your text file is 'D:\loaddata\abc.txt'.
1. D:\loaddata\abc.bad -- empty
2. D:\loaddata\abc.log -- empty
3. D:\loaddata\abc.ctl
a. Write the code below for no separator:
OPTIONS ( SKIP=1, DIRECT=TRUE, ERRORS=10000000, ROWS=5000000)
load data
infile 'D:\loaddata\abc.txt'
TRUNCATE
into table Your_table
(
a_column POSITION (1:7) char,
b_column POSITION (8:10) char,
c_column POSITION (11:12) char,
d_column POSITION (13:13) char,
f_column POSITION (14:20) char
)
b. Write the code below for a comma separator:
OPTIONS ( SKIP=1, DIRECT=TRUE, ERRORS=10000000, ROWS=5000000)
load data
infile 'D:\loaddata\abc.txt'
TRUNCATE
into table Your_table
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
(a_column,
b_column,
c_column,
d_column,
e_column,
f_column
)
4. D:\loaddata\abc.bat -- write the code below:
sqlldr db_user/db_password@your_tns control=D:\loaddata\abc.ctl log=D:\loaddata\abc.log
After double-clicking the "D:\loaddata\abc.bat" file, your data will be loaded into the desired Oracle table. If anything goes wrong, check your "D:\loaddata\abc.bad" and "D:\loaddata\abc.log" files.

how to populate a database?

I have a MySQL database with a single table that includes an autoincrement ID, a string, and two numbers. I want to populate this database with many strings coming from a text file, with all numbers initially reset to 0.
Is there a way to do it quickly? I thought of creating a script that generates many INSERT statements, but that seems somewhat primitive and slow, especially since MySQL is on a remote site.
Yes: use LOAD DATA INFILE (the docs are here). Example:
LOAD DATA INFILE 'csvfile'
INTO TABLE table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 0 LINES
(cola, colb, colc)
SET cold = 0,
cole = 0;
Notice the SET line: that is where you set a default value.
Depending on your field separator, change the line FIELDS TERMINATED BY ','.
The other answers only address half of your question. For the other half (zeroing the numeric columns):
Either:
set the default value of your number columns to 0, and
in your text file, simply delete the numeric values.
This will cause the fields to be read by LOAD DATA INFILE as null, and the default value, which you have set to 0, will be assigned.
Or:
Once you have your data in the table, issue a MySQL command to zero the fields, like
UPDATE table SET first_numeric_column_name = 0, second_numeric_column_name = 0 WHERE 1;
And to sum everything up, use LOAD DATA INFILE.
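If the text file contains only the strings, one way to prepare it is to write one word per line and let the SET clause supply the zeros (table and column names here are made up for illustration):

```python
# Write one word per line; the numeric columns never appear in the file.
words = ["alpha", "beta", "gamma"]  # hypothetical word list
with open("words.txt", "w") as f:
    for w in words:
        f.write(w + "\n")
# Matching load (string column 'name', numbers zeroed by the SET clause):
#   LOAD DATA INFILE 'words.txt' INTO TABLE t
#   LINES TERMINATED BY '\n' (name) SET num1 = 0, num2 = 0;
```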
If you have access to the server's file system, you can utilize LOAD DATA.
If you don't want to fight with the syntax, the easiest way (if on Windows) is to use HeidiSQL.
It has a friendly wizard for this purpose.
Maybe I can help you with the right syntax if you post a sample line from the text file.
I recommend you use SB Data Generator by Softbuilder (which I work for); download and install the free trial.
First, create a connection to your MySQL database then go to “Tools -> Virtual Data” and import your test data (the file must be in CSV format).
After importing the test data, you will be able to preview and query it in the same window without inserting it into the database.
Now, if you want to insert test data into your database, go to “Tools -> Data Generation” and select "generate data from virtual data".
SB data generator from Softbuilder

load data infile separating words by "_"

I'm currently importing a dictionary into MySQL which has words separated by _. I wanted to know how to specify that the words are separated by _. For example, the words are as such:
Super_Mario
Stack_Overflow
Another_Word
so each row would then be stored as :
Super Mario
Stack Overflow
Another Word
I have this query right now:
LOAD DATA LOCAL INFILE
'C:/upload/dictionary.csv'
INTO TABLE dictionary
fields terminated by ',' lines terminated by '\n'
would I have to use fields terminated by '_'?
No, you just use the SET clause (just like in an UPDATE) to set the field's value with the result of a string REPLACE() operation that replaces underscores with spaces.
LOAD DATA LOCAL INFILE
'C:/upload/dictionary.csv'
INTO TABLE dictionary (@var1)
SET your_column_name = REPLACE(@var1, '_', ' ')
The (@var1) bit after INTO TABLE dictionary just means "there's only one column in the file I'm interested in, and I want to store it in @var1 so I can use it later in my SET clause instead of putting it directly into a column." Do a Ctrl+F for "SET clause" in the documentation for LOAD DATA INFILE to see how to use a SET clause when your input file has more than one column.
fields terminated by '_' would interpret every word separated by _ as a new column,
so Super_Mario, Stack_Overflow, and Another_Word would each end up as two columns in each row. If your entire dictionary is made up of two-word entries and your dictionary table has two columns, it'll work, but I have a feeling not every entry in your file is two words.
If you want to store each line in the file as a single column, but with all the _s replaced with spaces, you could do something like this after the import (or do what @Jordan said and do it as part of the import, which sounds better to me):
UPDATE dictionary SET columnname = replace(columnname,'_',' ')
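The transformation is the same substitution str.replace performs, checked here in Python:

```python
# Same substitution as the SQL REPLACE(), checked locally:
words = ["Super_Mario", "Stack_Overflow", "Another_Word"]
cleaned = [w.replace("_", " ") for w in words]
assert cleaned == ["Super Mario", "Stack Overflow", "Another Word"]
```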

When LOAD DATA INFILE INTO TABLE, what if BINARY(100) contains the separator byte?

I have found experimentally that MySQL first splits a row into fields based on the predefined separator. But what if I am loading a BINARY(100), and somewhere within the binary value there is the separator byte?
For instance, you have a separator of a tab char (0x9).
What if the binary consists of (worst case) all 0x9 characters?
When you export data, MySQL will escape it so that you can safely insert it again.
If you are the one generating the data to import, you have to be careful and escape it yourself: prepend a \ to any byte that means a tab, newline, or \ itself occurring within the data.
The details should be on this page
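A minimal sketch of that escaping in Python, assuming tab field separators and newline line terminators as in the question:

```python
def escape_binary(data: bytes) -> bytes:
    """Prefix a backslash to each byte LOAD DATA would read as
    structure: the escape character itself, tab (the field separator)
    and newline (the line terminator). Other bytes pass through."""
    out = bytearray()
    for b in data:
        if b in (0x5C, 0x09, 0x0A):  # '\\', '\t', '\n'
            out.append(0x5C)
        out.append(b)
    return bytes(out)

# Worst case from the question: a BINARY field that is all tab bytes.
escaped = escape_binary(b"\x09" * 100)
assert escaped == b"\\\t" * 100
```

In the all-tabs worst case the field doubles in size on disk, but every byte is read back unambiguously.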