Joining binary data from blobs in MySQL

I have a MySQL database table with audio data in MEDIUMBLOB fields. The problem is that one audio file is split across multiple rows, so I want to join them. When I do this:
select data from audio where id=1 into outfile "output" fields escaped by '';
.. I get audio. When I do the same thing for id=2, I get audio. When I put them together:
select data from audio order by date, time into outfile "output" fields escaped by '';
.. I get audio for a while, then high-amplitude noise. The noise starts where id=2 would have been. If I select more than two rows to put together, sometimes the output for a particular id is noise, sometimes it's the correct audio. It's not exactly interleaved.
So, how can I extract and concatenate blobs from multiple rows into a coherent binary output file?
Edit: this is raw audio data. E.g. to read it into Audacity you would go to import->raw.

SELECT INTO OUTFILE is intended mainly to let you quickly dump a table to a text file, and also to complement LOAD DATA INFILE.
I strongly suspect that SELECT INTO OUTFILE is adding an ASCII NUL or other formatting between values so that the file can be read back.
You could compare the binary output to determine if this is true, and also pick up on any other formatting, encoding or shaping that may be present.
Do you have to use INTO OUTFILE? - Can you not get the binary data and create a file directly with a script or other middle tier layer rather than relying on the database?
Update
After writing this and reading the comment below, I thought about CONCAT: MySQL treats BLOB values as binary (byte) strings, so in theory a simple concatenation of multiple columns into a single one might work.
If it doesn't, it wouldn't take much to write a simple Perl, PHP, C, bash or other script to query the database, join two or more binary columns and write the output to a file on the system.
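For example, here is a minimal Python sketch of that middle-tier approach, assuming the mysql-connector-python package and the audio table from the question (connection details are placeholders):

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="user",
                               password="secret", database="YOUR_DATABASE_NAME")
cur = conn.cursor()
cur.execute("SELECT data FROM audio ORDER BY date, time")
with open("output", "wb") as f:
    for (chunk,) in cur:  # each BLOB row arrives as bytes, in order
        f.write(chunk)    # append the raw bytes; nothing is added between rows
conn.close()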

What you need is to run the mysql command from the command line, using -e to pass a SQL statement to execute, and the appropriate flags for binary output.
You then use group_concat with an empty separator to concatenate all the data.
Finally, you output to your file. The only thing you need to be aware of is the built-in limit on group_concat, which you can change in your .ini file or by setting a global variable.
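For example, the limit (group_concat_max_len, 1024 bytes by default) can be raised before the query runs, e.g. at the start of the same -e string:
SET SESSION group_concat_max_len = 4294967295;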
Here is how this would look (note that the ORDER BY goes inside group_concat, so the chunks are concatenated in the right order):
mysql --binary-mode -e "select group_concat(data order by date, time separator '') from audio;" -A -B -r -L -N YOUR_DATABASE_NAME > output
I just tried it with binary data (not audio) in a database and it works.

Related

Importing and exporting TSVs with MySQL

I'm using a database with MySQL 5.7, and sometimes data needs to be updated using a mixture of scripts and manual editing. Because the people working with the database are usually not familiar with SQL, I'd like to export the data as a TSV, which could then be manipulated (for example with Python's pandas module) and then imported back. I assume the standard way would be to connect to the database directly, but using TSVs has some upsides in this situation, I think. I've been reading the MySQL docs and some Stack Overflow questions to find the best way to do this. I've found a couple of solutions; however, they are all somewhat inconvenient. I will list them below and explain my problems with them.
My question is: did I miss something, for example some helpful SQL commands or CLI options to help with this? Or are the solutions I found already the best when importing/exporting TSVs?
My example database looks like this:
Database: Export_test
Table: Sample
Field      Type       Null  Key
id         int(11)    NO    PRI
text_data  text       NO
optional   int(11)    YES
time       timestamp  NO
Example data:
INSERT INTO `Sample` VALUES (1,'first line\\\nsecond line',NULL,'2022-02-16 20:17:38');
The data contains an escaped newline, which caused a lot of problems for me when exporting.
Table: Reference
Field        Type     Null  Key
id           int(11)  NO    PRI
foreign_key  int(11)  NO    MUL
Example data:
INSERT INTO `Reference` VALUES (1,1);
foreign_key references Sample.id.
Note about encoding: As a caveat for people trying to do the same thing: if you want to export/import data, make sure that character sets and collations are set up correctly for connections. This caused me some headaches, because although the data itself is utf8mb4, the client, server and connection character sets were latin1, which caused some loss of data in some instances.
Export
So, for exporting, I found basically three solutions, and they all behave somewhat differently:
A: SELECT stdout redirection
mysql Export_test -e "SELECT * FROM Sample;" > out.tsv
Output:
id text_data optional time
1 first line\\\nsecond line NULL 2022-02-16 21:26:13
Pros:
headers are added, which makes it easy to use with external programs
formatting works as intended
Cons:
NULL is used for null values; when importing, \N is required instead; as far as I know, this can't be configured for exports
Workaround: replace NULL values when editing the data
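Since pandas was mentioned, here is a minimal sketch of that workaround (pandas already parses the literal string NULL as missing data on read; file names are placeholders):

import pandas as pd

df = pd.read_csv("out.tsv", sep="\t")  # "NULL" is read as NaN by default
# ... manipulate df here ...
df.to_csv("edited.tsv", sep="\t", index=False, na_rep="\\N")  # write \N for missing values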
B: SELECT INTO OUTFILE
mysql Export_test -e "SELECT * FROM Sample INTO OUTFILE '/tmp/out.tsv';"
Output:
1 first line\\\
second line \N 2022-02-16 21:26:13
Pros:
\N is used for null data
Cons:
escaped linebreaks are not handled correctly
headers are missing
file writing permission issues
Workaround: fix linebreaks manually; add headers by hand or supply them in the script; use /tmp/ as output directory
C: mysqldump with --tab (performs SELECT INTO OUTFILE behind the scenes)
mysqldump --tab='/tmp/' --skip-tz-utc Export_test Sample
Output, pros and cons: same as export variant B
Something that should be noted: the output is only the same as B if --skip-tz-utc is used; otherwise, timestamps will be converted to UTC and will be off after importing the data.
Import
Something I didn't realize at first is that it's impossible to merely update data directly with LOAD DATA or mysqlimport, although that's something many GUI tools appear to be doing and other people have attempted. For me as a beginner, this wasn't immediately clear from the MySQL docs. A workaround appears to be creating an empty table, importing the data there, and then updating the actual table of interest via a join (a sketch follows the two options below). I also thought one could update individual columns with this, which again is not possible. If there are other ways to achieve this, I would really like to know.
As far as I could tell, there are two options, which do pretty much the same thing:
LOAD DATA:
mysql Export_test -e "SET FOREIGN_KEY_CHECKS = 0; LOAD DATA INFILE '/tmp/Sample.tsv' REPLACE INTO TABLE Sample IGNORE 1 LINES; SET FOREIGN_KEY_CHECKS = 1;"
mysqlimport (performs LOAD DATA behind the scenes):
mysqlimport --replace Export_test /tmp/Sample.tsv
Note: if there are foreign key constraints, as in this example, SET FOREIGN_KEY_CHECKS = 0; needs to be run first (as far as I can tell, mysqlimport can't be used directly in these cases). Also, IGNORE 1 LINES or --ignore-lines can be used to skip the first line if the input TSV contains a header. For mysqlimport, the name of the input file without extension must be the name of the table. Again, file reading permissions can be an issue, and /tmp/ is used to avoid that.
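A minimal sketch of the staging-table workaround mentioned above, using the Sample table from this example (Sample_staging is a hypothetical name, and the column list is illustrative):

CREATE TABLE Sample_staging LIKE Sample;
LOAD DATA INFILE '/tmp/Sample.tsv' INTO TABLE Sample_staging IGNORE 1 LINES;
UPDATE Sample
JOIN Sample_staging USING (id)
SET Sample.text_data = Sample_staging.text_data,
    Sample.optional = Sample_staging.optional;
DROP TABLE Sample_staging;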
Are there ways to make this process more convenient? Like, are there some options I can use to avoid the manual workarounds, or are there ways to use TSV importing to UPDATE entries without creating a temporary table?
What I ended up doing was using SELECT ... INTO OUTFILE for exporting, adding a header manually and fixing the malformed lines by hand. After manipulating the data, I used LOAD DATA INFILE to update the data. In another case, I exported with SELECT and stdout redirection, manipulated the data, and then added a script which created a file with a bunch of UPDATE ... WHERE statements with the corresponding data (a sketch follows). Then I ran the resulting .sql in my database. Is the latter maybe the best option in this case?
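That UPDATE-generating script can be quite short; here is a minimal Python sketch, assuming an edited TSV with a header row and the Sample table (file names and the naive quoting are illustrative):

import csv

with open("edited.tsv", newline="") as tsv, open("updates.sql", "w") as sql:
    for row in csv.DictReader(tsv, delimiter="\t"):
        text = row["text_data"].replace("'", "''")  # naive escaping; real data may need more care
        sql.write("UPDATE Sample SET text_data = '%s' WHERE id = %d;\n" % (text, int(row["id"])))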
Exporting and importing is indeed sort of clunky in MySQL.
One problem is that it introduces a race condition. What if you export data to work on it, then someone modifies the data in the database, then you import your modified data, overwriting your friend's recent changes?
If you say, "no one is allowed to change data until you re-import the data," that could cause an unacceptably long time where clients are blocked, if the table is large.
The trend is that people want the database to minimize downtime, and ideally to have no downtime at all. Advancements in database tools are generally made with this priority in mind, not so much to accommodate your workflow of taking the data out of MySQL for transformations.
Also, what if the database is large enough that the exported data itself becomes a problem? Where do you store a 500GB TSV file? Does pandas even work on a file that large?
What most people do is modify data while it remains in the database. They use in-place UPDATE statements to modify data. If they can't do this in one pass (there's a practical limit of 4GB for a binary log event, for example), then they UPDATE more modest-size subsets of rows, looping until they have transformed the data on all rows of a given table.
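As a sketch of that chunked approach, using the Sample table from the question (the transformation and batch size are illustrative):

UPDATE Sample
SET text_data = REPLACE(text_data, 'old', 'new')
WHERE id BETWEEN 1 AND 100000;
-- then repeat with id BETWEEN 100001 AND 200000, and so on, until all rows are covered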

Informix LOAD FROM file with header

I'm using Informix LOAD FROM command to bulk insert data from CSV files to a DB table, like:
LOAD FROM "file.csv" DELIMITER ";" INSERT INTO table_name(col1, col2, col3)
The problem is, the first line of each CSV file contains column headers. Is there any way to tell Informix that the first row shall be ignored?
No; there isn't a way to tell the standard Informix LOAD statement to skip a header line. Note, too, that it won't remove quotes from around fields in CSV format or otherwise deal with things the way CSV format officially expects (though, since you have semicolon-separated values rather than comma-separated values, it is hard to know which rules are being followed; be leery of the treatment of backslashes too).
You might be able to use the Informix DB-Load utility (dbload) instead; it depends on whether your data is simply using ; in place of Informix's default | delimiter, or whether you have more of the semantics of CSV such as quotes around fields that need to be removed. If you want to get exotic, the Informix High-Performance Loader (HPL) can either handle it natively or be trained to handle it.
Alternatively, you could consider using my* SQLCMD program (it has been called sqlcmd a lot longer than Microsoft's johnny-come-lately of the same name) which allows you to specify:
LOAD FROM "file.csv" DELIMITER ";" SKIP 1 INSERT INTO table_name(col1, col2, col3);
SQLCMD also has an option FORMAT CSV (amongst other formats) that might, or might not, be relevant. It handles things like stripping quotes from around fields that the full CSV standard supports.
You'll need to have Informix ClientSDK and a C compiler (and the rest of a C development system) installed to build SQLCMD.
* Since SQLCMD is my program because I wrote it, any recommendation to use it is inherently biased; you were warned.
You could also consider an 'external table' (CREATE EXTERNAL TABLE), but I'm not sure it is any better than the LOAD statement either with the formats it supports or with the ability to skip the first row of data.
When I load CSV files using LOAD FROM into Informix, I usually load into a temporary table of all character columns, which I then work with; you just delete the header row. Basically, you're putting the whole file into a temp table, which is easier to work with. For example:
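A sketch of that temp-table approach, assuming three columns and that the header row can be recognized by its first field (all names are illustrative):

CREATE TEMP TABLE load_stage (c1 CHAR(100), c2 CHAR(100), c3 CHAR(100)) WITH NO LOG;
LOAD FROM "file.csv" DELIMITER ";" INSERT INTO load_stage;
DELETE FROM load_stage WHERE c1 = "col1"; -- remove the header row
INSERT INTO table_name(col1, col2, col3) SELECT c1, c2, c3 FROM load_stage;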

MySQL Writing Image BLOB To Disk

I have a table that contains image BLOB field.
I want to be able to submit a query to the database and have the BLOBs written to the windows file system.
Is this possible?
Yes, it is possible. You can use the SELECT statement with an INTO DUMPFILE clause. For example:
SELECT data_column
FROM table1
WHERE id = 1
INTO DUMPFILE 'image.png';
From the reference: If you use INTO DUMPFILE instead of INTO OUTFILE, MySQL writes only one row into the file, without any column or line termination and without performing any escape processing. This is useful if you want to store a BLOB value in a file.
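One caveat: INTO DUMPFILE writes the file on the database server host (it also requires the FILE privilege and is subject to secure_file_priv). If your Windows machine is a client rather than the server itself, a small client-side script is an alternative; here is a minimal Python sketch, assuming the mysql-connector-python package and the names from the example above (connection details and the output path are placeholders):

import mysql.connector

conn = mysql.connector.connect(host="dbhost", user="user",
                               password="secret", database="mydb")
cur = conn.cursor()
cur.execute("SELECT data_column FROM table1 WHERE id = 1")
(blob,) = cur.fetchone()  # the BLOB arrives as bytes
with open(r"C:\images\image.png", "wb") as f:  # a local path on the client
    f.write(blob)
conn.close()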

How to load data into a MySQL table without spaces?

I have a text file to be imported in a MySQL table. The columns of the files are comma delimited. I set up an appropriate table and I used the command:
load data LOCAL INFILE 'myfile.txt' into table mytable FIELDS TERMINATED BY ',';
The problem is, there are several spaces in the text file, before and after the data on each column, and it seems that the spaces are all imported in the tables (and that is not what I want). Is there a way to load the file without the empty spaces (other than processing each row of the text file before importing in MySQL)?
As far as I understand, there's no way to do this during the actual load of the data file dynamically (I've looked, as well).
It seems the best way to handle this is to either use the SET clause with the TRIM() function ("SET column2 = TRIM(column2)") or run an update on the string columns after loading, using the TRIM() function.
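For example, a sketch of the SET-clause variant for a three-column table (the @variables and column names are illustrative):

load data LOCAL INFILE 'myfile.txt' into table mytable
FIELDS TERMINATED BY ','
(@c1, @c2, @c3)
SET column1 = TRIM(@c1), column2 = TRIM(@c2), column3 = TRIM(@c3);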
You can also create a stored procedure using prepared statements to run the TRIM function on all columns in a specified table, immediately after loading it.
You would essentially pass in the table name as a variable, and the stored procedure would use the information_schema database to determine which columns to update.
If you can use .NET, CSVReader is a great option (http://www.codeproject.com/KB/database/CsvReader.aspx). You can read data from a CSV and specify the delimiter, trimming options, etc. In your case, you could choose to trim left and right spaces from each value. You can then either save the result to a new text file and import it into the database, or loop through the CsvReader object and insert each row into the database directly. The performance of CsvReader is impressive. Hope this helps.

MySQL BinLog Statement Retrieval

I have seven 1G MySQL binlog files that I have to use to retrieve some "lost" information. I only need to get certain INSERT statements from the log (e.g. where the statement starts with "INSERT INTO table SET field1="). If I just run mysqlbinlog (even per database and using --short-form), I get a text file that is several hundred megabytes, which makes it almost impossible to parse with any other program.
Is there a way to just retrieve certain sql statements from the log? I don't need any of the ancillary information (timestamps, autoincrement #s, etc.). I just need a list of sql statements that match a certain string. Ideally, I would like to have a text file that just lists those sql statements, such as:
INSERT INTO table SET field1='a';
INSERT INTO table SET field1='tommy';
INSERT INTO table SET field1='2';
I could get that by running mysqlbinlog to a text file and then parsing the results based upon a string, but the text file is way too big. It just times out any script I run and even makes it impossible to open in a text editor.
Thanks for your help in advance.
I never received an answer, but I will tell you what I did to get by.
1. Ran mysqlbinlog to a text file
2. Created a PHP script that uses fgets to read each line of the log
3. While looping through each line, the script parses it using the stristr function
4. If the line matches the string I am looking for, it logs the line to a file
It takes a while to run mysqlbinlog and the PHP script, but it no longer times out. I originally used fread in PHP, but that reads the entire file into memory and caused the script to crash on large (1G) log files. Now, it takes several minutes to run (I also set the max_execution_time variable to be larger), but it works like a charm. fgets gets one line at a time, so it doesn't take up nearly as much memory.