MySQL BinLog Statement Retrieval - mysql

I have seven 1G MySQL binlog files that I have to use to retrieve some "lost" information. I only need to get certain INSERT statements from the log (e.g. where the statement starts with "INSERT INTO table SET field1="). If I just run mysqlbinlog (even per database and using --short-form), I get a text file that is several hundred megabytes, which makes it almost impossible to parse with any other program.
Is there a way to just retrieve certain sql statements from the log? I don't need any of the ancillary information (timestamps, autoincrement #s, etc.). I just need a list of sql statements that match a certain string. Ideally, I would like to have a text file that just lists those sql statements, such as:
INSERT INTO table SET field1='a';
INSERT INTO table SET field1='tommy';
INSERT INTO table SET field1='2';
I could get that by running mysqlbinlog to a text file and then parsing the results based on a string, but the text file is far too big. It times out any script I run, and it is even impossible to open in a text editor.
Thanks for your help in advance.

I never received an answer, but I will tell you what I did to get by.
1. Ran mysqlbinlog to a text file
2. Created a PHP script that uses fgets to read each line of the log
3. While looping through each line, the script parses it using the stristr function
4. If the line matches the string I am looking for, it logs the line to a file
It takes a while to run mysqlbinlog and the PHP script, but it no longer times out. I originally used fread in PHP, but that reads the entire file into memory and caused the script to crash on large (1G) log files. Now it takes several minutes to run (I also increased the max_execution_time setting), but it works like a charm. fgets reads one line at a time, so it uses far less memory. A sketch of the script is shown below.
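For reference, here is a minimal sketch of that filtering script. The file names are placeholders and the search string is the example from the question, not the exact values I used:
<?php
// Stream the mysqlbinlog text output line by line so the whole file
// is never held in memory (unlike fread, which loads it all at once).
$needle = "INSERT INTO table SET field1=";    // string to match (example from above)
$in  = fopen('binlog-output.txt', 'r');       // mysqlbinlog output (placeholder name)
$out = fopen('matched-statements.sql', 'w');  // collected matches (placeholder name)

if ($in === false || $out === false) {
    die("Could not open input or output file\n");
}

while (($line = fgets($in)) !== false) {
    // stristr() does a case-insensitive substring search
    if (stristr($line, $needle) !== false) {
        fwrite($out, $line);
    }
}

fclose($in);
fclose($out);
?>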

Related

How can I track the execution of `source filename` command in mysql

How can I track the execution of the source filename command in mysql so that I can record the filename and path of the sql scripts that have been run? Google didn't help, or maybe I didn't use the right keywords.
So when I execute source ./test/file.sql (without errors preferably)
I want an entry in a "source_history" table with the current time and the filename (along with the path), which I could do if I could figure out how to track the command.
It'd be of great help if anyone could help me keep track of the source command (something like a trigger event for insert or update on a table).
(Maybe by tracking all commands in that sense and then, while exiting mysql, getting the query history and checking for source.)
Hope that makes sense.
Thanks in advance.
The problem is that the source command is not a MySQL server command; it is a command of MySQL's command line client, which is also named mysql.
The client only passes the SQL statements within the file to the MySQL server, so the server cannot know which file the statements came from.
MySQL's own documentation for the source command suggests the most obvious solution:
Sometimes you may want your script to display progress information to
the user. For this you can insert statements like this:
SELECT '<info_to_display>' AS ' ';
So, the simplest way is to create a table with fields for the path, the event type (start / stop), and a timestamp, and then add INSERT statements at the start and end of each sql file that log the start and end of each batch, with the name of the file hard-coded in the INSERT statements. You may want to create a script that adds these statements to the sql files; a sketch of one is shown below.
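For example, a minimal sketch of such a script in PHP, assuming a source_history(path, event, ts) table along the lines described in the question; the script name, column names, and the crude escaping are illustrative only:
<?php
// Wrap a .sql file with logging INSERTs so that running it via `source`
// records its start and end in a source_history table.
// Usage: php wrap_sql.php ./test/file.sql
$file = $argv[1];              // path to the .sql file to instrument
$path = addslashes($file);     // naive escaping for the SQL string literal

$start = "INSERT INTO source_history (path, event, ts) VALUES ('$path', 'start', NOW());\n";
$end   = "\nINSERT INTO source_history (path, event, ts) VALUES ('$path', 'stop', NOW());\n";

$sql = file_get_contents($file);
file_put_contents($file, $start . $sql . $end);
?>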
An alternative is to create a batch file that receives the path to an .sql file as a parameter, invokes MySQL's CLI, logs the start of the batch in MySQL, runs the .sql file, and then logs its completion in MySQL.

how to FAST import a giant sql script for mysql?

Currently I have a situation where I need to import a giant sql script into mysql. The sql script content is mainly INSERT operations, but there are a great many records and the file size is around 80GB.
The machine has 8 cpus, 20GB mem. I have done something like:
mysql -h [host_ip_address] -u username -px xxxxxxx -D databaseName < giant.sql
But the whole process takes several days, which is quite long. Are there any other options for importing the sql file into the database?
Thanks so much.
I suggest you try LOAD DATA INFILE. It is extremely fast. I've not used it for loading to a remote server, but there is the mysqlimport utility. See a comparison of the different approaches: https://dev.mysql.com/doc/refman/5.5/en/insert-speed.html.
You will also need to convert your sql script to a format suitable for the LOAD DATA INFILE clause; a rough sketch of such a conversion is shown below.
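As a rough sketch only (not part of the original answer), the conversion could look something like the following PHP. It assumes one single-row INSERT ... VALUES (...); statement per line and well-behaved values; real dump files (multi-row inserts, embedded quotes and commas) need a proper parser, and the file names are placeholders.
<?php
// Turn simple "INSERT INTO t VALUES (...);" lines into a CSV-like file
// that LOAD DATA INFILE can read.
$in  = fopen('giant.sql', 'r');
$out = fopen('giant.csv', 'w');

while (($line = fgets($in)) !== false) {
    if (preg_match('/^INSERT INTO\s+\S+\s+VALUES\s*\((.*)\);\s*$/i', $line, $m)) {
        fwrite($out, $m[1] . "\n");   // the value list is already comma separated
    }
}

fclose($in);
fclose($out);
?>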
You can break the sql file into several files (one per table) using a shell script, and then prepare another script to import those files one by one. This speeds up the inserts compared to loading everything in one go.
The reason is that the records inserted by a single process keep occupying memory and are not released; you can see that after about 5 hours of importing a script, query execution becomes noticeably slower.
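The answer above suggests a shell script for the split; a rough PHP equivalent could look like this, assuming each INSERT statement sits on its own line (the file names are placeholders):
<?php
// Split a dump into one file per table, based on the table name
// in each INSERT statement.
$in = fopen('giant.sql', 'r');
$handles = array();

while (($line = fgets($in)) !== false) {
    if (preg_match('/^INSERT INTO\s+`?(\w+)`?/i', $line, $m)) {
        $table = $m[1];
        if (!isset($handles[$table])) {
            $handles[$table] = fopen("split_$table.sql", 'w');
        }
        fwrite($handles[$table], $line);
    }
}

foreach ($handles as $h) {
    fclose($h);
}
fclose($in);
?>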
Thanks for all your help, guys.
I have taken some of your advice and done some comparisons; now it is time to post the results. The target was a single 15GB sql script.
Overall, I tried:
importing the data as a single sql script with indexes (took days; finally I killed it. DO NOT TRY THIS YOURSELF, you will be pissed off);
importing the data as a single sql script without indexes (same as above);
importing the data as split sql scripts with indexes (taking the single sql script as an example, I split the big file into small chunks of around 41MB each; each chunk took around 2m19.586s, total around …);
importing the data as split sql scripts without indexes (each chunk took around 2m9.326s).
(Unfortunately I did not try the Load Data method for this dataset.)
Conclusion:
If you do not want to use the Load Data method when you have to import a giant sql file into mysql, it is better to:
Divide into small scripts;
Remove the indexes
You can add the indexes back after importing. Cheers
Thanks #btilly #Hitesh Mundra
Put the following commands at the head of the giant.sql file:
SET AUTOCOMMIT = 0;
SET FOREIGN_KEY_CHECKS=0;
and the following at the end:
SET FOREIGN_KEY_CHECKS = 1;
COMMIT;
SET AUTOCOMMIT = 1;

PhpMyAdmin data import performance issues

Originally, my question was related to the fact that PhpMyAdmin's SQL section wasn't working properly. As suggested in the comments, I realized that the amount of input was simply too much for it to handle. However, this didn't give me a valid solution for dealing with files that have (in my case) 35 thousand record lines in CSV format:
...
20120509,126,1590.6,0
20120509,127,1590.7,1
20120509,129,1590.7,6
...
The Import option in PhpMyAdmin struggles just as the basic copy-paste input in the SQL section does. This time, same as before, it runs for 5 minutes until the max execution time is reached and then stops. What is interesting, though, is that it adds around 6-7 thousand records to the table, so the input actually goes through and almost succeeds. I also tried halving the amount of data in the file; nothing changed, however.
There is clearly something wrong now. It is pretty annoying to have to massage the data in a php script when a simple data import does not work.
Change your php upload max size.
Do you know where your php.ini file is?
First of all, try putting this file into your web root:
phpinfo.php
( see http://php.net/manual/en/function.phpinfo.php )
containing:
<?php
phpinfo();
?>
Then navigate to http://www.yoursite.com/phpinfo.php
Look for "php.ini".
To upload large files you need to raise max_execution_time, post_max_size, and upload_max_filesize.
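As a quick sanity check of what your current php.ini actually provides, you could drop something like this next to the phpinfo.php file above (just a sketch; it prints only the three settings named above):
<?php
// Print the limits that matter for large uploads.
foreach (array('max_execution_time', 'post_max_size', 'upload_max_filesize') as $key) {
    echo $key . ' = ' . ini_get($key) . "\n";
}
?>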
Also, do you know where your error.log file is? It would hopefully give you a clue as to what is going wrong.
EDIT:
Here is the query I use for the file import:
$query = "LOAD DATA LOCAL INFILE '$file_name' INTO TABLE `$table_name` FIELDS TERMINATED BY ',' OPTIONALLY
ENCLOSED BY '\"' LINES TERMINATED BY '$nl'";
Where $file_name is the temporary filename from the php global variable $_FILES, $table_name is a table already prepared for the import, and $nl is a variable for the csv line endings (it defaults to windows line endings, but I have an option to select linux line endings).
The other thing is that the table ($table_name) in my script is prepared in advance by first scanning the csv to determine column types. After it determines appropriate column types, it creates the MySQL table to receive the data.
I suggest you try creating the MySQL table definition first, to match what's in the file (data types, character lengths, etc). Then try the above query and see how fast it runs. I don't know how much of a factor the MySQL table definition is on speed.
Also, I have no indexes defined in the table until AFTER the data is loaded. Indexes slow down data loading.
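For completeness, a minimal sketch of running that query with mysqli; the connection details, file path, and table name are placeholders, and LOCAL INFILE has to be enabled on both the client and the server:
<?php
// Run the LOAD DATA LOCAL INFILE query above with mysqli.
$db = mysqli_init();
mysqli_options($db, MYSQLI_OPT_LOCAL_INFILE, true);        // allow LOCAL INFILE client-side
mysqli_real_connect($db, 'localhost', 'user', 'password', 'database');

$file_name  = $db->real_escape_string('/tmp/upload.csv');  // placeholder path
$table_name = 'import_table';                              // placeholder table
$nl         = '\r\n';                                      // windows line endings; use '\n' for linux files

$query = "LOAD DATA LOCAL INFILE '$file_name' INTO TABLE `$table_name`
          FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
          LINES TERMINATED BY '$nl'";

if (!$db->query($query)) {
    die('Import failed: ' . $db->error);
}
echo $db->affected_rows . " rows loaded\n";
?>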

Excel SQL Server Data Connection

Perhaps someone could provide some insight into a problem I have.
I have a SQL Server database which receives information every hour and is updated by a stored procedure using a Bulk Insert. This all works fine; however, the end result is to pull this information into Excel.
Establishing the data connection worked fine as well, until I attempted some calculations. The imported data is all formatted as text, and Excel's number formats weren't working, so I decided to look at the table in the database.
All the columns are set to varchar for the Bulk Insert to work so I changed a few to numeric. Refreshed in Excel and the calculations worked.
After repeated attempts I've not been able to get the Bulk Insert to work; even after generating a format file with bcp it still returned errors on the insert (could not convert varchar to numeric). After some further searching I found it was only failing on one numeric column, which is generally empty.
Other than importing the data with VBA and converting it that way, or adding zero to every imported value so Excel converts it, I'm not sure what else to try.
Any suggestions are welcome.
Thanks!
Thanks for the replies. I had considered using =VALUE() in Excel but wanted to avoid the additional formulas.
I was eventually able to resolve my problem by generating a format file for the Bulk Insert using the bcp utility. Getting it to generate a file proved tricky enough; below is an example of how I generated it.
At an elevated cmd:
C:\>bcp databasename.dbo.tablename format nul -c -x -f outputformatfile.xml -t, -S localhost\SQLINSTANCE -T
This generated an xml format file for the specific table. As my table had two additional columns which weren't in the source data, I edited the XML and removed them (they were uniqueid and getdate columns).
Then I changed the Bulk Insert statement so it used the format file:
BULK INSERT [database].[dbo].[tablename]
FROM 'C:\bulkinsertdata.txt'
WITH (FORMATFILE='C:\outputformatfile.xml',FIRSTROW=3)
Using this method I was able to use the numeric and int datatypes successfully. Going back to Excel, when the data connection was refreshed it was able to determine the correct datatypes.
Hope that helps someone!

How to count number of SQL statements in text file?

My program restores a MySQL database from an SQL file. If I wanted to display the progress of SQL execution in my program, I would need to know the number of SQL statements in the file. How can I do this in MySQL? (The queries may include MySQL-specific multi-row INSERT statements.)
I could use either MySQL command line tools or the Python API. You're welcome to post solutions for other DBMS too.
The simple (and easy) way: Add PRINT statements to your SQL script file, displaying progress messages.
The advantage (apart from avoiding the obvious problem that multi-statement constructs are hard to parse) is that you get precise control over the progress. For example, some statements might take much longer to run than others, so you would need to weight them.
I wouldn't think of progress in terms of the number of statements executed. What I do is print out feedback that specific tasks have been started and completed, such as 'Synchronising table blah', 'Updating stored procedure X', etc.
The naive solution is to count the number of semicolons in the file (or whatever other character is used as the delimiter in the file).
It usually works pretty well, except when the data you are inserting contains many semicolons; then you have to start actually parsing the SQL, which is a headache. A slightly more careful version of the counting approach is sketched below.
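As a sketch only: a semicolon counter that at least skips semicolons inside quoted strings. The file name is a placeholder, the default ';' delimiter is assumed, and comments or DELIMITER directives are not handled.
<?php
// Naive statement counter: counts semicolons that are not inside
// single- or double-quoted strings.
function count_statements($path) {
    $count = 0;
    $in_single = false;
    $in_double = false;
    $fh = fopen($path, 'r');
    while (($line = fgets($fh)) !== false) {
        $len = strlen($line);
        for ($i = 0; $i < $len; $i++) {
            $c = $line[$i];
            if ($c === '\\') { $i++; continue; }                          // skip escaped character
            if ($c === "'" && !$in_double) { $in_single = !$in_single; }
            elseif ($c === '"' && !$in_single) { $in_double = !$in_double; }
            elseif ($c === ';' && !$in_single && !$in_double) { $count++; }
        }
    }
    fclose($fh);
    return $count;
}

echo count_statements('dump.sql') . "\n";
?>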