I am using MySQL.
I have a MySQL dump file (large_data.sql). I can create a database and load the data from this dump file into it; no problem there.
Now I feel the data in the dump file is too large (for example, one table contains 300,000 rows/objects, and the other tables also contain large amounts of data).
So I decided to make another dump (based on the large one) that contains only a small amount of data (for example, 30 rows/objects per table).
With only that big dump file, what is the correct and efficient way to cut down the data in that dump and create a new dump file that contains a small amount of data?
------------------------- More -----------------------------------
(Opening the large dump in a text editor is not a good option: the dump is very large, and it takes a long time to open it that way.)
If you want to work only with the textual dump files, you could use text-processing tools (like awk or sed, or perhaps a Perl, Python, or OCaml script) to handle them.
But maybe your big database was already loaded from the big dump file, and you want to work with MySQL incremental backups?
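If you go the scripting route, here is a minimal sketch in Python. It assumes the dump was made with --skip-extended-insert (one row per INSERT line); the file names and the row limit are placeholders. It copies all DDL as-is and keeps only the first 30 INSERT lines per table.

# Minimal sketch: trim a --skip-extended-insert dump to 30 rows per table.
import re
from collections import defaultdict

MAX_ROWS = 30
insert_re = re.compile(r"^INSERT INTO `?(\w+)`?", re.IGNORECASE)
rows_kept = defaultdict(int)

with open("large_data.sql", encoding="utf-8", errors="replace") as src, \
     open("small_data.sql", "w", encoding="utf-8") as dst:
    for line in src:
        m = insert_re.match(line)
        if m:
            table = m.group(1)
            if rows_kept[table] >= MAX_ROWS:
                continue  # drop the rest of this table's rows
            rows_kept[table] += 1
        dst.write(line)  # DDL, SET statements and kept INSERTs pass through

Loading small_data.sql then gives you the same schema with only a handful of rows per table.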
I recommend the free file splitter at http://www.filesplitter.org/.
Only problem: it cuts a query into two parts, so you need to edit the files manually afterwards. But it works like a charm.
Example:
My file is:
BlaBloBluBlw
BlaBloBluBlw
BlaBloBluBlw
The result will be:
File 1:
BlaBloBluBlw
BlaBloBl
File 2:
uBlw
BlaBloBluBlw
So you need to edit the cut points, but it works like a charm and is very quick. I used it today on a 9.5-million-row table.
BUT!! Best argument: the time this takes is small compared to the time spent trying to import something big, or waiting for it. It is quick and efficient even though you need to edit the files manually, since you have to rebuild the last and first queries.
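If you would rather not repair the cut queries by hand, a line-aware splitter is only a few lines of Python. This is a sketch of an alternative approach, assuming every statement in the dump ends with ";" at the end of a line (typical mysqldump output); the chunk size and file names are placeholders.

# Sketch: split a dump into ~100 MB pieces without cutting a statement in two.
CHUNK = 100 * 1024 * 1024  # rough target size per piece

part, written = 1, 0
dst = open(f"dump_part_{part}.sql", "w", encoding="utf-8")
with open("large_data.sql", encoding="utf-8", errors="replace") as src:
    for line in src:
        dst.write(line)
        written += len(line)
        # Rotate only after a complete statement, so no query is cut in half.
        if written >= CHUNK and line.rstrip().endswith(";"):
            dst.close()
            part, written = part + 1, 0
            dst = open(f"dump_part_{part}.sql", "w", encoding="utf-8")
dst.close()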
Related
I know how to import a text file into a MySQL database by using the command
LOAD DATA LOCAL INFILE '/home/admin/Desktop/data.txt' INTO TABLE data
The above command writes the records of the file "data.txt" into the MySQL table. My question is that I want to erase the records from the .txt file once they are stored in the database.
For example: if there are 10 records and at the current point in time 4 of them have been written into the database table, I require that these 4 records be erased from data.txt at the same time. (In a way, the text file acts as a "queue".) How can I accomplish this? Can Java code be written, or should a scripting language be used?
Automating this is not too difficult, but it is also not trivial. You'll need something (a program, a script, ...) that can:
Read the records from the original file,
Check whether they were inserted and, if they were not, copy them into another file,
Rename or delete the original file, and rename the new file to replace the original one.
There might be better ways of achieving what you want to do, but that's not something I can comment on without knowing your goal.
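A minimal sketch of that loop in Python (the file path is taken from the question; insert_record() is a hypothetical helper you would implement with your MySQL driver):

import os

DATA_FILE = "/home/admin/Desktop/data.txt"
TMP_FILE = DATA_FILE + ".tmp"

def insert_record(line):
    # Hypothetical helper: run an INSERT for this line and return True
    # only when the row is committed to the database.
    return False  # stub: with this in place, every record stays in the file

with open(DATA_FILE, encoding="utf-8") as src, \
     open(TMP_FILE, "w", encoding="utf-8") as dst:
    for line in src:
        if line.strip() and not insert_record(line):
            dst.write(line)  # keep only records that were not inserted

os.replace(TMP_FILE, DATA_FILE)  # swap the trimmed file into place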
I've got a question regarding how to process a delimited file with a large number of columns (>3000).
I tried to extract the fields with the standard delimited-file input component, but creating the schema takes hours, and when I run the job I get an error because the toString() method exceeds the 65535-byte limit. After that I can run the job, but all the columns are messed up and I can't really work with them anymore.
Is it possible to split that .csv file with Talend? Is there any other way of handling this, maybe with some sort of Java code? If you have any further questions, don't hesitate to comment.
Cheers!
You can create the schema of the delimited file in Metadata, right? I tested 3k columns with some millions of records, and it did not even take 5 minutes to load all the column names with data types. Obviously you can't split that file by taking each row as one cell; it could exceed the string-length limit in Talend. But you can do it in Java using a BufferedReader.
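The same line-by-line idea, sketched here in Python rather than with a BufferedReader, splits the wide file into several narrower files that a normal Talend schema can handle. The delimiter, chunk size and file names are assumptions, and quoted fields containing the delimiter are not handled.

CHUNK = 500   # columns per output file
DELIM = ";"

outputs = []
with open("wide_input.csv", encoding="utf-8") as src:
    for line in src:
        fields = line.rstrip("\n").split(DELIM)
        if not outputs:  # open one output file per column chunk on the first row
            n_parts = (len(fields) + CHUNK - 1) // CHUNK
            outputs = [open(f"part_{i}.csv", "w", encoding="utf-8")
                       for i in range(n_parts)]
        for i, out in enumerate(outputs):
            out.write(DELIM.join(fields[i * CHUNK:(i + 1) * CHUNK]) + "\n")
for out in outputs:
    out.close()

If you need to join the pieces back together later, keep a key column in every chunk.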
To deal with a big delimited file, we need something designed for big data. I think a good choice is to load your file into a MongoDB collection using the following command, with no need to create a 3k-column collection before importing the file:
mongoimport --db users --collection contacts --type csv --headerline --file /opt/backups/contacts.csv
After that, you can process your data easily using an ETL tool.
See the mongoimport reference.
Maybe you could have a go with uniVocity. It is built to handle all sorts of extreme situations when processing data.
Check out the tutorial and see if it suits your needs.
Here's a simple project which works with CSV inputs: https://github.com/uniVocity/worldcities-import/
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
I have a backup file from a big database. It's about 85 MB in gzip format and 1.5 GB in SQL format.
Now I want to import it into my local database, but neither phpMyAdmin nor Navicat for MySQL can do it. So I want an application to split it into smaller parts and import it part by part.
I tried Notepad++, glogg, and TSE Pro to read and manually split it, but apart from TSE the others couldn't open it, and TSE hangs after selecting and cutting 10,000 lines of text.
I also tried GSplit to split it, but it seems GSplit has its own format for the split parts that isn't plain text.
Thanks for your help. Any other solution to restore my DB locally is also welcome...
Thanks to souvickcse, BigDump worked great.
Initially, I created a database called "sample" and imported the data from a massive CSV file.
Whenever there are small changes in the .csv file (some data is added/deleted/modified), I have to update the database too. Re-importing the entire (large) .csv file every time is not efficient.
Is there an efficient way to update only the modified data from the .csv file into the database?
There's no simple way of doing this.
One plausible way would be to store the old version of the CSV somewhere, run a diff program between the old and new versions, and then use the resulting output to determine what has been added, changed, or removed, and update the database accordingly.
This is, however, a bit unreliable and slow, and would take some effort to implement. If you can, it would probably be better to adapt the source of the CSV file so that it updates the database directly.
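A sketch of that comparison in Python, assuming the first column is a unique key and that old.csv/new.csv are the previous and current exports; generating the actual INSERT/UPDATE/DELETE statements is left out:

import csv

def load(path):
    with open(path, newline="", encoding="utf-8") as f:
        return {row[0]: row for row in csv.reader(f)}

old, new = load("old.csv"), load("new.csv")

to_insert = [new[k] for k in new.keys() - old.keys()]
to_delete = [old[k] for k in old.keys() - new.keys()]
to_update = [new[k] for k in new.keys() & old.keys() if new[k] != old[k]]

print(len(to_insert), "new,", len(to_update), "changed,", len(to_delete), "removed")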
Since you also want to delete entries that no longer exist in the CSV file, you will have to load the complete CSV file every time (and truncate the table first) in order to get a 1:1 copy.
For a more convenient synchronization you will probably have to use some scripting language (PHP, Python, etc.).
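As a sketch, that full refresh is just two statements. This uses mysql-connector-python with placeholder credentials, table and file names, and LOCAL INFILE must be enabled on both client and server:

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="admin", password="secret",
                               database="sample", allow_local_infile=True)
cur = conn.cursor()
cur.execute("TRUNCATE TABLE data")  # drop the old copy of the rows
cur.execute(
    "LOAD DATA LOCAL INFILE '/home/admin/Desktop/data.csv' INTO TABLE data "
    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'"
)
conn.commit()
conn.close()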
Sorry, that's all I know...
In my experience it's almost impossible to treat a data file that changes regularly as your "master data set": unless you can somehow generate a diff file that shows where the master data changed, you will always be forced to run through the entire CSV file, query the database for the corresponding record, and then either do nothing (if identical), insert (if new), or update (if modified). In many cases it will even be faster to just drop the table and reload the entire thing, but that can lead to serious operational problems.
Therefore, if it's at all possible for you, I'd treat the database as the master data and generate the CSV file from there.
I have to take a backup of my database every day, and I use mysqldump with shell commands to do it.
I want to know the progress of the backup process,
so I need to know the expected backup file size and also which file is being created as the backup.
How can I get these?
Any answers will be appreciated.
The MySQL information_schema TABLES table will give you meta-information about a database, including the total size of each table. See: http://dev.mysql.com/doc/refman/5.0/en/tables-table.html
There is an example in the first comment on that page of calculating the size of an entire database.
Note however that your mysqldump output will have overhead depending on your output format: integer values are represented as text, you'll have extra SQL or XML stuff, etc.
You may need to take the sizes provided and scale them up by a fudge factor to get an estimate for the dump size.
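For example, a small Python script can pull that figure out of information_schema (this sketch uses mysql-connector-python with placeholder credentials and database name; remember the fudge factor when comparing it to the dump):

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="admin", password="secret")
cur = conn.cursor()
cur.execute(
    "SELECT SUM(data_length) FROM information_schema.TABLES "
    "WHERE table_schema = %s", ("mydb",))
size_bytes = cur.fetchone()[0] or 0
print("approx. %.1f MB of table data on disk" % (size_bytes / 1024 / 1024))
conn.close()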
And as for the dump file name: that's chosen by you (or the shell script you're using) as an argument to mysqldump.
You can use the --show-progress-size argument of mysqldump.exe and periodically read the standard output.
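Alternatively, without relying on that flag, you can stream mysqldump's standard output into the backup file yourself and print the running size, which you can compare against the estimate above. This is only a sketch; the credentials, database name, and file name are placeholders.

import subprocess

written = 0
with open("backup.sql", "wb") as out:
    proc = subprocess.Popen(["mysqldump", "-u", "admin", "-psecret", "mydb"],
                            stdout=subprocess.PIPE)
    for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
        out.write(chunk)
        written += len(chunk)
        print("dumped %.1f MB so far" % (written / 1024 / 1024))
    proc.wait()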