How to handle a big CSV file? - mysql

I am planning to add the list of all cities in the world to my application (BTW: I am using Ruby, Ruby on Rails and MySQL), so I thought I would use the CSV file downloaded from the www.maxmind.com website.
However, I am worried and doubtful because the unpacked file is about 151.1 MB on disk (!) and I need to put all those values into my database. How would you advise me to proceed (also with regard to MySQL indexes...)?

Using LOAD DATA INFILE is really the only way to import a file that size efficiently, but index and performance considerations will be dependent on what you import and how you're going to use it. Research, research, research... a good starting point is Large Files with LOAD DATA INFILE
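For illustration only, a minimal sketch of that statement, assuming a hypothetical cities table and made-up column names (not the actual MaxMind layout), might look like:
-- Sketch only: adjust the path, terminators and column list to the actual file.
LOAD DATA LOCAL INFILE '/path/to/cities.csv'
INTO TABLE cities
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES  -- skip the header row, if the file has one
(country_code, city_name, region, population, latitude, longitude);
As for indexes, it is usually much faster to load into a bare table and add the indexes (e.g. on country code or city name) afterwards.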

Related

Import very large XML file (60GB) into MySQL

I have an XML file with a size of almost 60 GB that I would like to import into a MySQL database. I have root access to my server, but I don't know how to handle files of this size.
Do you guys have any ideas?
Normally I use Navicat, but it gave up...
Thanks
This is a little out of my area of knowledge, but would this work?
LOAD XML LOCAL INFILE '/pathtofile/file.xml'
INTO TABLE my_tablename(name, date, etc);
I know this sort of thing works with <1GB files, but I've yet to work with larger ones.
Hope this helps!
EDIT
If that doesn't work for you, take a look at the LOAD DATA documentation: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
You could use a command-line XML splitter to split it into manageably sized files first. Google to find one.

How to import the data from .MYD into MATLAB?

I just obtained a bunch of MySQL data stored in raw MySQL (MyISAM table) format in a .MYD file.
I now wish to start analysing that data. All I need to do is read the numbers into MATLAB and process them.
What is the easiest way of doing so? I am using Mac OS, by the way.
Creating a MySQL database and dropping the file into a (not running at the time) MySQL server's data directory is certainly one way to get to the stage where you have the data in a form you can re-export.
I am not familiar with the Mac OS locations, but on Linux the data structure is:
/var/lib/mysql/databasename/*.MYI and *.MYD
Frankly, I would be leery of trying to extract a MyISAM file using anything other than MySQL itself.
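If the server does pick the table up (you would most likely also need the matching .frm and .MYI files alongside the .MYD), a rough sketch of re-exporting to a CSV that MATLAB can then read with csvread or readtable, using placeholder database and table names, would be:
-- Sketch only: mydatabase.mytable and the output path are placeholders.
SELECT * FROM mydatabase.mytable
INTO OUTFILE '/tmp/mytable.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';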
Maybe someone else knows better, but I don't :-)

Importing CSV into multiple MySQL databases from a Rails app

I have a CSV file consisting of 78,000 records. I'm using smarter_csv (https://github.com/tilo/smarter_csv) to parse the CSV file. I want to import it into a MySQL database from my Rails app. I have the following two questions:
What would be the best approach to quickly importing such a large data set into MySQL from my Rails app? Would using Resque or Sidekiq to create multiple workers be a good idea?
I need to insert this data into a given table which is present in multiple databases. In Rails, I have a model talk to only one database, so how can I scale the solution to talk to multiple MySQL databases from my model?
Thank You
One way would be to use the native interface of the database application itself for importing and exporting; it would be optimised for that specific purpose.
For MySQL, the mysqlimport utility provides that interface. Note that the import can also be done as an SQL statement, and that this executable provides a much saner interface for the underlying SQL command.
As far as implementation goes, if this is a frequent import exercise, a Sidekiq/Resque/cron job is the best possible approach.
[EDIT]
The SQL command referred to above is LOAD DATA INFILE, as the other answer points out.
Performance-wise, probably the best method is to use MySQL's LOAD DATA INFILE syntax and execute an import command on each database. This requires the data file to be local to each database instance.
As the other answer suggests, mysqlimport can be used to ease the import, as the LOAD DATA INFILE statement syntax is highly customisable and can deal with many data formats.
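For illustration only, the statement run against each database might look something like this (the database, table and file names are invented; the same statement would be repeated once per target database, e.g. from a Sidekiq worker or each database's own connection):
-- Sketch only: repeat with db_two.my_table, db_three.my_table, and so on.
LOAD DATA LOCAL INFILE '/path/to/records.csv'
INTO TABLE db_one.my_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(col_one, col_two, col_three);
From Rails, each statement could be sent over a separate connection per database (for instance with ActiveRecord::Base.establish_connection or a raw mysql2 client), though that part is outside the SQL itself.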

Split huge MySQL insert into multiple files - suggestions

I have a huge MySQL dump I need to import. I managed to split the 3 GB file by table insert; one of the table inserts is 600 MB, and I want to split it into 100 MB files. So my question is: is there a script or easy way to split a 600 MB INSERT statement into multiple 100 MB inserts without having to open the file (as this kills my PC)?
I tried SQLDumpSplitter but this does not help.
Here is the reason I cannot just run the 600 MB file:
the MySQL import responds with 'killed'.
Please help
On Linux, the easiest way to split files is split -l N, which splits the file into pieces of N lines each.
On Windows, I've had pretty good luck with HxD - it works well with huge files.
You can easily open a 1 GB file in TextPad. Use this software to open the file and split your queries however you want.
Link for downloading the TextPad software: TextPad

Loading large CSV file to MySQL with Django and transformations

I have a large CSV file (5.4 GB) of data. It's a table with 6 columns and a lot of rows. I want to import it into MySQL across several tables. Additionally, I have to do some transformations to the data before import (e.g. parse a cell and insert the parts into several table values, etc.). I could write a script that does a transformation and inserts a row at a time, but it would take weeks to import the data. I know MySQL has LOAD DATA INFILE, but I am not sure how, or whether, I can do the needed transformations in SQL.
Any advice on how to proceed?
In my limited experience you won't want to use the Django ORM for something like this. It will be far too slow. I would write a Python script to operate on the CSV file, using Python's csv library, and then use the native MySQL facility LOAD DATA INFILE to load the data.
If the Python script to massage the CSV file is too slow you may consider writing that part in C or C++, assuming you can find a decent CSV library for those languages.
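As a rough illustration of that final load step (all table, column and file names here are made up, and it assumes the Python script has already written out a clean CSV), note that LOAD DATA INFILE can also apply simple per-column transformations itself via user variables and a SET clause:
-- Sketch only: measurements, its columns, and the date format are placeholders.
LOAD DATA LOCAL INFILE '/path/to/cleaned.csv'
INTO TABLE measurements
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(col_a, col_b, @raw_date)
SET recorded_at = STR_TO_DATE(@raw_date, '%Y-%m-%d');
Anything more involved than that (splitting one cell across several tables, for example) is still better done in the preprocessing script, as suggested above.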