Importing a CSV into multiple MySQL databases from a Rails app

I have a CSV file consisting of 78,000 records. I'm using smarter_csv (https://github.com/tilo/smarter_csv) to parse the CSV file, and I want to import the data into a MySQL database from my Rails app. I have the following two questions:
What would be the best approach to quickly importing such a large dataset into MySQL from my Rails app? Would using Resque or Sidekiq to create multiple workers be a good idea?
I need to insert this data into a given table which is present in multiple databases. In Rails, a model talks to only one database, so how can I scale the solution to talk to multiple MySQL databases from my model?
Thank You

One way would be to use the native interface of the database application itself for importing and exporting; it would be optimised for that specific purpose.
For MySQL, the mysqlimport utility provides that interface. Note that the import can also be done as an SQL statement, and that this executable provides a much saner interface to the underlying SQL command.
As far as implementation goes, if this is a frequent import exercise, a Sidekiq/Resque/cron job is the best possible approach.
[EDIT]
The SQL command referred to above is LOAD DATA INFILE, as the other answer points out.
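As a rough sketch of the background-worker idea mentioned above (not part of the original answer): smarter_csv can read the file in chunks and hand each chunk to a Sidekiq job that performs one multi-row insert. The model and job names below are hypothetical, and Rails 6+ insert_all is assumed (on older Rails, the activerecord-import gem's Model.import plays the same role).

    require "smarter_csv"
    require "sidekiq"

    class CsvChunkImportJob
      include Sidekiq::Worker

      def perform(rows)
        # rows arrives as an array of hashes (keys become strings after the
        # JSON round-trip through Redis); one INSERT covers the whole chunk.
        ImportedRecord.insert_all(rows)   # ImportedRecord is a placeholder model
      end
    end

    # Enqueue one job per 1,000-row chunk of the 78,000-row file.
    SmarterCSV.process("records.csv", chunk_size: 1_000) do |chunk|
      CsvChunkImportJob.perform_async(chunk)
    end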

Performance-wise, probably the best method is to use MySQL's LOAD DATA INFILE syntax and execute an import command on each database. This requires the data file to be local to each database instance (or the LOCAL variant, which sends the file from the client).
As the other answer suggests, mysqlimport can be used to ease the import, since the LOAD DATA INFILE statement syntax is highly customisable and can deal with many data formats.
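To cover the multiple-database part of the question, here is a minimal sketch, assuming each target database has a matching records table, shares the same credentials, and has local_infile enabled on the server. The database names, table name, and file path are placeholders.

    require "active_record"
    require "mysql2"

    DATABASES = %w[reports_db_1 reports_db_2 reports_db_3]  # hypothetical names

    DATABASES.each do |db|
      ActiveRecord::Base.establish_connection(
        adapter:      "mysql2",
        host:         "localhost",
        username:     "app",
        password:     ENV["DB_PASSWORD"],
        database:     db,
        local_infile: true   # client-side switch needed for LOAD DATA LOCAL INFILE
      )

      # One bulk load per database; IGNORE 1 LINES skips the CSV header row.
      ActiveRecord::Base.connection.execute(<<~SQL)
        LOAD DATA LOCAL INFILE '/data/records.csv'
        INTO TABLE records
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
        LINES TERMINATED BY '\\n'
        IGNORE 1 LINES
      SQL
    end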

Related

NiFi for database migration

Why would NiFi be a good fit for database migration if all it does is send the same data over and over again? (When I tried to extract data from a database and put it into a JSON file, I saw multiple entries of the same tuple.) Wouldn't that be a waste of computing resources?
If I just want to migrate the database once and sometimes update only the changed columns, is NiFi still a good tool to use?
It all depends on which database you want to migrate from and to, and in which environments. Is it a large enterprise Oracle DB you want to migrate into Hadoop? Look into Sqoop (https://sqoop.apache.org/); I would recommend Sqoop for one-time imports of large databases into Hadoop.
You can use NiFi to do an import as well, using processors such as ExecuteSQL, QueryDatabaseTable, and GenerateTableFetch. They all work over JDBC, so if your database has a JDBC driver you could opt for this as well.
If you want to pick up incremental changes, you could use the QueryDatabaseTable processor with its Maximum-Value Columns property; Matt Burgess has an article explaining how to put this in place at https://community.hortonworks.com/articles/51902/incremental-fetch-in-nifi-with-querydatabasetable.html.

How to import from an SQL dump to MongoDB?

I am trying to import data from a MySQL dump (.sql file) into MongoDB, but I could not find any mechanism for RDBMS-to-NoSQL data migration.
I have tried converting the data to JSON and CSV, but it is not giving me the desired output in MongoDB.
I thought of trying Apache Sqoop, but it is mostly for moving data between SQL or NoSQL databases and Hadoop.
I cannot see how it is possible to migrate data from MySQL to MongoDB.
Is there any approach apart from what I have tried so far?
Hoping to hear a better and faster solution for this type of migration.
I suggest you dump the MySQL data to a CSV file. You can try other file formats too, but make sure the format is one you can import into MongoDB easily; both MongoDB and MySQL support CSV very well.
You can use mysqldump or SELECT ... INTO OUTFILE to dump the MySQL data. mysqldump may take a long time, so have a look at "How can I optimize a mysqldump of a large database?".
Then use the mongoimport tool to import the data.
As far as I know, there are three ways to optimize the import (a sketch of the overall flow follows this list):
mongoimport --numInsertionWorkers N starts several insertion workers; N can be set to the number of CPU cores.
mongod --nojournal: most of the continuous disk usage comes from the journal, so temporarily disabling journaling during the import can help.
Split up your file and start parallel import jobs.
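As a hedged illustration of that flow: the database, collection, column list, and file path below are placeholders, and the CSV is assumed to have already been produced with SELECT ... INTO OUTFILE or mysqldump. A small Ruby wrapper around mongoimport might look like this:

    require "etc"

    # The column list is supplied explicitly because SELECT ... INTO OUTFILE
    # does not write a header row. Names and paths are assumptions for the
    # sake of the example.
    system(
      "mongoimport",
      "--db", "appdb",
      "--collection", "users",
      "--type", "csv",
      "--fields", "id,name,email",
      "--file", "/tmp/users.csv",
      "--numInsertionWorkers", Etc.nprocessors.to_s   # one worker per CPU core
    ) or abort("mongoimport failed")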
Actually, in my opinion, importing and exporting the data is not the hard part. Since your dataset is large, if you don't design your document structure first your code will still be slow against it; doing an automatic migration from a relational database to MongoDB is not recommended, because the resulting database performance might not be good.
So it's worth designing your data structure first; you can check out Data models.
Hope this helps.
You can use Mongify, which helps you move/migrate data from SQL-based systems to MongoDB. It supports MySQL, PostgreSQL, SQLite, Oracle, SQL Server, and DB2.
It requires Ruby and RubyGems as prerequisites. Refer to this documentation to install and configure Mongify.

Is there any tool to migrate data from MySQL (or MongoDB) to Aerospike?

I would like to migrate data from MySQL (or MongoDB) to Aerospike. Does anyone know if there is a tool to do that?
Aerospike provides something like a CSV loader:
https://github.com/aerospike/aerospike-loader
So you can dump the data with mysqldump, process the dumped file to create a CSV in the format accepted by aerospike-loader, and then load the data into Aerospike.
There is a simple Python script I have developed; I have been using it for my projects since we are moving all our databases from MySQL to MongoDB. It migrates around 100,000 (1 lakh) rows per minute: Migrate MySQL to MongoDB

Grails with CSV (No DB)

I have been building a Grails application for quite a while with dummy data using a MySQL server; this was eventually supposed to be connected to Greenplum DB (a PostgreSQL cluster).
But this is not feasible anymore due to firewall issues.
We are contemplating connecting Grails to a CSV file on a shared drive (which is constantly updated by the Greenplum DB; data is appended hourly only).
These CSV files are fairly large (3 MB, 30 MB and 60 MB); the last file has 550,000+ rows.
Quick questions:
Is this even feasible? Can a CSV file be treated as a database, and can Grails access the CSV file directly and run queries on it, similar to a DB?
Assuming this is feasible, how much rework would be required in the Grails code, in the DataSource, controllers and index views? (Currently we are connected to MySQL and filter data in the controllers and index views using SQL queries and AJAX calls via remoteFunction.)
Will the constant reading (CSV -> Grails) and writing (Greenplum -> CSV) corrupt the CSV file or cause any other problems?
I know this is not a very robust method, but I really need to understand the feasibility of this idea. Can Grails function without any DB, with merely a CSV file on a shared drive accessible to multiple users?
The short answer is, No. This won't be a good solution.
No.
It would be nearly impossible, if possible at all, to rework this.
Concurrent access to a file like that in any environment is a recipe for disaster.
Grails is not suitable for a solution like this.
update:
Have you considered using the built-in H2 database, which can be packaged with the Grails application itself? This way you can distribute the database engine along with your Grails application within the WAR. You could even have it populate its database from the CSV you mention the first time it runs, or periodically, depending on your requirements.

How to import a .dmp file (Oracle) into a MySQL DB?

The .dmp file is a dump of a table built in Oracle 10g (Express Edition), and one of the fields is of CLOB type.
I tried to simply export the table to XML/CSV files and then import them into MySQL, but the export simply ignored the CLOB field (I was using SQL Developer for that).
I noticed this post explaining how to extract a CLOB to a text file, but it seems to miss the handling of the other fields, or at least the primary key fields. Can it be adapted to create a CSV of the complete table? (I am not familiar with PL/SQL at all.)
As a brute-force approach, I can use my Python interface to simply query for all the records and spool them to a flat file, but I'm afraid it will take a LONG time (query all the records, replace all native commas with the ASCII ...).
Thanks guys!
If you can get the MySQL server and the Oracle server on the same network, you might want to look at the MySQL administrator tools, which include the Migration Toolkit. You can connect to the Oracle server with the Migration Toolkit and it will automatically create the tables and move the data for you.
Here is documentation explaining the migration process: http://www.mysql.com/why-mysql/white-papers/mysql_wp_oracle2mysql.php
You can also use Data Wizard for MySQL. The trial version is fully usable for 30 days.
After about two hours of installing and uninstalling MySQL on the same machine (my laptop) in order to use the Migration Toolkit as suggested by longneck, I decided to simply implement the dump myself. Here it is, for those like me who have minimal admin experience and a hard time making both DBs work together (errors 1130, 1045 and more).
Surprisingly, it is not as slow as I expected: OraDump
Any comments and improvements are welcome.