CSV to MySQL conversion and import - mysql

Im working on a large project, havent had to do what I need help with before. I have a csv file, it contains a large amount of data, namely all of the cities, towns, suburbs in Australia. I need to convert the csv file to sql for mysql, and then import it into the database.
What would be the best way to achieve this?

Use LOAD DATA INFILE or the equivalent command-line tool mysqlimport.
These are easy to use for loading CSV data, and this method runs 20x faster than importing rows one at a time with SQL.

Related

searching for an option to compare data of an sql file with a txt file in csv form

I got a Problem, cause I'm totally new to sql and have to kinda learn it in an internship. So I had to import huge txt files into a database in phpmyadmin (took me for ever but managed it with load data infile). Now my task is to find a way to control if the data of the tables is the same as the data of the given txt files.
Is there any possibility to do so ?
Have you tried exporting the data through phpMyAdmin using a different file format instead of .sql? phpMyAdmin gives you several choices including CSV, OpenOffice spreadsheets. That would make your compare easier. You could use Excel, sort the data and you'd have a quicker compare.
the best way to do so is to load, and then extract.
Compare your extract with the original file.
Another way could be to count the number of line in both table and file. And extract few lines, and verify that they both exists. This is less precise.
But this has nothing to do with SQL. It is just a test logic.

Import Only Matched Data From CSV to MySQL

i need to import some data from very huge csv file which is about 1GB.
instead of importing all, i want to just import matched data, i think it will be more easy and faster than importing all data.
i need to search "Post Code District" column of CSV file, if it contains LS1 or LS2 or LS10, import matched data into tabel in SQL?
Misconception. You think that filtering a text file against a database table is going to be faster than just loading the entire file into the database.
I support there are extreme cases where this might be true. But, in general, the safest way to handle these types of situation is:
Import the file into a staging table.
Add indexes, as necessary to the staging table for performance.
Run a query to copy the data you want from the staging table.
I could phrase this a different way. In the time it would take you to figure out how to efficient combine information from the file and a database table, you could probably go through the above process 10-50 times.

Big data migration from Oracle to MySQL

I received over 100GB of data with 67million records from one of the retailers. My objective is to do some market-basket analysis and CLV. This data is a direct sql dump from one of the tables with 70 columns. I'm trying to find a way to extract information from this data as managing itself in a small laptop/desktop setup is becoming time consuming. I considered the following options
Parse the data and convert the same to CSV format. File size might come down to around 35-40GB as more than half of the information in each records is column names. However, I may still have to use a db as I cant use R or Excel with 66 million records.
Migrate the data to mysql db. Unfortunately I don't have the schema for the table and I'm trying to recreate the schema looking at the data. I may have to replace to_date() in the data dump to str_to_date() to match with MySQL format.
Are there any better way to handle this? All that I need to do is extract the data from the sql dump by running some queries. Hadoop etc. are options, but I dont have the infrastructure to setup a cluster. I'm considering mysql as I have storage space and some memory to spare.
Suppose I go in the MySQL path, how would I import the data? I'm considering one of the following
Use sed and replace to_date() with appropriate str_to_date() inline. Note that, I need to do this for a 100GB file. Then import the data using mysql CLI.
Write python/perl script that will read the file, convert the data and write to mysql directly.
What would be faster? Thank you for your help.
In my opinion writing a script will be faster, because you are going to skip the SED part.
I think that you need to setup a server on a separate PC, and run the script from your laptop.
Also use tail to faster get a part from the bottom of this large file, in order to test your script on that part before you run it on this 100GB file.
I decided to go with the MySQL path. I created the schema looking at the data (had to increase a few of the column size as there were unexpected variations in the data) and wrote a python script using MySQLdb module. Import completed in 4hr 40mins on my 2011 MacBook Pro with 8154 failures out of 67 million records. Those failures were mostly data issues. Both client and server are running on my MBP.
#kpopovbg, yes, writing script was faster. Thank you.

How to import large mysql dumps into hadoop?

I need to import wikipedia dumps(mysql tables, unpacked files take about 50gb) into Hadoop(hbase). Now first I load dump into mysql and then transfer data from mysql to hadoop. But loading data into mysql takes huge amount of time - about 4-7 days. Is it possible to load mysql dump directly to hadoop(by means of some dump file parser or something similar)?
As far as I remember - MySQL Dumps are almost entirely is set of insert statements. You can parse them in your mapper and process as is... If you have only few tables hard code parsing in java should be trivial.
use sqoop . A tool that import mysql data into HDFS with map reduce jobs.
It is handy.

Import multiple (100+) Excel files into MySQL Table

I have 100+ XLSX files that I need to get into a MySQL Database. Each file is a bit different, so I've created a massive table that contains a field for each possible column header across all files. This way they can auto-map on import.
Using Navicat I can import the files one at a time, but I'm wondering if there is a way to import all of the files at one time?
I'm thinking 'no' but if your going to do this a lot I think there are ways to automate this.
Export your xls-files into csv-files instead. Write some script to transform them between csv-excel style to csv-mysql style.
Create tables for importing with the csv engine. Put your files inplace of the ones created by mysql and flush tables. Now your data is ready to be read inside mysql and copied over to more powerful tables engines.
Another way is to do a VBA-script that exports the data in a format recognized by load data infile and then load them using mysql.