I have a CSV file that I want to read with Ruby and create Ruby objects to insert into a MySQL database with Active Record. What's the best way to do this? I see two clear options: FasterCSV & the Ruby core CSV. Which is better? Is there a better option that I'm missing?
EDIT: Gareth says to use FasterCSV, so what's the best way to read a CSV file using FasterCSV? Looking at the documentation, I see methods called parse, foreach, read, open... It says that foreach "is intended as the primary interface for reading CSV files." So, I guess I should use that one?
Ruby 1.9 adopted FasterCSV as its core CSV processor, so I would say it's definitely better to go for FasterCSV, even if you're still using Ruby 1.8.
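For example, reading a file row by row with foreach might look something like this (just a sketch; it assumes a people.csv file with name and age headers and an ActiveRecord Person model, and on Ruby 1.9+ you would require 'csv' and call CSV.foreach instead):

require 'rubygems'
require 'fastercsv'

FasterCSV.foreach("people.csv", :headers => true) do |row|
  # with :headers => true, each row acts like a hash keyed by the header names
  Person.create!(:name => row["name"], :age => row["age"].to_i)
end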
If you have a lot of records to import, you might want to use MySQL's loader; it's going to be extremely fast.
LOAD DATA INFILE can be used to read files obtained from external sources. For example, many programs can export data in comma-separated values (CSV) format, such that lines have fields separated by commas and enclosed within double quotation marks, with an initial line of column names. If the lines in such a file are terminated by carriage return/newline pairs, the statement shown here illustrates the field- and line-handling options you would use to load the file:
LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
If the input values are not necessarily enclosed within quotation marks, use OPTIONALLY before the ENCLOSED BY keywords.
Use that to pull everything into a temporary table, then use ActiveRecord to run queries against it to delete records that you don't want, then copy from the temp table to your production one, then drop or truncate the temp. Or, use ActiveRecord to search the temporary table and copy the records to production, then drop or truncate the temp. You might even be able to do a table-to-table copy inside MySQL or append one table to another.
It's going to be tough to beat the speed of the dedicated loader combined with the database's query mechanism for processing records in bulk. Turning each record in the CSV file into an object and then using the ORM to write it to the database adds a lot of extra overhead, so unless you have some really difficult validations requiring Ruby logic, you'll be faster going straight to the database.
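For instance, driving both steps from Ruby through ActiveRecord's raw connection might look roughly like this (only a sketch; the csv_import staging table, the /tmp/data.csv path and the people/name/email names are made up for illustration):

conn = ActiveRecord::Base.connection

# bulk-load the raw CSV into a staging table
conn.execute(<<-'SQL')
  LOAD DATA INFILE '/tmp/data.csv' INTO TABLE csv_import
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n'
  IGNORE 1 LINES
SQL

# clean up in bulk, copy into the real table, then clear the staging table
conn.execute("DELETE FROM csv_import WHERE name IS NULL OR name = ''")
conn.execute("INSERT INTO people (name, email) SELECT name, email FROM csv_import")
conn.execute("TRUNCATE TABLE csv_import")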
EDIT: Here's a simple CSV header to DB column mapper example:
require "csv"
data = <<EOT
header1, header2, header 3
1, 2, 3
2, 2, 3
3, 2, 3
EOT
header_to_table_columns = {
'header1' => 'col1',
'header2' => 'col2',
'header 3' => 'col3'
}
arr_of_arrs = CSV.parse(data)
headers = arr_of_arrs.shift.map{ |i| i.strip }
db_cols = header_to_table_columns.values_at(*headers)
arr_of_arrs.each do |ary|
# insert into the database using an ORM or by creating insert statements
end
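For instance, the body of that loop might pair each row up with the mapped column names and hand it to the ORM (Widget here is just a placeholder model):

arr_of_arrs.each do |ary|
  attrs = Hash[db_cols.zip(ary.map { |v| v.strip })]  # e.g. {"col1"=>"1", "col2"=>"2", "col3"=>"3"}
  Widget.create!(attrs)
end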
I have many csv files with names 0_0.csv, 0_1.csv, 0_2.csv, ..., 1_0.csv, 1_1.csv, ..., z_17.csv.
I wanted to know how I can import them in a loop or something?
Also, am I doing this right? (Each file is 50 MB and the total size of all the files is about 100 GB.)
This is my code:
create index on :name(v)
create index on :value(v)
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///0_0.txt" AS csv
FIELDTERMINATOR ','
MERGE (n:name {v:csv.name})
MERGE (m:value {v:csv.value})
CREATE (n)-[:kind {v:csv.kind}]->(m)
You could handle multiple files by constructing a file name. Unfortunately this seems to break when using the USING PERIODIC COMMIT query hint so it won't be a good option for you. You could create a script to wrap it up and send the commands to bin/cypher-shell though.
UNWIND ['0','1','z'] as outer
UNWIND range(0,17) as inner
LOAD CSV WITH HEADERS FROM 'file:///'+ outer +'_' + toString(inner) + '.csv' AS csv
FIELDTERMINATOR ','
MERGE (n:name {v:csv.name})
MERGE (m:value {v:csv.value})
CREATE (n)-[:kind {v:csv.kind}]->(m)
As far as your actual load query goes: do your name and value nodes come up multiple times in the files? If they are unique, you would be better off loading the data in multiple passes. Load the nodes first without the indexes; then add the indexes once the nodes are loaded; and then do the relationships as the last step.
Using CREATE for the :kind relationship will result in multiple relationships even if it is the same value for csv.kind. You might want to use MERGE instead if that is the case.
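For instance, for one of the files the multi-pass approach might look roughly like this (only a sketch, assuming each name/value pair appears once so CREATE is safe for the node passes):

// pass 1: nodes only, no indexes yet
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///0_0.csv" AS csv
CREATE (:name {v: csv.name})
CREATE (:value {v: csv.value});

// pass 2: add the indexes once the nodes are loaded
CREATE INDEX ON :name(v);
CREATE INDEX ON :value(v);

// pass 3: relationships last, MERGE so repeated lines don't create duplicates
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///0_0.csv" AS csv
MATCH (n:name {v: csv.name})
MATCH (m:value {v: csv.value})
MERGE (n)-[:kind {v: csv.kind}]->(m);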
For 100 GB of data, though, if you are starting with an empty database and are looking for speed, I would take a look at using bin/neo4j-admin import.
I have an assignment to write queries in Neo4J, but the database provided is SAKILA.SQL.
How can I load it into Neo4j?
I've tried to find an answer for this, but had no luck!
Perhaps you can share your sql?
Easiest would be to insert it into a relational database, dump the table contents as CSV and import the data into Neo4j using LOAD CSV. See: http://neo4j.com/developer/guide-importing-data-and-etl/
See: http://neo4j.com/docs/stable/query-load-csv.html
For details on Cypher see: http://neo4j.com/developer/cypher/
So you need to import (i.e. run all those insert statements) into MySQL first and then export into CSV files that Neo4j can use.
In the example Michael posted, we used PostgreSQL's 'COPY' command to export CSV files. In MySQL you have a slightly different command, as described over here.
You'd have something like:
SELECT * from customer
INTO OUTFILE '/tmp/customers.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
And then in Neo4j you'd have a query like this:
LOAD CSV WITH HEADERS FROM 'file:/tmp/customers.csv' AS line
MERGE (c:Customer {id: line.id})
ON CREATE SET c.name = line.name
And so on.
You can then do a similar thing to extract your other tables and use the MERGE command to create appropriate relationships between the different nodes.
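For example, once Customer and Address nodes have both been loaded, a relationship pass over the customer export might look something like this (just a sketch; the LIVES_AT relationship name and the address_id column are assumptions based on Sakila's schema):

LOAD CSV WITH HEADERS FROM 'file:/tmp/customers.csv' AS line
MATCH (c:Customer {id: line.id})
MATCH (a:Address {id: line.address_id})
MERGE (c)-[:LIVES_AT]->(a)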
If you share the full MySQL import script, we can show you how to do a more complete translation.
How can I load 10,000 rows from a test.xls file into a MySQL db table?
When I use the below query, it shows an error.
LOAD DATA INFILE 'd:/test.xls' INTO TABLE karmaasolutions.tbl_candidatedetail (candidate_firstname,candidate_lastname);
My primary key is candidateid and it has the properties below.
The test.xls contains data like the sample below.
I have added rows starting from candidateid 61 because up to 60 there are already candidates in the table.
Please suggest a solution.
Export your Excel spreadsheet to CSV format.
Import the CSV file into mysql using a similar command to the one you are currently trying:
LOAD DATA INFILE 'd:/test.csv'
INTO TABLE karmaasolutions.tbl_candidatedetail
(candidate_firstname,candidate_lastname);
Importing data from Excel (or any other program that can produce a text file) is very simple using the LOAD DATA command from the MySQL command prompt.
1. Save your Excel data as a csv file (in Excel 2007, using Save As).
2. Check the saved file using a text editor such as Notepad to see what it actually looks like, i.e. what delimiter was used etc.
3. Start the MySQL command prompt (I'm lazy so I usually do this from the MySQL Query Browser - Tools - MySQL Command Line Client, to avoid having to enter username and password etc.).
4. Enter this command:
LOAD DATA LOCAL INFILE 'C:\temp\yourfile.csv'
INTO TABLE database.table
FIELDS TERMINATED BY ';'
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(field1, field2);
[Edit: Make sure to check your single quotes (') and double quotes (") if you copy and paste this code - it seems WordPress changes them into some similar but different characters.]
5. Done! Very quick and simple once you know it :)
Some notes from my own import - they may not apply to you if you run a different language version, MySQL version, Excel version etc.
TERMINATED BY - this is why I included step 2. I thought a csv would default to comma separated, but at least in my case semicolon was the default.
ENCLOSED BY - my data was not enclosed by anything, so I left this as an empty string ''.
LINES TERMINATED BY - at first I tried with only '\n' but had to add the '\r' to get rid of a carriage return character being imported into the database.
Also make sure that if you do not import into the primary key field/column, it has auto-increment on, otherwise only the first row will be imported.
Original Author reference
I have a mysql database with a single table, that includes an autoincrement ID, a string and two numbers. I want to populate this database with many strings, coming from a text file, with all numbers initially reset to 0.
Is there a way to do it quickly? I thought of creating a script that generates many INSERT statements, but that seems somewhat primitive and slow, especially since MySQL is on a remote site.
Yes - use LOAD DATA INFILE. The docs are here. Example:
LOAD DATA INFILE 'csvfile'
INTO TABLE table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 0 LINES
(cola, colb, colc)
SET cold = 0,
cole = 0;
Notice the SET line - this is where you set a default value.
Depending on your field separator, change the line FIELDS TERMINATED BY ','.
The other answers only respond to half of your question. For the other half (zeroing numeric columns):
Either:
Set the default value of your number columns to 0,
In your text file, simply delete the numeric values,
This will cause the field to be read by LOAD DATA INFILE as null, and the default value (which you have set to 0) will be assigned.
Or:
Once you have your data in the table, issue a MySQL command to zero the fields, like
UPDATE table SET first_numeric_column_name = 0, second_numeric_column_name = 0 WHERE 1;
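For the first option, setting the defaults could look something like this (the table and column names are just placeholders):

ALTER TABLE your_table
  ALTER COLUMN first_numeric_column_name SET DEFAULT 0,
  ALTER COLUMN second_numeric_column_name SET DEFAULT 0;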
And to sum everything up, use LOAD DATA INFILE.
If you have access to the server's file system, you can utilize LOAD DATA.
If you don't want to fight with the syntax, the easiest way (if on Windows) is to use HeidiSQL.
It has a friendly wizard for this purpose.
Maybe I can help you with the right syntax, if you post a sample line from the text file.
I recommend using SB Data Generator by Softbuilder (which I work for); download and install the free trial.
First, create a connection to your MySQL database then go to “Tools -> Virtual Data” and import your test data (the file must be in CSV format).
After importing the test data, you will be able to preview them and query them in the same window without inserting them into the database.
Now, if you want to insert test data into your database, go to “Tools -> Data Generation” and select "generate data from virtual data".
SB data generator from Softbuilder
I am using the following statement to load data from a file into a table:
LOAD DATA LOCAL INFILE '/home/100000462733296__Stats'
INTO TABLE datapoints
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(uid1, uid2, status);
Now, if I want to enter a custom value into uid1, say 328383 without actually asking it to read it from a file, how would I do that? There are about 10 files and uid1 is the identifier for each of these files. I am looking for something like this:
LOAD DATA LOCAL INFILE '/home/100000462733296__Stats'
INTO TABLE datapoints
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(uid1="328383", uid2, status);
Any suggestions?
The SET clause can be used to supply values not derived from the input file:
LOAD DATA LOCAL INFILE '/home/100000462733296__Stats'
INTO TABLE datapoints
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(uid1, uid2, status)
SET uid1 = '328383';
It's not clear what the data type of uid1 is, but being that you enclosed the value in double quotes I assumed it's a string related data type - remove the single quotes if the data type is numeric.
There's more to read on what the SET functionality supports in the LOAD DATA documentation - it's a little more than halfway down the page.
You could use a Python interactive shell instead of the MySQL shell to interactively provide values for MySQL tables.
Install the Python interpreter from python.org (only needed if you are under Windows, otherwise you have it already), and the MySQL connector from http://sourceforge.net/projects/mysql-python/files/ (ah, I see you are on Linux/Unix - just install the mysqldb package then).
After that, you type these three lines in the Python shell:
import MySQLdb
# fill in your connection details - the fourth argument is the database name (use port=... for a non-default port)
connection = MySQLdb.connect("<hostname>", "<user>", "<pwd>", "<dbname>")
cursor = connection.cursor()
After that you can use the cursor.execute method to issue SQL statements, while retaining the full flexibility of Python to change your data.
For example, for this specific query:
myfile = open("/home/100000462733296__Stats")
for line in myfile:
    uid1, uid2, status = line.split("|")
    status = status.strip()
    # let the driver quote and escape the values rather than interpolating them into the SQL string
    cursor.execute("INSERT INTO datapoints SET uid1 = %s, uid2 = %s, status = %s",
                   ("328383", uid2, status))
voilà!
(maybe with a try: clause around the line.split line to avoid an exception on the last line)
If you don't know it already, you can learn Python in under one hour with the tutorial at python.org - it is really worth it, even if the only thing you do at computers is import data into databases.
Two quick thoughts (one might be applicable :)):
Change the value of uid1 in the file to 328383 on every line.
Temporarily change the uid1 column in the table to be non-mandatory, load the contents of the file, then run a query that sets the value to 328383 in every row. Finally, reset the column to mandatory.
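For the second approach, the follow-up query could be as simple as this (assuming uid1 comes through as NULL from the load, so only the freshly loaded rows are touched):

UPDATE datapoints SET uid1 = '328383' WHERE uid1 IS NULL;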