Is anyone aware of any tutorials that demonstrate how to import data in a Ruby app with FasterCSV and save it to a SQLite or MySQL database?
Here are the specific steps involved:
Reading a file line by line (the .foreach method does this according to the documentation)
Mapping header names in file to database column names
Creating entries in database for CSV data (seems doable with .new and .save within a .foreach block)
This is a basic usage scenario but I haven't been able to find any tutorials for it, so any resources would be helpful.
Thanks!
It turns out FasterCSV is part of the Ruby core as of Ruby 1.9, so this is what I ended up doing to achieve the goals in my question above:
@importedfile = Import.find(params[:id])
filename = @importedfile.csv.path
CSV.foreach(filename, {:headers => true}) do |row|
  @post = Post.find_or_create_by_email(
    :content  => row[0],
    :name     => row[1],
    :blog_url => row[2],
    :email    => row[3]
  )
end
flash[:notice] = "New posts were successfully processed."
redirect_to posts_path
Inside the find_or_create_by_email call is the mapping from the database columns to the columns of the CSV file: row[0], row[1], row[2], row[3].
Since it is a find_or_create function I don't need to explicitly call @post.save to save the entry to the database.
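Because :headers => true is enabled, the rows can also be accessed by header name instead of position, which makes the column mapping self-documenting. A variant of the same loop, assuming the CSV's header row is content,name,blog_url,email:
CSV.foreach(filename, {:headers => true}) do |row|
  @post = Post.find_or_create_by_email(
    :content  => row["content"],
    :name     => row["name"],
    :blog_url => row["blog_url"],
    :email    => row["email"]
  )
end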
If there's a better way please update or add your own answer.
First, start with other Stack Overflow answers: Best way to read CSV in Ruby. FasterCSV?
Before jumping into writing the code, I check whether there is an existing tool to do the import. You might want to look at mysqlimport.
This is a simple example showing how to map the CSV headers to a database's columns:
require "csv"
data = <<EOT
header1, header2, header 3
1, 2, 3
2, 2, 3
3, 2, 3
EOT
header_to_table_columns = {
  'header1'  => 'col1',
  'header2'  => 'col2',
  'header 3' => 'col3'
}
arr_of_arrs = CSV.parse(data)
headers = arr_of_arrs.shift.map{ |i| i.strip }
db_cols = header_to_table_columns.values_at(*headers)
arr_of_arrs.each do |ary|
  # insert into the database using an ORM or by creating insert statements
end
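For instance, the loop body could pair the mapped column names with the stripped cell values and hand the resulting hash to an ORM. A sketch, assuming a hypothetical ActiveRecord model Post whose attributes are col1, col2, and col3:
arr_of_arrs.each do |ary|
  # pair each mapped database column with its stripped cell value
  attributes = Hash[db_cols.zip(ary.map{ |cell| cell.strip })]
  Post.create(attributes)
end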
Ruby is great for rolling your own import routines.
Reading a file (the handy block structure ensures that the file handle is closed properly):
File.open( filepath ) do |f|
  f.each_line do |line|
    # do something with the line...
  end
end
Mapping header names to columns (you might want to check for matching array lengths):
Hash[header_array.zip( line_array )]
Creating entries in the database using ActiveRecord:
SomeModel.create( Hash[header_array.zip( line_array )] )
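Putting those three pieces together, here is a minimal end-to-end sketch; the SomeModel class, the comma-separated format, and the single header line are assumptions for illustration:
File.open( filepath ) do |f|
  # the first line is the header row
  header_array = f.readline.chomp.split(',').map{ |h| h.strip }
  f.each_line do |line|
    line_array = line.chomp.split(',').map{ |v| v.strip }
    # skip rows whose length doesn't match the header
    next unless line_array.length == header_array.length
    SomeModel.create( Hash[header_array.zip( line_array )] )
  end
end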
It sounds like you are planning to let users upload CSV files and import them into the database. This is asking for trouble unless they are savvy about data. You might want to look into a NoSQL solution to simplify things on the import front.
This seems to be the shortest way, if you can use the ID to identify the records and if no mapping of column names is necessary:
CSV.foreach(filename, {:headers => true}) do |row|
  post = Post.find_or_create_by_id row["id"]
  post.update_attributes row.to_hash
end
I am creating a report using the following gems:
require "mysql2"
require "watir"
require "io/console"
require "writeexcel"
I query a database with mysql2 and convert the result into a multidimensional array like so:
Mysql2::Client.default_query_options.merge!(:as => :array)
mysql = Mysql2::Client.new(:host => "01.02.03.405", :username => "user", :password => "pass123", :database => "db")
report = mysql.query("SELECT ... ASC;")
arr = []
report.each {|row| arr << row}
Then I write the data to an Excel spreadsheet like so:
workbook = WriteExcel.new("File.xls")
worksheet = workbook.add_worksheet("Report")
header = ["Column A Title", ... , "Column N Title"]
worksheet.write_row("A1", header)
worksheet.write_col("A2", arr)
workbook.close
When I open the file in the latest edition of Excel for OS X (Office 365), I get an error on every cell that contains mostly numerals.
This report has a target audience that may become distracted by such an error.
I have attempted all of the .set_num_format format strings enumerated in the documentation for writeexcel here.
How can I create a report with columns that contain special characters and numerals, such as currency, with writeexcel?
Should I look into utilizing another gem entirely?
Define the format after you create the worksheet.
format01 = workbook.add_format
format01.set_num_format('#,##0.00')
then write the column with the format.
worksheet.write_col("A2", arr, format01)
Since I'm not a Ruby user, this is just a S.W.A.G.
Being the sports nerd that I am, I'm looking to take the daily XML files produced by the Major League Baseball website and import them into either an Access or MySQL database. The issue I'm running into is that almost every XML file they produce is just slightly different from the last. For example, one game file may have a field named batter23 next to event22, while another file calls it batter24 and puts it next to pitcher25.
I know that XML files can be inconsistent, but there has to be a way to consistently get the data into a database. Is there any way to standardize these XML files? Some code that will parse each file in a list and organize it into a specific style, giving the fields consistent names? Currently I import each XML file into an Excel sheet first, where I change the file type to CSV, but from there the field names and column locations still differ from file to file.
My goal is to have all the files in a structure where I can quickly import them into a database each day, without having to manually change column locations or field names. I'm open to any and all options, but my experience in most languages is rookie level at best, so forgive me for my lack of knowledge.
The files are pretty standard as far as XML goes; you just have to figure out what each file represents.
I did a quick look around a Red Sox v Royals game from September 14. (Go Sox!)
In year_2014/month_09/day_14/gid_2014_09_14_bosmlb_kcamlb_1/players.xml
I can see that Ortiz has an id of 120074.
If I look in batters for his player Id, I can see his stats for that game.
(year_2014/month_09/day_14/gid_2014_09_14_bosmlb_kcamlb_1/batters/120074.xml)
It goes on. Basically, in order to load these files into a database, you will have to perform some level of processing for them to make any sense.
The IDs don't appear to change between games, but I only took a cursory glance.
As for loading the data, XML::Simple in Perl can take an XML file and spit out a Perl data structure very easily. Unless you need something more heavy-duty, this should cover you.
Loading the players.xml:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use XML::Simple;
my $players_xml = XMLin('players.xml');
print Dumper $players_xml;
Gives you something like:
$VAR1 = {
  'venue' => 'Kauffman Stadium',
  'date' => 'September 14, 2014',
  'team' => {
    'Boston Red Sox' => {
      'id' => 'BOS',
      'player' => {
        '605141' => {
          'avg' => '.283',
          'team_abbrev' => 'BOS',
          'parent_team_id' => '111',
          'hr' => '4',
          'team_id' => '111',
          'status' => 'A',
          'last' => 'Betts',
          'rl' => 'R',
          'parent_team_abbrev' => 'BOS',
          'first' => 'Mookie',
          'rbi' => '12',
          'game_position' => '2B',
          'num' => '50',
          'position' => '2B',
          'current_position' => '2B',
          'boxname' => 'Betts',
          'bats' => 'R',
          'bat_order' => '1'
        },
        ...
It's then trivial to navigate these hashes and insert DB rows as you like.
I have a client who has a database of images/media that uses a file naming convention that contains a page number for each image in the filename itself.
The images are scans of books and page 1 is often simply the cover image and the actual “page 1” of the book is scanned on something like scan number 3. With that in mind the filenames would look like this in the database field filename:
great_book_001.jpg
great_book_002.jpg
great_book_003_0001.jpg
great_book_004_0002.jpg
great_book_005_0003.jpg
With that in mind, I would like to extract that page number from the filename using MySQL’s SUBSTRING_INDEX. And using pure MySQL it took me about 5 minutes to come up with this raw query which works great:
SELECT `id`, `filename`, SUBSTRING_INDEX(SUBSTRING_INDEX(`filename`, '.', 1), '_', -1) as `page`
FROM `media_files`
WHERE CHAR_LENGTH(SUBSTRING_INDEX(SUBSTRING_INDEX(`filename`, '.', 1), '_', -1)) = 4
ORDER BY `page` ASC
;
The issue is that I am trying to understand whether it's possible to implement column aliasing using SUBSTRING_INDEX while using the Sequel gem for Ruby.
So far I don’t seem to be able to do this with the initial creation of a dataset like this:
# Fetch a dataset of media files.
one_to_many :media_files, :class => MediaFiles,
  :key => :id, :order => :rank
Since the returned dataset is an array, what I am doing is using Ruby's map method to roll through the fetched dataset, doing some string processing, and then inserting a page value into each row using Ruby's merge:
# Roll through the dataset & set a page value for files that match the page pattern.
def media_files_final
  media_files.map{ |m|
    split_value = m[:filename].split(/_/, -1).last.split(/ *\. */, 2).first
    if split_value != nil && split_value.length == 4
      m.values.merge({ :page => split_value })
    else
      m.values.merge({ :page => nil })
    end
  }
end
That works fine. But this seems clumsy to me when compared to a simple MySQL query which can do it all in one fell swoop. So the question is, is there any way I can achieve the same results using the Sequel Gem for Ruby?
I gather that perhaps SUBSTRING_INDEX is not easily supported within the Sequel framework. But if not, is there any chance I can insert raw MySQL instead of using Sequel methods to achieve this goal?
If you want your association to use that additional selected column and that filter, just use the :select and :conditions options:
substring_index = Sequel.expr{SUBSTRING_INDEX(SUBSTRING_INDEX(:filename, '.', 1), '_', -1)}
one_to_many :media_files, :class => MediaFiles,
  :key => :id, :order => :page,
  :select => [:id, :filename, substring_index.as(:page)],
  :conditions => {Sequel.function(:CHAR_LENGTH, substring_index) => 4}
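If you would rather embed the raw MySQL fragment directly, Sequel can also wrap a literal SQL string with Sequel.lit; a sketch of the same association along those lines:
page_expr = Sequel.lit("SUBSTRING_INDEX(SUBSTRING_INDEX(filename, '.', 1), '_', -1)")
one_to_many :media_files, :class => MediaFiles,
  :key => :id, :order => :page,
  :select => [:id, :filename, page_expr.as(:page)],
  :conditions => Sequel.lit("CHAR_LENGTH(SUBSTRING_INDEX(SUBSTRING_INDEX(filename, '.', 1), '_', -1)) = 4")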
I am trying to receive a JSON post to my Rails 3 application. The JSON post is just an email with a subject which will be one of the following:
BACKUP_PASS/VERIFY_PASS
BACKUP_FAIL/VERIFY_FAIL
BACKUP_FAIL/VERIFY_PASS
etc..
I have the following code in my controller:
def backupnotification
  email_payload = JSON.parse(params[:payload])
  Activity.create(:action => 'failed to backup', :details => email_payload['recipient'], :user_id => '28')
end
I've also added the following to my routes file:
post '/api/activity/backupnotification' => 'activities#backupnotification'
Obviously, this would create a new Activity record regardless of the backup status. What I would like to do is create an activity with an action of failed to backup if the term FAIL appears anywhere in the subject, and successfully backed up if the term FAIL does not exist.
The JSON post (email_payload) includes an attribute called subject. I was wondering if I could do something like this:
if email_payload['subject'] => "FAIL"
...
else
...
end
What would be the best way of doing this?
Assuming you can access your subject in a similar way as your recipient, you can try something like this.
def backupnotification
  email_payload = JSON.parse(params[:payload])
  if email_payload['subject'].include?('FAIL')
    action_message = 'failed to backup'
  else
    action_message = 'successfully backed up'
  end
  Activity.create(
    :action  => action_message,
    :details => email_payload['recipient'],
    :user_id => '28')
end
I have a Jruby on Rails application with Neo4j.rb and a model, let's say Auth, defined like this:
class Auth < Neo4j::Rails::Model
  property :uid, :type => String, :index => :exact
  property :provider, :type => String, :index => :exact
  property :email, :type => String, :index => :exact
end
And this code:
a = Auth.find :uid => 324, :provider => 'twitter'
# a now represents a node
a.to_json
# outputs: {"auth":{"uid": "324", "provider": "twitter", "email": "email@example.com"}}
Notice that the ID of the node is missing from the JSON representation. I have a RESTful API within my application and I need the id to perform DELETE and UPDATE actions.
I tried this to see if it works:
a.to_json :only => [:id]
But it returns an empty JSON {}.
Is there any way I can get the ID of the node in the JSON representation without rewriting the whole to_json method?
Update: The same problem also applies to the to_xml method.
Thank you!
I am answering my own question. I still think that there is a better way to do this, but, for now, I am using the following hack:
In /config/initializers/neo4j_json_hack.rb I put the following code:
class Neo4j::Rails::Model
  def as_json(options={})
    repr = super options
    repr.merge!('_nodeId' => self.id) if self.persisted?
    repr
  end
end
And now every JSON representations of my persisted Neo4j::Rails::Model objects have a _nodeId parameter.
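With the initializer in place, the earlier example produces output along these lines (the node id 42 here is made up for illustration):
a = Auth.find :uid => 324, :provider => 'twitter'
a.to_json
# outputs: {"auth":{"uid": "324", "provider": "twitter", "email": "email@example.com", "_nodeId": 42}}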
The ID is typically not included because it shouldn't be exposed outside the Neo4j database. Neo4j doesn't guarantee that the ID will be identical from instance to instance, and it wouldn't surprise me if the ID changed in a distributed, enterprise installation of Neo4j.
You should create your own ID (GUID?), save it as a property on the node, index it, and use that to reference your nodes. Don't expose the Neo4j ID to your users or application, and definitely don't rely on it beyond a single request (e.g. don't save a reference to it in another database or use it to test for equality).
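For example, the Auth model from the question could assign its own identifier in a creation callback. A sketch, assuming SecureRandom from the Ruby standard library and the ActiveModel-style callbacks that Neo4j::Rails::Model supports:
require 'securerandom'

class Auth < Neo4j::Rails::Model
  property :uid, :type => String, :index => :exact
  property :provider, :type => String, :index => :exact
  property :email, :type => String, :index => :exact
  property :guid, :type => String, :index => :exact

  before_create :assign_guid

  private

  # a stable, public-safe identifier instead of the internal node id
  def assign_guid
    self.guid ||= SecureRandom.uuid
  end
end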