Inserting data from CSV into MySQL DB is very slow - mysql

I'm trying to insert data from a CSV file into a MySQL DB using Ruby, and it's very slow. Note that this is not a Rails application, just a stand-alone Ruby script.
Here is my code:
def add_record(data1, data2, time)
  date = DateTime.strptime(time, "%m/%d/%y %H:%M")
  <my table>.create(data1: data1, data2: data2, time: date)
end

def parse_file(file)
  path = @folder + "\\" + file
  CSV.foreach(path, headers: :first_row) do |line|
    add_record(line[4], line[5], line[0])
  end
end

def analyze_data
  Dir.foreach(@folder) do |file|
    next if file == '.' or file == '..'
    parse_file file
  end
end
And my connection:
@connection = ActiveRecord::Base.establish_connection(
  :adapter  => "mysql2",
  :host     => "localhost",
  :database => <db>,
  :username => "root",
  :password => <pw>
)
Any help appreciated.

Use LOAD DATA INFILE.
Here is a nice article on performance and strategies titled Testing the Fastest Way to Import a Table into MySQL. Don't let the MySQL version mentioned in the title or in the article scare you away. Jumping to the bottom and picking up some conclusions:
The fastest way you can import a table into MySQL without using raw files is the LOAD DATA syntax. Use parallelization for InnoDB for better results, and remember to tune basic parameters like your transaction log size and buffer pool. Careful programming and importing can make a >2-hour problem become a 2-minute process. You can temporarily disable some security features for extra performance.
You might just find your times greatly reduced.
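For a stand-alone script like the one in the question, a minimal sketch of that approach with the mysql2 gem might look like the following (the table name, column mapping, and file path are placeholders, and local_infile must be enabled on both the server and the client for LOCAL to work):

require 'mysql2'

client = Mysql2::Client.new(
  :host         => "localhost",
  :username     => "root",
  :password     => "<pw>",
  :database     => "<db>",
  :local_infile => true          # needed for LOAD DATA LOCAL INFILE
)

# One statement loads the whole file; MySQL parses the CSV itself and
# converts the timestamp column on the way in (%i is minutes in MySQL).
client.query(<<-SQL)
  LOAD DATA LOCAL INFILE '/path/to/file.csv'
  INTO TABLE my_table
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  IGNORE 1 LINES
  (@raw_time, @skip1, @skip2, @skip3, data1, data2)
  SET time = STR_TO_DATE(@raw_time, '%m/%d/%y %H:%i')
SQL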

Use the zdennis/activerecord-import gem. You can insert tons of records quickly.
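A rough sketch of how that could replace the per-row create from the question (the model name here is a placeholder; validate: false additionally skips validations for speed):

require 'activerecord-import'

records = []
CSV.foreach(path, headers: :first_row) do |line|
  records << MyTable.new(
    data1: line[4],
    data2: line[5],
    time:  DateTime.strptime(line[0], "%m/%d/%y %H:%M")
  )
end

# One multi-row INSERT instead of one INSERT (and one transaction) per record.
MyTable.import(records, validate: false)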

Related

Values are not inserted into MySQL table using pool.apply_async in python2.7

I am trying to run the following code to populate a table in parallel for a certain application. First, the following function is defined, which is supposed to connect to my DB and execute the SQL command with the given values (to insert into the table).
def dbWriter(sql, rows):
    # load cnf file
    MYSQL_CNF = os.path.abspath('.') + '/mysql.cnf'
    conn = MySQLdb.connect(db='dedupe',
                           charset='utf8',
                           read_default_file=MYSQL_CNF)
    cursor = conn.cursor()
    cursor.executemany(sql, rows)
    conn.commit()
    cursor.close()
    conn.close()
And then there is this piece:
pool = dedupe.backport.Pool(processes=2)
done = False
while not done:
    chunks = (list(itertools.islice(b_data, step)) for step in
              [step_size] * 100)
    results = []
    for chunk in chunks:
        print len(chunk)
        results.append(pool.apply_async(dbWriter,
                                        ("INSERT INTO blocking_map VALUES (%s, %s)",
                                         chunk)))
    for r in results:
        r.wait()
    if len(chunk) < step_size:
        done = True
pool.close()
Everything works and there are no errors. But at the end, my table is empty, meaning the insertions were somehow not successful. I have tried many things to fix this (including adding column names to the insert) after many Google searches and have not been successful. Any suggestions would be appreciated. (I am running the code in Python 2.7 on gcloud (Ubuntu); note that the indentation may be a bit messed up after pasting here.)
Please also note that "chunk" follows exactly the required data format.
Note: this is part of this example.
Please note that the only thing I am changing in the linked example is that I am separating the steps for creating the tables and inserting into them, since I am running my code on the gcloud platform and it enforces GTID standards.
The solution was changing the dbWriter function to:
conn = MySQLdb.connect(host = # host ip,
                       user = # username,
                       passwd = # password,
                       db = 'dedupe')
cursor = conn.cursor()
cursor.executemany(sql, rows)
cursor.close()
conn.commit()
conn.close()

Ruby Mysql2 Client not taking backslash while insert

We are using production and staging databases in our application.
Our requirement is to insert all records into the staging database whenever a record is added to the production database, so that both servers stay consistent and hold the same data.
I have used a Mysql2 client pool to connect to the staging server and insert the record that was added to production.
Here is my code:
def create
  @aperson = Person.new
  @person = @aperson.save
  if @person && Rails.env == "production"
    # add_new_person_to_staging
    client = Mysql2::Client.new(:host => dbconfig[:host], :username => dbconfig[:username], :password => dbconfig[:password], :database => dbconfig[:database])
    @person_result = client.query('INSERT INTO user_types (user_name, regex, code) VALUES ("myname", "\.myregex\.", "ns");')
  end
end
Here "#person_result" record is inserted to mysql table but the "regex" column eliminates "\" slashes.
like : user_name = myname, regex = .myregex., code = ns
when I manually execute the "Insert" query in mysql command line it inserts as it is along with \ slash. but not through "client.query"
Why does \ slash is eliminated. please help me here.
Thanks.
\ is likely being removed by the MySQL2 client as part of a SQL injection protection preprocessor.
Have you looked at trying either a double backslash or using the escape method to properly escape the string?
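For example, something along these lines (a sketch based on the question's query; exactly which form you need depends on your mysql2 version and SQL mode):

# Option 1: double the backslashes so one of them survives MySQL's own string parsing.
@person_result = client.query(
  'INSERT INTO user_types (user_name, regex, code) VALUES ("myname", "\\\\.myregex\\\\.", "ns")'
)

# Option 2: let the client escape the value for you.
regex = client.escape('\.myregex\.')
@person_result = client.query(
  "INSERT INTO user_types (user_name, regex, code) VALUES ('myname', '#{regex}', 'ns')"
)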
Try using this:
@person_result = client.query("INSERT INTO user_types (user_name, regex, code) VALUES ('myname', '\\\\." + myregx + "\\\\.', 'ns')")

Dashing: Ruby: CentOS: Not closing MySQL processes

I am having trouble with my server.
It is a CentOS / Red Hat Linux server and runs "Dashing", a Ruby/Sinatra-based dashboard.
I am trying to close the active connections shown by my MySQL database's "SHOW PROCESSLIST;".
Example.rb File
require 'mysql2'

SCHEDULER.every '10s' do
  db = Mysql.new('host_name', 'database_name', 'password', 'table')
  mysql1 = "SELECT `VAR` FROM `TABLE` ORDER BY `VAR` DESC LIMIT 1"
  result1 = db.query(mysql1)
  result1.each do |row|
    strrow1 = row[0]
    $num1 = strrow1.to_i
  end
  ...
  db.close
  LINK[0] = { label: 'LABEL', value: $num1 }
  ...
  send_event('LABEL FOR HTML', { items: LINK.values })
end
However, after a few clicks back and forth, it is clear that the database does not drop the connections, but instead keeps them. This causes the browser to slow down to the point that loading a page becomes impossible and the output of the log reads:
"max_user_connections" reached
Can anyone think of a way to fix this?
It is a best practice for DB/file/handle stuff to be in a begin/rescue/ensure block. It could be that something is happening and Rufus/Dashing is just being quiet about the error, since they trap exceptions and go on their merry way. This would prevent your db connection from closing. The symptoms you are having could be from a similar problem; either way, it's a good idea.
SCHEDULER.every '10s' do
  begin
    db = Mysql.new('host_name', 'database_name', 'password', 'table')
    # .... stuff ....
  rescue
    # what happens if an error happens? log it, toss it, ignore it?
  ensure
    db.close if db  # guard against the case where the connection itself failed to open
  end
  # ... more stuff if you want ...
end

How to import large csv file into mysql in rails application?

I'm implementing CSV data import into MySQL in a Rails application. I have used CSV.parse to read the CSV file line by line and import it into the database. This works well.
But when I deploy to the Heroku server, the timeout for each request is 30 seconds. If importing the CSV file takes more than 30 seconds, the Heroku server raises the error: request timeout - H12. Can anyone help me find the best way to import a large CSV file? Right now I only import a small CSV of about 70 users; I want to import a large CSV of 500 - 1000 users. Here is the code:
Import controller:
CSV.foreach(params[:file].path, :headers => true) do |row|
  i = i + 1
  if i == 1
    @company = Company.find_or_create_by!(name: row[0])
  end
  @users = User.find_by(email: row[1])
  if @users
    if @company.id == @users.employee.company_id
      render :status => 401, :json => { :message => "Error" }
      return
    else
      render :status => 401, :json => { :message => "Error" }
      return
    end
  else
    # User
    # # Generate password
    password = row[2]
    user = User.new(email: row[1])
    user.password = password.downcase
    user.normal_password = password.downcase
    user.skip_confirmation!
    user.save!
    obj = {
      'small' => 'https://' + ENV['AWS_S3_BUCKET'] + '.s3.amazonaws.com/images/' + 'default-profile-pic_30x30.png',
      'medium' => 'https://' + ENV['AWS_S3_BUCKET'] + '.s3.amazonaws.com/images/' + 'default-profile-pic_40x40.png'
    }
    employee = Employee.new(user_id: user.id)
    employee.update_attributes(name: row[3], job_title: row[5], gender: row[9], job_location: row[10], group_name: row[11], is_admin: to_bool(row[13]),
                               is_manager: to_bool(row[14]), is_reviewee: to_bool(row[6]), admin_target: row[7], admin_view_target: row[12], department: row[8],
                               company_id: @company.id, avatar: obj.to_json)
    employee.save!
  end
end
I have tried the gems 'activerecord-import' and 'fastercsv', but 'activerecord-import' did not work and 'fastercsv' does not work with Ruby 2.0 and Rails 4.0.
Doing this in a controller seems a bit much to me, especially since it's blocking. Have you given any thought to throwing it in a background job?
If I were you I'd:
Upload the file
Parse it in the background as a rake task
Also, have a look at: https://github.com/tilo/smarter_csv
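For the smarter_csv suggestion, a rough sketch of its chunked reading (the options and column names here are illustrative, not taken from the question):

require 'smarter_csv'

SmarterCSV.process(file_path, :chunk_size => 100) do |chunk|
  # chunk is an array of hashes keyed by the CSV headers, e.g. { :email => "..." }
  chunk.each do |row|
    User.find_or_create_by!(email: row[:email])
  end
end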
Process your CSV in the background, using products such as delayed_job, sidekiq, or resque. If it fits your use case, you can even do this using guard or cron.
It seems that these lines
if i == 1
  @company = Company.find_or_create_by!(name: row[0])
end
@users = User.find_by(email: row[1])
take a lot of computation cycles within your 30-second timeframe.
I would suggest converting your routine into a Heroku background process using resque or delayed_job, or splitting the routine into n requests, if the code above cannot be optimized further.
Hope this helps.
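As a rough sketch of that background-process idea (the worker class and file handling here are assumptions, shown Sidekiq-style; delayed_job or resque would look much the same): the controller only stores the uploaded file somewhere the worker can read it (on Heroku that usually means S3 rather than local disk) and enqueues the job, so the request returns well within the 30-second limit.

require 'csv'

class CsvImportJob
  include Sidekiq::Worker

  def perform(file_path)
    CSV.foreach(file_path, :headers => true) do |row|
      # ... the same per-row company/user/employee logic as in the controller ...
    end
  end
end

# In the controller: enqueue the job and respond immediately.
CsvImportJob.perform_async(saved_file_path)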

Ruby On Rails: Testing deletes tables

I'm creating an application in RoR and I'm implementing unit testing in all my models.
When I run each test on its own (by running ruby test/unit/some_test.rb), all tests are successful.
But when I run all tests together (by running rake test:units), some tables from both databases (development and test) are deleted.
I'm using raw SQL (MySQL) to create the tables because I need composite primary keys and physical constraints, so I figured it would be best. Maybe this is the cause?
All my tests are in this form:
require File.dirname(__FILE__) + '/../test_helper'
require File.dirname(__FILE__) + '/../../app/models/order'

class OrderTestCase < Test::Unit::TestCase
  def setup
    @order = Order.new(
      :user_id => 1,
      :total => 10.23,
      :date => Date.today,
      :status => 'processing',
      :date_concluded => Date.today,
      :user_address_user_id => 3,
      :user_address_address_id => 5,
      :creation_date => Date.today,
      :update_date => Date.today
    )
  end

  ################ Happy Path
  def test_happy_path
    assert @order.valid?, @order.errors.full_messages
  end
...
The errors I get when running the tests are something like this:
3) Error:
test_empty_is_primary(AddressTestCase):
ActiveRecord::StatementInvalid: Mysql::Error: Table 'shopshop_enterprise_test.addresses' doesn't exist: SHOW FIELDS FROM addresses
/test/unit/address_test.rb:9:in `new'
/test/unit/address_test.rb:9:in `setup'
Any guesses?
Thanks!
PS: When using Postgres as the database engine, everything works fine with rake test:units! (Of course, with the correct changes so the SQL statements work with Postgres.)