I've got a colossal CSV file (2.4 GB) hosted remotely (S3) that I'm trying to ingest into my Rails app.
I've downloaded it into a temp file and that seems to work fine, but the process keeps getting killed with SIGTERM about ten minutes after I begin to ingest/iterate over the file.
I'm on Heroku running Rails 4.2 with the mysql2 gem 0.3.20.
What am I missing? How do I get this done?
rake aborted!
SignalException: SIGTERM
/app/vendor/bundle/ruby/2.2.0/gems/mysql2-0.3.21/lib/mysql2/client.rb:80:in `_query'
/app/vendor/bundle/ruby/2.2.0/gems/mysql2-0.3.21/lib/mysql2/client.rb:80:in `block in query'
/app/vendor/bundle/ruby/2.2.0/gems/mysql2-0.3.21/lib/mysql2/client.rb:79:in `handle_interrupt'
/app/vendor/bundle/ruby/2.2.0/gems/mysql2-0.3.21/lib/mysql2/client.rb:79:in `query'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:299:in `block in execute'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract_adapter.rb:466:in `block in log'
/app/vendor/bundle/ruby/2.2.0/gems/activesupport-4.2.0/lib/active_support/notifications/instrumenter.rb:20:in `instrument'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract_adapter.rb:460:in `log'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract_mysql_adapter.rb:299:in `execute'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/mysql2_adapter.rb:231:in `execute'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/mysql2_adapter.rb:235:in `exec_query'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract/database_statements.rb:336:in `select'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract/database_statements.rb:32:in `select_all'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract/query_cache.rb:70:in `select_all'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract/database_statements.rb:38:in `select_one'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/connection_adapters/abstract/database_statements.rb:43:in `select_value'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/relation/finder_methods.rb:314:in `exists?'
/app/vendor/bundle/ruby/2.2.0/gems/activerecord-4.2.0/lib/active_record/querying.rb:3:in `exists?'
You can do it two ways. First, use the SmarterCSV gem, and test it locally to make sure it can handle a file of this size. If size isn't an issue, this is your best bet, since it makes it very easy to process large CSVs before inserting them. If that doesn't work, you can do this:
Use MySQL's import feature (discussed here: http://dev.mysql.com/doc/refman/5.7/en/mysqlimport.html) to load the data directly into a MySQL table first. Then iterate through the records and transfer the data to the appropriate table using Rails' find_each method, which loads records in batches so you don't overload the garbage collector. I'm not sure whether the import feature works the same as Postgres's COPY, but if it does, make sure you create a table without a primary key in Rails to hold the initial data transfer if your CSV file doesn't have a primary key column.
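mysqlimport is a command-line interface to MySQL's LOAD DATA statement, so the same load can also be issued from Ruby. The sketch below only builds the SQL string; it assumes the CSV has a header row and that a staging table with matching columns already exists. The helper name load_data_sql and the table/column names are hypothetical, not part of any library.

```ruby
# Hypothetical helper: build a LOAD DATA LOCAL INFILE statement for a
# comma-separated file with a header row. Table and column names are
# interpolated directly, so they are restricted to plain identifiers.
def load_data_sql(path, table, columns)
  if path.include?("'") || path.include?('\\')
    raise ArgumentError, 'path must not contain quotes or backslashes'
  end
  ([table] + columns).each do |name|
    raise ArgumentError, "bad identifier: #{name.inspect}" unless name =~ /\A\w+\z/
  end

  # Collapse the heredoc's newlines/indentation into single spaces.
  <<-SQL.gsub(/\s+/, ' ').strip
    LOAD DATA LOCAL INFILE '#{path}'
    INTO TABLE #{table}
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
    (#{columns.join(', ')})
  SQL
end
```

Running it would look like client.query(load_data_sql('/tmp/clients.csv', 'staging_clients', %w[code name])). LOCAL loads need to be allowed on both ends: newer mysql2 releases accept a local_infile: true client option and the server must permit local_infile, so check what your mysql2 version and database host support before relying on this.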
Related
For performance reasons, I need to issue SQL statements (insertions and updates) directly to the database. I have no problem executing a large insert statement like:
conn = ActiveRecord::Base.connection
inserts = "INSERT INTO clients (code, name) VALUES ('abc123', 'Alyx'), ('xyz123', 'Gordon') ...many more...;"
conn.execute inserts
However, I'm having difficulty executing a batch of updates like:
updates = "UPDATE clients SET name='Julia' WHERE id=1; UPDATE clients SET name='Eli' WHERE id=2; ...many more..."
conn.execute updates
# or
conn.update updates
because that gives me the general SQL syntax error:
ActiveRecord::StatementInvalid: Mysql2::Error: You have an error in your SQL syntax;
I've tried changing the database.yml configuration file to include the MULTI_STATEMENTS flag without success:
flags:
- MULTI_STATEMENTS
The only way I managed to make this work is by getting a Mysql2 client instance, with the flag set:
client = Mysql2::Client.new(host: 'localhost', ... , flags: Mysql2::Client::MULTI_STATEMENTS)
client.query updates
but this doesn't seem like a good idea, since it would tie the app to the mysql2 gem.
Is this a problem with the mysql2 gem, ActiveRecord, or am I missing something essential?
In the end I found there was no reason to keep using ActiveRecord, since I wasn't really making use of it, so I decided to stick with Mysql2::Client.
Just make sure that flags: Mysql2::Client::MULTI_STATEMENTS is set, and remember to consume the results of any previous statements before issuing another query:
# Advance past the result of every remaining statement in the batch
while client.next_result
end
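The batch-and-drain pattern above can be wrapped in two small helpers. A minimal sketch: build_update_batch and run_update_batch are hypothetical names, the clients table and the {id => name} input mirror the question's example, and values are escaped with Mysql2::Client#escape so a name containing a quote doesn't break the batch. It assumes client was created with flags: Mysql2::Client::MULTI_STATEMENTS.

```ruby
# Build a semicolon-separated batch of UPDATE statements from {id => name}.
# `escape` is any callable that escapes a string for a MySQL string literal,
# e.g. client.method(:escape) for a Mysql2::Client.
def build_update_batch(names_by_id, escape)
  names_by_id.map do |id, name|
    # Integer() rejects ids that are not whole numbers.
    "UPDATE clients SET name='#{escape.call(name)}' WHERE id=#{Integer(id)}"
  end.join('; ')
end

# Send the whole batch in one round trip, then step past the result of each
# remaining statement so the connection is ready for the next query.
def run_update_batch(client, names_by_id)
  client.query(build_update_batch(names_by_id, client.method(:escape)))
  while client.next_result
  end
end
```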
Also, the reason behind trying to use ActiveRecord was the transaction management. It's possible to do the same with:
client.query 'BEGIN'
client.query 'COMMIT'
client.query 'ROLLBACK'
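Those three statements can be wrapped in a small helper so each batch is committed on success and rolled back on any error, which is roughly what ActiveRecord's transaction block gives you. A minimal sketch: with_transaction is a hypothetical name, and it works with any object that responds to query(sql), such as a Mysql2::Client.

```ruby
# Run the block between BEGIN and COMMIT on `client`. If the block raises
# anything, issue ROLLBACK and re-raise so the caller still sees the error.
def with_transaction(client)
  client.query('BEGIN')
  begin
    result = yield client
  rescue Exception
    # Undo everything since BEGIN, then let the error propagate.
    client.query('ROLLBACK')
    raise
  end
  client.query('COMMIT')
  result
end
```

Usage would be with_transaction(client) { |c| c.query(updates) }. Rescuing Exception (rather than StandardError) is deliberate here so that even an interrupt or SIGTERM during the block triggers a rollback before the error is re-raised.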
According to "Ruby datetime suitable for mysql comparison", I should be able to do:
Time.now.to_s(:db)
This doesn't appear to be valid anymore. I get:
irb(main):001:0> Time.now.to_s(:db)
ArgumentError: wrong number of arguments (1 for 0)
from (irb):1:in `to_s'
from (irb):1
from C:/Ruby22/bin/irb:11:in `<main>'
Does this functionality still exist or do I have to manually format the date and time to fit MySQL format?
I'm using ruby 2.2.2.
Time#to_s doesn't accept arguments in plain Ruby. The to_s(:db) format you were referring to comes from Rails: ActiveSupport extends Time (and ActiveSupport::TimeWithZone) with a to_s that accepts a format name such as :db.
To get this format in Ruby without ActiveSupport you can use:
Time.now.strftime('%Y-%m-%d %H:%M:%S')
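Wrapped in a helper, that strftime call can also cover the time zone question. ActiveSupport::TimeWithZone#to_s(:db) converts to UTC before formatting, while a bare strftime keeps whatever offset the Time already has. A minimal sketch: format_for_mysql and its utc: option are hypothetical names.

```ruby
# Format a Time as a MySQL DATETIME literal, e.g. "2015-06-07 08:09:10".
# With utc: true the time is converted to UTC first; getutc returns a new
# Time, so the caller's object is not modified (unlike Time#utc).
def format_for_mysql(time, utc: false)
  time = time.getutc if utc
  time.strftime('%Y-%m-%d %H:%M:%S')
end
```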
My local Rails database is MySQL, but my server host (Heroku) uses Postgres.
Probably a fairly common combination.
I have an advanced search form that works locally in development mode but not in production, and it looks like it might be a Postgres-specific thing, as the Heroku log shows I am getting:
LINE 1: ...,18,19,17,4,32,23,24,16,6,13) and (version_number >= 0.0 or ...
2014-06-23T01:47:54.198026+00:00 app[web.1]: ^
2014-06-23T01:47:54.198022+00:00 app[web.1]: ActiveRecord::StatementInvalid (PG::UndefinedFunction: ERROR: operator does not exist: character varying >= numeric
2014-06-23T01:47:54.198028+00:00 app[web.1]: HINT: No operator matches the given name and argument type(s). You might need to add explicit type casts.
in the log.
Is there another way to do >= in Postgres?
Locally I do see that the datatype is string in schema.rb which is probably the problem. Is there a way I can cast it into integer for rails for pg?
PostgreSQL definitely does have the >= operator: http://www.postgresql.org/docs/current/static/functions-comparison.html
Your problem is that you seem to be comparing a string with a number.
Is there a way I can cast it into integer for rails for pg?
Probably - but we can't see your code. Did you write the SQL? Or did you rely on ActiveRecord? DataMapper? Sequel? Can't help without seeing what you did.
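If version_number only ever holds plain numeric strings such as "1.5", one way to make the comparison work on both MySQL and PostgreSQL is an explicit CAST in the condition. A minimal sketch: numeric_at_least is a hypothetical helper whose return value is meant to be splatted into an ActiveRecord where call, e.g. Model.where(*numeric_at_least('version_number', 0.0)); the DECIMAL(12, 4) precision is an assumption to adjust for your data.

```ruby
# Build a [sql_fragment, bind_value] pair that compares a string column
# numerically. CAST(... AS DECIMAL(12, 4)) is accepted by both MySQL and
# PostgreSQL. Non-numeric text makes PostgreSQL raise an error, while MySQL
# converts it to 0 with a warning.
def numeric_at_least(column, minimum)
  unless column.to_s =~ /\A[a-z_][a-z0-9_]*\z/i
    raise ArgumentError, "unexpected column name: #{column.inspect}"
  end

  ["CAST(#{column} AS DECIMAL(12, 4)) >= ?", minimum]
end
```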
First, this is directly related to my other question:
How to gracefully handle "Mysql2::Error: Invalid date" in ActiveRecord?
But I still do not want to jump through all the hoops of writing migrations that fix dates. That won't be the last table with invalid dates, and I need a more generic approach.
So here we go:
I'm using a legacy MySQL database that contains invalid dates, such as 2010-01-00 or 0000-04-25. Rails does not load such records (older versions of Rails did).
I do not want to (and cannot) correct these dates manually or automated. It should be up to the authors of those records to correct these dates. The old system was a PHP application which allowed such annoyances. The Rails application should/will just prevent the user from saving the record until the dates are valid.
The problem does not seem to be within Rails itself, but deeper, within the native .so library of the mysql2 gem.
So my question is not about how to validate the date or how to insert invalid dates. I don't want to do that and that's covered by numerous answers all over stackoverflow and the rest of the internet. My question is how to READ invalid dates from MySQL that already exist in the database without Rails exploding into 1000 little pieces...
The column type is DATETIME and I'm not sure if casting to string could help because Rails chokes before any ActiveRecord related parsing kicks in.
Here's the exact error and backtrace:
$ rails c
Loading development environment (Rails 3.2.13)
irb(main):001:0> Poll.first
Poll Load (0.5ms) SELECT `polls`.* FROM `polls` LIMIT 1
Mysql2::Error: Invalid date: 2003-00-01 00:00:00
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/connection_adapters/mysql2_adapter.rb:216:in `each'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/connection_adapters/mysql2_adapter.rb:216:in `to_a'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/connection_adapters/mysql2_adapter.rb:216:in `exec_query'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/connection_adapters/mysql2_adapter.rb:224:in `select'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/connection_adapters/abstract/database_statements.rb:18:in `select_all'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/connection_adapters/abstract/query_cache.rb:63:in `select_all'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/querying.rb:38:in `find_by_sql'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/explain.rb:41:in `logging_query_plan'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/querying.rb:37:in `find_by_sql'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/relation.rb:171:in `exec_queries'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/relation.rb:160:in `to_a'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/explain.rb:34:in `logging_query_plan'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/relation.rb:159:in `to_a'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/relation/finder_methods.rb:380:in `find_first'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/relation/finder_methods.rb:122:in `first'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/querying.rb:5:in `__send__'
from /home/kakra/.gem/ruby/1.8/gems/activerecord-3.2.13/lib/active_record/querying.rb:5:in `first'
from (irb):1
The backtrace remains the same even when I do Poll.first.title, so the date never reaches any output routine in IRB and should never be parsed. So suggestions to use the value before typecasting would not help.
I think the simplest solution, and the one that worked for me, was to set cast: false in the database.yml file, e.g. in the development section:
development:
  <<: *default
  adapter: mysql2
  (... some other settings ...)
  cast: false
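With cast: false, mysql2 returns column values as raw strings instead of converting them, so the driver no longer raises on a value like 2003-00-01. Whatever code consumes those strings then has to cope with such dates. A minimal sketch of a tolerant parser: parse_mysql_datetime is a hypothetical helper, and it assumes the stored values are UTC.

```ruby
require 'date'

# Parse a raw MySQL DATETIME or DATE string into a UTC Time.
# Zero or out-of-range parts, such as "2003-00-01 00:00:00", "0000-04-25"
# or "2010-02-30", are treated as missing and return nil instead of raising.
def parse_mysql_datetime(value)
  match = /\A(\d{4})-(\d{2})-(\d{2})(?: (\d{2}):(\d{2}):(\d{2}))?\z/.match(value.to_s)
  return nil unless match

  # Missing time parts are nil, and nil.to_i is 0 (midnight).
  year, month, day, hour, min, sec = match.captures.map(&:to_i)
  return nil if year.zero? || !Date.valid_date?(year, month, day)
  return nil unless hour < 24 && min < 60 && sec < 60

  Time.utc(year, month, day, hour, min, sec)
end
```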
Try this out:
ActiveRecord::AttributeMethods::BeforeTypeCast provides a way to read the value of the attributes before typecasting and deserialization.
http://api.rubyonrails.org/classes/ActiveRecord/AttributeMethods/BeforeTypeCast.html
I get a mysql error:
#update (ActiveRecord::StatementInvalid) "Mysql::Error: #HY000Got error 139 from storage engine:
This happens when I try to update a text field on a record with a 1429-character string. Any ideas on how to track down the problem?
Below is the stacktrace.
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:147:in `log'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:299:in `execute'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:167:in `update_sql'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/connection_adapters/mysql_adapter.rb:314:in `update_sql'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:49:in `update_without_query_dirty'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/query_cache.rb:19:in `update'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/base.rb:2481:in `update_without_lock'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/locking/optimistic.rb:70:in `update_without_dirty'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/dirty.rb:137:in `update_without_callbacks'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/callbacks.rb:234:in `update_without_timestamps'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/timestamp.rb:38:in `update'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/base.rb:2472:in `create_or_update_without_callbacks'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/callbacks.rb:207:in `create_or_update'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/base.rb:2200:in `save_without_validation'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/validations.rb:901:in `save_without_dirty'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/dirty.rb:75:in `save_without_transactions'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/transactions.rb:106:in `save'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:66:in `transaction'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/transactions.rb:79:in `transaction'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/transactions.rb:98:in `transaction'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/transactions.rb:106:in `save'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/transactions.rb:118:in `rollback_active_record_state!'
from /var/www/releases/20081002155111/vendor/rails/activerecord/lib/active_record/transactions.rb:106:in `save'
When you say a text field, is it of type VARCHAR or TEXT?
If it's the former, a Rails string column is created as VARCHAR(255) by default, so it cannot hold a string longer than 255 characters (possibly fewer with UTF-8 overhead). If it's the latter, you'd better post your schema definition so people can assist you further.
Maybe it's this bug: #1030 - Got error 139 from storage engine, but it would help if you'd post the query which should come directly after the error message.
It seemed to be a very weird MySQL error, where the text was being truncated to 256 characters (for a TEXT column) and the error above was thrown if the string was 1000 characters or more. Modifying the table column to be TEXT again fixed the issue, or it just fixed itself; I'm still not sure.
Update:
Changing the table's storage engine to MyISAM fixed this problem.