How to find a rails object based on a date value - mysql

I have a User object and I am attempting to do 2 different queries as part of a script that needs to run nightly. Given the schema below I would like to:
Get all the Users with a non-nil end_date
Get all the Users with an end_date that is prior to today (i.e. it has passed)
Users Schema:
# == Schema Information
#
# Table name: users
#
# id :integer not null, primary key
# name :string(100) default("")
# end_date :datetime
I've been trying to use User.where('end_date != NULL') and other things, but I cannot seem to get the syntax correct.

Your methods inside the User model should look like this:
def self.users_with_end_date_not_null
  self.where.not(end_date: nil)
  # below Rails 4, use:
  # self.where("end_date IS NOT NULL")
end

def self.past_users(n)
  # end_date earlier than n days ago, i.e. it has already passed
  self.where("end_date < ?", n.days.ago)
end
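A minimal sketch of how the nightly script might call these (the task name and the output line are illustrative assumptions, not from the original post):
# Hypothetical nightly rake task exercising the two queries above.
namespace :users do
  desc "Report users whose end_date is set or has passed"
  task nightly_check: :environment do
    with_end_date = User.users_with_end_date_not_null
    expired = User.past_users(0) # end_date earlier than now
    puts "#{with_end_date.count} users have an end_date, #{expired.count} have expired"
  end
end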

Related

Running out of memory when running rake import task in ruby

I am running a task to import around 1 million orders. I am looping through the data to update it to the values on the new database, and it works fine on my local computer with 8 GB of RAM.
However, when I run it on my AWS t2.medium instance, it gets through the first 500 thousand rows, but towards the end I start maxing out memory once it begins creating orders that don't yet exist. I am porting a MySQL database to Postgres.
Am I missing something obvious here?
require 'mysql2' # or require 'pg'
require 'active_record'

def legacy_database
  @client ||= Mysql2::Client.new(Rails.configuration.database_configuration['legacy_production'])
end
desc "import legacy orders"
task orders: :environment do
orders = legacy_database.query("SELECT * FROM oc_order")
# init progressbar
progressbar = ProgressBar.create(:total => orders.count, :format => "%E, \e[0;34m%t: |%B|\e[0m")
orders.each do |order|
if [1, 2, 13, 14].include? order['order_status_id']
payment_method = "wx"
if order['paid_by'] == "Alipay"
payment_method = "ap"
elsif order['paid_by'] == "UnionPay"
payment_method = "up"
end
user_id = User.where(import_id: order['customer_id']).first
if user_id
user_id = user_id.id
end
order = Order.create(
# id: order['order_id'],
import_id: order['order_id'],
# user_id: order['customer_id'],
user_id: user_id,
receiver_name: order['payment_firstname'],
receiver_address: order['payment_address_1'],
created_at: order['date_added'],
updated_at: order['date_modified'],
paid_by: payment_method,
order_num: order['order_id']
)
#increment progress bar on each save
progressbar.increment
end
end
end
I assume this line orders = legacy_database.query("SELECT * FROM oc_order") loads the entire table into memory, which is very inefficient.
You need to iterate over the table in batches. In ActiveRecord there is the find_each method for that. Since you don't use ActiveRecord for the legacy database, you may want to implement your own batch querying using LIMIT and OFFSET.
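For comparison, if the legacy table were mapped to an ActiveRecord model (a hypothetical LegacyOrder class, not present in the original code), batching would look like this:
# find_each fetches rows in batches (1000 by default) instead of loading everything at once.
LegacyOrder.find_each(batch_size: 1000) do |legacy_order|
  # process one row at a time
end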
In order to handle memory efficiently, you can run the MySQL query in batches, as suggested by nattfodd.
There are two ways to achieve it, as per the MySQL documentation:
SELECT * FROM oc_order LIMIT 5,10;
or
SELECT * FROM oc_order LIMIT 10 OFFSET 5;
Both queries return rows 6-15.
You can pick an offset of your choice and run the queries in a loop until your orders object is empty.
Let us assume you handle 1000 orders at a time; then you'll have something like this:
batch_size = 1000
offset = 0
loop do
  orders = legacy_database.query("SELECT * FROM oc_order LIMIT #{batch_size} OFFSET #{offset}")
  break unless orders.present?
  offset += batch_size

  orders.each do |order|
    ... # your logic of creating new model objects
  end
end
It is also advised to run your code in production with proper error handling:
begin
  ... # main logic
rescue => e
  ... # handle error
ensure
  ... # ensure
end
Disabling row caching while iterating over the orders collection should also reduce memory consumption:
orders.each(cache_rows: false) do |order|
  # ...
end
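For reference, the mysql2 gem also accepts these options on the query call itself; a sketch under that assumption:
# Stream rows from the server lazily instead of buffering the whole result set.
orders = legacy_database.query(
  "SELECT * FROM oc_order",
  stream: true,      # fetch rows as they are consumed
  cache_rows: false  # do not retain rows that have already been yielded
)
Note that a streamed result can only be iterated once, so the orders.count used for the progress bar would need to come from a separate SELECT COUNT(*) query.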
There is a gem that helps with this, called activerecord-import.
bulk_orders = []

orders.each do |order|
  # user_id and payment_method are derived as in the original task
  bulk_orders << Order.new(
    # id: order['order_id'],
    import_id: order['order_id'],
    # user_id: order['customer_id'],
    user_id: user_id,
    receiver_name: order['payment_firstname'],
    receiver_address: order['payment_address_1'],
    created_at: order['date_added'],
    updated_at: order['date_modified'],
    paid_by: payment_method,
    order_num: order['order_id']
  )
end

Order.import bulk_orders, validate: false
This inserts all the records with a single INSERT statement.
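A rough sketch of how the batched querying and the bulk import could be combined (build_order_attributes is a placeholder for the mapping logic above, not a method from the original post):
batch_size = 1000
offset = 0

loop do
  rows = legacy_database.query("SELECT * FROM oc_order LIMIT #{batch_size} OFFSET #{offset}")
  break unless rows.any?
  offset += batch_size

  # Build the batch in memory, then flush it with one multi-row INSERT.
  bulk_orders = rows.map { |row| Order.new(build_order_attributes(row)) }
  Order.import bulk_orders, validate: false
end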

Proper use of flatMap

Why do I keep getting this error every time I run an action on my RDD, and how do I fix it?
/databricks/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
317 raise Py4JJavaError(
318 "An error occurred while calling {0}{1}{2}.\n".
--> 319 format(target_id, ".", name), value)
320 else:
321 raise Py4JError(
I've tried to figure out which is the last RDD I can run an action on, and it's ratingsPerUser, which indicates the problem is in the flatMap.
What I'm trying to do: I take a CSV of (userID,movieID,rating) and I want to create unique combinations of movieIDs per userID together with the ratings, but different users can generate the same pair of movieIDs. For example, for this CSV:
1,2000,5
1,2001,2
1,2002,3
2,2000,4
2,2001,1
2,2004,5
I want RDD:
key (2000,2001), value (5,2,1)
key (2000,2002), value (5,3,1)
key (2001,2002), value (2,3,1)
key (2000,2001), value (4,1,1)
key (2000,2004), value (4,5,1)
key (2001,2004), value (1,5,1)
# First Map function - gets line and returns key(userID) value(movieID,rating)
def parseLine(line):
    fields = line.split(",")
    userID = int(fields[0])
    movieID = int(fields[1])
    rating = int(fields[2])
    return userID, (movieID, rating)

# Function to create movie unique pairs with ratings
# all pairs start with the lowest ID
# returns key (movieIDj,movieIDi) & value (rating-j,rating-i,1)
# the 1 in value is added in order to count the number of ratings in the reduce
def createPairs(userRatings):
    pairs = []
    for i1 in range(len(userRatings[1]) - 1):
        for i2 in range(i1 + 1, len(userRatings[1])):
            if userRatings[1][i1][0] < userRatings[1][i2][0]:
                pairs.append(((userRatings[1][i1][0], userRatings[1][i2][0]), (userRatings[1][i1][1], userRatings[1][i2][1], 1)))
            else:
                pairs.append(((userRatings[1][i2][0], userRatings[1][i1][0]), (userRatings[1][i2][1], userRatings[1][i1][1], 1)))
    return pairs
# Create SC object from the ratings file
lines = sc.textFile("/FileStore/tables/dvmlbdnj1487603982330/ratings.csv")
# Map lines to Key(userID),Value(movieID,rating)
movieRatings = lines.map(parseLine)
# Join all rating by same user into one key
# (UserID1,(movie1,rating1)),(UserID1,(movie2,rating2)) --> UserID1,[(movie1,rating1),(movie2,rating2)]
ratingsPerUser = movieRatings.groupByKey()
# activate createPairs func
# We use flatMap, since each user has a different number of ratings --> a different number of pairs
pairsOfMovies = ratingsPerUser.flatMap(createPairs)
The problem is the function passed to flatMap, not flatMap itself.
groupByKey returns an iterable whose values behave like an iterator:
it cannot be traversed multiple times
it cannot be indexed.
Convert the values to a list first:
ratingsPerUser.mapValues(list).flatMap(createPairs)

Convert this code into active record/sql query

I have the following code and would like to convert the request into a MySQL query. Right now I achieve the desired result using a manual .select (array method) on the data. This should be possible with a single query (correct me if I am wrong).
Current code:
def self.active_companies(zip_code = nil)
  if !zip_code
    query = Company.locatable.not_deleted
  else
    query = Company.locatable.not_deleted.where("zip_code = ?", zip_code)
  end
  query.select do |company|
    company.company_active?
  end
end
# Check if the company can be considered as active
def company_active?(min_orders = 5, last_order_days_ago = 15)
  if orders.count >= min_orders &&
     orders.last.created_at >= last_order_days_ago.days.ago &&
     active
    return true
  else
    return false
  end
end
Explanation:
I want to find out which companies are active. We have a company model and an orders model.
Data:
Company:
active
orders (associated orders)
Orders:
created_at
I don't know if it is possible to make the company_active? predicate a single SQL query, but I can offer an alternative:
If you do:
query = Company.locatable.not_deleted.includes(:orders)
all of the relevant orders will be loaded into memory for later processing.
This reduces the database round trips to just 2:
one to get the companies, and one to get all their associated orders.
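For reference, here is a rough sketch of what a single-query version could look like using joins, group, and having. This is an untested assumption about the schema (it presumes orders has a company_id foreign key) and treats "last order" as the most recent created_at:
def self.active_companies(zip_code = nil, min_orders = 5, last_order_days_ago = 15)
  scope = Company.locatable.not_deleted.where(active: true)
  scope = scope.where(zip_code: zip_code) if zip_code
  scope.joins(:orders)
       .group("companies.id")
       .having("COUNT(orders.id) >= ?", min_orders)
       .having("MAX(orders.created_at) >= ?", last_order_days_ago.days.ago)
end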

Default size of integer in Rails tables (MySQL)

When I run
rails g model StripeCustomer user_id:integer customer_id:integer
annotate
I got
# == Schema Information
# Table name: stripe_customers
# id :integer(4) not null, primary key
# user_id :integer(4)
# customer_id :integer(4)
# created_at :datetime
# updated_at :datetime
Does it mean I can only hold up to 9,999 records? (I am quite surprised at how small the default size for keys is.) How do I change the default IDs to hold 7 digits in existing tables?
Thank you.
While the mysql client's DESCRIBE command really does use the display width (see the docs), the schema information in the OP's question is very probably generated by the annotate_models gem's get_schema_info method, which uses the limit attribute of each column. And the limit attribute is the number of bytes for :binary and :integer columns (see the docs).
The method reads (see how the last line adds the limit):
def get_schema_info(klass, header, options = {})
  info = "# #{header}\n#\n"
  info << "# Table name: #{klass.table_name}\n#\n"
  max_size = klass.column_names.collect{|name| name.size}.max + 1
  klass.columns.each do |col|
    attrs = []
    attrs << "default(#{quote(col.default)})" unless col.default.nil?
    attrs << "not null" unless col.null
    attrs << "primary key" if col.name == klass.primary_key
    col_type = col.type.to_s
    if col_type == "decimal"
      col_type << "(#{col.precision}, #{col.scale})"
    else
      col_type << "(#{col.limit})" if col.limit
    end
    #...
  end
Rails actually means 4 bytes here, i.e. the standard MySQL INT type (see the docs), which stores values up to 2,147,483,647, not 4 decimal digits.
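If a larger range were ever needed, one option (a sketch, not part of the original answer; the migration name is made up) is to widen the columns to 8-byte integers, which MySQL maps to BIGINT:
class WidenStripeCustomerIds < ActiveRecord::Migration
  def up
    change_column :stripe_customers, :user_id, :integer, limit: 8
    change_column :stripe_customers, :customer_id, :integer, limit: 8
  end

  def down
    change_column :stripe_customers, :user_id, :integer, limit: 4
    change_column :stripe_customers, :customer_id, :integer, limit: 4
  end
end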

Fetch all table name and row count for specific table with Rails?

How can I fetch all the table names and the row count for each table from a specific database?
Result
Table Name , Row Count , Table Size(MB)
---------------------------------------
table_1 , 10 , 2.45
table_2 , 20 , 4.00
ActiveRecord::Base.connection.tables.each do |table|
  h = ActiveRecord::Base.connection.execute("SHOW TABLE STATUS LIKE '#{table}'").fetch_hash
  puts "#{h['Name']} has #{h['Rows']} rows with size: #{h['Data_length']}"
end
The question is tagged mysql but you can do it in a DB-agnostic manner via ORM.
class DatabaseReport
  def entry_counts
    table_model_names.map do |model_name|
      entity = model_name.constantize rescue nil
      next if entity.nil?
      { entity.to_s => entity.count }
    end.compact
  end

  private

  def table_model_names
    ActiveRecord::Base.connection.tables.map(&:singularize).map(&:camelize)
  end
end
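Usage would look something like this (the returned counts are illustrative):
report = DatabaseReport.new
report.entry_counts
# => [{"User" => 10}, {"Order" => 20}, ...]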
Note that this will skip tables for which you don't have an object mapping such as meta tables like ar_internal_metadata or schema_migrations. It also cannot infer scoped models (but could be extended to do so). E.g. with Delayed::Job I do this:
def table_model_names
  ActiveRecord::Base.connection.tables.map(&:singularize).map(&:camelize) + ["Delayed::Job"]
end
I came up with my own version, which is also DB-agnostic.
As it uses the descendants directly, it also handles any tables where the table_name is different from the model name.
The rescue nil exists for cases where you have a class that inherits from ActiveRecord::Base but for some reason doesn't have a table associated with it. It does give data for STI classes and the parent class.
my_models = ActiveRecord::Base.descendants
results = my_models.inject({}) do |result, model|
  result[model.name] = model.count rescue nil
  result
end
@temp_table = []

ActiveRecord::Base.connection.tables.each do |table|
  count = ActiveRecord::Base.connection.execute("SELECT COUNT(*) as count FROM #{table}").fetch_hash['count']
  size = ActiveRecord::Base.connection.execute("SHOW TABLE STATUS LIKE '#{table}'").fetch_hash
  @temp_table << { :table_name => table,
                   :records => count.to_i,
                   :size_of_table => ((BigDecimal(size['Data_length']) + BigDecimal(size['Index_length'])) / 1024 / 1024).round(2) }
end