Rails ActiveRecord possible delay with big tables - MySQL

I have a Message model which has a ton of fields and around 10 million rows in its table.
I also have a SomeItem model, which looks like this (its table has just 1,000 rows):
class SomeItem < ActiveRecord::Base
  belongs_to :item, :polymorphic => true # "Message" is one of the item types
end
Now, I have these consecutive lines of code, which bother me:
m = Message.new(:fild_one => 1, ...)
m.save
s = SomeItem.create(:item => m)
Now, m is saved in the database fine at Sat, 05 Oct 2013 15:01:06 UTC +00:00 and s at Sat, 05 Oct 2013 15:01:23 UTC +00:00, and that's fine.
But when I inspect the saved records:
s.item_type gives me "Message", which is fine, but s.id gives me nil in 3 out of 1,000 entries.
So my question is:
Is there some kind of delay that prevents the m created on the previous line from being used by the s created on the next line?
NOTE:
1) The Message table is huge, containing millions of entries.
2) The SomeItem table is small, containing just 1,000 entries.
3) As per my understanding, Rails executes these statements sequentially, but that does not explain this behavior.
Has anyone noticed this before? If it's a known issue, what can be done to prevent it?
Thanks in advance. I understand the question looks silly, but it's a real-life scenario and is happening on my live project.
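For what it's worth, one common guard against this kind of partial write (a sketch, not something from the original question) is to wrap both saves in a transaction and use the bang variants, so a silently failed save can't leave the association half-populated:
ActiveRecord::Base.transaction do
  # create! raises on a failed save instead of returning false,
  # and the transaction rolls both rows back if anything fails
  m = Message.create!(:fild_one => 1) # other fields elided, as in the question
  SomeItem.create!(:item => m)
end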

Related

How can I pull data from my database using the Django ORM that annotates values for each day?

I have a Django app that is attached to a MySQL database. The database is full of records - several million of them.
My models look like this:
class LAN(models.Model):
    ...

class Record(models.Model):
    start_time = models.DateTimeField(...)
    end_time = models.DateTimeField(...)
    ip_address = models.CharField(...)
    LAN = models.ForeignKey(LAN, related_name="records", ...)
    bytes_downloaded = models.BigIntegerField(...)
    bytes_uploaded = models.BigIntegerField(...)
Each record reflects a window of time and shows whether a particular IP address on a particular LAN did any downloading or uploading during that window.
What I need to know is this:
Given a beginning date and an end date, give me a table of which DAYS a particular LAN had ANY activity (i.e., has any records).
Ex: Between Jan 1 and Jan 31, tell me which DAYS LAN A had ANY records on them.
Assume that once in a while, a LAN will shut down for days at a time and have no records or any activity on those days.
My Solution:
I can do this the slow way by attaching some methods to my LAN model:
class LAN(models.Model):
    ...

    # Returns True if there are records for the current LAN between 2 given dates
    # Returns False otherwise
    def online(self, start, end):
        criterion1 = Q(start_time__lt=end)
        criterion2 = Q(end_time__gt=start)
        return self.records.filter(criterion1 & criterion2).exists()

    # Returns a list of days that a LAN was online for between 2 given dates
    def list_online_days(self, start, end):
        start_date = timezone.make_aware(timezone.datetime.strptime(start, "%b %d, %Y"))
        end_date = timezone.make_aware(timezone.datetime.strptime(end, "%b %d, %Y"))
        end_date = end_date.replace(hour=23, minute=59, second=59, microsecond=999999)

        days_online = []
        current_date = start_date
        while current_date <= end_date:
            start_of_day = current_date.replace(hour=0, minute=0, second=0, microsecond=0)
            end_of_day = current_date.replace(hour=23, minute=59, second=59, microsecond=999999)
            if self.online(start=start_of_day, end=end_of_day):
                days_online.append(current_date.date())
            current_date += timezone.timedelta(days=1)
        return days_online
At which point, I can run:
lan = LAN.objects.get(id=1) # Or whatever LAN I'm interested in
days_online = lan.list_online_days(start="Jan 1, 2020", end="Jan 31, 2020")
This works, but results in one query being run per day between my start date and end date. In this case, 31 queries (Jan 1, Jan 2, etc.).
This makes it really, really slow for large time periods, as it needs to go through all the records in the database 31 times. Database indexing helps, but it's still slow with enough data in the database.
Is there a way to do a single database query to give me what I need?
I feel like it would look something like this, but I can't quite get it right:
lan.records.filter(criterion1 & criterion2).annotate(date=TruncDay('start_time')).order_by('date').distinct().values('date').annotate(exists=Exists(SOMETHING))
The first part:
lan.records.filter(criterion1 & criterion2).annotate(date=TruncDay('start_time')).order_by('date').distinct().values('date')
Seems to give me what I want - one value per day, but I'm not sure how to annotate the result with an exists field that shows if any records exist on that day.
Note: This is a simplified version of my app - not the exact models and fields, so if certain things could be improved, like not using CharField for the ip_address field, don't focus too much on that
The answer ended up being simpler than I thought, mostly because I already had it.
This:
lan.records.filter(criterion1 & criterion2).annotate(date=TruncDay('start_time')).order_by('date').distinct().values('date').annotate(exists=Exists(Record.objects.filter(pk=OuterRef('pk'))))
Was what I was expecting, but all it does is return exists=True for all days returned, which is accurate, but not overly helpful. This is because any days that had no records on them are already omitted from the results.
That means I can skip the entire annotate section, and just do this:
lan.records.filter(criterion1 & criterion2).annotate(date=TruncDay('start_time')).order_by('date').distinct().values('date')
which already gives me a list of datetime objects for the days when records were present, and skips any days where there weren't.

How do I make it so I don't have to query a million+ timestamps

Quick synopsis of the problem:
I am working on a graph page to map the performance of a device my company is working on.
I get a new stat point (timestamp, stats, nodeid, volumeid, clusterid) every 2 seconds from every node.
This results in approximately 43k records per day per node per stat.
Now let's say I have 13 stats; that's 520k-ish records a day.
So a row would look something like:
timestamp            typeid  clusterid  nodeid  volumeid  value
01/02/2016 05:02:22  0       1          1       1         82.20
Brief explanation: we decided to go with MySQL because it's easily scalable in Amazon. I was using InfluxDB before, which could easily solve this problem, but there is no way to auto-scale InfluxDB in Amazon.
My ultimate goal is to get a return value that looks like:
object[ {
  node1-stat1: 20.0,
  node2-stat1: 23.2,
  node3-stat1: xx.x,
  node1-stat2: 20.0,
  node2-stat2: xx.x,
  node3-stat2: xx.x,
  timestamp: unixtimestamp
},
{
  node1-stat1: 20.0,
  node2-stat1: 23.2,
  node3-stat1: xx.x,
  node1-stat2: 20.0,
  node2-stat2: xx.x,
  node3-stat2: xx.x,
  timestamp: unixtimestamp + 2 seconds
}]
I currently have a query that gathers all the unique timestamps, then loops over those to get the values belonging to each timestamp and puts them in an object.
That produces the desired output, but it takes FOREVER, and it's over a million queries.
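The post doesn't show its client code, so here is a hypothetical Ruby sketch of that N+1 pattern (using the mysql2 gem; connection details and names are made up):
require "mysql2"

# one query for the distinct timestamps, then one more query per timestamp
db = Mysql2::Client.new(host: "localhost", username: "stats", database: "stats")
stmt = db.prepare("SELECT * FROM data_points WHERE timestamp = ?")

points = db.query("SELECT DISTINCT timestamp FROM data_points ORDER BY timestamp").map do |row|
  point = { timestamp: row["timestamp"].to_i }
  stmt.execute(row["timestamp"]).each do |dp| # one round trip per timestamp
    point["node#{dp['node_id']}-stat#{dp['data_type_id']}"] = dp["value"]
  end
  point
end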
Can something like this even be done in MySQL? Should I go back to a time-series DB and just deal with scaling it manually?
// EDIT //
I think I might have solved my problem:
SELECT data_points.*, data_types.friendly_name AS friendly_name
FROM data_points
JOIN data_types ON data_types.id = data_points.data_type_id
WHERE cluster_id = '5'
  AND unix_timestamp(timestamp) BETWEEN '1456387200' AND '1457769599'
ORDER BY timestamp, friendly_name, node_id, volume_id
This gives me all the fields I need.
I then loop over these data points and create a new "object" for each timestamp, adding stats to that object for all the rows that match the timestamp (sketched below).
This executes in under a second while going over a million records.
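Here is a hypothetical Ruby sketch of that grouping loop (same made-up mysql2 setup as above; the rows come back ordered by timestamp from the query in the edit):
rows = db.query(sql) # sql = the SELECT from the edit above
points = rows.group_by { |r| r["timestamp"] }.map do |ts, group|
  point = { timestamp: ts.to_i }
  group.each do |r|
    point["node#{r['node_id']}-#{r['friendly_name']}"] = r["value"]
  end
  point
end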
I will for sure try to see if swapping to a time-series DB makes an improvement in the future.

Rails 4 / RSpec - Testing time values persisted to a MySQL database in controller specs

I'm writing pretty standard RSpec controller tests for my Rails app. One issue I'm running into is simply testing that time values have been persisted in an update action.
In my controller I have:
def update
  if @derp.update_attributes(derp_params)
    redirect_to @derp, flash: {success: "Updated derp successfully."}
  else
    render :edit
  end
end
@derp has a time attribute of type time. I can test all of its other attributes in the update action as follows:
describe "PATCH #update" do
before do
#attr = {
attribute_1: 5, attribute_2: 6, attribute_3: 7,
time: Time.zone.now
}
end
end
The error I'm getting is:
1) DerpsController logged in PATCH #update updates the derps's attributes
Failure/Error: expect(derp.time).to eq(#attr[:time])
expected: 2015-08-24 18:30:32.096943000 -0400
got: 2000-01-01 18:30:32.000000000 +0000
(compared using ==)
Diff:
## -1,2 +1,2 ##
-2015-08-24 18:30:32 -0400
+2000-01-01 18:30:32 UTC
I've tried using Timecop and also comparing with to_s or to_i... but every attribute of data type time is completely off as far as the year goes. I've seen a couple of posts saying you can expect it to be within 1 second and how to deal with that, but my year is completely off.
This can't be that difficult - I just want to test that the controller can take a time sent to it and save it to the database.
What am I missing here?
EDIT: No help here after a couple of days. Let me try to reiterate what's happening: the date is being stripped because the column is a MySQL TIME type. Notice the 2000-01-01...
let(:time) { 'Mon, 24 Aug 2015 23:19:09' }

describe "PATCH #update" do
  before do
    @attr = {
      attribute_1: 5, attribute_2: 6, attribute_3: 7,
      time: time.in_time_zone('Eastern Time (US & Canada)').to_datetime
    }
  end
end
Rather than saving Time.zone.now in the record, define a time variable and update the record with that time so you know which time should be saved in the database. Then, expect to get that time back when you compare.
I ended up doing this:
let(:time) { '01 Jan 2000 16:20:00' }
I really can't find any good explanation as to how or why Rails stores the time as 2000-01-01, or any helper method to format a time like that. I checked the controller params, and it's actually being sent to the controller with that 2000-01-01 date in it.
Some people say to just use a datetime, but I am truly trying to store a time of day, so I don't think that makes sense for this use case.
Anyway, I can just use that time variable anywhere in my specs and it works.
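For completeness, a hedged sketch of the expectation that follows from all this: since a MySQL TIME column keeps only the time of day (the 2000-01-01 date is just a placeholder), compare only that portion:
# compares just the clock time, ignoring the placeholder date MySQL attaches
expect(derp.reload.time.strftime("%H:%M:%S")).to eq(@attr[:time].strftime("%H:%M:%S"))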

Translate SQL query to Ruby on Rails

I need to convert a relatively simple query to display a total quiz average for a given user in a table set up in Rails/HAML. We have users take quizzes, record the scores, and display the average per quiz. We now want the total average of all quizzes. Easy:
SELECT (ROUND(AVG(`score`*100), 1)) FROM `quiz_results` WHERE `user_id`=$user
The results need to display in a table cell that is already set up, but I cannot figure this out.
Perhaps this line will help. It's pre-existing code that calculates the average of a particular quiz for that user:
%td.separate="#{(((lesson.quiz_results.average('score', :conditions => "user_id = #{@user.id}")) * 100).to_i)}%"
I have Rails 2.3.x.
Well, as I can see now, all you need is to remove the particular-quiz restriction, which is imposed by the association usage lesson.quiz_results; instead of it, just use the model class, which is most likely QuizResult.
Also, there is a tiny bug in your existing code: .to_i rounds down; you should use .round. See the difference:
irb(main):002:0> 1.6.to_i
=> 1
irb(main):003:0> 1.6.round
=> 2
So, full code should be:
(QuizResult.average('score', :conditions => "user_id = #{@user.id}") * 100).round
(I also removed some unnecessary brackets)
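Dropped into the existing table cell, that would look roughly like this (a sketch mirroring the HAML line shown earlier):
%td.separate="#{(QuizResult.average('score', :conditions => "user_id = #{@user.id}") * 100).round}%"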

How to find all database rows with a time AFTER a specific time

I have a "last_action" column in my users table that updates a unix timestamp every time a user takes an action. I want to select all users in my users table who have made an action on the site after a specific time (probably 15 minutes, the goal here is to make a ghetto users online list).
I want to do something like the following...
time = Time.now.to_i - 900 # save the timestamp 15 minutes ago in a variable
User.where(:all, :where => :last_action > time)
What's the best way to do this? Is there a way of doing it without using any raw SQL?
Maybe this will work?
User.where("users.last_action > ?", 15.minutes.ago)
Try this:
time = Time.now.to_i - 900
User.where("last_action > #{time}")
There might be a nicer/safer syntax for putting variable arguments into the where clause, but since you are working with integers (timestamps), this should work and be safe.
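For reference, the placeholder syntax alluded to above looks like this; ActiveRecord quotes the bound value for you:
# equivalent query with a bound parameter instead of string interpolation
time = Time.now.to_i - 900 # unix timestamp, matching the integer column
User.where("last_action > ?", time)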