ActiveRecord joins and where - mysql

I have three models: Company, Deal and Slot. They are associated as Company has_many deals and Deal has_many slots. A company is expired if all of its deals are expired, and a deal is expired when all of its slots are expired.
I have written a scope:
scope :expired,
lambda { |within|
self.select(
'DISTINCT companies.*'
).latest(within).joins(
:user =>{ :deals => :slots }
).where(
"companies.spam = false AND deals.deleted_at IS NULL
AND deals.spam = false AND slots.state = 1
OR slots.begin_at <= :time",
:time => Time.zone.now + SLOT_EXPIRY_MARGIN.minutes
)
}
The above scope does not seem right for what I am trying to achieve. I need the companies for which every slot of every deal is either in state 1 or has a begin_at earlier than :time, which makes it expired.
Thanks for having a look in advance.

AND has a higher precedence than OR in SQL so your where actually gets parsed like this:
(
companies.spam = false
and deals.deleted_at is null
and deals.spam = false
and slots.state = 1
)
or slots.begin_at <= :time
For example (trimmed a bit for brevity):
mysql> select 1 = 2 and 3 = 4 or 5 = 5;
+---+
| 1 |
+---+
mysql> select (1 = 2 and 3 = 4) or 5 = 5;
+---+
| 1 |
+---+
mysql> select 1 = 2 and (3 = 4 or 5 = 5);
+---+
| 0 |
+---+
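The same grouping rule can be checked without a database. As a small sketch (the method name is made up for illustration), Ruby's && and || follow the same relative precedence as SQL's AND and OR. Ruby's lowercase and / or keywords are different, since they share a single precedence level, so they are avoided here.

```ruby
# Mirrors the three MySQL statements above using Ruby's && and ||.
# precedence_demo is a hypothetical helper name, not part of any library.
def precedence_demo
  {
    # parsed as (1 == 2 && 3 == 4) || 5 == 5
    ungrouped: (1 == 2 && 3 == 4 || 5 == 5),
    # the same grouping written out explicitly
    explicit:  ((1 == 2 && 3 == 4) || 5 == 5),
    # parentheses force the || to be evaluated first
    regrouped: (1 == 2 && (3 == 4 || 5 == 5))
  }
end

p precedence_demo
```

Only the last form changes the result, which is why the parentheses around the slots conditions matter.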
Also, you might want to use a placeholder instead of the literal false in the SQL, that should make things easier if you want to switch databases (but of course, database portability is largely a myth so that's just a suggestion); you could also just use not in the SQL. Furthermore, using a class method is the preferred way to accept arguments for scopes. Using scoped instead of self is also a good idea in case other scopes are already in play but if you use a class method, you don't have to care.
If we fix the grouping in your SQL with some parentheses, use a placeholder for false, and switch to a class method:
def self.expired(within)
select('distinct companies.*').
latest(within).
joins(:user => { :deals => :slots }).
where(%q{
not companies.spam
and not deals.spam
and deals.deleted_at is null
and (slots.state = 1 or slots.begin_at <= :time)
}, :time => Time.zone.now + SLOT_EXPIRY_MARGIN.minutes)
end
You could also write it like this if you prefer little blobs of SQL rather than one big one:
def self.expired(within)
select('distinct companies.*').
latest(within).
joins(:user => { :deals => :slots }).
where('not companies.spam').
where('not deals.spam').
where('deals.deleted_at is null').
where('slots.state = 1 or slots.begin_at <= :time', :time => Time.zone.now + SLOT_EXPIRY_MARGIN.minutes)
end
This one also neatly sidesteps your "missing parentheses" problem.
UPDATE: Based on the discussion in the comments, I think you're after something like this:
def self.expired(within)
select('distinct companies.*').
latest(within).
joins(:user => :deals).
where('not companies.spam').
where('not deals.spam').
where('deals.deleted_at is null').
where(%q{
companies.id not in (
select company_id
from slots
where state = 1
and begin_at <= :time
group by company_id
having count(*) >= 10
)
}, :time => Time.zone.now + SLOT_EXPIRY_MARGIN.minutes)
end
That bit of nastiness at the bottom grabs all the company IDs that have ten or more expired or used slots and then companies.id not in (...) excludes them from the final result set.
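For checking what any SQL version returns, it can help to have the rule from the question written out in plain Ruby. The following is only a sketch of that rule as worded in the question (every slot of every deal is in state 1 or has begun by the cutoff), not a translation of the query above. The Struct definitions and method names are hypothetical stand-ins for the real models, and treating a company without slots as not expired is an assumption.

```ruby
# Hypothetical in-memory stand-ins for the ActiveRecord models.
Slot = Struct.new(:state, :begin_at)
Deal = Struct.new(:slots)
Company = Struct.new(:name, :deals)

# A slot counts as expired when it is in state 1 or began at/before the cutoff.
def slot_expired?(slot, cutoff)
  slot.state == 1 || slot.begin_at <= cutoff
end

# A company is expired when all slots of all its deals are expired.
# Assumption: a company with no slots at all is treated as not expired.
def company_expired?(company, cutoff)
  slots = company.deals.flat_map(&:slots)
  !slots.empty? && slots.all? { |slot| slot_expired?(slot, cutoff) }
end

cutoff = Time.at(1_000)
used_up = Company.new("used up", [Deal.new([Slot.new(1, Time.at(5_000)), Slot.new(0, Time.at(500))])])
still_open = Company.new("still open", [Deal.new([Slot.new(0, Time.at(5_000))])])

puts company_expired?(used_up, cutoff)
puts company_expired?(still_open, cutoff)
```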

Related

How to query in EF core with OrderByDescending, Take, Select and FirstOrDefault

So I've got a table named Summaries, it looks like this
I need to sum the latest entries of TotalPieces based on CoveredDate, grouped by ServiceCode and queried by month.
For example, ServiceCode 'A' has entries on 2020-01-01, 2020-01-02, 2020-01-03, 2020-01-31, 2020-02-01, 2020-02-28, 2020-02-29
and ServiceCode 'B' has entries on 2020-01-01, 2020-01-02, 2020-01-31, 2020-02-20, 2020-02-21.
I need to get the sum based on month: the latest entry for 'A' in January is on 2020-01-31, and the latest entry for 'B' is also on 2020-01-31. I need to sum their TotalPieces, so I should get 25 + 25 = 50.
Basically, what I need to do is:
Get all the latest entries based on CoveredDate and month/year
Sum the TotalPieces by ServiceCode
I have a working query, but it is just a workaround because I can't get it right in a single query.
int sum_totalpieces = 0;
foreach (var serviceCode in service_codes)
{
var totalpieces = _DbContext.ActiveSummaries.Where(acs =>
acs.CoveredDate.Date.Month == query_month
&& acs.CoveredDate.Date.Year == query_year
&& acs.service_codes == serviceCode
)
.OrderByDescending(obd => obd.CoveredDate)
.Take(1)
.Select(s => s.TotalPieces)
.ToList()
.FirstOrDefault();
sum_totalpieces += totalpieces;
}
service_codes is just a List of strings.
If you could get rid of the foreach block there and use service_codes.Contains() in the query, or suggest another workaround that makes the result faster, that would be great. Thanks a lot.
This will do it, but I don't think it will translate to SQL and run at the server:
_DbContext.ActiveSummaries
.Where(b =>
b.CoveredDate >= new DateTime(2020,1,1) &&
b.CoveredDate < new DateTime(2020,2,1) &&
new [] { "A", "B" }.Contains(b.ServiceCode)
)
.GroupBy(g => g.ServiceCode)
.Sum(g => g.OrderByDescending(gb=> gb.CoveredDate).First().TotalPieces);
If you want to do it as raw SQL for best performance, it would look like:
SELECT SUM(totalpieces)
FROM
x
INNER JOIN
(
SELECT servicecode, MAX(covereddate) cd
FROM x
WHERE x.servicecode IN ('A','B') AND covereddate BETWEEN '2020-01-01' AND '2020-01-31'
GROUP BY servicecode
)y ON x.servicecode=y.servicecode and x.covereddate = y.cd
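The selection logic (filter by month and service code, keep the latest row per service code, sum TotalPieces) is independent of EF Core. It is sketched here in plain Ruby rather than C#, over an in-memory array. The rows, the TotalPieces values other than the two 25s from the question, and the method name are made up for illustration.

```ruby
require "date"

# Hypothetical rows standing in for the Summaries table.
SUMMARIES = [
  { service_code: "A", covered_date: Date.new(2020, 1, 1),  total_pieces: 10 },
  { service_code: "A", covered_date: Date.new(2020, 1, 31), total_pieces: 25 },
  { service_code: "A", covered_date: Date.new(2020, 2, 29), total_pieces: 40 },
  { service_code: "B", covered_date: Date.new(2020, 1, 2),  total_pieces: 5 },
  { service_code: "B", covered_date: Date.new(2020, 1, 31), total_pieces: 25 },
  { service_code: "B", covered_date: Date.new(2020, 2, 21), total_pieces: 30 }
].freeze

# Sum the TotalPieces of the latest row per service code within one month.
def latest_total_sum(rows, codes, year, month)
  in_month = rows.select do |row|
    codes.include?(row[:service_code]) &&
      row[:covered_date].year == year &&
      row[:covered_date].month == month
  end

  in_month
    .group_by { |row| row[:service_code] }
    .values
    .sum { |group| group.max_by { |row| row[:covered_date] }[:total_pieces] }
end

puts latest_total_sum(SUMMARIES, %w[A B], 2020, 1)
```

With this sample data the January call picks the two rows dated 2020-01-31, matching the 25 + 25 example in the question.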

MySQL match pattern and select number and letters

I have a list of IDs which are created in various third-party application systems and manually added to our system. I need to try and auto increment these IDs based on the largest number. The values are either entirely a number or any number of letters followed by any number of digits.
For example:
Array ( [works_id] => MD001 [num] => 0 )
Array ( [works_id] => WX9834V [num] => 0 )
Array ( [works_id] => WK009 [num] => 0 )
Array ( [works_id] => W4KHA2 [num] => 0 )
Array ( [works_id] => MD001 [num] => 0 )
Array ( [works_id] => DE1234 [num] => 0 )
Array ( [works_id] => 99 [num] => 99 )
Array ( [works_id] => 100 [num] => 100 )
In the above example, I would need to return 'DE' and 1234 as 1234 is the largest number which matches the pattern (WX9834V does not match as it is LLNNNNL)
So far I have tried:
SELECT works_id, CAST(works_id as UNSIGNED) as num
FROM table
WHERE (works_id REGEXP '^[a-zA-Z]+[0-9]' or works_id REGEXP '^[0-9]+$')
But this returns all rows and returns 0 for the number part unless it is only made up of numbers - how can I return only 'DE' and 1234 from the above?
From the comments, I understand that your primary intent is to select the records that match your format spec (optional letters at the beginning of the string, then mandatory digits until the end of the string).
The problem with your current query is that the first regexp, '^[a-zA-Z]+[0-9]', is too permissive: it allows non-numeric characters at the end of the field, and would be better written '^[a-zA-Z]+[0-9]+$'
Bottom line, the two regexes can be combined into one:
SELECT works_id
FROM mytable
WHERE works_id REGEXP '^[a-zA-Z]*[0-9]+$'
The regexp means:
^ beginning of the string
[a-zA-Z]* 0 to N letters
[0-9]+ at least one digit
$ end of string
In this db fiddle with your test data, this returns:
| works_id |
| -------- |
| MD001 |
| WK009 |
| MD001 |
| DE1234 |
| 99 |
| 100 |
NB: in MySQL pre-8.0, splitting the string in order to find the max numerical part is hard to do, since functions such as REGEXP_REPLACE are not available. It is probably easier to do this in your application (unless you have a very large number of matching records...). You can have a look at this post or this other one for solutions that mostly rely on MySQL functions.
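As a rough sketch of the application-side approach, the same pattern can be applied with capture groups to split each ID into its letter prefix and its number, then pick the largest number. It is written in Ruby here for illustration; the constant and method names are hypothetical, and the question's own application may well be in another language.

```ruby
# Same pattern as the SQL regexp, with capture groups for prefix and digits.
WORKS_ID_PATTERN = /\A([a-zA-Z]*)([0-9]+)\z/

# Returns [prefix, number] for the matching ID with the largest number,
# or nil when no ID matches the pattern.
def largest_works_id(ids)
  parsed = ids.map do |id|
    match = WORKS_ID_PATTERN.match(id)
    [match[1], match[2].to_i] if match
  end

  parsed.compact.max_by { |_prefix, number| number }
end

ids = %w[MD001 WX9834V WK009 W4KHA2 MD001 DE1234 99 100]
p largest_works_id(ids)
```

On the sample IDs from the question this yields the prefix "DE" with the number 1234; the next ID would then be built from that prefix and the number plus one.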

Rails Multiplying value of a column using ActiveRecord

I want to multiply the value of a specific column, taking the user id into account.
Assume I have a table users with user 1 (id 1) and user 2 (id 2), and a table animals which has name and mensal_cost.
Then I added two animals for user 1 (id 1) and one animal for user 2 (id 2).
I want to know how I can use ActiveRecord to calculate the mensal_cost total after 3 months, assuming the base value stays the same; that is, I have to multiply the current value by 3.
I'm trying something like this:
Animal.where(user_id: ?).sum('3*mensal_cost')
Since I don't know how many users can exist, I must write a call which will list for each user id the amount after 3 months.
Ok, you nearly had it on your own - just the minor details can be like this:
user_ids = [id1, id2]
full_sum = 3 * Animal.where(:user_id => user_ids).sum(:mensal_cost)
Note: don't forget you can multiply by three after summing and it'll be the same as summing each one multiplied by 3 eg
(3 * 2) + (3 * 3) + (3 * 4) == 3 * (2 + 3 + 4)
or you can iterate through the users to get their individual sums like so:
mensal_sums = {}
user_ids = [id1, id2]
user_ids.each do |user_id|
mensal_sums[user_id] = 3 * Animal.where(:user_id => user_id).sum(:mensal_cost)
end
puts mensal_sums
=> {id1 => 63, id2 => 27}
EDIT
and one where you want the user name as well:
mensal_sums = {}
users = User.find([id1, id2])
users.each do |user|
mensal_sums[user.id] = {:user_name => user.name,
:sum => (3 * user.animals.sum(:mensal_cost)) }
end
puts mensal_sums
=> {id1 => {:user_name => "Bob Jones", :sum => 63},
id2 => {:user_name => "Jane Brown", :sum => 27}
}
I just figured out the solution:
Animal.group('user_id').sum('3*mensal_cost')
the group was the key :D
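What the grouped sum computes can be sketched in plain Ruby over an in-memory array. The rows and the method name below are hypothetical; the costs are chosen only so the totals line up with the 63 and 27 used in the example output above.

```ruby
# Hypothetical rows standing in for the animals table.
ANIMALS = [
  { user_id: 1, name: "Rex",   mensal_cost: 10 },
  { user_id: 1, name: "Tom",   mensal_cost: 11 },
  { user_id: 2, name: "Polly", mensal_cost: 9 }
].freeze

# Group the rows by user and multiply each user's summed cost by the
# number of months, the in-memory analogue of group + sum.
def cost_after_months(animals, months)
  animals
    .group_by { |animal| animal[:user_id] }
    .transform_values { |group| months * group.sum { |animal| animal[:mensal_cost] } }
end

p cost_after_months(ANIMALS, 3)
```

The result is a hash keyed by user id, the same shape that a grouped sum returns in ActiveRecord.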

Dividing a list of 2 query statements

I have some query statements and I want to take the average by basically doing top_level_comment_count.fdiv(code_review_assigned_count).round(2)
Here are my 2 query statements:
top_level_comment_count = CrucibleComment.group(:user_id).where(parent_comment_id: nil).count
code_review_assigned_count = Reviewer.group(:user_id).count
Both of these return something that looks like this:
40=>5,
41=>1,
43=>4,
44=>10,
45=>2,
46=>13,
48=>7,
50=>7,
51=>6,
52=>5,
54=>7,
55=>41,
56=>2,
58=>21,
60=>7,
61=>8,
62=>3,
63=>1,
So, what I am wanting to do is if the :user_ids are the same, take the average.
My def currently looks like this:
def self.average_top_level_comments
a = CrucibleComment.group(:user_id).where(parent_comment_id: nil).count
b = Reviewer.group(:user_id).count
end
In other words I am wanting to do this statement:
return nil unless code_review_assigned_count && top_level_comment_count
top_level_comment_count.fdiv(code_review_assigned_count).round(2)
for a group of numbers. How can I do this?
For example:
id:40 => 5.0/3.3
id: 41 => 1/2.2
id: 43 => 4 /1.0
I would suggest using inject.
Something like:
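division_result = top_level_comment_count.inject({}) do |result, item|
id = item.first
count = item.last
# only divide when the same user_id also appears in the second hash
result[id] = count.to_f / code_review_assigned_count[id] if code_review_assigned_count[id]
result
end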
That will return a hash with the IDs as keys and the results of the division as the values.
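A self-contained variant of the same idea, as a sketch: it uses each_with_object instead of inject, skips user IDs that are missing from the second hash or have a zero count there, and applies the fdiv(...).round(2) from the question. The method name and the sample hashes are made up for illustration.

```ruby
# Divide matching entries of two {user_id => count} hashes.
# IDs without a usable divisor are left out of the result.
def per_user_ratio(numerators, denominators)
  numerators.each_with_object({}) do |(user_id, count), result|
    divisor = denominators[user_id]
    next if divisor.nil? || divisor.zero?

    result[user_id] = count.fdiv(divisor).round(2)
  end
end

top_level_comment_count    = { 40 => 5, 41 => 1, 43 => 4, 99 => 2 }
code_review_assigned_count = { 40 => 3, 41 => 2, 43 => 1 }

p per_user_ratio(top_level_comment_count, code_review_assigned_count)
```

User 99 has no entry in the second hash, so it does not appear in the result.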

Data set too large to load into memory for processing

I have a large, rapidly growing data set of around 4 million rows. In order to define and exclude the outliers (for statistics / analytics usage), I need the algorithm to consider all entries in this data set. However, this is too much data to load into memory and my system chokes. I'm currently using this to collect and process the data:
#scoreInnerFences = innerFence Post.where( :source => 1 ).
order( :score ).
pluck( :score )
Using the typical divide and conquer method won't work, I think, because every entry has to be considered to keep my outlier calculation accurate. How can this be achieved efficiently?
innerFence identifies the lower quartile and upper quartile of the data set, then uses those findings to calculate the outliers. Here is the (yet to be refactored, non-DRY) code for this:
def q1(s)
q = s.length / 4
if s.length % 2 == 0
return ( s[ q ] + s[ q - 1 ] ) / 2
else
return s[ q ]
end
end
def q2(s)
q = s.length / 4
if s.length % 2 == 0
return ( s[ q * 3 ] + s[ (q * 3) - 1 ] ) / 2
else
return s[ q * 3 ]
end
end
def innerFence(s)
q1 = q1(s)
q2 = q2(s)
iq = (q2 - q1) * 3
if1 = q1 - iq
if2 = q2 + iq
return [if1, if2]
end
This is not the best way, but it is an easy way:
Do several queries. First you count the number of scores:
q = Post.where( :source => 1 ).count
then you do your calculations
then you fetch the scores
q1 = Post.where( :source => 1 ).
reverse_order(:score).
select("avg(score) as score").
offset(q).limit((q%2)+1)
q2 = Post.where( :source => 1 ).
reverse_order(:score).
select("avg(score) as score").
offset(q*3).limit((q%2)+1)
The code is probably wrong but I'm sure you get the idea.
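To make the idea concrete, here is a sketch of the position arithmetic only. Given the row count, it works out which sorted positions hold the quartile values, following the q1 / q2 helpers from the question, so that only one or two rows per quartile need to be fetched. The method names are hypothetical and a count of at least 4 is assumed. Against the real table, each window could presumably be fetched with order(:score) combined with offset and limit; here a sorted array stands in for the database.

```ruby
# Positions of the quartile values in the sorted data, mirroring the
# question's q1 / q2 helpers. Assumes count >= 4.
def quartile_windows(count)
  q = count / 4
  if count.even?
    # even count: the question averages two neighbouring values
    { lower: { offset: q - 1, limit: 2 }, upper: { offset: q * 3 - 1, limit: 2 } }
  else
    { lower: { offset: q, limit: 1 }, upper: { offset: q * 3, limit: 1 } }
  end
end

# Stand-in for the database: slice an already sorted array the way
# ORDER BY score LIMIT ... OFFSET ... would, then average the slice.
def window_mean(sorted_scores, window)
  slice = sorted_scores[window[:offset], window[:limit]]
  slice.sum.fdiv(slice.length)
end

scores = [1, 2, 3, 4, 5, 6, 7, 8]
windows = quartile_windows(scores.length)
puts window_mean(scores, windows[:lower])
puts window_mean(scores, windows[:upper])
```

Unlike the integer division in the question's helpers, window_mean returns a float, so results can differ by a fraction when the two neighbouring scores have an odd sum.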
For large datasets, I sometimes drop down below ActiveRecord. It's a memory hog, even, I imagine, when using pluck. Of course it's less portable, but sometimes it's worth it.
scores = Post.connection.execute('select score from posts where score > 1 order by score').map(&:first)
I don't know if that will help enough for 4 million records. If not, maybe look at a stored procedure?