Rails: multiplying the value of a column using ActiveRecord (MySQL)

I want to multiply the value of a specific column, per user id.
Assume I have a table users with user 1 (id 1) and user 2 (id 2), and a table animals with the columns name and mensal_cost.
I added two animals for user 1 (id 1) and one animal for user 2 (id 2).
Using ActiveRecord, I want to calculate the mensal_cost income after 3 months with the same base value each month, which means multiplying the current value by 3.
I'm trying something like this:
Animal.where(user_id: ?).sum('3*mensal_cost')
Since I don't know how many users there can be, I need a single call that lists, for each user id, the amount after 3 months.

OK, you nearly had it on your own; only a few details need to change:
user_ids = [id1, id2]
full_sum = 3 * Animal.where(:user_id => user_ids).sum(:mensal_cost)
That gives one combined total for all the listed users. Note: you can multiply by three after summing, and it's the same as summing each value multiplied by 3, e.g.
(3 * 2) + (3 * 3) + (3 * 4) == 3 * (2 + 3 + 4)
or you can iterate through the users to get their individual sums, like so:
mensal_sums = {}
user_ids = [id1, id2]
user_ids.each do |user_id|
  mensal_sums[user_id] = 3 * Animal.where(:user_id => user_id).sum(:mensal_cost)
end
puts mensal_sums
=> {id1 => 63, id2 => 27}
EDIT
And one where you want the user name as well:
mensal_sums = {}
users = User.find([id1, id2])
users.each do |user|
  mensal_sums[user.id] = { :user_name => user.name,
                           :sum => (3 * user.animals.sum(:mensal_cost)) }
end
puts mensal_sums
=> {id1 => {:user_name => "Bob Jones", :sum => 63},
    id2 => {:user_name => "Jane Brown", :sum => 27}}

I just figured out the solution:
Animal.group('user_id').sum('3*mensal_cost')
Grouping was the key :D It returns a hash of user_id => total in a single query.
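Grouped calculations in ActiveRecord return a Hash keyed by the grouped column. The sketch below reproduces that result in plain Ruby, with no Rails and no database, so the shape of the answer is easy to see. The Animal struct and the costs are hypothetical stand-ins. They were chosen so the totals match the 63 and 27 used in the answer above.

```ruby
# Hypothetical rows standing in for the animals table: two animals for
# user 1 and one for user 2 (costs are made up for illustration).
Animal = Struct.new(:user_id, :mensal_cost)
animals = [
  Animal.new(1, 10),
  Animal.new(1, 11),
  Animal.new(2, 9)
]

# Roughly what GROUP BY user_id with SUM(3*mensal_cost) computes:
# one entry per user_id, keyed by that id.
totals = animals
  .group_by(&:user_id)
  .transform_values { |rows| rows.sum { |a| 3 * a.mensal_cost } }

p totals # totals == { 1 => 63, 2 => 27 }
```

Compared with looping over user ids, the grouped version issues one query no matter how many users exist.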

Related

How to query in EF core with OrderByDescending, Take, Select and FirstOrDefault

I have a table named Summaries with (among others) the columns ServiceCode, CoveredDate and TotalPieces.
I need to sum the TotalPieces of the latest entries (by CoveredDate), grouped by ServiceCode and queried by month.
For example, ServiceCode 'A' has entries on 2020-01-01, 2020-01-02, 2020-01-03, 2020-01-31, 2020-02-01, 2020-02-28 and 2020-02-29,
and ServiceCode 'B' has entries on 2020-01-01, 2020-01-02, 2020-01-31, 2020-02-20 and 2020-02-21.
For January, the latest entry for 'A' is on 2020-01-31 and the latest entry for 'B' is also on 2020-01-31, so I need to sum their TotalPieces: 25 + 25 = 50.
Basically, what I need to do is:
1. Get the latest entry for each ServiceCode, based on CoveredDate, within the given month/year.
2. Sum the TotalPieces of those entries.
I have a working query, but it is just a workaround because I can't get it right in a single query.
int sum_totalpieces = 0;
foreach (var serviceCode in service_codes)
{
    var totalpieces = _DbContext.ActiveSummaries
        .Where(acs =>
            acs.CoveredDate.Date.Month == query_month
            && acs.CoveredDate.Date.Year == query_year
            && acs.ServiceCode == serviceCode)
        .OrderByDescending(obd => obd.CoveredDate)
        .Take(1)
        .Select(s => s.TotalPieces)
        .ToList()
        .FirstOrDefault();
    sum_totalpieces += totalpieces;
}
service_codes is just a List<string>.
If you could get rid of the foreach block there and use service_codes.Contains() in the query, or suggest another way to make this faster, that would be great. Thanks a lot.
This will do it, but I don't think it will translate to SQL and run on the server:
_DbContext.ActiveSummaries
    .Where(b =>
        b.CoveredDate >= new DateTime(2020, 1, 1) &&
        b.CoveredDate < new DateTime(2020, 2, 1) &&
        new[] { "A", "B" }.Contains(b.ServiceCode))
    .GroupBy(g => g.ServiceCode)
    .Sum(g => g.OrderByDescending(gb => gb.CoveredDate).First().TotalPieces);
If you want to do it in raw SQL for best performance, it would look something like this (x is the summaries table):
SELECT SUM(x.totalpieces)
FROM x
INNER JOIN
(
    SELECT servicecode, MAX(covereddate) AS cd
    FROM x
    WHERE servicecode IN ('A', 'B')
      AND covereddate >= '2020-01-01' AND covereddate < '2020-02-01'
    GROUP BY servicecode
) y ON x.servicecode = y.servicecode AND x.covereddate = y.cd
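The LINQ and the SQL above express the same logic: take the latest row per ServiceCode within a month, then sum. The plain-Ruby sketch below shows that logic over in-memory rows. The Summary struct and sum_of_latest are hypothetical, and every value except the two 2020-01-31 entries of 25 is made up. It illustrates the logic only and does not replace running the query on the server.

```ruby
require "date"

# Hypothetical rows shaped like the Summaries table.
Summary = Struct.new(:service_code, :covered_date, :total_pieces)
rows = [
  Summary.new("A", Date.new(2020, 1, 1), 10),
  Summary.new("A", Date.new(2020, 1, 3), 12),
  Summary.new("A", Date.new(2020, 1, 31), 25),
  Summary.new("A", Date.new(2020, 2, 1), 30),
  Summary.new("B", Date.new(2020, 1, 2), 7),
  Summary.new("B", Date.new(2020, 1, 31), 25),
  Summary.new("B", Date.new(2020, 2, 20), 40)
]

def sum_of_latest(rows, codes, year, month)
  from = Date.new(year, month, 1)
  to   = from >> 1 # first day of the next month, used as an exclusive bound
  rows
    .select { |r| codes.include?(r.service_code) && r.covered_date >= from && r.covered_date < to }
    .group_by(&:service_code)                                         # one group per ServiceCode
    .sum { |_code, group| group.max_by(&:covered_date).total_pieces } # latest row of each group only
end

puts sum_of_latest(rows, %w[A B], 2020, 1) # => 50
```

The half-open date range (>= first day, < first day of next month) is the same idea as in the SQL. It also works when CoveredDate carries a time part.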

Dividing a list of 2 query statements

I have two queries, and I want to take the average per user, basically by doing top_level_comment_count.fdiv(code_review_assigned_count).round(2)
Here are my 2 query statements:
top_level_comment_count = CrucibleComment.group(:user_id).where(parent_comment_id: nil).count
code_review_assigned_count = Reviewer.group(:user_id).count
Both of these return something that looks like this:
40=>5,
41=>1,
43=>4,
44=>10,
45=>2,
46=>13,
48=>7,
50=>7,
51=>6,
52=>5,
54=>7,
55=>41,
56=>2,
58=>21,
60=>7,
61=>8,
62=>3,
63=>1,
So what I want to do is: when the user_ids match, take the average (divide one count by the other).
My method currently looks like this:
def self.average_top_level_comments
  a = CrucibleComment.group(:user_id).where(parent_comment_id: nil).count
  b = Reviewer.group(:user_id).count
end
In other words, I want to apply this statement:
return nil unless code_review_assigned_count && top_level_comment_count
top_level_comment_count.fdiv(code_review_assigned_count).round(2)
to each pair of numbers. How can I do this?
For example:
id: 40 => 5.0 / 3.3
id: 41 => 1 / 2.2
id: 43 => 4 / 1.0
I would suggest using inject.
Something like:
division_result = top_level_comment_count.inject({}) do |result, (id, count)|
  assigned = code_review_assigned_count[id]
  # Skip users who have comments but no assigned reviews (dividing by nil raises TypeError).
  result[id] = count.fdiv(assigned).round(2) if assigned
  result
end
That will return a hash with the IDs as keys and the results of the division as the values.
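Below is a runnable plain-Ruby sketch of the same per-user division, with each_with_object swapped in for inject so the block doesn't have to return the accumulator. The comment counts for ids 40, 41 and 43 come from the question's output. The assigned counts and id 99 are hypothetical.

```ruby
# Sample counts; id 99 has comments but no assigned reviews.
top_level_comment_count    = { 40 => 5, 41 => 1, 43 => 4, 99 => 3 }
code_review_assigned_count = { 40 => 2, 41 => 4, 43 => 1 }

averages = top_level_comment_count.each_with_object({}) do |(id, comments), result|
  assigned = code_review_assigned_count[id]
  next if assigned.nil? || assigned.zero? # nothing to divide by: skip this user
  result[id] = comments.fdiv(assigned).round(2)
end

p averages # averages == { 40 => 2.5, 41 => 0.25, 43 => 4.0 }
```

Users missing from the second hash are left out of the result. You could instead store nil for them if the caller needs every id.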

Which way to "rank" a dataset in MySQL?

The situation is as follows:
I have a database with 40,000 cities. Those cities have certain types of properties, each with a value.
For example "mountains" or "beaches": if a city has lots of mountains, its value for mountains will be high; if it has fewer mountains, the number is lower.
I have a table with the city names, properties and values, and another table with the average values of all those properties.
What I need: the user searches for a city that has one or more properties, and I want to find the best matches and attach a score from 0 to 100 to each.
This is how I do it:
1. First I get the 25%, 50% and 75% values (percentiles) for the properties:
_var_[property]_25 = [integer]
_var_[property]_50 = [integer]
_var_[property]_75 = [integer]
2. Then I apply this algorithm (a higher property value counts as a better match):
_var_user_search_for_properties = [mountain,beach]
_var_max_property_percentage = 100 / [number of properties the user searches for]
_var_match_percentage = 0
for each [property] in _var_user_search_for_properties
    if [property] >= _var_[property]_75 then
        _var_match_percentage += _var_max_property_percentage
    elseif [property] >= _var_[property]_50 then
        _var_match_percentage += _var_max_property_percentage / 4 * 3
    elseif [property] >= _var_[property]_25 then
        _var_match_percentage += _var_max_property_percentage / 4 * 2
    elseif [property] > 0 then
        _var_match_percentage += _var_max_property_percentage / 4 * 1
    end if
next
order all rows by _var_match_percentage desc
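Below is a plain-Ruby sketch of the tiered scoring. It assumes a higher property value means a better match, so the tiers are checked from the 75th percentile downward. The Thresholds struct, match_percentage, and all city names and numbers are hypothetical. In MySQL the same tiers could be written as one CASE expression per property, summed in the SELECT and used in ORDER BY. Here they are computed in memory only to show the arithmetic.

```ruby
# Hypothetical percentile thresholds per property (step 1).
Thresholds = Struct.new(:p25, :p50, :p75)

# Score one city (a Hash of property => value) against the searched
# properties. Assumption: a higher property value means a better match.
def match_percentage(city, searched, thresholds)
  max_per_property = 100.0 / searched.size
  searched.sum do |property|
    value = city.fetch(property, 0)
    t = thresholds.fetch(property)
    if value >= t.p75    then max_per_property
    elsif value >= t.p50 then max_per_property * 3 / 4
    elsif value >= t.p25 then max_per_property * 2 / 4
    elsif value > 0      then max_per_property * 1 / 4
    else 0
    end
  end
end

# Made-up sample data for illustration.
thresholds = {
  mountain: Thresholds.new(10, 40, 80),
  beach:    Thresholds.new(5, 20, 60)
}
cities = {
  "Alpville"  => { mountain: 90, beach: 1 },
  "Shoreton"  => { mountain: 0,  beach: 70 },
  "Middleham" => { mountain: 45, beach: 25 }
}
searched = [:mountain, :beach]

ranked = cities
  .map { |name, props| [name, match_percentage(props, searched, thresholds)] }
  .sort_by { |_name, score| -score }

ranked.each { |name, score| puts format("%-10s %5.1f", name, score) }
```

With two searched properties, each property is worth at most 50 points. A city that is strong in both therefore ranks above one that is excellent in only one.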
The question is: is it possible to do this in MySQL?
How do I calculate this "match percentage" with it?
Or will it be faster to get all the rows and indexes out of the database and loop through them all in .NET?
If the percentages can be stored in the database, you could try MySQL's LIMIT clause. See http://www.mysqltutorial.org/mysql-limit.aspx.

Data set too large to load into memory for processing

I have a large, rapidly growing data set of around 4 million rows. To define and exclude the outliers (for statistics/analytics use), I need the algorithm to consider every entry in the data set. However, this is too much data to load into memory, and my system chokes. I'm currently using this to collect and process the data:
scoreInnerFences = innerFence(Post.where(:source => 1).
  order(:score).
  pluck(:score))
I don't think the typical divide-and-conquer method will work, because every entry has to be considered to keep my outlier calculation accurate. How can this be achieved efficiently?
innerFence identifies the lower and upper quartiles of the data set, then uses them to calculate the outlier thresholds. Here is the (yet to be refactored, non-DRY) code for this:
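# s must be sorted ascending. When s.length is a multiple of 4, the quartile
# falls between two values, so average them (2.0 avoids integer division).
def q1(s)
  q = s.length / 4
  if s.length % 4 == 0
    (s[q - 1] + s[q]) / 2.0
  else
    s[q]
  end
end
# Upper quartile, mirroring q1 from the top end of the sorted array.
def q3(s)
  n = s.length
  q = n / 4
  if n % 4 == 0
    (s[n - q - 1] + s[n - q]) / 2.0
  else
    s[n - q - 1]
  end
end
# Tukey's inner fences lie 1.5 * IQR beyond the quartiles; 3 * IQR gives the outer fences.
def innerFence(s)
  lower = q1(s)
  upper = q3(s)
  iqr = upper - lower
  [lower - 1.5 * iqr, upper + 1.5 * iqr]
end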
This is not the best way, but it is an easy one: do several queries. First, count the number of scores:
posts = Post.where(:source => 1)
n = posts.count
then work out where the quartiles fall:
q = n / 4
take = n % 4 == 0 ? 2 : 1 # two neighbouring scores get averaged when n is a multiple of 4
then fetch only those scores:
lower = posts.order(:score).offset(n % 4 == 0 ? q - 1 : q).limit(take).pluck(:score)
upper = posts.order(:score).offset(n - q - 1).limit(take).pluck(:score)
q1 = lower.sum.to_f / lower.size
q3 = upper.sum.to_f / upper.size
The database does the sorting, so only a few scores ever have to be loaded into Ruby.
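The fiddly part of the count-then-offset approach is the offset arithmetic. A hypothetical helper, quartile_offsets, can compute the [offset, limit] pair for each quartile from the row count alone. Below it is checked against an in-memory sorted array. fetch_scores (also hypothetical) stands in for posts.order(:score).offset(offset).limit(limit).pluck(:score). The sketch assumes at least one score and uses the convention of averaging two neighbouring scores when n is a multiple of 4.

```ruby
# Hypothetical helper: [offset, limit] pairs (zero-based, ascending score
# order) needed for the lower and upper quartile of n scores.
def quartile_offsets(n)
  raise ArgumentError, "need at least one score" if n < 1

  q = n / 4
  if n % 4 == 0
    { lower: [q - 1, 2], upper: [n - q - 1, 2] } # average two neighbours
  else
    { lower: [q, 1], upper: [n - q - 1, 1] }
  end
end

# In-memory stand-in for an ORDER BY score OFFSET ... LIMIT ... query.
def fetch_scores(sorted_scores, offset, limit)
  sorted_scores[offset, limit]
end

def quartiles(sorted_scores)
  quartile_offsets(sorted_scores.size).transform_values do |(offset, limit)|
    values = fetch_scores(sorted_scores, offset, limit)
    values.sum.to_f / values.size
  end
end

p quartiles((1..8).to_a)
p quartiles([10, 20, 30, 40, 50, 60, 70])
```

Only the count and at most four scores are needed, so memory use stays flat as the table grows. An index covering source and score would let the database answer the offset queries without a full sort.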
For large datasets, I sometimes drop down below ActiveRecord. ActiveRecord is a memory hog, even, I imagine, when using pluck. Of course it's less portable, but sometimes it's worth it.
scores = Post.connection.execute('select score from posts where source = 1 order by score').map(&:first)
I don't know if that will help enough for 4 million records. If not, maybe look at a stored procedure?

ActiveRecord joins and where

I have three models: Company, Deal and Slot. Company has_many deals and Deal has_many slots. A company is expired if all of its deals are expired, and a deal is expired when all of its slots are expired.
I have written a scope:
scope :expired,
  lambda { |within|
    self.select(
      'DISTINCT companies.*'
    ).latest(within).joins(
      :user => { :deals => :slots }
    ).where(
      "companies.spam = false AND deals.deleted_at IS NULL
      AND deals.spam = false AND slots.state = 1
      OR slots.begin_at <= :time",
      :time => Time.zone.now + SLOT_EXPIRY_MARGIN.minutes
    )
  }
The above scope does not seem right for what I am trying to achieve. I need the companies for which every slot of every deal is either in state 1 or has a begin_at earlier than :time, which makes it expired.
Thanks in advance for having a look.
AND has a higher precedence than OR in SQL, so your where clause actually gets parsed like this:
(
companies.spam = false
and deals.deleted_at is null
and deals.spam = false
and slots.state = 1
)
or slots.begin_at <= :time
For example (trimmed a bit for brevity):
mysql> select 1 = 2 and 3 = 4 or 5 = 5;
+---+
| 1 |
+---+
mysql> select (1 = 2 and 3 = 4) or 5 = 5;
+---+
| 1 |
+---+
mysql> select 1 = 2 and (3 = 4 or 5 = 5);
+---+
| 0 |
+---+
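Ruby's && and || follow the same rule as SQL's AND and OR (&& binds tighter than ||). The MySQL session above can therefore be reproduced in plain Ruby as a quick sanity check:

```ruby
a = (1 == 2 && 3 == 4 || 5 == 5)   # parsed as (false && false) || true
b = ((1 == 2 && 3 == 4) || 5 == 5) # the same grouping, written explicitly
c = (1 == 2 && (3 == 4 || 5 == 5)) # false && (...) short-circuits to false
p [a, b, c] # => [true, true, false]
```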
Also, you might want to use a placeholder instead of the literal false in the SQL, which should make things easier if you want to switch databases (but of course, database portability is largely a myth, so that's just a suggestion); you could also just use NOT in the SQL. Furthermore, a class method is the preferred way to accept arguments for scopes. Using scoped instead of self is also a good idea in case other scopes are already in play, but if you use a class method, you don't have to care.
If we fix the grouping in your SQL with some parentheses, use NOT instead of the literal false, and switch to a class method:
def self.expired(within)
  select('distinct companies.*').
    latest(within).
    joins(:user => { :deals => :slots }).
    where(%q{
      not companies.spam
      and not deals.spam
      and deals.deleted_at is null
      and (slots.state = 1 or slots.begin_at <= :time)
    }, :time => Time.zone.now + SLOT_EXPIRY_MARGIN.minutes)
end
You could also write it like this if you prefer little blobs of SQL rather than one big one:
def self.expired(within)
  select('distinct companies.*').
    latest(within).
    joins(:user => { :deals => :slots }).
    where('not companies.spam').
    where('not deals.spam').
    where('deals.deleted_at is null').
    where('slots.state = 1 or slots.begin_at <= :time', :time => Time.zone.now + SLOT_EXPIRY_MARGIN.minutes)
end
This one also neatly sidesteps your "missing parentheses" problem.
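Even with the parentheses fixed, a WHERE over joined rows keeps a company if any one of its slot rows matches. The question's definition instead needs every slot of every deal to be expired. The plain-Ruby sketch below shows the difference with hypothetical in-memory structs and made-up dates. slot_expired? mirrors the slot condition from the question: state 1, or beginning before the cutoff. In SQL, the "all" version is usually expressed as NOT EXISTS (or NOT IN) over the slots that are not expired.

```ruby
# Hypothetical in-memory stand-ins for the models.
Slot    = Struct.new(:state, :begin_at)
Deal    = Struct.new(:slots)
Company = Struct.new(:name, :deals)

def slot_expired?(slot, cutoff)
  slot.state == 1 || slot.begin_at <= cutoff
end

# What a WHERE over joined rows effectively checks: some slot matches.
def any_slot_expired?(company, cutoff)
  company.deals.any? { |d| d.slots.any? { |s| slot_expired?(s, cutoff) } }
end

# The question's definition: every slot of every deal is expired.
# (all? is vacuously true for a company with no deals or slots.)
def company_expired?(company, cutoff)
  company.deals.all? { |d| d.slots.all? { |s| slot_expired?(s, cutoff) } }
end

cutoff = Time.utc(2012, 1, 1)
mixed = Company.new("Mixed", [
  Deal.new([Slot.new(1, Time.utc(2013, 1, 1)), Slot.new(0, Time.utc(2013, 6, 1))])
])
done = Company.new("Done", [
  Deal.new([Slot.new(1, Time.utc(2013, 1, 1)), Slot.new(0, Time.utc(2011, 6, 1))])
])

p any_slot_expired?(mixed, cutoff) # => true
p company_expired?(mixed, cutoff)  # => false
p company_expired?(done, cutoff)   # => true
```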
UPDATE: Based on the discussion in the comments, I think you're after something like this:
def self.expired(within)
  select('distinct companies.*').
    latest(within).
    joins(:user => :deals).
    where('not companies.spam').
    where('not deals.spam').
    where('deals.deleted_at is null').
    where(%q{
      companies.id not in (
        select company_id
        from slots
        where state = 1
          and begin_at <= :time
        group by company_id
        having count(*) >= 10
      )
    }, :time => Time.zone.now + SLOT_EXPIRY_MARGIN.minutes)
end
That bit of nastiness at the bottom grabs the IDs of all companies that have ten or more expired or used slots, and then companies.id not in (...) excludes them from the final result set.