Query ActiveRecord for records and relation calculations at once - MySQL

TL;DR? See Edit 2
I've got a little Rails application with a few different sorts of games people can play. It's based around sports: users pick the winners of each game every week (model PickEm, with a boolean correct attribute that is nil for unfinished games) and predict the outcome of a specific team's game (model Guess, with an integer score attribute, also nil for unfinished games). Every User has_many PickEms and Guesses, and I'm trying to display standings (correct/total, where total is all non-nil records, and score/total possible).
What I'm finding is that I can gather the users and their associated records, but when I try to display standings, every single User triggers another query - slow and not sustainable as the user base grows. That's because user.pick_em_score is pick_ems.where(correct: true).size and user.guess_score is guesses.where.not(score: nil).sum(:score), so each call to user.pick_em_score runs its own query. I feel like there should be a way to get every User, along with these specific counts, at once, rather than loading a pile of needless extra data.
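For reference, the two methods described above would look something like this in the User model (reconstructed from the description; not copied from the actual app):
class User < ActiveRecord::Base
  has_many :pick_ems
  has_many :guesses

  # Each call fires its own query per user: the N+1 problem described above.
  def pick_em_score
    pick_ems.where(correct: true).size
  end

  def guess_score
    guesses.where.not(score: nil).sum(:score)
  end
end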
What I need:
User record
User.pick_em_score (calculated by counting correct records)
User.pick_ems count where correct is NOT NULL
User.guesses_score (calculated by guesses.sum(:score))
User.guesses count where score is NOT NULL
Most of what I find on Rails's ActiveRecord helpers, especially the calculation methods, covers retrieving only the calculation. It looks like I'll probably need to delve directly into select() and friends, but I can't get it working. Can someone point me in the right direction?
Edit
For clarification: I'm aware that I could write this information to the User model, but that is overly restrictive: next season I'd need to add a new column to User for that year's results, and so on. It would also be a third degree of callbacks updating related models – the Match model already updates related PickEms and Guesses on save. I'm looking for the simplest ActiveRecord query or queries that let me work with this information, as indicated by the title. Ideally one query that returns all of the above, but if it needs to be a few, that's OK.
I used to work directly with MySQL in PHP, but those skills have rusted (in raw MySQL, I imagine, I'd use several sub-select statements to pull these counts), and I'd like to use Rails's ActiveRecord helpers and avoid constructing raw SQL as much as possible.
Second Edit:
I seem to have it down to one call that starts to work, but I'm writing a lot of SQL. It also feels brittle, and trying to build on it has failed. It looks like I'm just pushing Rails's million singular SELECT queries down into SQL, but that may still be a step up.
User.unscoped.select('users.*',
  '(SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct) AS correct_pick_ems',
  '(SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct IS NOT NULL) AS total_pick_ems',
  '(SELECT SUM(guesses.score) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_score',
  '(SELECT COUNT(*) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_count')
The issue seems to be: is there a way to use Rails, rather than raw SQL, to link the users.id we see there to these subqueries? Or, more generally, a better way to construct this?
In addition, I need a WHERE that hinges on total_pick_ems and guesses_count being > 0, but since I can't reference those aliased columns in the WHERE, I end up repeating the subqueries one more time.
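One possible direction (an untested sketch, reusing the same subqueries as above): wrap the annotated relation in a derived table with from(), which recent versions of ActiveRecord accept as a relation plus an alias. The aliases then become real columns that an outer where() can reference without repeating the subqueries.
standings = User.unscoped.select('users.*',
  '(SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct) AS correct_pick_ems',
  '(SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct IS NOT NULL) AS total_pick_ems',
  '(SELECT SUM(guesses.score) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_score',
  '(SELECT COUNT(*) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_count')

# The aliases are now plain columns of the derived table:
User.from(standings, :users).where('total_pick_ems > 0 OR guesses_count > 0')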

Welcome to AR. It's really only good for simple CRUD-like queries. Once you actually want to query your data in anger, it just doesn't have the capabilities to express the queries you want without resorting to wholesale SQL strings, and you often lose the ability to chain as a result.
It's precisely why I moved to Sequel: it has the features to compose queries using a much fuller SQL feature set, including join conditions, window functions, recursive common table expressions, and advanced eager loading. The author is incredibly responsive, and the documentation is excellent compared to AR and Arel.
I don't expect you will like this answer, but a time will come when you start to look outside the opinionated components that ship with Rails, which I have to say are hardly best of breed. Sequel also sped my application up many times over what I was able to get with AR; it's not just developer happiness, it means fewer servers to run. Yes, there will be a learning curve, but IMO it's better to learn tools that have your back covered.
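For a rough flavour, here is an untested sketch of how the standings query from the question might compose in Sequel (connection details assumed; table and column names taken from the question):
require 'sequel'

DB = Sequel.connect(ENV.fetch('DATABASE_URL'))

# Correlated subqueries built as plain datasets, then aliased into the
# outer select; no SQL strings needed, and everything stays chainable.
correct_picks = DB[:pick_ems]
  .where(Sequel[:pick_ems][:user_id] => Sequel[:users][:id], correct: true)
  .select(Sequel.function(:count).*)

guess_score = DB[:guesses]
  .where(Sequel[:guesses][:user_id] => Sequel[:users][:id])
  .exclude(score: nil)
  .select(Sequel.function(:sum, :score))

DB[:users]
  .select_append(correct_picks.as(:correct_pick_ems),
                 guess_score.as(:guesses_score))
  .all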

Joins might work. Something like the below:
# Note: joining guesses and pick_ems in one query multiplies rows per
# user, so these sums/counts can be inflated; COUNT(DISTINCT ...) or
# separate queries may be needed.
User.unscoped.joins(:guesses).joins(:pick_ems).
  where("guesses.score IS NOT NULL").
  select("users.*,
          sum(guesses.score) as guesses_score,
          count(guesses.id) as guesses_count,
          count(case when pick_ems.correct = true then 1 else null end) as correct_pick_ems,
          count(case when pick_ems.correct is not null then 1 else null end) as total_pick_ems").
  group("users.id")
If you need this information for a limited number of users at a time, then the above query or eager loading (User.includes(:guesses, :pick_ems)) with instance methods like
def correct_pick_ems
  pick_ems.count(&:correct)
end
would work.
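The guess totals could follow the same loaded-association pattern (a sketch; the method names are assumed to mirror the question):
def guesses_score
  guesses.reject { |g| g.score.nil? }.sum(&:score)
end

def guesses_count
  guesses.count { |g| !g.score.nil? }
end
Because these iterate over the already-loaded associations, no extra query fires per user.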
However, if you need this information for all the users most of the time, cached counters within the users table would be more efficient.

What you need is some sort of custom (smart) counter_cache that counts only under certain conditions (e.g. correct is true).
You can achieve this using conditional after_save and after_destroy triggers to build your own custom counter_cache, like this:
class PickEm < ActiveRecord::Base
  belongs_to :user

  after_save    :increment_finished_counter_cache, if: Proc.new { |pick_em| pick_em.correct }
  after_destroy :decrement_finished_counter_cache, if: Proc.new { |pick_em| pick_em.correct }

  private

  def increment_finished_counter_cache
    # update_column does not trigger any validations or callbacks
    user.update_column(:finished_games_counter, user.finished_games_counter + 1)
  end

  def decrement_finished_counter_cache
    # update_column does not trigger any validations or callbacks
    user.update_column(:finished_games_counter, user.finished_games_counter - 1)
  end
end
Notes:
Code not tested (only to show the idea).
Some suggest it's better not to name custom counters the way Rails names its own (foo_counter_cache).
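The counter column has to exist first, of course. A hypothetical migration for it (column name taken from the callbacks above):
class AddFinishedGamesCounterToUsers < ActiveRecord::Migration
  def change
    # Default to zero so the increment/decrement math starts from a number
    add_column :users, :finished_games_counter, :integer, default: 0, null: false
  end
end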

You should benchmark it, but my hunch is that putting all of that data into a single SELECT isn't going to be much faster than breaking it up into separate SELECTs (I've actually had cases where the latter was faster). By breaking it up, you can also stick to more ActiveRecord and less raw SQL, e.g.:
user_ids_to_pick_em_score = User.joins(:pick_ems).where(pick_ems: {correct: true}).group(:user_id).count
user_ids_to_pick_ems_count = User.joins(:pick_ems).where.not(pick_ems: {correct: nil}).group(:user_id).count
user_ids_to_guesses_score = Hash[User.select("users.id, SUM(guesses.score) AS total_score").joins(:guesses).group(:user_id).map{|u| [u.id, u.total_score]}]
user_ids_to_guesses_count = User.joins(:guesses).where.not(guesses: {score: nil}).group(:user_id).count
Edit: To display them, you could do something like:
<%- User.select(:id, :name).find_each do |u| -%>
  Name: <%= u.name %>
  Picks Correct: <%= user_ids_to_pick_em_score[u.id] %>/<%= user_ids_to_pick_ems_count[u.id] %>
  Total Score: <%= user_ids_to_guesses_score[u.id] %>/<%= user_ids_to_guesses_count[u.id] %>
<%- end -%>

Related

Thinking Sphinx ranking and statistics

I'm trying to get some statistics out of my Sphinx indexes, but I'm not sure how to get the info I want.
I have a MySQL db with articles, a Sphinx index set up for that db, and full-text search all working. What I want are some numbers:
How many times the search text (a keyword or key phrase) appears across all articles for all time (more likely limited to "articles from time interval X to Y")
Same as the previous, but for how many times two keywords or key phrases (so "x AND y") appear in the same articles
I was doing something similar to the first one manually, using a .bat file I made:
indexer ind_core -c c:\%SOME_PATH%\development.sphinx.conf --buildstops stats.txt 10000 --buildfreqs
which generated a txt file of all repeating keywords and how often they appear. That was at an early development stage and helped me form a list of keywords I'm interested in. Now I'm trying to do the same, but for a finite list of predetermined keywords, integrated into my Rails project so I can build charts in the future.
I tried running some queries like
@testing = Article.search 'Keyword1 AND Keyword2', :ranker => :wordcount
but I'm not sure how it works or how to process the result, or whether that's even what I'm looking for.
Another approach I tried was manual mysql queries such as
SELECT id,title,WEIGHT() AS w FROM ind_core WHERE MATCH('#title keyword1 | keyword2') OPTION ranker=expr('sum(hit_count)');
but I'm not sure how to process the results from here either (or how to actually integrate it into my existing Rails project), and it's limited to 20 rows per query (which I think I can change somewhere in the settings?). But at least, looking at the MySQL results, what I'm interested in is hit_count over all articles (or all articles from a set timeframe).
Any ideas on how to do this?
UPDATE:
The current way I found is to add
@testing = Article.search params[:search], :without => {:is_active => false}, :ranker => :bm25
to the controller, with some conditions (so it doesn't bug out on a nil search). :is_active is my soft-delete flag; I don't want to search deleted entries, so don't mind it. And in the view I simply displayed
<%= @testing.total_entries %>
which, if I understand it correctly, shows the number of matches Sphinx found (so pretty much what I was looking for).
So, to figure out the number of hits per document, you're pretty much on the right track; it's just a matter of getting it into Ruby/Thinking Sphinx.
To get the raw Sphinx results (if you don't need the ActiveRecord objects):
search = Article.search "foo",
  :ranker => "expr('SUM(hit_count)')",
  :select => "*, weight()",
  :middleware => ThinkingSphinx::Middlewares::RAW_ONLY
… this will return an array of hashes, and you can use the weight() string key for the hit count, and the sphinx_internal_id string key for the model's primary key (id is Sphinx's own primary key, which isn't so useful).
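For example, consuming those hashes might look like this (key names as described above; an untested sketch):
search.each do |row|
  # "weight()" holds the computed hit count; "sphinx_internal_id" is the
  # underlying model's primary key.
  puts "Article #{row['sphinx_internal_id']}: #{row['weight()']} hits"
end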
Or, if you want to use the ActiveRecord objects, Thinking Sphinx has the ability to wrap each search result in a helper object which passes appropriate methods through to the underlying model instances, but lets weight respond with the values from Sphinx:
search = Article.search "foo",
  :ranker => "expr('SUM(hit_count)')",
  :select => "*, weight()"; ""
search.context[:panes] << ThinkingSphinx::Panes::WeightPane
search.each do |article|
  puts article.weight
end
Keep in mind that panes must be added before the search is evaluated, so if you're testing this in a Rails console, you'll want to avoid letting the console inspect the search variable (which I usually do by adding ; "" at the end of the initial search call).
In both of these cases, as you've noted, the search results are paginated - you can use the :page option to determine which page of results you want, and :per_page to determine the number of records returned in each request. There is a standard limit of 1000 results overall, but that can be altered using the max_matches setting.
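So, for instance, fetching the second page of 50 results might look like this (option names as above):
search = Article.search "foo", :page => 2, :per_page => 50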
Now, if you want the number of times the keywords appear across all Sphinx records, the best way to do that while still taking advantage of Thinking Sphinx's search options is to get the raw results of an aggregate SUM - similar to the first option above.
search = Article.search "foo",
  :ranker => "expr('SUM(hit_count)')",
  :select => "SUM(weight()) AS count",
  :middleware => ThinkingSphinx::Middlewares::RAW_ONLY
search.first["count"]

yii pagination issue trying to use 2 criterias

Disclaimer: I'm self-taught. I got my rudimentary knowledge of PHP from reading forums. I'm an SQL newb, and I know next to nothing about Yii.
I've got a controller that shows the products on our webstore. I would like the out of stock products to show up on the last pages.
I know I could sort by stock quantity, but I would like the in-stock products to change order every time the page is reloaded.
My solution (probably wrong, but it kinda works) is to run two queries: one for the products that have stock, sorted randomly, and one for the out-of-stock products, also ordered randomly. I then merge the two resulting arrays. This much works using the code below (although I feel like there must be a more efficient way than running two queries).
The problem is that this messes up the pagination: every product returned is listed on the same page, and changing pages shows the same results. As far as I can tell, the pagination only works on one CDbCriteria at a time. I've looked at the Yii docs for CPagination for a way around this but am not getting anywhere.
$criteria=new CDbCriteria;
$criteria->alias = 'Product';
$criteria->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria->addCondition('Product.parent IS NULL');
$criteria->addCondition('web=1');
$criteria->addCondition('current=1');
$criteria->addCondition('sell>sell_web');
$criteria->order = 'RAND()';
$criteria2=new CDbCriteria;
$criteria2->alias = 'Product';
$criteria2->addCondition('(inventory_avail<1 AND inventoried=1)');
$criteria2->addCondition('Product.parent IS NULL');
$criteria2->addCondition('web=1');
$criteria2->addCondition('current=1');
$criteria2->addCondition('sell>sell_web');
$criteria2->order = 'RAND()';
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
//I know there is something wrong here, no idea how to fix it..
$count=Product::model()->count($criteria);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria);
$this->render('index', array(
'models' => $models,
'pages' => $pages
));
Clearly I am in over my head. Any help would be much appreciated.
Edit:
I figured that a third CDbCriteria including both the in-stock and out-of-stock items could be used for the pagination (as it would match the same number of products as the combined results of the first two). So I tried adding this ($criteria and $criteria2 remain the same):
$criteria3=new CDbCriteria;
$criteria3->alias = 'Product';
//$criteria3->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria3->addCondition('Product.parent IS NULL');
$criteria3->addCondition('web=1');
$criteria3->addCondition('current=1');
$criteria3->addCondition('sell>sell_web');
//$criteria3->order = 'RAND()';
$count=Product::model()->count($criteria3);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria3);
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
$this->render('index', array(
'models' => $models,
'pages' => $pages
));
I'm sure I'm missing something super obvious here... Been searching all day getting nowhere.
You are running into what is, IMO, one of the potential drawbacks of natural-language query-builder frameworks: they can lead your thinking about a SQL problem down a bad path when you try to stick to the "out of the box" query-building methods. Sometimes you need to use the raw SQL capability that almost every framework provides in order to best address your problem.
So let's start with the basic SQL for how I would suggest approaching your problem. You can either work this into your query-builder style (if possible) or run it as a raw query.
You can easily form a calculated field representing a binary inventory status for sorting, then sort by other criteria secondarily.
SELECT
    field1,
    field2,
    /* other fields */
    IF(inventory_avail > 0, 1, 0) AS in_inventory
FROM product
WHERE /* where conditions */
ORDER BY
    in_inventory DESC,      /* sort items in inventory first */
    other_field_to_sort ASC /* other sort criteria */
LIMIT ?, ?                  /* pagination offset and row count */
Note that this approach returns only the rows of data you need to display. You move away from your current approach of doing a lot of work in the application to merge record sets.
I do question the use of RAND() for pagination purposes, since products can appear on one page after another as the user paginates, while other products never show up at all. Either that, or you need additional complexity in your application to somehow track the "randomized" version of the entire result set for each specific user. For this reason, it is really unusual to see order randomization for paginated result displays.
I know you mentioned you might like to show a randomized view to the user on a "first page". If that is desired, OK, but perhaps decouple or differentiate that specific view from the wider paginated view of the product listing, so as not to confuse the end user with a seemingly unpredictable pagination interface.
In your ORDER BY clause, you should always have enough sorting conditions to where the final (most specific) condition will guarantee you a predictable order result. Oftentimes this means you have to include an autoincrementing primary key field, or similar field that provides uniqueness for the row.
So let's say, for example, I gave the user the ability to sort items by price, but you still obviously wanted to show all inventoried items first. Now let's say you have 100K products, such that you will have many "pages" of products with a common price when ordered by price.
If you used this for ordering:
ORDER BY in_inventory DESC, price ASC
You could still have the problem of a user seeing the same product repeated when navigating between pages, because no criterion more specific than price was given, and ordering beyond that criterion is not guaranteed.
You would probably want to do something like:
ORDER BY in_inventory DESC, price ASC, unique_id ASC
Such that the order is totally predictable (even though the user may not even know there is sorting being applied by unique id).

Is it good to filter data in controller or use sql query in model?

What is the best approach for searching? What will the difference be if I filter all the data in the controller and get the result, versus using a where query in the model? Please share your opinion.
It depends on the complexity of your query.
You can measure the processing time by putting timing flags in your code between each step.
Then you run a speed test like:
print time_flag 1
results_sql_processing = *complex query*
print time_flag 2
raw_results_script_processing = *dump query*
print time_flag 3
results_script_processing = *processing the data*
print time_flag 4
and make sure results_script_processing == results_sql_processing.
You can also set different dataset sizes (LIMIT 100, 500, 1000) and see how the difference evolves between the two solutions.
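In a Rails app, the same timing-flag idea can be expressed with Ruby's standard Benchmark library. A sketch, with a made-up model and an assumed active column purely for illustration:
require 'benchmark'

Benchmark.bm(20) do |x|
  # Filtering in the database (the "complex query" case)
  x.report('SQL-side where')   { User.where(active: true).to_a }
  # Filtering in application code (the "dump query + processing" case)
  x.report('Ruby-side select') { User.all.to_a.select { |u| u.active } }
end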
Also, I would recommend taking a look at the query builders found in many frameworks (I use Laravel's query builder).
They are usually a very good compromise when the query isn't too complex (no data aggregation, complex concatenation, ...), and you can still use joins, unions, and many filters with them.
But if you want a super-complex pivot table, for instance, just build a solid SQL query and then fire it from your code!

Find users created during a specific hour range (any day)

For fun, I'd like to see the set of users (in a Rails database) who were created during a specific hour range (2AM - 5AM, to be specific), but on any day. They all have the typical created_at field. I think I know how to extract the hour from this field for one user and check whether it falls in the range - but how do I do this for all of them? Should I just loop through them? (Even as I write it, that sounds naive.)
The first part of Sontya's answer is the easy solution in Rails.
I would, however, move that part to its own place inside your class, to separate your code from the framework a bit more.
# app/models/user.rb
class User < ActiveRecord::Base
  # ...
  def self.get_users_created_between(start_time, end_time)
    User.where("TIME(created_at) BETWEEN TIME(?) AND TIME(?)", start_time, end_time)
  end
  # ...
end
And use it like this:
irb> User.get_users_created_between(Time.parse("2pm"), Time.parse("5pm"))
This provides you with a couple of benefits:
You can reuse it all over your code without ever having to worry about the syntax of the where or time ranges.
If for some weird reason Rails decides to change the interface for this, you only need to edit one method, not code in a thousand places all over your project.
You can easily move this piece of code out of user.rb if you feel user.rb is getting too big, maybe to a dedicated finder or query class, or to something like a repository pattern.
PS: Time functions vary between DBMSes like MySQL, PostgreSQL, MSSQL, etc. I don't know if there is a generic way to do this. This answer is for MySQL.
Try this:
User.where(created_at: Time.parse("2pm")..Time.parse("5pm"))
Or something like this (note that it loads every user into memory to filter in Ruby):
User.select { |user| user.created_at.hour.between?(2, 5) }
To return users that where created between two hours on any given day, use this:
User.where('HOUR(created_at) BETWEEN ? AND ?', 2, 5)
Please note that HOUR(created_at) only works for MySQL. The equivalent in PostgreSQL is EXTRACT(HOUR FROM created_at), and in SQLite it's strftime('%H', created_at).

nested sql queries in rails

I have the following query
@initial_matches = Listing.find_by_sql(["SELECT * FROM listings WHERE industry = ?", current_user.industry])
Is there a way I can run another SQL query on the selection from the above query, using an each do? I want to run Geokit calculations to eliminate certain listings that are outside of a specified distance...
Your question is slightly confusing. Do you want to use each..do (Ruby) to do the filtering, or do you want to use a SQL query? Here is how you can let the Ruby process do the filtering:
refined_list = @initial_matches.map { |listing|
  listing.out_of_bounds? ? nil : listing
}.compact
If you wanted to use SQL, you could simply add additional SQL (maybe a sub-select) into your Listing.find_by_sql call.
If you want to do as you say in your comment:
WHERE location1.distance_from(location2, :units=>:miles)
you are mixing Ruby (location1.distance_from(location2, :units=>:miles)) and SQL (WHERE X > 50). This is difficult, but not impossible.
However, if you have to do the distance calculation in Ruby anyway, why not do the filtering there as well? So, in the spirit of my first example:
listing2 = some_location_to_filter_by
@refined_list = @initial_matches.map { |listing|
  listing.distance_from(listing2) > 50 ? nil : listing
}.compact
This will iterate over all listings, discarding those that are further than 50 from the predetermined listing.
EDIT: If this logic is done in the controller, you need to assign to @refined_list instead of refined_list, since only controller instance variables (as opposed to local ones) are accessible to the view.
In short, no. After the initial query, you are not left with a relational table or view; you are left with an array of ActiveRecord objects. So any processing to be done after the initial query has to happen in Ruby and ActiveRecord, not SQL.
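One way to keep composing in SQL (a sketch for Rails 3+, using the question's column names) is to hold off on find_by_sql and stay in a relation for as long as possible:
# where() returns a chainable relation rather than an array, so further
# SQL conditions can still be appended before anything is loaded.
@initial_matches = Listing.where(industry: current_user.industry)
# e.g. a hypothetical extra SQL filter before the Geokit step in Ruby:
@initial_matches = @initial_matches.where('created_at > ?', 1.year.ago)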