Calculate new average afterSave (or something better?) in CakePHP - mysql

I'm building a rating system where a user can rate something from 1-5 stars.
I was wondering if there's a way to automatically calculate all of a specific item's ratings (from the ratings table where model='x' and foreign_key='y') on afterSave or something similar.
I can do it in the ratings_controller just fine... just thought it might be more ideal to be done automatically in the model. Can anyone point me in the right direction for this?
I would LOVE to hear that there's some kind of association setting in CakePHP that allows it to do this for you - something like:
//Rating model
var $belongsTo = array(
'Restaurant' => array(
'averageValue' => 'rating'
)
);
But - I'm sure that's asking too much :)

If you want to save the average into a field in the items table, then afterSave is probably the best solution right now.
The only thing Cake can do automatically for you is keep track of how many ratings an item has (counterCache); it does not handle other aggregate functions.
A virtualField might also work, but I have never used one for aggregate functions, so I'm not sure. Besides, if your ratings don't change often, recalculating on every read would put unnecessary work on the system.
In Rating model:
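function afterSave($created) {
    // The saved row is available in $this->data, not as a property of the model
    $restaurantId = (int) $this->data['Rating']['restaurant_id'];
    $avgValue = $this->query('SELECT AVG(rating) AS rating FROM ratings WHERE ratings.restaurant_id = ' . $restaurantId);
    $this->Restaurant->updateRatingAverage($restaurantId, $avgValue[0][0]['rating']);
}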
In Restaurant model
function updateRatingAverage($id,$avg){
$this->id = $id;
$this->saveField('your_average_field_here', $avg); // field() only reads a value; saveField() writes it
}
You might want to log $avgValue to see how it's structured, but I think I got that right.
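The recompute-on-save idea can also be sketched outside CakePHP. The snippet below is a minimal Python/SQLite stand-in, not the answerer's code: the `rating_average` column and the `save_rating` helper are hypothetical names chosen for illustration, and the table layout is assumed from the query above.

```python
import sqlite3

def save_rating(conn, restaurant_id, rating):
    """Insert a rating, then refresh the cached average on the restaurant row."""
    conn.execute(
        "INSERT INTO ratings (restaurant_id, rating) VALUES (?, ?)",
        (restaurant_id, rating),
    )
    # Equivalent of the afterSave step: recompute AVG() for this restaurant only.
    conn.execute(
        "UPDATE restaurants SET rating_average = "
        "(SELECT AVG(rating) FROM ratings WHERE restaurant_id = ?) "
        "WHERE id = ?",
        (restaurant_id, restaurant_id),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (id INTEGER PRIMARY KEY, rating_average REAL)")
conn.execute(
    "CREATE TABLE ratings (id INTEGER PRIMARY KEY, restaurant_id INTEGER, rating INTEGER)"
)
conn.execute("INSERT INTO restaurants (id) VALUES (1)")

for stars in (5, 4, 3):
    save_rating(conn, 1, stars)

average = conn.execute(
    "SELECT rating_average FROM restaurants WHERE id = 1"
).fetchone()[0]
print(average)
```

Doing the work on write keeps reads cheap, which matches the point above about ratings that change rarely.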

I need to post an answer because I can't comment yet.
So your restaurants (for example) hasMany ratings? Try a virtualField in the Restaurant model that calculates the rating every time restaurants are fetched from the database. You may need a GROUP BY, as ypercube mentioned, if you use AVG().

Related

yii pagination issue trying to use 2 criterias

Disclaimer: I'm self-taught. I got my rudimentary knowledge of PHP from reading forums. I'm an SQL newb, and know next to nothing about Yii.
I've got a controller that shows the products on our webstore. I would like the out of stock products to show up on the last pages.
I know I could sort by stock quantity but would like the in stock products to change order every time the page is reloaded.
My solution (probably wrong but kinda works) is to run two queries. One for the product that has stock, sorted randomly. One for the out of stock product also ordered randomly. I then merge the two resulting arrays. This much has worked using the code below (although I feel like there must be a more efficient way than running two queries).
The problem is that this messes up the pagination. Every product returned is listed on the same page and changing pages shows the same results. As far as I can tell the pagination only works for 1 CDbCriteria at a time. I've looked at the yii docs for CPagination for a way around this but am not getting anywhere.
$criteria=new CDbCriteria;
$criteria->alias = 'Product';
$criteria->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria->addCondition('Product.parent IS NULL');
$criteria->addCondition('web=1');
$criteria->addCondition('current=1');
$criteria->addCondition('sell>sell_web');
$criteria->order = 'RAND()';
$criteria2=new CDbCriteria;
$criteria2->alias = 'Product';
$criteria2->addCondition('(inventory_avail<1 AND inventoried=1)');
$criteria2->addCondition('Product.parent IS NULL');
$criteria2->addCondition('web=1');
$criteria2->addCondition('current=1');
$criteria2->addCondition('sell>sell_web');
$criteria2->order = 'RAND()';
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
//I know there is something wrong here, no idea how to fix it..
$count=Product::model()->count($criteria);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria);
$this->render('index', array(
'models' => $models,
'pages' => $pages
));
Clearly I am in over my head. Any help would be much appreciated.
Edit:
I figured that a third CDbCriteria that includes both the in stock and out of stock items could be used for the pagination (as it would include the same number of products as the combined results of the first 2). So I tried adding this (criteria1 and criteria2 remain the same):
$criteria3=new CDbCriteria;
$criteria3->alias = 'Product';
//$criteria3->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria3->addCondition('Product.parent IS NULL');
$criteria3->addCondition('web=1');
$criteria3->addCondition('current=1');
$criteria3->addCondition('sell>sell_web');
//$criteria3->order = 'RAND()';
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
$count=Product::model()->count($criteria3);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria3);
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
$this->render('index', array(
'models' => $models,
'pages' => $pages
));
I'm sure I'm missing something super obvious here... Been searching all day getting nowhere.
So you are running into what is, IMO, one of the potential drawbacks of natural language query builder frameworks. Working only with the "out of the box" methods for building queries can send your thinking about a SQL problem down a bad path. Sometimes you need to consider the raw SQL query capabilities that almost every framework provides in order to best address your problem.
So let's start with the basic SQL for how I would suggest you approach your problem. You can either work this into your query builder style (if possible) or make a raw query.
You could easily form a calculated field representing binary inventory status for sorting, then sort by another criterion secondarily.
SELECT
field1,
field2,
/* other fields */
IF(inventory_avail > 0, 1, 0) AS in_inventory
FROM product
WHERE /* where conditions */
ORDER BY
in_inventory DESC, /* sort items in inventory first */
other_field_to_sort ASC /* other sort criteria */
LIMIT ?, ? /* pagination offset and row limit, in that order */
Note that this approach only returns the rows of data you need to display. You move away from your current approach of doing a lot of work in the application to merge record sets and such.
I do question the use of RAND() with pagination. Because every page request reshuffles the result set, a product can appear on several pages as the user pages through, while other products may never show up at all. The alternative is to add complexity to your application to track the "randomized" order of the entire result set for each specific user. For this reason, it is really unusual to see randomized ordering on paginated results.
I know you mentioned you might like to show a randomized view to the user on a "first page". That is fine if you want it, but consider decoupling that specific view from the wider paginated product listing so that you don't confuse the end user with a seemingly unpredictable pagination interface.
In your ORDER BY clause, you should always have enough sorting conditions to where the final (most specific) condition will guarantee you a predictable order result. Oftentimes this means you have to include an autoincrementing primary key field, or similar field that provides uniqueness for the row.
So let's say, for example, that the user can sort items by price, but you still want to show all in-stock items first. Now let's say you have 100K products, so that many "pages" of products share a common price when ordered by price.
If you used this for ordering:
ORDER BY in_inventory DESC, price ASC
You could still have the problem of a user seeing the same product repeated when navigating between pages, because no criterion more specific than price was given and ordering beyond that criterion is not guaranteed.
You would probably want to do something like:
ORDER BY in_inventory DESC, price ASC, unique_id ASC
Such that the order is totally predictable (even though the user may not even know there is sorting being applied by unique id).
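To make the tie-breaker point concrete, here is a small Python/SQLite sketch (SQLite standing in for MySQL). The table layout, the `fetch_page` helper and the page size are assumptions for illustration; `CASE WHEN` replaces MySQL's `IF()` so the query runs on both engines.

```python
import sqlite3

PAGE_SIZE = 3

def fetch_page(conn, page):
    """Return product ids for a 0-based page, in-stock rows first."""
    rows = conn.execute(
        """
        SELECT id
        FROM product
        ORDER BY
            CASE WHEN inventory_avail > 0 THEN 1 ELSE 0 END DESC,  -- in stock first
            price ASC,                                             -- user-chosen sort
            id ASC                                                 -- unique tie-breaker
        LIMIT ? OFFSET ?
        """,
        (PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, price INTEGER, inventory_avail INTEGER)"
)
# Eight products sharing one price; odd ids are out of stock.
conn.executemany(
    "INSERT INTO product (id, price, inventory_avail) VALUES (?, 10, ?)",
    [(i, 0 if i % 2 else 5) for i in range(1, 9)],
)

pages = [fetch_page(conn, p) for p in range(3)]
print(pages)
```

Because `id` is unique, every row has exactly one position in the ordering, so no product can show up on two pages or be skipped between them.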

Query ActiveRecord for records and relation calculations at once

TL;DR? See Edit 2
I've got a little Rails application that has a few different sort of games people can play: it's based around sports, so they can pick the winners of each game every week (model PickEm, attribute correct boolean with nil for unfinished games), and predict the outcome of a specific team's game (model Guess, attribute score with integer, nil for unfinished games). Every User has_many PickEms and Guesses. And I'm trying to display standings (correct/total - total being all non-nil, score/total possible).
What I'm finding is that I can gather the users and their associated records, but in trying to display standings I'm discovering that every single User is triggering another query - slow and not sustainable as the user base increases. That's because #user.pick_em_score is pick_ems.where(correct: true).size and #user.guess_Score is guesses.where.not(score: nil).sum(:score). So I call user.pick_em_score and it runs that query. I feel like there should be a way to get every User, as well as these specific counts, at once, rather than buffering a whole bunch of needless extra stuff.
What I need:
User record
User.pick_em_score (calculated by counting correct records)
User.pick_ems count where NOT NULL
User.guesses_score (calculated by guesses.sum(:score))
User.guesses count where NOT NULL
Most of the stuff I find on Rails's ActiveRecord helpers, especially related to calculations, is for retrieving only the calculation. It looks like I'll probably need to delve directly into select() etc. But I can't get it working. Can someone point me in the right direction?
Edit
For clarification: I'm aware that I can write this information to the User model, but this is overly restrictive: next season, I'll need to add a new column to the User for that year's results, etc. In addition, this is a third degree of callback updating related models – the Match model already updates related PickEms and Guesses on save. I'm looking for the simplest ActiveRecord query or queries to be able to work with this information, as indicated by the title. Ideally one query that returns the above information, but if it needs to be a few, that's OK.
I used to work directly in MySQL with PHP, but those skills have rusted (in raw MySQL, I imagine, I'd have several sub-select statements to help pull these counts) and I'd also like to be able to use Rails's ActiveRecord helpers and such, and avoid constructing raw SQL as much as possible.
Second Edit:
I seem to have it down to one call that starts to work, but I'm writing a lot of SQL. It's also brittle, IMO, and trying to run with it has failed. It also looks like I'm just pushing the million singular SELECT queries from Rails right into SQL, but that may still be a step up.
User.unscoped.select('users.*',
'(SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct) AS correct_pick_ems',
'(SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct IS NOT NULL) AS total_pick_ems',
'(SELECT SUM(guesses.score) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_score',
'(SELECT COUNT(*) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_count' )
The issue seems to be: is there a way to use Rails, and not raw SQL, to link up users.id that we see there with these subqueries? Or just … a better way to construct this, in general?
In addition, I'm running another set of SELECTs for the WHERE, which would hinge on total_pick_ems and guesses_count being > 0 but since I can't use those aliased columns, I have to call the SELECT one more time.
Welcome to AR. It's really only good for simple CRUD-like queries. Once you actually want to query your data in anger, it just doesn't have the capabilities to do the queries you want without resorting to wholesale SQL strings, often abandoning the ability to chain as a result.
It's precisely why I moved to Sequel, as it does have the features to compose queries using a much fuller SQL feature set, including join conditions, window functions, recursive common table expressions, and advanced eager loading. The author is incredibly responsive and the documentation is excellent compared to AR and Arel.
I don't expect you will like this answer, but a time will come when you will start to look outside the opinionated components that come with Rails, which I have to say are hardly best of breed. Sequel also sped my application up many times over what I was able to get with AR. It's not just developer happiness; it means fewer servers to run. Yes, there will be a learning curve, but IMO it's better to learn tools that have your back covered.
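Joins might work. Something like the following:
# Caveat: joining two has_many tables at once multiplies rows per user
# (guesses x pick_ems), so the counts use DISTINCT ids. The SUM is still
# inflated by that fan-out and needs a subquery or a separate query.
User.unscoped.joins(:guesses).joins(:pick_ems).
where("guesses.score IS NOT NULL").
select("users.*,
sum(guesses.score) as guesses_score,
count(distinct guesses.id) as guesses_count,
count(distinct case when pick_ems.correct = true then pick_ems.id end)
as correct_pick_ems,
count(distinct case when pick_ems.correct is not null then pick_ems.id end)
as total_pick_ems").
group("users.id")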
If you need this information for only a limited number of users at a time, then the above query, or eager loading (User.includes(:guesses, :pick_ems)) with instance methods like
def correct_pick_ems
pick_ems.count(&:correct)
end
would work.
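One way to keep a single query while avoiding row multiplication between the two has_many tables is to aggregate each table in its own grouped subquery and only then join to users. The sketch below shows that SQL through Python/SQLite as a stand-in database; the table and column names follow the question, while the `STANDINGS_SQL` name and the sample rows are made up for illustration. In Rails, a string like this could plausibly be passed to `find_by_sql`, but that wiring is not shown here.

```python
import sqlite3

STANDINGS_SQL = """
SELECT u.id,
       COALESCE(p.correct_pick_ems, 0) AS correct_pick_ems,
       COALESCE(p.total_pick_ems, 0)   AS total_pick_ems,
       COALESCE(g.guesses_score, 0)    AS guesses_score,
       COALESCE(g.guesses_count, 0)    AS guesses_count
FROM users u
LEFT JOIN (
    SELECT user_id,
           SUM(CASE WHEN correct = 1 THEN 1 ELSE 0 END) AS correct_pick_ems,
           COUNT(correct) AS total_pick_ems   -- COUNT(col) skips NULLs
    FROM pick_ems GROUP BY user_id
) p ON p.user_id = u.id
LEFT JOIN (
    SELECT user_id, SUM(score) AS guesses_score, COUNT(score) AS guesses_count
    FROM guesses GROUP BY user_id
) g ON g.user_id = u.id
ORDER BY u.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE pick_ems (id INTEGER PRIMARY KEY, user_id INTEGER, correct INTEGER);
CREATE TABLE guesses (id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER);
INSERT INTO users (id) VALUES (1), (2);
INSERT INTO pick_ems (user_id, correct) VALUES (1, 1), (1, 0), (1, NULL), (2, 1);
INSERT INTO guesses (user_id, score) VALUES (1, 7), (1, 3), (1, NULL);
""")

standings = conn.execute(STANDINGS_SQL).fetchall()
print(standings)
```

Each subquery produces at most one row per user, so the outer joins cannot duplicate rows, and users with no picks or guesses still appear with zeros.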
However, if you need this information for all the users most of the time, cached counters within the users table would be more optimal.
What you need is some sort of custom (smart) counter_cache that counts only under certain conditions (e.g. correct is true).
You can achieve this using conditional after_save and after_destroy callbacks to build your own custom counter_cache that looks like this:
class PickEm < ActiveRecord::Base
belongs_to :user
after_save :increment_finished_counter_cache, if: Proc.new { |pick_em| pick_em.correct }
after_destroy :decrement_finished_counter_cache, if: Proc.new { |pick_em| pick_em.correct }
private
def increment_finished_counter_cache
self.user.update_column(:finished_games_counter, self.user.finished_games_counter + 1) #update_column should not trigger any validations or callbacks
end
def decrement_finished_counter_cache
self.user.update_column(:finished_games_counter, self.user.finished_games_counter - 1) #update_column should not trigger any validations or callbacks
end
end
Notes:
Code not tested (only to show the idea)
Some guys said it's better to avoid naming custom counters as rails name them (foo_counter_cache)
You should benchmark it, but my hunch is that adding all of that data into a single SELECT isn't going to be much faster than breaking it up into separate SELECTs (I've actually had cases where the latter was faster). By breaking it up, you can also stick to more ActiveRecord and less raw SQL, e.g.:
user_ids_to_pick_em_score = User.joins(:pick_ems).where(pick_ems: {correct: true}).group(:user_id).count
user_ids_to_pick_ems_count = User.joins(:pick_ems).where.not(pick_ems: {correct: nil}).group(:user_id).count
user_ids_to_guesses_score = Hash[User.select("users.id, SUM(guesses.score) AS total_score").joins(:guesses).group(:user_id).map{|u| [u.id, u.total_score]}]
user_ids_to_guesses_count = User.joins(:guesses).where.not(guesses: {score: nil}).group(:user_id).count
Edit: To display them, you could do like so:
<%- User.select(:id, :name).find_each do |u| -%>
Name: <%= u.name %>
Picks Correct: <%= user_ids_to_pick_em_score[u.id] %>/<%= user_ids_to_pick_ems_count[u.id] %>
Total Score: <%= user_ids_to_guesses_score[u.id] %>/<%= user_ids_to_guesses_count[u.id] %>
<%- end -%>

Trying to restrict an Eloquent query to a relationship with a count of 0

I have two models (Organizations and Interactions) and I'd like to query the Organization model for all of the Orgs that have no Interactions. Organizations have a one-to-many relationship with Interactions.
I tried looking into anti-joins in raw SQL, but got nowhere. I also wanted to totally avoid anything like getting all of the full Organizations, then iterating through them to check to see if they had any Interactions, because that's completely impractical given the amount of data I'm working with.
To clarify, I want to avoid this:
$organizations = Organization::all();
foreach ($organizations as $org)
if($org->interactions()->count() == 0){
//Add the org to an array for later use because it has no interactions
}
I'm using Laravel 3.x, and I can't upgrade because the project is really big and I don't have the month it would take to upgrade to 4.1 right now. If there's a significantly better way to do stuff like this in 4, that would make selling the conversion process easier.
Here's some relevant code:
//From organization.php
public function interactions() {
return $this->has_many('Interaction');
}
//From interaction.php
public function organization() {
return $this->belongs_to('Organization');
}
// select all Organization IDs that have at least 1 interaction
$uniqueOrganizationIDs = DB::raw('SELECT organization_id FROM interactions GROUP BY(organization_id)');
// Select orgs that were not in the above list.
Organization::whereNotIn('id', $uniqueOrganizationIDs)->get();
This is the solution I came up with:
Query the Organization and Interaction models using lists(). For Orgs, get their ID. For Interactions, get their organization_id. I figure these are two low-footprint, fast queries.
Do an array_diff() on them to get an array of Organizations that don't have Interactions.
Query Organization using where_in(), feeding it the diff'ed array.
It looks like this:
$organizationIDs = DB::table('organizations')->where('is_deleted', '=', 0)->lists('id');
$interactionIDs = DB::table('interactions')->where('is_deleted', '=', 0)->lists('organization_id');
$uncontactedOrganizationIDs = array_diff($organizationIDs, $interactionIDs);
$uncontactedOrganizations = Organization::where_in('id', $uncontactedOrganizationIDs)->order_by('created_at', 'DESC')->get();
Is there a better way to do this? I feel like there has to be.
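One candidate for doing this in a single query is the anti-join the question alludes to: LEFT JOIN the interactions and keep only the organizations for which no row matched. The sketch below runs that SQL through Python/SQLite as a stand-in database. Column names are taken from the code above, the `interactions.id` primary key and the sample rows are assumptions, and how best to issue the query from Laravel 3 is left open.

```python
import sqlite3

UNCONTACTED_SQL = """
SELECT o.id
FROM organizations o
LEFT JOIN interactions i
       ON i.organization_id = o.id AND i.is_deleted = 0
WHERE o.is_deleted = 0
  AND i.id IS NULL            -- no live interaction row matched this organization
ORDER BY o.created_at DESC
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE organizations (id INTEGER PRIMARY KEY, is_deleted INTEGER, created_at TEXT);
CREATE TABLE interactions (id INTEGER PRIMARY KEY, organization_id INTEGER, is_deleted INTEGER);
INSERT INTO organizations (id, is_deleted, created_at) VALUES
    (1, 0, '2014-01-01'),  -- has a live interaction
    (2, 0, '2014-01-02'),  -- no interactions at all
    (3, 0, '2014-01-03'),  -- only a deleted interaction
    (4, 1, '2014-01-04');  -- deleted organization
INSERT INTO interactions (organization_id, is_deleted) VALUES (1, 0), (3, 1);
""")

uncontacted_ids = [row[0] for row in conn.execute(UNCONTACTED_SQL)]
print(uncontacted_ids)
```

This keeps the ID lists in the database instead of shipping both of them to PHP for array_diff(), which matters more as the tables grow.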

CakePHP Displaying Field Names via Two Associations

I apologize for the confusing title, I was a little stumped as to how to word my question.
I am new to CakePHP, but am following along through the cookbook/tutorials nicely, however I have come up against something which I cannot find an answer to.
My structure is as follows:
'Invoices' hasMany 'InvoiceHistory'
'InvoiceHistory' belongsTo 'InvoiceHistoryDeliveryStatus'
Whereby, an invoice can have multiple invoice histories, and each history contains a delivery status id, which links to a name.
On the Invoice view (index.ctp) I am displaying a list of all invoices but wish to display the Most Recent Delivery Status Name (InvoiceHistory contains a date field so it can be sorted) - thereby displaying the 'current Delivery Status'.
When I do:
$this->set('invoices', $this->Invoice->find('all'));
It does not go deep enough in what it returns to provide me with Delivery Status Names, nor have I deduced a way of returning only the most recent Invoice History within my result. I know how to do this manually with a MySQL query but I figured that is probably just plain wrong.
What is the correct way of going about this while following CakePHP conventions?
Use Containable
$this->Invoice->Behaviors->attach('Containable');
$this->set('invoices', $this->Invoice->find('all', array(
'contain' => array(
'InvoiceHistory' => array(
'InvoiceHistoryDeliveryStatus'
)
)
)));
From what I can tell, I think you should check out the Containable behavior.

Logical Column in MySQL - How?

I have a datamodel, let's say: invoices (m:n) invoice_items
and currently I store the invoice total, calculated in PHP by totalling invoice_items, in a column in invoices. I don't like storing derived data as it paves the way for errors later.
How can I create a logical column in the invoices table in MySQL? Is this something I would be better off handling in PHP (in this case CakePHP)?
There's something called Virtual Fields in CakePHP which allows you to achieve the same result from within your Model instead of relying on support from MySQL. Virtual Fields allow you to "mashup" various data within your model and provide that as an additional column in your record. It's cleaner than the other approaches here...(no afterFind() hacking).
Read more here: http://book.cakephp.org/view/1608/Virtual-fields
Leo,
One thing you could do is to modify the afterFind() method in your model. This would recalculate the total any time you retrieve an invoice (costing runtime processing), but would mean you're not storing it in the invoices table, which is apparently what you want to avoid (correct me if I'm wrong).
Try this:
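class Invoice extends AppModel {
    // .. other stuff
    // afterFind() receives the found rows and must return them
    function afterFind($results, $primary = false) {
        foreach ($results as $key => $row) {
            // 'InvoiceItems' must match the alias of your hasMany association
            if (!isset($row['InvoiceItems'])) {
                continue;
            }
            $total = 0;
            foreach ($row['InvoiceItems'] as $item) {
                $total += ($item['cost'] * $item['quantity']);
            }
            $results[$key]['Invoice']['total'] = $total;
        }
        return $results;
    }
}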
I may have messed up the arrays on the hasMany relationship (the foreach line), but I hope you get the gist of it. HTH,
Travis
Either you can return the derived one when you want it via
SELECT COUNT(1) as total FROM invoice_items
Or if invoices can be multiple,
//assuming that invoice_items.num is how many there are per row
SELECT SUM(num) as total FROM invoice_items
Or you can use a VIEW, if you have a certain way you want it represented all the time.
http://forge.mysql.com/wiki/MySQL_virtual_columns_preview
It's not implemented yet, but it should be implemented in mysql 6.0
Currently you could create a view.
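A rough sketch of the view approach, run through Python/SQLite as a stand-in for MySQL (the CREATE VIEW statement itself is plain SQL). The `invoice_totals` view name and the `invoice_id` foreign key are assumptions; `cost` and `quantity` are borrowed from the afterFind() example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (id INTEGER PRIMARY KEY);
CREATE TABLE invoice_items (
    id INTEGER PRIMARY KEY, invoice_id INTEGER, cost INTEGER, quantity INTEGER
);
-- The total is derived on read, so it cannot drift from the items.
CREATE VIEW invoice_totals AS
    SELECT inv.id AS invoice_id,
           COALESCE(SUM(item.cost * item.quantity), 0) AS total
    FROM invoices inv
    LEFT JOIN invoice_items item ON item.invoice_id = inv.id
    GROUP BY inv.id;
INSERT INTO invoices (id) VALUES (1), (2);
INSERT INTO invoice_items (invoice_id, cost, quantity) VALUES (1, 250, 2), (1, 100, 1);
""")

# Map each invoice id to its derived total.
totals = dict(conn.execute("SELECT invoice_id, total FROM invoice_totals"))
print(totals)
```

The view can then be queried like a table, so the application never stores the total and never has to keep it in sync.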