Yii pagination issue trying to use 2 criteria - MySQL

Disclaimer: I'm self-taught. I got my rudimentary knowledge of PHP from reading forums. I'm an SQL newbie and know next to nothing about Yii.
I've got a controller that shows the products on our web store. I would like the out-of-stock products to show up on the last pages.
I know I could sort by stock quantity, but I would like the in-stock products to change order every time the page is reloaded.
My solution (probably wrong, but it kinda works) is to run two queries: one for the products that are in stock, sorted randomly, and one for the out-of-stock products, also sorted randomly. I then merge the two resulting arrays. This much has worked using the code below (although I feel like there must be a more efficient way than running two queries).
The problem is that this messes up the pagination. Every product returned is listed on the same page, and changing pages shows the same results. As far as I can tell, the pagination only works for one CDbCriteria at a time. I've looked at the Yii docs for CPagination for a way around this but am not getting anywhere.
$criteria=new CDbCriteria;
$criteria->alias = 'Product';
$criteria->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria->addCondition('Product.parent IS NULL');
$criteria->addCondition('web=1');
$criteria->addCondition('current=1');
$criteria->addCondition('sell>sell_web');
$criteria->order = 'RAND()';
$criteria2=new CDbCriteria;
$criteria2->alias = 'Product';
$criteria2->addCondition('(inventory_avail<1 AND inventoried=1)');
$criteria2->addCondition('Product.parent IS NULL');
$criteria2->addCondition('web=1');
$criteria2->addCondition('current=1');
$criteria2->addCondition('sell>sell_web');
$criteria2->order = 'RAND()';
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
//I know there is something wrong here, no idea how to fix it..
$count=Product::model()->count($criteria);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria);
$this->render('index', array(
'models' => $models,
'pages' => $pages
));
Clearly I am in over my head. Any help would be much appreciated.
Edit:
I figured that a third CDbCriteria that includes both the in stock and out of stock items could be used for the pagination (as it would include the same number of products as the combined results of the first 2). So I tried adding this (criteria1 and criteria2 remain the same):
$criteria3=new CDbCriteria;
$criteria3->alias = 'Product';
//$criteria3->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria3->addCondition('Product.parent IS NULL');
$criteria3->addCondition('web=1');
$criteria3->addCondition('current=1');
$criteria3->addCondition('sell>sell_web');
//$criteria3->order = 'RAND()';
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
$count=Product::model()->count($criteria3);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria3);
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
$this->render('index', array(
'models' => $models,
'pages' => $pages
));
I'm sure I'm missing something super obvious here... Been searching all day getting nowhere.

So you are running into what is, IMO, one of the potential drawbacks of natural-language query builder frameworks. They can lead your thinking about a SQL problem down a bad path when you try to work only with the "out of the box" methods for building queries. Sometimes you need to use the raw SQL query capabilities that almost every framework provides in order to best address your problem.
So let's start with the basic SQL for how I would suggest you approach your problem. You can either work this into your query builder style (if possible) or make a raw query.
You could easily form a calculated field representing the binary inventory status and sort on it, then sort by another criterion secondarily.
SELECT
field1,
field2,
/* other fields */
IF(inventory_avail > 0 OR inventoried = 0, 1, 0) AS in_inventory /* same in-stock test as in the question */
FROM product
WHERE /* where conditions */
ORDER BY
in_inventory DESC, /* sort items in inventory first */
other_field_to_sort ASC /* other sort criteria */
LIMIT ?, ? /* pagination: row offset, then row count */
Note that this approach only returns the rows of data you need to display. You move away from your current approach of doing a lot of work in the application to merge record sets and such.
I do question the use of RAND() for pagination purposes, as it can make a product show up on page after page as the user paginates, while other products may not show up at all. Either that, or you need to add complexity to your application to somehow track the "randomized" version of the entire result set for each specific user. For this reason, it is really unusual to see randomized ordering for paginated results.
I know you mentioned you might like to spike out a randomized view to the user on a "first page". If that is a requirement, that is OK, but perhaps you should decouple that specific view from the wider paginated product listing, so as not to confuse the end user with a seemingly unpredictable pagination interface.
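If you do want a random order that stays stable while one user pages through it, one option is to shuffle with a seed that is stored per user. Below is a minimal Python sketch of that idea; the product dicts, the `in_stock` flag, and keeping the seed in the user's session are assumptions for illustration. In-stock products come first, each group is shuffled with the same seed, and every page is a slice of that fixed order, so pages never overlap.

```python
import random

def build_order(products, seed):
    """Return product ids: in-stock first, each group shuffled with a fixed seed.

    `products` is a list of dicts with hypothetical keys "id" and "in_stock".
    The same seed always yields the same order, so pages stay consistent.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    in_stock = [p["id"] for p in products if p["in_stock"]]
    out_of_stock = [p["id"] for p in products if not p["in_stock"]]
    rng.shuffle(in_stock)
    rng.shuffle(out_of_stock)
    return in_stock + out_of_stock

def get_page(order, page, page_size):
    """Slice one 0-based page out of the fixed order."""
    start = page * page_size
    return order[start:start + page_size]

# Example: 10 products, every third one out of stock. The seed would
# normally be generated once per visit and kept in the session (assumption).
products = [{"id": i, "in_stock": i % 3 != 0} for i in range(1, 11)]
order = build_order(products, seed=1234)
pages = [get_page(order, n, 4) for n in range(3)]
```

A new seed gives the user a fresh shuffle; reusing the stored seed keeps page 2 consistent with page 1. MySQL's RAND(N) also accepts a seed, but whether that order stays stable across requests depends on your data and query plan, so test it before relying on it.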
In your ORDER BY clause, you should always have enough sorting conditions that the final (most specific) condition guarantees a predictable order. Oftentimes this means you have to include an auto-incrementing primary key field, or a similar field that is unique for each row.
So let's say, for example, that users can sort items by price, but you still obviously want to show all in-inventory items first. Now let's say you have 100K products, so that there will be many "pages" of products sharing a common price when ordered by price.
If you used this for ordering:
ORDER BY in_inventory DESC, price ASC
You could still have the problem of a user seeing the same product repeated when navigating between pages, because no criterion more specific than price was given, and ordering beyond that criterion is not guaranteed.
You would probably want to do something like:
ORDER BY in_inventory DESC, price ASC, unique_id ASC
That way, the order is totally predictable (even though the user may not even know that sorting by unique id is being applied).
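To make the ordering advice concrete, here is a small Python sketch using the standard-library sqlite3 module. It assumes SQLite instead of MySQL and a made-up product table with only the columns it needs. It uses CASE rather than MySQL's IF() so it runs on both, reuses the question's in-stock condition, and breaks price ties with the primary key, so consecutive LIMIT/OFFSET pages never overlap.

```python
import sqlite3

# Computed in-stock flag first, then price, then the unique id as the
# final tie-breaker, so every row has exactly one position.
PAGE_SQL = """
SELECT id
FROM product
ORDER BY
    CASE WHEN inventory_avail > 0 OR inventoried = 0 THEN 1 ELSE 0 END DESC,
    price ASC,
    id ASC
LIMIT ? OFFSET ?
"""

def make_db():
    """Build an in-memory table with many duplicate prices (hypothetical data)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price INTEGER,"
               " inventory_avail INTEGER, inventoried INTEGER)")
    for pid in range(1, 13):
        price = 10 if pid % 2 == 0 else 20          # lots of ties on price
        avail = 0 if pid in (3, 6, 9, 12) else 5    # 3, 6 and 9 are sold out...
        inventoried = 0 if pid == 12 else 1         # ...but 12 is not stock-tracked
        db.execute("INSERT INTO product VALUES (?, ?, ?, ?)",
                   (pid, price, avail, inventoried))
    return db

def fetch_page(db, page, page_size):
    """Return the product ids on a 0-based page."""
    offset = page * page_size
    return [row[0] for row in db.execute(PAGE_SQL, (page_size, offset))]

db = make_db()
pages = [fetch_page(db, n, 5) for n in range(3)]
```

With Yii, the same idea should fit into a single CDbCriteria: put the CASE expression and the tie-breakers in its order property and let CPagination::applyLimit() add the limit and offset (an untested sketch, not something confirmed in the answer above).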

Related

Trying to sort comment by the newest sub comment - Laravel

I'm trying to sort by the newest messages first, including the sub-messages. For example, if someone adds a sub-comment to an old message at the bottom, it would be moved to the top.
Here is the code I'm using:
$comments = Comment::where('discussion_id' , $info['id'])->where('parent_id', 0)->orderby('id' , 'desc')->paginate(6);
If parent_id is 0, the comment is a top-level comment, not a sub-comment.
It's going to be a bit tricky to produce a query that will work well for you. Especially if you have to support a deeply nested hierarchy of comments (for example comments on Reddit).
Comment sections are usually loaded very often, so even if you manage to write this query, it will be slow. For that reason, I suggest a solution with an additional table field, plus some additional code when you insert new comments.
Add a new timestamp field to the comments table and call it touched_on. Every time you insert a new sub-comment update the touched_on of every parent comment in the hierarchy. Now you can edit your code like this:
$comments = Comment::where('discussion_id' , $info['id'])->where('parent_id', 0)->orderBy('touched_on' , 'desc')->paginate(6);
If your comments are not editable, you can even use the updated_at column for this, but that depends on your requirements.
With this approach, your inserts will be a bit slower but your query for loading comments is super simple and fast. I would take that trade-off any day. Also, if you really need deeply nested comment threads I would look into more robust structures (nested sets maybe) than only using parent_id.
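Here is a minimal sketch of the touched_on idea in Python with sqlite3. It is not Laravel code; the table mirrors the question's columns, and integer timestamps stand in for real ones. Inserting a reply walks up the parent_id chain and bumps touched_on on every ancestor, so the top-level listing only needs ORDER BY touched_on.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        discussion_id INTEGER NOT NULL,
        parent_id INTEGER NOT NULL DEFAULT 0,  -- 0 means top-level
        body TEXT,
        touched_on INTEGER NOT NULL)""")
    return db

def add_comment(db, discussion_id, body, now, parent_id=0):
    """Insert a comment, then bump touched_on on every ancestor."""
    cur = db.execute(
        "INSERT INTO comments (discussion_id, parent_id, body, touched_on)"
        " VALUES (?, ?, ?, ?)",
        (discussion_id, parent_id, body, now),
    )
    ancestor = parent_id
    while ancestor != 0:
        db.execute("UPDATE comments SET touched_on = ? WHERE id = ?", (now, ancestor))
        row = db.execute("SELECT parent_id FROM comments WHERE id = ?",
                         (ancestor,)).fetchone()
        ancestor = row[0] if row else 0
    db.commit()
    return cur.lastrowid

def top_level(db, discussion_id, limit=6):
    """Top-level comment ids, most recently touched thread first."""
    rows = db.execute(
        "SELECT id FROM comments WHERE discussion_id = ? AND parent_id = 0"
        " ORDER BY touched_on DESC, id DESC LIMIT ?",
        (discussion_id, limit),
    )
    return [r[0] for r in rows]

db = make_db()
old = add_comment(db, 1, "old thread", now=1)
new = add_comment(db, 1, "new thread", now=2)
reply = add_comment(db, 1, "reply to old thread", now=3, parent_id=old)
add_comment(db, 1, "reply to the reply", now=4, parent_id=reply)
print(top_level(db, 1))  # old thread first: its subtree was touched most recently
```

The walk costs one UPDATE and one SELECT per level of nesting, which is the insert-time price mentioned above.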
Just order by the created_at column:
$comments = Comment::where('discussion_id' , $info['id'])->where('parent_id', 0)->orderby('created_at' , 'desc')->paginate(6);

Query ActiveRecord for records and relation calculations at once

TL;DR? See Edit 2
I've got a little Rails application that has a few different sort of games people can play: it's based around sports, so they can pick the winners of each game every week (model PickEm, attribute correct boolean with nil for unfinished games), and predict the outcome of a specific team's game (model Guess, attribute score with integer, nil for unfinished games). Every User has_many PickEms and Guesses. And I'm trying to display standings (correct/total - total being all non-nil, score/total possible).
What I'm finding is that I can gather the users and their associated records, but when I try to display standings, every single User triggers another query: slow, and not sustainable as the user base grows. That's because User#pick_em_score is pick_ems.where(correct: true).size and User#guesses_score is guesses.where.not(score: nil).sum(:score), so each call to user.pick_em_score runs that query. I feel like there should be a way to get every User, as well as these specific counts, at once, rather than buffering a whole bunch of needless extra stuff.
What I need:
User record
User.pick_em_score (calculated by counting correct records)
User.pick_ems count where NOT NULL
User.guesses_score (calculated by guesses.sum(:score))
User.guesses count where NOT NULL
Most of the stuff I find on Rails's ActiveRecord helpers, especially related to calculations, is for retrieving only the calculation. It looks like I'll probably need to delve directly into select() etc. But I can't get it working. Can someone point me in the right direction?
Edit
For clarification: I'm aware that I can write this information to the User model, but this is overly restrictive: next season, I'll need to add a new column to the User for that year's results, etc. In addition, this is a third degree of callback updating related models – the Match model already updates related PickEms and Guesses on save. I'm looking for the simplest ActiveRecord query or queries to be able to work with this information, as indicated by the title. Ideally one query that returns the above information, but if it needs to a few, that's OK.
I used to work directly in MySQL with PHP, but those skills have rusted (in raw MySQL, I imagine, I'd have several sub-select statements to help pull these counts) and I'd also like to be able to use Rails's ActiveRecord helpers and such, and avoid constructing raw SQL as much as possible.
Second Edit:
I seem to have it down to one call that starts to work, but I'm writing a lot of SQL. It's also brittle, IMO, and trying to run with it has failed. It also looks like I'm just pushing the million singular SELECT queries from Rails right into SQL, but that may still be a step up.
User.unscoped.select('users.*',
'(SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct) AS correct_pick_ems',
'(SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct IS NOT NULL) AS total_pick_ems',
'(SELECT SUM(guesses.score) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_score',
'(SELECT COUNT(*) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_count' )
The issue seems to be: is there a way to use Rails, and not raw SQL, to link up users.id that we see there with these subqueries? Or just … a better way to construct this, in general?
In addition, I'm running another set of SELECTs for the WHERE, which would hinge on total_pick_ems and guesses_count being > 0 but since I can't use those aliased columns, I have to call the SELECT one more time.
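One way around not being able to reference the aliases in WHERE is to wrap the query in a derived table and filter in the outer query. Below is a Python/sqlite3 sketch of that SQL shape; table and column names follow the question, the data is made up, and "both counts above zero" is my reading of the filter. In ActiveRecord, QueryMethods#from accepts a relation plus a subquery name, so something like User.from(standings_relation, :users).where("total_pick_ems > 0 AND guesses_count > 0") might express the same thing, where standings_relation is a hypothetical name for the select() relation above; that Rails translation is untested.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE pick_ems (id INTEGER PRIMARY KEY, user_id INTEGER, correct INTEGER); -- NULL = unfinished
CREATE TABLE guesses (id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER);     -- NULL = unfinished
"""

# The question's correlated subqueries, wrapped in a derived table so the
# outer WHERE can refer to their aliases.
STANDINGS_SQL = """
SELECT * FROM (
    SELECT users.id, users.name,
        (SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct) AS correct_pick_ems,
        (SELECT COUNT(*) FROM pick_ems WHERE pick_ems.user_id = users.id AND pick_ems.correct IS NOT NULL) AS total_pick_ems,
        (SELECT SUM(guesses.score) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_score,
        (SELECT COUNT(*) FROM guesses WHERE guesses.user_id = users.id AND guesses.score IS NOT NULL) AS guesses_count
    FROM users
) AS standings
WHERE total_pick_ems > 0 AND guesses_count > 0
ORDER BY id
"""

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                   [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")])
    db.executemany("INSERT INTO pick_ems (user_id, correct) VALUES (?, ?)",
                   [(1, 1), (1, 0), (1, None), (2, 1), (4, 1)])
    db.executemany("INSERT INTO guesses (user_id, score) VALUES (?, ?)",
                   [(1, 10), (1, 5), (1, None), (2, 7)])
    return db

# carol has no entries and dave has no guesses, so both are filtered out.
standings = make_db().execute(STANDINGS_SQL).fetchall()
```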
Welcome to AR. It's really only good for simple CRUD-like queries. Once you actually want to query your data in anger, it just doesn't have the capabilities to do the queries you want without resorting to wholesale SQL strings, often abandoning the ability to chain as a result.
It's precisely why I moved to Sequel, as it does have the features to compose queries using a much fuller SQL feature set, including join conditions, window functions, recursive common table expressions, and advanced eager loading. The author is incredibly responsive, and the documentation is excellent compared to AR and Arel.
I don't expect you will like this answer, but a time will come when you will start to look outside the opinionated components that come with Rails, which I have to say are hardly best of breed. Sequel also sped my application up many times over what I was able to get with AR; it's not just developer happiness, it means fewer servers to run. Yes, there will be a learning curve, but IMO it's better to learn tools that have your back.
Joins might work. Something like the following:
# Caveat: joining two has_many associations pairs every guess with every pick_em
# of the same user, so these SUM/COUNT values can be inflated; verify the numbers.
User.unscoped.joins(:guesses).joins(:pick_ems).
  where("guesses.score IS NOT NULL").
  select("users.*,
    sum(guesses.score) as guesses_score,
    count(guesses.id) as guesses_count,
    count(case when pick_ems.correct = True then 1 else null end) as correct_pick_ems,
    count(case when pick_ems.correct IS NOT NULL then 1 else null end) as total_pick_ems").
  group("users.id")
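Joining both associations at once deserves a caution: each user's rows are multiplied (every guess is paired with every pick_em), so SUM and COUNT over the join can come out too high. The Python/sqlite3 sketch below, using a toy schema and made-up data rather than Rails, shows the effect and one common fix: aggregate each child table per user in a derived table, then LEFT JOIN those one-row-per-user totals onto users.

```python
import sqlite3

def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY);
        CREATE TABLE pick_ems (id INTEGER PRIMARY KEY, user_id INTEGER, correct INTEGER);
        CREATE TABLE guesses (id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER);
        INSERT INTO users (id) VALUES (1);
        INSERT INTO pick_ems (user_id, correct) VALUES (1, 1), (1, 0), (1, NULL);
        INSERT INTO guesses (user_id, score) VALUES (1, 10), (1, 5);
    """)
    return db

# Joining both tables: 2 guesses x 3 pick_ems = 6 joined rows for user 1.
NAIVE_SQL = """
SELECT users.id, SUM(guesses.score), COUNT(guesses.id)
FROM users
JOIN guesses ON guesses.user_id = users.id
JOIN pick_ems ON pick_ems.user_id = users.id
WHERE guesses.score IS NOT NULL
GROUP BY users.id
"""

# Pre-aggregate each table per user, then join the per-user results.
SAFE_SQL = """
SELECT users.id, g.guesses_score, g.guesses_count, p.correct_pick_ems, p.total_pick_ems
FROM users
LEFT JOIN (SELECT user_id, SUM(score) AS guesses_score, COUNT(score) AS guesses_count
           FROM guesses GROUP BY user_id) AS g ON g.user_id = users.id
LEFT JOIN (SELECT user_id,
                  SUM(CASE WHEN correct THEN 1 ELSE 0 END) AS correct_pick_ems,
                  COUNT(correct) AS total_pick_ems
           FROM pick_ems GROUP BY user_id) AS p ON p.user_id = users.id
ORDER BY users.id
"""

db = make_db()
naive = db.execute(NAIVE_SQL).fetchall()  # score and count tripled
safe = db.execute(SAFE_SQL).fetchall()
```

Users with no guesses or picks get NULLs from the LEFT JOINs; wrapping the columns in COALESCE(..., 0) turns those into zeros.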
If you need this information for a limited number of users at a time, then the above query or eager loading (User.includes(:guesses, :pick_ems)) with instance methods like
def correct_pick_ems
pick_ems.count(&:correct)
end
would work.
However, if you need this information for all the users most of the time, cached counters within the users table would be more optimal.
What you need is some sort of custom (smart) counter_cache that counts only under certain conditions (e.g. when correct is true).
You can achieve this using conditional after_save and after_destroy callbacks to build your own custom counter_cache, which looks like this:
class PickEm < ApplicationRecord
  belongs_to :user

  # Note: after_save runs on every save, so re-saving a pick that is already
  # correct increments the counter again, and flipping correct from true to
  # false does not decrement it; guard against both in real code.
  after_save :increment_finished_counter_cache, if: Proc.new { |pick_em| pick_em.correct }
  after_destroy :decrement_finished_counter_cache, if: Proc.new { |pick_em| pick_em.correct }

  private

  def increment_finished_counter_cache
    self.user.update_column(:finished_games_counter, self.user.finished_games_counter + 1) # update_column skips validations and callbacks
  end

  def decrement_finished_counter_cache
    self.user.update_column(:finished_games_counter, self.user.finished_games_counter - 1) # update_column skips validations and callbacks
  end
end
Notes:
Code not tested (only to show the idea)
Some people say it's better to avoid naming custom counters the way Rails names its built-in counter caches (foo_count)
You should benchmark it, but my hunch is that adding all of that data into a single SELECT isn't going to be much faster than breaking it up into separate SELECTs (I've actually had cases where the latter was faster). By breaking it up, you can also stick to more ActiveRecord and less raw SQL, e.g.:
user_ids_to_pick_em_score = User.joins(:pick_ems).where(pick_ems: {correct: true}).group(:user_id).count
user_ids_to_pick_ems_count = User.joins(:pick_ems).where.not(pick_ems: {correct: nil}).group(:user_id).count
user_ids_to_guesses_score = Hash[User.select("users.id, SUM(guesses.score) AS total_score").joins(:guesses).group("users.id").map{|u| [u.id, u.total_score]}]
user_ids_to_guesses_count = User.joins(:guesses).where.not(guesses: {score: nil}).group(:user_id).count
Edit: To display them, you could do something like this:
<%- User.select(:id, :name).find_each do |u| -%>
Name: <%= u.name %>
Picks Correct: <%= user_ids_to_pick_em_score[u.id] %>/<%= user_ids_to_pick_ems_count[u.id] %>
Total Score: <%= user_ids_to_guesses_score[u.id] %>/<%= user_ids_to_guesses_count[u.id] %>
<%- end -%>

Top level MySQL statistics

I haven't been able to find any information on this. I could do this in its own query, but I feel keeping it in the same query might be the best option, if it's possible.
Basically I want to try to add a top level "statistics" portion of a query.
So when I get the results I will see it like so
num_rows = 900
distinct_col = 9
results = array()
This way I can loop the results normally, and then pull out information that I would only need once outside of it. Is this possible?
EDIT:
I am not looking for the normal MySQL statistics like num_rows exactly. But in a case where, let's say, you limit the results to ten, num_rows would return 10, whereas you want the total number of results, 900. In most cases I would just use another query that only fetches the count; however, combining it all into one query logically seems faster to me. There is also more than just num_rows that I may need: say they are all products with a specific category, I would need to count the number of categories all the items fall under. So looping over the raw results when there is only one value for those columns is silliness.
EDIT 2:
To clarify further: I need to get some counts on some columns, and maybe a min/max result on a join. Having it returned on every row would work, but returning the exact same value on every row when it's only needed once does not seem logical. I am no MySQL expert and am mainly just trying to make sure I come up with the most logical and fastest method to get the required data.
Here is a PHP return example:
array(
[num_rows] => 900,
[categories] => 9,
[min_price] => 400,
[max_price] => 900,
[results] => array(
[0] => //row array
[1] => //row array
)
);
MySQL returns its default row count before you "fetch" the results; having custom values added there may be sufficient.
Not sure why you need that, but it's very easy to get.
Assuming you are using safeMysql (though you can use any other way to get the data into an array):
$results = $db->getAll("SELECT * FROM t");
$num_rows = count($results);
$num_cols = count($results[0]);
That's all.
I am mainly just trying to make sure I come up with the most logical and fastest method to get the required data.
Yes, you are.
Nothing wrong with getting aggregated data with every loop.
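As for the count beyond LIMIT: when you need it, you can use MySQL's SQL_CALC_FOUND_ROWS / FOUND_ROWS() feature (note that it is deprecated as of MySQL 8.0.17, where a separate COUNT(*) query is the recommended replacement).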
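Here is a Python/sqlite3 sketch of the two-query approach that returns the same nested structure as the question's PHP array: one aggregate query for the values needed only once and one LIMITed query for the rows. The products table, its columns, and the data are assumptions; in real code both queries must share the same WHERE conditions so the statistics describe the same result set.

```python
import sqlite3

def make_db():
    """Hypothetical products table: 30 rows spread over 9 categories."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,"
               " category TEXT, price INTEGER)")
    db.executemany(
        "INSERT INTO products (id, name, category, price) VALUES (?, ?, ?, ?)",
        [(i, f"product {i}", f"cat{i % 9}", 400 + 10 * i) for i in range(1, 31)],
    )
    return db

def fetch_listing(db, page, page_size):
    # One aggregate query for the "top level" statistics...
    num_rows, categories, min_price, max_price = db.execute(
        "SELECT COUNT(*), COUNT(DISTINCT category), MIN(price), MAX(price) FROM products"
    ).fetchone()
    # ...and one query for the current page of rows.
    rows = db.execute(
        "SELECT id, name, category, price FROM products ORDER BY id LIMIT ? OFFSET ?",
        (page_size, page * page_size),
    ).fetchall()
    return {
        "num_rows": num_rows,
        "categories": categories,
        "min_price": min_price,
        "max_price": max_price,
        "results": rows,
    }

listing = fetch_listing(make_db(), page=0, page_size=10)
```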

Codeigniter database call efficiency

This is an efficiency/best-practice question. I'm hoping to receive some feedback on performance. Any advice is greatly appreciated.
So here is a little background on what I have set up. I'm using CodeIgniter; the basic setup is pretty similar to any other product relationship schema. The basic tables are: Brands, Products, Categories. On top of these tables, there is a need for install sheets, marketing materials, and colors.
I created some relationship tables:
Brands_Products
Products_Colors
Products_Images
Products_Sheets
I also have a Categories_Relationships table that holds all of the relationships to categories. Install sheets etc. can have their own categories, but I didn't want to define a different category relationship table for each type because I didn't think that would be very extensible.
On the front end I am sorting by brands, and categories.
I think that covers the background, so now to the efficiency part. My question mostly pertains to whether it would be better to use joins or to make separate calls to return the individual parts of each item (colors, images, etc.).
What I currently have coded works and sorts fine, but I think I can improve the performance, as it takes some time to return the query. Right now it's returning about 45 items. Here is my first function; it grabs all the products and their info.
It works by first selecting all the products and joining their brand information, then looping through the result. I set up the basic information directly, but for the categories, images, and installs I call functions that return each of the respective items.
public function all()
{
$q = $this->db
->select('*')
->from('Products')
->join('Brands_Products', 'Brands_Products.product_id = Products.id')
->join('Brands', 'Brands.id = Brands_Products.brand_id')
->get();
foreach($q->result() as $row)
{
// Set Regular Data
$data['Id'] = $row->product_id;
$data['Name'] = $row->product_name;
$data['Description'] = $row->description;
$data['Brand'] = $row->brand_name;
$data['Category'] = $this->categories($row->product_id);
$data['Product_Images'] = $this->product_images($row->product_id);
$data['Product_Installs'] = $this->product_installs($row->product_id);
$data['Slug'] = $row->slug;
// Set new item in return object with created data
$r[] = (object)$data;
}
return $r;
}
Here is an example of one of the functions used to get the individual parts.
private function product_installs($id)
{
// Select Install Images
$install_images = $this->db
->select('*')
->where('product_id', $id)
->from('Products_Installs')
->join('Files', 'Files.id = Products_Installs.file_id')
->get();
// Add categories to category object
foreach($install_images->result() as $pImage)
{
$data[] = array(
'id' => $pImage->file_id,
'src' => $pImage->src,
'title' => $pImage->title,
'alt' => $pImage->alt
);
}
// Make sure data exists
if(!isset($data))
{
$data = array();
}
return $data;
}
So again really just looking on advice on what is the most efficient, best practice way of doing this. I really appreciate any advice, or information.
I think your approach is correct. There are only a couple of options: 1) load your product list first, then loop and load the required data for each product row; 2) create a big join on all tables first, then loop through the (possibly massive) Cartesian product. The second might get rather ugly to parse. For example, if you have Product A and Product B, and Product A has Install 1, Install 2, and Install 3, and Product B has Install 1 and Install 2, then your result is:
Product A Install 1
Product A Install 2
Product A Install 3
Product B Install 1
Product B Install 2
Now, add your images and categories to the join and it might become huge.
I am not sure what the sizes of your tables are but returning 45 rows shouldn't take long. The obvious thing to ensure (and you probably did that already) is that product_id is indexed in all tables as well as your brands_products tables and others. Otherwise, you'll do a table scan.
The next question is how you're displaying your data on the screen. You're getting all products, but do you need to load categories, images, and installs when you're getting a list of products? If you're simply listing products on the screen, you might want to wait to load that data until the user picks a product to view.
On a side note, is there any reason you're converting your array to an object?
$r[] = (object)$data;
Also, in the second function, you can simply add
$data = array();
before the foreach, instead of
// Make sure data exists
if(!isset($data))
{
$data = array();
}
You can try this:
Query all of the products
Get all of the product IDs from step 1
Query all of the install images that have a product ID from step 2, sorted by product ID
Iterate through the products from step 1, and add the results from step 3
That takes you from 46 queries (for 45 products) to 2 queries, without any additional joins.
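The steps above can be sketched in Python with sqlite3. Table names follow the question, the data is made up, and only the installs are batched to keep it short. In CodeIgniter, the IN list would come from the query builder's where_in() rather than hand-built placeholders.

```python
import sqlite3
from collections import defaultdict

def make_demo_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Products (id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE Files (id INTEGER PRIMARY KEY, src TEXT);
        CREATE TABLE Products_Installs (product_id INTEGER, file_id INTEGER);
        CREATE INDEX idx_installs_product ON Products_Installs (product_id);
        INSERT INTO Products VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO Files VALUES (10, 'i1.pdf'), (11, 'i2.pdf'), (12, 'i3.pdf');
        INSERT INTO Products_Installs VALUES (1, 10), (1, 11), (1, 12), (2, 10), (2, 11);
    """)
    return db

def all_products(db):
    # Step 1: one query for every product.
    products = db.execute("SELECT id, product_name FROM Products ORDER BY id").fetchall()
    # Step 2: collect the product ids.
    ids = [pid for pid, _ in products]
    installs = defaultdict(list)
    if ids:
        # Step 3: one query for the installs of all those products, sorted by product id.
        placeholders = ",".join("?" * len(ids))
        sql = ("SELECT product_id, file_id, src FROM Products_Installs"
               " JOIN Files ON Files.id = Products_Installs.file_id"
               f" WHERE product_id IN ({placeholders}) ORDER BY product_id, file_id")
        for product_id, file_id, src in db.execute(sql, ids):
            installs[product_id].append({"id": file_id, "src": src})
    # Step 4: attach each product's installs (an empty list if it has none).
    return [{"id": pid, "name": name, "installs": installs.get(pid, [])}
            for pid, name in products]
```

Only two SELECTs run no matter how many products there are; categories and images would each add one more batched query in the same style.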
You can also use CodeIgniter's Query Caching to increase performance even further, if it's worth the time to write the code to reset the cache when data is updated.
Doing more work in PHP is generally better than doing the work in MySQL in terms of scalability. You can scale PHP easily with load balancers and web servers. Scaling MySQL isn't as easy due to concurrency issues.

CakePHP Find - Order By String-To-Int?

I want to use CakePHP to pull an array of photos from a database, sorted by photo title (0, 1, 2, 3...) My query currently looks something like:
$ss_photos = $this->Asset->find('all',array(
'conditions'=>array('kind'=>'photo'),
'order'=>'title'
));
Unfortunately the titles seem to be in string format, leading to an undesirable sort order (2.jpg after 19.jpg, etc). Is there a quick way to cast 'title' as an int for ordering purposes within a Cake query of this type?
Not sure if this is "recommended practice", but on a first pass it seems to work:
$ss_photos = $this->Asset->find('all',array(
'conditions'=>array('kind'=>'photo'),
'order'=>'Asset.title + 0'
));
Any opinions?
The solution is to create a hidden column that is responsible for the ordering; in your example, the image names stored there should be zero-padded (00002.jpg, 00019.jpg), so that the string order matches the numeric order.
If there aren't too many results, I think it's easier to sort them in PHP (if you use it, of course :)). See natsort(): you just need to extract the list of image names and sort them.
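To illustrate the two ideas above outside of PHP, here is a natural-sort key sketched in Python. It mimics what natsort() is for, not PHP's exact algorithm: digit runs in a name are compared as numbers, so 2.jpg sorts before 19.jpg. The last line also shows that zero-padded names sort correctly as plain strings.

```python
import re

def natural_key(name):
    """Split '19.jpg' into ['', 19, '.jpg'] so digit runs compare numerically.

    re.split with a capturing group alternates text and digit chunks, so
    strings are only ever compared with strings and ints with ints.
    """
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]

titles = ["19.jpg", "2.jpg", "10.jpg", "1.jpg"]
plain = sorted(titles)                       # plain string order
natural = sorted(titles, key=natural_key)    # numeric-aware order
padded = sorted(f"{int(t.split('.')[0]):05d}.jpg" for t in titles)  # zero-padded names
```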