Trying to do a simple Model.all.page(1)
But whenever .page is called, it issues a SQL COUNT query. (My actual code is more complex than the above, but I simplified it for ease of reading.) Is there a way to prevent .page from running a SQL count? I'm dealing with millions of products, and this call adds an extra 2 seconds to every page load. I already have my own custom count, which is instant, so I don't need the count from .page.
Edit: Def not using .all. Bad example sorry.
Here's a really simple example that is basically my code in a nutshell:
Product.limit(1).page(1)
With my real code SQL produces: (1495.3ms) SELECT COUNT(*) FROM 'products' LEFT OUTER JOIN...
I have joins on the products table that I don't need to be counted, which is why I have my own count methods and don't need .page to produce its own count.
When you call Model.all.page(1), Model.all gives you back an array instead of an ActiveRecord relation (this is the Rails 3 behavior; from Rails 4 on, .all returns a relation).
Try just calling Model.page(1) and you should get what you want... If what you want is:
Model.page(1)
# results in SELECT "models".* FROM "models" LIMIT 30 OFFSET 0
Edit:
So the issue turned out to be in the will_paginate gem, which calls count on the query to learn the total number of entries so that it can work out the number of pages. However, will_paginate's paginate method accepts a total_entries option that lets you pass in your own count. This is useful if you have a massive table and don't need the precise number of pages for every record that matches the query.
You can pass in the option like so:
Model.paginate(:page => params[:page], :per_page => 30, :total_entries => 100)
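If you don't need a total at all, another common way to avoid the COUNT is to fetch one row more than the page size and use the extra row only to decide whether there is a next page. This is a minimal sketch of that idea in Python with sqlite3, not will_paginate's own mechanism; the products table, its columns, and the fetch_page helper are made up for illustration.

```python
import sqlite3


def fetch_page(conn, page, per_page):
    """Return (rows, has_next) for a 1-based page without running COUNT(*)."""
    offset = (page - 1) * per_page
    # Ask for one extra row; its presence tells us another page exists.
    rows = conn.execute(
        "SELECT id, name FROM products ORDER BY id LIMIT ? OFFSET ?",
        (per_page + 1, offset),
    ).fetchall()
    return rows[:per_page], len(rows) > per_page


# Hypothetical sample data: seven products in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO products (name) VALUES (?)",
    [(f"product {i}",) for i in range(1, 8)],
)

rows, has_next = fetch_page(conn, page=1, per_page=3)
last_rows, last_has_next = fetch_page(conn, page=3, per_page=3)
```

The trade-off is that you only get "previous/next" navigation, not a list of page numbers, unless you supply the total from somewhere else.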
You are concerned about doing a COUNT query, yet you are selecting ALL records from your database by doing Model.all? Are you joking right now?
Also, you need to provide code in order to get help. We can't read your mind, and we can't make up what code you might have, especially when you say "my actual code is more complex than above". Don't try to simplify issues or hide code that you THINK is irrelevant.
What does your code look like? What does your log look like, specifically the query time and total page time (with rendering and ActiveRecord split out)? You need to give more information.
I'm working with Yii2 Relational Database / Active Query Models and I ran into an issue trying to use the magic method getModelName() with $this->hasMany()->viaTable() to set the relation while trying to sort by a sort column in my junction table.
First I tried to just add my ->orderBy() clause to the main query:
return $this->hasMany(Category::class, ['id' => 'categoryId'])
    ->viaTable('{{kits_x_categories}}', ['kitId' => 'id'])
    ->orderBy('{{kits_x_categories}}.sort asc');
That didn't work as expected, and upon further digging I found out that this results in two separate queries: the first selects my category IDs into an array, and the second (main) query uses that array in a WHERE IN() clause to get the actual related models.
My first thought was to use the third parameter of the ->viaTable() call, the function($query) {} callback, and put my $query->orderBy() clause there:
return $this->hasMany(Category::class, ['id' => 'categoryId'])
    ->viaTable(
        '{{kits_x_categories}}',
        ['kitId' => 'id'],
        function ($query) {
            return $query->orderBy('{{kits_x_categories}}.sort asc');
        }
    );
However, all that did was return the category IDs in my desired order. It had no effect on the main query that uses the IN() condition with those IDs, since the order of the IDs inside IN() has no effect on the order of the results.
Finally, I ended up with this which lets it do what it wants, but then forces in my join to the main query with the IN() condition so that I can have the main query sort by my junction table sort column. This works as desired:
return $this->hasMany(Category::class, ['id' => 'categoryId'])
    ->viaTable('{{kits_x_categories}}', ['kitId' => 'id'])
    ->leftJoin('{{kits_x_categories}}', '{{kits_x_categories}}.categoryId = {{categories}}.id')
    ->where(['{{kits_x_categories}}.kitId' => $this->id])
    ->orderBy('{{kits_x_categories}}.sort asc');
This results in 2 queries.
First the query gets the category ids from the join table:
SELECT * FROM `kits_x_categories` WHERE `kitId`='49';
Then the main query with the IN() condition and my forced join for sort:
SELECT `categories`.* FROM `categories`
LEFT JOIN `kits_x_categories` ON `kits_x_categories`.categoryId = `categories`.id
WHERE (`kits_x_categories`.`kitId`='49') AND (`categories`.`id` IN ('11', '7', '9'))
ORDER BY `kits_x_categories`.`sort`
So here is my actual question... This seems largely inefficient to me but I am by no means a database/SQL god so maybe I just don't understand fully. What I want is to understand.
Why does Yii do this? What is the point of making one query to get the IDs first, then making another query to get the objects based on the ids of the relation? Wouldn't it be more efficient to just do a regular join here? Then, in my opinion, sorting by a junction sort column would be intuitive rather than counter-intuitive.
The only thing I can think of is that it has to do with lazy vs. eager loading of data: maybe with lazy loading it only gets the IDs first, and then when it needs the data it pulls the actual rows using IN()? If I used joinWith() instead of viaTable(), would that make any difference here? I didn't dig into this, as I literally just thought of it while typing this.
Lastly, in this scenario there are only going to be a few categories for each kit, so efficiency is not a big deal. But I'm curious: are there any performance implications in my working solution if I were to use it in the future on a different set of models that could have thousands of relations or more?
Yii 2 does that:
To support lazy loading.
To support cross-database relations such as MySQL -> Redis.
To reduce number of edge-cases significantly so internal AR code becomes less complicated.
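To make the two-step pattern concrete, here is a rough sketch in Python with sqlite3 (not Yii itself) that runs both the "IDs first, then IN()" approach and a single JOIN against the same data. The table and column names follow the question; the sample rows and sort values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE kits_x_categories (kitId INTEGER, categoryId INTEGER, sort INTEGER);
INSERT INTO categories VALUES (7, 'b'), (9, 'c'), (11, 'a');
INSERT INTO kits_x_categories VALUES (49, 11, 1), (49, 7, 2), (49, 9, 3);
""")

kit_id = 49

# Step 1: read the junction rows, already in the desired sort order.
ids = [row[0] for row in conn.execute(
    "SELECT categoryId FROM kits_x_categories WHERE kitId = ? ORDER BY sort",
    (kit_id,),
)]

# Step 2: load the models with IN(). The order of the values inside IN()
# does not determine the order of the returned rows.
placeholders = ",".join("?" * len(ids))
two_step = [row[0] for row in conn.execute(
    f"SELECT id FROM categories WHERE id IN ({placeholders})", ids
)]

# The two-step result can still be re-sorted in application code.
resorted = sorted(two_step, key=ids.index)

# Single-query alternative: join the junction table and sort by its column.
joined = [row[0] for row in conn.execute(
    """SELECT c.id
       FROM categories AS c
       JOIN kits_x_categories AS x ON x.categoryId = c.id
       WHERE x.kitId = ?
       ORDER BY x.sort""",
    (kit_id,),
)]
```

The two-step form only needs the second store to understand "give me these IDs", which is what makes lazy loading and cross-database relations possible; the JOIN form requires both tables to live in the same database.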
3rd party software is usually designed to get you started with databases. But then they fall apart when the app grows. This means that you need to learn the details of the underlying database in addition to the details of the layer.
Possibly this specific issue can be solved by improving the indexes on the many-to-many table with the tips here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
This, of course, depends on whether the layer lets you tweak the schema that it created for you.
If there is a way to write "raw" SQL, that might let you get rid of the 2-step process, but you still need to improve the indexes on that table.
I have a fairly complex view backed by two queries (a view within a view): one selects users with related data, and the other selects orders with related data. Both have filters. I'm now stuck on an issue and looking for a decent solution that performs well, because the queries involve a lot of data and relationships.
Assume I have:
Query 1 - Selects user data, with some left joins to other tables; the conditions depend on the provided parameters.
Query 2 - Selects orders for the users from Query 1, with many joins; the conditions depend on parameters.
I display data from both queries in one view (users, their data, orders, and some order data), and now I want to implement a pager. It has to show the correct number of users given the filters from both Query 1 and Query 2. The problem is that I can't simply put a limit on either query, because the other one has filters as well, so some of the users I selected might not be displayed at all once the other query's filters are applied.
So I guess there are two ways. One is to run the queries in a loop and collect data until I have the required number of results.
The other is to merge the two queries into one, but then I get many rows per user, so I can't set a page limit and get results for a specific number of users, for example 30, because the results look like user 1 => order 1, user 1 => order 2. Is there any way to get a specific number of unique results based on the user id or something similar?
Let me know if you have any questions.
Sample data would make this clearer; I can't fully understand the requirement from your question. Could you create some sample data and share it with us? If you are dealing with a lot of data, avoid loops, as they will just make performance worse.
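Without the real schema this is only a guess, but one loop-free way to page by user is to pick the page of distinct user IDs in a subquery (with the order filters applied there), and then join back to fetch every matching row for just those users. Below is a minimal sketch in Python with sqlite3; the users and orders tables, the status filter, and the users_page helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat'), (4, 'dan');
INSERT INTO orders VALUES
  (1, 1, 'paid'), (2, 1, 'paid'), (3, 2, 'open'),
  (4, 3, 'paid'), (5, 4, 'paid'), (6, 4, 'paid');
""")


def users_page(conn, status, page, per_page):
    """Return (user_id, order_id) rows for one page of distinct users."""
    offset = (page - 1) * per_page
    return conn.execute(
        """SELECT u.id, o.id
           FROM (
               -- The page is chosen here: LIMIT applies to users, not rows.
               SELECT DISTINCT u2.id AS id
               FROM users AS u2
               JOIN orders AS o2 ON o2.user_id = u2.id
               WHERE o2.status = ?
               ORDER BY u2.id
               LIMIT ? OFFSET ?
           ) AS page_users
           JOIN users AS u ON u.id = page_users.id
           JOIN orders AS o ON o.user_id = u.id
           WHERE o.status = ?
           ORDER BY u.id, o.id""",
        (status, per_page, offset, status),
    ).fetchall()


page1 = users_page(conn, "paid", page=1, per_page=2)
page2 = users_page(conn, "paid", page=2, per_page=2)
```

The filters have to be repeated in the inner query so that users with no matching orders never take up a slot on the page.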
I have an Order model that has_many :order_operations. An OrderOperation is created every time the order's state changes. I want to show OrderOperation.created_at values for my orders without triggering new queries. I use MySQL.
class Order < ActiveRecord::Base
  has_many :order_operations

  def change_state(new_state)
    order_operations.create to_state: new_state
  end

  def date_for_state(state_name)
    order_operations.where(state: state_name).pluck(:created_at).last
  end
end
I know about the includes and joins methods, but calling date_for_state always runs a new query. Even if I remove where and pluck, a query is still performed.
My only idea is to create a service object for this.
When you eager-load with includes, Rails caches the results of one particular query: specifically, the query that gets all of the order_operations associated with the order.
If you had loaded @order, eager-loading the associated order_operations, and then called @order.order_operations, Rails has already cached the associated order_operations as part of the include and doesn't need to load them again.
However, if you do @order.order_operations.where(state: state_name).pluck(:created_at).last, this is a different query from the one used in the include, so Rails says "he's asking for some different stuff from the stuff I cached, so I can't use the cached stuff, I need to make another query". You might say "aha, but this will always just be a subset of the stuff you cached, so can't you just work out which of the cached records this applies to?", but Rails isn't that smart.
If you were to do
@order.order_operations.select{|oo| oo.state == state_name}.map(&:created_at).last
then you're just doing some array operations, and .order_operations will use the cached records, as it's the same query as the one you cached with includes, i.e. a straight load of all associated records. But if you call it on an instance variable @order that doesn't happen to have eager-loaded the associated records already, it will be much less efficient, because it will run a much bigger query than the one you had originally.
In other words: if you want to get an efficiency gain by using includes then the parameters to your includes call needs to exactly match the association calls you will make subsequently on the object.
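The idea of filtering the already-loaded records instead of building a new query can be sketched outside Rails. This is a plain Python analogy, not ActiveRecord: preloaded_operations is a made-up stand-in for the eager-loaded association, and date_for_state only does list operations on it.

```python
from datetime import datetime

# Hypothetical stand-in for an eager-loaded order_operations association:
# the rows are already in memory, in creation order.
preloaded_operations = [
    {"state": "paid", "created_at": datetime(2020, 1, 1)},
    {"state": "shipped", "created_at": datetime(2020, 1, 3)},
    {"state": "paid", "created_at": datetime(2020, 1, 5)},
]


def date_for_state(operations, state_name):
    """Return the last created_at for a state using only in-memory filtering."""
    matches = [op["created_at"] for op in operations if op["state"] == state_name]
    return matches[-1] if matches else None


last_paid = date_for_state(preloaded_operations, "paid")
never_seen = date_for_state(preloaded_operations, "cancelled")
```

No query is involved at all here, which is the whole point: once the full set has been loaded, every per-state lookup is just a scan of that list.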
I have a model Item that has_many of another model Vote. I have a method for Item called vote_count:
def vote_count
  self.votes.count
end
and I'm printing a list of the 10 Items with the most Votes:
Item.all.sort_by(&:vote_count).reverse.first(10)
This method works, but it's fairly slow, because the server has to load every single Vote in the database. Does anyone know of a more efficient way to do this?
I can't just create a column that is increased by 1 every time a Vote is created, because Votes are soft-deleted after 7 days with a default_scope.
Can you also express the logic of the soft delete in SQL?
The best thing here would be to use an ORDER BY in the database, whereby you GROUP BY ITEMS.ID and order by a SQL aggregate function ...
order("sum(case when vote.created_at >= ... then 1 else 0 end)")
Don't know MySQL syntax, but it can probably do something like this, and it would be much faster than a Rails method.
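Here is a rough sketch of that GROUP BY / ORDER BY idea, written in Python with sqlite3 rather than Rails and MySQL. The items and votes tables, the sample rows, and the fixed cutoff value are assumptions for illustration; in the real app the cutoff would be "now minus 7 days" to mirror the soft delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE votes (id INTEGER PRIMARY KEY, item_id INTEGER, created_at TEXT);
INSERT INTO items VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO votes (item_id, created_at) VALUES
  (1, '2024-01-09'), (1, '2024-01-08'),
  (2, '2024-01-09'), (2, '2023-12-01'), (2, '2023-12-02'),
  (3, '2023-12-01');
""")

# Hypothetical cutoff: votes older than this count as soft-deleted.
cutoff = "2024-01-03"

# Count only recent votes per item, then sort and limit in the database.
top_items = conn.execute(
    """SELECT items.id,
              SUM(CASE WHEN votes.created_at >= ? THEN 1 ELSE 0 END) AS vote_count
       FROM items
       LEFT JOIN votes ON votes.item_id = items.id
       GROUP BY items.id
       ORDER BY vote_count DESC, items.id
       LIMIT 10""",
    (cutoff,),
).fetchall()
```

Only the ten winning rows come back to the application, instead of one count query per item.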
Your best bet is to do the work of sorting and limiting the records in the database, and then loading the final 10 results into rails after that. You might try something like this:
Item.order('vote_count desc').limit(10)
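You might need to change 'desc' to 'asc' if I've incorrectly guessed what the reverse order should be. Note that this only works if vote_count is an actual column (or a SQL alias defined in the same query); in the question it is a Ruby method, which the database cannot order by.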
How to get Sum of Difference of two fields in Django ORM?
I have a model where user activities are recorded. I store a time when the user logs in and another when the user logs out.
I need the difference between these two fields, summed over all instances, to get the total time the user spent on the site.
User.objects.filter(**user_kwargs).annotate(
    visit_count=Count('visit_history'),
    time_on_site=Avg('visit_history__time_on_site'),
).filter(visit_count__gt=0).order_by('-time_on_site')
How to find total time spent by user?
I think you can't do that with the Django ORM.
Not without using at least a raw query, anyway.
Let me repeat things so you can verify whether I understood you correctly.
You want to select the sum of the differences between model.field_a and model.field_b.
The Django ORM only allows you to select model fields, not their differences or sums. If you want to use the Django ORM here, then create an additional model and use a raw query with an aggregate function. The raw query would be the one that creates the ghost field, with something like
SELECT ..., row.one_value - row.second_value AS somecolumn, ... FROM foo ...
But since you already need to get down and dirty with pure SQL, I suggest you skip the ORM completely and use connections, transactions, and plain SQL to get what you need.
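For reference, the plain-SQL version of "sum of the difference of two fields per user" is short. Here is a minimal sketch in Python with sqlite3; the visit_history table, its login_at/logout_at columns (stored as integer seconds), and the sample rows are assumptions, not the asker's schema. Newer Django releases can also express arithmetic on fields inside an aggregate with F() expressions, so it is worth checking the documentation for your version before dropping to raw SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visit_history (user_id INTEGER, login_at INTEGER, logout_at INTEGER);
INSERT INTO visit_history VALUES (1, 100, 160), (1, 500, 530), (2, 200, 260);
""")

# Subtract per row, then sum the differences for each user.
totals = dict(conn.execute(
    """SELECT user_id, SUM(logout_at - login_at) AS total_seconds
       FROM visit_history
       GROUP BY user_id"""
))
```

Here totals maps each user id to the total number of seconds between login and logout across all of that user's visits.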