In MySQL it is possible to limit the number of records affected by an update query. In an ideal world this should not be necessary, but having such a limit does in some cases help save your bacon :)
I'd have thought that in SQLAlchemy it can be achieved by something like:
tgt_meta.tables['ps_product'].update(tgt_meta.tables['ps_product'].c.id_product == product_id).values(**upd_product_values).limit(1)
But apparently this is not so.
AttributeError: 'Update' object has no attribute 'limit'
Is there something else that I can try?
The MySQL dialect has this thrown in as update(..., mysql_limit=x):
https://docs.sqlalchemy.org/en/latest/dialects/mysql.html#mysql-sql-extensions
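For instance, a minimal sketch reusing the asker's names (tgt_meta, product_id and upd_product_values are assumed to be defined, with tgt_meta bound to a MySQL engine); the generative with_dialect_options call is the newer (1.4+) SQLAlchemy spelling of the same dialect argument:

product = tgt_meta.tables['ps_product']

# Legacy form: the dialect kwarg goes straight into update()
stmt = product.update(
    product.c.id_product == product_id,
    mysql_limit=1,
).values(**upd_product_values)

# Newer (1.4+) spelling of the same statement
stmt = (
    product.update()
    .where(product.c.id_product == product_id)
    .values(**upd_product_values)
    .with_dialect_options(mysql_limit=1)
)
# Both compile to: UPDATE ps_product SET ... WHERE id_product = %s LIMIT 1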
I am trying to sort my data by a timestamp field in my controller; note that the timestamp field may be NULL or may have a value. I wrote the following query.
@item = Item.sort_by(&:item_timestamp).reverse
  .paginate(:page => params[:page], :per_page => 5)
But this raises an error when I have items whose item_timestamp is NULL, while the following query works.
@item = Item.order(:item_timestamp).reverse
  .paginate(:page => params[:page], :per_page => 5)
Can anybody explain the difference between these two queries, and in which conditions to use which one?
Also, I am using order and reverse to get the latest items from the database. Is this the best way, or are there better ways to get the latest data in terms of performance?
.sort_by is a Ruby method from Enumerable that is used to sort arrays (or array-like objects). Using .sort_by causes all the records to be loaded from the database into the server's memory, which can lead to serious performance problems (as well as your issue with nil values).
.order is an ActiveRecord method that adds an ORDER BY clause to the SQL SELECT statement. The database handles sorting the records. This is preferable in 99% of cases.
sort_by is executed in Ruby, so if you have a nil value, things will break in a way similar to this:
[3, nil, 1].sort
#=> ArgumentError: comparison of Fixnum with nil failed
order is executed by your RDBMS, which generally handles NULL values fine. You can even specify where you want the NULLs to go, by adding NULLS FIRST or NULLS LAST to your ORDER BY clause (support and defaults vary by RDBMS).
You don't need sort_by in that query; it will take very long. When you work with the DB you should always use order. Here is a solution for your problem:
@item = Item.order('item_timestamp DESC NULLS LAST').paginate(:page => params[:page], :per_page => 5)
As was said before me, .order is quicker, and it's enough in most cases, but sometimes you need sort_by, for example when you want to sort by a value on an association.
If you have a posts table and a view_counters table, where you have the number of views by article, you can't easily sort your posts by total views with .order.
But with sort_by, you can just do:
posts = @user.posts.joins(:view_counter)
@posts = posts.sort_by { |p| p.total_views }
.sort_by will go through each element, fetch the value from the association, and sort by it, all in one line of code.
You can shorten the code further with &:attribute_name, for example:
@posts = posts.sort_by(&:total_views)
Also, for your last question about the reverse, you can do this:
Item.order(item_timestamp: :desc)
When you use sort_by you break ActiveRecord query caching and, as pointed out before, you load all the records into memory.
When writing queries, always think about the SQL world and the memory world as two separate things. It is like having an archive (SQL) and a cart (memory) into which you put the files you take out of the archive to use later.
As most people mentioned, the main difference is that sort_by is a Ruby method while order is a Rails ActiveRecord method. However, which one to use varies case by case. For example, sort_by may be appropriate if you have already retrieved the data from the DB and want to sort the loaded records; calling order at that point would go to the database again (and can introduce an N+1 issue) even though the data is already loaded. A quick sketch of the contrast:
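(Item and its active/name columns are made up; the point is only where the sorting happens.)

items = Item.where(active: true).to_a   # records already loaded into memory
items.sort_by(&:name)                   # sorts in Ruby; no additional query
Item.where(active: true).order(:name)   # sorts in SQL; issues a new query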
The project that I'm working on is using MySQL on RDS (mysql2 gem specifically).
When I use a hash of conditions including a range in a where statement I'm getting a bit of an odd addition to my query.
User.where(id: [1..5])
and
User.where(id: [1...5])
Result in the following queries respectively:
SELECT `users`.* FROM `users` WHERE ((`users`.`id` BETWEEN 1 AND 5 OR 1=0))
SELECT `users`.* FROM `users` WHERE ((`users`.`id` >= 1 AND `users`.`id` < 5 OR 1=0))
The queries work perfectly fine since OR FALSE is effectively a no-op. I'm just wondering why Rails or ARel is adding this snippet into the query.
EDIT
It looks like the line that could explain this is line 26 in ActiveRecord::PredicateBuilder. I still have no idea how the hash could be empty? at that point, but maybe someone else does.
EDIT 2
This is interesting. I was looking into Filip's comment to see why he made it, since it seemed like just a clarification, but he is correct that 1..5 != [1..5]. The former is an inclusive range from 1 to 5, whereas the latter is an array whose first element is that range. I tried putting these into an Arel where call to see the SQL produced, and the OR 1=0 is not there!
User.where(id: 1..5) #=> SELECT "users".* FROM "users" WHERE ("users"."id" BETWEEN 1 AND 5)
User.where(id: 1...5) #=> SELECT "users".* FROM "users" WHERE ("users"."id" >= 1 AND "users"."id" < 5)
I still do not know why Arel adds the OR 1=0, which will always be false and seems unnecessary; it may be due to how Arrays and Ranges are handled differently.
Building on the fact, which you've discovered, that [1..5] is not the correct way to specify the range... I have discovered why [1..5] behaves as it does. To get there, I first found that an empty array in a hash condition produces the 1=0 SQL condition:
User.where(id: []).to_sql
# => "SELECT \"users\".* FROM \"users\" WHERE 1=0"
And, if you check the ActiveRecord::PredicateBuilder::ArrayHandler code, you'll see that array values are always partitioned into ranges and other values.
ranges, values = values.partition { |v| v.is_a?(Range) }
This explains why you don't see the 1=0 when using non-range values: the only way to get 1=0 from an array without including a range is to supply an empty array, which yields the 1=0 condition, as shown above. And when all the array contains is a range, you get the range conditions (ranges) and, separately, an empty-array condition (values) executed. My guess is that there isn't a good reason for this; it is simply easier to let it be than to avoid it (since the result set is equivalent either way). If the partition code were a bit smarter, it wouldn't tack on the additional empty values array and could skip the 1=0 condition.
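To make that concrete, here is what that partition does to the asker's [1..5] (a quick console sketch):

values = [1..5]
ranges, values = values.partition { |v| v.is_a?(Range) }
ranges  # => [1..5] -- becomes the BETWEEN condition
values  # => []     -- the empty array that yields the 1=0 condition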
As for where the 1=0 comes from in the first place... I think that comes from the database adapter, but I couldn't find exactly where. However, I would call it an attempt to fail to find a record. In other words, WHERE 1=0 is never going to return any users, which makes more sense than an alternative like WHERE id IS NULL, which would find any users whose id is NULL. And failing is what I'd expect when attempting to find all Users whose id is in the empty set (i.e. we're not asking for nil or NULL ids, we're asking for nothing). So, in my mind, leaving exactly where 1=0 comes from as a black box is OK. At least we can now reason about why the range inside the array causes it to show up!
UPDATE
I've also found that, even when using ARel directly, you can still get 1=0:
User.arel_table[:id].in([]).to_sql
# => "1=0"
Strictly speaking this is a guess, though I did something similar in a project of my own (although I used AND 1).
For whatever reason, when generating a query it is easier to always emit a WHERE clause containing a no-op than to conditionally generate the WHERE clause at all. That way, even if you don't add any where sections, you still end up generating something valid.
On the other hand, I'm not sure why it's taking this form: when I did it I used 1 [<AND (generated code)>...], which allowed arbitrary chaining, but I don't see how the form you're seeing would allow that. Nonetheless, I still think it is likely the result of an algorithmic code-generation scheme, something like the sketch below.
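For illustration only (a hypothetical generator, not Rails source):

# Seed the WHERE clause with a no-op so every later condition can be
# appended uniformly, with no special case for the first one.
conditions = ['1=1']
conditions << '`users`.`id` BETWEEN 1 AND 5'
sql = "SELECT * FROM `users` WHERE #{conditions.join(' AND ')}"
# => SELECT * FROM `users` WHERE 1=1 AND `users`.`id` BETWEEN 1 AND 5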
Check whether you are using active_record-acts_as. That was the problem in my case.
Add the line below to your Gemfile:
gem 'active_record-acts_as', :git => 'https://github.com/hzamani/active_record-acts_as.git'
This just pulls the latest version of the gem, where the issue is hopefully fixed. Worked for me.
Personally, I think you're seeing side effects of Ruby. I think the better way to do what you're doing would be:
2.0.0-p481@meri :008 > [*1..5]
=> [1, 2, 3, 4, 5]
User.where(id: [*1..5]).to_sql
"SELECT `users`.* FROM `users` WHERE `users`.`id` IN (1, 2, 3, 4, 5)"
This creates an Array of integers, as opposed to an Array whose single element is a Range.
OR
Use an explicit Range to trigger the BETWEEN in Arel.
# with end element, i.e. exclude_end=false
2.0.0-p481@meri :013 > User.where(id: Range.new(1,5)).to_sql
=> "SELECT `users`.* FROM `users` WHERE (`users`.`id` BETWEEN 1 AND 5)"
# without end element, i.e. exclude_end=true
2.0.0-p481@meri :022 > User.where(id: Range.new(1, 5, true)).to_sql
=> "SELECT `users`.* FROM `users` WHERE (`users`.`id` >= 1 AND `users`.`id` < 5)"
If you care about having control of the queries you generate and the full power of the SQL language and database features then I would suggest moving from ActiveRecord/Arel to Sequel.
I can honestly say there are a lot more quirks and infuriating times ahead for you with ActiveRecord, especially when you move beyond simple CRUD-like queries: when you start querying your data in anger, perhaps joining a few join tables here and there, you'll realize you really do need join conditions or UNION ALL type queries.
Sequel is also significantly faster and more reliable in its query generation and result handling, and it is much easier to compose the queries you want. It also has real documentation you can actually read, unlike Arel.
I just wish I had discovered it much earlier rather than persisting with the rails default data access layer.
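For comparison, a hypothetical Sequel version of the asker's range query (the connection string is made up):

require 'sequel'

DB = Sequel.connect('mysql2://user:pass@localhost/mydb')  # hypothetical DSN
DB[:users].where(id: 1..5).sql
# => "SELECT * FROM `users` WHERE ((`id` >= 1) AND (`id` <= 5))"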
Delving into the documentation and the API, I can't seem to find how to update one field in multiple rows at once.
Something like
Table.select(:field).update("update to this").where(id: 4,5,6)
would be nice.
Does something like this exist? It would be much better than having to load everything into an array, set the value on each record, and call save every time.
You can use the update_all method, for example:
Table.update_all("field = 'update to this'", ["id in (?)", ids])
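Note that the two-argument form above is the old Rails 2/3 API. On newer Rails, scope first and then call update_all; a sketch, assuming the same Table model and column:

# One UPDATE statement; skips instantiation, validations and callbacks.
Table.where(id: [4, 5, 6]).update_all(field: "update to this")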
I am using DBIx::Class and I would like to only update one row in my table. Currently this is how I do it:
my $session = my_app->model("DB::Session")->find(1);
$session->update({done_yn=>'y',end_time=>\'NOW()'});
It works, but the problem is that when it calls find to fetch the row, it runs this whole query:
SELECT me.id, me.project_id, me.user_id, me.start_time, me.end_time, me.notes, me.done_yn FROM sessions me WHERE ( me.id = ? ): '8'
That seems a bit much when all I want to do is update a row. Is there any way to update a row without having to pull the whole row out of the database first? Something like this is what I am looking for:
my_app->model("DB::Session")->update({done_yn=>'y',end_time=>\'NOW()'},{id=>$id});
Where $id is the WHERE id=? part of the query. Does anyone know how to do this? Thanks!
You can run update on a restricted resultset which only matches this single row:
my_app->model("DB::Session")->search_rs({ id=> 1 })->update({done_yn=>'y',end_time=>\'NOW()'});
I suggest you use a DateTime->now object instead of literal SQL for updating the end_time column, because that uses the app server's date and time instead of the database server's, and it makes your schema more portable across RDBMSes. A sketch:
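(This assumes your Result class loads InflateColumn::DateTime, so the DateTime object is deflated to the column's format.)

use DateTime;

my_app->model("DB::Session")
      ->search_rs({ id => $id })
      ->update({ done_yn => 'y', end_time => DateTime->now });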
Do you check whether the row was found, to prevent an error in case it wasn't?
You might want to use update_or_create instead.
You could use the "columns" attribute:
my $session = my_app->model("DB::Session")->find(1, {columns => "id"});
I have a query that's returning a LOT of results and my code is running out of memory trying to parse the results... how can I run a query in CakePHP and just get normal results?
By parsing it I mean....
SELECT table1.*, table2.* FROM table1 INNER JOIN table2 ON table1.id = table2.table1_id
With the above query it'll return....
array(
    0 => array(
        'table1' => array(
            'field1' => value,
            'field2' => value
        ),
        'table2' => array(
            'field1' => value,
            'field2' => value
        )
    )
)
It's when Cake parses those results into nested arrays that it runs out of memory... how do I avoid this?
I couldn't hate CakePHP any more than I do right now :-\ If the documentation were decent that would be one thing, but it's not, and its functionality is annoying.
you could do:
$list = $this->AnyModel->query("SELECT * FROM big_table");
but I don't think that will solve your problem, because if you have, for example, 10 million rows, PHP won't be able to manage an array of 10 million values...
but you might want to read up on raising the execution time and the memory limit; you can also change them in your php.ini, for example:
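(The values here are made up; tune them to your server. These are the runtime equivalents of the php.ini settings memory_limit and max_execution_time.)

// Raise the limits for this request only.
ini_set('memory_limit', '512M');
set_time_limit(300); // seconds; 0 means no limit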
Good Luck!
EDITED
Hmm, thanks to your question I've learned something :P First of all, we all agree that you're receiving that error because Cake executes the query and tries to store the results in one array, but PHP doesn't support an array that big, so it runs out of memory and crashes. I have never used the classic mysql_query() (I prefer PDO), but after reading the docs it seems that mysql_query stores the results inside a resource; therefore it's not loading the results into memory, and that allows you to loop over the results (like looping through a big file). So now I see the difference... and your question is actually this one:
Can I stop CakePHP fetching all rows for a query?
=) I understand your frustration with Cake; sometimes I also get frustrated with it (can you believe there's no simple way to execute a query with a HAVING clause?? u_U)
Cheers!
I'd suggest you utilize the Containable behavior on your model. This is the easiest way to control the amount of data that's returned. I'm confident that this is precisely what you need to implement; see the sketch after the link.
CakePHP :: Containable :: Core Behaviors
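A minimal sketch of Containable in use (model and field names are made up to mirror the question's tables):

// In the model: attach the Containable behavior.
class Table1 extends AppModel {
    public $actsAs = array('Containable');
}

// In the controller: fetch only the fields you need, instead of letting
// Cake hydrate every column of every joined row.
$results = $this->Table1->find('all', array(
    'fields'  => array('Table1.id', 'Table1.field1'),
    'contain' => array(
        'Table2' => array('fields' => array('Table2.field1')),
    ),
));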
You should limit the rows returned by your query (say, 500 at a time) and let the user fetch more when needed (the next 500 rows). You can do that nicely with the pagination component and a little AJAX; a sketch follows.
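(CakePHP 2.x style; the Item model and controller are made up.)

class ItemsController extends AppController {
    public $components = array('Paginator');
    public $paginate = array('limit' => 500);

    public function index() {
        // Fetches only the current page of 500 rows per request.
        $items = $this->Paginator->paginate('Item');
        $this->set('items', $items);
    }
}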