CakePHP parsing query results alternative - mysql

I have a query that's returning a LOT of results and my code is running out of memory trying to parse the results... how can I run a query in CakePHP and just get normal results?
By parsing it I mean....
SELECT table1.*, table2.* FROM table1 INNER JOIN table2 ON table1.id = table2.table1_id
With the above query it'll return....
array(
0 => array(
'table1' => array(
'field1' => value,
'field2' => value
),
'table2' => array(
'field1' => value,
'field2' => value
)
)
)
When it parses those results into nested arrays is when it's running out of memory.... how do I avoid this?
I couldn't hate CakePHP any more than I do right now :-\ If the documentation was decent that would be one thing, but it's not decent and its functionality is annoying.

You could do:
$list = $this->AnyModel->query("SELECT * FROM big_table");
But I don't think that will solve your problem, because if you have, for example, 10 million rows, PHP won't be able to manage an array of 10 million values.
You might want to read these two links about changing the execution time and the memory limit. You could also change them in your php.ini.
Good Luck!
EDITED
Hmm, thanks to your question I've learned something :P First of all, we all agree that you're getting that error because Cake executes the query and tries to store all the results in one array, but PHP can't hold an array that big, so it runs out of memory and crashes. I have never used the classic mysql_query() (I prefer PDO), but after reading the docs, it seems that mysql_query() returns the results as a resource rather than building one big PHP array, which lets you loop over the rows one at a time (like reading through a big file). So now I see the difference, and your question is actually this one:
Can I stop CakePHP fetching all rows for a query?
=) I understand your frustration with Cake; sometimes I get frustrated with it too (can you believe there's no simple way to execute a query with a HAVING clause?? u_U)
Cheers!

I'd suggest you use the Containable behavior on your model. This is the easiest way to control the amount of data that's returned. I'm confident that this is precisely what you need to implement.
CakePHP :: Containable :: Core Behaviors

You should limit the rows returned from your query (like 500 rows) and allow the user to fetch more rows when needed (next 500 rows at a time). You could do that nicely with the pagination component and a little AJAX.

Related

Order and sort_by difference in Ruby on Rails ActiveRecord

I am trying to sort my data by a timestamp field in my controller. Note that the timestamp field may be null or may have a value. I wrote the following query.
@item = Item.sort_by(&:item_timestamp).reverse
  .paginate(:page => params[:page], :per_page => 5)
But this gives an error when I have items whose item_timestamp value is NULL, while the following query works.
@item = Item.order(:item_timestamp).reverse
  .paginate(:page => params[:page], :per_page => 5)
Can anybody explain the difference between these two queries, and when to use which one?
Also, I am using order and reverse to get the latest items from the database. Is this the best way, or are there better ways to get the latest data from the database in terms of performance?
.sort_by is a Ruby method from Enumerable that is used to sort arrays (or array-like objects). Using .sort_by will cause all the records to be loaded from the database into the server's memory, which can lead to serious performance problems (as well as your issue with nil values).
.order is an ActiveRecord method that adds an ORDER BY clause to the SQL SELECT statement. The database will handle sorting the records. This is preferable in 99% of cases.
sort_by is executed in Ruby, so if you have a nil value, things will break in a way similar to this:
[3, nil, 1].sort
#=> ArgumentError: comparison of Fixnum with nil failed
order is executed by your RDBMS, which generally handles NULL values fine. You can even specify where you want to put the NULL values by adding NULLS FIRST or NULLS LAST to your ORDER BY clause, where the database supports it (the default placement varies by database).
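To make the nil problem concrete, here is a minimal sketch in plain Ruby. It assumes a hypothetical Item struct standing in for the ActiveRecord model, and the helper name newest_first_nils_last is made up for illustration. It shows that sort_by raises when a timestamp is nil, and one way to sort in memory anyway by using a two-element key that pushes nil values to the end.

```ruby
# Hypothetical stand-in for the ActiveRecord model; only the fields we need.
Item = Struct.new(:name, :item_timestamp)

# Returns true if a plain sort_by on the timestamp raises ArgumentError,
# which happens as soon as a nil has to be compared with a Time.
def plain_sort_fails?(items)
  items.sort_by(&:item_timestamp)
  false
rescue ArgumentError
  true
end

# Nil-safe in-memory sort: newest first, records without a timestamp last.
# The first key element separates nil (1) from non-nil (0) values, so a
# nil is never compared directly with a Time.
def newest_first_nils_last(items)
  items.sort_by do |item|
    ts = item.item_timestamp
    ts.nil? ? [1, 0.0] : [0, -ts.to_f]
  end
end

items = [
  Item.new("a", Time.at(300)),
  Item.new("b", nil),
  Item.new("c", Time.at(100)),
]

puts plain_sort_fails?(items)
puts newest_first_nils_last(items).map(&:name).inspect
```

This still loads every record into memory, so it only makes sense for data you have already fetched; for anything large, let the database do the ordering.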
You don't need sort_by in that query; it will take a very long time. If you work with the DB you should always use order. Here is a solution for your problem:
@item = Item.order('item_timestamp DESC NULLS LAST').paginate(:page => params[:page], :per_page => 5)
As it was said before me, .order is quicker, and it's enough in most cases, but sometimes you need sort_by, if you want to sort by value in a relation for example.
If you have a posts table and a view_counters table, where you have the number of views by article, you can't easily sort your posts by total views with .order.
But with sort_by, you can just do:
posts = @user.posts.joins(:view_counter)
@posts = posts.sort_by { |p| p.total_views }
.sort_by will go through each element, get the relation value, then sort by that value, all in one line of code.
You can further reduce the code with &:[attributeName], for example:
@posts = posts.sort_by(&:total_views)
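As a rough illustration of the block form and the &: shorthand, here is a sketch in plain Ruby. The Post and ViewCounter structs are hypothetical stand-ins for the models described above (no database involved), and the helper names are made up for the example.

```ruby
# Hypothetical stand-ins for the posts and view_counters records.
ViewCounter = Struct.new(:total)
Post = Struct.new(:title, :view_counter) do
  # Reads the value from the related record, as a model method might.
  def total_views
    view_counter.total
  end
end

# Block form: sort ascending by the value taken from the relation.
def least_viewed_first(posts)
  posts.sort_by { |p| p.total_views }
end

# Shorthand form: &:total_views builds the same block from the method name.
def least_viewed_first_short(posts)
  posts.sort_by(&:total_views)
end

# Negating the key sorts descending without a separate reverse call.
def most_viewed_first(posts)
  posts.sort_by { |p| -p.total_views }
end

posts = [
  Post.new("first", ViewCounter.new(12)),
  Post.new("second", ViewCounter.new(40)),
  Post.new("third", ViewCounter.new(7)),
]

puts least_viewed_first(posts).map(&:title).inspect
puts most_viewed_first(posts).map(&:title).inspect
```

Both forms run entirely in Ruby, so every post (and its counter) has to be loaded before the sort happens.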
Also, for your last question about the reverse, you can do this:
Item.order(item_timestamp: :desc)
When you use sort_by you break ActiveRecord caching and, as pointed out before, you load all the records into RAM.
When writing down queries, please always think about the SQL and the memory world, they are 2 separate things. It is like having an archive (SQL) and cart (Memory) where you put the files you take out of the archive to use later.
As most people mentioned, the main difference is that sort_by is a Ruby method and order is a Rails ActiveRecord method. However, which one to use may vary case by case. For example, sort_by may be appropriate if you have already retrieved the data from the DB and want to sort the loaded data. If you use order at that point, you might introduce an N+1 issue and go to the database again even though you already have the data loaded.

Yii activerecord and pagination count() slow query

So basically the problem is the SELECT COUNT(*) query, which is executed in the calculateTotalItemCount function in CActiveDataProvider. As I understand it, it is needed for pagination to set the $itemcount variable. The problem is that this query is slow for big tables. For my ~30M-row table it takes 5 seconds.
So there are 2 ways to solve this problem:
1. Disable pagination ('pagination'=>false) and write my own pagination.
2. Rewrite the AR count function.
I don't have enough experience/knowledge to accomplish this.
Maybe someone has had the same issue before and can share their solution.
At least for totalItemCount we could use EXPLAIN SELECT *, which reports an estimated row count. It's much faster.
I appreciate any help. Thank you.
If you have a "cheaper" query in raw SQL than the one that active records create automatically, you can also query manually (e.g. through DAO) and set the totalItemCount on your data provider:
$count = Yii::app()->db->createCommand('SELECT COUNT(*)...')->queryScalar();
$provider = new CActiveDataProvider('SomeModel', array(
'totalItemCount' => $count,
'criteria' => $criteria,
...

Minimum and maximum of a field in cakephp and mysql

I am trying to build a search function for a CakePHP and MySQL site. Selecting different parameters, like the price of the product or the length, triggers an AJAX call which returns the number of matching results. I want to extend the returned results with the minimum and maximum values for the lengths and prices. I tried doing this: http://bin.cakephp.org/view/1004813660 . Using the first 4 finds is too time consuming. The last one works locally, but I get the error
1140 - Mixing of GROUP columns (MIN(),MAX(),COUNT(),...) with no GROUP columns is illegal if there is no GROUP BY clause
remotely, due to ONLY_FULL_GROUP_BY being on.
Is it possible to use the last option with some improvements, or can I switch off ONLY_FULL_GROUP_BY?
If I understood you well, you want to get in a single request
MIN(Yacht.price) as min_price
MAX(Yacht.price) as max_price
MIN(Yacht.long) as min_length
MAX(Yacht.long) as max_length
right ?
For this, you do not need any "Group By" clause. MIN and MAX are already aggregate functions, and nothing prevents you from using multiple aggregate functions in a single request.
Have you tried simply doing this ?
// 'first' returns a single row holding all four aggregates
$stats = $this->Yacht->find('first', array(
    'conditions' => $conditions,
    'fields' => array(
        'MIN(Yacht.price) as min_price',
        'MAX(Yacht.price) as max_price',
        'MIN(Yacht.long) as min_length',
        'MAX(Yacht.long) as max_length'
    )
));
By the way, according to the documentation, there seems to be quite a lot of redundancy in your original code. "find('first', array(...))" by itself ensures you get only one result, so there is no need to specify "'limit' => 1" in the request, nor an "order" clause, as there would be only one row anyway :)
Hope it helps.
The way to set server modes can be found here... If you read the top of the document it will tell you how to set the server mode defaults:
http://dev.mysql.com/doc/refman/5.1/en/server-sql-mode.html
However, I'm not sure that is necessary to get to your solution. I think your query is running for a long time because you need a different GROUP BY in your code and fewer queries. You should be able to use a logical GROUP BY that takes advantage of your primary key (index):
'group' => 'Yacht.id'
So you have one query returning everything:
$this->Yacht->find('first', array(
    'conditions' => $conditions,
    'fields' => array('MAX(Yacht.price) as max_price', 'MIN(Yacht.price) as min_price', ...),
    'group' => 'Yacht.id',
    'order' => '...'
));
I ended up solving the problem by changing the way I was searching. Instead of doing queries in the conditions that would lead to joins, I explicitly did the searching with where. I had things like,
$conditions = array('Brand.name LIKE'=> '%bla%');
which I replaced with
$conditions = array('Yacht.brand_name LIKE' => '%bla%');
I had to restructure the database a bit, but the tradeoff between speed and database normalization is one I can live with.

CakePHP model's find() not hitting mysql index b/c of integer strings

When using code like this:
$Model->find('all', array(
'conditions' => array(
'field' => '1111'
)
));
where field is a VARCHAR MySQL field, Cake generates a query like this:
SELECT * FROM Models WHERE field = 1111;
instead of the expected
SELECT * FROM Models WHERE field = '1111';
This also makes MySQL cast every value in the column to a number instead of using the string index.
I'm trying to optimize an already working system written by someone else, and a quick grep shows thousands of find() calls I would need to "fix". So the only acceptable solution would be at either the model layer or the MySQL layer.
tl;dr: How to make Cake pass integer strings from conditions to mysql as string and not as numbers?
After running out of chars in the comment.
What you should really be doing:
Optimizing at the PHP (Cake) level.
The find() wrappers are not ideal. They are incredibly powerful but very slow. They allow rapid development but need more queries than necessary.
So that is where you should try to fix bottlenecks.
The DB is probably (internally) still the fastest piece in the dispatcher chain, with or without casts.
To get more concrete:
use manual query() if you feel you have to
try to use atomic methods (updateAll, deleteAll) where possible
try to use the Containable and Linkable behaviors (especially the latter) if you use find() calls with joins, to cut down the number of queries
cache your db results somehow

Can I Recreate The Following MySQL Query In CakePHP?

I'm trying to achieve this query in CakePHP (1.3, if that's relevant):
select * from releases r join formats f on r.id = f.release_id
where r.default_upc = f.bar_code
I was hoping I could do something in the Release model like:
var $hasOne = array('Format'=>array(
'conditions' => array('Release.default_upc'=>'Format.bar_code')
));
Unfortunately this just results in a null Format; evidently 'Format.bar_code' is not yet available at the time the query is made.
What's the quickest route to getting the results I want?
Hmm, it does appear that simply changing the conditions to
'conditions' => array('Release.default_upc = Format.bar_code')
may elicit the results I seek. Is this an idiomatic Cake way of doing things?
As far as I know, conditions on JOINs in CakePHP should be written the way you did in the answer you provided.
This has happened to me several times before.
The first way should be used in the model itself and within "regular" find calls.