CodeIgniter database call efficiency - MySQL

This is an efficiency/best practice question, and I'm hoping to receive some feedback on performance. Any advice is greatly appreciated.
So here is a little background on what I have set up. I'm using CodeIgniter, and the basic setup is pretty similar to any other product-relationship schema. The basic tables are: Brands, Products, and Categories. On top of these tables there is a need for install sheets, marketing materials, and colors.
I created some relationship tables:
Brands_Products
Products_Colors
Products_Images
Products_Sheets
I also have a Categories_Relationships table that holds all of the relationships to categories. Install sheets etc. can have their own categories, but I didn't want to define a different category relationship table for each type because I didn't think that would be very expandable.
On the front end I am sorting by brands, and categories.
I think that covers the background, now to the efficiency part. I guess my question pertains mostly to whether it would be better to use joins or to make separate calls to return the individual parts of each item (colors, images, etc.).
What I currently have coded is working and sorting fine, but I think I can improve the performance, as it takes some time to return the query. Right now it's returning about 45 items. Here is my first function; it grabs all the products and their info.
It works by first selecting all the products and joining their brand information. Then, looping through the result, I set up the basic information, but for the categories, images, and installs I am using functions that return each of the respective items.
public function all()
{
    $q = $this->db
        ->select('*')
        ->from('Products')
        ->join('Brands_Products', 'Brands_Products.product_id = Products.id')
        ->join('Brands', 'Brands.id = Brands_Products.brand_id')
        ->get();
    $r = array(); // Initialize so an empty result set doesn't leave $r undefined
    foreach ($q->result() as $row)
    {
        // Set Regular Data
        $data['Id'] = $row->product_id;
        $data['Name'] = $row->product_name;
        $data['Description'] = $row->description;
        $data['Brand'] = $row->brand_name;
        $data['Category'] = $this->categories($row->product_id);
        $data['Product_Images'] = $this->product_images($row->product_id);
        $data['Product_Installs'] = $this->product_installs($row->product_id);
        $data['Slug'] = $row->slug;
        // Set new item in return object with created data
        $r[] = (object) $data;
    }
    return $r;
}
Here is an example of one of the functions used to get the individual parts.
private function product_installs($id)
{
    // Select Install Images
    $install_images = $this->db
        ->select('*')
        ->where('product_id', $id)
        ->from('Products_Installs')
        ->join('Files', 'Files.id = Products_Installs.file_id')
        ->get();
    // Add categories to category object
    foreach ($install_images->result() as $pImage)
    {
        $data[] = array(
            'id'    => $pImage->file_id,
            'src'   => $pImage->src,
            'title' => $pImage->title,
            'alt'   => $pImage->alt
        );
    }
    // Make sure data exists
    if (!isset($data))
    {
        $data = array();
    }
    return $data;
}
So again, I'm really just looking for advice on the most efficient, best-practice way of doing this. I really appreciate any advice or information.

I think your approach is correct. There are only a couple of options: 1) load your product list first, then loop, and load the required data for each product row; 2) create a big join on all tables first, then loop through the (possibly massive) Cartesian product. The second might get rather ugly to parse. For example, if you've got Product A and Product B, and Product A has Install 1, Install 2, and Install 3, and Product B has Install 1 and Install 2, then your result is
Product A Install 1
Product A Install 2
Product A Install 3
Product B Install 1
Product B Install 2
Now, add your images and categories to the join and it might become huge.
I am not sure what the sizes of your tables are, but returning 45 rows shouldn't take long. The obvious thing to ensure (and you probably did that already) is that product_id is indexed in all tables, including your Brands_Products table and the others. Otherwise, you'll do a table scan.
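As a hedged illustration (table and column names are taken from the question's code; verify them against the real schema before running anything), the relevant indexes might look like:

```sql
-- Hypothetical index names; the tables come from the question.
ALTER TABLE Brands_Products   ADD INDEX idx_bp_product (product_id);
ALTER TABLE Brands_Products   ADD INDEX idx_bp_brand   (brand_id);
ALTER TABLE Products_Installs ADD INDEX idx_pi_product (product_id);
ALTER TABLE Products_Installs ADD INDEX idx_pi_file    (file_id);
```

Running the slow SELECT through EXPLAIN before and after will show whether the indexes are actually being used or whether a table scan remains.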
The next question is how you're displaying your data on the screen. So you're getting all products. Do you need to load categories, images, and installs when you're getting a list of products? If you're simply listing products on the screen, you might want to wait to load that data until the user picks a product to view.
On a side note, is there any reason you're converting your array to an object?
$r[] = (object)$data;
Also, in the second function, you can simply add
$data = array();
before the foreach, instead of
// Make sure data exists
if(!isset($data))
{
$data = array();
}

You can try this:
Query all of the products
Get all of the product IDs from step 1
Query all of the install images that has a product ID from step 2, sorted by product ID
Iterate through the products from step 1, and add the results from step 3
That takes you from 46 queries (for 45 products) to 2 queries, without any additional joins.
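A rough sketch of those four steps (the query-builder calls mentioned in the note below are assumptions based on the question's CodeIgniter code; the grouping logic itself is plain PHP):

```php
<?php
// Steps 3 and 4: index the install rows by product_id so each product
// can pick up its own installs without a per-product query.
function group_by_product(array $rows)
{
    $grouped = array();
    foreach ($rows as $row) {
        $grouped[$row['product_id']][] = $row;
    }
    return $grouped;
}

function attach_installs(array $products, array $installRows)
{
    $byProduct = group_by_product($installRows);
    foreach ($products as &$product) {
        $id = $product['id'];
        // Products with no installs get an empty array, matching the
        // behavior of the original product_installs() function.
        $product['installs'] = isset($byProduct[$id]) ? $byProduct[$id] : array();
    }
    unset($product); // break the reference left by the foreach
    return $products;
}
```

In the model, steps 1-3 would be something along the lines of collecting the IDs (`$ids = array_map(function($p){ return $p['id']; }, $products);`) and then one batched query such as `$this->db->where_in('product_id', $ids)->get('Products_Installs')` (where_in is part of CodeIgniter's query builder), feeding its `result_array()` into `attach_installs()`.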
You can also use CodeIgniter's Query Caching to increase performance even further, if it's worth the time to write the code to reset the cache when data is updated.
Doing more work in PHP is generally better than doing the work in MySQL in terms of scalability. You can scale PHP easily with load balancers and web servers. Scaling MySQL isn't as easy due to concurrency issues.

Related

Querying multiple relationships in Laravel and left join another table

I'm trying to do something perhaps a bit too crazy with Eloquent right now. I have a database with the following tables:
Crons - (Has Many) - Campaign - (Has Many) - Leads - (Has Many) - Conversions
I need to get all leads from a Cron that have no entries in the Conversions table in the last X amount of days.
I'm thinking of using a scope on the Cron model, but I'm completely stuck on how to proceed from here.
public function scopeWithValidLeads($query) {
    return $query->with(['leads' => function($q) {
    }]);
}
So I need to get leads where the following is true:
A - The leads belong to a campaign associated with the Cron via a many-to-many relationship.
B - They have no record in the conversions table under this specific campaign, or if they do, the lead is older than X amount of days.
You can get your desired result using the doesntHave() method like this:
$x = 10; // last 10 days
$crons = Cron::doesntHave('campaign.leads.conversions')
    ->whereBetween('created_at', [Carbon::now()->subDays($x), Carbon::now()])
    ->get();
Querying Relationship Absence: When accessing the records for a model, you may wish to limit your results based on the absence of a relationship. For example, imagine you want to retrieve all blog posts that don't have any comments. To do so, you may pass the name of the relationship to the doesntHave method
UPDATE
As per the updated question's conditions, and according to my understanding, the leads can be obtained by:
$leads = Lead::whereHas('campaign', function($q) use($campaign) {
        $q->has('crons')
          ->where('id', $campaign->id);
        // Use the above line if a specific campaign is to be filtered out
    })->doesntHave('conversions')
    ->whereBetween('created_at', [Carbon::now()->subDays($x), Carbon::now()])
    ->get();
Hope this helps!

Yii pagination issue trying to use 2 criteria

Disclaimer: I'm self-taught. I got my rudimentary knowledge of PHP reading forums. I'm an SQL newb and know next to nothing about Yii.
I've got a controller that shows the products on our webstore. I would like the out of stock products to show up on the last pages.
I know I could sort by stock quantity but would like the in stock products to change order every time the page is reloaded.
My solution (probably wrong but kinda works) is to run two queries. One for the product that has stock, sorted randomly. One for the out of stock product also ordered randomly. I then merge the two resulting arrays. This much has worked using the code below (although I feel like there must be a more efficient way than running two queries).
The problem is that this messes up the pagination. Every product returned is listed on the same page and changing pages shows the same results. As far as I can tell the pagination only works for 1 CDbCriteria at a time. I've looked at the yii docs for CPagination for a way around this but am not getting anywhere.
$criteria=new CDbCriteria;
$criteria->alias = 'Product';
$criteria->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria->addCondition('Product.parent IS NULL');
$criteria->addCondition('web=1');
$criteria->addCondition('current=1');
$criteria->addCondition('sell>sell_web');
$criteria->order = 'RAND()';
$criteria2=new CDbCriteria;
$criteria2->alias = 'Product';
$criteria2->addCondition('(inventory_avail<1 AND inventoried=1)');
$criteria2->addCondition('Product.parent IS NULL');
$criteria2->addCondition('web=1');
$criteria2->addCondition('current=1');
$criteria2->addCondition('sell>sell_web');
$criteria2->order = 'RAND()';
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
//I know there is something wrong here, no idea how to fix it..
$count=Product::model()->count($criteria);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria);
$this->render('index', array(
    'models' => $models,
    'pages' => $pages
));
Clearly I am in over my head. Any help would be much appreciated.
Edit:
I figured that a third CDbCriteria that includes both the in stock and out of stock items could be used for the pagination (as it would include the same number of products as the combined results of the first 2). So I tried adding this (criteria1 and criteria2 remain the same):
$criteria3=new CDbCriteria;
$criteria3->alias = 'Product';
//$criteria3->addCondition('(inventory_avail>0 OR inventoried=0)');
$criteria3->addCondition('Product.parent IS NULL');
$criteria3->addCondition('web=1');
$criteria3->addCondition('current=1');
$criteria3->addCondition('sell>sell_web');
//$criteria3->order = 'RAND()';
$count=Product::model()->count($criteria3);
$pages=new CPagination($count);
//results per page
$pages->pageSize=30;
$pages->applyLimit($criteria3);
$crit1=Product::model()->findAll($criteria);
$crit2=Product::model()->findAll($criteria2);
$models=array_merge($crit1,$crit2);
$this->render('index', array(
    'models' => $models,
    'pages' => $pages
));
I'm sure I'm missing something super obvious here... Been searching all day getting nowhere.
So you are running into what is, IMO, one of the potential drawbacks of natural-language query builder frameworks. They can send your thinking about a SQL problem down a bad path when you try to work with the "out of the box" methods for building queries. Sometimes you need to think about using the raw SQL query capability that almost every framework provides in order to best address your problem.
So let's start with the basic SQL for how I would suggest you approach your problem. You can either work this into your query builder style (if possible) or make a raw query.
You could easily form a calculated field representing binary inventory status for sorting. Then also sort by another criteria secondarily.
SELECT
field1,
field2,
/* other fields */
IF(inventory_avail > 0, 1, 0) AS in_inventory
FROM product
WHERE /* where conditions */
ORDER BY
in_inventory DESC, /* sort items in inventory first */
other_field_to_sort ASC /* other sort criteria */
LIMIT ?, ? /* pagination row limit and offset */
Note that this approach only returns the rows of data you need to display. You move away from your current approach of doing a lot of work in the application to merge record sets and such.
I do question the use of RAND() for pagination purposes, as doing so will yield products potentially appearing on one page after another as the user paginates through the pages, with other products perhaps not showing up at all. Either that, or you need to add some complexity to your application to somehow track the "randomized" version of the entire result set for each specific user. For this reason, it is really unusual to see order randomization for paginated results display.
I know you mentioned you might like to spike out a randomized view to the user on a "first page". If this is a desire that is OK, but perhaps you decouple or differentiate that specific view from a wider paginated view of the product listing so as to not confuse the end user with a seemingly unpredictable pagination interface.
In your ORDER BY clause, you should always have enough sorting conditions to where the final (most specific) condition will guarantee you a predictable order result. Oftentimes this means you have to include an autoincrementing primary key field, or similar field that provides uniqueness for the row.
So let's say, for example, I had the ability for the user to sort items by price, but you still obviously wanted to show all inventoried items first. Now let's say you have 100K products, such that you will have many "pages" of products with a common price when ordered by price.
If you used this for ordering:
ORDER BY in_inventory DESC, price ASC
You could still have the problem of a user seeing the same product repeated when navigating between pages, because a more specific criteria than price was not given and ordering beyond that criteria is not guaranteed.
You would probably want to do something like:
ORDER BY in_inventory DESC, price ASC, unique_id ASC
Such that the order is totally predictable (even though the user may not even know there is sorting being applied by unique id).

Trying to restrict an Eloquent query to a relationship with a count of 0

I have two models (Organizations and Interactions) and I'd like to query the Organization model for all of the Orgs that have no Interactions. Organizations have a one-to-many relationship with Interactions.
I tried looking into anti-joins in raw SQL, but got nowhere. I also wanted to totally avoid anything like getting all of the full Organizations, then iterating through them to check to see if they had any Interactions, because that's completely impractical given the amount of data I'm working with.
To clarify, I want to avoid this:
$organizations = Organization::all();
foreach ($organizations as $org)
if($org->interactions()->count() == 0){
//Add the org to an array for later use because it has no interactions
}
I'm using Laravel 3.x, and I can't upgrade because the project is really big and I don't have the month it would take to upgrade to 4.1 right now. If there's a significantly better way to do stuff like this in 4, that would make selling the conversion process easier.
Here's some relevant code:
//From organization.php
public function interactions() {
    return $this->has_many('Interaction');
}
//From interaction.php
public function organization() {
    return $this->belongs_to('Organization');
}
// select all Organization IDs that have at least 1 interaction
$uniqueOrganizationIDs = DB::raw('SELECT organization_id FROM interactions GROUP BY(organization_id)');
// Select orgs that were not in the above list.
Organization::whereNotIn('id', $uniqueOrganizationIDs)->get();
This is the solution I came up with:
Query the Organization and Interaction models using list(). For Orgs, get their ID. For Interactions, get their organization_id. I figure these are two low-footprint, fast queries.
Do an array_diff() on them to get an array of Organizations that don't have Interactions.
Query Organization using where_in(), feeding it the diff'ed array.
It looks like this:
$organizationIDs = DB::table('organizations')->where('is_deleted', '=', 0)->lists('id');
$interactionIDs = DB::table('interactions')->where('is_deleted', '=', 0)->lists('organization_id');
$uncontactedOrganizationIDs = array_diff($organizationIDs, $interactionIDs);
$uncontactedOrganizations = Organization::where_in('id', $uncontactedOrganizationIDs)->order_by('created_at', 'DESC')->get();
Is there a better way to do this? I feel like there has to be.
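One alternative, sketched here as raw SQL (the table and column names are taken from the code above; in Laravel 3 this could be run through DB::query(), hydrating models afterwards if needed), is a LEFT JOIN anti-join, which does the whole thing in a single statement:

```sql
-- Organizations with no (non-deleted) interactions: the LEFT JOIN leaves
-- i.id NULL exactly when no matching interaction row exists.
SELECT o.*
FROM organizations o
LEFT JOIN interactions i
       ON i.organization_id = o.id
      AND i.is_deleted = 0
WHERE o.is_deleted = 0
  AND i.id IS NULL
ORDER BY o.created_at DESC;
```

This avoids pulling two full ID lists into PHP and lets the database do the set difference with its indexes.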

Does a large number of very simple sql queries impact performance?

I have search filters. Each filter/attribute has its own values that I need to show.
<?php foreach(getAdditionalFilters() as $attributeName => $filterData): ?>
<ul>
<?php showLiCheckBoxes($attributeName); ?>
</ul>
<?php endforeach; ?>
There are two ways how to display filters:
1.
function showLiCheckBoxes($attribute, $checked=null){
    $CI = get_instance();
    $CI->load->model('helper_model');
    $allAttrOptions = $CI->helper_model->getAttrOptions($attribute);
    foreach($allAttrOptions as $key => $option){
        //print html of inputs like <li><input type='checkbox' value='{$option['optionId']}...
    };
}
function getAttrOptions($attrCode){
    // Bind $attrCode as a parameter rather than interpolating it into the SQL
    $sql = "SELECT option_id as optionId, label FROM eav_attribute_option
            INNER JOIN eav_attribute USING(attr_id)
            WHERE code = ?";
    $query = $this->db->prepare($sql);
    $query->execute(array($attrCode));
    $query->setFetchMode(PDO::FETCH_ASSOC);
    return $query->fetchAll();
}
2.
function showLiCheckBoxes($attribute, $checked=null){
    foreach(Attributes::${$attribute} as $key => $value){
        //print html of inputs like <li><input type='checkbox' value='{$option['optionId']}...
    };
}
class Attributes
{
    public static $education = array(
        0 => 'No answer',
        1 => 'High school',
        2 => 'Some college',
        3 => 'In college',
        4 => 'College graduate',
        5 => 'Grad / professional school',
        6 => 'Post grad'
    );
    public static $...
So in the first way attributes options are stored in database and as you can see for each attribute extra query is made. In the second way attributes options are stored in array.
I am not asking which way is correct, because there are some other factors not written here. What I am asking is whether 20 extra, very simple queries against the eav_attribute_option table, with only 500 rows, could have any performance impact. Should I care about it, or is the query I wrote here so simple, and the table so small, that it won't make any difference at all?
Of course I could also do this with 1 query, but again, that is not the question. I was asking whether so many extra requests to the SQL server could be a bad thing, not about the query itself.
You have to profile it to know if it takes too long for your scenario.
Every query has an overhead, regardless of the number of rows returned.
So, in general, one query that returns 20 rows is better than 20 queries that return one row each.
In general, I wouldn't worry about performance until you actually have an issue. Customer feedback may prompt you to rewrite the function long before performance becomes a problem.
Having said that, RBAR (row by agonizing row) is THE classic performance issue with databases. A round trip to the database can be expensive (especially if the database is on a different server). This gets out of hand if you load customers, with all their orders, with all the products in those orders, with the properties of those products. In one query that's trivial; with one query per entity, it's a performance disaster.
So you'd do well to avoid RBAR if you can do so without a lot of work. Since you've already written out both scenarios, it couldn't hurt to go for the one with fewer queries.
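For the filter example earlier in the question, the single-query version could look roughly like this (the SQL mirrors the question's eav tables and is an assumption; the grouping step is plain PHP):

```php
<?php
// Split one combined result set into per-attribute option lists, so
// showLiCheckBoxes() can be fed from a single query instead of one
// query per attribute.
function group_options_by_code(array $rows)
{
    $options = array();
    foreach ($rows as $row) {
        $options[$row['code']][] = array(
            'optionId' => $row['optionId'],
            'label'    => $row['label'],
        );
    }
    return $options;
}

// In the model, something like (hypothetical, untested against the real schema):
// $sql = "SELECT code, option_id AS optionId, label
//         FROM eav_attribute_option
//         INNER JOIN eav_attribute USING(attr_id)";
// $allOptions = group_options_by_code($query->fetchAll());
// showLiCheckBoxes() would then read $allOptions[$attributeName].
```

This turns the 20 round trips into one, at the cost of fetching options for attributes that may not end up displayed.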
Queries fall into the classic I/O issue - as someone else mentioned, a database on a different server than the application can (and most likely will) increase response time. There's also the issue of physical I/O - result sets returned for queries might be cached if the data is frequently or recently accessed, but if not, there's the response time of the db server.
Extra queries can degrade performance due to context switching overheads (yes, in that case, one query that returns 20 rows is better than 20 queries that return one row each).
If these queries are run very often, it would speed things up considerably if they were cacheable.

Recursive MySQL trigger which calls the same table and the same trigger

I'm writing a simple forum for a PHP site. I'm trying to calculate the post counts for each category. A category can belong to another category, with root categories defined as having a NULL parent_category_id. With this architecture a category can have an unlimited number of sub-categories, and it keeps the table structure fairly simple.
To keep things simple lets say the categories table has 3 fields: category_id, parent_category_id, post_count. I don't think the remaining database structure is relevant so I'll leave it out for now.
Another trigger is updating the categories table, causing this trigger to run. What I want is for it to update the post count and then recursively go through each parent category, increasing that post count.
DELIMITER $$
CREATE TRIGGER trg_update_category_category_post_count BEFORE UPDATE ON categories FOR EACH ROW
BEGIN
    IF OLD.post_count != NEW.post_count THEN
        IF OLD.post_count < NEW.post_count THEN
            UPDATE categories SET post_count = post_count + 1 WHERE categories.category_id = NEW.parent_category_id;
        ELSEIF OLD.post_count > NEW.post_count THEN
            UPDATE categories SET post_count = post_count - 1 WHERE categories.category_id = NEW.parent_category_id;
        END IF;
    END IF;
END $$
DELIMITER ;
The error I'm getting is:
#1442 - Can't update table 'categories' in stored function/trigger because it is already used by statement which invoked this stored function/trigger.
I figure you could do a count() on each page load to calculate the total posts, but on large forums this will slow things down, as discussed many times on here (e.g. Count posts with php or store in database). Therefore, for future-proofing, I'm storing the post count in the table. To go one step further, I thought I'd use triggers to update these counts rather than PHP.
I understand there are limitations in MySQL on running triggers against the same table that's being updated, which is what is causing this error (i.e. to stop an infinite loop), but in this case surely the loop would stop once it reaches a category with a NULL parent_category_id? There must be some kind of solution, whether it's adjusting this trigger or something different entirely. Thanks.
EDIT: I appreciate this might not be the best way of doing things, but it is the best thing I can think of. I suppose if you changed a parent's category to another it would mess things up, but this could be fixed by another trigger which re-syncs everything. I'm open to other suggestions on how to solve this problem.
I usually recommend against using triggers unless you really, really need to; recursive triggers are a great way of introducing bugs that are really hard to reproduce, and require developers to understand the side effects of an apparently simple action - "all I did was insert a record into the categories table, and now the whole database has locked up". I've seen this happen several times - nobody did anything wrong or stupid, it's just a risk you run with side effects.
So, I would only resort to triggers once you can prove you need to; rather than relying on the opinion of strangers based on generalities, I'd rig up a test environment, drop in a few million test records, and try to optimize the "calculate posts on page load" solution so it works.
A database design that might help with that is Joe Celko's "nested set" schema - this takes a while to get your head round, but can be very fast for querying.
Only once you know you have a problem that you really can't solve other than by pre-computing the post count would I consider a trigger-based approach. I'd separate out the "post counts" into a separate table; that keeps your design a little cleaner, and should get round the recursive trigger issue.
The easiest solution is to fetch all the posts per category and afterwards link them together using a script/programming language:
For instance, in PHP:
<?php
// category: id, parent, name
// posts: id, title, message
// (Uses the legacy mysql_* API this answer was written against.)
$sql = "select *, count(posts.id) as count From category left join posts ON posts.cat = category.id Group by category.id";
$query = mysql_query($sql);
$result = array();
while($row = mysql_fetch_assoc($query)){
    $parent = $row['parent'] == null ? 0 : $row['parent'];
    $result[$parent][] = $row;
}
recur_count(0);
var_dump($result);
function recur_count($depth){
    global $result;
    $total = 0;
    foreach($result[$depth] as $id => $o){
        $count = $o['count'];
        // Roll sub-category counts up into this category before totalling
        if(isset($result[$o['id']])){
            $count += recur_count($o['id']);
            $result[$depth][$id]['count'] = $count;
        }
        $total += $count;
    }
    return $total;
}
OK, so for anyone wondering how I solved this, I used a mixture of both triggers and PHP.
Instead of getting each category to update its parent, I've settled on the following structure: a post updates its thread, and then the thread updates its category with the post count.
I've then used PHP to pull all categories from the database and loop through, adding up each post count value using something like this:
function recursiveCategoryCount($categories)
{
    $count = $categories['category']->post_count;
    if(!is_null($categories['children']))
        foreach($categories['children'] as $child)
            $count += recursiveCategoryCount($child);
    return $count;
}
At worst, instead of PHP adding up every post on every page load, it only adds up the total category posts (depending on what node in the tree you are in). This should be very efficient, as you're reducing the total calculations from thousands to tens or hundreds, depending on your number of categories. I would also recommend running a script every week to recalculate the post counts in case they become out of sync, much like phpBB does. If I run into issues using triggers then I'll move that functionality into the code. Thanks for everyone's suggestions.
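For reference, here is how the function above behaves on a small hand-built tree (the function is repeated so the snippet runs standalone; the node shape, an array with a 'category' object and a 'children' array or null, is assumed from the code above):

```php
<?php
// Node shape assumed from the answer: each node is
// array('category' => object with post_count, 'children' => array|null).
function recursiveCategoryCount($categories)
{
    $count = $categories['category']->post_count;
    if (!is_null($categories['children']))
        foreach ($categories['children'] as $child)
            $count += recursiveCategoryCount($child);
    return $count;
}

// A root category with 2 direct posts and two leaf children of 3 posts each.
$leaf = array('category' => (object) array('post_count' => 3), 'children' => null);
$root = array(
    'category' => (object) array('post_count' => 2),
    'children' => array($leaf, $leaf),
);
// recursiveCategoryCount($root) sums the subtree: 2 + 3 + 3
```

The work done per page load is proportional to the number of categories, not the number of posts, which is the point of storing the counts.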