Does a large number of very simple sql queries impact performance? - mysql

I have search filters. Each filter/attribute has its own values that I need to show.
<?php foreach(getAdditionalFilters() as $attributeName => $filterData): ?>
<ul>
<?php showLiCheckBoxes($attributeName); ?>
</ul>
<?php endforeach; ?>
There are two ways to display the filters:
1.
function showLiCheckBoxes($attribute,$checked=null){
    $CI = get_instance();
    $CI->load->model('helper_model');
    $allAttrOptions = $CI->helper_model->getAttrOptions($attribute);
    foreach($allAttrOptions as $key => $option){
        //print html of inputs like <li><input type='checkbox' value='{$option['optionId']}...
    }
}
function getAttrOptions($attrCode){
    // Bind the attribute code instead of interpolating it into the SQL (avoids SQL injection)
    $sql = "SELECT option_id as optionId, label FROM eav_attribute_option
            INNER JOIN eav_attribute USING(attr_id)
            WHERE code = ?";
    $query = $this->db->prepare($sql);
    $query->execute(array($attrCode));
    $query->setFetchMode(PDO::FETCH_ASSOC);
    return $query->fetchAll();
}
2.
function showLiCheckBoxes($attribute,$checked=null){
    foreach(Attributes::${$attribute} as $key => $value){
        //print html of inputs like <li><input type='checkbox' value='{$key}'...
    }
}
class Attributes
{
public static $education = array(0 => 'No answer',
1 => 'High school',
2 => 'Some college',
3 => 'In college',
4 => 'College graduate',
5 => 'Grad / professional school',
6 => 'Post grad');
public static $...
So in the first approach the attribute options are stored in the database and, as you can see, an extra query is made for each attribute. In the second approach the attribute options are stored in a PHP array.
I am not asking which way is correct, because there are other considerations not written here. What I am asking is whether 20 extra very simple queries against the eav_attribute_option table, which has only 500 rows, could have any performance impact. Should I care about it, or is the query I wrote here so simple and the table so small that it won't make any difference at all?
Of course I could also do this with one query, but again that is not the question. I am asking whether so many extra requests to the SQL server could be a bad thing, not about the query itself.

You have to profile it to know if it takes too long for your scenario.
Every query has an overhead, regardless of the number of rows returned.
So, in general, one query that returns 20 rows is better than 20 queries that return one row each.
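As a rough sketch of what the single-query variant could look like, assuming it sits in the same model as the question's getAttrOptions() (the helper name and the grouping step are invented for the example):
function getAllAttrOptions(array $attrCodes){
    // One round trip: fetch the options for every requested attribute at once
    $placeholders = implode(',', array_fill(0, count($attrCodes), '?'));
    $sql = "SELECT code, option_id AS optionId, label
            FROM eav_attribute_option
            INNER JOIN eav_attribute USING(attr_id)
            WHERE code IN ($placeholders)";
    $query = $this->db->prepare($sql);
    $query->execute($attrCodes);
    $grouped = array();
    // Group the flat result set by attribute code in PHP
    foreach($query->fetchAll(PDO::FETCH_ASSOC) as $row){
        $grouped[$row['code']][] = $row;
    }
    return $grouped; // e.g. $grouped['education'] => rows with optionId and label
}
showLiCheckBoxes() could then render from the pre-fetched array instead of issuing its own query per attribute.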

In general, I wouldn't worry about performance until you actually have an issue. Customer feedback may prompt you to rewrite the function long before performance becomes a problem.
Having said that, RBAR (row by agonizing row) is THE classic performance issue with databases. A round trip to the database can be expensive (especially if the database is on a different server). This gets out of hand if you load customers, with all their orders, with all the products in those orders, with the properties of those products. In one query that's trivial; with one query per entity, it's a performance disaster.
So you'd do well to avoid RBAR if you can do so without a lot of work. Since you've already written out both scenarios, it couldn't hurt to go for the one with fewer queries.

Queries fall into the classic I/O issue - as someone else mentioned, a database on a different server than the application can (and most likely will) increase response time. There's also the issue of physical I/O - result sets for frequently or recently accessed data might be served from cache, but if not, you also pay for the db server's disk reads.

Extra queries can degrade performance due to context switching overheads (yes, in that case, one query that returns 20 rows is better than 20 queries that return one row each).
If these queries are run very often, it would speed things up considerably if they were cacheable.
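If the option lists hardly ever change, even a per-request memoization removes the repeated queries; a minimal sketch, assuming it lives in the same model as the question's getAttrOptions():
function getAttrOptionsCached($attrCode){
    // Per-request memo: each attribute's option list is queried at most once
    static $cache = array();
    if(!isset($cache[$attrCode])){
        $cache[$attrCode] = $this->getAttrOptions($attrCode);
    }
    return $cache[$attrCode];
}
For caching across requests, the same idea can sit in front of APCu or memcached instead of a static array.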

Related

Thinking sphinx ranking and statistics

I'm trying to set up a way to get some numbers out of my Sphinx indexes, but I'm not sure how to get the info I want.
I have a MySQL db with articles, a Sphinx index set up for that db, and full-text search, all working. What I want is to get some numbers:
How many times the search text (a keyword or key phrase) appears over all articles for all time (more likely limited to articles from a time interval from X to Y)
Same as the previous, but for how many times 2 keywords or key phrases (so "x AND y") appear in the same articles
I was doing something similar to the first one manually, using a bat file I made:
indexer ind_core -c c:\%SOME_PATH%\development.sphinx.conf --buildstops stats.txt 10000 --buildfreqs
Which generated a txt file with all repeating keywords and how often they appear, which at the early development stages helped me form a list of keywords I'm interested in. Now I'm trying to do the same but just for a finite list of predetermined keywords, integrated into my Rails project so I can build charts in the future.
I tried running some queries like
@testing = Article.search 'Keyword1 AND Keyword2', :ranker => :wordcount
but I'm not sure how it works or how to process the result, nor whether that's actually what I'm looking for.
Another approach I tried was manual mysql queries such as
SELECT id,title,WEIGHT() AS w FROM ind_core WHERE MATCH('#title keyword1 | keyword2') OPTION ranker=expr('sum(hit_count)');
but I'm not sure how to process the results from here either (or how to actually implement it in my existing Rails project), and it's limited to 20 rows per query (which I think I can change somewhere in the settings?). But at least, looking at the MySQL results, what I'm interested in is hit_count over all articles (or all articles from a set timeframe).
Any ideas on how to do this?
UPDATE:
The current way I found was to add
@testing = Article.search params[:search], :without => {:is_active => false}, :ranker => :bm25
to the controller with some conditions (so it doesn't bug out on a nil search). :is_active is my soft-delete flag; I don't want to search deleted entries, so don't mind it. And in the view I simply displayed
<%= @testing.total_entries %>
Which, if I understand it correctly, shows me the number of matches Sphinx found (so pretty much what I was looking for).
So, to figure out the number of hits per document, you're pretty much on the right track; it's just a matter of getting it into Ruby/Thinking Sphinx.
To get the raw Sphinx results (if you don't need the ActiveRecord objects):
search = Article.search "foo",
:ranker => "expr('SUM(hit_count)')",
:select => "*, weight()",
:middleware => ThinkingSphinx::Middlewares::RAW_ONLY
… this will return an array of hashes, and you can use the weight() string key for the hit count, and the sphinx_internal_id string key for the model's primary key (id is Sphinx's own primary key, which isn't so useful).
Or, if you want to use the ActiveRecord objects, Thinking Sphinx has the ability to wrap each search result in a helper object which passes appropriate methods through to the underlying model instances, but lets weight respond with the values from Sphinx:
search = Article.search "foo",
:ranker => "expr('SUM(hit_count)')",
:select => "*, weight()"; ""
search.context[:panes] << ThinkingSphinx::Panes::WeightPane
search.each do |article|
puts article.weight
end
Keep in mind that panes must be added before the search is evaluated, so if you're testing this in a Rails console, you'll want to avoid letting the console inspect the search variable (which I usually do by adding ; "" at the end of the initial search call).
In both of these cases, as you've noted, the search results are paginated - you can use the :page option to determine which page of results you want, and :per_page to determine the number of records returned in each request. There is a standard limit of 1000 results overall, but that can be altered using the max_matches setting.
Now, if you want the number of times the keywords appear across all Sphinx records, then the best way to do that while also taking advantage of Thinking Sphinx's search options, is to get the raw results of an aggregate SUM - similar to the first option above.
search = Article.search "foo",
:ranker => "expr('SUM(hit_count)')",
:select => "SUM(weight()) AS count",
:middleware => ThinkingSphinx::Middlewares::RAW_ONLY
search.first["count"]

Is it good to filter data in controller or use sql query in model?

What is the best approach for searching? What will be the difference if I filter all the data in the controller and get the result, versus using a where query in the model? Please suggest your opinion.
It depends on the complexity of your query.
You can measure the processing time by putting flags in your code between each step.
Then you run a speed test like:
print time_flag_1
var results_sql_processing = *complex query*
print time_flag_2
var raw_results_script_processing = *dump query*
print time_flag_3
var results_script_processing = *processing the data*
print time_flag_4
and make sure results_script_processing == results_sql_processing.
You can also try different dataset sizes (LIMIT 100, 500, 1000) and see how the difference evolves between both solutions.
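A minimal PHP sketch of those timing flags, assuming a PDO connection in $pdo; the articles table and status column are invented for the example:
$t0 = microtime(true);
// Option A: let SQL do the filtering
$sqlFiltered = $pdo->query("SELECT * FROM articles WHERE status = 'published'")->fetchAll(PDO::FETCH_ASSOC);
$t1 = microtime(true);
// Option B: dump everything and filter in the script
$all = $pdo->query("SELECT * FROM articles")->fetchAll(PDO::FETCH_ASSOC);
$t2 = microtime(true);
$scriptFiltered = array_filter($all, function ($row) {
    return $row['status'] === 'published';
});
$t3 = microtime(true);
printf("SQL filtering: %.4fs, script filtering: %.4fs\n", $t1 - $t0, $t3 - $t2);
// Sanity check: count($scriptFiltered) should equal count($sqlFiltered)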
Also, I would recommend taking a look at the query builders you can find in many frameworks (I use Laravel's query builder).
They are usually a very good compromise when the query isn't too complex (no data aggregation, complex concatenation, ...); you can still use joins, unions and many filters with them.
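For example, a small sketch with Laravel's query builder (the articles/users tables and columns are invented for illustration):
use Illuminate\Support\Facades\DB;

// Filtering in the database through the query builder instead of in the controller
$articles = DB::table('articles')
    ->join('users', 'users.id', '=', 'articles.user_id')
    ->where('articles.status', 'published')
    ->orderBy('articles.created_at', 'desc')
    ->limit(100)
    ->get();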
But if you want a super complex pivot table, for instance, just build a solid SQL query and fire it from your code!

Top level MySQL statistics

I have not been able to find any information on this. I could do this on its own, but I feel keeping it in the query might be the best option, if it's possible.
Basically I want to try to add a top-level "statistics" portion to a query.
So when I get the results I will see them like so:
num_rows = 900
distinct_col = 9
results = array()
This way I can loop the results normally, and then pull out information that I would only need once outside of it. Is this possible?
EDIT:
I am not looking for the normal MySQL statistics like num_rows exactly. But in a case where, let's say, you limit the results to ten, num_rows would return 10, but you want the total results, so 900. In most cases I would just use another query and look only for the amount; however, combining it all into one query logically seems faster to me. There is also more than just num_rows that I may need: say they are all products and each has a specific category, I would need to count the number of categories all the items fall under. So looping the raw results when there is only one result for those columns is silliness.
EDIT 2:
To clarify further, I need to get some counts on some columns, and maybe a min/max result on a join. Having it return on every loop would work, but the exact same value uselessly returning on every loop when it's only needed once does not seem logical. I am no MySQL expert and am mainly just trying to make sure I come up with the most logical and fastest method to get the required data.
Here is a PHP return example:
array(
[num_rows] => 900,
[categories] => 9,
[min_price] => 400,
[max_price] => 900,
[results] => array(
[0] => //row array
[1] => //row array
)
);
MySQL returns its default num_rows before you "fetch" the results; having custom results added there may be sufficient.
I don't know why you need that, but it's very easy to get.
Assuming you are using safeMysql (though you can use whatever way you like to get the data into an array):
$results = $db->getAll("SELECT * FROM t");
$num_rows = count($results);
$num_cols = count($results[0]);
that's all
I am mainly just trying to make sure I come up with the most logical and fastest method to get the required data.
Yes, you are.
Nothing wrong with getting aggregated data with every loop.
As for the count beyond LIMITs - when you need it, you can use MySQL's SQL_CALC_FOUND_ROWS / FOUND_ROWS() feature.
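A minimal sketch of that feature using PDO (the question isn't tied to a particular driver; the products table, price column and page size are invented, and note that SQL_CALC_FOUND_ROWS is deprecated as of MySQL 8.0.17, where a separate COUNT(*) query is the recommended replacement):
// Fetch one page of results, while asking MySQL to also count the rows the LIMIT cut off
$stmt = $pdo->query("SELECT SQL_CALC_FOUND_ROWS * FROM products WHERE price > 100 LIMIT 10");
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);                       // the 10 rows for this page
$num_rows = (int)$pdo->query("SELECT FOUND_ROWS()")->fetchColumn(); // e.g. 900, ignoring the LIMIT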

Recursive MySQL trigger which calls the same table and the same trigger

I'm writing a simple forum for a PHP site. I'm trying to calculate the post counts for each category. Now, a category can belong to another category, with root categories being defined as having a NULL parent_category_id. With this architecture a category can have an unlimited number of sub-categories and the table structure stays fairly simple.
To keep things simple, let's say the categories table has 3 fields: category_id, parent_category_id, post_count. I don't think the remaining database structure is relevant, so I'll leave it out for now.
Another trigger updates the categories table, causing this trigger to run. What I want is for it to update the post count and then recursively go up through each parent category, increasing that post count.
DELIMITER $$
CREATE TRIGGER trg_update_category_category_post_count BEFORE UPDATE ON categories FOR EACH ROW
BEGIN
    IF OLD.post_count != NEW.post_count THEN
        IF OLD.post_count < NEW.post_count THEN
            UPDATE categories SET post_count = post_count + 1 WHERE categories.category_id = NEW.parent_category_id;
        ELSEIF OLD.post_count > NEW.post_count THEN
            UPDATE categories SET post_count = post_count - 1 WHERE categories.category_id = NEW.parent_category_id;
        END IF;
    END IF;
END $$
DELIMITER ;
The error I'm getting is:
#1442 - Can't update table 'categories' in stored function/trigger because it is already used by statement which invoked this stored function/trigger.
I figure you could do a count() on each page load to calculate the total posts, but on large forums this will slow things down, as discussed many times on here (e.g. Count posts with php or store in database). Therefore, for future-proofing, I'm storing the post count in the table. To go one step further I thought I'd use triggers to update these counts rather than PHP.
I understand there are limitations in MySQL on running triggers against the same table that's being updated, which is what is causing this error (i.e. to stop an infinite loop), but in this case surely the loop would stop once it reaches a category with a NULL parent_category_id? There must be some kind of solution, whether it's adjusting this trigger or something different entirely. Thanks.
EDIT: I appreciate this might not be the best way of doing things, but it is the best I can think of. I suppose if you changed a parent's category to another it would mess things up, but this could be fixed by another trigger which re-syncs everything. I'm open to other suggestions on how to solve this problem.
I usually recommend against using triggers unless you really, really need to; recursive triggers are a great way of introducing bugs that are really hard to reproduce, and require developers to understand the side effects of an apparently simple action - "all I did was insert a record into the categories table, and now the whole database has locked up". I've seen this happen several times - nobody did anything wrong or stupid, it's just a risk you run with side effects.
So, I would only resort to triggers once you can prove you need to; rather than relying on the opinion of strangers based on generalities, I'd rig up a test environment, drop in a few million test records, and try to optimize the "calculate posts on page load" solution so it works.
A database design that might help with that is Joe Celko's "nested set" schema - this takes a while to get your head round, but can be very fast for querying.
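To make that concrete, here is a sketch only: it assumes the categories table gained the lft/rgt boundary columns that the nested set model requires (they are not part of the question's schema):
// Total posts per category, including all of its descendants, in one query
$sql = "SELECT parent.category_id, SUM(child.post_count) AS total_posts
        FROM categories AS parent
        JOIN categories AS child
          ON child.lft BETWEEN parent.lft AND parent.rgt
        GROUP BY parent.category_id";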
Only once you know you have a problem that you really can't solve other than by pre-computing the post count would I consider a trigger-based approach. I'd separate out the "post counts" into a separate table; that keeps your design a little cleaner, and should get round the recursive trigger issue.
The easiest solution is to fetch all the posts per category and afterwards link them together using a script/programming language:
for instance in php:
<?php
// category: id, parent, name
// posts: id, title, message
$sql = "SELECT category.*, COUNT(posts.id) AS `count`
        FROM category
        LEFT JOIN posts ON posts.cat = category.id
        GROUP BY category.id";
$query = mysql_query($sql);
$result = array();
// Index each category row under its parent id (root categories under key 0)
while($row = mysql_fetch_assoc($query)){
    $parent = $row['parent'] == null ? 0 : $row['parent'];
    $result[$parent][] = $row;
}
recur_count(0);
var_dump($result);

// Returns the total post count of all categories below $depth,
// folding each child's subtree total into its own 'count' entry
function recur_count($depth){
    global $result;
    $total = 0;
    foreach($result[$depth] as $id => $o){
        $count = $o['count'];
        if(isset($result[$o['id']])){
            $count += recur_count($o['id']);
            $result[$depth][$id]['count'] = $count;
        }
        $total += $count;
    }
    return $total;
}
Ok, so for anyone wondering how I solved this: I used a mixture of both triggers and PHP.
Instead of getting each category to update its parent, I've settled on the following structure: a post updates its thread, and then the thread updates its category with the post count.
I've then used PHP to pull all categories from the database and loop through them, adding up each post count value using something like this:
function recursiveCategoryCount($categories)
{
    $count = $categories['category']->post_count;
    if(!is_null($categories['children'])){
        foreach($categories['children'] as $child){
            $count += recursiveCategoryCount($child);
        }
    }
    return $count;
}
At worst, instead of PHP adding up every post on every page load, it only adds up the total category post counts (depending on what node in the tree you are at). This should be very efficient, as you're reducing the total calculations from thousands to tens or hundreds, depending on your number of categories. I would also recommend running a script every week to recalculate the post counts in case they become out of sync, much like phpBB does. If I run into issues using triggers then I'll move that functionality into the code. Thanks for everyone's suggestions.
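As a sketch of what that weekly re-sync could look like, assuming threads and posts tables with the obvious columns (thread_id, category_id, post_id), which are not spelled out in the question:
// Recompute each category's own post_count from the raw data,
// so any drift caused by missed trigger runs is corrected
$sql = "UPDATE categories c
        SET c.post_count = (
            SELECT COUNT(p.post_id)
            FROM threads t
            JOIN posts p ON p.thread_id = t.thread_id
            WHERE t.category_id = c.category_id
        )";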

Codeigniter database call efficiency

This is an efficiency/best-practice question. I'm hoping to receive some feedback on performance. Any advice is greatly appreciated.
So here is a little background on what I have set up. I'm using CodeIgniter; the basic setup is pretty similar to any other product-relationship scheme. The basic tables are: Brands, Products, Categories. On top of these tables there is a need for install sheets, marketing materials, and colors.
I created some relationship tables:
Brands_Products
Products_Colors
Products_Images
Products_Sheets
I also have a Categories_Relationships table that holds all of the relationships to categories. Install sheets etc. can have their own categories, but I didn't want to define a different category relationship table for each type because I didn't think that would be very expandable.
On the front end I am sorting by brands, and categories.
I think that covers the background; now to the efficiency part. I guess my question pertains mostly to whether it would be better to use joins or to make separate calls to return the individual parts of each item (colors, images, etc.).
What I currently have coded is working and sorting fine, but I think I can improve the performance, as it takes some time to return the query. Right now it's returning about 45 items. Here is my first function; it grabs all the products and their info.
It works by first selecting all the products and joining their brand information. Then, looping through the result, I set up the basic information, but for the categories, images and installs I am using functions that return each of the respective items.
public function all()
{
    $q = $this->db
        ->select('*')
        ->from('Products')
        ->join('Brands_Products', 'Brands_Products.product_id = Products.id')
        ->join('Brands', 'Brands.id = Brands_Products.brand_id')
        ->get();
    $r = array(); // avoid an undefined variable when no products are returned
    foreach($q->result() as $row)
    {
        // Set Regular Data
        $data['Id'] = $row->product_id;
        $data['Name'] = $row->product_name;
        $data['Description'] = $row->description;
        $data['Brand'] = $row->brand_name;
        $data['Category'] = $this->categories($row->product_id);
        $data['Product_Images'] = $this->product_images($row->product_id);
        $data['Product_Installs'] = $this->product_installs($row->product_id);
        $data['Slug'] = $row->slug;
        // Set new item in return object with created data
        $r[] = (object)$data;
    }
    return $r;
}
Here is an example of one of the functions used to get the individual parts.
private function product_installs($id)
{
    // Select Install Images
    $install_images = $this->db
        ->select('*')
        ->where('product_id', $id)
        ->from('Products_Installs')
        ->join('Files', 'Files.id = Products_Installs.file_id')
        ->get();
    // Add categories to category object
    foreach($install_images->result() as $pImage)
    {
        $data[] = array(
            'id' => $pImage->file_id,
            'src' => $pImage->src,
            'title' => $pImage->title,
            'alt' => $pImage->alt
        );
    }
    // Make sure data exists
    if(!isset($data))
    {
        $data = array();
    }
    return $data;
}
So again really just looking on advice on what is the most efficient, best practice way of doing this. I really appreciate any advice, or information.
I think your approach is correct. There are only a couple of options: 1) load your product list first, then loop, and load the required data for each product row; 2) create a big join across all tables first, then loop through the (possibly massive) cartesian product. The second might get rather ugly to parse. For example, if you have Product A and Product B, and Product A has Install 1, Install 2, Install 3, and Product B has Install 1 and Install 2, then your result is:
Product A Install 1
Product A Install 2
Product A Install 3
Product B Install 1
Product B Install 2
Now, add your images and categories to the join and it might become huge.
I am not sure what the sizes of your tables are, but returning 45 rows shouldn't take long. The obvious thing to ensure (and you probably did that already) is that product_id is indexed in all tables, including your Brands_Products table and the others. Otherwise, you'll do a table scan.
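For completeness, a sketch of adding those indexes through CodeIgniter (assuming they don't already exist; the index names are arbitrary):
// One-off statements, e.g. run from a migration
$this->db->query('ALTER TABLE Brands_Products ADD INDEX idx_bp_product_id (product_id)');
$this->db->query('ALTER TABLE Products_Installs ADD INDEX idx_pi_product_id (product_id)');
$this->db->query('ALTER TABLE Products_Images ADD INDEX idx_pimg_product_id (product_id)');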
The next question is how you're displaying your data on the screen. So you're getting all products. Do you need to load categories, images, and installs when you're getting a list of products? If you're simply listing products on the screen, you might want to wait to load that data until the user picks a product to view.
On a side note, is there any reason you're converting your array to an object?
$r[] = (object)$data;
Also, in the second function, you can simply add
$data = array();
before the foreach, instead of
// Make sure data exists
if(!isset($data))
{
$data = array();
}
You can try this:
1. Query all of the products.
2. Get all of the product IDs from step 1.
3. Query all of the install images that have a product ID from step 2, sorted by product ID.
4. Iterate through the products from step 1, and add the results from step 3 (sketched below).
That takes you from 46 queries (for 45 products) to 2 queries, without any additional joins.
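A rough sketch of those steps against the question's tables, using CodeIgniter's query builder; the helper name attach_installs() is invented:
private function attach_installs(array $products)
{
    if(empty($products)){
        return $products;
    }
    // Step 2: collect the product IDs from the already-loaded products
    $ids = array();
    foreach($products as $product){
        $ids[] = $product->Id;
    }
    // Step 3: one query for all install images belonging to those products
    $q = $this->db
        ->select('*')
        ->from('Products_Installs')
        ->join('Files', 'Files.id = Products_Installs.file_id')
        ->where_in('product_id', $ids)
        ->order_by('product_id')
        ->get();
    $byProduct = array();
    foreach($q->result() as $row){
        $byProduct[$row->product_id][] = $row;
    }
    // Step 4: hand each product its own installs (empty array if it has none)
    foreach($products as $product){
        $product->Product_Installs = isset($byProduct[$product->Id]) ? $byProduct[$product->Id] : array();
    }
    return $products;
}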
You can also use CodeIgniter's Query Caching to increase performance even further, if it's worth the time to write the code to reset the cache when data is updated.
Doing more work in PHP is generally better than doing the work in MySQL in terms of scalability. You can scale PHP easily with load balancers and web servers. Scaling MySQL isn't as easy due to concurrency issues.