How many queries are too many? (MySQL)

I have to run 10 MySQL queries for one person on one page. Is that very bad? I have quite good hosting, but still, could it break something? Thank you very much.

Drupal sites typically make anywhere from 150 to 400+ queries per request. The total time spent querying the database is still under 1s - it's not the number that kills the server, but the quality/complexity of the queries (and possibly the size of the dataset they search through).
I can't tell what queries you're talking about, but on most sites 10 is not much at all.
If you're concerned with performance, you can always see how long your queries take to execute in a database management program, such as MySQL Workbench.
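If you'd rather measure from the application side, here's a minimal sketch, assuming an existing PDO connection in $pdo; the users table and its columns are invented for illustration:

    <?php
    // Sketch: time one query from PHP. Assumes an existing PDO connection,
    // e.g. $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    // the users table and its columns are invented for illustration.
    $start = microtime(true);
    $stmt = $pdo->prepare("SELECT id, name FROM users WHERE id = ?");
    $stmt->execute([42]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    printf("query took %.1f ms\n", (microtime(true) - $start) * 1000);

Note that this measures the full round trip, network included, which is usually what matters for page load time anyway.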

10 fast queries can be better than 1 slow one. Define what's acceptable in terms of response time and throughput, in normal and peak traffic conditions, and measure whether these 10 queries are a problem or not (i.e. whether they fail to meet your expectations).
If they are, then try to change your design and find a better solution.

How many queries are too many?
I will rephrase your question:
Is my app fast enough?
Come up with a business definition of "fast enough" for your application (based on business/user requirements), come up with a way to model all your usage scenarios and expected load, create simulations of that load and profile (trace/time) it.
This approach amounts to an educated guess. Anything short of it is pure speculation, and worthless.
If your application is already in production, and is working well in most cases, you can get feedback from users to determine pain points. From there, you can model those pain points and corresponding load, and profile.
Document your results. Once you make improvements to your application, you have a tool to determine if the optimizations you made achieved your goals.

Since you are new to development, as I assume you are, I recommend focusing on the most logical and obvious way to avoid over-processing. That is usually avoiding repeated queries: cache the result of the first execution and check for a cached result before running the query again.
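A minimal sketch of that idea, assuming PDO; the cache here is just a static array that lives for the current request:

    <?php
    // Per-request query cache (sketch): the first call runs the query,
    // later calls with the same SQL and parameters reuse the stored rows.
    // $pdo is assumed to be an existing PDO connection.
    function cachedQuery(PDO $pdo, string $sql, array $params = []): array
    {
        static $cache = [];
        $key = $sql . '|' . serialize($params);
        if (!isset($cache[$key])) {
            $stmt = $pdo->prepare($sql);
            $stmt->execute($params);
            $cache[$key] = $stmt->fetchAll(PDO::FETCH_ASSOC);
        }
        return $cache[$key];
    }

For caching across requests, you'd reach for something like APCu or Memcached instead.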
After that, don't spend too much time thinking about the number of queries; focus on well-written code. That means good use of classes, methods and functions. While you still have much to learn, you do not want to over-complicate every interaction with the database.
Enjoy what you are doing and keep it neat. That will result in easier-to-debug code, which in itself can lead to better performance once you have the knowledge to take your code further. The performance of an application can be improved very quickly if the original work is well written.

It depends on how many CPU cycles the queries consume in total.
1 query can consume way more CPU cycles than 100. It all depends on their contents.
You could begin by optimizing them following this guide: http://beginner-sql-tutorial.com/sql-query-tuning.htm
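To give a flavour of the kind of tip such guides cover, here's a hedged example (the orders table and its created_at column are invented): wrapping an indexed column in a function prevents MySQL from using the index, while an equivalent range condition does not.

    <?php
    // Both queries return the same rows, but only the second can use an
    // index on created_at; YEAR(created_at) forces MySQL to evaluate the
    // function for every row, i.e. a full scan.
    $slow = "SELECT * FROM orders WHERE YEAR(created_at) = 2011";
    $fast = "SELECT * FROM orders
             WHERE created_at >= '2011-01-01'
               AND created_at <  '2012-01-01'";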

I think it's not a problem. 10 queries are not that many for a site. Fewer is better, no question, but when you reach 3000-5000 you should think about your structure.
And when a single query walks through a table with millions of rows without an index, even 10 are too many.
I have seen a Typo3 site with a lot of extensions that made 7500 queries per request, even with caching. This happens when you install and install and don't look at what happens.
But you can make sure you use logical JOINs over your tables so that you need fewer queries.
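For example (a hedged sketch with invented users/orders tables and an existing PDO connection): fetching each user's orders in a loop fires one query per user, while a single JOIN fetches everything at once.

    <?php
    // N+1 pattern: one query for the users, then one more per user.
    $users = $pdo->query("SELECT id, name FROM users")->fetchAll(PDO::FETCH_ASSOC);
    foreach ($users as $user) {
        $stmt = $pdo->prepare("SELECT id, total FROM orders WHERE user_id = ?");
        $stmt->execute([$user['id']]);
        $orders = $stmt->fetchAll(PDO::FETCH_ASSOC);
    }

    // Same data in a single query and a single round trip.
    $rows = $pdo->query(
        "SELECT u.id, u.name, o.id AS order_id, o.total
         FROM users u
         LEFT JOIN orders o ON o.user_id = u.id"
    )->fetchAll(PDO::FETCH_ASSOC);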

Well there are big queries and small trivial queries. Which ones are yours? Generally, you should try to fetch the data in as few queries as possible. The heavier the load is on the database server the harder it will be to serve the clients as the traffic increases.

Just to add a bit of a different perspective to the other good answers:
First, to concur: the type and complexity of the queries you are making will matter more than their number 99% of the time.
However, in the rare situation where there is high latency on the network path to your database server (e.g. the DB server is remote; I'm not saying this is a logical or sane setup, but I have seen it done), you want to minimize the number of queries, because every time you talk to the database server, the network transmission takes an order of magnitude or two longer than computing the query itself. This situation can really kill your page load times, so you'd want to minimize the number of queries (actually, you just want to change your server setup...).
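You can get a feel for that per-query network tax by timing a query that does essentially no server-side work. A rough sketch, assuming an existing PDO connection:

    <?php
    // SELECT 1 does almost no server-side work, so most of the measured
    // time is connection/network round-trip overhead.
    $start = microtime(true);
    for ($i = 0; $i < 100; $i++) {
        $pdo->query("SELECT 1")->fetch();
    }
    printf("~%.2f ms per round trip\n", (microtime(true) - $start) * 10);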

Related

Multiple large MySQL SELECT queries - better to run in parallel or in a queue?

I have looked up answers to this question a bunch and couldn't find a specific answer - sorry in advance if I missed something! Also, I'm a SQL optimization noob.
I have an analytics dashboard which pulls data based on users' requests from a large database.
Each page the user loads runs a number of different queries to populate different parts of the page (different charts, tables, etc). Some of these pages can take quite some time to load as the user might request several years of data.
Currently, each part of the page pings off one SELECT query to the SQL server but as there are several parts of the page, those queries end up running in parallel.
Would it be faster to run these queries in a queue - to allow the server to process one query at a time? Or to keep everything in parallel, as is?
The added benefit of running them one at a time is that we could run the queries to fill in the "above-the-fold" part of the page first...
Hope that all makes sense and take it easy on me please :)
I also say "it depends", but I lean toward parallelism.
Probably should not have more parallelism than the number of CPU cores.
I rarely see a system that chews up all the CPU cores -- unless it does not have good enough indexes. That is, fix the indexes before asking the question.
If the data is bigger than can be cached, it may be faster to queue, since you may have a choke point -- I/O.
If the table(s) are continually being changed, turn off the Query Cache.
If your goal is to get some results onto the page early (a likely human-interface goal), add a small delay in all but one AJAX callee (not caller).
If multiple pages could be computing at the same time, things get more complex. For example, you can't really control the parallelism.
Let's see the queries. Perhaps we can speed them up enough to obviate the question.
There is no right answer to this question. Up to a point, running SELECT queries in parallel is (generally) going to be faster than running them one at a time. Whether that point is 2 queries or 200 depends on the nature of the queries, the hardware configuration, the data, and the speeds of various components.
The situation becomes even more complex when you consider how many different users may be involved and whether or not the data is being updated. You can get into really bad situations with parallel queries and updates if the locks start cascading. Of course, this can happen with multiple simultaneous users as well.
My guess is that you want a throttling mechanism that will run, say, n queries at a time and put the rest into a queue.
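One way to sketch such a throttle in PHP is mysqli's asynchronous queries. This is a hedged sketch, not production code: it assumes the mysqlnd driver, the connection credentials are placeholders, and real code would need error handling.

    <?php
    // Run at most $limit queries in parallel; queue the rest.
    // Async queries need one mysqli connection each.
    function runThrottled(array $queries, int $limit): array
    {
        $pending = $queries;
        $running = [];
        $results = [];
        while ($pending || $running) {
            // Top up to $limit in-flight queries.
            while ($pending && count($running) < $limit) {
                $link = new mysqli('localhost', 'user', 'pass', 'db');
                $link->query(array_shift($pending), MYSQLI_ASYNC);
                $running[] = $link;
            }
            $read = $error = $reject = $running;
            if (mysqli::poll($read, $error, $reject, 1) < 1) {
                continue;
            }
            foreach ($running as $i => $link) {
                if (!in_array($link, $read, true)) {
                    continue;
                }
                $result = $link->reap_async_query();
                if ($result instanceof mysqli_result) {
                    $results[] = $result->fetch_all(MYSQLI_ASSOC);
                    $result->free();
                }
                $link->close();
                unset($running[$i]);
            }
        }
        return $results;
    }

With $limit set to something near the number of CPU cores, this behaves like the n-at-a-time queue described above.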

MySQL scale up or scale out?

I have been tasked with investigating reasons why our internal web application is hitting performance problems.
The web application itself is part written in PHP and part written in Perl, and we have a MySQL database, which is where I believe the performance hit is occurring.
We have about 400 users of the system, of which, most are spread across different timezones, so generally there are only ever a max of 30 users online at any one time. The performance problems have crept up on us, particularly over the past year as the database keeps growing.
The system is running on a single 32-bit Debian server with 6GB of RAM and 8 x 2.4GHz Intel CPUs. This is probably not hefty enough for the job in hand. However, even at times when I am the only user online, page loading can still be slow.
I'm trying to determine whether we need to scale up or scale out. Firstly, I'd like to know how well our hardware is coping with the demands placed upon it; and secondly, whether it might be worth scaling out and creating some replication slaves to balance the load.
There are a lot of tools available on the internet - probably a bit too many to investigate. Can anyone recommend any tools that provide profiling/performance monitoring and might help me on my quest?
Many thanks,
ns
Your slow-down seems to be related to the data and not to the number of concurrent users.
Properly indexed queries tend to scale logarithmically with the amount of data - i.e. doubling the data increases the query time by some constant C, doubling it again adds the same C, and so on. Before you know it, you have humongous amounts of data, yet your queries are only a little slower.
If the slow-down wasn't that gradual in your case (i.e. it was linear in the amount of data, or worse), this might be an indication of badly optimized queries. Throwing more iron at the problem will postpone it, but unless you have an unlimited budget, you'll have to actually solve the root cause at some point:
1. Measure the query performance on the actual data to identify slow queries.
2. Examine the execution plans for possible improvements (see the sketch below).
3. If necessary, learn about indexing, clustering, covering and other performance techniques.
4. Finally, apply that knowledge to the queries you identified in steps (1) and (2).
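For steps (1) and (2), MySQL's EXPLAIN is the workhorse. A minimal sketch, assuming an existing PDO connection and an invented orders table:

    <?php
    // EXPLAIN shows how MySQL plans to execute a query: which indexes it
    // could use (possible_keys), which one it picks (key), and roughly how
    // many rows it must examine (rows). type=ALL means a full table scan.
    $plan = $pdo->query(
        "EXPLAIN SELECT * FROM orders WHERE customer_id = 42"
    )->fetchAll(PDO::FETCH_ASSOC);
    print_r($plan);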
If nothing else helps, think about your data model. Sometimes a "perfectly" normalized model is not the best performing one, so a little judicious denormalization might be warranted.
The easy (lazy) way if you have budget is just to throw some more iron at it.
A better way, before deciding where or how to scale, would be to identify the bottlenecks. Is every page load slow, or just particular pages? If it is just a few pages, then invest in a profiler (for PHP, both Xdebug and the Zend Debugger can do profiling). I would also (if you haven't) invest in a test system that is as similar as possible to the live system, to run diagnostics on.
You could also look at gathering some stats: either at the server level with a program such as sar (from the sysstat package), or at the DB level (have you got the slow query log running?).
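If the slow query log isn't already running, it can be switched on at runtime (MySQL 5.1+, requires the SUPER privilege). A sketch, again assuming an existing PDO connection:

    <?php
    // Switch on the slow query log at runtime.
    $pdo->exec("SET GLOBAL slow_query_log = 1");
    $pdo->exec("SET GLOBAL long_query_time = 1");   // log anything over 1s
    $pdo->exec("SET GLOBAL log_output = 'TABLE'");  // write to mysql.slow_log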

How significant is the "amount of queries used" to site performance/speed?

I've been wondering how significant the number of queries used is to site performance/speed.
I have two websites running on different engines (one on IPB and one on MyBB). The IPB one uses fewer queries (only 14 on average), but it runs slower than the MyBB one, which uses more (20 on average).
I thought this was because the IPB is heavily modded, so I did a fresh install on my localhost.
But the result is still the same: the IPB (with fewer queries) runs slower than MyBB (with more queries).
This makes me wonder: how does the number of queries affect site performance? Is it significant?
Well, the quantity of queries is one factor, but you also have to consider each individual query: does it do joins? Lots of math? Queries within queries? Do the tables have indexes? Etc.
Take a look at these links on optimisation, knowing how to optimise something can tell you what can cause it to slow down.
http://www.fiftyfoureleven.com/weblog/web-development/programming-and-scripts/mysql-optimization-tip
http://msdn.microsoft.com/en-us/library/ff650689.aspx
http://hungred.com/useful-information/ways-optimize-sql-queries/
I believe this question has many answers; there are many things to consider. For example, querying a DB that returns a lot of rows will take more time than one with just a few records. Another thing to consider is the number of queries, just as you are doing in your post. Another case I can think of is how you make the queries: if you are using jQuery or JavaScript AJAX calls to make them, they will take a lot more time.
Sheer number of queries per page does not correlate with site responsiveness and speed.
What does matter is:
What are those queries actually doing?
How well can the DB server cope with the load they generate?
Are they executed serially or in parallel?
It depends on what you consider 'significant'.
Let's consider two scenarios:
We have 'lots' of queries that take quite some time to execute.
That's bad, simply because it takes a long time to process them all.
We have 'lots' of queries that take very little time to execute.
If your database server is on a different machine than the web server, this might be a problem due to communication overhead: the web server and database server will most probably spend more time on communication than on processing each query (think of network latency).
If your database server is on the same machine as the web server, it might not affect site performance much, as communication between them will be very quick. BUT there are other things to consider. For example, you might be locking some tables for update/select queries A LOT, and that will decrease site performance considerably.
It's always better to execute fewer queries.
You should remember that on an average website, ~80% of the response time is spent on frontend performance and only 20% on the backend.
So for website performance it is in most cases more relevant to optimize the frontend; that is where you can score big wins quickly.
Of course there are scenarios where sites display LOTS of data coming from a DB. There it is worth thinking about optimizing backend performance, and optimizing backend performance mostly means optimizing database work.
A good book about this is "High Performance Web Sites" by Steve Souders.

How many MySql queries/second can be handled by a server?

I've started developing a browser (database) game. My question is: how many queries can a regular hosting plan handle (by regular I mean the kind of shared hosting you can find for about $7/month)?
As for the queries, nothing complicated (simple SELECT and WHERE operations).
So... 10? 100? 10000?
This is completely dependent on the server hardware, its caching ability and configuration, and the type of hardware it uses for non-volatile storage (e.g. a RAID array of hard drives with spindles, or SSDs?), not to mention the type of query and the database being queried, including:
Number of joins
Indexes
Number of rows in the tables queried
Size of the result set
Concurrent load
etc...
Without knowing all of these factors, it is impossible to estimate performance. The best estimate comes from actual profiling, performed under normal operating conditions with the type of queries that will actually be presented.
Yoshinori Matsunobu in one of his articles claims 105,000 queries per second using SQL, and 750,000 queries per second using native InnoDB API.
All queries are simple PK lookups.
On a shared hosting these numbers will of course be much lower. How much exactly of course depends on the shared hosting.
Many factors can influence the response time of a database: hardware, application configuration (MySQL out of the box does not perform all that well), and last but not least, your coding!
Badly written queries can make an app feel slow and sluggish. Using count(*) in your code, for a very trivial example, or having no indexes on the database, will influence your DB response time as your dataset grows.
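To illustrate the count(*) point: if the code only needs to know whether any matching rows exist, a LIMIT 1 probe stops at the first hit instead of counting them all. A sketch, with an invented comments table and an existing PDO connection:

    <?php
    // Counts every matching row; cost grows with the dataset.
    $hasComments = $pdo->query(
        "SELECT COUNT(*) FROM comments WHERE post_id = 7"
    )->fetchColumn() > 0;

    // Stops at the first matching row, however many there are.
    $hasComments = (bool) $pdo->query(
        "SELECT 1 FROM comments WHERE post_id = 7 LIMIT 1"
    )->fetchColumn();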

How easy (or otherwise) is it to tune a database AFTER 'going LIVE'?

It is looking increasingly like I'll have to go live before I have had the time to tweak all the queries/tables etc. (we're already 6 months behind schedule, so although this is not the ideal scenario, that's how things are).
It's now a case of having to bite the bullet; it's just a matter of working out how big that bullet will be when we come to biting it. Once the database goes live, we obviously can't change the data on a whim, because it's live data. I am fairly confident in most of the DB schema - e.g. the tables are mostly in 3rd and 4th normal form, and constraints are used to ensure data integrity. I have also put some indexes on columns that (I think) will be used a lot in queries, though this was done quite hurriedly and not tested - this is the bit I am worried about.
To clarify, I am not talking about wholesale structure change. The tables themselves are unlikely to change (if ever), however it is almost guaranteed that I will have to tune the tables at some stage (either personally or by hiring someone).
I want to know how much of a task this is. Specifically, assume a database of a few gigabytes (roughly 300 tables so far), and that 50% of the tables need tuning in the next few months:
How long will it take to perform the tuning (I know this is a "how long is a piece of string" type question) - but what are the main determinants of the effort required, so I can work out how long it is likely to take?
Is it possible to lock sections of the database (or specific tables) while the indexes are being reworked, or does the entire database need to go offline? (I am using MySQL 5.x as the DB.)
Is what I describe (going live before ALL tables are perfectly tuned) outrageously risky/inadvisable? (Does it justify the months of sleepless nights this has caused me so far?)
In general it is much harder to fix a poor database design that is causing performance issues after going live, because you have to deal with the existing records. Even worse, the poor design may not become apparent until months after going live, when there are many records instead of a few. This is why databases should be designed with performance in mind (no, this is not premature optimization; there are known techniques which generally perform better than others, and they should be considered in the design), and why databases should be tested against a set of records close to, or larger than, the number you expect to have after a couple of years.
As to how long it will take to completely fix a badly designed database: months or years. Often the worst part is something central to the design (say, an EAV table), which will require almost every query/SP/view/UDF to be adjusted to move to a better structure. You then have to ensure all records are moved to the new, better structure. The sooner you can fix a mistake like this, the better: far better to move a couple of thousand records to a new structure than 100,000,000.
If your structure is OK but your queries are bad, you are better off: you can take the ten worst-performing queries (chosen not just by total time per run but by time per run x number of runs), fix them, rinse and repeat.
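If the slow query log is written to a table (log_output = 'TABLE'), a rough version of that ranking can be pulled straight from mysql.slow_log. A sketch, assuming an existing PDO connection:

    <?php
    // Rank logged statements by total time consumed:
    // (time per run) x (number of runs).
    $worst = $pdo->query(
        "SELECT sql_text,
                COUNT(*)                     AS runs,
                SUM(TIME_TO_SEC(query_time)) AS total_secs
         FROM mysql.slow_log
         GROUP BY sql_text
         ORDER BY total_secs DESC
         LIMIT 10"
    )->fetchAll(PDO::FETCH_ASSOC);
    print_r($worst);

In practice, statements differing only in literal values won't group together, which is why tools like mysqldumpslow normalize queries before aggregating.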
If you are in the midst of fixing a poor database, this book might come in handy:
http://www.amazon.com/Refactoring-Databases-Evolutionary-Database-Design/dp/0321293533/ref=sr_1_1?ie=UTF8&s=books&qid=1268158669&sr=8-1
I would try at least to quantify the limits of the database before going live, so that at least you would know when the activity generated from your application is getting near to that threshold.
You may want to simulate (automatically as much as possible) the typical usage of the database from your application, and check how many concurrent users/sessions/transactions, etc it can handle before it breaks. This, at least, should allow you to solve the "sleepless nights" issue.
As for the original "How easy is it...?" question, the answer obviously depends on many factors. However, the above analysis would undoubtedly help, as at the very least you will be in a position to say whether your database requires tweaking or not.
To answer the title question, I'd say it's fairly easy to tune your DB after deploying into Production.
It's a great idea to be improving performance after deploying to any environment. Being Production adds a bit of pressure, along with the schedule. I'd suggest deploying to Prod, and let it perform as it will. Then start measuring:
how long it takes to run Report X at different times (peak vs. after-hours, if there is such a concept in your app).
what's the user's experience when using the app for those critical use-cases?
Then take a backup of your Prod environment, and create yourself a pre-Prod environment. There you'll be able to run your upgrade scripts to be able to measure the 'how long' type questions you have. Index creation, upgrade down-times, etc. When tuning queries, etc, you'll have a great idea of how it performs with production data & volumes. Granted, you won't have the benefits of having those users performing those inserts at the same time.
Keep that backup for multiple iterations, failed upgrades, new/unprepared-for issues, etc.
Keep making backups after each deployment, so that you can test the next round of improvements to your DB.
It depends on what you're tuning. Let's say you're adding an index to a couple tables, or changing a table type from MyISAM to InnoDB or something, then with a large enough table, those things could be done in 5 to 10 minutes depending on your hardware. It won't take hours. That said, it's still best to do any live-db tuning in the middle of the night.
You can grab a read lock by calling FLUSH TABLES WITH READ LOCK, but it's probably better to put up a "we're doing maintenance" message in your app for the 15-30 minutes you're doing it, just to be safe.
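For index changes specifically, a sketch of the maintenance-window approach (table and index names are invented; assumes an existing PDO connection):

    <?php
    // During the maintenance window: on MySQL 5.x an ALTER like this
    // typically copies the table and blocks writes until it finishes.
    $pdo->exec("ALTER TABLE orders
                ADD INDEX idx_customer_created (customer_id, created_at)");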
The risk is inherent to the situation and what happens if there are serious problems. I usually take a more cowboy approach and take stuff live, especially if they aren't under high load so I can easily find pain points and fix them. If this is a mission critical system, then no, load test or whatever you can first to be sure you're as ready as you can be. Also, keep in mind that you cannot foresee all the issues you'll have. If your indexes are good, then you're probably okay to take it live and see what needs to be worked on.