Multiple large MySQL SELECT queries - better to run in parallel or in a queue?

I have looked up answers to this question a bunch and couldn't find a specific answer - sorry in advance if I missed something! Also, I'm a SQL optimization noob.
I have an analytics dashboard which pulls data based on users' requests from a large database.
Each page the user loads runs a number of different queries to populate different parts of the page (different charts, tables, etc). Some of these pages can take quite some time to load as the user might request several years of data.
Currently, each part of the page fires off its own SELECT query to the SQL server, and as there are several parts to the page, those queries end up running in parallel.
Would it be faster to run these queries in a queue - to allow the server to process one query at a time? Or to keep everything in parallel, as is?
The added benefit of running them one at a time is that we could run the queries to fill in the "above-the-fold" part of the page first...
Hope that all makes sense and take it easy on me please :)

I also say "it depends", but I lean toward parallelism.
Probably should not have more parallelism than the number of CPU cores.
I rarely see a system that chews up all the CPU cores -- unless it does not have good enough indexes. That is, fix the indexes before asking the question.
If the data is bigger than can be cached, it may be faster to queue, since you may have a choke point -- I/O.
If the table(s) are continually being changed, turn off the Query Cache.
If your goal is to get some results onto the page early (a likely human-interface goal), add a small delay to all but one of the AJAX callees (not the caller).
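A minimal client-side sketch of that staggering, assuming one fetch per panel (the endpoint names and the 200 ms delay are made up):
```typescript
// Hypothetical per-panel endpoints; the above-the-fold chart loads first.
const panels = ["/api/summary-chart", "/api/detail-table", "/api/history"];

// Stand-in for the page's real rendering code.
function render(url: string, data: unknown): void {
  console.log(`rendered ${url}`, data);
}

function loadPanels(): void {
  panels.forEach((url, i) => {
    // Panel 0 fires immediately; each later panel waits i * 200 ms,
    // briefly giving the above-the-fold query the server to itself.
    setTimeout(async () => {
      const res = await fetch(url);
      render(url, await res.json());
    }, i * 200);
  });
}

loadPanels();
```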
If multiple pages could be computing at the same time, things get more complex. For example, you can't really control the parallelism.
Let's see the queries. Perhaps we can speed them up enough to obviate the question.

There is no right answer to this question. Up to a point, running SELECT queries in parallel is (generally) going to be faster than running them one at a time. Whether that point is 2 queries or 200 depends on the nature of the queries, the hardware configuration, the data, and the speeds of the various components.
The situation becomes even more complex when you consider how many different users may be involved and whether or not the data is being updated. You can get into really bad situations with parallel queries and updates if the locks start cascading. Of course, this can happen with multiple simultaneous users as well.
My guess is that you want a throttling mechanism that will run, say, n queries at a time and put the rest into a queue.
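Something like this minimal sketch could serve as that throttle (the concurrency limit in the usage note is arbitrary):
```typescript
// Minimal throttle: run at most `limit` tasks at once, queue the rest.
async function runThrottled<T>(
  tasks: Array<() => Promise<T>>,
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next queued task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}

// Usage: wrap each panel's SELECT (or its AJAX call) in a task.
// await runThrottled(panelTasks, 3); // n = 3 queries at a time
```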

Related

MySQL scale up or scale out?

I have been tasked with investigating the reasons why our internal web application is hitting performance problems.
The web application itself is written partly in PHP and partly in Perl, and we have a MySQL database, which is where I believe the performance hit is occurring.
We have about 400 users of the system, most of whom are spread across different time zones, so generally there are at most 30 users online at any one time. The performance problems have crept up on us, particularly over the past year, as the database keeps growing.
The system is running on a single 32-bit Debian server with 6 GB of RAM and 8 x 2.4 GHz Intel CPUs. This is probably not hefty enough for the job in hand. However, even at times when I am the only user online, page load times can still be slow.
I'm trying to determine whether we need to scale up or scale out. First, I'd like to know how well our hardware is coping with the demands placed upon it; and second, whether it might be worth scaling out and creating some replication slaves to balance the load.
There are a lot of tools available on the internet - probably a bit too many to investigate. Can anyone recommend any tools that provide profiling/performance monitoring and might help me on my quest?
Many thanks,
ns
Your slow-down seems to be related to the data and not to the number of concurrent users.
Properly indexed queries tend to scale logarithmically with the amount of data - i.e. doubling the data increases the query time by some constant C; doubling it again adds the same C; doubling again, the same C; etc. Before you know it, you have humongous amounts of data, yet your queries are just a little slower.
If the slow-down wasn't as gradual in your case (i.e. it was linear in the amount of data, or worse), this might be an indication of badly optimized queries. Throwing more iron at the problem will postpone it, but unless you have an unlimited budget, you'll have to actually solve the root cause at some point:
1. Measure the query performance on the actual data to identify slow queries.
2. Examine the execution plans for possible improvements (see the sketch after this list).
3. If necessary, learn about indexing, clustering, covering and other performance techniques.
4. Finally, apply that knowledge to the queries you identified in steps (1) and (2).
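For steps (1) and (2), a rough sketch using Node's mysql2 driver (connection details and the orders table are placeholders; SET GLOBAL needs sufficient privileges):
```typescript
import { createConnection } from "mysql2/promise";

async function main(): Promise<void> {
  const conn = await createConnection({
    host: "localhost", user: "app", password: "secret", database: "app",
  });

  // Step 1: log anything slower than 1 s to the slow query log.
  await conn.query("SET GLOBAL slow_query_log = 'ON'");
  await conn.query("SET GLOBAL long_query_time = 1");

  // Step 2: inspect the plan of a suspect query; type=ALL in the
  // output means a full table scan, i.e. a likely missing index.
  const [plan] = await conn.query(
    "EXPLAIN SELECT * FROM orders WHERE customer_id = ?", [42]
  );
  console.table(plan);

  await conn.end();
}

main().catch(console.error);
```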
If nothing else helps, think about your data model. Sometimes, a "perfectly" normalized model is not the best-performing one, so a little judicious denormalization might be warranted.
The easy (lazy) way if you have budget is just to throw some more iron at it.
A better way, before deciding where or how to scale, would be to identify the bottlenecks. Is every page load slow? Or just particular pages? If it is just a few pages, then invest in a profiler (for PHP, both Xdebug and the Zend Debugger can do profiling). I would also (if you haven't) invest in a test system that is as similar as possible to the live system, to run diagnostics on.
You could also look at gathering some statistics, either at the server level with a program such as sar (from the sysstat package), or at the DB level (have you got the slow query log running?).

How significant is the "amount of queries used" to site performance/speed?

I've been wondering how significant the number of queries used is to site performance/speed.
I have two websites, running on different engines (one on IPB and another on MyBB). The IPB one uses fewer queries (only 14 on average), but it runs slower than MyBB, which uses more (20 on average).
I thought this was because the IPB is heavily modded, so I did a fresh install on my localhost.
But the result was still the same: the IPB (which uses fewer queries) runs slower than MyBB (which uses more).
This makes me wonder: how does the number of queries affect site performance? Is it significant?
Well, the quantity of queries is one factor, but you also have to consider each individual query: does it do joins? Lots of math? Queries within queries? Do the tables have indexes? Etc.
Take a look at these links on optimisation, knowing how to optimise something can tell you what can cause it to slow down.
http://www.fiftyfoureleven.com/weblog/web-development/programming-and-scripts/mysql-optimization-tip
http://msdn.microsoft.com/en-us/library/ff650689.aspx
http://hungred.com/useful-information/ways-optimize-sql-queries/
I believe this question has many answers; there are many things to consider. For example, querying a DB that returns a lot of rows will take more time than querying one that has just a few records. Another thing to consider is the number of queries, just as you are doing in your post. Another case I can think of is how you make the queries: if you are using jQuery or JavaScript AJAX calls to make the queries, they will take a lot more time.
Sheer number of queries per page does not correlate with site responsiveness and speed.
What does matter is:
What are those queries actually doing?
How well can the DB server cope with the load they generate?
Are they executed serially or in parallel?
It depends on what you consider 'significant'.
Let's consider two scenarios:
1. We have 'lots' of queries that each take quite some time to execute. That's bad, simply because it takes a long time to process them all.
2. We have 'lots' of queries that each take very little time to execute.
If your database server is on a different machine than the web server, this might be a problem due to communication overhead: the web server and database server will most probably spend more time on communication than on processing each query (think of network latency).
If your database server is on the same machine as the web server, it might not affect site performance much, as communication between the two will be very quick. But there are other things to consider: for example, you might be locking some tables a lot for UPDATE/SELECT queries, and that will decrease site performance considerably.
It's always better to execute fewer queries.
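To make the communication overhead concrete, here is a sketch that batches N point lookups into a single round trip (assuming Node's mysql2 driver and a hypothetical users table):
```typescript
import type { Connection } from "mysql2/promise";

// One network round trip for the whole batch instead of one per id.
// With query(), an array bound to IN (?) expands to a comma-separated list.
async function fetchUsers(conn: Connection, ids: number[]) {
  const [rows] = await conn.query(
    "SELECT id, name FROM users WHERE id IN (?)",
    [ids]
  );
  return rows;
}
```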
You should remember that on an average website, roughly 80% of the response time comes from the front end and only about 20% from the back end.
So for website performance, it is in most cases more relevant to optimize the front end; that is where you can make big gains quickly.
Of course, there are scenarios where sites display lots of data coming from a DB. There it is worth thinking about back-end performance, and optimizing back-end performance in most cases means optimizing database work.
A good book about this is "High Performance Web Sites" by Steve Souders.

How many queries are too many?

I have to run 10 MySQL queries for one person on one page load. Is that very bad? I have quite good hosting, but still, could it break or something? Thank you very much.
Drupal sites typically make anywhere from 150 to 400+ queries per request. The total time spent querying the database is still under 1s - it's not the number that kills the server, but the quality/complexity of the queries (and possibly the size of the dataset they search through).
I can't tell what queries you're talking about but on most sites 10 is not much at all.
If you're concerned with performance, you can always see how long your queries take to execute in a database management program, such as MySQL Workbench.
10 fast queries can be better than 1 slow one. Define what's acceptable in terms of response time and throughput, under normal and peak traffic conditions, and measure whether these 10 queries are a problem or not (i.e. whether they fail to meet your expectations).
If they are, then try to change your design and find a better solution.
How many queries are too many?
I will rephrase your question:
Is my app fast enough?
Come up with a business definition of "fast enough" for your application (based on business/user requirements), come up with a way to model all your usage scenarios and expected load, create simulations of that load and profile (trace/time) it.
This approach amounts to an educated guess. Anything short of it is pure speculation, and worthless.
If your application is already in production, and is working well in most cases, you can get feedback from users to determine pain points. From there, you can model those pain points and corresponding load, and profile.
Document your results. Once you make improvements to your application, you have a tool to determine if the optimizations you made achieved your goals.
When you are new to development, as I assume you are, I recommend focusing on the most logical and obvious way to avoid over-processing. That is usually avoiding repeated queries: cache the result of the first execution and check for cached results before running a query again.
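A minimal sketch of such a cache (in-process; the 30-second TTL and key scheme are assumptions):
```typescript
// Cache results keyed by SQL + parameters; repeats within the TTL
// are served from memory instead of re-querying the database.
const cache = new Map<string, { expires: number; rows: unknown }>();
const TTL_MS = 30_000;

async function cachedQuery(
  run: (sql: string, params: unknown[]) => Promise<unknown>,
  sql: string,
  params: unknown[]
): Promise<unknown> {
  const key = sql + JSON.stringify(params);
  const hit = cache.get(key);
  if (hit && hit.expires > Date.now()) return hit.rows; // hit: skip the DB

  const rows = await run(sql, params); // miss: query once, remember it
  cache.set(key, { expires: Date.now() + TTL_MS, rows });
  return rows;
}
```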
After that don't spend too much time thinking about the number of queries and focus on well-written code. That means a good use of classes, methods and functions. While still having much to learn, you do not want to over-complicate every interaction with the database.
Enjoy what you are doing and keep it neat. That will result in easier-to-debug code, which in itself can lead to better performance once you have the knowledge to take your code further. The performance of an application can be improved very quickly if the original work is well written.
It depends on how many CPU cycles the queries use in total.
1 query can consume way more CPU cycles than 100. It all depends on their contents.
You could begin by optimizing them following this guide: http://beginner-sql-tutorial.com/sql-query-tuning.htm
I think it's not a problem; 10 queries are not that many for a site. Fewer is better, no question, but when you have 3,000-5,000, then you should think about your structure.
And when one query goes through a table with millions of rows without an index, then even 10 are too many.
I have seen a TYPO3 site with a lot of extensions that made 7,500 queries even with the cache. This happens when you install and install and don't look at what is happening.
But you can make sure you use logical JOINs over your tables so that you end up with fewer queries.
Well, there are big queries and small trivial queries. Which ones are yours? Generally, you should try to fetch the data in as few queries as possible. The heavier the load on the database server, the harder it will be to serve clients as traffic increases.
Just to add a bit of a different perspective to the other good answers:
First, to concur, the type and complexity of queries you are making will matter more 99% of the time than the number of queries.
However, in the rare situation where there is high latency on the network path to your database server (i.e. the DB server is remote or some such - not saying this is a logical or sane setup, but I have seen it done), you want to minimize the number of queries, because every time you talk to the database server, the network transmission takes an order of magnitude or two longer than computing the query itself. This situation can really kill your page load times, so you'd want to keep the query count down (actually, you'd just want to change your server setup...).

How many MySQL queries/second can be handled by a server?

I've started developing a browser (database) game. My question is how many queries a regular host can handle (by "regular" I mean a shared hosting plan you can find for about $7/month).
As for the queries, nothing complicated (simple SELECT and WHERE operations).
So... 10? 100? 10,000?
This is completely dependent on the server hardware, its caching ability and configuration, and the type of hardware it uses for non-volatile storage (e.g., a RAID array of spinning hard drives, or SSDs?), not to mention the type of query and the database being queried, including:
Number of joins
Indexes
Number of rows in the tables queried
Size of the result set
Concurrent load
etc...
Without knowing all of these factors, it is impossible to estimate performance. The best estimate comes from actual profiling, performed under normal operating conditions with the type of queries that will actually be presented.
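As a sketch of such profiling (assuming Node's mysql2 driver; the players table and run count are stand-ins for your real workload):
```typescript
import type { Connection } from "mysql2/promise";

// Crude serial benchmark: the time includes network round trips,
// which is exactly what a game on shared hosting pays per query.
async function benchmark(conn: Connection): Promise<void> {
  const runs = 1000;
  const start = Date.now();
  for (let i = 0; i < runs; i++) {
    await conn.query("SELECT id FROM players WHERE id = ?", [i % 100]);
  }
  const seconds = (Date.now() - start) / 1000;
  console.log(`${Math.round(runs / seconds)} queries/second (serial)`);
}
```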
Yoshinori Matsunobu in one of his articles claims 105,000 queries per second using SQL, and 750,000 queries per second using native InnoDB API.
All queries are simple PK lookups.
On shared hosting these numbers will of course be much lower; how much lower depends on the host.
Many factors can influence the response time of a database: hardware, application configuration (MySQL out of the box does not perform all that well), and, last but not least, your coding!
Badly written queries can make an app feel slow and sluggish. Using COUNT(*) in your code, for a very trivial example, or having no indexes on the database will hurt your DB response time as your dataset grows.

Is it a good idea to build in-memory indexes and circumvent the DB when operating intensively on a small subset?

I'm working on a program to automatically find optimal shift assignments, subject to lots of constraints. I'm using Grails, i.e. the data about workers, shifts and assignments will be kept in a DBMS.
For the optimization itself, I'll have to work very intensively on a small subset of the data (about 600 rows total from about 5 different tables). I'll have to iterate over and search through various sub-subsets dozens of times to compute fitness functions, change some values, compute fitness again, lather, rinse, repeat, perhaps hundreds of times.
Now, while searching and iteration are exactly what a DBMS is for, I believe that in this case the overhead of hundreds of DB requests would dwarf the actual work being done, even for an in-memory DBMS like HSQLDB. So instead, I'm planning to slurp the entire subset into memory at the beginning, build my own indexes (HashMaps, mainly) for the lookups I'll have to do, and then work only with those, staying away from the DB until I'm done, and then write my results back to it.
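For illustration, a minimal sketch of such hand-built indexes, using TypeScript Maps in place of the HashMaps mentioned (the field names are invented):
```typescript
interface Assignment { id: number; workerId: number; shiftId: number; }

// Built once after loading the ~600 rows; the fitness loop then does
// O(1) lookups instead of paying a DB round trip per search.
function buildIndexes(rows: Assignment[]) {
  const byWorker = new Map<number, Assignment[]>();
  const byShift = new Map<number, Assignment[]>();
  for (const row of rows) {
    if (!byWorker.has(row.workerId)) byWorker.set(row.workerId, []);
    byWorker.get(row.workerId)!.push(row);
    if (!byShift.has(row.shiftId)) byShift.set(row.shiftId, []);
    byShift.get(row.shiftId)!.push(row);
  }
  return { byWorker, byShift };
}
```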
Is this a sound approach? Any better ideas?
I'm assuming you must issue hundreds of commands to the database? There's no way to execute the code inside the DB?
The main thing I'd be worried about is integrity; make sure you handle locking correctly. You'd probably want a version number stored somewhere so you don't need to lock the entire set of data for the duration of processing. In the update transaction, you'd first ensure the version number is the same as when you started reading.
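A sketch of that version check, assuming a hypothetical single-row schedule_version table (mysql2 driver):
```typescript
import type { Connection } from "mysql2/promise";

// Optimistic concurrency: bump the version only if it still matches
// what we read. Zero affected rows means our snapshot is stale and
// the caller should reload and re-run the optimization.
async function commitIfUnchanged(
  conn: Connection,
  versionAtRead: number,
  writeResults: () => Promise<void>
): Promise<boolean> {
  await conn.beginTransaction();
  const [result] = await conn.query(
    "UPDATE schedule_version SET version = version + 1 WHERE version = ?",
    [versionAtRead]
  );
  if ((result as { affectedRows: number }).affectedRows === 0) {
    await conn.rollback();
    return false; // data changed under us: retry from a fresh read
  }
  await writeResults(); // write the computed assignments
  await conn.commit();
  return true;
}
```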
Finally, benchmark it. I've done some apps over the last year or so that had a similarly intensive compute process per request. Using in-process objects to represent the data was orders of magnitude more efficient than hitting the database per request. But every app is different, and there might be things not considered here that will impact it.