I'm new to MySQL, and something that's quickly becoming obvious to me is that it feels considerably easier to run several database queries per page than just a few... but I don't really have a feel for how many queries might be too many, or at what point I should invest precious time in combining queries, figuring out clever joins, etc.
I'm therefore wondering if there are some kind of "mental benchmarks" experienced folks here use with regard to number of queries per page, and if so, how many might be too many?
I understand that the correct answer in any context is related to what's needed to satisfy an application's functional requirements. However, on projects where client requirements may be flexible or not properly set, or on projects where you as the developer have full control (e.g. sites you develop for yourself), you may be able to negotiate between functionality and performance... basically, to just cut trivial features if coding requirements impact performance and you're unable to optimise it any further.
I would appreciate any views on this.
Thanks
There's no set number; "page" is arbitrary enough that one page could be doing a single database task while another has two dozen widgets, each with its own task.
One good rule of thumb though: the moment you put a SELECT inside a loop that's processing the rows of another SELECT, stop. It might seem fast enough early on, but data tends to grow, and the work those nested loops do grows multiplicatively with it, so expect it to become a bottleneck at some point. Even if the single combined query ends up being significantly slower today, you'll be better off in the long run (and there are always stored procs, query caching, etc.).
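To illustrate in SQL terms (the table and column names here are hypothetical): instead of issuing the inner query once for every row the outer query returns, a single JOIN fetches the same data in one round trip:

-- N+1 pattern: one outer query, then one inner query per row it returns
SELECT id, customer_id FROM orders WHERE status = 'open';
-- ...and then, for each row returned: SELECT name, email FROM customers WHERE id = ?;

-- Single-query equivalent: let the database do the matching
SELECT o.id, c.name, c.email
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE o.status = 'open';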
It depends how often the page is used, the latency between the app server and database server, and a lot of other factors.
For a page which only displays data, my gut feeling is that 100 is too many. However, there are some cases where that may be acceptable.
In practice you should only optimise where necessary, which means you optimise the pages that people use the most, and ignore the minor ones.
In particular, if the pages are not available to the public and the (few) authorised users hardly ever use them, there is no incentive to make them faster.
If there is a real performance problem which you believe comes from having too many queries, enable the general query log (which may make performance worse, I'm afraid) and analyse the most common queries with a view to eliminating them.
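For reference, both the general log and the (usually cheaper) slow query log can be switched on at runtime; the file path below is just an example:

SET GLOBAL general_log_file = '/var/log/mysql/general.log';
SET GLOBAL general_log = 'ON';       -- logs every statement; adds noticeable overhead
SET GLOBAL slow_query_log = 'ON';    -- cheaper: only logs statements over the threshold
SET GLOBAL long_query_time = 1;      -- threshold in seconds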
You might find that there are some "low hanging fruits" - simple queries on rarely changing data which are called on most popular pages, which you can easily eliminate (for example, have your app server fetch the data on a cron job into a local file and read it from there). Or even "lower hanging fruits" like queries which are completely unnecessary.
The difficulty with trying to combine multiple queries is that it tends to go against code-reuse and code maintainability, so you should only do it if it is ABSOLUTELY necessary; it doesn't sound like you have enough data yet to make that determination.
I have looked up answers to this question a bunch and couldn't find a specific answer - sorry in advance if I missed something! Also, I'm a SQL optimization noob.
I have an analytics dashboard which pulls data based on users' requests from a large database.
Each page the user loads runs a number of different queries to populate different parts of the page (different charts, tables, etc). Some of these pages can take quite some time to load as the user might request several years of data.
Currently, each part of the page pings off one SELECT query to the SQL server but as there are several parts of the page, those queries end up running in parallel.
Would it be faster to run these queries in a queue - to allow the server to process one query at a time? Or to keep everything in parallel, as is?
The added benefit of running them one at a time is that we could run the queries to fill in the "above-the-fold" part of the page first...
Hope that all makes sense and take it easy on me please :)
I also say "it depends", but I lean toward parallelism.
Probably should not have more parallelism than the number of CPU cores.
I rarely see a system that chews up all the CPU cores -- unless it does not have good enough indexes. That is, fix the indexes before asking the question.
If the data is bigger than can be cached, it may be faster to queue, since you may have a choke point -- I/O.
If the table(s) are continually being changed, turn off the Query Cache.
If your goal is to get some results onto the page early (a likely human-interface goal), add a small delay to all but one of the AJAX callees (not the caller).
If multiple pages could be computing at the same time, things get more complex. For example, you can't really control the parallelism.
Let's see the queries. Perhaps we can speed them up enough to obviate the question.
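As a rough sketch of the kind of check involved (the table, columns and values here are made up):

-- 'ALL' in EXPLAIN's type column means a full table scan
EXPLAIN SELECT * FROM clicks WHERE campaign_id = 42 AND clicked_at >= '2018-01-01';

-- If the WHERE columns are not indexed together, a composite index often fixes it
ALTER TABLE clicks ADD INDEX idx_campaign_date (campaign_id, clicked_at);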
There is no right answer to this question. Up to a point, running SELECT queries in parallel is (generally) going to be faster than running them one at a time. Whether that point is 2 queries or 200 depends on the nature of the queries, the hardware configuration, the data, and the speeds of various components.
The situation becomes even more complex when you consider how many different users may be involved and whether or not the data is being updated. You can get into really bad situations with parallel queries and updates if the locks start cascading. Of course, this can happen with multiple simultaneous users as well.
My guess is that you want a throttling mechanism that will run, say, n queries at a time and put the rest into a queue.
First of all, I'm not a very experienced developer; I build mid-size apps in PHP, MySQL and JavaScript.
There is something, though, that makes it hard for me to design a MySQL InnoDB database before each project, and that is performance. Whenever I create a normalized database schema, I worry that having to join several tables together (like 5-6, usually with a few many-to-many and many-to-one relationships between them) will hurt performance a LOT once each of those 5-6 tables has around 100k rows.
The projects I usually work on are analytics platforms. I'm therefore expecting around 100M clicks in total, and I usually have to join that clicks table to many others (each around 100k rows) to get some data displayed. I usually build summary tables of the clicks but cannot do the same for the other tables.
I'm not quite sure whether I have to worry about future performance at this stage. Currently I'm actively managing a few of these applications with 30M+ clicks and tables of 40k+ rows that I join to the clicks table. The performance is pretty bad - a SELECT usually takes more than 10-20s to complete, even though I believe I have proper indexing and a properly sized innodb_buffer_pool_size.
I've read a lot about how the key to an optimized database is its design. That's why I usually think about the DB schema a LOT before creating it.
Do I really have to worry about creating DB schemas where I'll have to join 5-6 many-to-many/many-to-one/one-to-many tables, or is this quite usual and should MySQL easily handle the load?
Is there anything else that I should consider before creating a DB schema?
My usual server setup is a MySQL server with 4GB RAM + 2 vCPUs to serve the DB, and a web server with 4GB RAM + 2 vCPUs. Both run Ubuntu 16.04, the latest MySQL (5.7.21) and PHP7-FPM.
Gordon is right. RDBMSs are made to handle your kind of workload.
If you're using virtual machines (cloud, etc.) to host your stuff, you can generally increase your RAM, vCPU count, and IO capacity simply by spending more money. But, usually, throwing money at DBMS performance problems is less helpful than throwing better indexes at them.
At the scale of 100M rows, query performance is a legitimate concern. You will, as your project develops, need to revisit your DBMS indexing to optimize the queries you're actually using. So plan on that. The thing is, you cannot and will not know until you get lots of data what your actual performance issues will be.
Read this for a preview of what's coming: https://use-the-index-luke.com/ .
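For example (with purely hypothetical table and column names), the kind of composite index that analytics joins tend to need looks like this:

EXPLAIN SELECT c.id, u.country
FROM clicks AS c
JOIN users AS u ON u.id = c.user_id
WHERE c.created_at >= '2018-01-01';

-- Lets MySQL range-scan clicks by date and pick up user_id from the index for the join
ALTER TABLE clicks ADD INDEX idx_created_user (created_at, user_id);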
One piece of advice: partitioning of tables generally doesn't solve performance problems except under very specific circumstances.
Look up this acronym: YAGNI.
And go do your project. Spend your present effort getting it working.
I'm doing research before I create my social network database and I've found a lot of questions/resources pertaining to graph and key-value databases for social networks. I understand there are a TON of different options and ways to implement the DB. I also understand that what the big companies do is complex and way above what I currently need (1b+ users). I also know each of the big companies have revamped their databases to account for the insane scaling they go through.
I don't know how the network will grow, and I don't believe I can accurately create a model that will scale to 1m users (due to unknowns such as how people will use it, how often people post, comment, etc.). But I can at least try to create a database that will be easiest to scale when (if) the need arises.
Do most companies create a database to handle up to 1k users, then once they grow, they revamp it for 10k users, then 100k, etc? If they do, at each of these arbitrary numbers (because of the unknowns listed above), do companies typically change a few tables/nodes/etc, or do they completely recreate the database to take advantage of new technologies (such as moving from SQL to graph)?
I want to pick the best solution, but I'm finding the decision between graph, key-value, SQL and others very difficult - especially with no data to know which relationships and data are most important. I believe I can create a solid system using a graph that can support up to 10k users, but I'm worried about potentially having to completely recreate the database as the system grows. Is this a worry-now-to-avoid-issues problem, or an implement-now-and-adapt-later one?
Going further, if I do need to plan on complete DB restructures, does it typically make sense to use a Multi-Model NoSQL DBMS (such as OrientDB or ArangoDB)?
I personally think you are asking premature questions.
Seriously, even with a bad model, a database can handle 10k users.
You think about scaling, but the hardest problem is not scaling, it is to come to the point where you need to scale.
I'm sure everybody wants 1bn users, but then you are already dreaming about a social network with 200 times more users than GitHub itself (GitHub has ~5 million users).
Also, even if you think it all through ahead of time, you will definitely refactor and refactor again over the years, and you will end up with more than one persistence layer, be sure of it.
Code, and code well: stay lean, remain able to change quickly, deploy, show it to users, refactor, test, deploy and show it to users again in the same day. These are the things you need to do now, rather than asking questions about a problem you don't have yet; you definitely have a lot of other problems to solve now ;-)
UPDATE
Based on your comment, you need to accept that there are questions we simply cannot answer, because we don't know your exact requirements.
I have a simple app which uses 4 persistence layers, and it is not yet online. I'll give you my "why" for each one and its use case:
Neo4j: it is the core of the application data. I use it because I love it and know it very well (it is my job), and, as the concept of the app is quite new and can evolve rapidly, having a schemaless DB cuts out a lot of the refactoring work. Also, a lot of new use cases keep coming up as I build the app, which makes Neo4j a good choice when you need to add features without breaking what has already been done.
MySQL
I use it for user accounts and profiles. Why? Because the framework I use already has a lot of bundles that integrate this kind of thing in a couple of lines of code; the bundles are well maintained, and if I used Neo4j for it (currently) I would have to reinvent the wheel. Also, all the modules I use keep improving in stability and compatibility with the framework.
Of course the MySQL data is coupled (minimally) with the Neo4j data. But I know that this kind of data will not evolve much, so MySQL is a good choice, and if I have to refactor some points it will not be a huge pain.
Redis
I use Redis for storing analytics data; it is quite flexible and I can easily create new keys and add data on top of it.
RabbitMQ:
I use a lot of message queues. Why? For testing refactoring: I can easily process messages with multiple consumers, test multiple database layers while the app is running, test changes, test new features, and so on.
You will refactor ! Just try to keep it as simple as possible.
Hi, I recently came across a situation where I am asked to optimize the data model for one of our clients, for their already developed and running product. The main reason for doing this exercise is that the product suffers from slowness due to too many locks and too many slow-running queries. As I am not a DBA, looking at the data model at first sight and doing some tracing of queries, I realize that the whole data model suffers from improper design and storage. The database is MySQL 5.6 and we are running the InnoDB engine on it.
I want to know whether there is any tool out there which can analyze the whole data model and point out possible issues, including data structure definitions, indexes and other things.
I tried lots of profiling tools, including MySQL Workbench, MySQL Enterprise Monitor (paid version) and Jet Profiler, but they all seem to be limited to identifying slow queries only. What I am interested in is a tool which can analyze the existing data model and report problems with it, along with possible solutions.
You cannot look at the data model in isolation. You need to consider the data model together with the requirements and the actual data access/update patterns.
I recommend you identify the top X slowest queries and perform a root-cause analysis on them.
Make sure you focus on the parts of the application that matter, e.g. the performance problems that negatively affect the usefulness of the application.
And by data access/update patterns I mean for example:
High vs low number of concurrent accesses
Mostly reads or mostly updates?
Single-record reads vs reading a large number of records at once
Access evenly spread out during the day vs in bulk at certain times
Random access (every record is equally likely to be selected or updated) vs mostly recent records
Are all tables equally used, or are some used more than others?
Are all columns of all tables read at once, or are there clusters of columns that are used together?
Which tables are frequently used together?
The slow queries are the most important to look at. Show us a few of them, together with SHOW CREATE TABLE, EXPLAIN, and how big the tables are.
Also, how many queries per second are you running?
SHOW VARIABLES LIKE '%buffer%';
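The rest of those can be gathered with a few statements (the table name and query here are just examples):

SHOW CREATE TABLE orders;                              -- definition plus existing indexes
SELECT COUNT(*) FROM orders;                           -- rough table size
EXPLAIN SELECT * FROM orders WHERE status = 'open';    -- plan for one of the slow queries
SHOW GLOBAL STATUS LIKE 'Questions';                   -- statements executed since startup
SHOW GLOBAL STATUS LIKE 'Uptime';                      -- divide Questions by Uptime for queries/second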
There are no such tools as those you are looking for, so I guess you'll have to do your homework, proposing another data model that "follows the rules".
You could begin by getting familiar with the first three normal forms.
You could also try to detect SQL antipatterns (there are books talking about these) in your database. This should give you some leads to work on.
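As one hypothetical illustration of the kind of antipattern those books describe - a multi-valued attribute crammed into a comma-separated column instead of a junction table that can be indexed and joined:

-- Antipattern: tag_list holds values like 'red,large,sale'; you cannot index or join on it properly
-- CREATE TABLE products (id INT PRIMARY KEY, tag_list VARCHAR(255));

-- Better: one row per product/tag pair
CREATE TABLE product_tags (
    product_id INT NOT NULL,
    tag VARCHAR(50) NOT NULL,
    PRIMARY KEY (product_id, tag)
);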
I have to run 10 MySQL queries at once, for one person, on one page. Is that very bad? I have quite good hosting, but still, can it break something? Thank you very much.
Drupal sites typically make anywhere from 150 to 400+ queries per request. The total time spent querying the database is still under 1s - it's not the number that kills the server, but the quality/complexity of the queries (and possibly the size of the dataset they search through).
I can't tell what queries you're talking about but on most sites 10 is not much at all.
If you're concerned with performance, you can always see how long your queries take to execute in a database management program, such as MySQL Workbench.
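If you would rather stay in the MySQL client, the session profiler (deprecated in recent versions but still available) gives rough timings too; the query here is just a placeholder:

SET profiling = 1;
SELECT * FROM orders WHERE status = 'open';   -- run the query you want to measure
SHOW PROFILES;                                -- lists recent statements with their durations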
10 fast queries can be better than 1 slow one. Define what's acceptable in terms of response time and throughput, in normal and peak traffic conditions, and measure whether these 10 queries are a problem or not (i.e. whether they fail to meet your expectations).
If they are, then try to change your design and find a better solution.
How many queries are too many?
I will rephrase your question:
Is my app fast enough?
Come up with a business definition of "fast enough" for your application (based on business/user requirements), come up with a way to model all your usage scenarios and expected load, create simulations of that load and profile (trace/time) it.
This approach amounts to an educated guess. Anything short of it is pure speculation, and worthless.
If your application is already in production, and is working well in most cases, you can get feedback from users to determine pain points. From there, you can model those pain points and corresponding load, and profile.
Document your results. Once you make improvements to your application, you have a tool to determine if the optimizations you made achieved your goals.
When you are new to development, as I assume you are, I recommend focusing on the most logical and obvious way to avoid over-processing. That is usually avoiding repeated queries by caching the result of the first execution and checking for cached results before running the query again.
After that, don't spend too much time thinking about the number of queries; focus on well-written code. That means good use of classes, methods and functions. While you still have much to learn, you don't want to over-complicate every interaction with the database.
Enjoy what you are doing and keep it neat. That will result in easier-to-debug code, which in itself can lead to better performance once you have the knowledge to take your code further. The performance of an application can be improved very quickly if the original work is well written.
It depends on how many CPU cycles the queries will use in total.
1 query can consume way more CPU cycles than 100. It all depends on their contents.
You could begin by optimizing them following this guide: http://beginner-sql-tutorial.com/sql-query-tuning.htm
I think it's not a problem. 10 queries are not that many for a site. Fewer is better, no question, but when you have 3000-5000 then you should think about your structure.
And when one of those queries goes through a table with millions of rows without an index, then even 10 are too many.
I have seen a Typo3 site with a lot of extensions that made 7500 requests even with the cache. This happens when you install and install and don't look at what is actually happening.
But you can make sure you use logical JOINs across your tables so that you end up with fewer queries.
Well there are big queries and small trivial queries. Which ones are yours? Generally, you should try to fetch the data in as few queries as possible. The heavier the load is on the database server the harder it will be to serve the clients as the traffic increases.
Just to add a bit of a different perspective to the other good answers:
First, to concur, the type and complexity of queries you are making will matter more 99% of the time than the number of queries.
However, in the rare situation where there is high latency on the network path to your database server (i.e. the db server is remote or such, not saying this is a logical or sane setup, but I have seen it done) then you want to minimize the number of queries done, because every single time you talk to the database server the network transmission time will take an order of magnitude or two longer than it takes to compute the query. This situation can really kill your page loading times, and so you'd really want to minimize the number of queries (actually, you just want to change your server setup...).
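One common way to cut round trips (hypothetical table and statuses here) is to fold several small aggregate queries into a single statement using conditional aggregation:

-- Three round trips:
--   SELECT COUNT(*) FROM orders WHERE status = 'open';
--   SELECT COUNT(*) FROM orders WHERE status = 'shipped';
--   SELECT COUNT(*) FROM orders WHERE status = 'cancelled';

-- One round trip (boolean expressions evaluate to 0/1 in MySQL):
SELECT
    SUM(status = 'open')      AS open_count,
    SUM(status = 'shipped')   AS shipped_count,
    SUM(status = 'cancelled') AS cancelled_count
FROM orders;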