Hi, I recently came across a situation where I was asked to optimize the data model for one of our clients, for their already developed and running product. The main reason for doing this exercise is that the product suffers from slow performance due to too many locks and too many slow-running queries. As I am not a DBA, after a first look at the data model and some tracing of queries, I realized that the whole data model suffers from improper design and storage. The database is MySQL 5.6 and we are running the InnoDB engine on it.
I want to know whether there is any tool out there which can analyze the whole data model and point out possible issues, including data structure definitions, indexes and other things.
I have tried lots of profiling tools, including MySQL Workbench, MySQL Enterprise Monitor (paid version) and Jet Profiler, but they all seem to be limited to identifying slow queries. What I am interested in is a tool which can analyze the existing data model, report problems with it, and suggest possible solutions.
You cannot look at the data model in isolation. You need to consider the data model together with the requirements and the actual data access/update patterns.
I recommend you identify the top X slowest queries and perform a root-cause analysis on each of them.
Make sure you focus on the parts of the application that matter, i.e. the performance problems that negatively affect the usefulness of the application.
And by data access/update patterns I mean, for example (a quick way to measure some of these is sketched after this list):
High vs. low number of concurrent accesses
Mostly reads or mostly updates?
Single-record reads vs. reading a large number of records at once
Is access spread evenly throughout the day, or does it arrive in bulk at certain times?
Random access (every record is equally likely to be selected or updated) vs. access mostly to recent records
Are all tables used equally, or are some used more than others?
Are all columns of a table read at once, or are there clusters of columns that are used together?
Which tables are frequently used together?
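If the performance_schema is enabled (it is on by default in MySQL 5.6), a rough sketch like the following can show which tables actually take the most read and write traffic; the excluded schema names are just examples:

-- Sketch: which tables receive the most I/O? (requires performance_schema)
SELECT OBJECT_SCHEMA,
       OBJECT_NAME,
       COUNT_READ,
       COUNT_WRITE
FROM performance_schema.table_io_waits_summary_by_table
WHERE OBJECT_SCHEMA NOT IN ('mysql', 'performance_schema')
ORDER BY COUNT_READ + COUNT_WRITE DESC
LIMIT 20;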
The slow queries are the most important to look at. Show us a few of them, together with SHOW CREATE TABLE, EXPLAIN, and how big the tables are.
Also, how many queries per second are you running?
SHOW VARIABLES LIKE '%buffer%';
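If it helps, here is a minimal sketch of how that diagnostic information can be gathered; the table and column names (orders, customer_id) are placeholders, not from the question:

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;   -- log anything slower than 1 second

SHOW CREATE TABLE orders;         -- table definition and indexes
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;   -- the chosen query plan
SHOW TABLE STATUS LIKE 'orders';  -- approximate row count and data size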
There is no such tool as the one you are looking for, so I guess you'll have to do your homework and propose another data model that "follows the rules".
You could begin by getting familiar with the first three normal forms.
You could also try to detect SQL antipatterns (there are books talking about these) in your database. This should give you some leads to work on.
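As a rough illustration of what "following the rules" can mean in practice (all table and column names here are invented for the example), a single wide table with a comma-separated list column might be split into roughly third normal form like this:

-- Before: one denormalized table, repeating customer data and packing
-- product ids into a comma-separated column (violates 1NF)
CREATE TABLE orders_denormalized (
    order_id       INT PRIMARY KEY,
    customer_name  VARCHAR(100),
    customer_email VARCHAR(100),
    product_list   TEXT
) ENGINE=InnoDB;

-- After: roughly third normal form
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100),
    email       VARCHAR(100)
) ENGINE=InnoDB;

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
) ENGINE=InnoDB;

CREATE TABLE order_items (
    order_id   INT NOT NULL,
    product_id INT NOT NULL,
    quantity   INT NOT NULL,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders (order_id)
) ENGINE=InnoDB;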
First of all, I'm not a very experienced developer; I'm making mid-size apps in PHP, MySQL and JavaScript.
There is one thing, though, that makes it hard for me to design a MySQL InnoDB database before each project, and that is performance. I'm always quite worried that if I create a normalized database schema and then have to join a couple of tables (like 5-6) together (there are usually a few many-to-many and many-to-one relationships between them), it will hurt performance a LOT when each of these 5-6 tables has around 100k rows.
The projects I usually work on are analytics platforms. I'm therefore expecting around 100M clicks in total, and I usually have to join this table to many others (each around 100k rows) to get some data displayed. I usually make summarized tables of the clicks, but cannot do the same for the other tables.
I'm not quite sure if I have to worry about future performance at this stage. Currently, I am actively managing a few of these applications with 30M+ clicks and tables of 40k+ rows that I join to this Clicks table. The performance is pretty bad: a select operation usually takes more than 10-20s to complete, even though I believe I have proper indexing and a reasonable innodb_buffer_pool_size.
I've read a lot about how the key to an optimized database is its design. That's why I usually think about the DB schema a LOT before creating it.
Do I really have to worry about creating DB schemas where I'll have to join 5-6 many-to-many/many-to-one/one-to-many tables, or is this quite usual and should MySQL be able to handle this load easily?
Is there anything else that I should consider before creating a DB schema?
My usual server setup is a MySQL server with 4GB RAM + 2 vCPUs to serve the DB, plus a web server with 4GB RAM + 2 vCPUs. Both run Ubuntu 16.04 with the latest MySQL (5.7.21) and PHP7-fpm.
Gordon is right. RDBMSs are made to handle your kind of workload.
If you're using virtual machines (cloud, etc.) to host your stuff, you can generally increase your RAM, vCPU count, and IO capacity simply by spending more money. But usually, throwing money at DBMS performance problems is less helpful than throwing better indexes at them.
At the scale of 100M rows, query performance is a legitimate concern. You will, as your project develops, need to revisit your DBMS indexing to optimize the queries you're actually using. So plan on that. The thing is, you cannot and will not know until you get lots of data what your actual performance issues will be.
Read this for a preview of what's coming: https://use-the-index-luke.com/ .
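As a concrete, purely hypothetical illustration of what "revisiting indexing" looks like for a large clicks table (the table and column names below are assumptions, not from the question), a composite index matching the query's WHERE clause is usually the first thing to try:

-- Hypothetical clicks table, ~100M rows
CREATE TABLE clicks (
    id          BIGINT AUTO_INCREMENT PRIMARY KEY,
    campaign_id INT NOT NULL,
    clicked_at  DATETIME NOT NULL,
    url         VARCHAR(2048)
) ENGINE=InnoDB;

-- Composite index: equality column first, range column second
CREATE INDEX idx_clicks_campaign_date ON clicks (campaign_id, clicked_at);

-- Check that the index is actually used
EXPLAIN SELECT COUNT(*)
FROM clicks
WHERE campaign_id = 17
  AND clicked_at >= '2018-01-01'
  AND clicked_at <  '2018-02-01';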
One piece of advice: partitioning of tables generally doesn't solve performance problems except under very specific circumstances.
Look up this acronym: YAGNI.
And go do your project. Spend your present effort getting it working.
I'm doing research before I create my social network database, and I've found a lot of questions/resources pertaining to graph and key-value databases for social networks. I understand there are a TON of different options and ways to implement the DB. I also understand that what the big companies do is complex and way above what I currently need (1b+ users). I also know each of the big companies has revamped its database to account for the insane scaling it goes through.
I don't know how the network will grow, and I don't believe I can accurately create a model that will scale to 1M users (due to unknowns such as how people will use it, how often people will post, comment, etc.). But I can at least try to create a database that will be easiest to scale when (if) the need arises.
Do most companies create a database to handle up to 1k users, then once they grow, they revamp it for 10k users, then 100k, etc? If they do, at each of these arbitrary numbers (because of the unknowns listed above), do companies typically change a few tables/nodes/etc, or do they completely recreate the database to take advantage of new technologies (such as moving from SQL to graph)?
I want to pick the best solution, but I'm finding the decision between graph, key-value, SQL, and other models very difficult, especially with no data to tell me which relationships and data are most important. I believe I can create a solid system using a graph database that can support up to 10k users, but I'm worried about potentially having to completely recreate the database as the system grows. Is this a "worry now to avoid issues later" problem, or an "implement now and adapt later" problem?
Going further, if I do need to plan on complete DB restructures, does it typically make sense to use a Multi-Model NoSQL DBMS (such as OrientDB or ArangoDB)?
I personally think you are asking premature questions.
Seriously, even with a bad model, a database can handle 10k users.
You think about scaling, but the hardest problem is not scaling; it is getting to the point where you need to scale.
I'm sure everybody wants 1bn users, but then you are already dreaming about having a social network with 200 times more users than GitHub itself? (GitHub has ~5 million users.)
Also, even if you think it through ahead of time, you will definitely refactor, and refactor again, over the years, and you will end up with more than one persistence layer, be sure of it.
Code, and code well; stay lean; remain able to change quickly; deploy, show it to users, refactor, test, deploy and show it to users again in the same day. These are the things you need to do now, rather than asking questions about a problem you don't have yet. You definitely have a lot of other problems to solve now ;-)
UPDATE
Based on your comment: there are questions we simply cannot answer, because we don't know your exact requirements.
I have a simple app which uses four persistence layers, and this app is not yet online. I'll give you my "why" for using each of them and the corresponding use case:
Neo4j: it is the core of the application data. I use it because I love it, I know it very well (it is my job) and, as the concept of the app is quite new and can evolve rapidly, having a schemaless DB greatly reduces the refactoring work. I also now have a lot of use cases that came up while building the app, which makes Neo4j a good choice when you need to add features without breaking what has already been done.
MySQL
I use it for user accounts and profiles. Why? Because the framework I use already has a lot of bundles integrating this kind of functionality in a couple of lines of code; the bundles are well maintained, and if I were to use Neo4j for it (currently), I would have to reinvent the wheel. All the modules I use also keep evolving in stability and compatibility with the framework.
Of course the MySQL data is coupled (minimally) with the Neo4j data. But I know that this kind of data will not evolve much, so MySQL is a good choice, and in case I have to refactor some points it will not be a huge pain.
Redis
I use Redis for storing analytics data, Redis is quite flexible and I can easily create new keys and add data on top of it.
RabbitMQ
I use a lot of message queues. Why? For testing refactoring: I can easily process messages with multiple consumers, test multiple database layers while the app is running, and try out changes and new features without breaking the main flow.
You will refactor ! Just try to keep it as simple as possible.
I am planning on eventually switching my website's database system from MySQL to NoSQL (in this case Cassandra).
From what I have understood so far about Cassandra, there is no such thing as a join, but rather just larger records that work more efficiently. I am by no standard an expert in NoSQL at the moment; I actually understand very, very little about it and am very confused about how a lot of it works...
One of my goals for my web project is to switch to Python and Cassandra for a more advanced and speedier solution as my website is beginning to grow and I want to be able to scale it easily with additional servers.
Right now I am in the process of designing a new feature for my website: the ability to take files and create folders out of them. So far this is what I was originally using: How to join/subquery a second table (a question I just asked).
People there suggested normalizing the data and making it a three-table system: one table for folders, one for folders/files, and one for files. #egrunin answered my question and even gave me the info for the NoSQL side, but I really wanted to check it with a second source just to make sure that this is the right approach.
Also are there any conversion tools for SQL to NoSQL?
So my ultimate goal is to design this folder/file system in the database (along with other features that I am adding) so that when I switch from SQL to NoSQL I will be ready and the conversion of all of my data will be a lot easier.
Any tutorials, guides, and information on converting SQL to NoSQL, Cassandra, or how NoSQL works are much appreciated; so far the Cassandra documentation has left me very confused.
At Couchbase we've recently done a webinar series about the transition from RDBMS to NoSQL. It's obviously through the lens of JSON documents, but a lot of the lessons will apply to any distributed database.
http://www.couchbase.com/webinars
MasterGberry:
One of my goals for my web project is to switch to Python and Cassandra for a more advanced and speedier solution as my website is beginning to grow and I want to be able to scale it easily with additional servers.
This is something that you need to clearly quantify before switching to Cassandra.
MySQL can do amazing things and so can Cassandra, but a switch to Cassandra usually cannot be driven just by wanting to do things faster, because things might not get faster - at least not in the areas where you are used to MySQL doing great (column-level numerical aggregates on well-defined, tabular data).
I am by no means discouraging the transition, but I am warning about the expectations.
This might be a good reading:
http://itsecrets.wordpress.com/2012/01/12/jumping-from-mysql-to-cassandra-a-success-story/
Actually, you can use a tool like playOrm to support joins, BUT on partitions only, NOT entire tables. So if you partition by month or account, you can grab the partition for account 4536 and query into it, joining it with something else (either another smaller table or another partition from another table).
This is very useful if you have a system with lots of clients where each client is really independent of the others, as you can self-contain all of a client's information in that client's partitions of all the tables.
later,
Dean
Cassandra isn't really meant to be the main storage for an application. One of its main purposes is storing sequential data and pulling it all back with a key lookup. One example is logging. Interestingly, the row keys are not sorted, but the column names are. So for logging you would have a row key for every minute and then create a new column for each log entry, with a sequential timestamp as the name of the column. That is just one example, of course; chat history is another.
I'm new to MySQL, and something that's quickly becoming obvious to me is that it feels considerably easier to issue several database queries per page rather than a few of them... but I don't really have a feel for how many queries might be too many, or at what point I should invest more precious time in combining queries, working out clever joins, etc.
I'm therefore wondering if there are some kind of "mental benchmarks" experienced folks here use with regard to number of queries per page, and if so, how many might be too many?
I understand that the correct answer in any context is whatever is needed to satisfy the application's functional requirements. However, on projects where client requirements may be flexible or not properly set, or on projects where you as the developer have full control (e.g. sites you develop for yourself), you may be able to negotiate between functionality and performance: basically, to cut trivial features if their coding requirements hurt performance and you're unable to optimise any further.
I would appreciate any views on this.
Thanks
There's no set number; "page" is arbitrary enough that one page could be doing a single database task while another has two dozen widgets, each with their own task.
One good rule of thumb, though: the moment you put a SELECT inside a loop that's processing rows of another SELECT, stop. It might seem fast enough early on, but data tends to grow and those nested loops will grow with it, so expect them to become a bottleneck at some point. Even if the single combined query ends up being significantly slower, you'll be better off in the long run (and there's always stored procs, query caching, etc.).
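To make that rule of thumb concrete (the table names here are purely illustrative), the nested-loop pattern and its single-query replacement look roughly like this:

-- Anti-pattern: one query per row of an outer result set
--   SELECT id FROM orders WHERE customer_id = 42;
--   ...then, for each id in application code:
--   SELECT * FROM order_items WHERE order_id = ?;

-- Better: a single JOIN that returns everything in one round trip
SELECT o.id, i.product_id, i.quantity
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.customer_id = 42;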
It depends how often the page is used, the latency between the app server and database server, and a lot of other factors.
For a page which only displays data, my gut feeling is that 100 is too many. However, there are some cases where that may be acceptable.
In practice you should only optimise where necessary, which means you optimise the pages that people use the most, and ignore the minor ones.
In particular, if the pages are not available to the public and the (few) authorised users hardly ever use them, there is no incentive to make them faster.
If there is a real performance problem which you believe comes from having too many queries, enable the general query log (which may make performance worse, I'm afraid) and analyse the most common queries with a view to eliminating them.
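One way to do that analysis, sketched below, is to route the general log to a table and count the most frequent statements; remember to switch it off again afterwards, since it adds real overhead:

SET GLOBAL log_output  = 'TABLE';
SET GLOBAL general_log = 'ON';

-- ...let the application run for a representative period...

SELECT argument, COUNT(*) AS times_run
FROM mysql.general_log
GROUP BY argument
ORDER BY times_run DESC
LIMIT 20;

SET GLOBAL general_log = 'OFF';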
You might find that there is some "low-hanging fruit": simple queries on rarely changing data which are called on the most popular pages, and which you can easily eliminate (for example, have your app server fetch the data via a cron job into a local file and read it from there). Or even "lower-hanging fruit", such as queries which are completely unnecessary.
The difficulty with trying to combine multiple queries is that it tends to go against code-reuse and code maintainability, so you should only do it if it is ABSOLUTELY necessary; it doesn't sound like you have enough data yet to make that determination.
I am looking for a tool that will allow me to compare schemas of MySQL databases.
Which is the best tool to do that?
Navicat is able to do that for you. It will also synchronize schema and/or data between two MySQL database instances. I've used it with success in the past.
Link
There is a screenshot of the data and structure synchronization tool here:
http://www.navicat.com/en/products/navicat_mysql/mysql_detail_mac.html#7
Perhaps a bit late to the party, but I've just written a simple tool in PHP to compare MySQL database schemas:
PHP script to compare MySQL database schemas
How to use the script
It exports the schema and serialises it before doing the comparison, so that databases residing on different hosts can be compared (where both hosts may not be accessible to the PHP script).
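If you only need a quick manual diff rather than a full tool, a rough alternative (the schema name 'mydb' is a placeholder) is to dump column definitions from information_schema on each server and compare the two result sets:

SELECT TABLE_NAME, COLUMN_NAME, COLUMN_TYPE, IS_NULLABLE, COLUMN_KEY
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'
ORDER BY TABLE_NAME, ORDINAL_POSITION;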
Edit:
Python Script
I use SQLyog:
http://www.webyog.com/en/
It isn't free, but it is a very good tool and has saved the cost of its license many, many times over. I'm in no way affiliated with the company, just someone who has used a number of MySQL tools.
A free trial (30-day) is available from here.
The best thing to do is to try out some performance benchmarks that are already out there. It's always better to use tried-and-tested benchmarks, unless you're thoroughly convinced that your data and database loading is going to be significantly different to the traditional usage patterns (but, then, what are you using a database for?). I'm going to steal my own answer from ServerFault:
There are a good number of benchmarks out there for different MySQL database engines. There's a decent one comparing MyISAM, InnoDB and Falcon on the Percona MySQL Performance Blog, see here.

Another thing to consider between the two aforementioned engines (MyISAM and InnoDB) is their approach to locking. MyISAM performs table-locking, whilst InnoDB performs row-locking. There are a variety of things to consider, not only downright performance figures.
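If locking behaviour turns out to be the deciding factor, a small sketch like the following (the table name 'mytable' is a placeholder) shows how to check which engine each table uses and convert one to InnoDB for row-level locking:

-- Which engine does each table in the current schema use?
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE();

-- Convert a table to InnoDB (rebuilds the table, so plan for downtime on big tables)
ALTER TABLE mytable ENGINE = InnoDB;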