Planning for database scaling and schema changes [closed] - mysql

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 7 years ago.
Improve this question
I'm doing research before I create my social network database and I've found a lot of questions/resources pertaining to graph and key-value databases for social networks. I understand there are a TON of different options and ways to implement the DB. I also understand that what the big companies do is complex and way above what I currently need (1b+ users). I also know each of the big companies have revamped their databases to account for the insane scaling they go through.
Because I don't know how the network will grow, and I don't believe I can accurately create a model that will scale to 1m users (due to unknowns such as how people will use it, how often people post, comment, etc). But I can at least try to create a database that will be easiest to scale when (if) the need arises.
Do most companies create a database to handle up to 1k users, then once they grow, they revamp it for 10k users, then 100k, etc? If they do, at each of these arbitrary numbers (because of the unknowns listed above), do companies typically change a few tables/nodes/etc, or do they completely recreate the database to take advantage of new technologies (such as moving from SQL to graph)?
I want to pick the best solution, but I'm finding the decision between graph, key-value, SQL, among others very difficult--especially with no data to know what relationships/data is most important. I believe I can create a solid system using a graph that can support up to 10k users, but I'm worried having to potentially completely reacreate the database as the system grows. Is this a worry now to avoid issues, or implement now and adapt later type problem?
Going further, if I do need to plan on complete DB restructures, does it typically make sense to use a Multi-Model NoSQL DBMS (such as OrientDB or ArangoDB)?

I personally think you are asking premature questions.
Seriously, even with a bad model, a database can handle 10k users.
You think about scaling, but the hardest problem is not scaling, it is to come to the point where you need to scale.
I'm sure everybody wants 1bn users, but then you are already dreaming about having a social network with 200 times more users than Github itself ? (Github has ~ 5 million users).
Also, even by thinking it ahead, you will refactor and refactor again definitely during years, and you will have more than one persistence layer, be sure of it.
Code and code good, stay lean, remain able to change quickly, deploy, show to users, refactor, test, deploy and show to users in the same day. These are the things you need to do now, not asking questions about a problem you don't have yet, you definitely have a lot of other problems to solve now ;-)
UPDATE
Based on your comment, you might need to think that there are questions we just can not simply answer, because we don't need your exact requirements.
I have a simple app, which uses 4 persistence layers, and this app is not yet online. I'll give you my "why" about using it and which use case :
Neo4j : it is the core of the application data, I use it because I love it, I know it very much (it is my job) and, as the concept of the app is quite new and can evolve rapidly, having a schemaless db is reducing a lot of the refactoring stuff. Also I have now a lot of use cases coming by building the app, which make Neo4j a good choice when you need to add features without breaking what has already been done.
MySQL
I use it for User accounts and profiles. Why ? Because the framework I use already has a lot of bundles integrating this kind of stuff in a couple of lines of code, the bundles are well maintained and if I would use (currently) neo4j for it, I will have to reinvent the wheel. Also all the modules I use evolve in stability and compatibility with the framework.
Of course the mysql data is coupled (minimally) with the neo4j one. But I know that this kind of data will not evolve that much, so Mysql is a good choice and in case I have to refactor some points, this will not be a huge pain.
Redis
I use Redis for storing analytics data, Redis is quite flexible and I can easily create new keys and add data on top of it.
RabbitMQ :
I use a lot of message queues, why ? For testing refactoring. I can easily process messages with multiple consumers for testing "refactoring", testing mutliple database layers while the app is running for testing changes, testing new features, testing refactoring, ...
You will refactor ! Just try to keep it as simple as possible.

Related

migration from mysql to nosql database in production without code change and mysql without foreign keys and indexes [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 3 years ago.
Improve this question
i have two scenarios here :
migrating mysql database to nosql without code change(no orms are used)
using no foriegn keys and indexes in mysql(because they want to migrate to different database in future)
3.all this done by very less code change
these questions are asked by my team lead. so i dont have a answer to give him properly because i feel it very unlikely to do mysql with no indexes and foreign keys and first of all if they are not meant to use mysql.then why they choose that.
i want to know that people do like this in software industries
ofently or they will choose on their need fits correctly
they are saying that foreign key validitations are done by api level
not by mysql level
i dont understand them becasue i have less experience so i dont have an answer why they are saying like this. please give me some insight to this that if this is a good practice or not ?
I don't think it will be possible without adding code - you need to implement how your data is managed by your nosql dB engine in some way. If the project is coded with a clear separation of business logic and database code, it's a simple matter of using the new database implementation instead of the old one. If that is not the case and your db implementation leaked into your business logic, then it will not be possible to switch without changing code. Depending on the size of the code base it might /will most likely be too expensive.
If you want to see an example of a clean separation of dB logic from business logic, have a look at this repository: https://github.com/fathersson/money-transfer
(this is not my repository, I just stumbled upon it today)
If you want to learn and understand the principles driving that design, start by looking for "clean architecture" and/or "Domain Driven Design" - the first one is easier to understand in my opinion and there are some talks on YouTube by Robert C. Martin that you can have a look at before buying some books.
Edit: The project I'm working on at the moment did change from postgresql running on rds to dynamodb using a different repository without changing any existing business logic. It saves a lot of money that way. So yes, changing the db backend does happen and is driven by requirements.
In addition to that, when I start working on a new feature set/micro service/bounded context I usually start with a simple in memory repository implementation that's using a map. After I'm done with the initial set of use cases, I know more about the db requirements and choose the db engine based on these and the general requirement to limit the number of different technologies in use.

Figure out database problems

At the beginning I would like to say that I am not expert in this domain and It is problematic to describe all nuances. I work on Rails application which uses Mysql database. Our DB has grown and now we have seriously problems with performance. In our app we have two features (for example sync with mobile) which process much data and It causes our database hang. We use newrelic to monitoring which confirmed that we have problems with those two parts of app. My main question is how to profile my app to figure out which actions make the biggest problem? Which tools I can use? Do you have any tips what I can do/configure DB to improve performance? What action I should do to find out where the problem is (next small step)? I know that those question are very general but I am junior in this domain and new in rails. I believe that more question will appear after your answers ;)
Firstly what is the size of db, like how many tables and avg no of rows per table. With the basic details available in your question, here are few steps you can look upon.
Regarding data sync with mobile. First step should be avoid heavy processing on the fly rather use background jobs to process the data and store in tables what actually needs to be sent to mobile . So in that case you will avoid multiple queries, rather 1-2 queries should fetch the entire data with minimum processsing.
I'm sure you must have applied indexing and active model relationship has been properly managed. It would actually help if you can post some models and basic relationship between them . Also explan in brief how the sync is being done and apis being handled.
Use benchmarking to figure the time to fetch data and keep a watch on the logs when processing data. There are some better benchmarking tooks available.

MongoDB, MySQL or something third for my project?

Ok guys.
I've begun developing a little sparetime project that might become big someday. Before I really get started, I want to be certain that I'm starting with the right setup. So I come to you.
I'm making a service, which will work mostly as a todolist/project planner.
In this system there will be an amount of users and an amount of tasks. Each task can be assigned to multiple users, and each user can have multiple tasks (many to many relation).
Until now I was planning to use MySQL, but a friend of mine, who is part of the project, sugested MongoDB instead. He tells me that it would increase performance and be more scaleable.
On the other hand I'm thinking that in order to either get all tasks assigned to a specific user, or all users assigned to a specifik task, one would need to use joins, which MongoDB doesnt have (or have in a cumbersome way as far as I have understood).
Now my question to you is "Which DB system would you suggest. MySQL or MongoDB or a third option? And why?"
Thank you for your time and your assistance.
Morten
We use MySQL at IGN to store person relationships (many-to-many like your use case), and have about 5M records in the relationship table. We have 4 MySQL servers in a cluster and the reads are distributed across 3 MySQL slaves. BTW you can always denormalize to optimize reads and penalizing writes among other things based on the read/write heavyness of your system.
We use the DAO pattern with Spring, so its fairly easy for us to swap DB providers through configuration (and by writing a Mongo/MySQL DAO Implementation as applicable). We have moved activities (like in Social Media) to Mongo almost a year ago but the person relationships are living happily in MySQL.
The comment to your post by Jonas says it all,
If need be, you can always scale later.
This.
I am very much of the mindset that If you don't have scaling problems, don't worry too much (if at all) about scaling problems. Why not use what is easiest, smartest and cleanest to deliver the features clients pay for (in my case at least!) This approach saves a lot of time and energy and is the proper one for 9 projects out of 10.
Learning a technology because it scales is great. Being tied to an unlearned technology and unknown technology because it scales in an upcoming project, is not as great. There are many other factors than scalability, when using 3rd party stuff.
MySQL would seem to be a good choice MySQL being more mature and having loads of client libraries, ORM's and other timesaving technologies. MySQL can handle millions (billions if you have the ram) of rows. I have yet to encounter a project it could not handle, and I have seen some pretty impressive datasets!
Of course, when you will need performance, sure maybe you will find yourself ripping out orm and sql generating code to replace with your own hand tweaked queries, but that day is way down the line and chances are, that day will never even come.
Mongodb, although it is real cool I am sorry to say may well bring you issues having nothing to do with scaling.
My 2 cents, happy coding!
MySQL
Either would likely work for your purposes, but your database seems relatively rigid in its structure, something which SQL deals well with. As such, I would recommend MySQL. A many-to-many relationship is relatively easy to implement and access, as well.
You may take a tiny bit of a performance hit, but in my experience, this is generally not extremely noticeable with smaller scale applications (i.e. databases with less than millions of entries). I do agree with #Jonas Elfström's comment, however: you should have an abstraction layer between your application and the database, so that should scaling become an issue, you can address it without too many problems.
Stick with a relational database, it can handle many to many relationships and is fully featured for backup and recovery, high availability and importantly you will find that every developer you need is familiar with it. There are plenty of documented methods for scaling a relational database.
Pick an open source databases either MySQL or Postgres dependant upon which your team is most familiar with and how it integrates into the rest of your infrastructure stack.
Make sure you design your data model correctly most importantly the relationships between the entities.
Good luck!

is mysql capable of managing the data for a site which holds lots of data

is mysql capable of managing the data for a site which holds lots of data (say with hundreds of millions of users)? which database would be the most capable/beneficial?
Wikipedia is based on MySQL. I don't think it has 100M users, but it must be close by now.
No database will handle hundreds of millions of users unless you know how to set it up properly. No single server could handle that kind of traffic, so you need to know how to setup replication and load balancing. Once you reach a certain level, there is no out of the box solution, only tools you can use. MySQL being a very capable tool.
There are a couple of answers to this.
Yes, MySQL can store hundreds of millions of records; you need to know what you're doing, have a decent database schema, pretty robust hardware, but you're not pushing the limits.
When you talk about "hundreds of millions of users", you're talking about a site along the lines of Wikipedia/Facebook/Google/Amazon in scale. You need a custom, highly cached, distributed architecture to run a site at that scale - and the traditional database driven application architecture will almost certainly not be enough. You could still store your data in MySQL, but you'd need a whole bunch of additional components to make it all work - and without knowing more about the application, nobody could tell you what that might be. At that scale, none of the commonly used databases would suffice, so MySQL is no better or worse than any of the other options...
Your question is really irrelevant, because creating a product or service that hundreds of millions of customers actually want is a much bigger and more difficult challenge than choosing a database engine.
If you're starting a business from nothing, pick a technical platform you already know and go with it: productivity and quick implementation will be more important than being scalable to a level you may never reach anyway.
If you do eventually become successful enough to have to deal with hundreds of millions of customers, then you'll certainly be able to raise the cash to buy whatever expertise and hardware you need.

SQL (MySQL) vs NoSQL (CouchDB) [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
I am in the middle of designing a highly-scalable application which must store a lot of data. Just for example it will store lots about users and then things like a lot of their messages, comments etc. I have always used MySQL before but now I am minded to try something new like couchdb or similar which is not SQL.
Does anyone have any thoughts or guidance on this?
Here's a quote from a recent blog post from Dare Obasanjo.
SQL databases are like automatic
transmission and NoSQL databases are
like manual transmission. Once you
switch to NoSQL, you become
responsible for a lot of work that the
system takes care of automatically in
a relational database system. Similar
to what happens when you pick manual
over automatic transmission. Secondly,
NoSQL allows you to eke more
performance out of the system by
eliminating a lot of integrity checks
done by relational databases from the
database tier. Again, this is similar
to how you can get more performance
out of your car by driving a manual
transmission versus an automatic
transmission vehicle.
However the most notable similarity is
that just like most of us can’t really
take advantage of the benefits of a
manual transmission vehicle because
the majority of our driving is sitting
in traffic on the way to and from
work, there is a similar harsh reality
in that most sites aren’t at Google or
Facebook’s scale and thus have no need
for a Bigtable or Cassandra.
To which I can add only that switching from MySQL, where you have at least some experience, to CouchDB, where you have no experience, means you will have to deal with a whole new set of problems and learn different concepts and best practices. While by itself this is wonderful (I am playing at home with MongoDB and like it a lot), it will be a cost that you need to calculate when estimating the work for that project, and brings unknown risks while promising unknown benefits. It will be very hard to judge if you can do the project on time and with the quality you want/need to be successful, if it's based on a technology you don't know.
Now, if you have on the team an expert in the NoSQL field, then by all means take a good look at it. But without any expertise on the team, don't jump on NoSQL for a new commercial project.
Update: Just to throw some gasoline in the open fire you started, here are two interesting articles from people on the SQL camp. :-)
I Can't Wait for NoSQL to Die (original article is gone, here's a copy)
Fighting The NoSQL Mindset, Though This Isn't an anti-NoSQL Piece
Update: Well here is an interesting article about NoSQL
Making Sense of NoSQL
Seems like only real solutions today revolve around scaling out or sharding. All modern databases (NoSQLs as well as NewSQLs) support horizontal scaling right out of the box, at the database layer, without the need for the application to have sharding code or something.
Unfortunately enough, for the trusted good-old MySQL, sharding is not provided "out of the box". ScaleBase (disclaimer: I work there) is a maker of a complete scale-out solution an "automatic sharding machine" if you like. ScaleBae analyzes your data and SQL stream, splits the data across DB nodes, and aggregates in runtime – so you won’t have to!
And it's free download.
Don't get me wrong, NoSQLs are great, they're new, new is more choice and choice is always good!! But choosing NoSQL comes with a price, make sure you can pay it...
You can see here some more data about MySQL, NoSQL...: http://www.scalebase.com/extreme-scalability-with-mongodb-and-mysql-part-1-auto-sharding
Hope that helped.
One of the best options is to go for MongoDB(NOSql dB) that supports scalability.Stores large amounts of data nothing but bigdata in the form of documents unlike rows and tables in sql.This is fasters that follows sharding of the data.Uses replicasets to ensure data guarantee that maintains multiple servers having primary db server as the base. Language independent.
Flexible to use