Though this may be a very abstract question, please point me in the right direction.
DB design and replication configuration for a Twitter-like webapp (heavy inserts & reads).
For very high loads, you might consider NoSQL databases. This solution works well when you mostly need to read data and your data logic is not too complex. When properly configured, NoSQL solutions can be several times faster than relational databases.
If you want to go with MySQL, this question is too abstract. There are tons of things you need to think about:
proper table structure
proper indexing
caching
normalization and denormalization
your queries
clustering
Google all of these to understand why they matter. If you are serious about getting the best performance out of MySQL, I really recommend "High Performance MySQL"; this book is terrific.
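To make "table structure", "indexing" and "denormalization" a bit more concrete for a Twitter-like app, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for MySQL. The tables `follows`, `tweets` and `timeline` and the fan-out-on-write approach are illustrative assumptions, not a prescribed design: each new tweet is copied into every follower's timeline at write time, so reading a timeline is one indexed lookup instead of a join.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER,
                      PRIMARY KEY (follower_id, followee_id));
CREATE INDEX idx_follows_followee ON follows (followee_id);
CREATE TABLE tweets (id INTEGER PRIMARY KEY, author_id INTEGER, body TEXT);
-- Denormalized read model: one row per (reader, tweet), written at post time.
CREATE TABLE timeline (user_id INTEGER, tweet_id INTEGER, author_id INTEGER,
                       body TEXT, PRIMARY KEY (user_id, tweet_id));
""")

def follow(follower_id, followee_id):
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO follows VALUES (?, ?)", (follower_id, followee_id))

def post_tweet(author_id, body):
    """Store the tweet and fan it out to every follower's timeline in one transaction."""
    with conn:
        cur = conn.execute("INSERT INTO tweets (author_id, body) VALUES (?, ?)",
                           (author_id, body))
        tweet_id = cur.lastrowid
        conn.execute(
            """INSERT INTO timeline (user_id, tweet_id, author_id, body)
               SELECT follower_id, ?, ?, ? FROM follows WHERE followee_id = ?""",
            (tweet_id, author_id, body, author_id))
    return tweet_id

def read_timeline(user_id, limit=20):
    """Read path: a primary-key range scan on timeline, no join needed."""
    return conn.execute(
        """SELECT author_id, body FROM timeline WHERE user_id = ?
           ORDER BY tweet_id DESC LIMIT ?""", (user_id, limit)).fetchall()

follow(2, 1)
follow(3, 1)
post_tweet(1, "hello")
post_tweet(1, "second")
latest = read_timeline(2)  # [(1, 'second'), (1, 'hello')]
```

The trade-off is write amplification: one tweet from an account with many followers becomes many timeline rows, which is why this kind of design is often combined with reading on demand for very popular accounts.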
By NoSQL databases I mean something like MongoDB or DynamoDB.
I've been trying to find out why NoSQL DBs are usually better at horizontal scaling than relational DBs, and how to choose between them.
I have looked at many videos and posts about "SQL vs NoSQL". Most of them end up talking about "Normalization vs Denormalization".
Here are some questions I am still confused about.
1.
Many people say that relational DBs have to follow ACID, so they are bad at horizontal scaling. But ACID is about transactions, and we can always choose not to use any transactions, right? I know not many people do this, but if we denormalized tables enough, wouldn't it be like a NoSQL DB, where we almost never use transactions? And many NoSQL DBs now have transactions too.
2.
I know denormalization is probably good for horizontal scaling, because if data is spread across many nodes (machines), it'll be hard to do table joins (or transactions). But, as with transactions, we can choose not to use any table joins.
The only difference I can think of is that NoSQL DBs are schema-free, so it is easier to add new fields (columns) than in an RDB.
What I am trying to ask is:
Why is a "denormalized NoSQL DB" better than a "denormalized relational DB"?
Why is a "normalized NoSQL DB" worse than a "normalized relational DB"?
What actually prevents a relational database from being denormalized?
I've read this post
https://softwareengineering.stackexchange.com/questions/194340/why-are-nosql-databases-more-scalable-than-sql
It says:
"The SQL API lacks a mechanism to describe queries where ACID's requirements are relaxed. This is why the BASE databases are all NoSQL."
Could anyone give me an example of this?
Sorry for not being specific.
By NoSQL databases I mean something like MongoDB.
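On the question of what prevents a relational database from being denormalized: at the data-layout level, nothing does. The sketch below stores the same hypothetical message-board thread twice in Python's built-in sqlite3, once normalized (two tables joined at read time) and once denormalized (the whole thread as one JSON document in a TEXT column, read by key). The table names are assumptions for illustration, and the sketch says nothing about how either kind of engine scales across machines.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: thread and posts in separate tables, joined on read.
CREATE TABLE threads (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY,
                    thread_id INTEGER REFERENCES threads(id),
                    author TEXT, body TEXT);
-- Denormalized: the whole thread as one JSON document, fetched by key.
CREATE TABLE thread_docs (id INTEGER PRIMARY KEY, doc TEXT);
""")

thread = {"title": "SQL vs NoSQL",
          "posts": [{"author": "alice", "body": "Which one?"},
                    {"author": "bob", "body": "It depends."}]}

with conn:
    conn.execute("INSERT INTO threads (id, title) VALUES (1, ?)", (thread["title"],))
    conn.executemany("INSERT INTO posts (thread_id, author, body) VALUES (1, ?, ?)",
                     [(p["author"], p["body"]) for p in thread["posts"]])
    conn.execute("INSERT INTO thread_docs (id, doc) VALUES (1, ?)", (json.dumps(thread),))

def read_normalized(thread_id):
    """Rebuild the thread with a join (assumes the thread has at least one post)."""
    rows = conn.execute("""SELECT t.title, p.author, p.body
                           FROM threads t JOIN posts p ON p.thread_id = t.id
                           WHERE t.id = ? ORDER BY p.id""", (thread_id,)).fetchall()
    return {"title": rows[0][0],
            "posts": [{"author": a, "body": b} for _, a, b in rows]}

def read_denormalized(thread_id):
    """Fetch the pre-assembled document with a single key lookup."""
    (doc,) = conn.execute("SELECT doc FROM thread_docs WHERE id = ?",
                          (thread_id,)).fetchone()
    return json.loads(doc)

same = read_normalized(1) == read_denormalized(1) == thread  # True
```

In either kind of database, the denormalized copy has to be kept in sync by the application whenever a post changes, which is the usual price of denormalization.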
A blog like https://neo4j.com/blog/acid-vs-base-consistency-models-explained/ explains BASE this way:
Basic Availability
The database appears to work most of the time.
Soft-state
Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
Eventual consistency
Stores exhibit consistency at some later point (e.g., lazily at read time).
This level of equivocation doesn't sound very reliable, does it? They trade off availability and consistency to gain performance and scalability.
This is fine if you're running a service that is tolerant of mismatched data or stale data, or which is okay with some minor amount of data loss once in a while. If those issues are an uncommon occurrence, but you get superior performance nearly all the time, it's very attractive. And more importantly, it demos well.
But if you have to run a service with strict requirements for data integrity, it's no good. If losing even one record of data gets you in trouble with auditors, or if you can't reliably read data you just committed a moment before because that commit takes time to propagate to all nodes of your cluster, it could be a deal-breaker.
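The "can't reliably read data you just committed" problem can be shown with a toy model. This is a deliberately simplified, single-process sketch and not how any particular database works: writes land on a primary and reach a replica only when `replicate()` is called, standing in for asynchronous replication lag. The hypothetical `from_replica` flag plays the role of the per-request consistency choice that some NoSQL stores expose (Cassandra's per-query consistency levels, for example).

```python
from collections import deque

class ToyReplicatedStore:
    """A primary plus one lazily updated replica (illustrative only)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # replication log not yet applied to the replica

    def write(self, key, value):
        # Acknowledged as soon as the primary has it; the replica catches up later.
        self.primary[key] = value
        self.pending.append((key, value))

    def read(self, key, from_replica=True):
        # Replica reads are cheaper to scale out but may be stale.
        store = self.replica if from_replica else self.primary
        return store.get(key)

    def replicate(self):
        # Apply the backlog in order; afterwards the replica has converged.
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value

store = ToyReplicatedStore()
store.write("balance:42", 100)
stale = store.read("balance:42")                      # None: not replicated yet
fresh = store.read("balance:42", from_replica=False)  # 100
store.replicate()
converged = store.read("balance:42")                  # 100
```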
So which data store to choose depends on the requirements of your app. Only you can judge if the relaxed availability and consistency of a BASE data store is sufficient for the needs of your app.
NoSQL is a term that covers lots of types of storage/query engines e.g. document stores, Graph Databases, etc. - basically anything that looks something like a database but doesn’t use the standard tables/rows/columns structure that a SQL database does.
NoSQL databases were developed to support use cases that relational databases don’t handle well - so while you might be able to use either a SQL or a NoSQL database in any given scenario, the choice between the two is normally a no-brainer; they would very rarely both be viable options.
Just to clarify, your questions about types of DB being better or worse are meaningless without context. Without knowing precisely what your requirements are, it’s impossible to say whether a NoSQL DB is better or worse than a SQL one - and that’s before you start looking at specific products in each category.
Also, that post you reference is about 8 years old and much of its information is out of date, as one of the contributors acknowledges in an update made in 2019.
My understanding is that MySQL is intended to have tables with millions of rows. I am looking for a database system designed to have millions of relational tables. Am I correct in my understanding that the way MySQL queries data makes it inefficient for that sort of implementation? It is for a long-term, user-driven project, so extensibility is a must.
Thanks!
EDIT:
Due to the immediate negative reaction, I'll explain myself. "Millions" of tables would be an issue if the project lived long enough to accumulate a strong user base. It would implement an edit system similar to the one on Stack Overflow; I considered a variety of solutions and decided the one I liked best used a relational table for each offshoot of edits. I assumed there was some database framework designed for that sort of thing. Is this really considered "bad" architecture? Why is it not just an unusual type of architecture? What is "wrong" with doing it that way?
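For comparison, a branching edit history is commonly modeled with a single revisions table instead of one table per offshoot: each row records its post and the revision it was derived from, so a new offshoot is just another row with the same parent. The sketch below uses Python's built-in sqlite3; the `revisions` table and its columns are hypothetical, and this is one possible design, not a description of how Stack Overflow stores edits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE revisions (
    id        INTEGER PRIMARY KEY,
    post_id   INTEGER NOT NULL,
    parent_id INTEGER REFERENCES revisions(id),  -- NULL for the original text
    body      TEXT NOT NULL
);
CREATE INDEX idx_revisions_post ON revisions (post_id);
CREATE INDEX idx_revisions_parent ON revisions (parent_id);
""")

def add_revision(post_id, body, parent_id=None):
    with conn:
        cur = conn.execute(
            "INSERT INTO revisions (post_id, parent_id, body) VALUES (?, ?, ?)",
            (post_id, parent_id, body))
    return cur.lastrowid

def history(revision_id):
    """Walk from a revision back to the original with a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE chain(id, parent_id, body, depth) AS (
            SELECT id, parent_id, body, 0 FROM revisions WHERE id = ?
            UNION ALL
            SELECT r.id, r.parent_id, r.body, c.depth + 1
            FROM revisions r JOIN chain c ON r.id = c.parent_id
        )
        SELECT body FROM chain ORDER BY depth""", (revision_id,)).fetchall()
    return [body for (body,) in rows]

root = add_revision(1, "v1")
a = add_revision(1, "v2 (branch A)", parent_id=root)
b = add_revision(1, "v2 (branch B)", parent_id=root)  # second offshoot, same table
a2 = add_revision(1, "v3 (branch A)", parent_id=a)
```

Here `history(a2)` yields the chain from the newest branch-A text back to "v1", and the number of offshoots of any revision is a simple indexed `COUNT(*)` on `parent_id`; the schema never changes as users add edits.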
You could always look towards a NoSql DB:
From: http://nosql-database.org/
"NoSQL DEFINITION: Next Generation Databases mostly addressing some of
the points: being non-relational, distributed, open-source and
horizontally scalable."
Edit: Scalable is what I was shooting for.
Suggestion:
http://www.mongodb.org/
Edit: Interesting idea about data versioning:
Ways to implement data versioning in MongoDB
I'm in the process of designing and planning a new website.
It is mainly a message board site.
I have past experience with MySQL, but I hear many voices (not in my head) saying that NoSQL can be as good a solution as an RDBMS.
The main claim for NoSQL is performance. What do you think about it?
So, I need a scalable database design technology for my website.
If I go with NoSQL, I know there are a couple of technologies in this area (document store, key-value store, etc.). How do I choose?
What do you think is more suitable for a message board website: NoSQL or MySQL?
Thanks,
socksocket
Both SQL and no-SQL can be used for your purpose. The two main reasons to go with no-SQL are if you really have a lot of traffic (and your SQL solution is not coping performance-wise) and if you have a lot of unstructured, changing data that benefits from being schema-less.
Personally I believe a significant factor for you to consider is maintainability.
If you build anything using no-SQL, the pool of people able to maintain it will be less than 10% of what it would be with SQL.
It is common for programmers to want to use the 'best' solution technically but not factor in the maintainability and costs aspects, especially when the solution is considered 'simple' by them.
For your purposes, I think NoSQL is probably a better choice than MySQL. You should check out something like MongoDB or CouchDB; both are open-source, scalable NoSQL DBs (and, as already mentioned, there are other NoSQL DBs and file storage systems commercially available).
Basically, message boards do not really need a relational DBMS. In a relational DBMS, query processing is slower than in a NoSQL DB, and message boards can have a high volume of traffic as well as data that does not necessarily have a fixed schema. The flexibility of NoSQL with regard to data structure makes it easy to implement sharding, partitioning, indexing and other techniques.
Although performance is one of the key elements, it is not a feature of NoSQL as such but rather a consequence of its design. What I think is THE feature is the flexibility of its data structure and the possibility of storing information in a single record, avoiding multiple round trips when you work with closely related records (take a look at this post http://djondb.com/blog to get a better understanding of what I'm talking about).
For any website that needs to change its model on a daily basis, it's wise to choose a DB that can keep up with that flexibility.
I'm a little biased because I'm the author of a NoSQL document store, but I suggest you give a NoSQL document store a try; you'll be surprised at how fast you can build solutions with that kind of easy-to-store approach.
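The "change its model on a daily basis" argument can be sketched in a few lines. In a document store, documents written before and after a model change simply coexist, so the cost moves into application code that must handle every shape the data has ever had; in a relational database the same change is one explicit migration. The JSON strings below stand in for stored documents, sqlite3 stands in for MySQL, and all field names are hypothetical.

```python
import json
import sqlite3

# Document style: "tags" was added to the model later; older documents lack it.
stored = [
    json.dumps({"id": 1, "author": "alice", "body": "First post"}),
    json.dumps({"id": 2, "author": "bob", "body": "Reply", "tags": ["nosql"]}),
]

def load_post(raw):
    """Parse a stored post, filling in fields that older documents do not have."""
    doc = json.loads(raw)
    doc.setdefault("tags", [])
    return doc

posts = [load_post(raw) for raw in stored]

# Relational style: one migration, after which every existing row has the column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, body TEXT)")
conn.execute("INSERT INTO posts VALUES (1, 'alice', 'First post')")
conn.execute("ALTER TABLE posts ADD COLUMN tags TEXT NOT NULL DEFAULT '[]'")
old_row_tags = json.loads(
    conn.execute("SELECT tags FROM posts WHERE id = 1").fetchone()[0])
```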
Have you looked at Redis (http://redis.io/) ?
You can model almost everything you have in your RDBMS with Redis. In most cases you will get 10x performance, and it is supported by a great and very active community.
I suggest that you describe your needs in detail in the Redis forum; you will probably get the most honest and professional responses, and some of them may suggest that you use other NoSQL technologies for different parts of your architecture.
Simple question: could I conceivably use Redis instead of MySQL for all sorts of web applications, such as social networks, geo-location services, etc.?
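Modeling relational data in a key-value store like Redis mostly comes down to key naming conventions and maintaining your own secondary structures. So that it runs without a Redis server, the sketch below uses a tiny in-memory class whose methods are named after the Redis commands SET, GET, SADD and SMEMBERS; it is a hypothetical stand-in, not a Redis client, and key names like `user:1:followers` follow a common convention rather than a requirement.

```python
class TinyKV:
    """In-memory stand-in for a few Redis commands (not a Redis client)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def sadd(self, key, *members):
        s = self.data.setdefault(key, set())
        before = len(s)
        s.update(members)
        return len(s) - before  # number of newly added members, like SADD

    def smembers(self, key):
        return set(self.data.get(key, set()))

kv = TinyKV()
kv.set("user:1:name", "alice")
kv.set("user:2:name", "bob")
# With no JOIN, a "follows" relation is stored once per direction.
kv.sadd("user:2:following", "1")
kv.sadd("user:1:followers", "2")

# "Join" in application code: look up each follower's name by key.
follower_names = sorted(kv.get(f"user:{uid}:name")
                        for uid in kv.smembers("user:1:followers"))
```

Keeping both directions in step is the application's job; with real Redis, MULTI/EXEC can group the two writes.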
Nothing is impossible in IT. But some things might get extremely complicated.
Using key-value storage for things like full-text search might be extremely painful.
Also, as far as I can see, it lacks support for large, clustered databases: on MySQL you have no problems if your database grows beyond hundreds of GB, but on Redis... well, it will require more effort :-)
So use it for what it was developed for: storing simple things that just need to be retrieved by ID.
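Why full-text search is painful on a plain key-value store: the store can only fetch by key, so finding documents by a word means either scanning every value or maintaining your own inverted index (word to set of IDs) and keeping it correct on every write. A minimal sketch of the second option, with plain Python dicts as a hypothetical stand-in for the store:

```python
import re
from collections import defaultdict

docs = {}                 # "doc:<id>" -> text, the primary key-value data
index = defaultdict(set)  # word -> set of doc ids, maintained by hand

def tokenize(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def put(doc_id, text):
    # Un-index the old text first, or searches will return stale matches.
    old = docs.get(f"doc:{doc_id}")
    if old is not None:
        for word in tokenize(old):
            index[word].discard(doc_id)
    docs[f"doc:{doc_id}"] = text
    for word in tokenize(text):
        index[word].add(doc_id)

def search(*words):
    """Return ids of documents containing all of the given words."""
    sets = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()

put(1, "Redis is a key-value store")
put(2, "MySQL has full-text indexes")
put(1, "Redis is an in-memory store")  # update: "key" and "value" must be un-indexed
```

Even this ignores stemming, ranking and phrase queries, which a real full-text index handles for you.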
ACID compliance is a must, if data integrity is important. Medical records and financial transactions would be an example. Most of the NoSQL solutions, including Redis, are fast because they trade ACID properties for speed.
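What ACID buys for something like a financial transaction is easy to demonstrate with Python's built-in sqlite3, standing in for any ACID-compliant database (the `accounts` table is hypothetical). A transfer is two updates; if the second one fails, the first is rolled back, so money is never created or destroyed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])

def transfer(src, dst, amount):
    """Move money atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back on exception
            # Credit first on purpose, so a failed debit shows the rollback.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK constraint rejected the overdraft
        return False

def balances():
    return dict(conn.execute("SELECT id, balance FROM accounts"))

ok = transfer("a", "b", 30)        # True: a=70, b=30
failed = transfer("a", "b", 500)   # False: the credit to b was rolled back
```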
Sometimes data is simply more convenient to represent using a relational database and the queries are simpler.
Also, thanks to foreign-key relationships and constraints in relational databases, your data is more likely to be correct. Keeping data in sync is more difficult in NoSQL solutions.
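A minimal illustration of the foreign-key point, again with sqlite3 (which only enforces foreign keys after `PRAGMA foreign_keys = ON`; the tables are hypothetical): the database itself refuses a post that points at a non-existent user, a check that would otherwise have to live in application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE posts (id INTEGER PRIMARY KEY,
                    user_id INTEGER NOT NULL REFERENCES users(id),
                    body TEXT);
""")
with conn:
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")

def try_insert_post(user_id, body):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute("INSERT INTO posts (user_id, body) VALUES (?, ?)",
                         (user_id, body))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert_post(1, "hello")         # True: user 1 exists
orphan_rejected = not try_insert_post(99, "?")  # True: no user 99
```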
So, no I don't think we can talk about full replacement. They are different tools for different jobs. I wouldn't trade my hammer for a screwdriver.
I've been using MySQL (with InnoDB, on Amazon RDS) because it's the sort of universal default, but it has been ridiculously under-performing, and tweaking it only delays the inevitable.
The data is mostly relatively short blobs (<1 kB each) of information about hundreds of millions of URLs. There is (or should be; MySQL cannot seem to handle it) a very high volume of inserts, updates and retrievals but few complex queries. That is not because complex queries wouldn't be useful, but because MySQL is so slow that it's far faster to get the data out, process it locally, and cache the results somewhere.
I can keep tweaking mysql and throwing more hardware at it, but it seems increasingly futile.
So what are the options? SQL/relational model/etc. optional - anything will do as long as it's fast, networked, and language-independent.
Have you done any end-to-end profiling of your application and MySQL database? To give better advice, it would also help to know what improvements you have tried and what your database structure is. You haven't said much about how your MySQL database is configured either; it provides a lot of tuning options.
You should pick up a copy of High Performance MySQL if you haven't already to learn more about the product.
There is no point in doing anything until you know what your problem is. NoSQL solutions can offer performance benefits but you have provided little evidence that MySQL is incapable of servicing your needs.
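Finding out what the problem is often starts with asking the database how it executes a slow query. In MySQL that is `EXPLAIN SELECT ...`; to stay self-contained, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` instead (its output format differs from MySQL's) on a hypothetical `urls` table, and shows a lookup going from a full table scan to an index search once a suitable index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (id INTEGER PRIMARY KEY, host TEXT, data BLOB)")
conn.executemany("INSERT INTO urls (host, data) VALUES (?, ?)",
                 [(f"host{i % 100}.example", b"x" * 10) for i in range(1000)])

QUERY = "SELECT id FROM urls WHERE host = ?"

def plan(sql, params):
    """Return SQLite's human-readable plan lines (the 4th column of each row)."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

before = plan(QUERY, ("host7.example",))  # a SCAN line: every row is examined
conn.execute("CREATE INDEX idx_urls_host ON urls (host)")
after = plan(QUERY, ("host7.example",))   # a SEARCH line using idx_urls_host
```

The exact wording of the plan lines varies between SQLite versions; what matters is whether the query scans the whole table or searches an index.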
Well, "fast, networked and language-independent" plus "few complex queries" brings to mind the various NoSQL solutions. To name a few:
MongoDB
CouchDB
Cassandra
And if that's not fast enough, there is always the wicked-fast Redis, which is my personal favorite at the moment :) It is not a database per se, but it's good enough for most scenarios.
I am sure other people can list more NoSQL databases...
And there is always http://nosql-database.org/.
Generally speaking, databases in this category are better and faster in your scenario because they have relaxed constraints, which makes frequent inserts, updates and retrievals easier and faster. But that requires you to think harder about your data model, and it is generally not possible to run SQL-style complex queries directly; instead, you'll write more pre-computed data or use a more denormalized design to make up for the lack of complex queries.
But since complex queries are a minor concern in your case, I think NoSQL solutions are ideal for you.
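The "write more pre-computed data" idea can be sketched in a few lines: instead of running an aggregate such as `SELECT host, COUNT(*) ... GROUP BY host` at read time, the writer updates a running counter on every insert or delete, so the read becomes a single key lookup. Plain Python dicts stand in for whatever store is used, and the names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

records = {}          # url -> small blob, the primary data
per_host = Counter()  # host -> number of stored urls, maintained on write

def upsert(url, blob):
    if url not in records:  # count new urls only, not updates
        per_host[urlsplit(url).hostname] += 1
    records[url] = blob

def delete(url):
    if records.pop(url, None) is not None:
        per_host[urlsplit(url).hostname] -= 1

upsert("http://a.example/1", b"...")
upsert("http://a.example/2", b"...")
upsert("http://a.example/1", b"updated")  # an update leaves the counter alone
upsert("http://b.example/", b"...")
delete("http://b.example/")
```

Reads of `per_host` are now trivial, but every write path has to remember to maintain it; a bug in one of them silently corrupts the aggregate, which is the price of trading queries for pre-computation.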
With the information you've given about your application's data and workload, it is almost impossible to determine whether the problem really is MySQL itself or something else. You seem to assume that you can throw any workload at a relational engine and it should handle it. Therefore, in my opinion, the other commenters' suggestions to analyze the performance more carefully are valid. Without more data (transactions per second, etc.), any further analysis of other suitable engines is also futile.
I'm not sure I agree with the advice to jump ship on traditional databases. It might not be the most efficient tool, but it is FAR more widely understood and used, and I strongly doubt you have a problem that can't be handled by an efficiently set-up relational database.
The obvious alternatives are Oracle, SQL Server, etc., but it might just be that your database structure isn't right. I don't know much about MySQL, but I do know it's used in some pretty big projects (eBay being noteworthy).