Mysql, SQLite, Scalability - mysql

Could SQLite be an alternative for mysql in high traffic web sites?
Thanks

SQLite usually will work great as the
database engine for low to medium
traffic websites (which is to say,
99.9% of all websites). The amount of web traffic that SQLite can handle
depends, of course, on how heavily the
website uses its database. Generally
speaking, any site that gets fewer
than 100K hits/day should work fine
with SQLite. The 100K hits/day figure
is a conservative estimate, not a hard
upper bound. SQLite has been
demonstrated to work with 10 times
that amount of traffic.
Source: http://www.sqlite.org/whentouse.html

The short answer is: SQLite is embedded database. It is purpose is different than standalone RBDMS. While it is quicker with simple queries than MySQL, keep in mind that SQLite has:
no good networking support (SQLite purpose is different), so replication is PITA
coarse-grained locking (one write at a time)
no advanced table statistics
no sophisticated query optimizer
high memory consumption with large databases (a 100GB database would require about 25MB or RAM before each transaction)
Then if you do not plan to use SQLite over network, database sizes are quite small, queries are rather simple, and you have a lots of reads (and really small number of writes), then SQLite may be a better choice.
About MySQL: optimizing and using MySQL in super high traffic sites is not for faint hearted. I recommend some good reading:
http://oreilly.com/catalog/9780596003067
https://www.packtpub.com/high-availability-mysql-cookbook/book

No way. SQLLite deals terribly with concurrency. The database would be a huge performance bottleneck.

No! It cannot be!

Only if you push your data to a cache and read from the cache. SQLite can be used as persistence for cache, but its really not recommended.

Related

MySQL: How to optimize when there's only one client ever?

MySQL naturally has many features to allow possibly concurrent access by multiple clients, like locking, etc. These cause an overhead which can be taxing for very large tables (right?).
What if I do not need to serve to multiple clients? I have a very large table (~150 million rows) on my machine, and I'm the only one accessing the data for a scientific project. So there is only one active client at a time, and virtually no chance of corruption.
How can I avoid some of the overhead caused by MySQL's protective measures and speed up my queries?
In this particular case, is there an advantage to either of the MyISAM of InnoDb engines?
Best thing you can do is make sure you're database and data-access code is implemented efficiently. There isn't much performance to be gained by stripping away concurrency safeguards but there may be places you are doing something very inefficiently. Have a look at the performance schema to get more insight into this if you are experiencing any performance issues then tune the database the same way you would if there were multiple clients.

Does MySQL consume significantly more resources compared to other DBMS?

There is a growing tendency for shifting from mysql to NOSQL, SQLite, etc. I have read many blogs and articles comparing the speed of mysql with other types of DBMS. However, I believe that speed is not a problem with mysql, as it is really fast; but the problem is more connected with resource usage. It is common to face extreme server load due to mysql slow queries. For instance, an advantage of Oracle over mysql is to have less problem associated with memory leaks.
Is it true that mysql consumes significantly more resources (CPU and memory) comparing with other databases such as SQLite, Non-relational databases, key/value databases. By significantly I mean is it the main reason for not using mysql for large databases (to save server costs).
If YES (to 1), what can be an estimate of better resource usage of a similar system like SQLite comparing with Mysql.
Note: Consider a simple system as advanced features of mysql is not needed. Just comparing the performance for simple queries.
If you're only using "simple" queries, I don't think there's much of a difference regarding ressource usage between MySQL and e.g. Oracle.
Those "professional" DBMS do a lot of "magic" regarding caching, prefetching and data maintanance.
Of course MySQL does that as well, but it might not be as efficient for really complex databases and advanced queries.
Your choise of DBMS highly depends on what you're planning to do, especially if you're choosing between SQL/NoSQL/Key-Value/..., which are for completely different scenarios… that's not so much a question of memory and CPU usage.
CPU and Memory never are the reason, as they are cheap. The problem is with the I/O speed. NoSQL databases are used in write-intensive applications, as well as in applications which need schema-less database (because changing the table schema in MySQL involves rewriting the table, which may be extremely slow). So some trade-offs are made to optimize the disk operations, which often lead to consuming more CPU, memory or disk space.
Another reason could be pessimistic vs optimistic locks. Which is another topic.
But since the answer to the question "Is it true that mysql consumes significantly more resources (CPU and memory) comparing with other databases" is NO, it is pointless to discuss it further :)

Clustering, Sharding or simple Partition / Replication

We have created a Facebook application and it got a lot of virality. The problem is that our database started getting REALLY FULL (some tables have more than 25 million rows now). It got to the point that the app just stopped working because there was a queue of thousands and thousands of writes to be made.
I need to implement a solution for scaling this app QUICKLY but I'm not sure if I should pursue Sharding or Clustering since I'm not sure what are the pro's and con's of each of them and I was thinking of doing a Partition / Replication approach but I think that doesn't help if the load is on the writes?
25 million rows is a completely reasonable size for a well-constructed relational database. Something you should bear in mind, however, is that the more indexes you have (and the more comprehensive they are), the slower your writes will be. Indexes are designed to improve query performance at the expense of write speed. Be sure that you're not over-indexed.
What sort of hardware is powering this database? Do you have enough RAM? It's far easier to change these attributes than it is to try to implement complex RDBMS load balancing techniques, especially if you're under a time crunch.
Clustering/Sharding/Partitioning comes when single node has reached to the point where its hardware cannot bear the load. But your hardware has still room to expand.
This is the first lesson I learnt when I started being hit by such issues
Well, to understand that, you need to understand how MySQL handles clustering. There are 2 main ways to do it. You can either do Master-Master replication, or NDB (Network Database) clustering.
Master-Master replication won't help with write loads, since both masters need to replay every single write issued (so you're not gaining anything).
NDB clustering will work very well for you if and only if you are doing mostly primary key lookups (since only with PK lookups can NDB operate more efficient than a regular master-master setup). All data is automatically partitioned among many servers. Like I said, I would only consider this if the vast majority of your queries are nothing more than PK lookups.
So that leaves two more options. Sharding and moving away from MySQL.
Sharding is a good option for handling a situation like this. However, to take full advantage of sharding, the application needs to be fully aware of it. So you would need to go back and rewrite all the database accessing code to pick the right server to talk to for each query. And depending on how your system is currently setup, it may not be possible to effectively shard...
But another option which I think may suit your needs best is switching away from MySQL. Since you're going to need to rewrite your DB access code anyway, it shouldn't be too hard to switch to a NoSQL database (again, depending on your current setup). There are tons of NoSQL servers out there, but I like MongoDB. It should be able to withstand your write load without worry. Just beware that you really need a 64 bit server to use it properly (with your data volume).
Replication is for data backup not for performance so its out of question.
Well, 8GB RAM is still not that much you can have many hundred GB RAM with quite big hard disk space and MySQL would still work for you.
Clustering/Sharding/Partitioning comes when single node has reached to the point where its hardware cannot bear the load. But your hardware has still room to expand.
If you don't want to upgrade your hardware then you need to give more information about database design and if there are lot of joins or not so that above named options can be considered deeply.

Maximum capabilities of MySQL

How do I know when a project is just to big for MySQL and I should use something with a better reputation for scalability?
Is there a max database size for MySQL before degradation of performance occurs? What factors contribute to MySQL not being a viable option compared to a commercial DBMS like Oracle or SQL Server?
Google uses MySQL. Is your project bigger than Google?
Smart-alec comments aside, MySQL is a professional level database application. If your application puts a strain on MySQL, I bet it'll do the same to just about any other database.
If you are looking for a couple of examples:
Facebook moved to Cassandra only after it was storing over 7 Terabytes of inbox data. (Source: Lakshman, Malik: Cassandra - A Decentralized Structured Storage System.) (... Even though they were having quite a few issues at that stage.)
Wikipedia also handles hundreds of Gigabytes of text data in MySQL.
I work for a very large Internet company. MySQL can scale very, very large with very good performance, with a couple of caveats.
One problem you might run into is that an index greater than 4 gigabytes can't go into memory. I spent a lot of time once trying to improve the MySQL's full-text performance by fiddling with some index parameters, but you can't get around the fundamental problem that if your query hits disk for an index, it gets slow.
You might find some helper applications that can help solve your problem. For the full-text problem, there is Sphinx: http://www.sphinxsearch.com/
Jeremy Zawodny, who now works at Craig's List, has a blog on which he occasionally discusses the performance of large databases: http://blog.zawodny.com/
In summary, your project probably isn't too big for MySQL. It may be too big for some of the ways that you've used MySQL before, and you may need to adapt them.
Mostly it is table size.
I am assuming here that you will use the Oracle innoDB plugin for mysql as your engine. If you do not, that probably means you're using a commercial engine such as infiniDB, InfoBright for Tokutek, in which case your questions should be sent to them.
InnoDB gets a bit nasty with very large tables. You are advised to partition your tables if at all possible with very large instances. Essentially, if your (frequently used) indexes don't all fit into ram, inserts will be very slow as they need to touch a lot of pages not in ram. This cannot be worked around.
You can use the MySQL 5.1 partitioning feature if it does what you want, or partition your tables at the application level if it does not. If you can get your tables' indexes to fit in ram, and only load one table at a time, then you're on a winner.
You can use the plugin's compression to make your ram go a bit further (as the pages are compressed in ram as well as on disc) but it cannot beat the fundamental limtation.
If your table's indexes don't all (or at least MOSTLY - if you have a few indexes which are NULL in 99.99% of cases you might get away without those ones) fit in ram, insert speed will suck.
Database size is not a major issue, provided your tables individually fit in ram while you're doing bulk loading (and of course, you only load one at once).
These limitations really happen with most row-based databases. If you need more, consider a column database.
Infobright and Infinidb both use a mysql-based core and are column based engines which can handle very large tables.
Tokutek is quite interesting too - you may want to contact them for an evaluation.
When you evaluate the engine's suitability, be sure to load it with very large data on production-grade hardware. There's no point in testing it with a (e.g.) 10G database, that won't prove anything.
MySQL is a commercial DBMS, you just have the option to get the support/monitoring that is offered by Oracle or Microsoft. Or you can use community support or community provided monitoring software.
Things you should look at are not only size at operations. Critical are also:
Scenaros for backup and restore?
Maintenance. Example: SQL Server Enterprise can rebuild an index WHILE THE OLD ONE IS AVAILABLE - transparently. This means no downtime for an index rebuild.
Availability (basically you do not want to have to restoer a 5000gb database if a server dies) - mirroring preferred, replication "sucks" (technically).
Whatever you go for, be carefull with Oracle RAC (their cluster) - it is known to be "problematic" (to say it finely). SQL Server is known to be a lot cheaper, scale a lot worse (no "RAC" option) but basically work without making admins want to commit suicide every hour (the "RAC" option seems to do that). Scalability "a lot worse" still is good enough for the Terra Server (http://msdn.microsoft.com/en-us/library/aa226316(SQL.70).aspx)
THere wer some questions here recently of people having problems rebuilding indices on a 10gb database or something.
So much for my 2 cents. I am sure some MySQL specialists will jump in on issues there.

How to scale MySQL with multiple machines?

I have a web app running LAMP. We recently have an increase in load and is now looking at solutions to scale. Scaling apache is pretty easy we are just going to have multiple multiple machines hosting it and round robin the incoming traffic.
However, each instance of apache will talk with MySQL and eventually MySQL will be overloaded. How to scale MySQL across multiple machines in this setup? I have already looked at this but specifically we need the updates from the DB available immediately so I don't think replication is a good strategy here? Also hopefully this can be done with minimal code change.
PS. We have around a 1:1 read-write ratio.
There're only two strategies: replication and sharding. Replication comes often in place when you have less write and much read traffic, so you can redirect the reads to many slaves, with the pitfall of lots of replication traffic with the time and a probability for inconsitency.
With sharding you shard your database tables across multiple machines (called functional sharding), which makes especially joins much harder. If this doenst fit anymore you also need to shard you rows across multiple machines, but this is no fun and depends a sharding layer implemented between you application and the database.
Document oriented databases or column stores do this work for you, but they are currently optimized for OLAP not for OLTP.
Depends on the application backend (i.e. how the PKs, transactions and insert IDs are handled), you might consider MASTER-MASTER replication with different auto_increment setups. This can be tricky and needs to be thoroughly tested but it can work.
Also, in new MySQL 5.6 there is a GTID (Global Transaction Identifier) that generally helps a lot in keeping the replication in sync, especially in this scenario.
You should take a look at MySQL Performance Blog. Maybe you'll find something useful.
Well... good luck scaling all those writes to a real large scale. The database engine becomes the bottleneck, too many locks and buffers mgmt and stuff...
The only way I found that really works is scale out, sharding, unfortunately sharding is not provided for MySQL "out of the box" (like in some NoSQLs such as Mongo). ScaleBase (disclaimer: I work there) is a maker of a complete scale-out solution an "automatic sharding machine" if you like. ScaleBae analyzes your data and SQL stream, splits the data across DB nodes, route commands and aggregates results in runtime – so you won’t have to!