Database and DB engine for a social website - MySQL

For a heavy user-content website with user profiles, live feeds, photo/video/content sharing, etc., what DB should I use, and which storage engine? Of course Oracle and Microsoft SQL Server are out because they are not free (or cheap). I am using MySQL with MyISAM, but that will run into performance issues on a social site. Even using InnoDB may not help performance. So firstly, which DB is better to use, and which engine, starting from free up to paid?
My main concern is live-feed performance, plus there is a lot of data tracking and mining for analytics.
Since this is a startup, hardware costs are limited, so there is no hardware farm here to support performance; hence I am seeking a decent alternative for the first two years at least.

If you are a start-up, you may be able to afford Microsoft's stuff after all. Check out their BizSpark program, which gives you software for free for a few years.
I am not sure I agree that MySQL is not your best option, though. Doesn't Facebook use MySQL? Are you expecting to be bigger?
If you think that InnoDB will be too slow for you, there are other storage engine options on MySQL. For example, have you investigated TokuDB?
Have you considered a hosting service like Linode.com instead of your own hardware? This might be a better fit for your cashflow profile. If you really feel the DB load is onerous, you could have dedicated or multiple servers. Start with a dedicated DB box and scale up from there. You could also go with a cloud service like Amazon EC2.
EDIT: I just realized that Tokutek have a social networking case study:
http://tokutek.com/customers/a-social-networking-case-study/
It sounds to me like you are unlikely to outgrow MySQL any time soon. If you grow busy enough to need anything like TokuDB you can thank me with a few shares. :-)

If your userbase is large enough to require a high-performance DB, your bandwidth and storage charges will not be cheap; why skimp on the database?
MS SQL Server has free and/or cheap editions (such as Express) that are performance-limited. I always start with them, and if the site takes off, I upgrade. Oracle probably offers something similar.
Edit: I should have said at the start that you're unlikely to outgrow MySQL anytime soon, and that this whole question might be a case of premature optimization.

I would recommend getting a cloud account for the files and potentially a dedicated server for the DB (MySQL with MyISAM).
You can keep costs down by storing the files in the cloud, while boosting performance with a dedicated DB server that has more processing power and memory.

Related

Tuning MySQL Database

I have a MySQL database running on a dedicated Ubuntu server with 2GB RAM and a 500GB hard drive. I would appreciate any help on fine-tuning the database to increase performance. Enhancements should improve the CRUD operations of the database, including the performance of procedure calls and scheduled events.
I have searched the web and found various mechanisms and tools for the job on various websites. But I need to know the proper way of improving the performance (e.g., the execution time of an SQL query) of a MySQL database itself, without using any 3rd-party tools or software. My database configuration is listed below.
MySQL version: 5.5
Used storage engine: MyISAM
Operating system: Ubuntu 12
Hard disk capacity: 500GB
RAM: 2GB
Other: The database consists of Tables, Indexes, Stored Procedures, Scheduled Events and Views
You have said nothing about the specifics of your data, its distribution, the type of workload you use, the ratio of reads to writes, the variety of your queries, the complexity of your queries, and so on. This is a vital part of the tuning process for one simple reason:
Tuning is specific to your data and your workload.
The guys who make database platforms such as MySQL pay a lot of attention to making sure the default settings are good enough for the majority of users. If there was some easy route to improving the performance of a database, they'd already have done it at the factory.
The guys who make the third party tools, on the other hand, write code that reads your data and your logs to find out information about your tables, their contents, and your queries, and that code makes best-guess estimates about tuning based on your data and your workload. They're not perfect, but they sure beat having to do that stuff manually if you don't know how to.
Think of tuning a database like tuning a guitar: You start with an idea of what you want (Standard tuning? Drop D? DADGAD?) and then you make small adjustments to one string at a time, measuring it against your desired result. Once you've achieved the best possible result for that string, you move onto the next one and make small changes there etc. When you get to the final string, you might have adjusted the balance of the whole guitar so you might have to revisit the settings from the beginning to make tiny incremental changes until the whole lot is singing perfectly.
Read http://dev.mysql.com/doc/refman/5.5/en/server-parameters.html to get started on the most important "strings" to tune in MySQL 5.5. There are lots, but none of them are particularly difficult on their own.
As a tangent, tuning your server away from the defaults might give you a 5-10% boost in performance. You'd be much better off spending your time looking at your database design, data types, and the indexes you're using; you can often make 50-100% improvements in performance that way.
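For example, here is the kind of index-level fix that dwarfs parameter tweaks (a minimal sketch; the orders table and its columns are hypothetical):
-- Check how MySQL executes a suspect query
EXPLAIN SELECT * FROM orders WHERE customer_id = 42 AND status = 'shipped';
-- If the plan shows type: ALL (a full table scan), add a composite index
ALTER TABLE orders ADD INDEX idx_customer_status (customer_id, status);
-- Re-run the EXPLAIN; the plan should now show type: ref using the new index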
You should find http://www.mysqlcalculator.com/ helpful for starters. It shows you some critical general defaults and lets you enter your own values, as displayed by
SHOW GLOBAL VARIABLES
to calculate MySQL's maximum memory usage. This will only scratch the surface, but it will be enlightening. There is NO simple answer.
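For reference, the rule-of-thumb formula behind such calculators is, roughly, the global buffers plus the per-connection buffers multiplied by max_connections (MySQL 5.5 variable names; treat the result as a ceiling, not an exact figure):
max_memory ≈ key_buffer_size
           + innodb_buffer_pool_size
           + innodb_log_buffer_size
           + query_cache_size
           + max_connections * (read_buffer_size + read_rnd_buffer_size
                                + sort_buffer_size + join_buffer_size
                                + binlog_cache_size + thread_stack)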

Maximum capabilities of MySQL

How do I know when a project is just too big for MySQL, so that I should use something with a better reputation for scalability?
Is there a max database size for MySQL before degradation of performance occurs? What factors contribute to MySQL not being a viable option compared to a commercial DBMS like Oracle or SQL Server?
Google uses MySQL. Is your project bigger than Google?
Smart-alec comments aside, MySQL is a professional level database application. If your application puts a strain on MySQL, I bet it'll do the same to just about any other database.
If you are looking for a couple of examples:
Facebook moved to Cassandra only after it was storing over 7 Terabytes of inbox data. (Source: Lakshman, Malik: Cassandra - A Decentralized Structured Storage System.) (... Even though they were having quite a few issues at that stage.)
Wikipedia also handles hundreds of Gigabytes of text data in MySQL.
I work for a very large Internet company. MySQL can scale very, very large with very good performance, with a couple of caveats.
One problem you might run into is that an index greater than 4 gigabytes can't be held in memory. I spent a lot of time once trying to improve MySQL's full-text performance by fiddling with index parameters, but you can't get around the fundamental problem: if your query hits disk for an index, it gets slow.
You might find some helper applications that can help solve your problem. For the full-text problem, there is Sphinx: http://www.sphinxsearch.com/
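For context, here is a hedged sketch of the built-in MyISAM full-text search that Sphinx typically replaces (table and column names are hypothetical):
CREATE FULLTEXT INDEX idx_ft ON articles (title, body);

SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('social networking' IN NATURAL LANGUAGE MODE);
-- Server variables such as ft_min_word_len control this index (and require
-- a rebuild when changed); this is what gets slow once the index no longer
-- fits in memory.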
Jeremy Zawodny, who now works at Craigslist, has a blog on which he occasionally discusses the performance of large databases: http://blog.zawodny.com/
In summary, your project probably isn't too big for MySQL. It may be too big for some of the ways that you've used MySQL before, and you may need to adapt them.
Mostly it is table size.
I am assuming here that you will use Oracle's InnoDB plugin for MySQL as your engine. If you do not, that probably means you're using a commercial engine such as InfiniDB, Infobright, or TokuDB, in which case your questions should be sent to their vendors.
InnoDB gets a bit nasty with very large tables. With very large instances, you are advised to partition your tables if at all possible. Essentially, if your (frequently used) indexes don't all fit into RAM, inserts will be very slow, as they need to touch a lot of pages not in RAM. This cannot be worked around.
You can use the MySQL 5.1 partitioning feature if it does what you want, or partition your tables at the application level if it does not. If you can get your tables' indexes to fit in RAM, and only load one table at a time, then you're on a winner.
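A minimal sketch of the MySQL 5.1 feature, assuming a hypothetical page_views table range-partitioned by year:
CREATE TABLE page_views (
    id        BIGINT NOT NULL,
    user_id   INT NOT NULL,
    viewed_at DATETIME NOT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(viewed_at)) (
    PARTITION p2009 VALUES LESS THAN (2010),
    PARTITION p2010 VALUES LESS THAN (2011),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);
-- Queries that filter on viewed_at touch only the relevant partition, so the
-- hot partition's indexes have a better chance of staying resident in RAM.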
You can use the plugin's compression to make your RAM go a bit further (as the pages are compressed in RAM as well as on disk), but it cannot beat the fundamental limitation.
If your table's indexes don't all fit in RAM (or at least mostly; if you have a few indexes that are NULL in 99.99% of cases, you might get away without those), insert speed will suck.
Database size is not a major issue, provided your tables individually fit in RAM while you're doing bulk loading (and, of course, you only load one at a time).
These limitations really happen with most row-based databases. If you need more, consider a column database.
Infobright and InfiniDB both use a MySQL-based core and are column-based engines that can handle very large tables.
Tokutek is quite interesting too - you may want to contact them for an evaluation.
When you evaluate an engine's suitability, be sure to load it with very large data on production-grade hardware. There's no point in testing it with (e.g.) a 10GB database; that won't prove anything.
MySQL is a commercial DBMS too; you just have the option of getting the kind of paid support/monitoring that Oracle or Microsoft offer, or of using community support and community-provided monitoring software.
Things you should look at are not just size and operations. Also critical are:
Scenarios for backup and restore.
Maintenance. Example: SQL Server Enterprise can rebuild an index WHILE THE OLD ONE IS STILL AVAILABLE, transparently. This means no downtime for an index rebuild (see the example after this list).
Availability (basically, you do not want to have to restore a 5000GB database if a server dies). Mirroring is preferred; replication "sucks" (technically).
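To illustrate the maintenance point above, SQL Server's online index rebuild looks like this (illustrative T-SQL; the index and table names are hypothetical):
ALTER INDEX IX_Orders_Customer ON dbo.Orders
REBUILD WITH (ONLINE = ON);
-- Enterprise Edition only: readers and writers keep using the old index
-- until the rebuilt copy is swapped in.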
Whatever you go for, be careful with Oracle RAC (their cluster); it is known to be "problematic" (to put it nicely). SQL Server is known to be a lot cheaper and to scale a lot worse (no RAC option), but it basically works without making admins want to commit suicide every hour (the RAC option seems to do that). Scalability that is "a lot worse" is still good enough for the TerraServer (http://msdn.microsoft.com/en-us/library/aa226316(SQL.70).aspx).
There were some questions here recently from people having problems rebuilding indexes on a 10GB database or so.
So much for my 2 cents. I am sure some MySQL specialists will jump in on these issues.

Is it possible to combine Cloud Computing and MySQL?

The main bottleneck of a web server is usually the database; in my case, MySQL.
More specifically, full-text search and master-slave replication.
Sphinx is a probable solution for full-text search, so master-slave replication is the final pain in the ass.
Is it possible to boost performance significantly with cloud computing, for instance via the services offered by Amazon?
Just a wild guess!
EDIT: What about MySQL and Google App Engine?
Of course. MySQL Enterprise for Amazon EC2 is one packaged option. See also Setting Up MySQL on an EC2 AMI and this tutorial/blog post.
EDIT: App Engine is higher-level than EC2 and is really designed for BigTable/GQL only. However, look at approcket, which allows replicating between AppEngine and MySQL.
You may want to be careful about just switching your web app to an external database (i.e., Amazon et al.); you want to understand exactly where your bottleneck is, or you may end up introducing more performance problems. Remember that by going to an external DB you're introducing more latency into each query, compared to a local (same box or same network) query.
If your problem is performance, try to find out exactly where the problem lies first; then you may want to explore other options like query optimization, caching, etc.
Possible, for sure. See, for example, Xeround, RightScale, Amazon, and PhpFog. There are probably at least a few more, with more to come. They also come in varying degrees of "freeness" (how's that for a made-up word?).
The question, it seems to me, will be performance and reliability.
Who knows, localhost may become a thing of the past for development.

What is the best way to diagnose and profile MySQL in live production server?

What tools/methods do you recommend to diagnose and profile MySQL in live production server?
My goal is to test alternative ways of scaling up the system, to see their influence on read/write timing, memory, CPU load, disk access, etc., and to find bottlenecks.
First of all, you should set up some kind of monitoring, e.g. with:
MySQL Enterprise Monitor
MONyog
Cacti (free)
Munin (free)
MySQL Activity Report (free)
Other possibly helpful tools: mytop, innotop, mtop, Maatkit.
In addition, you should enable slow-query logging in your my.cnf; see the fragment below. Before you start to tune or change parameters, you should create some kind of test plan and compare the before/after results to see whether your changes made sense or not.
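A minimal, illustrative my.cnf fragment (MySQL 5.5 option names; adjust the path and threshold to taste):
[mysqld]
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 2   # log statements slower than 2 seconds
log_queries_not_using_indexes = 1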
This is something that I have worked on quite a bit.
MonYog - MySQL monitoring service. We use this in production. It is not free but has a lot of features, including alerts and historical data.
MySQL Enterprise Monitor - available with MySQL enterprise (i.e., not cheap)
Roll Your Own!
About the roll your own option:
We actually developed a really cool monitoring application that uses RRDtool (used by the popular MRTG) and a combination of MySQL statistics and system stats such as iostat output. This was not only a great exercise but gave us a ton of flexibility to monitor exactly what we want from a single interface.
Here is a Brief Description of some approaches to building your own stats.
One of our big motivations for rolling our own, even though we also use MonYog, was to track disk statistics. Disk I/O can be a major bottleneck, and the standard MySQL monitoring systems do not have I/O monitoring. We use iostat, which is part of the sysstat package.
We have an interface that displays graphs of MySQL statistics next to disk i/o stats, allowing us to really get an overall picture of how the MySQL load is affecting disk i/o.
Before this, we really had no idea why our production applications were getting bogged down. We discovered that disk i/o was a major issue, and that MySQL was creating a lot of temporary tables on disk when we were running complex queries. We were able to optimize our queries and improve disk performance dramatically.
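For example, watching iostat -dxk 5 (iostat ships with sysstat) next to a quick status check inside MySQL makes that kind of temp-table spill visible:
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
-- Compare Created_tmp_disk_tables with Created_tmp_tables: a high ratio
-- means complex queries are spilling temporary tables to disk.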
Jet Profiler for sure
Also add to list: RHQ 4 (open source) -- http://rhq-project.org/
Also add to the list:
Maatkit
dbForge Studio for MySQL
Jet Profiler

Is Oracle RDBMS more stable, secure, robust, etc. than MySQL RDBMS?

I've worked on a variety of systems as a programmer, some with Oracle, some with MySQL. I keep hearing people say that Oracle is more stable, more robust, and more secure. Is this the case?
If so in what ways and why?
For the purposes of this question, consider a small-medium sized production DB, perhaps 500,000 records or so.
Yes. Oracle is enterprise-grade software.
I'm not sure if it's really any more stable than MySQL; I haven't used MySQL that much, but I don't ever remember having MySQL crash on me. I've had Oracle crash, but when it does, it gives me more information about why it crashed than I could possibly want, and Oracle support is always there to help (for a fee).
It's very, very robust: Oracle DB will do virtually everything it can before breaking your data. I've had MySQL servers do really weird things when they run out of disk space; Oracle will just halt all transactions, and eventually shut down if it can't write the files it needs. I've never lost data in Oracle; even when I do stupid things like forgetting the WHERE clause and updating every row rather than a single one, it's very easy to get the database back to how it was before the screw-up.
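For example, where Oracle's flashback features are enabled, that kind of recovery is a one-liner (the table name is hypothetical):
-- See the rows as they were 15 minutes ago, before the bad UPDATE
SELECT * FROM orders AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE);

-- Or rewind the whole table (requires ALTER TABLE orders ENABLE ROW MOVEMENT)
FLASHBACK TABLE orders TO TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE);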
Not sure about security; Oracle certainly gives you lots of options for how you connect to the DB and authenticate, and lots of options regarding which users have access to what, etc. But as with most things, if you want to take security seriously, you need an expert to do it. Oracle certainly has a lot more to lose if they don't get security right. But, as with all software, there have been exploits.
If nothing else, just consider this: when Oracle stuffs up, they have customers who are paying a $40k-per-CPU license (if they are suckers and pay list price) plus yearly maintenance fees. This gives Oracle a very strong incentive to make sure customers are happy with the product.
For a small database, I'd seriously recommend Oracle XE well before MySQL. It has the most important feature of MySQL (it's free), it's dead easy to install, and it comes with a nice web interface and application framework (Application Express). If your DB will happily run on a single CPU, 1GB of RAM, and 4GB of data, then XE is the way to go, IMHO.
MySQL has its uses; many, many people have shown that you can build great things with it, but it's far behind Oracle (and SQL Server, and DB2) in terms of features. But then, it's also free and very easy to learn, which for many people is the most important feature.
I've had Oracle create a corrupt database when the disk ran out of space. It's hard to debug, uses loads of resources, and is difficult to work with without seriously skilled DBAs holding your hand. Oracle even replaced system binaries (e.g., gcc) in /usr/bin when I installed it on one occasion.
Working with PostgreSQL, on the other hand, has been much more pleasant. It gives readable error messages and acts in a more understandable way if you're used to working with open-source *nix systems. It's quite easy to set up replication, which keeps your data fairly safe.
A 500K record database can probably be run on your mobile phone. Seriously, it's so small that both Oracle XE and MySQL will be more than sufficient to manage it.
For smallish DBs (a few million records), Oracle is overkill.
You need an experienced DBA to properly install and manage an Oracle system.
Oracle has a larger "base overhead", i.e. you need a beefier machine to run Oracle.
The "out of the box" experience of Oracle used to be atrocious (I haven't installed an Oracle system in years; no idea how it currently behaves), while MySQL's is very nice.
Oracle is a beast that really needs DBA knowledge. I concur with those who say 500k records are nothing. It's not worth the complexity of Oracle if it's simple numeric/text data.
On the other hand, Oracle is extremely efficient with blobs. If each of your records was a 100MB binary file, you'd need a fortune to run it on Oracle (I'd recommend a 3-node RAC cluster with a good SAN).
I have a project that sends data (~10M rows, 1.2GB of data) to three different databases, 2 Oracle and 1 MySQL. I haven't had problems working with either system, nor have I seen any major advantages on either side. If you're in a place that already uses Oracle for other projects, adding on one new database shouldn't be too much of a problem, but if you're thinking of setting up a new database server and don't have anything in place already, MySQL will save you the money.
Oracle Enterprise assumes that there is an enterprise to support it, i.e., a real Oracle DBA. A novice (but competent) DBA should be able to secure MySQL much more easily than Oracle, just because Oracle is inherently more complex. Of course, Oracle has enterprise monitoring tools beyond what MySQL currently features (as far as I've seen), but the DBA needs to be able to use them to be effective.
Such a small database as you describe could be handled by almost anything, so I can't see that Oracle would be warranted unless the infrastructure was already in place. Both have replication, transactions, and warm backups, so either would serve well.
The answer depends entirely on how you configure each DBMS.
Both are capable of handling 500,000 records many times over.
Oracle is a lot beefier. Many of its features would only be looked for in a larger enterprise or high-performance setting. They're mainly features to do with scaling, replication and load balancing.
For small DBs, consider SQLite. For small-medium, look at MySQL or PostgreSQL. For the largest, look at MSSQL, Oracle, DB2, etc.
Edit: Having read the other answer, I'll add that if your data is really, really critical, you'll want a replicated setup and you'll probably want to look to one of the big DB providers for something like that.
If you can sacrifice potential (exceedingly rare) data losses and would prefer improved performance, look at some of the lighter-weight options.
It's true that Oracle is a beast.
It is also true that Oracle is widely considered the most secure major database.
The problem is that Oracle's devs don't appear to grasp critical security concepts: according to independent security researchers, Oracle is actually the least secure database server on the market.
http://itic-corp.com/blog/2010/09/sql-server-most-secure-database-oracle-least-secure-database-since-2002/
MySQL is actually fairly secure according to these researchers, though I don't know much about the tools available for it. What's most amusing about this research is that the same people who would call Microsoft SQL Server a toy would have their data stolen by attackers that MSSQL would thwart, because they are using a "beast" with a terrible security model rather than a "toy" that is secure.
I'm using Oracle/SQL Server/MySQL for different applications and sites.
No database can beat Oracle in many different areas, but it is also the database that requires the deepest administration knowledge, and if you hit a problem with Oracle, you may spend quite some time solving it even with good DBAs.
You can go with MySQL for 500K or millions of records; it's lighter than other DBs, requires near-zero administration work, and will not take a lot of your computer's resources. I always have it on my development PC and have never faced any serious problem with it.
I would recommend you go with MySQL or PostgreSQL if you don't need the advanced features of Oracle.