(My)SQL Basics and much more - mysql

I've been using MySQL for quite some time now, mostly with PHP for Joomla development. Up until now, I didn't pay much attention to optimization, since I was usually asked to finish stuff ASAP.
Now, while I know that the ASAP factor is a reality, I would like to improve my knowledge of relational DBs and gain good insight into query and database optimization. I'm planning to start working with some rather large databases, for which my usual approach will not work.
Any recommendations for some good books on the subject?
Thx in advance.

Joe Celko's SQL for smarties, 4th ed.
The Art Of SQL
Refactoring SQL Applications
I would not recommend devoting yourself to MySQL only. Instead, if you can, try to gain some experience with other DBMSs, where more advanced optimizers make your job easier.

If you are using a Linux shell, I recommend the mtop application to watch what is happening.
In the MySQL configuration you can enable logging of slow queries:
http://www.webdevelopmentstuff.com/112/optimizing-mysql-log-slow-queries.html
There is also a parameter that defines what counts as a long query. Set it to 0 when desperate :) I once had to when debugging a CMS that kept sending thousands of queries that took 0.00001s each.
I also found this: http://dev.mysql.com/tech-resources/articles/using-new-query-profiler.html
And I recommend a bit of reading on indexes.
For PHP+MySQL with the slow-query log, it's also useful to know the Apache Bench command:
ab -c10 -n50 http://... calls the address 50 times with up to 10 concurrent requests.
That's just a list of tips. It's not complete in any way.
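To make the index tip above a bit more concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example is self-contained), and the table `posts`, the column `author` and the index name `idx_posts_author` are hypothetical. In MySQL, the comparable check would be running `EXPLAIN` on the query before and after adding the index.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO posts (author, title) VALUES (?, ?)",
    [("user%d" % (i % 100), "title %d" % i) for i in range(10000)],
)

sql = "SELECT title FROM posts WHERE author = ?"

# Without an index, the filter on `author` has to scan the whole table.
plan_before = query_plan(conn, sql, ("user7",))

# With an index on the filtered column, the lookup goes through the index.
conn.execute("CREATE INDEX idx_posts_author ON posts (author)")
plan_after = query_plan(conn, sql, ("user7",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between database engines and versions, so the point is only the change from a full scan to an index lookup.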

Related

Database Management System for db-heavy, busy website/application?

EDIT: I've learnt (and it's probably true) that YouTube uses MySQL, though it would probably be the enterprise edition rather than the free edition. The only alternative seems to be PostgreSQL. Long question short: can PostgreSQL be used instead of MySQL? Is it a good alternative in any case?
Firstly, I noticed that these are the most common names when it comes to (relational) database management systems: DB2 (IBM), Oracle Database, Microsoft SQL Server, Ingres, MySQL, PostgreSQL and Firebird. So, should I presume these are the best?
Okay, of the above, DB2 (IBM), Oracle Database and Microsoft SQL Server, the so-called enterprise DBMSs, come with a bill, while MySQL (excluding the enterprise version), PostgreSQL and Firebird are open source and free.
As should be clear from my previous questions here, I plan to build a photo-sharing site (something like Flickr, Picasa), and like any other, it's going to be database-heavy and (hopefully) busy.
Here's what I would love to know: (1) does any one of the free DBMSs stand up to the mark with the paid enterprise DBMSs? (2) Can any of the free DBMSs scale and perform well for enormous and busy databases without too much headbanging and facepalming?
Things in my mind w.r.t the DB:
Mature
Fast
Perform great/fine under heavy load
Perform great/fine as database grows
Scalable (smooth transition)
support for languages (preferably Python, PHP, JS, C++)
Feature-rich
etc (whatever I am missing)
PLZ NOTE: I know Facebook, Twitter etc use (or at least used) MySQL, and I see reports from time to time, how their sysadmins cry over that decision. So, please don't say, XXX uses it, so why can't you. They've started small, I am too. They've made mistakes, I don't want to. I want to keep the scaling-transition smooth. I hope I am not asking too much. Thanks.
"Which is the best database" is a huge question and is the subject of much contention. I've noticed on StackOverflow there is a tendency to close such questions; although the question is interesting, it is also quite unresolvable ;-)
FWIW, I would go with this:
Use what you know
If it doesn't conflict too heavily with the first rule, use something that is free of charge
Use what works with other parts of your stack
Use what you can hire for at reasonable cost (so, maybe not Oracle unless you really have to)
Don't optimise too early. A slow website that works is much better than an efficient one that is never finished.
Also, scalability is not really to do with your db platform, but to do with how you design your site. Note also that some platforms scale better when adding more servers (MySQL) and others do better when increasing your server resources (PostgreSQL).
Please note that, as of today, MySQL is not as free a project as PostgreSQL. That was one of the main reasons I had to switch over to PG. (Thanks to Npgsql and pgAdmin III, it was a lot easier than rumoured.)
However, MySQL does have a number of advantages in terms of applications, add-ons and forums, and it looks good on a resume.
PostgreSQL is a much more mature DBMS. It is an object-relational DBMS and has been around for more than 15 years. It is not known to have fallen short on any major issue, and it is well known to handle transactions involving millions of rows successfully. Most important is its high rate of compliance with SQL standards. In fact, in professional circles it is seen more as the Oracle of free RDBMSs than as the MySQL of popular applications.

How-to's for MySQL InnoDB (insert) performance optimization?

I'm still struggling with the performance of my MySQL database using the InnoDB engine: mainly the performance of data insertion, and to a lesser extent the performance of running queries.
I've been googling for information, how-to's and so on, but most of what I found is rather advanced. Is there some basic information for "newbies" on the net, a starting point for performance optimization? The first, most important steps for InnoDB optimization, explained in a less complicated way.
I'm using the Windows platform.
I used to manage a couple very large MySQL Databases (like, 1TB+). They were huge, unforgiving beasts with an endless appetite to cause me stomach problems.
I read everything I could find on MySQL Performance Tuning and innodb. Here's a summary of what helped me:
The book High Performance MySQL is good, but only gets you so far.
The blog MySQL Performance Blog (this link is to their posts tagged 'innodb') was the most useful overall resource I found on the net. They go into detail on a lot of innodb tuning issues. It gets 'ranty' at times, but overall it's great. Here's another link there on InnoDB Performance Optimization Basics that's good.
The last main thing I did to learn it was to simply read the MySQL Docs themselves. I read how every last parameter works, changed them on my server and then did some basic profiling. After a while you figure out what works by running big queries and seeing what happens. Here's a good place to start:
InnoDB Performance Tuning and Troubleshooting
In the end, it's just experimenting and working through things until you gain enough knowledge to know what works.
For newbies: innodb_flush_log_at_trx_commit=0, if you can afford to lose up to 1 second of your work if the server crashes. This is the performance vs. reliability tradeoff, but it will improve your write performance hugely. If you can afford a battery-backed write cache, use it.
Specifically on Windows, and for write performance, MariaDB 5.3 might be a better idea than stock MySQL from Oracle, since MariaDB is able to better utilize asynchronous IO on Windows. I wrote a note about it some time ago here, on standard synthetic benchmark it performs up to 500% better than stock MySQL 5.5 (see pictures at the end of the note).
However, the first and foremost thing that kills performance is disk flushing. This is solvable if you relax durability with the innodb_flush_log_at_trx_commit parameter, or with a battery-backed write cache. You might also consider using larger transactions, since they reduce the number of disk flushes.
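The "larger transactions" advice can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with a temporary file as a stand-in for MySQL/InnoDB, and the table `samples` is hypothetical. The principle it shows is the same one described above: one commit per row forces one flush per row, while a single transaction around the whole batch needs only one.

```python
import os
import sqlite3
import tempfile
import time

ROWS = [(i,) for i in range(200)]


def timed_insert(conn, rows, one_transaction):
    """Insert `rows` and return the elapsed time in seconds."""
    start = time.perf_counter()
    if one_transaction:
        # One transaction: a single commit covers the whole batch.
        with conn:
            conn.executemany("INSERT INTO samples (value) VALUES (?)", rows)
    else:
        # One commit per row: every INSERT is made durable on its own.
        for row in rows:
            conn.execute("INSERT INTO samples (value) VALUES (?)", row)
            conn.commit()
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "demo.db"))
    conn.execute("CREATE TABLE samples (value INTEGER)")
    conn.commit()

    per_row = timed_insert(conn, ROWS, one_transaction=False)
    batched = timed_insert(conn, ROWS, one_transaction=True)
    total = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
    conn.close()

print("commit per row:  %.4f s" % per_row)
print("one transaction: %.4f s" % batched)
print("rows stored:", total)
```

The actual timings depend heavily on the disk and operating system, so treat the numbers as something to measure on your own server rather than as a benchmark.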
Try the MySQL Primer script: http://day32.com/MySQL/
I didn't use the 'net, I used books. :)
The book I used to learn MySQL is "Beginning MySQL" from Wrox Press, by Robert Sheldon and Geoff Moes. Chapter 15 goes into some basics of optimization. I liked this book a lot; it is good reading and has been my #1 reference. But it isn't very storage-engine specific.
I have another book, Pro MySQL from Apress, that goes into a lot more detail about particular storage engines, but it is also much harder to read. Still a good reference though.

Running repetitive maintenance processes on LAMP

I am developing an auction site that requires maintenance scripts to be run in the background to ensure smooth running. Things such as ending auctions, starting auctions, etc...
There seem to be many options and no definite answers when I research the subject.
Is there a standard for doing this sort of thing? My research so far has uncovered these possibilities:
PHP and cron: too slow, no evidence of anyone else using this method?
MySQL stored procedures: don't want to deal with the MySQL language
Bash script?
C script?
I hope that someone with experience can inform me of pros and cons that I am missing, other things to think about when deciding which method to use, etc...
Speed (and efficiency) are very important.
Thanks!
I have a site that does some pretty intensive maintenance using PHP + Cron. I would recommend that since you can reuse libraries from the main application. In what way is php too slow?
Unless you are doing some serious number crunching in the application, most of the burden is on the database. In that case, only a Bash or C script would perform worse than PHP.
This might be more of a http://serverfault.com question.
If you need something custom because the above won't work for you, pick something that uses the technologies you already have libraries and code written in (Perl, Python, C, PHP, whatever), so that you don't have to port work you've already done if you later need to expand it to access things before making timing decisions. Write a custom program that does the job for you, then have cron make sure that it's always up. The program can then do the more time-intensive things itself rather than having cron do them.
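One common way to let cron "make sure that it's always up" is to have cron start the program frequently and have the program exit immediately if another copy is already running. The sketch below shows that guard using an advisory file lock. It assumes a Linux host (the `fcntl` module is not available on Windows), and the function name `try_lock` and the lock file name are hypothetical.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if already locked."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


# Demo only: a real worker would use one fixed, well-known lock path.
lock_dir = tempfile.mkdtemp()
lock_path = os.path.join(lock_dir, "worker.lock")

first = try_lock(lock_path)    # the first start becomes the running worker
second = try_lock(lock_path)   # a second start attempt is refused

first_ok = first is not None
second_refused = second is None
print("first start got the lock:", first_ok)
print("second start was refused:", second_refused)

first.close()  # closing the file releases the lock
os.remove(lock_path)
os.rmdir(lock_dir)
```

Because the operating system drops the lock when the process exits or crashes, the next cron run can start a fresh worker without any manual cleanup.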
I have developed and manage a massive database system with 700+ databases [don't ask...], which need plenty of maintenance in terms of updates, summaries, synchronisation of structures, etc.
All of this is done through Bash and PHP scripts running regularly through cron. Some every 10 minutes, some hourly, some daily, some monthly - never have I had any issues with speed / performance, so long as you code the scripts and SQL statements so that they are efficient when run!
One of the most important things is to tune your MySQL indexes so that your regular scheduled jobs run quickly, minimising the regular CPU hits that occur when the cron jobs run.
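As a rough illustration of an efficient scheduled job, the sketch below ends all expired auctions with one set-based UPDATE instead of looping over rows in application code. It is written in Python with the built-in sqlite3 module as a stand-in for MySQL, and the `auctions` table, its columns, the index name and the function `end_expired_auctions` are all hypothetical.

```python
import sqlite3


def end_expired_auctions(conn, now):
    """Close every open auction whose end time has passed; return how many."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE auctions SET status = 'ended' "
            "WHERE status = 'open' AND ends_at <= ?",
            (now,),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE auctions (id INTEGER PRIMARY KEY, status TEXT, ends_at INTEGER)"
)
# An index on the columns used in the WHERE clause keeps the job cheap.
conn.execute("CREATE INDEX idx_auctions_status_end ON auctions (status, ends_at)")
conn.executemany(
    "INSERT INTO auctions (id, status, ends_at) VALUES (?, ?, ?)",
    [(1, "open", 100), (2, "open", 200), (3, "ended", 50), (4, "open", 300)],
)
conn.commit()

closed = end_expired_auctions(conn, now=200)
statuses = dict(conn.execute("SELECT id, status FROM auctions"))
print("auctions closed:", closed)
print(statuses)
```

A script like this would be scheduled from cron at whatever interval the site needs; the database does the heavy lifting, so the choice of scripting language matters little.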

Which is the Best database for Rails application?

I am developing a Rails application that will access a lot of RSS feeds or crawl sites for data (mostly news). It will be something like Google News but with a different approach, so I'll store a lot of news (or news summaries), classify them in different categories and use ranking and recommendation techniques.
Should I go with MySQL?
Is it worthwhile using IBM DB2 pureXML to store the documents?
Also, Ruby search implementations (Ferret, Ultrasphinx and others) are not needed if I choose DB2. Is that correct?
What are the advantages of PostgreSQL in this?
Does it make sense to use CouchDB in this scenario?
I'd like to choose the best option but without over-complicating the solution. So I discarded the idea to use two different storage solutions (one for the news documents and other for the rest of the data). I'm also considering only "free" options, so I didn't look at Oracle or MS SQL Server.
pureXML is heavier than SQL, so you pay more for each round trip between web server and DB. If you plan to have lots of users, I'd avoid it; you're better off letting your web server cache the requests, thus avoiding creating XML (RSS) every time, if that is what you are thinking about.
I'd go with MySQL because it's really good at serving and it's totally free. PostgreSQL is too, but I haven't used it, so I can't say.
CouchDB could make sense, but not if you plan on doing OLAP (online analytical processing) on your data; a normal RDBMS will be better at that.
Admitting firstly that I generally don't like mysql, I will say that there has been writing on this topic regarding postgres:
http://oldmoe.blogspot.com/2008/08/101-reasons-why-postgresql-is-better.html
This is always my choice when I need a pure relational database. I don't know whether a document database would be more appropriate for your application without knowing more about it. It does sound like it's something you should at least investigate.
MySQL is probably one of the best options out there; light, easy to install and maintain, multiplatform and free. On top of that there are some good free client tools.
Something to think about; because of the nature of your system you will probably have some tables that will grow quite a lot very quickly so you might want to think about performance.
For that, MySQL supports table partitioning (horizontal partitioning, i.e. splitting a table's rows across partitions), but only from version 5.1.
It sounds to me the application you will build can easily become a large-scale web app. I would suggest PostgreSQL, for it has been known for its reliability.
You can check out the following link -- Bob Ippolito from MochiMedia tells us why they ditched MySQL for PostgreSQL. Although the posts are more than 3 years old, the issues MySQL 5.1 has recently tend to prove that they are still relevant.
http://bob.pythonmac.org/archives/category/sql/mysql/
MySQL is good in production. I haven't used PostgreSQL with Rails, but it's a good solution as well.
In the dev and test environments I'd start out with SQLite (default), and perhaps migrate to your target DB in the test environment as you move closer to completion.

Where to find a good reference when choosing a database?

I and two others are working on a project at the university.
In the project we are making a prototype of a MMORPG.
We have decided to use PostgreSQL as our database. The other databases we considered were MS SQL-server and MySQL.
Does somebody have a good reference which will justify our choice? (preferably written during the last year)
Someone recently recommended wikivs.com to me: MySQL vs. PostgreSQL. It is quite a detailed comparison of the two, and might be of help to you.
The most frequently mentioned difference between MySQL and PostgreSQL concerns your read/write ratio. If you read a lot more than you write, MySQL is usually faster; but if you do a lot of heavy updates to a table, as often as other threads have to read it, then the default locking in MySQL is not the best, and PostgreSQL can be a better choice, performance-wise.
IOW, PostgreSQL scales better with regard to DB writes.
That's why it's usually said that MySQL is best for webapps, while PostgreSQL is more 'enterprisey'.
Of course, the picture is not so simple:
InnoDB tables on MySQL have a very different performance behaviour
At the load levels where PostgreSQL's better locks overtake MySQL's, other parts of your platform could be the bottlenecks.
PostgreSQL does comply better with standards, so it can be easier to replace later.
In the end, the choice has so many variables that no matter which way you go, you'll find some important issue that makes it the right choice.
Go with something that someone in your team has actual experience of using in production. All databases have issues which frequent users are aware of.
I cannot stress enough that someone in the team needs PRODUCTION experience of using it. Not using it for their homework, or to keep their list of CDs in.
All of these databases have their advantages and disadvantages. Which is better is dependent on:
Your team's experience
Your exact requirements
Your current environment, e.g. what is your app written in, and what will it be hosted on?
SQL Server's main problem is the cost, unless you use the Express edition, which has performance limitations. However, it's very easy to use and has a number of good tools.
There is a comparison of the different SQL Server editions at:
http://www.microsoft.com/sql/prodinfo/features/compare-features.mspx
You could then compare these with MySQL and PostgreSQL.
If the purpose of this comparison is a theoretical one for your essay, then you can reference web pages such as the Microsoft link and compare performance, cost, etc.
PostgreSQL has a page of case studies that you can quote and link to.
Really, any of the above would have worked for you. I personally like PostgreSQL. One solid advantage it has over MSSQL (even assuming you can get it for "free") is that PostgreSQL is non-proprietary. If you're going to introduce a dependency into your project (and re-inventing an RDBMS would be crazy), you don't want it to be a black box.