Is SQLite suitable for use in a production website? - mysql

I'm rewriting a PHP+MySQL site that averages 40-50 hits a day using Django.
Is SQLite a suitable database to use here? Are there any advantages/disadvantages between them?
I'm just using the db to store a blog and the users who can edit it. I am using fulltext search for the blog search, but no complex joins anywhere.

40-50 hits per day is very little traffic, and SQLite can handle it without any problem.
MySQL might be the better choice once you get more hits, because it handles multiple connections better (locking works differently in MySQL and SQLite).

The major problem with sqlite is concurrency. If you expect 40-50 hits a day, that's probably a non-issue. However, if that load increases you should be ready to migrate to a database daemon such as MySQL - better abstract your database specific code to make such a switch as painless as possible.
The performance section of the SQLite wiki might be of use to you.
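Since the rewrite is in Django, most of that abstraction comes from the ORM, and a later switch is largely a settings change. A minimal sketch of what that could look like; the database names, user, and the two environment variables are placeholders invented for illustration, and the MySQL-specific fulltext search from the question would still need separate attention:

```python
# settings.py fragment (sketch). All names, paths and credentials are placeholders.
import os

SQLITE = {
    "ENGINE": "django.db.backends.sqlite3",
    "NAME": "blog.sqlite3",  # path to the single database file
}

MYSQL = {
    "ENGINE": "django.db.backends.mysql",
    "NAME": "blog",
    "USER": "blog_user",
    "PASSWORD": os.environ.get("BLOG_DB_PASSWORD", ""),  # hypothetical env var
    "HOST": "127.0.0.1",
    "PORT": "3306",
}

# Hypothetical switch: set BLOG_USE_MYSQL=1 once SQLite stops being enough.
USE_MYSQL = os.environ.get("BLOG_USE_MYSQL") == "1"
DATABASES = {"default": MYSQL if USE_MYSQL else SQLITE}
```

Code that sticks to the ORM should not need to change when the switch is flipped; raw SQL is where the two backends tend to diverge.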

Since you're already using an adequate database, I don't see a reason to migrate to a smaller one.
While sqlite might be perfectly adequate, too - changing to a less-capable platform from a more-capable one doesn't seem the best choice :)

SQLite will work just fine for you. It sounds as though you're largely using the database as read-only (with occasional writes to update the content). SQLite excels at this kind of access pattern. The only place where SQLite chokes is when you have a lot of writes to a database, because once a process attempts to write the file is locked until the write is complete. Also, if you do lots of writes (like updating rows in a loop) you should look into putting all those writes into a transaction - while the file is locked once the transaction hits a write query, the updates themselves take much less time because they're written to the file at once and not individually.
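As a rough illustration of the transaction advice, here is a sketch using Python's standard sqlite3 module. The `posts` table, its columns, and the `bump_views` helper are made up for the example, and an in-memory database stands in for a real file:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, views INTEGER NOT NULL)")
conn.executemany("INSERT INTO posts (id, views) VALUES (?, 0)",
                 [(i,) for i in range(1, 1001)])
conn.commit()

def bump_views(conn, post_ids):
    """Update many rows inside one transaction instead of one commit per row."""
    # "with conn" commits once at the end, or rolls back if an exception is raised.
    with conn:
        for post_id in post_ids:
            conn.execute("UPDATE posts SET views = views + 1 WHERE id = ?",
                         (post_id,))

bump_views(conn, range(1, 1001))
total = conn.execute("SELECT SUM(views) FROM posts").fetchone()[0]
```

The thousand updates are committed together, so the file is synced once rather than once per row.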

SQLite would be fine for this level of traffic. It actually performs quite well, the only thing that it is lacking is caching of data and queries because it needs to be spun up every time your page is accessed. That said, it is still very quick and it shouldn't be too hard to migrate to MySQL later if need be.

Related

PostgreSQL and PQC

I'm using MySQL with Memcached, but I'm planning to start using PostgreSQL instead of MySQL.
I know Memcached can work with PostgreSQL, but I found this online: PostgreSQL Query Cache. I've seen a presentation online, and it says memcached is used in this. But I don't understand: memcached, I have to "program" in my PHP-code, and PQC, not?
What's it all about? Is PQC the same as memcached, and could it replace memcached? For example: I have a table with all countries. It never changes, so I want to cache this instead of retrieving it from the database every time. Will PQC do this automatically?
PQC is an implementation of caching that uses Memcached. It sits in front of your database server and caches query results for you. If you are running a lot of identical queries, this will make your database load a whole lot less and your return times a whole lot faster. It is not a substitute for good design of your application, but it can certainly help, and the cost of implementing it is extremely low since it takes advantage of an existing layer of abstraction.
Memcached is a lower level tool. A well designed application will leave you a nice place to put code between the business logic and the database layer to cache results, and this is where you put your memcached calls. In other words, if your code is designed to allow this abstraction, fantastic. Otherwise, you're looking at a lot more work to implement.
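To make the "place to put code between the business logic and the database layer" concrete, here is a sketch of that caching step for the countries example. A plain dict stands in for a memcached client and an in-memory SQLite database stands in for PostgreSQL; the table, the cache key, and `get_countries` are all invented for illustration:

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the real database
db.execute("CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO countries VALUES (?, ?)",
               [("BE", "Belgium"), ("NL", "Netherlands")])
db.commit()

cache = {}    # stand-in for a memcached client (get/set by key)
db_hits = 0   # counts how often the database is actually queried

def get_countries():
    """Return all countries, asking the database only on a cache miss."""
    global db_hits
    key = "countries:all"
    cached = cache.get(key)
    if cached is not None:
        return cached
    db_hits += 1
    rows = db.execute("SELECT code, name FROM countries ORDER BY code").fetchall()
    cache[key] = rows  # with real memcached you would also choose an expiry
    return rows

first = get_countries()   # miss: goes to the database
second = get_countries()  # hit: served from the cache
```

With a query cache in front of the database, this check-then-store step happens outside your code; with memcached directly, you write it yourself as above.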

How to make my MySQL databases available at all times? Some expert DB advice needed!

I've been doing a lot of research, reading on replication, etc but just not sure as to what mysql solution would work.
This is what I'm looking at:
when my mysql fails for some reason or there are certain queries that are taking really long to execute and locking some tables, I want the other insert/update/select queries to still function at normal speed without having to wait for locks to be released or for the main database to be back up. I'm thinking there should be a second mysql server for this to happen, but is what I mentioned possible even if there is and would it involve a lot of change in my existing programming logic?
when my database is being backed up, I would still like my site to function normally, all inserts/selects/updates should function as normal.
when I need to alter a large table, I wouldn't like it to affect my application, there should be a backup server to work from.
So what do I need to do to get all this done and also would it require changing plenty of existing coding to suit the new set up? [My site has a lot of reads and writes]
There's no easy way. You're asking for a highly-available MySQL-based setup, and that requires a lot of work at the server and client ends.
Some issues, for example:
when I need to alter a large table, I wouldn't like it to affect my application, there should be a backup server to work from.
If you're altering the table, you can't trivially create a copy to work from during the update. What about the changes that are made to your copy while the first update is taking place?
Have a search for "High Availability MySQL". It's mostly a solved problem, but the solution depends heavily on your exact requirements. You cannot just ask for "I want my SQL server to run at full speed always forever no matter what I throw at it".
Not a MySQL specific answer, but a general one. Have a read only copy of your DB for site to render, which will be synced to the master DB regularly. This way, you can keep your site working even if the master DB is under load/lock due to insert/delete. For efficiency, keep this copy as denormalized as you can.

SQLite concurrency issue a deal breaker?

I am looking at databases for a home project (ASP.NET MVC) which I might host eventually. After reading a similar question here on Stack Overflow I have decided to go with MySQL.
However, the ease of use and deployment of SQLite is tempting, and I would like to confirm my reasons before I write it off completely.
My goal is to maintain user status messages (like Twitter). This would mean mostly a single table with user-id/status-message couples. Read / Insert / Delete operation for status message. No modification is necessary.
After reading the following paragraph I have decided that SQLite can't work for me. I DO have a simple database, but since ALL my transactions work with the SAME table I might face some problems.
SQLite uses reader/writer locks on the entire database file. That means if any process is reading from any part of the database, all other processes are prevented from writing any other part of the database. Similarly, if any one process is writing to the database, all other processes are prevented from reading any other part of the database.
Is my understanding naive? Would SQLite work fine for me? Also does MySQL offer something that SQLite wouldn't when working with ASP.NET MVC? Ease of development in VS maybe?
If you're willing to wait half a month, the next SQLite release intends to support write-ahead logging, which should allow for more write concurrency.
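Write-ahead logging has since shipped (SQLite 3.7.0) and is switched on per database file with a pragma. A small sketch with Python's sqlite3 module, assuming a temporary file and a made-up `status` table, in which a second connection keeps reading while the first has an uncommitted write:

```python
import os
import sqlite3
import tempfile

# WAL needs a real file; an in-memory database cannot use it.
db_path = os.path.join(tempfile.mkdtemp(), "status.db")

writer = sqlite3.connect(db_path)
# The pragma returns the journal mode now in effect.
mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE status (user_id INTEGER, message TEXT)")
writer.commit()

reader = sqlite3.connect(db_path)

# Start a write transaction and leave it uncommitted for the moment.
writer.execute("INSERT INTO status VALUES (1, 'hello')")

# The reader is not blocked; it sees the last committed state.
seen_during_write = reader.execute("SELECT COUNT(*) FROM status").fetchone()[0]

writer.commit()
seen_after_commit = reader.execute("SELECT COUNT(*) FROM status").fetchone()[0]
```

There is still only one writer at a time in WAL mode; what changes is that readers and the writer no longer block each other.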
I've been unable to get even the simple concurrency SQLite claims to support to work - even after asking on SO a couple of times.
Edit
Since I wrote the above, I have been able to get concurrent writes and reads to work with SQLite. It appears I was not properly disposing of NHibernate sessions - putting Using blocks around all code that created sessions solved the problem.
/Edit
But it's probably fine for your application, especially with the Write-ahead Logging that user380361 mentions.
Small footprint, single file installation, fast, works well with NHibernate, free, public domain - a very nice product in almost all respects!

What's the fastest way to import a large mysql database backup?

What's the fastest way to export/import a mysql database using innodb tables?
I have a production database which I periodically need to download to my development machine to debug customer issues. The way we currently do this is to download our regular database backups, which are generated using "mysql -B dbname" and then gzipped. We then import them using "gunzip -c backup.gz | mysql -u root".
From what I can tell from reading "mysqldump --help", mysqldump runs with --opt by default, which looks like it turns on a bunch of the things that I can think of that would make imports faster, such as turning off indexes and importing tables as one massive import statement.
Are there better ways to do this, or further optimizations we should be doing?
Note: I mostly want to optimize the time it takes to load the database onto my development machine (a relatively recent macbook pro, with lots of ram). Backup time and network transfer time currently aren't big issues.
Update:
To answer some questions posed in the answers:
The production database schema changes up to a couple times a week. We're running rails, so it's relatively easy to run the migrate scripts on stale production data.
We need to put production data into a development environment potentially on a daily or hourly basis. This entirely depends on what a developer is working on. We often have specific customer issues that are the result of some data spread across a number of tables in the db, which needs to be debugged in a development environment.
I honestly don't know how long mysqldump takes. Less than 2 hours, since we currently run it every 2 hours. However, that's not what we're trying to optimize, we want to optimize the import onto the developer workstation.
We don't need the full production database, but it's not totally trivial to separate what we do and don't need (there are a lot of tables with foreign key relationships). This is probably where we'll have to go eventually, but we'd like to avoid it for a bit longer if we can.
It depends on how you define "fastest".
As Joel says, developer time is expensive. Mysqldump works and handles a lot of cases you'd otherwise have to handle yourself or spend time evaluating other products to see if they handle them.
The pertinent questions are:
How often does your production database schema change?
Note: I'm referring to adding, removing or renaming tables, columns, views and the like, i.e. things that will break actual code.
How often do you need to put production data into a development environment?
In my experience, not very often at all. I've generally found that once a month is more than sufficient.
How long does mysqldump take?
If it's less than 8 hours it can be done overnight as a cron job. Problem solved.
Do you need all the data?
Another way to optimize this is to simply get a relevant subset of data. Of course this requires a custom script to be written to get a subset of entities and all relevant related entities but will yield the quickest end result. The script will also need to be maintained through schema changes so this is a time-consuming approach that should be used as an absolute last resort. Production samples should be large enough to include a sufficiently broad sample of data and identify any potential performance problems.
Conclusion
Basically, just use mysqldump until you absolutely can't. Spending time on another solution is time not spent developing.
Consider using replication. That would allow you to update your copy in real time, and MySQL replication allows for catching up even if you have to shut down the slave. You could also use a parallel MySQL instance on your normal server that replicates the data to a MyISAM table, which supports online backup. MySQL allows for this as long as the tables have the same definition.
Another option that might be worth looking into is XtraBackup from renowned MySQL performance specialists Percona. It's an online backup solution for InnoDB. Haven't looked at it myself, though, so I won't vouch for its stability or that it's even a workable solution for your problem.

Concurrency handling using the filesystem VS an RDBMS (MySQL)

I'm building an English web dictionary where users can type in words and get definitions. I thought about this for a while, and since the data is 100% static and I only need to retrieve one word at a time, I figured I was better off using the filesystem (ext3) as the database instead of using MySQL to store definitions. I figured there would be less overhead, considering that you have to connect to MySQL and that in itself is a very slow operation.
My fear is that if my system were to get bombarded by let's say 500 word retrievals/sec, would I still be better off using the filesystem as the database? or will the increased filesystem reads hinder performance as opposed to something that MySQL might be doing under the hood?
Currently the hierarchy is segmented by first letter, second letter and third letter of the word. So if you were to search for the definition of "water", the script (PHP) will try to read from "../dict/w/a/t/water.word" (after cleaning up the word of problematic characters and lowercasing it)
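For readers following along, the path scheme described here can be sketched as below. The question's script is PHP; this is a Python illustration, and the cleanup rule (keep only a-z) and the handling of words shorter than three letters are assumptions, since neither is spelled out:

```python
import re
from pathlib import PurePosixPath

def word_path(word, root="../dict"):
    """Map a word to its file, e.g. "water" -> ../dict/w/a/t/water.word."""
    # Assumed cleanup: lowercase, then drop everything except a-z.
    cleaned = re.sub(r"[^a-z]", "", word.lower())
    if not cleaned:
        raise ValueError("no usable characters in word")
    # One directory level per leading letter, up to three levels;
    # shorter words simply get fewer levels (an assumption).
    levels = list(cleaned[:3])
    return str(PurePosixPath(root, *levels, cleaned + ".word"))

example = word_path("Water")
```

Restricting the cleaned word to a fixed alphabet also keeps user input from injecting path separators or ".." into the lookup.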
Am I heading in the right direction with this or is there a faster solution (not counting storing definitions in memory using something like memcached)? Will the amount of files stored in any directory factor in performance? What's the rough benchmark for the number of files that I should store in a directory?
What are your grounds for your belief that this decision will matter to the overall performance of the solution? What does it do other than provide definitions?
Do you have MySQL as part of the solution anyway, or would you need to add it should you select it as the solution here?
Where is the definitive source of definitions? The (maybe replicated) filesystem, or some off line DB?
It seems like something that should be in a DB architecturally - filesystems are a strange place to map a large number of names to values (as is evidenced by your file system structure breaking things down by initial letters)
If it's in the DB, answering questions like "how many definitions are there?" is a lot easier, but if you don't care about such things for your application, this may not matter.
So to some extent this feels like looking to hyper optimise the performance of something whose performance won't actually make much difference to the overall solution.
I'm a fan of "make it correct, then make it fast", and "correct" would be more straightforward to achieve with a DB.
And of course, the ultimate answer would be to try both and see which one works best in your situation.
Paul
The type of lookups that a dictionary requires is exactly what a database is good at. I think the filesystem method you describe will be unworkable. Don't make it hard! Use a Database.
You can keep a connection pool around to speed up connecting to the DB.
Also, if this application needs to scale to multiple servers, the file system may be tricky to share between servers.
So, I third the suggestion. Use a DB.
But unless it's a fabulously large dictionary, caching would mean you're nearly always getting stuff from local memory, so I don't think this is going to be the biggest issue for your application :)
A DB sounds perfect for your needs.
I also don't see why memcached is relevant (how big is your data? Can't be more than a few GB... right?)
The data is approximately a couple of GB. And my goal is speed, speed, speed (definitions will be loaded using XHR). The data, as I said, is static and is never going to change, and nowhere would I be using anything other than a single read operation for each request. So I'm having a pretty hard time getting convinced to use MySQL and all its bloat.
Which would be first to fail under high load using this strategy, the filesystem or MySQL? As for scaling replication is the answer since the data will never change and is only a couple of GBs.
Make it work first. Premature optimisation is bad.
Using a database enables easier refactoring of your schema, and you don't have to write an implementation of an index-based lookup, which in actual fact is nontrivial.
Saying that connecting to a database "is a very slow operation" overstates the problem. Actually connecting should not take very long, plus you can reuse connections anyway.
If you are worried about read-scaling, a 1G database is very small, so you can push readonly replicas of it to each web server and they can each read from their local copy. Provided the writes stay at a level which doesn't impact read performance, that gives you almost perfect read-scalability.
Moreover, 1G of data will fit into ram easily, so you can make it fast by loading the entire database into memory at startup time (before that node advertises itself to the load balancer).
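A rough sketch of the load-at-startup idea, in Python for illustration. An in-memory SQLite database stands in for the real one, and the `definitions` table, `load_definitions`, and `lookup` are names made up for the example:

```python
import sqlite3

def load_definitions(conn):
    """Read the whole table once; later lookups never touch the database."""
    # Each row is a (word, definition) pair, which dict() accepts directly.
    return dict(conn.execute("SELECT word, definition FROM definitions"))

source = sqlite3.connect(":memory:")  # stand-in for the real database
source.execute("CREATE TABLE definitions (word TEXT PRIMARY KEY, definition TEXT)")
source.executemany("INSERT INTO definitions VALUES (?, ?)",
                   [("water", "a clear liquid"), ("fire", "rapid oxidation")])
source.commit()

# Done once at process startup, before the node starts taking requests.
DEFINITIONS = load_definitions(source)

def lookup(word):
    """Return the definition, or None if the word is unknown."""
    return DEFINITIONS.get(word.lower())

result = lookup("Water")
```

Whether this beats a database that already keeps the table in its own cache is something to measure rather than assume.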
500 lookups per second is trivially small. I would start worrying about 5000 per second per server, maybe. If you can't achieve 5000 key lookups per second on modern hardware (from a database which fits in RAM?!!), there is something seriously wrong with your implementation.
Agreeing that this is premature optimization, and that MySQL surely will be performant enough for this use case. I must add you can also use a file based database, like the very fast Tokyo Cabinet as a compromise. Sadly it doesn't have a PHP binding so you could use its grandfather, DBM.
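To give a feel for the DBM-style option, here is a sketch using Python's standard dbm module rather than PHP. The file location and the sample entry are made up, and which on-disk format you get depends on what the platform provides:

```python
import dbm
import os
import tempfile

# Temporary location for the example; a real deployment would use a fixed path.
path = os.path.join(tempfile.mkdtemp(), "dictionary")

# Build step: "c" creates the file if needed. Keys and values are stored as bytes.
with dbm.open(path, "c") as db:
    db[b"water"] = "a clear liquid".encode("utf-8")

# Serving step: open read-only and do a single key lookup per request.
with dbm.open(path, "r") as db:
    definition = db.get(b"water")
    missing = db.get(b"nosuchword")  # None when the key is absent

text = definition.decode("utf-8")
```

This keeps the single-read-per-request shape the question wants, without a server process and without one file per word.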
That said, do not use a filesystem, there's no good reason to, as far as I can see.
Use a virtual Drive in your ram (google it for a how to for your distro) or if your data is provided by PHP use APC, memcache might work well with mysql. Personally I don't think the optimization you are doing here is really where you should be spending your time. 500 requests a second is massive, I think using mysql would give you better forward features for later. I think you need to concentrate on features and not speed if you want to differentiate yourself from your competitors. Also there are a few good talks about UI for the web, the server speed is only a small factor in the whole picture.
Good luck
You might also think about a no-sql database (like riak, mongo, or even redis) for something like this. They are all super-fast and help out with your replication. Mysql might be over-kill and hard-to-scale in an instance like this, but the other ones have some robust tools