Should I use SQLite instead of MySQL?

I need to improve a PHP-MySQL web application, which only uses MySQL for REPL operations (and some search functions). 99% of the applications I have worked with never used advanced MySQL features such as replication, cross-table constraints, locking, etc.
To my understanding I should instead use SQLite.
Are there any practical benefits if I do this?
Will I see a significant (>100ms) speed boost?
Should I expect problems with tables with more than 1,000,000 rows?

There is no catch-all answer to that, but there is one main point to consider: a very good rule of thumb is that the higher your degree of concurrency, the more you'll profit from MySQL, and vice versa.
This means that in a scenario where database requests are never concurrent, you might see a speedup by using SQLite, though I doubt it would be on the order of 100 ms.
The reason behind this is (very roughly):
In a database server environment, such as MySQL, PostgreSQL, MS SQL, Oracle and friends, a dedicated process (or group of processes) exclusively touches the database files - the important part being dedicated. This means that concurrency issues can be resolved in-process.
In a file-based database, such as SQLite, MS Access (Jet Engine) and friends, multiple processes will touch the DB files without knowing of each other - this implies that concurrency issues have to be resolved through locks written to the DB or helper file(s). This is typically much slower and less robust. In exchange, the overhead of communication between the database client (the web app) and the database engine, which runs in-process, is nonexistent.
Edit
After a comment, I want to make it clearer that I am talking about concurrent writes, not concurrent reads. Concurrent reads of an unchanging dataset are not a hard problem - they don't need any locking at all.
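To make that concrete, here is a minimal sketch (Python's built-in sqlite3 module; the demo.db file and the kv table are purely illustrative) of the coarse write locking described above: a second writer has to wait for the single database-wide write lock and may time out.

```python
# Two connections to the same SQLite file illustrate the coarse write locking:
# while one writer holds an open write transaction, a second writer blocks
# and eventually raises "database is locked".
import sqlite3

writer_a = sqlite3.connect("demo.db", timeout=1)
writer_a.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
writer_a.execute("BEGIN IMMEDIATE")             # take the write lock and hold it
writer_a.execute("INSERT OR REPLACE INTO kv VALUES ('a', '1')")

writer_b = sqlite3.connect("demo.db", timeout=1)
try:
    # The second writer cannot get the lock while the first transaction is open.
    writer_b.execute("BEGIN IMMEDIATE")
except sqlite3.OperationalError as exc:
    print("second writer blocked:", exc)        # typically 'database is locked'

writer_a.commit()                               # release the lock
writer_b.execute("BEGIN IMMEDIATE")             # now it succeeds
writer_b.execute("INSERT OR REPLACE INTO kv VALUES ('b', '2')")
writer_b.commit()
```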

The principal advantage of SQLite is that it is a file-based relational database that uses SQL as its query language. Being file-based tremendously simplifies deployment, making it very good for the case where an application needs a little database but must be run in an environment where having a database server would be problematic. (For example, many browsers use SQLite to manage their cookie stores; using a database server for that problem would be verging on the insane in many ways.)
The principal advantage of MySQL (with a sane table type) is that it is a database server that uses SQL as its query language. Being server-based allows for many features that a file-based system can't handle simply (such as replication) but does make things quite a bit more complex to deploy.
Whether the benefits of the additional complexity of a database server (e.g., MySQL) outweigh the costs (relative to a file-based database engine like SQLite) depends on a great many factors, notably including how many installations are expected and who is expected to perform those installations.
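As a small illustration of the deployment point (the file name and table are made up; Python's standard-library sqlite3 module is assumed, which ships with any normal Python install), this is the entire "installation" an SQLite-backed feature needs:

```python
# Minimal sketch of the "no server to deploy" point: the standard-library
# sqlite3 module is the whole database engine; the only artifact is one file.
import sqlite3

conn = sqlite3.connect("cookies.db")            # creates the file if it doesn't exist
conn.execute("""
    CREATE TABLE IF NOT EXISTS cookies (
        host  TEXT NOT NULL,
        name  TEXT NOT NULL,
        value TEXT NOT NULL,
        PRIMARY KEY (host, name)
    )
""")
conn.execute("INSERT OR REPLACE INTO cookies VALUES (?, ?, ?)",
             ("example.com", "session", "abc123"))
conn.commit()

for row in conn.execute("SELECT host, name, value FROM cookies"):
    print(row)
conn.close()
```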

Related

Sharding on MySQL vs PostgreSQL

It seems like most large companies that have to shard their databases choose MySQL over PostgreSQL. What are the major advantages that MySQL has over PostgreSQL when it comes to distributed databases? I don't see any major downside to Postgres that would prevent a successful implementation of sharding at the application level, but the sheer number of companies that choose MySQL over Postgres is giving me pause and making me wonder if I'm missing something.
PARTITIONing involves a single server; sharding involves many servers. They solve (or fail to solve) different problems. Partitioning has very few use cases that justify its existence; sharding provides write scaling at the cost of complexity.
MySQL has no built-in sharding capability. There are third-party packages that assist with it, but a large burden still falls on the DBA. (See Spider and various proxy servers.)
So, I see no reason why Postgres (or any other RDBMS) could not be sharded. After all, you do most of the work; each RDBMS instance sits on its own machine, not realizing that there are siblings holding other chunks of the data.
(Disclaimer: I am very familiar with MySQL, and not familiar with Postgres.)
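To illustrate what "you do most of the work" means in practice, here is a rough sketch of application-level sharding (the shard DSNs, the hashing scheme, and the users table are all assumptions, not a recommendation): the application routes each key to one of several otherwise independent database servers.

```python
# Sketch of application-level sharding: the routing logic lives in the
# application; each shard is an ordinary, standalone database that knows
# nothing about its siblings. DSNs, hash scheme, and table are illustrative.
import hashlib

SHARD_DSNS = [
    "mysql://app@db-shard-0/appdb",
    "mysql://app@db-shard-1/appdb",
    "mysql://app@db-shard-2/appdb",
]

def shard_for(user_id: int) -> str:
    """Pick a shard deterministically from the sharding key."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return SHARD_DSNS[int(digest, 16) % len(SHARD_DSNS)]

def fetch_user(user_id: int):
    dsn = shard_for(user_id)
    # connect(dsn) stands in for whatever client library you use; the RDBMS
    # on that host just sees a normal query against a normal table:
    # conn = connect(dsn)
    # return conn.execute("SELECT * FROM users WHERE id = %s", (user_id,))
    return dsn  # returned here only so the sketch runs without a database

print(fetch_user(42))   # e.g. mysql://app@db-shard-.../appdb
```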

Database for handling large data

We have started a new project using MySQL, Spring Boot, and AngularJS. Initially, we did not realize our DB was going to handle large amounts of data.
The number of tables will not be large (<130); only 10 to 20 tables will hold most of the data, which is constantly inserted, read, and updated.
The data in those 10 tables is estimated to grow by about 1,200,000 records per month, and we should not delete that data, so that we can produce various reports.
There needs to be a (read-only) replicated database as a backup/failover, and maybe for offloading reports at peak times.
I don't have first-hand experience with databases that large, so I'm asking those who do: which DB is the best choice in this situation? We have completed 100% of the coding and development, but only now do we realize this. I have doubts about whether MySQL can handle this much data. I know that Oracle is the safe bet, but I am interested in whether MySQL works with a similar setup. We are not bound to MySQL; I am OK with any DB, and based on your feedback I can make a call.
An open-source DB is preferable, but it's not mandatory; we can go for a paid DB as well.
Handling Large Data
MySQL is more than capable of handling such loads. In fact, it is capable of handling much, much more load than what you are talking about. You just have to create the right kind of tables. You can do that by choosing (a combined sketch follows this list):
the correct storage engine for your use-case
the correct character set
the optimal data type for your column
the right indexing strategy - creating indexes thoughtfully
the right partitioning strategy (if the data in the table exceeds tens of millions of records)
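To tie those points together, here is a hedged sketch of what such a table definition might look like (the table, columns, and monthly partitions are illustrative only, not a schema recommendation for your project):

```python
# Illustrative DDL combining the bullet points above: InnoDB engine, utf8mb4
# character set, compact column types, a secondary index for the common lookup,
# and range partitioning by month. Table and column names are made up.
ORDERS_DDL = """
CREATE TABLE orders (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED    NOT NULL,
    status      TINYINT         NOT NULL,
    amount      DECIMAL(10,2)   NOT NULL,
    created_at  DATETIME        NOT NULL,
    PRIMARY KEY (id, created_at),
    KEY idx_customer_created (customer_id, created_at)
)
ENGINE=InnoDB
DEFAULT CHARSET=utf8mb4
PARTITION BY RANGE (TO_DAYS(created_at)) (
    PARTITION p2023_01 VALUES LESS THAN (TO_DAYS('2023-02-01')),
    PARTITION p2023_02 VALUES LESS THAN (TO_DAYS('2023-03-01')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
);
"""

# Execution is left to whichever MySQL client library you use, e.g.:
# cursor.execute(ORDERS_DDL)
```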
EDIT: You've also got to choose the right kind of data modelling and normalization strategy for your use-case. Most OLTP applications require some level of normalization. But if you want to do analytics and aggregates on heavy tables, you should either have a data warehouse or have highly denormalized tables to avoid joins, and/or use a column-oriented database to support such queries.
MySQL is open-source and has very strong community support, so you will find a lot of literature around any issue that you face. You can also find all the filed bugs (resolved and unresolved) here.
As far as the number of tables is concerned, there's really no cap on that. See here: MySQL permits 4 billion tables if you're using InnoDB as the engine.
A lot of very big companies with scale use MySQL in some capacity. Facebook is one of them.
Native JSON Support
With the growing popularity of JSON as the de facto data exchange format across the internet, MySQL has also provided native JSON support in 5.7, so now you can store and query JSON from your APIs, if required.
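For illustration, here is a small sketch of how that can look from application code (the events table, the payload shape, and the generic DB-API cursor are assumptions about your setup):

```python
# Illustrative use of the JSON column type added in MySQL 5.7. The events
# table, payload shape, and the `cursor` object are placeholders.
import json

CREATE_EVENTS = """
CREATE TABLE events (
    id      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload JSON NOT NULL
) ENGINE=InnoDB
"""

INSERT_EVENT = "INSERT INTO events (payload) VALUES (%s)"

# Pull individual fields back out of the stored document server-side.
SELECT_LOGINS = """
SELECT id, JSON_UNQUOTE(JSON_EXTRACT(payload, '$.user')) AS user
FROM events
WHERE JSON_UNQUOTE(JSON_EXTRACT(payload, '$.type')) = 'login'
"""

def record_login(cursor, user):
    # `cursor` is any DB-API cursor from your MySQL driver of choice.
    cursor.execute(INSERT_EVENT, (json.dumps({"type": "login", "user": user}),))
```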
HA and Replication
MySQL replication works! Earlier, MySQL supported only coordinate-based (binlog file and position) replication, but now it supports GTID replication, which makes it easier to maintain and fix replication issues. There are also third-party replicators available on the market. For instance, Continuent's Tungsten is a replicator written in Java and is a replacement for native replication. It comes with a lot of configuration options which are not available with native MySQL replication.
I agree with MontyPython: MySQL can do it, and the design is critical. Fortunately, MySQL allows you to be flexible over time as needed.
I've had history tables used in daily reporting that grew to over a billion records in plain MySQL, with no problems.
I've also used MySQL MERGE tables to divide up tables with big-ish rows (100 KB+) to speed things up, basically keeping the individual merged table files under 30 GB each. However, that solution increases the open file count (in the system) per client, which might be a bigger deal on a clustered system; in that case it was not.
That said, I like to give Honorable Mention to:
MariaDB - MySQL but with contributions from Facebook, Alibaba, Google, and more.
I've moved most of my MySQL Community Edition projects over to MariaDB and have been very happy. It's an almost transparent upgrade.
They offer an interesting enterprise big data analytics package (MariaDB AX), but with your current requirements it's probably overkill and the standard community edition will fulfill your needs.
For example, here's an informative tutorial on how to set up a scalable Cluster (Galera) and adding MaxScale for High Availability:
https://mariadb.com/resources/blog/getting-started-mariadb-galera-and-mariadb-maxscale-centos
Another interesting option is Vitess - developed at YouTube - which allows for sharded MySQL through a (mostly) driver-based solution. It solves the problem of needing to keep huge amounts of data available while always yielding good performance. As such, it goes beyond high availability and focuses on a solution in which no single query (e.g., a report against millions of rows of historical data) can negatively impact the other queries that need to be performed.

SQLite faster than MySQL?

I want to set up a TeamSpeak 3 server. I can choose between SQLite and MySQL as the database. Well, I usually tend to follow "do not use SQLite in production". But on the other hand, it's a TeamSpeak server. Well okay, let me just google this... I found this:
Speed
SQLite3 is much faster than MySQL database. It's because file database is always faster than unix socket. When I requested edit of channel it took about 0.5-1 sec on MySQL database (127.0.0.1) and almost instantly (0.1 sec) on SQLite 3. [...]
http://forum.teamspeak.com/showthread.php/77126-SQLite-vs-MySQL-Answer-is-here
I don't want to start a SQLite vs MySQL debate. I just want to ask: is his argument even valid? I can't imagine that what he says is true. But unfortunately I'm not expert enough to answer this question myself.
Maybe the TeamSpeak devs have some major differences in their DB architecture between SQLite and MySQL which would explain a huge difference in speed (I can't imagine this).
At First, Access Time Will Appear Faster in SQLite
The access time for SQLite will appear faster at first, but this is with a small number of users online. SQLite uses a very simplistic access algorithm; it's fast but does not handle concurrency.
As the database starts to grow and the amount of simultaneous access increases, it will start to suffer. The way servers handle multiple requests is completely different and far more complex and optimized for high concurrency. For example, SQLite will lock the whole database if an update is going on, and queue the requests.
RDBMSs Do a Lot of Extra Work That Makes Them More Scalable
MySQL, for example, even with a single user, will create an access queue, lock tables partially instead of allowing only one execution at a time, and do other pretty complex work in order to make sure the database is still accessible to any other simultaneous access.
This makes a single-user connection slower, but it pays off in the future, when hundreds of users are online, and in that case the simple
"LOCK THE WHOLE DATABASE AND EXECUTE A SINGLE QUERY EACH TIME"
procedure of SQLite will hog the server.
SQLite is made for simplicity and self-contained database applications.
If you expect to have 10 simultaneous writers hitting the database at a time, SQLite may perform well, but you won't want a 100-user application that constantly writes and reads data using SQLite. It wasn't designed for such a scenario, and it will thrash resources.
Considering your TeamSpeak scenario, you are likely to be OK with SQLite; it is even OK for some businesses. Some websites only need databases that are read-only except when adding new content.
For this kind of use, SQLite is a cheap, easy-to-implement, self-contained, perfect solution that will get the job done.
The relevant difference is that SQLite uses a much simpler locking algorithm (a simple global database lock).
Using fine-grained locking (as MySQL and most other DB servers do) is much more complex, and slower if there is only a single database user, but required if you want to allow more concurrency.
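If you do end up on SQLite with a handful of concurrent writers, a commonly used mitigation, which goes beyond what the answer above covers and is only a sketch, is to enable write-ahead logging and set a busy timeout:

```python
# Sketch: WAL mode lets readers proceed while one writer is active, and a busy
# timeout makes a second writer wait instead of failing immediately. The file
# and table names are illustrative.
import sqlite3

conn = sqlite3.connect("teamspeak.db")
conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block on the writer
conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5 s for the write lock
conn.execute("CREATE TABLE IF NOT EXISTS channels (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO channels (name) VALUES (?)", ("Lobby",))
conn.commit()
```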
I have not personally tested SQLite vs MySQL, but it is easy to find examples on the web that say the opposite (for instance). You do ask a question that is not quite so religious: is that argument valid?
First, the essence of the argument is somewhat specious. A Unix socket would be used to communicate with a database server. A "file database" seems to refer to the fact that communication goes through a compiled-in interface. In the terminology of SQLite, it is server-less. Most databases store data in files, so the terminology "file database" is a little misleading.
Performance of a database involves multiple factors, such as:
Communication of query to the database.
Speed of compilation (ability to store pre-compiled queries is a plus here).
Speed of processing.
Ability to handle complex processing.
Compiler optimizations and execution engine algorithms.
Communication of results back to the application.
Having the interface be compiled-in affects the first and last of these. There is nothing that prevents a server-less database from excelling at the rest. However, database servers are typically millions of lines of code -- much larger than SQLite. A lot of this supports extra functionality. Some of it supports improved optimizations and better algorithms.
As with most performance questions, the answer is to test the systems yourself on your data in your environment. Being server-less is not an automatic performance gain. Having a server doesn't make a database "better". They are different applications designed for different optimization points.
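In that spirit, a rough sketch of such a test might look like this (the SQLite part runs as-is; the MySQL connection details and driver are assumptions you would fill in for your own environment):

```python
# Rough sketch of "test it yourself on your own data": time an identical
# workload against each engine through the same DB-API shape.
import sqlite3
import time

def time_workload(conn, paramstyle="?"):
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, val TEXT)")
    cur.executemany(f"INSERT INTO t (id, val) VALUES ({paramstyle}, {paramstyle})",
                    [(i, f"row-{i}") for i in range(10_000)])
    conn.commit()
    cur.execute("SELECT COUNT(*) FROM t")
    rows = cur.fetchall()
    return time.perf_counter() - start, rows

elapsed, _ = time_workload(sqlite3.connect(":memory:"))
print(f"sqlite: {elapsed:.3f}s")

# For MySQL you would pass a connection from your driver and its placeholder
# style instead, e.g. (connection arguments are placeholders):
# import mysql.connector
# elapsed, _ = time_workload(mysql.connector.connect(...), paramstyle="%s")
```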
In short:
For local application databases, single-user applications, and small, simple projects keeping little data, SQLite is the winner.
For networked database applications with many users and concurrency, load balancing, growing data, security and role-based authentication, big projects, and widely used services, you should choose MySQL.
As for your question, I do not know much about TeamSpeak servers and what kind of data they actually need to keep in the database, but if it just needs a local DBMS and does not need to handle a lot of concurrency and management, SQLite would be my choice.

Which RDBMSs minimize server roundtrips? Which RDBMSs are better (in this area) than MS SQL?

IMPORTANT NOTE: I received many answers and I thank you all. But all the answers are more comments than answers. My question is about the number of roundtrips per RDBMS. An experienced person told me that MySQL has fewer roundtrips than Firebird. I would like the answer to stay in the same area. I agree that this is not the first thing to consider, and there are many others (application design, network settings, protocol settings...), but anyway I'd like to receive an answer to my question, not a comment. By the way, I found the comments all very useful. Thanks.
When the latency is high ("when pinging the server takes time") the server roundtrips make the difference.
Now I don't want to focus on the roundtrips created in programming, but the roundtrips that occur "under the hood" in the DB engine+Protocol+DataAccessLayer.
I have been told that FireBird has more roundtrips than MySQL. But this is the only information I know.
I am currently supporting MS SQL but I'd like to change RDBMS, so to make a wise choice I would like to include also this point into "my RDBMS comparison feature matrix" to understand which is the best RDBMS to choose as an alternative to MS SQL.
So the bold sentence above would make me prefer MySQL to Firebird (for the roundtrip concept, not in general), but can anyone add more information?
And where does MS SQL stand? Is someone able to "rank" the roundtrip performance of the main RDBMSs, or at least of:
MS SQL, MySQL, PostgreSQL, Firebird (I am not interested in Oracle since it is not free, and if I have to change I would change to a free RDBMS).
Anyway, MySQL (as mentioned several times on Stack Overflow) has an unclear future and a not-100%-free license, so my final choice will probably fall on PostgreSQL or Firebird.
Additional info:
somehow you can answer my question by making a simple list like:
MSSQL:3;
MySQL:1;
Firebird:2;
Postgresql:2
(where 1 is good, 2 average, 3 bad). Of course, if you can post some links where the roundtrips of different RDBMSs are compared, that would be great.
Update:
I use Delphi and I plan to use DevArt DAC (UNIDAC), so somehow the "same" Data Access component is used, so if there are significant roundtrip differences they are due to the different RDBMS used.
Further update:
I have a 2-tier application (inserting a middle tier is not an option), so by choosing an RDBMS that is optimized "roundtrip-wise" I have a chance to further improve the performance of the application. This kind of "optimization" is like "buy a faster internet connection" or "put more memory on the server" or "upgrade the server CPUs". Anyway, those "optimizations" are important too.
Why are you concentrating on roundtrips? Normally they shouldn't affect your performance unless you have a very slow and unreliable network. For example, the difference between ODBC and OLE DB drivers for any database is nearly an order of magnitude in favor of OLE DB.
If you go to either MySQL or Firebird using ODBC instead of OLE DB/ADO.NET drivers, you incur an overhead several orders of magnitude greater than the roundtrips you might save.
How your application is coded and how and when data are accessed and transferred have a much greater impact in slow-connection or high-latency situations than the DB network protocol itself. Some database protocols may be tuned to work better in uncommon scenarios, e.g., by increasing or decreasing the data packet size.
You may also encounter slow down at the TCP/IP layer itself, which could require TCP/IP tuning as well.
Until v2.1, Firebird certainly created more traffic than MS SQL Server. I have a friend who developed an MSSQL client/server application here in Brazil where the DB is hosted in a datacenter. The client apps run from many stores, connecting directly to the server over VPN/Internet using end-user broadband connections (mostly 1 Mbps), for 5+ years with no trouble. The distances involved range from a few hundred to thousands of kilometers from the datacenter.
After v2.1, I can't say whether this remains true, because I haven't made a fair comparison since, and Firebird's remote protocol has been changed to optimize network traffic on slow connections. More on the FirebirdSQL site.
Can't say for Postgres or MySQL, since I haven't used either.
I can't give roundtrip details, but I was in a very similar situation a while back when I was trying to find alternatives to MS SQL due to budgeting. Myself and four others spent some time comparing MySQL, Postgres, and Firebird.
Having worked with MySQL for a long time, we quickly ruled it out for most of our larger projects. The decision fell between Postgres and Firebird. One thing we noticed right away was the lack of popular support/documentation for Firebird in contrast to Postgres. Our bench tests always had Postgres either on top or level with Firebird, never below. In terms of features, Postgres again answered our needs, while Firebird had us needing to come up with creative solutions.
Below is a feature comparison chart. I'll admit it is now a bit dated but still very helpful:
Here is also a long forum thread discussing the differences.
Good luck!
Sometimes the "roundtrips" are also in the protocol or data access layer, not the "DB engine".
I will not rank the client-server DBMSs by roundtrips. There are a lot of options that can make one DBMS the best (ask SQL Server to use the default cursor) and another the worst (create an Oracle cursor with nested datasets).
What you are probably looking for is a general approach oriented toward minimizing traffic and letting the client work independently of the server. That is what middle-tier data access libraries provide.
So, if your application is that sensitive to traffic optimization, then look for libraries such as DataAbstract, kbmMW, or ThinDAC.

MySQL vs PostgreSQL? Which should I choose for my Django project?

My Django project is going to be backed by a large database with several hundred thousand entries, and will need to support searching (I'll probably end up using djangosearch or a similar project.)
Which database backend is best suited to my project and why? Can you recommend any good resources for further reading?
For whatever it's worth, the creators of Django recommend PostgreSQL.
If you're not tied to any legacy system and have the freedom to choose a database back-end, we recommend PostgreSQL, which achieves a fine balance between cost, features, speed and stability. (The Definitive Guide to Django, p. 15)
As someone who recently switched a project from MySQL to Postgresql I don't regret the switch.
The main difference, from a Django point of view, is more rigorous constraint checking in Postgresql, which is a good thing, and also it's a bit more tedious to do manual schema changes (aka migrations).
There are probably 6 or so Django database migration applications out there and at least one doesn't support Postgresql. I don't consider this a disadvantage though because you can use one of the others or do them manually (which is what I prefer atm).
Full text search might be better supported for MySQL. MySQL has built-in full text search supported from within Django but it's pretty useless (no word stemming, phrase searching, etc.). I've used django-sphinx as a better option for full text searching in MySQL.
Full text searching is built-in with Postgresql 8.3 (earlier versions need TSearch module). Here's a good instructional blog post: Full-text searching in Django with PostgreSQL and tsearch2
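For completeness, newer Django releases (1.10 and later) expose PostgreSQL full-text search directly through the ORM via django.contrib.postgres; a minimal sketch, where the Post model and its fields are hypothetical:

```python
# Hedged sketch: PostgreSQL full-text search from the Django ORM
# (requires 'django.contrib.postgres' in INSTALLED_APPS and a Postgres backend).
from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector
from myapp.models import Post  # hypothetical model with title/body text fields

vector = SearchVector("title", weight="A") + SearchVector("body", weight="B")
query = SearchQuery("database replication")

results = (
    Post.objects
    .annotate(rank=SearchRank(vector, query))  # rank matches by relevance
    .filter(rank__gte=0.1)
    .order_by("-rank")
)
```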
large database with several hundred thousand entries
This is not a large database; it's a very small one.
I'd choose PostgreSQL, because it has a lot more features. Most significant in this case: in PostgreSQL you can use Python as a procedural language.
Go with whichever you're more familiar with. MySQL vs PostgreSQL is an endless war. Both of them are excellent database engines and both are being used by major sites. It really doesn't matter in practice.
All the answers bring interesting information to the table, but some are a little outdated, so here's my grain of salt.
As of 1.7, migrations are now an integral feature of Django. So they documented the main differences that Django developers might want to know beforehand.
Backend Support
Migrations are supported on all backends that Django ships with, as well as any third-party backends if they have programmed in support for schema alteration (done via the SchemaEditor class).
However, some databases are more capable than others when it comes to schema migrations; some of the caveats are covered below.
PostgreSQL
PostgreSQL is the most capable of all the databases here in terms of schema support.
MySQL
MySQL lacks support for transactions around schema alteration operations, meaning that if a migration fails to apply you will have to manually unpick the changes in order to try again (it’s impossible to roll back to an earlier point).
In addition, MySQL will fully rewrite tables for almost every schema operation and generally takes a time proportional to the number of rows in the table to add or remove columns. On slower hardware this can be worse than a minute per million rows - adding a few columns to a table with just a few million rows could lock your site up for over ten minutes.
Finally, MySQL has relatively small limits on name lengths for columns, tables and indexes, as well as a limit on the combined size of all columns an index covers. This means that indexes that are possible on other backends will fail to be created under MySQL.
SQLite
SQLite has very little built-in schema alteration support, and so Django attempts to emulate it by:
Creating a new table with the new schema
Copying the data across
Dropping the old table
Renaming the new table to match the original name
This process generally works well, but it can be slow and occasionally buggy. It is not recommended that you run and migrate SQLite in a production environment unless you are very aware of the risks and its limitations; the support Django ships with is designed to allow developers to use SQLite on their local machines to develop less complex Django projects without the need for a full database.
Even if PostgreSQL looks better, I find it has some performance issues with Django:
PostgreSQL is made to handle "long connections" (connection pooling, persistent connections, etc.).
MySQL is made to handle "short connections" (connect, do your queries, disconnect; it has some performance issues with a lot of open connections).
The problem is that Django does not support connection pooling or persistent connections; it has to connect/disconnect to the database on each view call.
It will work with PostgreSQL, but connecting to PostgreSQL costs a LOT more than connecting to a MySQL database (in PostgreSQL, each connection gets its own process, which is a lot slower than just spawning a new thread as MySQL does).
Then you get some features like the query cache that can be really useful in some cases (but you lose the superb text search of PostgreSQL).
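For what it's worth, later Django releases (1.6+) added a CONN_MAX_AGE setting that keeps database connections open across requests, which softens the reconnect cost described above; a settings sketch with placeholder credentials:

```python
# settings.py sketch: CONN_MAX_AGE keeps each database connection alive across
# requests instead of reconnecting on every view call. All credentials and
# host details below are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "mydb",
        "USER": "myuser",
        "PASSWORD": "secret",
        "HOST": "127.0.0.1",
        "PORT": "5432",
        "CONN_MAX_AGE": 600,   # keep each connection for up to 10 minutes
    }
}
```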
When a migration fails in django-south, the developers encourage you not to use MySQL:
! The South developers regret this has happened, and would
! like to gently persuade you to consider a slightly
! easier-to-deal-with DBMS (one that supports DDL transactions)
Having gone down the road of MySQL because I was familiar with it (struggling to find a proper installer, and a quick test of the slow web "workbench" interface of PostgreSQL, put me off), at the end of the project, a few months after deployment, while looking into backup options, I found out you have to pay for MySQL's enterprise backup features. Gotcha, right at the very end.
With MySQL I had to write some ugly, monstrous raw SQL queries in Django because there is no "select distinct per group" for retrieving the latest row per group. I'm also looking at PostgreSQL's full-text search and wishing I had used PostgreSQL.
I recommend PostgreSQL even if you are familiar with MySQL, but your mileage may vary.
UPDATE: DBeaver is a great equivalent of the MySQL Workbench GUI tool, but it works with PostgreSQL very nicely (and many others, as it's a universal DB tool).
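For reference, the "latest row per group" query mentioned above maps to PostgreSQL's DISTINCT ON, which the Django ORM exposes (on PostgreSQL only) as .distinct(*fields); a sketch with a hypothetical Reading model:

```python
# Sketch of "latest row per group" using PostgreSQL's DISTINCT ON through the
# Django ORM. The Reading model (sensor_id, taken_at, value) is hypothetical.
from myapp.models import Reading

latest_per_sensor = (
    Reading.objects
    .order_by("sensor_id", "-taken_at")   # must start with the DISTINCT ON field
    .distinct("sensor_id")                # one row per sensor: the newest reading
)

# Roughly equivalent raw SQL:
# SELECT DISTINCT ON (sensor_id) * FROM myapp_reading
# ORDER BY sensor_id, taken_at DESC;
```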
To add to previous answers:
"Full text search might be better supported for MySQL"
The FULLTEXT index in MySQL is a joke.
It only works with MyISAM tables, so you lose ACID, Transactions, Constraints, Relations, Durability, Concurrency, etc.
An INSERT/UPDATE/DELETE on a largish TEXT column (like a forum post) will rebuild a large part of the index. If it does not fit in the MyISAM key buffer, heavy IO will occur. I've seen a single forum post insertion trigger 100 MB or more of IO... meanwhile the posts table is exclusively locked!
I did some benchmarking (3 years ago, so it may be stale...) which showed that on large datasets Postgres full-text search is basically 10-100x faster than MySQL's, and Xapian is 10-100x faster than Postgres (but not integrated).
Other reasons not mentioned are the extremely smart query optimizer, the large choice of join types (merge, hash, etc.), hash aggregation, GiST indexes on arrays, spatial search, etc., which can result in extremely fast plans on very complicated queries.
Will this application be hosted on your own servers or by a hosting company? Make sure that if you are using a hosting company, they support the database of choice.
There is a major licensing difference between the two DBs that will affect you if you ever intend to distribute code using the DB. MySQL's client libraries are GPL, while PostgreSQL's are under a BSD-like license, which might be easier to work with.