MySQL vs SQLite + UNIQUE Indexes

For reasons that are irrelevant to this question, I'll need to run several SQLite databases instead of the more common MySQL for some of my projects. I would like to know how SQLite compares to MySQL in terms of speed and performance regarding disk I/O (the database will be hosted on a USB 2.0 pen drive).
I've read the Database Speed Comparison page at http://www.sqlite.org/speed.html and I must say I was surprised by SQLite's performance, but since those benchmarks are a bit old I was looking for a more up-to-date comparison (SQLite 3 vs MySQL 5). Again, my main concern is disk performance, not CPU/RAM.
Also, since I don't have that much experience with SQLite, I'm curious whether it has anything similar to the TRIGGER (on update, delete) events of MySQL's InnoDB engine. I also couldn't find any way to declare a field as UNIQUE the way MySQL allows, only PRIMARY KEY - is there anything I'm missing?
As a final question I would like to know if a good (preferably free or open source) SQLite database manager exists.

A few questions in there:
In terms of disk I/O limits, I wouldn't imagine that the database engine makes a lot of difference. There might be a few small things, but I think it's mostly just whether the database can read/write data as fast as your application wants it to. Since you'd be using the same amount of data with either MySQL or SQLite, I'd think it won't change much.
SQLite does support triggers: CREATE TRIGGER Syntax
SQLite does support UNIQUE constraints: column constraint definition syntax.
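For reference, here is a minimal SQLite sketch of both features; the table and trigger names are made up for illustration and are not from the question.

```sql
-- Hypothetical schema: a column-level UNIQUE constraint plus an
-- AFTER DELETE trigger that records deletions in an audit table.
CREATE TABLE users (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE          -- the UNIQUE declaration asked about
);

CREATE TABLE users_audit (
    user_id    INTEGER,
    deleted_at TEXT
);

CREATE TRIGGER users_on_delete
AFTER DELETE ON users
BEGIN
    INSERT INTO users_audit (user_id, deleted_at)
    VALUES (OLD.id, datetime('now'));
END;
```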
To manage my SQLite databases, I use the Firefox Add-on SQLite Manager. It's quite good, does everything I want it to.

In terms of disk I/O limits, I wouldn't imagine that the database engine makes a lot of difference.
In MySQL/MyISAM the data is stored unordered, so range reads on the primary key will theoretically need to issue several HDD seek operations.
In MySQL/InnoDB the data is sorted by primary key, so range reads on the primary key can (in theory) be done with a single disk seek.
To sum that up:
MyISAM - data is written to disk unordered. Primary-key range reads are slow if the primary key is not an auto-increment field.
InnoDB - data is ordered, which is bad for flash drives (data needs to be re-ordered after inserts, meaning additional writes). Very fast for primary-key range reads, slow for writes.
InnoDB is not well suited to flash memory: seeks are already very fast there (so you don't gain much from keeping the data ordered), and the additional writes needed to maintain that order wear the flash out.
MyISAM vs InnoDB makes a huge difference for both conventional and flash drives (I don't know about SQLite), but I'd rather use MySQL/MyISAM here.

I actually prefer using SQLiteSpy http://www.portablefreeware.com/?id=1165 as my SQLite interface.
It supports things like REGEXP which can come in handy.

Improving MySQL performance on RDS by partitioning

I am trying to improve a performance of some large tables (can be millions of records) in a MySQL 8.0.20 DB on RDS.
Scaling up DB instance and IOPS is not the way to go, as it is very expensive (the DB is live 24/7).
Proper indexes (including composite ones) do already exist to improve the query performance.
The DB is mostly read-heavy, with occasional massive writes - when these writes happen, reads can be just as massive at the same time.
I thought about doing partitioning. Since MySQL doesn't support vertical partitioning, I considered doing horizontal partitioning - which should work very well for these large tables, as they contain activity records from dozens/hundreds of accounts, and storing each account's records in a separate partition makes a lot of sense to me.
But these tables do contain some constraints with foreign keys, which rules out using MySQL's horizontal partitioning: Restrictions and Limitations on Partitioning
Foreign keys not supported for partitioned InnoDB tables. Partitioned tables using the InnoDB storage engine do not support foreign keys. More specifically, this means that the following two statements are true:
No definition of an InnoDB table employing user-defined partitioning may contain foreign key references; no InnoDB table whose definition contains foreign key references may be partitioned.
No InnoDB table definition may contain a foreign key reference to a user-partitioned table; no InnoDB table with user-defined partitioning may contain columns referenced by foreign keys.
What are my options, other than doing "sharding" by using separate tables to store activity records on a per account basis? That would require a big code change to accommodate such tables. Hopefully there is a better way, that would only require changes in MySQL, and not the application code. If the code needs to be changed - the less the better :)
storing each account's records in a separate partition makes a lot of sense to me
Instead, have the PRIMARY KEY start with acct_id. This provides performance at least as good as PARTITION BY acct_id, saves disk space, and "clusters" an account's data together for "locality of reference".
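As a sketch of what that looks like (the table and column names below are hypothetical, not from the question):

```sql
-- Cluster each account's rows together by leading the PRIMARY KEY with acct_id.
CREATE TABLE activity (
    acct_id     INT UNSIGNED NOT NULL,
    activity_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    created_at  DATETIME NOT NULL,
    PRIMARY KEY (acct_id, activity_id),  -- InnoDB clusters rows by this key
    KEY (activity_id)                    -- AUTO_INCREMENT must lead some index
) ENGINE=InnoDB;

-- A per-account range read now scans one contiguous slice of the clustered index:
-- SELECT ... FROM activity WHERE acct_id = 123 AND activity_id > 456;
```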
The DB is mostly read-heavy
Replicas allow 'infinite' scaling of reads. But if you are not overloading the single machine now, there may be no need for this.
with occasional massive writes
Let's discuss techniques to help with that. Please explain what those writes entail -- hourly/daily/sporadic? replace random rows / whole table / etc? keyed off what? Etc.
Proper indexes (including composite ones) do already exist to improve the query performance.
Use the slowlog (with long_query_time = 1 or lower) to verify. Use pt-query-digest to find the top one or two queries. Show them to us -- we can help you "think out of the box".
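As a minimal sketch of turning the slow log on (values are illustrative; on RDS these variables are normally set through the DB parameter group rather than with SET GLOBAL):

```sql
SET GLOBAL slow_query_log  = 'ON';
SET GLOBAL long_query_time = 1;              -- seconds; lower it to catch more queries
SHOW VARIABLES LIKE 'slow_query_log_file';   -- where the log is written
-- Then feed that file to pt-query-digest to rank the worst queries.
```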
read-heavy
Is the working set size less than innodb_buffer_pool_size? That is, are you CPU-bound and not I/O-bound?
More on PARTITION
PRIMARY KEY(acct_id, ..some other columns..) orders the data primarily on acct_id and makes this efficient: WHERE acct_id=123 AND ....
PARTITION BY .. (acct_id) -- A PARTITION is implemented as a separate "table". "Partition pruning" is the act of deciding which partition(s) are needed for the query. So WHERE acct_id=123 AND ... will first do that pruning, then look for the row(s) in that "table" to handle the AND .... Hopefully, there is a good index (perhaps the PRIMARY KEY) to handle that part of the filtering.
The pruning sort of takes the place of one level of the BTree. It is hard to predict which will be slower or faster.
Note that when partitioning by, say, acct_id, it is usually not efficient to start the index with that column. (However, it would still need to appear later in the PK.)
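For comparison, a sketch of the partitioned variant (hypothetical names again; note that the question's foreign keys would have to be dropped first, since partitioned InnoDB tables do not support them):

```sql
CREATE TABLE activity_part (
    acct_id     INT UNSIGNED NOT NULL,
    activity_id BIGINT UNSIGNED NOT NULL,
    created_at  DATETIME NOT NULL,
    PRIMARY KEY (acct_id, activity_id)
) ENGINE=InnoDB
PARTITION BY HASH (acct_id)
PARTITIONS 16;

-- EXPLAIN shows which partitions survive pruning for a per-account query:
EXPLAIN SELECT * FROM activity_part WHERE acct_id = 123 AND activity_id > 456;
```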
Big Deletes
There are several ways to do a "big delete" while minimizing the impact on the system. Partitioning by date is optimal but does not sound viable for your type of data. Check out the others listed here: http://mysql.rjweb.org/doc.php/deletebig
Since you say that the deletion is usually less than 15%, the "copy over what needs to be kept" technique is not applicable either.
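One of the techniques listed on that page, chunked deleting, looks roughly like this (a sketch with hypothetical names, not tailored to the question's schema):

```sql
-- Delete old rows in small batches to avoid long locks and a huge undo log;
-- repeat the statement until ROW_COUNT() reports 0 rows affected.
DELETE FROM activity
WHERE created_at < '2020-01-01'
LIMIT 1000;
```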
Before sharding or partitioning, first analyze your queries to make sure they are as optimized as you can make them. This usually means designing indexes specifically to support the queries you run. You might like my presentation How to Design Indexes, Really (video).
Partitioning isn't as much a solution as people think. It has many restrictions, including the foreign key issue you found. Besides that, it only improves queries that can take advantage of partition pruning.
Also, I've done a lot of benchmarking of Amazon RDS for my current job and also a previous job. RDS is slow. It's really slow. It uses remote EBS storage, so it's bound to incur overhead for every read from storage or write to storage. RDS is just not suitable for any application that needs high performance.
Amazon Aurora is significantly better on latency and throughput. But it's also very expensive. The more you use it, the more I/O requests you consume, and they charge extra for those. For a busy app, you end up spending as much as you did for RDS with high provisioned IOPS.
The only way I found to get high performance in the cloud is to forget about managed databases like RDS and Aurora, and instead install and run your own instance of MySQL on an ec2 instance with locally-attached NVMe storage. This means the i3 family of ec2 instances. But local storage is ephemeral instance storage, so if the instance restarts, you lose your data. So you must add one or more replicas and have a failover plan.
If you need an OLTP database in the cloud, and you also need top-tier performance, you either have to spend $$$ for a managed database, or else you need to hire full-time DevOps and DBA staff to run it.
Sorry to give you the bad news, but the TANSTAAFL adage remains true.

Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows?

So, one of my tables in MySQL, which uses the InnoDB storage engine, will contain multiple billions of rows (with potentially no limit to how many will be inserted).
Can you tell me what sort of optimizations I can do to help speed things up?
Because with just a few million rows it already starts getting slow.
Of course, you could suggest using something else; the only options I have are PostgreSQL and SQLite3, but I've been told that SQLite3 is not a good choice for this.
As for PostgreSQL, I have absolutely no idea how it performs, as I've never used it.
I expect at least about 1,000-1,500 inserts per second into that table, though.
A simple answer to your question would be yes, InnoDB would be the perfect choice for a multi-billion row data set.
There is a host of optimizations possible.
The most obvious optimization is setting a large buffer pool; the buffer pool is the single most important thing for InnoDB, because it caches both data and indexes. If you have a dedicated MySQL server with only InnoDB tables, you can give InnoDB up to 80% of the available RAM.
Another important optimization is having proper indexes on the table (keeping the data access/update pattern in mind), both primary and secondary. (Remember that the primary key columns are automatically appended to every secondary index in InnoDB.)
With InnoDB there are some extra goodies, such as protection from data corruption, auto-recovery etc.
As for increasing write performance, you should set up your transaction (redo) log files to total up to 4G.
One other thing that you can do is partition the table.
You can eke out more performance by setting binlog_format to ROW and innodb_autoinc_lock_mode to 2 (which ensures that InnoDB does not hold table-level locks when inserting into auto-increment columns).
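For reference, a quick way to see where those settings currently stand (the values in the comments are the suggestions above, not defaults; in MySQL 5.5 most of these are not dynamic and have to be changed in my.cnf followed by a restart):

```sql
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';   -- aim for up to ~80% of RAM on a dedicated box
SHOW VARIABLES LIKE 'innodb_log_file_size';      -- total redo log of up to ~4G
SHOW VARIABLES LIKE 'binlog_format';             -- ROW
SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';  -- 2 = interleaved, no AUTO-INC table locks
```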
If you need any specific advice you can contact me, I would be more than willing to help.
optimizations
Take care not to have too many indexes; they are expensive when inserting.
Make your datatypes fit your data as tightly as you can (so don't go saving IP addresses in a TEXT or BLOB column, if you know what I mean). Look into VARCHAR vs CHAR: VARCHAR is more flexible, but you are trading something in for that flexibility. If you know a lot about your data it might help to use CHAR, or it might be clearly better to use VARCHAR, etc.
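As one concrete example of a tight datatype choice (table name hypothetical), IPv4 addresses fit in a 4-byte INT UNSIGNED instead of a string column:

```sql
CREATE TABLE access_log (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    ip INT UNSIGNED NOT NULL            -- 4 bytes vs. up to 15 characters as text
);

INSERT INTO access_log (ip) VALUES (INET_ATON('192.0.2.10'));  -- store as integer
SELECT INET_NTOA(ip) FROM access_log;                          -- read back as dotted quad
```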
Do you read at all from this table? If so, you might want to do all the reading from a replicated slave, although your connection should be good enough for that amount of data.
If you have big inserts (aside from the number of inserts), make sure your IO is actually quick enough to handle the load.
I don't think there is any reason MySQL wouldn't support this. Things that can slow you down going from "thousands" to "millions" to "billions" of rows are things like the aforementioned indexes. As far as I know, there is no "MySQL is full" problem.
Look into partial indexes. From Wikipedia (quickest source I could find; I didn't check the references, but I'm sure you can manage):
MySQL as of version 5.4 does not support partial indexes.[3] In MySQL, the term "partial index" is sometimes used to refer to prefix indexes, where only a truncated prefix of each value is stored in the index. This is another technique for reducing index size.[4]
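A prefix index, in the sense the quote describes, looks like this (a sketch with hypothetical table and column names):

```sql
CREATE TABLE pages (
    id  BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    url VARCHAR(2048) NOT NULL
);

-- Index only the first 16 characters of url instead of the whole value,
-- trading some selectivity for a much smaller index.
CREATE INDEX idx_url_prefix ON pages (url(16));
```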
No idea on the MySQL/InnoDB part (I'd assume it'll cope). But if you end up looking at alternatives, PostgreSQL can manage a DB of unlimited size on paper. (At least one 32TB database exists according to the FAQ.)
Can you tell me what sort of optimizations i can do to help speed up things?
Your mileage will vary depending on your application. But with billions of rows, you're at least looking into partitioning your data in order to work on smaller tables.
In the case of PostgreSQL, you'd also look into creating partial indexes where appropriate.
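A PostgreSQL partial index, for the record, is an index restricted by a WHERE clause (sketch only; the table and predicate are made up):

```sql
CREATE TABLE orders (
    id         BIGSERIAL PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL
);

-- Index only the rows the hot queries actually touch.
CREATE INDEX idx_orders_open ON orders (created_at) WHERE status = 'open';
```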
You may want to have a look at:
http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/
http://forums.whirlpool.net.au/archive/954126
If you have a very large table (billions of records) and need to data-mine it (queries that read lots of data), MySQL can slow to a crawl.
Large databases (200+ GB) are fine, but they become bound by I/O, by temp tables spilling to disk, and by various other issues when trying to read large groups of rows that don't fit in memory.

Best Table Engine for massively updated MySQL table. MyISAM or HEAP?

I am creating an application which will store a (semi) real-time feed from a few different scales around a certain location. The weights from each scale will be put in a table with only as many rows as there are scales. The scale app feeds the MySQL database a new weight every second, which a PHP web app reads every 3 seconds. That doesn't seem like enough traffic to hit the disk very hard, and the difference may well be negligible, but I'm wondering whether it would be more efficient, or make more sense, to use a MEMORY/HEAP table instead of a normal MyISAM table.
With anything from hundreds to thousands of concurrent read/write requests (think typical OLTP usage), InnoDB will outperform MyISAM hands down.
It's not about other people's observations, and it's not about transactional/ACID support; it's about the architecture of InnoDB, which is far superior to that of the legacy MyISAM engine.
For example, InnoDB supports clustered primary key indexes: http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html.
Additionally, InnoDB has row-level locking, which is far more performant under concurrent load than MyISAM's table-level locking.
I could keep going, but someone has already provided a really good summary of why InnoDB is a better choice for OLTP: http://tag1consulting.com/MySQL_Engines_MyISAM_vs_InnoDB
Well, if you're expecting a large amount of data, I think you almost have to go MyISAM. You'll likely run out of memory if you store it all in a memory table. Not to mention that you'll lose all of your data upon power loss with a HEAP engine (Keep in mind, you may want that depending on your use case)...
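For comparison, the two options being discussed look like this (the schema is a guess at the question's scale feed, purely illustrative):

```sql
CREATE TABLE scale_weights_mem (
    scale_id INT NOT NULL PRIMARY KEY,
    weight   DECIMAL(10,2) NOT NULL,
    read_at  TIMESTAMP NOT NULL
) ENGINE=MEMORY;     -- fast, but contents are lost on restart or power loss

CREATE TABLE scale_weights_disk (
    scale_id INT NOT NULL PRIMARY KEY,
    weight   DECIMAL(10,2) NOT NULL,
    read_at  TIMESTAMP NOT NULL
) ENGINE=MyISAM;     -- persistent, but uses table-level locking
```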
I know that this question is getting dated and you've probably arrived at a very good solution by now, but I just wanted to point out to anyone who may be reading this that perhaps a relational database isn't the best way to solve this problem. To me this clearly looks like a case where a flat-file database is the ideal solution. You could save yourself a ton of overhead by just writing these values out to a binary file and then using simple arithmetic to locate the rows and fields you need.

Converting MyISAM to InnoDB. Beneficial? Consequences?

We're running a social networking site that logs every member's action (including visiting other member's pages); this involves a lot of writes to the db. These actions are stored in a MyISAM table and since something is starting to tax the CPU, my first thought was that it's the table locking of MyISAM that is causing this stress on the CPU.
There are only reads and writes, no updates to this table. I think the balance between reads and writes is about 50/50 for this table, would InnoDB therefore be a better option?
If I want to change the table to InnoDB and we don't use foreign key constraints, transactions or fulltext indexes - do I need to worry about anything?
Notwithstanding any benefits/drawbacks of its use, which are discussed in other threads (MyISAM versus InnoDB), migration is a nontrivial process.
Consider
Functionally testing all components which talk to the database if possible - different engines have different semantics
Running as much performance testing as you can - some things may improve, others may be much worse. A well-known example is SELECT COUNT(*) on a large table.
Checking that all your code will handle deadlocks gracefully - you can get them without explicit use of transactions
Estimate how much the space usage will change by converting - test this in a non-production environment.
You will doubtless need to change things in a large software platform; this is ok, but seeing as you (hopefully) have a lot of auto-test coverage, change should be acceptable.
PS: If "Something is starting to tax the CPU", then you should a) Find out what, in a non-production environment, b) Try various options to reduce it, in a non-production environment. You should not blindly start doing major things like changing database engines when you haven't fully analysed the problem.
All performance testing should be done in a non-production environment, with production-like data and on production-grade hardware. Otherwise it is difficult to interpret results correctly.
With regards to other potential migration problems:
1) Space - InnoDB tables often require more disk space, though the Barracuda file format in newer versions of InnoDB has narrowed the difference. You can get a sense for this by converting a recent backup of the tables and comparing the size (see the sketch after this list). Use "show table status" to compare the data length.
2) Full text search - only on MyISAM
3) GIS/Spatial datatypes - only on MyISAM
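A rough sketch of that dry run, on a restored copy of the table (the table names are hypothetical):

```sql
-- Convert the copy and compare its footprint with the original MyISAM table.
ALTER TABLE actions_copy ENGINE=InnoDB;

SELECT table_name, engine,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name IN ('actions', 'actions_copy');
```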
On performance, as the other answers and the referenced answer indicate, it depends on your workload. MyISAM is much faster for full table scans. InnoDB tends to be much faster for highly concurrent access. InnoDB can also be much faster if your lookups are based on the primary key.
Another performance issue is that MyISAM can always keep an exact row count, since it only does table-level locking. So, if you frequently ask for the row count of a very large table, it may be much slower with InnoDB. Search the Internet if you need a workaround for this, as I've seen several proposed.
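One commonly proposed workaround, sketched here with hypothetical names (this is not from the answer above), is to maintain a counter row yourself:

```sql
CREATE TABLE row_counts (
    table_name VARCHAR(64) NOT NULL PRIMARY KEY,
    row_count  BIGINT NOT NULL
) ENGINE=InnoDB;

-- Run this in the same transaction as each insert into the big table
-- (and the mirror-image decrement alongside each delete):
UPDATE row_counts SET row_count = row_count + 1 WHERE table_name = 'actions';
```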
Depending on the size of the table(s), you may also need to update your MySQL config file. At the very least, you may want to shift bytes from key_buffer to innodb_buffer_pool_size. You won't get a fair comparison if you leave the database as being optimized for MyISAM. Read up on all the innodb_* configuration properties.
I think it's quite possible that switching to InnoDB would improve performance, but In my experience, you can't really be sure until you try it. If I were you, I would set up a test environment on the same server, convert to InnoDB and run a benchmark.
From my experience, MyISAM tables are only useful for text indexing where you need good performance with searches on big text, but you still don't need a full fledged search engine like Solr or ElasticSearch.
If you want to switch to InnoDB but want to keep indexing your text in a MyISAM table, I suggest you take a look at this: http://blog.lavoie.sl/2013/05/converting-myisam-to-innodb-keeping-fulltext.html
Also: InnoDB supports live atomic backups using innobackupex from Percona. This is a godsend when dealing with production servers.

Will a MySQL table with 20,000,000 records be fast with concurrent access?

I ran a lookup test against an indexed MySQL table containing 20,000,000 records, and according to my results, it takes 0.004 seconds to retrieve a record given an id--even when joining against another table containing 4,000 records. This was on a 3GHz dual-core machine, with only one user (me) accessing the database. Writes were also fast, as this table took under ten minutes to create all 20,000,000 records.
Assuming my test was accurate, can I expect performance to be as snappy on a production server with, say, 200 users concurrently reading from and writing to this table?
I assume InnoDB would be best?
That depends on the storage engine you're going to use and what's the read/write ratio.
InnoDB will be better if there are lots of writes. If it's mostly reads with a very occasional write, MyISAM might be faster. MyISAM uses table-level locking, so it locks up the whole table whenever you need to update. InnoDB uses row-level locking, so you can have concurrent updates on different rows.
InnoDB is definitely safer, so I'd stick with it anyhow.
BTW. remember that right now RAM is very cheap, so buy a lot.
Depends on any number of factors:
Server hardware (Especially RAM)
Server configuration
Data size
Number of indexes and index size
Storage engine
Writer/reader ratio
I wouldn't expect it to scale that well. More importantly, this kind of thing is too important to speculate about. Benchmark it and see for yourself.
Regarding the storage engine, I wouldn't dare to use anything but InnoDB for a table of that size that is both read and written to. If you run any write query that isn't a primitive insert or single-row update, MyISAM will end up locking the whole table, which yields terrible performance.
There's no reason that MySQL couldn't handle that kind of load without any significant issues. There are a number of other variables involved, though (otherwise it's a 'how long is a piece of string' question). Personally, I've had a number of tables in various databases that are well beyond that range.
How large is each record (on average)?
How much RAM does the database server have, and how much of it is allocated to the various MySQL/InnoDB settings?
A default configuration may only allow for an 8MB buffer between disk and client (which might work fine for a single user), but trying to fit a 6GB+ database through that is doomed to failure. That problem was real, by the way, and was causing several crashes a day of a database/website until I was brought in to troubleshoot it.
If you are likely to do a great deal more with that database, I'd recommend getting someone with a little more experience, or at least doing what you can to give it some optimisations. Reading 'High Performance MySQL, 2nd Edition' is a good start, as is looking at some tools like Maatkit.
As long as your schema design and DAL are constructed well enough, you understand query optimization inside out, can adjust all the server configuration settings at a professional level, and have "enough" hardware properly configured, yes (except for sufficiently pathological cases).
Same answer for both engines.
You should probably perform a load test to verify, but as long as the indexes are created properly (meaning they are optimized for your query statements), the SELECT queries should perform at an acceptable speed (the INSERTs and/or UPDATEs may be more of a speed issue though, depending on how many indexes you have and how large they get).