I am trying to find out which MySQL storage engine is best for each of our tables and requirements.
The tables with many reads (SELECT queries) are MyISAM.
The tables with many writes (INSERT/UPDATE queries) are InnoDB. These are the only two types we have used, but now we have different scenarios and we do not know which engine is best.
1) We have a table users that we UPDATE/SELECT very often, about one row per second for SELECT and one row per second for UPDATE, but INSERTs are rare, about one every 300 seconds. For this we chose MyISAM.
2) We have a table users_data where we INSERT data about as often as we INSERT into users, about once every 300 seconds. We do not UPDATE this table very often, but we read from it about once per second. For this we chose MyISAM.
3) We have a table transactions where we INSERT data very often, about one row every 4-5 seconds, and we SELECT large batches from it every 20-30 seconds (we often run SUMs over this table based on userid). For this we chose MyISAM.
4) We have a table transactions_logs where we store an id (the same as in the transactions table), the merchant name, and an email. We INSERT data very often, about one row every 4-5 seconds, but we read this table very rarely. For this we chose InnoDB.
Rarely, we join the transactions and transactions_logs tables for statistics.
5) We have a table pages from which we only SELECT data, very often, about one row per second. For this we chose MyISAM and turned on the MySQL query cache.
Questions:
a) We have another table with one INSERT every 100,000 seconds but many SELECT/UPDATE queries per second. What engine should this be? We are using MyISAM for it for now.
We read data from it, modify it, then UPDATE it, and we do this once every 1-2 seconds. Is MyISAM the best option for this?
b) Do you think we should have used InnoDB for all tables? I've read that since MySQL 5.6, InnoDB is the default storage engine, and it has probably been optimised a lot.
Fundamentally, I use the following two differences between MyISAM and InnoDB to choose which one to use in a specific scenario:
InnoDB supports transactions, MyISAM does not.
InnoDB has row-level locking, MyISAM has table-level locking.
(Source: MySQL 5.7 Reference Manual)
My rule of thumb is to use MyISAM when there is a high number of SELECT queries and a low number of UPDATE/INSERT queries. Whenever write performance or data integrity is important, I'll use InnoDB.
While the above is useful as a starting point, every database and every application is different. The specific details of your hardware and software setup will ultimately dictate which engine choice is best. When in doubt, test!
However, I will say that, based on the numbers provided, and assuming 'modern' server hardware, you're nowhere near the performance limits of MySQL, so either engine would suffice.
MyISAM works great for read-only loads and for write-once, read-forever loads. It handles concurrency by locking the entire table on writes, which can make it very slow on write-heavy loads.
InnoDB is a little more complicated and adds some configuration options that must be set somewhat properly. In return it supports row-level locking, which is great when individual rows are added or updated no more than about once per second (giving plenty of time for other reads/writes).
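As a sketch of what "configured somewhat properly" means, these are the settings people usually look at first (the values below are placeholders, not recommendations, and normally belong in my.cnf under [mysqld]):

-- Check what the server is currently running with.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW VARIABLES LIKE 'innodb_log_file_size';

-- A common rule of thumb is a buffer pool of ~70% of RAM on a dedicated
-- database server. This is dynamic only in MySQL 5.7.5+; on older versions
-- set it in my.cnf and restart.
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;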
Related
I've got a MySQL table that has around 1.5 million records, and the table size is 1.3GB.
I am using a soft-delete mechanism in that table: I have a column deleted_at which indicates whether the row has been deleted, and when. If the record is not deleted, the deleted_at value is NULL.
Of these 1.5 million records, only 30K are not soft-deleted. Those are accessed frequently, while the other records are barely accessed, though they are in some cases.
So this table is heavily used and queried for the non-deleted records, and sometimes for the soft-deleted ones.
I have a BTREE index on the deleted_at column (with a cardinality of 35K). The table grows heavier with time, and obviously this is not a scalable solution.
The table engine is MyISAM. Most of the other tables are InnoDB, but this table is queried heavily via stored procedures, and when I changed it to InnoDB the queries were way slower.
I am looking for a solution that does not involve hardware changes. The current hardware is sufficient for that table to perform well, but that will not be the case once the table grows further.
Things I thought of:
Partitioning, but I cannot use partitions since some of the columns have FULLTEXT indexes.
Splitting the data into two tables: one for the deleted rows and one for the non-deleted rows, which are accessed and queried frequently. This change requires a lot of infrastructure work, so I am not in a hurry to do it.
Creating a new table that syncs with the original table every 10-20 minutes (instead of splitting) and contains only the non-deleted rows. This requires only small infrastructure changes, and the maintenance is much easier and safer. Splitting into two tables could result in missing records due to query failures, since a "DELETE" would actually move a row from one table to another and thus requires a more sophisticated mechanism.
What other options do I have? Can I give priority to some rows in a table with MySQL, memory-wise?
I've got 10.3.20-MariaDB and 32GB of RAM.
MyISAM does not cache rows; it only caches indexes. It relies on the filesystem cache for buffering rows.
So you could try to ensure at least the index is fully loaded into the cache:
Increase key_buffer_size so it's at least as large as your MyISAM indexes for this table. Use SHOW TABLE STATUS to find out the index size.
If you have multiple MyISAM tables, you might need to dedicate a key cache specifically for this table. See CACHE INDEX.
Pre-load the index into the key cache at startup. See LOAD INDEX INTO CACHE.
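Putting those three suggestions together, a sketch might look like this (the sizes, the table name my_table, and the cache name hot_cache are placeholders for your own values):

-- At least as large as Index_length from SHOW TABLE STATUS.
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;

-- Dedicate a separate key cache so other MyISAM tables cannot evict
-- this table's index blocks, then assign the table's indexes to it.
SET GLOBAL hot_cache.key_buffer_size = 128 * 1024 * 1024;
CACHE INDEX my_table IN hot_cache;

-- Pre-load the indexes so the first queries after startup are warm.
LOAD INDEX INTO CACHE my_table;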
You may also want to consider multi-column indexes tailored to your queries. For example, if you have a query with WHERE user_id = 1234 AND deleted_at IS NULL, you should create an index on (user_id, deleted_at).
Which indexes you need depends on the queries you want to optimize.
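For example, something along these lines for the soft-delete pattern above (the table name my_table is assumed):

-- Serves lookups that filter on a user and on soft-delete status.
CREATE INDEX idx_user_deleted ON my_table (user_id, deleted_at);

-- Both of these can use the index (the second via the leftmost prefix):
SELECT * FROM my_table WHERE user_id = 1234 AND deleted_at IS NULL;
SELECT * FROM my_table WHERE user_id = 1234;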
Frankly, I would split the table so the deleted rows are in a second table. That would reduce your table size by 98%, and that might make queries run quickly enough that you don't need to use MyISAM anymore.
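A rough sketch of that one-time split (names are placeholders; take a backup first and schedule it for a quiet period):

-- Archive table with the same structure, then move the soft-deleted rows.
CREATE TABLE my_table_deleted LIKE my_table;

INSERT INTO my_table_deleted
SELECT * FROM my_table WHERE deleted_at IS NOT NULL;

DELETE FROM my_table WHERE deleted_at IS NOT NULL;

-- Reclaim space and defragment after the mass delete.
OPTIMIZE TABLE my_table;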
I am currently trying to figure out why the site I am working on (Laravel 4.2 framework) is really slow at times, and I think it has to do with my database setup. I am not a pro at all, so I would assume that's where the problem is.
My sessions table has roughly 2.2 million records in it. When I run show processlist;, all the queries that take the longest relate to that table.
(Screenshots omitted: an example of the long-running queries, and the table structure.)
Surely I am doing something wrong, or it's not indexed properly? I'm not sure; I'm not fantastic with databases.
We don't see the complete SQL being executed, so we can't recommend appropriate indexes. But if the only predicate on the DELETE statements is on the last_activity column, i.e.
DELETE FROM `sessions` WHERE last_activity <= 'somevalue' ;
then performance of the DELETE statement will likely be improved by adding an index with a leading column of last_activity, e.g.
CREATE INDEX sessions_IX1 ON sessions (last_activity);
Also, if this table is using MyISAM storage engine, then DML statements cannot execute concurrently; DML statements will block while waiting to obtain exclusive lock on the table. The InnoDB storage engine uses row level locking, so some DML operations can be concurrent. (InnoDB doesn't eliminate lock contention, but locks will be on rows and index blocks, rather than on the entire table.)
Also consider using a different storage mechanism (other than a MySQL database) for storing and retrieving info for web server "sessions".
Also, is it necessary (is there some requirement) to persist 2.2 million "sessions" rows? Are we sure that all of those rows are actually needed? If some of that data is historical, and isn't specifically needed to support the current web server sessions, we might consider moving the historical data to another table.
We have about 60-70 databases on an RDS server, and a lot of them can be deleted.
I want to do a benchmark of size before and after, and they are all (to my knowledge) InnoDB tables.
So, I'm using the information_schema table per this link: https://www.percona.com/blog/2008/03/17/researching-your-mysql-table-sizes/
and this is great, except the first query listed (and I presume the others) just runs and runs and eventually finishes after EIGHT MINUTES.
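For context, the first query in that post is essentially this kind of per-schema size rollup (reconstructed as a typical variant, not quoted verbatim from the link):

-- Approximate data + index size per database, largest first.
SELECT table_schema,
       ROUND(SUM(data_length + index_length) / 1024 / 1024, 1) AS total_mb,
       COUNT(*) AS tables
FROM   information_schema.TABLES
GROUP  BY table_schema
ORDER  BY total_mb DESC;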
I can run this query instantly:
SELECT COUNT(*) FROM information_schema.TABLES;
And get about 12,500 tables.
I also notice - ironically enough - that information_schema.TABLES has no indexes! My instinct is not to mess with that.
My best option at this point is to dump the TABLES table, and run the query on a copy that I actually index.
My questions are:
1. How dynamic is the information_schema.TABLES table, and in fact that entire database?
2. Why is it running so slowly?
3. Would it be advisable to index some key fields to optimize the queries I want to do?
4. If I do an SQL dump, will I get current table size information?
Thanks, I hope this question is instructive.
information_schema is currently a thin layer on top of some older machinery. That older machinery needs to "open" each table to discover its size, etc., which involves reading at least the .frm file. But it does not need to open tables merely to count them. Think of the difference between SHOW TABLES and SHOW TABLE STATUS.
table_open_cache and table_definition_cache probably did not have all the tables in them when you ran the 8-minute query: the values of those VARIABLES may have been less than 12,500, implying that there would have been churn.
In the future (probably 5.8), all that info will probably be sitting in a single InnoDB table instead of being splayed across the OS's file system. At that point, it will be quite fast. (Think of how fast a table scan of 12,500 rows can be done, especially if fully cached in RAM.)
Since the information_schema does not have "real" tables, there is no way to add INDEXes.
mysqldump does not provide table size info. Even if it did, it would be no faster, since it goes through the same old mechanism.
60 is a questionably large number of databases; 12K is a large number of tables. Often this implies a schema design that creates multiple tables instead of putting the data into a single table.
A database already has 25-30 tables, all MyISAM. Most of these tables are related to each other, meaning a lot of queries use joins on IDs to retrieve data.
One of the tables contains 7-10 million records, and it becomes slow if I want to perform a search, an update, or even a retrieval of all the data. I proposed a solution to my boss, saying that converting the tables to InnoDB might give better performance.
I also explained the benefits of InnoDB:
Since we join multiple related tables on keys anyway, it would be better to use foreign keys and have a properly relational database, which would avoid orphan rows. I found around 10-15K orphan rows in one of the big tables and had to remove them manually.
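As a sketch, the kind of constraint that would have prevented those orphans (table and column names are assumptions, and both tables must already be InnoDB for it to be enforced):

ALTER TABLE user_data
  ADD CONSTRAINT fk_user_data_user
  FOREIGN KEY (user_id) REFERENCES users (id)
  ON DELETE CASCADE;  -- deleting a user removes its child rows instead of orphaning them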
Support for transactions: we perform big updates from time to time, and if one of them fails partway we have to replace the entire table with the backed-up one and run the update again to make sure all queries were executed. With InnoDB we can roll back the changes from query 1 if query 2 fails.
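And a minimal sketch of that rollback behaviour (the accounts table is hypothetical):

START TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- query 1
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- query 2

-- If query 2 fails, issue ROLLBACK and query 1 is undone as well;
-- only COMMIT makes both changes permanent.
COMMIT;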
The response I got from my boss is that I need to prove that InnoDB will run faster than MyISAM. My question is: won't the above two things improve the speed of the application itself, by eliminating orphan rows?
In general, is MyISAM faster than InnoDB?
Note: using MySQL 5.5
You should also mention to your boss probably the biggest benefit you get from InnoDB for large tables with a mixed read/write load: row-level locking rather than table-level locking. This can be a great performance benefit in cases where you see a lot of waits for table locks to be released.
Of course, the best way to convince your boss is to prove it. Make copies of your large table and place them in a testing database, one version in MyISAM and one in InnoDB. Then run load tests against them with a mix that approximates your current DB read/write activity, and find out for yourself which is better.
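A sketch of setting up that comparison (big_table is a placeholder name):

-- Two copies of the same data, one per engine, on a test server.
CREATE TABLE big_table_myisam ENGINE=MyISAM AS SELECT * FROM big_table;
CREATE TABLE big_table_innodb ENGINE=InnoDB AS SELECT * FROM big_table;

-- CREATE TABLE ... AS SELECT does not copy indexes, so re-create them on
-- both copies before timing your representative read/write mix.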
Updated for your comment that you are on 5.5: with 5.5 it is a no-brainer to use InnoDB. The MyISAM engine has seen basically no improvement over the last several years, and development effort has gone into InnoDB. InnoDB is THE MySQL engine of choice going forward.
Our server (several Java applications on Debian) handles incoming data (GNSS observations) that should be:
immediately (delay <200ms) delivered to other applications,
stored for further use.
Sometimes (maybe several times a day) about a million archived records will be fetched from the database. Record size is about 12 double-precision fields plus a timestamp and some IDs. There are no UPDATEs; DELETEs are very rare but massive. The incoming flow is up to a hundred records per second. So I have to choose a storage engine for this data.
I tried using MySQL (InnoDB). One application inserts; the others constantly check the last record id and, if it has changed, fetch the new records. This part works fine. But I've run into the following issues:
Records are quite large (about 200-240 bytes per record).
Fetching a million archived records is unacceptably slow (tens of minutes or more).
File-based storage would work just fine (since there are no inserts into the middle of the DB, and selections are mostly like 'WHERE ID=1 AND TIME BETWEEN 2000 AND 3000'), but there are other problems:
Looking for new data might not be so easy.
Other data such as logs and configs are stored in the same database, and I prefer to have one database for everything.
Can you advise a suitable database engine (SQL preferred, but not required)? Maybe it is possible to fine-tune MySQL to reduce the record size and the fetch time for contiguous strips of data?
MongoDB is not acceptable since its DB size is limited on 32-bit machines. Any engine that does not provide quick access to recently inserted data is not acceptable either.
I'd recommend the TokuDB storage engine for MySQL. It's free for up to 50GB of user data, and its pricing model isn't terrible, making it a great choice for storing large amounts of data.
It has a higher insert speed than InnoDB and MyISAM and scales much better as the dataset grows (InnoDB tends to deteriorate once the working dataset no longer fits in RAM, making its performance dependent on the I/O of the HDD subsystem).
It's also ACID-compliant and supports multiple clustering indexes (which would be a great fit for the massive DELETEs you're planning). Hot schema changes are supported too (ALTER TABLE doesn't lock the tables, and changes are quick even on huge tables - I'm talking gigabyte-sized tables being altered in mere seconds).
From my personal use, I saw about 5-10 times less disk usage thanks to TokuDB's compression, and it's much, much faster than MyISAM or InnoDB.
Even though it sounds like I'm trying to advertise this product - I'm not; it's simply amazing that you can use a monolithic data store without expensive scaling plans like partitioning across nodes to scale the writes.
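If you want to try it, converting an existing table is a one-liner (observations is a placeholder name, and TOKUDB_ZLIB is just one of TokuDB's compression row formats):

-- Convert an existing table; TokuDB compresses pages transparently.
ALTER TABLE observations ENGINE=TokuDB ROW_FORMAT=TOKUDB_ZLIB;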
There really is no getting around how long it takes to load millions of records from disk. Your 32-bit requirement means you are limited in how much RAM you can use for memory-based data structures. But if you want to use MySQL, you may be able to get good performance by using multiple table types.
If you need really fast, non-blocking inserts, you can use the BLACKHOLE table type and replication: the server where the inserts occur has a BLACKHOLE table that replicates to another server, where the table is InnoDB or MyISAM.
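A sketch of that arrangement (the schema is assumed, and replication itself is configured separately):

-- On the ingest master: rows are discarded locally but written to the
-- binlog, so they still replicate downstream.
CREATE TABLE observations (
  id  BIGINT UNSIGNED NOT NULL,
  ts  TIMESTAMP NOT NULL,
  val DOUBLE NOT NULL
) ENGINE=BLACKHOLE;

-- On the replica, define the same table with ENGINE=MyISAM (or InnoDB)
-- so the replicated inserts are actually stored and queryable.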
Since you don't do UPDATEs, I think MyISAM would be better than InnoDB in this scenario. You can use the MERGE table type with MyISAM (it is not available for InnoDB). I'm not sure what your data set is like, but you could have one table per day (or hour, or week?); your MERGE table would then be a superset of those tables. Assuming you want to delete old data by day, just redeclare the MERGE table to exclude the old tables - this action is instantaneous. Dropping the old tables afterwards is also extremely fast.
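A sketch of the MERGE arrangement with one underlying table per day (names and columns are placeholders):

-- Identical per-day MyISAM tables...
CREATE TABLE obs_2013_06_01 (id INT, ts DATETIME, val DOUBLE) ENGINE=MyISAM;
CREATE TABLE obs_2013_06_02 LIKE obs_2013_06_01;

-- ...fronted by a MERGE table that behaves as their superset.
CREATE TABLE obs_all (id INT, ts DATETIME, val DOUBLE)
  ENGINE=MERGE UNION=(obs_2013_06_01, obs_2013_06_02) INSERT_METHOD=LAST;

-- "Deleting" a day is just redeclaring the union (instantaneous),
-- then dropping the detached table.
ALTER TABLE obs_all UNION=(obs_2013_06_02);
DROP TABLE obs_2013_06_01;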
To check for new data, you can look at "today's" table directly rather than going through the MERGE table.