MySQL DB: fast INSERT and SELECT

I am new to MySQL and I want to make a table that is very fast with concurrent insertion and selection.
For example, I want to store 1 million rows in less than about 1 second and also read these rows as soon as they are stored.
Any suggestions about the storage engine (MyISAM or InnoDB), how to insert all these rows quickly, and how to read them?
Thanks

The MyISAM storage engine is primarily for read-mostly workloads, because it locks at the table level. If you really need concurrent insertion and selection, you should choose the InnoDB storage engine, because it uses row-level locking. Be aware that InnoDB can be a little slower because of its transactional overhead.
In any case, make sure you're using batch inserts. Keep the number of indexes on the table as low as possible to avoid index maintenance overhead. You should also configure your MySQL server for good performance. For example, I would set innodb_flush_log_at_trx_commit=0 in your MySQL server configuration if you don't mind losing about one second of data when your server crashes. There are a few books on optimizing MySQL; look for "High Performance MySQL".
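A minimal sketch of the batch-insert idea, assuming a Python DB-API driver that accepts %s placeholders (PyMySQL and mysql-connector-python both do). The table name events and its columns are hypothetical. The function only builds one multi-row statement; you would pass the result to cursor.execute(sql, params) and then commit.

```python
from typing import Any, List, Sequence, Tuple


def build_batch_insert(table: str, columns: Sequence[str],
                       rows: Sequence[Sequence[Any]]) -> Tuple[str, List[Any]]:
    """Build one multi-row INSERT with %s placeholders and a flat parameter list."""
    if not rows:
        raise ValueError("need at least one row")
    width = len(columns)
    if any(len(row) != width for row in rows):
        raise ValueError("every row needs exactly one value per column")
    # One "(%s, %s, ...)" group per row; the driver escapes the values.
    group = "(" + ", ".join(["%s"] * width) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([group] * len(rows)))
    # Flatten the rows in the same order as the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


# Hypothetical table "events" with columns id and payload.
sql, params = build_batch_insert("events", ["id", "payload"], [(1, "a"), (2, "b")])
print(sql)     # INSERT INTO events (id, payload) VALUES (%s, %s), (%s, %s)
print(params)  # [1, 'a', 2, 'b']
```

Identifiers are pasted in unescaped, so only pass trusted table and column names. A single statement cannot exceed the server's max_allowed_packet, so very large loads are split into several such statements.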
Besides software, hardware also plays an important role. You're likely to be disk-bound, so a fast disk (for example an SSD or a RAID array) is essential.

Related

Which engine should be used for more than 100 insert queries per second?

I have read about the differences, pros and cons of MyISAM and InnoDB.
But I am still unsure which engine I should use for 100+ insert queries per second into a table (basically for tracking purposes).
I referred to "What's the difference between MyISAM and InnoDB?"
Based on my understanding, MyISAM will lock the table for each insert, so InnoDB should be used for its row-level locking.
But on the other hand, MyISAM's performance is 100 times better. So what would be the optimal and correct choice, and why?
Simple code that does one-row INSERTs without any tuning maxes out at about 100 rows per second in any engine, especially InnoDB.
But, it is possible to get 1000 rows per second or even more.
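The quick fix for InnoDB is to set innodb_flush_log_at_trx_commit = 2. With the default of 1, the log is flushed to disk at every commit, and that is the main thing capping InnoDB at about 100 inserts/second on a commodity spinning disk; with 2, the log is written at each commit but flushed to disk only about once per second. Setting innodb_buffer_pool_size to about 70% of available RAM is also important.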
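A rough sketch of those two settings as a my.cnf [mysqld] fragment, generated in Python so the 70% figure is explicit. It assumes a dedicated database server (so 70% of RAM can go to InnoDB); the server may round the buffer pool size to a multiple of its chunk size.

```python
def innodb_write_tuning(total_ram_bytes: int, pool_fraction: float = 0.70) -> str:
    """Return a [mysqld] option-file fragment for the two settings discussed above."""
    mib = 1024 * 1024
    pool_mib = int(total_ram_bytes * pool_fraction) // mib
    return "\n".join([
        "[mysqld]",
        # 2: write the log at every commit, flush it to disk about once per second.
        "innodb_flush_log_at_trx_commit = 2",
        "innodb_buffer_pool_size = {}M".format(pool_mib),
    ])


# Example: a dedicated server with 16 GiB of RAM.
print(innodb_write_tuning(16 * 1024 ** 3))
```

The trade-off with 2: a mysqld crash alone loses nothing that was committed, but an operating-system crash or power failure can lose roughly the last second of commits.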
If you are inserting multiple rows into the same table at the same time, then LOAD DATA or a batch INSERT (INSERT ... VALUES (...), (...), ...) of 100 rows or more will run about ten times as fast. This applies to any engine.
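One way to feed LOAD DATA is to write the rows to a file in its default format: fields separated by tabs, lines ended by newlines, backslash as the escape character, and \N for NULL. The sketch below assumes those defaults and a hypothetical table events; the SQL you would run afterwards is shown in a comment. The LOCAL variant additionally requires local_infile to be enabled on both client and server.

```python
import io
from typing import Any, Iterable, Sequence, TextIO


def escape_field(value: Any) -> str:
    """Escape one value for LOAD DATA's default tab-separated format."""
    if value is None:
        return r"\N"  # LOAD DATA reads \N as SQL NULL
    text = str(value)
    # Escape backslashes first so the escapes added below are not doubled.
    text = text.replace("\\", "\\\\")
    return text.replace("\t", "\\t").replace("\n", "\\n").replace("\r", "\\r")


def write_load_data_rows(rows: Iterable[Sequence[Any]], out: TextIO) -> int:
    """Write rows as tab-separated lines; return how many rows were written."""
    count = 0
    for row in rows:
        out.write("\t".join(escape_field(v) for v in row) + "\n")
        count += 1
    return count


# In practice, write to a real file (hypothetical path /tmp/events.tsv),
# opened with newline="" so line endings stay "\n" on every platform.
buf = io.StringIO()
n = write_load_data_rows([(1, "a\tb", None), (2, "back\\slash", "x")], buf)
# Then, for example:
#   LOAD DATA LOCAL INFILE '/tmp/events.tsv' INTO TABLE events;
print(n, repr(buf.getvalue()))
```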
MyISAM is not 100 times as fast; it is not even 10 times as fast as InnoDB. Today (5.6 or newer), you would be hard pressed to find a well tuned application that is more than a little faster in MyISAM. You are, or will be, I/O-limited.
As for corruption -- no engine suffers from corruption except during a crash. A power failure may mangle MyISAM indexes, usually recoverably. Moreover, a MyISAM batch insert could be left half done. InnoDB will be clean -- the entire batch is done or none of it is done; no corruption.
ARCHIVE saves disk space, but costs CPU.
MEMORY is often faster because it has no I/O. But you have too much data for that Engine, correct?
MariaDB with TokuDB can probably run faster than anything I describe here; but you have not indicated the need for it.
100 rows inserted per second = 8M/day = 3 Billion/year. Will you be purging the data eventually? Will you be querying the data? Purging: Let's talk about PARTITION. Querying: Let's talk about Summary Tables.
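The arithmetic behind those figures, as a quick Python check (the exact values are 8.64 million rows per day and about 3.15 billion per year):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def rows_accumulated(rows_per_second: int, days: int) -> int:
    """Rows piled up after inserting at a constant rate for the given number of days."""
    return rows_per_second * SECONDS_PER_DAY * days


per_day = rows_accumulated(100, 1)
per_year = rows_accumulated(100, 365)
print("{:,} rows/day, {:,} rows/year".format(per_day, per_year))
# 8,640,000 rows/day, 3,153,600,000 rows/year
```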
Indexing: Minimize the number of indexes. If you have a 'random' index, such as a UUID, and you have a billion rows, you will be stuck with 100 rows/second, regardless of which Engine and regardless of any tuning. Do I need to explain further?
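A toy model of why a random key hurts, assuming the index can be pictured as one sorted list (real InnoDB B-trees are split into pages, but the access pattern is the same). With an AUTO_INCREMENT-style key every new entry lands at the right-hand end, so inserts keep touching the same few pages, which stay cached. With random UUIDs the insert positions are scattered over the whole index, so once the index no longer fits in the buffer pool, many inserts must first read a page from disk.

```python
import bisect
import random
import uuid
from typing import Iterable


def inserts_at_end(keys: Iterable) -> int:
    """Insert keys into a sorted list and count how many land at the far end."""
    index = []
    at_end = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            at_end += 1
        index.insert(pos, key)
    return at_end


n = 1000
sequential = range(1, n + 1)  # like an AUTO_INCREMENT primary key
rng = random.Random(42)       # seeded so the run is repeatable
random_uuids = [str(uuid.UUID(int=rng.getrandbits(128), version=4)) for _ in range(n)]

print(inserts_at_end(sequential), "of", n, "sequential keys appended at the end")
print(inserts_at_end(random_uuids), "of", n, "random UUIDs appended at the end")
```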
If this is a queuing system, I say "Don't queue it, just do it."
Bottom line: Use InnoDB. Tune it. Use batch inserts. Avoid random indexes. Etc.
You are correct that MyISAM is a faster choice if your operational use case is lots of insertions. But that answer can change drastically based on the kind of use you make of the data. If this is an archival application you might consider the ARCHIVE storage engine. It is best for write-once, read-rarely applications.
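You should investigate INSERT DELAYED, as it lets your client programs fire-and-forget these inserts rather than waiting for completion. This burns RAM in your mysqld process, though. If that style of operation meets your needs, it is a compelling reason to go with MyISAM (DELAYED only works with MyISAM, MEMORY, ARCHIVE and BLACKHOLE tables). Note that INSERT DELAYED was deprecated in MySQL 5.6, and from 5.7 on the server ignores the DELAYED keyword and performs a normal INSERT.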
Beware indexes in the target table of your inserts. Maintaining indexes is a big part of the server's insert workload.
Don't forget to look into MariaDB. It's a compatible fork of MySQL with some more advanced storage engines and features.
I have experience with a similar application. In our case, the application scaled up beyond the original insert rate, and the server could not keep up. (It's always good when an application workload grows!) We ended up doing two things, one after the other:
Using a message queuing system, and running just a couple of processes to actually do the inserts. The original clients wrote their logging records to the message queue rather than directly to the database. (Amazon AWS's SQS is an example of such a queuing system.)
Reworking the insert process to use LOAD DATA INFILE to load great gobs of log rows at once.
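A minimal in-process sketch of that pattern, assuming Python's queue.Queue as a stand-in for an external queue such as SQS. Several hypothetical clients only enqueue rows; one writer drains the queue and groups whatever is waiting into batches, each of which would become one multi-row INSERT or LOAD DATA. Here the "insert" just records the batch.

```python
import queue
import threading
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (client id, sequence number): a placeholder log record


def batch_writer(q: "queue.Queue[Optional[Row]]", batches: List[List[Row]],
                 batch_size: int) -> None:
    """Drain the queue in batches until a None sentinel arrives."""
    done = False
    while not done:
        batch = [q.get()]  # block until at least one item is available
        # Take whatever else is already waiting, up to batch_size items.
        while len(batch) < batch_size:
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        if None in batch:
            done = True
            batch = [row for row in batch if row is not None]
        if batch:
            batches.append(batch)  # stand-in for one multi-row INSERT


def produce(q: "queue.Queue[Optional[Row]]", client_id: int, count: int) -> None:
    for i in range(count):
        q.put((client_id, i))


def run_demo(clients: int = 4, rows_each: int = 250, batch_size: int = 100):
    q: "queue.Queue[Optional[Row]]" = queue.Queue()
    batches: List[List[Row]] = []
    writer = threading.Thread(target=batch_writer, args=(q, batches, batch_size))
    writer.start()
    producers = [threading.Thread(target=produce, args=(q, c, rows_each))
                 for c in range(clients)]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    q.put(None)  # all clients are done; tell the writer to stop
    writer.join()
    return batches


batches = run_demo()
print(len(batches), "batches,", sum(len(b) for b in batches), "rows")
```

Only the writer talks to the database, so the number of concurrent inserting connections stays small no matter how many clients there are.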
(You probably have figured out that this kind of workload isn't feasible on a cheap shared hosting service or an AWS micro instance.)

Will switching to the MyISAM engine help improve the speed of read operations?

I currently have a few tables using the InnoDB engine. 10-20 connections constantly insert data into those tables. I use a MySQL RDS instance on AWS. Metrics show about 300 write IOPS (count/second). However, INSERT operations lock the table, and if someone wants to run a query like SELECT COUNT(*) FROM table; it can literally take a few hours the first time, before MySQL caches the result.
I'm not a DBA and my knowledge of databases is very limited. So the question is: if I switch to the MyISAM engine, will it improve the time of READ operations?
SELECT COUNT(*) without WHERE is bad query for InnoDB, as it does not cache the row count like MyISAM do. So if you have issue with this particular query, you have to cache the count somewhere - in a stats table for example.
After you remove this specific type of query, you can talk about InnoDB vs MyISAM read performance. Generally, writes do not block reads in InnoDB - it uses MVCC for this. InnoDB performance, however, is very dependent on how much RAM you have set for the buffer pool.
InnoDB and MyISAM are very different in how they store data. You can always optimize for one of them, and knowing the differences can help you in designing your application. Generally you can have read performance in InnoDB tables as good as in MyISAM - you just can't use COUNT without a WHERE clause, and you should always have a suitable index for your WHERE clauses, since a table scan will be slower in InnoDB than in MyISAM.
I think you should stick with your current setup. InnoDB is supposed not to lock the table when inserting rows, since it uses the MVCC technique. On the other hand, MyISAM locks the entire table when new rows are inserted.
So, if you have many writes, you should stick with InnoDB.
InnoDB is a better overall engine in general. There are some benchmarks out there that put read operations in MyISAM a little ahead of InnoDB. However, if your site is big enough to notice this performance difference, you should be on InnoDB anyway because of all the other efficiencies. InnoDB wins on this alone because it uses row-level locking rather than MyISAM's table-level locking, which matters when backing up your database.

Performance difference between Innodb and Myisam in Mysql

I have a MySQL table with over 30 million records that was originally stored with MyISAM. Here is a description of the table:
I would run the following query against this table which would generally take around 30 seconds to complete. I would change #eid each time to avoid database or disk caching.
select count(fact_data.id)
from fact_data
where fact_data.entity_id=#eid
and fact_data.metric_id=1
I then converted this table to InnoDB without making any other changes, and afterwards the same query returns in under a second every single time I run it. Even when I randomly set #eid to avoid caching, the query returns in under a second.
I've been researching the differences between the two storage engines to try to explain the dramatic improvement in performance but haven't been able to come up with anything. In fact, much of what I read indicates that MyISAM should be faster.
The queries I'm running are against a local database with no other processes hitting the database at the time of the tests.
That's a surprisingly large performance difference, but I can think of a few things that may be contributing.
MyISAM has historically been viewed as faster than InnoDB, but for recent versions of InnoDB, that is true for a much, much smaller set of use cases. MyISAM is typically faster for table scans of read-only tables. In most other use cases, I typically find InnoDB to be faster. Often many times faster. Table locks are a death knell for MyISAM in most of my usage of MySQL.
MyISAM caches indexes in its key buffer. Perhaps you have set the key buffer too small for it to effectively cache the index for your somewhat large table.
MyISAM depends on the OS to cache table data from the .MYD files in the OS disk cache. If the OS is running low on memory, it will start dumping its disk cache. That could force it to keep reading from disk.
InnoDB caches both indexes and data in its own memory buffer. You can tell the OS not to also use its disk cache if you set innodb_flush_method to O_DIRECT, though this isn't supported on OS X.
InnoDB usually buffers data and indexes in 16kb pages. Depending on how you are changing the value of #eid between queries, it may have already cached the data for one query due to the disk reads from a previous query.
Make sure you created the indexes identically. Use EXPLAIN to check whether MySQL is using the index. Since you included the output of DESCRIBE instead of SHOW CREATE TABLE or SHOW INDEXES FROM, I can't tell if entity_id is part of a composite index. If it was not the first part of a composite index, it wouldn't be used.
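The leftmost-prefix rule behind that last sentence, as a simplified Python sketch: for equality conditions like the ones in this query, MySQL can use the leading columns of a composite index only as far as each one appears in the WHERE clause. This ignores range conditions and the skip-scan optimization that MySQL 8.0.13 and later can sometimes apply, and the index definitions below are hypothetical.

```python
from typing import List, Sequence, Set


def usable_prefix(index_columns: Sequence[str], equality_columns: Set[str]) -> List[str]:
    """Leading index columns that equality predicates in the WHERE clause can use."""
    prefix: List[str] = []
    for column in index_columns:
        if column not in equality_columns:
            break  # the first missing column ends the usable prefix
        prefix.append(column)
    return prefix


# WHERE fact_data.entity_id = #eid AND fact_data.metric_id = 1
where = {"entity_id", "metric_id"}
print(usable_prefix(["entity_id", "metric_id"], where))   # ['entity_id', 'metric_id']
print(usable_prefix(["created_at", "entity_id"], where))  # [] -> index not usable
```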
If you are using a relatively modern version of MySQL, run the following command before running the query:
set profiling = 1;
That will turn on query profiling for your session. After running the query, run
show profiles;
That will show you the list of queries for which profiles are available. By default it keeps the last 15 (set by profiling_history_size). Assuming your query was the first one, run:
show profile for query 1;
You will then see the duration of each stage in running your query. This is extremely useful for determining what (e.g., table locks, sorting, creating temp tables, etc.) is causing a query to be slow.
My first suspicion would be that the original MyISAM table and/or indexes became fragmented over time resulting in the performance slowly degrading. The InnoDB table would not have the same problem since you created it with all the data already in it (so it would all be stored sequentially on disk).
You could test this theory by rebuilding the MyISAM table. The easiest way to do this would be to use a "null" ALTER TABLE statement:
ALTER TABLE mytable ENGINE = MyISAM;
Then check the performance to see if it is better.
Another possibility would be if the database itself is simply tuned for InnoDB performance rather than MyISAM. For example, InnoDB uses the innodb_buffer_pool_size parameter to know how much memory should be allocated for caching data and indexes in memory, but MyISAM uses the key_buffer_size parameter. If your database has a large InnoDB buffer pool and a small key buffer, then InnoDB performance is going to be better than MyISAM performance, especially for large tables.
What are your index definitions? There are ways to create indexes for MyISAM such that your index fields will not be used when you think they would be.

MySQL: Has anyone used the TokuDB storage engine?

Has anyone used the TokuDB storage engine for MySQL?
The product web site claims to have a 50x performance increase over other MySQL storage engines (e.g. Innodb, MyISAM, etc). Here are the performance claims http://tokutek.com/downloads/tokudb-performance-brief.pdf
Is this true?
Any personal experiences with this storage engine in use with MySQL?
If you are storing blobs such as images, then don't use TokuDB. It has a smaller row size limit.
If you have data that's over 100 million rows, use TokuDB.
If you are sensitive to UPDATE speed, don't use TokuDB. It has very fast inserts but, compared to InnoDB, slower UPDATE speed, especially if you use INSERT ... ON DUPLICATE KEY UPDATE statements.
If you are storing log entries, use TokuDB.
If you want to shrink your MyISAM/InnoDB data usage by more than 5x, then use TokuDB. I have personally confirmed that their fractal tree + compression data backend is extremely space efficient.
Rule of thumb: use the best tool for the job. TokuDB blows InnoDB and MyISAM out of the water in specific situations but is not a general replacement engine for everything under the sky.
Although TokuDB is slow on UPDATE as commented above, it is extremely fast on REPLACE. Usually you can substitute UPDATEs with REPLACE INTO instead. I use TokuDB on tables of up to 18 Billion rows and nothing else comes close, it's at least 100 times faster than innodb for random inserts on big tables.
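Swapping UPDATE for REPLACE INTO only works when the REPLACE supplies every column you care about: REPLACE deletes the existing row with the same key and inserts a new one, so any column it doesn't mention falls back to its default. The sketch below models a table as a Python dict keyed by primary key; the columns hits and note, and their defaults, are hypothetical.

```python
from typing import Any, Dict

Row = Dict[str, Any]
DEFAULTS: Row = {"hits": 0, "note": None}  # hypothetical column defaults


def update(table: Dict[int, Row], pk: int, changes: Row) -> None:
    """Like UPDATE ... SET ... WHERE id = pk: change only the named columns."""
    if pk in table:  # no matching row means nothing happens
        table[pk].update(changes)


def replace(table: Dict[int, Row], pk: int, values: Row) -> None:
    """Like REPLACE INTO: drop any row with this key, then insert a fresh one."""
    table.pop(pk, None)
    table[pk] = {**DEFAULTS, **values}  # unlisted columns get their defaults


t1 = {1: {"hits": 5, "note": "keep me"}}
update(t1, 1, {"hits": 6})
t2 = {1: {"hits": 5, "note": "keep me"}}
replace(t2, 1, {"hits": 6})
print(t1[1])  # {'hits': 6, 'note': 'keep me'}
print(t2[1])  # {'hits': 6, 'note': None}
```

Note the other difference: UPDATE on a missing key does nothing, while REPLACE inserts the row.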
I have the same question. I did find a fairly decent comparison of TokuDB against Innodb
http://www.pythian.com/news/5139/testing-tokudb-faster-and-smaller-for-large-tables/
However, I am interested in any other experiences that others may have had with TokuDB or any other similar storage engine for MySQL.
Another review here
http://www.mysqlperformanceblog.com/2009/04/28/detailed-review-of-tokutek-storage-engine/

MySQL transaction support with mixed tables

It seems like I will need transactions in MySQL, and I have no idea how I should manage transactions with mixed InnoDB/MyISAM tables. It all seems like a huge mess.
You might ask why I would ever want to mix the tables together... the answer is PERFORMANCE. As many developers have noticed, InnoDB tables generally have bad performance, but in return give a higher isolation level, etc.
Does anyone have any advice regarding this issue?
I think you are overrating the performance difference between MyISAM and InnoDB. MyISAM is faster in data warehousing situations (such as full table scan reporting, etc..), but InnoDB can actually be faster in many cases with normal OLTP queries.
InnoDB is harder to tune since it has more knobs, but a properly tuned InnoDB system can often have higher throughput than MyISAM due to better locking and better I/O patterns.
Given that you can't have transactions in MyISAM tables, I am not sure what the actual problem is. Any data you need transactions for must be in an InnoDB table and you manage the transactions using whatever access library you are using or with manual SQL commands.
There are definite performance benefits of using exactly one engine.
A server tuned for one engine won't be tuned for the other - both require that you allocate a substantial amount of RAM to its exclusive use - therefore, you can't give them both an optimal amount.
Say you have 8G of ram on your (obviously 64-bit, but still relatively small) database server, you might want to assign about 3/4 of it to your innodb page cache. Alternatively, if you're using MyISAM, you may want about half of it to be your key_buffer. You can't do both.
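The arithmetic for that 8G example, as a short Python check using the two fractions from the answer (about 3/4 for the InnoDB buffer pool, about 1/2 for key_buffer_size). Together they ask for more memory than the machine has, even before counting the OS file cache that MyISAM relies on for table data.

```python
GIB = 1024 ** 3


def cache_demand(total_ram: int) -> dict:
    """Memory each engine would want if tuned as if it were the only one."""
    innodb_pool = total_ram * 3 // 4  # ~3/4 of RAM for innodb_buffer_pool_size
    key_buffer = total_ram // 2       # ~1/2 of RAM for MyISAM's key_buffer_size
    return {
        "innodb_buffer_pool_size": innodb_pool,
        "key_buffer_size": key_buffer,
        "overcommitted": innodb_pool + key_buffer - total_ram,
    }


for name, size in cache_demand(8 * GIB).items():
    print("{}: {} GiB".format(name, size / GIB))
# innodb_buffer_pool_size: 6.0 GiB
# key_buffer_size: 4.0 GiB
# overcommitted: 2.0 GiB
```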
Pick an engine and use it exclusively. There are ways of getting around performance problems - most of them aren't easy though (i.e. they require redesigning your data structure or your application).
The short answer is that there is no transaction support in MyISAM. If you start a transaction, add or modify data in some InnoDB tables, add or modify data in a MyISAM table, and then have to roll back, your MyISAM change cannot be undone. To support mixed engines like that, your application has to know that changes to whatever data is stored in MyISAM happen "outside" of the transaction.
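A toy Python model of that failure mode, meant only to show what survives a ROLLBACK: the "InnoDB" table stages its changes until commit, while the "MyISAM" table applies each write immediately. The class and table names are hypothetical, and real InnoDB does much more (for example, a transaction sees its own uncommitted rows). When this happens in MySQL, the ROLLBACK also produces a warning along the lines of "Some non-transactional changed tables couldn't be rolled back".

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


class TransactionalTable:
    """Stand-in for an InnoDB table: writes stay pending until commit."""

    def __init__(self) -> None:
        self.rows: List[Row] = []
        self._pending: List[Row] = []

    def insert(self, row: Row) -> None:
        self._pending.append(row)

    def commit(self) -> None:
        self.rows.extend(self._pending)
        self._pending.clear()

    def rollback(self) -> None:
        self._pending.clear()


class NonTransactionalTable:
    """Stand-in for a MyISAM table: every write is permanent at once."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, row: Row) -> None:
        self.rows.append(row)


def place_order(fail: bool) -> Tuple[List[Row], List[Row]]:
    orders = TransactionalTable()        # hypothetical InnoDB table
    order_log = NonTransactionalTable()  # hypothetical MyISAM table
    try:
        orders.insert({"id": 1})
        order_log.insert({"order_id": 1})
        if fail:
            raise RuntimeError("error after both writes, before COMMIT")
        orders.commit()
    except RuntimeError:
        orders.rollback()  # only the transactional table is undone
    return orders.rows, order_log.rows


print(place_order(fail=True))  # ([], [{'order_id': 1}]): the log row is orphaned
```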
If you need transactions for some processes, then isolate the data that must be transactionable and put all that data in InnoDB.