InnoDB vs MyISAM for log data that is inserted continuously and retrieved every minute (MySQL)

Well, I already know that:
1. InnoDB is faster for data insertion but slower on data retrieval.
2. MyISAM is faster for data retrieval but slower for data insertion.
My situation is a bit different, and I just can't figure out which settings are right for me. Let me explain:
My software inserts each user hit's data (IP, host, referrer data, etc.) into a Logs table at run time. Previously, I wrote this data to a .csv file and imported it into the DB every few minutes or hours, but that didn't work for me: I need real-time data.
I have several automated processes that run every minute and read data from the Logs table, so retrieval needs to be fast.
My question is: which MySQL engine should I use for the Logs table, InnoDB or MyISAM?
Currently I'm using InnoDB because it's faster for insertion. Should I leave it this way, or switch back to MyISAM?
Thanks

You wrote: "1. InnoDB is faster for data insertion but slower on data retrieval. 2. MyISAM is faster for data retrieval but slower for data insertion."
Not true. In fact, under most workloads it's just the opposite.
That prevailing wisdom you cite is based on InnoDB as it was around 2004.
You asked: "What type of MySQL engine should I use for the Logs table, InnoDB or MyISAM?"
If you care about your data not getting corrupted, use InnoDB.

Related

MYSQL DB Fast insert AND select

I am new to MySQL and I want to make a table that is very fast with concurrent insertion and selection.
For example, I want to store 1 million rows in less than 1 second and also read these rows as soon as they are stored.
Any suggestions about the storage engine (MyISAM or InnoDB), how to insert all these rows quickly, and how to read them?
Thanks
The MyISAM storage engine is primarily for read-mostly workloads, because it locks at the table level. If you really need concurrent insertion and selection, you should choose InnoDB instead, because it uses row-level locking. Be aware that InnoDB is a little slower because of the overhead of row locking and transactions.
In any case, make sure you're using batch inserts. Keep the number of indexes on the table as low as possible to avoid index maintenance overhead. You should also configure your MySQL server for good performance. For example, I would set innodb_flush_log_at_trx_commit=0 in your MySQL server configuration, if you don't mind losing about one second of data when your server crashes. There are a few books on optimizing MySQL; look for "High Performance MySQL".
Besides software, hardware also plays an important role. You're likely to be disk-bound, so a fast disk is essential (for example, an SSD or RAID).

Which engine should be used for more than 100 INSERT queries per second?

I have read about the differences, pros and cons of MyISAM and InnoDB.
But I am still confused about which one to use for a table receiving 100+ insert queries per second (basically for tracking purposes).
I referred to "What's the difference between MyISAM and InnoDB?"
Based on my understanding, MyISAM locks the whole table for each insert, so InnoDB should be used for its row locking.
But on the other hand, MyISAM's performance is 100 times better. So what is the optimal and correct choice, and why?
Simple code that does one-row INSERTs without any tuning maxes out at about 100 rows per second in any engine, especially InnoDB.
But, it is possible to get 1000 rows per second or even more.
The quick fix for InnoDB is to set innodb_flush_log_at_trx_commit = 2; that will uncork the main thing stopping InnoDB at 100 inserts/second using a commodity spinning disk. Setting innodb_buffer_pool_size to about 70% of available RAM is also important.
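A minimal sketch of those two settings rendered as a my.cnf fragment, assuming a dedicated database server whose total RAM you pass in. The 70% ratio comes from the answer above; rounding down to a multiple of 128 MiB (the default innodb_buffer_pool_chunk_size in MySQL 5.7 and later) and the helper names are my own assumptions.

```python
"""Sketch: render innodb_flush_log_at_trx_commit and innodb_buffer_pool_size.

Assumptions: a dedicated server; rounding down to 128 MiB chunks is our
choice (128M is the default innodb_buffer_pool_chunk_size in MySQL 5.7+).
"""

MIB = 1024 * 1024
CHUNK = 128 * MIB


def buffer_pool_bytes(total_ram_bytes: int, ratio: float = 0.70) -> int:
    """Return about `ratio` of RAM, rounded down to whole 128 MiB chunks."""
    target = int(total_ram_bytes * ratio)
    return max(CHUNK, (target // CHUNK) * CHUNK)


def render_mycnf(total_ram_bytes: int) -> str:
    pool_mib = buffer_pool_bytes(total_ram_bytes) // MIB
    lines = [
        "[mysqld]",
        # 2 = write the redo log at each commit, flush it to disk about once per second.
        "innodb_flush_log_at_trx_commit = 2",
        f"innodb_buffer_pool_size = {pool_mib}M",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_mycnf(16 * 1024 * MIB))  # a hypothetical 16 GiB server
```

For a 16 GiB machine this yields innodb_buffer_pool_size = 11392M; leave the remaining RAM for the OS and per-connection buffers.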
If a user is inserting multiple rows into the same table at the same time, then LOAD DATA or a batch INSERT (INSERT ... VALUES (...), (...), ...) of 100 rows or more will insert ten times as fast. This applies to any engine.
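A minimal sketch of building such batch INSERTs from individual log hits. The table name `logs` and its columns are hypothetical; values stay as %s placeholders so the driver does the escaping, and the batch size of 100 follows the answer.

```python
"""Sketch: group single-row log hits into multi-row INSERT statements.

Assumptions: table `logs` and columns (ip, host, referrer) are hypothetical.
"""
from typing import Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[object, ...]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Group rows into lists of at most `size` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def multi_row_insert(table: str, columns: Sequence[str], n_rows: int) -> str:
    """Build INSERT ... VALUES (...), (...), ... with one placeholder group per row."""
    group = "(" + ", ".join(["%s"] * len(columns)) + ")"
    return (f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
            + ", ".join([group] * n_rows))


def build_batches(rows: Iterable[Row], table: str = "logs",
                  columns: Sequence[str] = ("ip", "host", "referrer"),
                  size: int = 100) -> Iterator[Tuple[str, list]]:
    """Yield (sql, flat_params) pairs, one multi-row INSERT per batch."""
    for batch in batched(rows, size):
        params = [value for row in batch for value in row]
        yield multi_row_insert(table, columns, len(batch)), params
```

With a DB-API driver such as PyMySQL or mysql-connector-python, each pair would be sent as cursor.execute(sql, params) followed by a commit; keep each statement below the server's max_allowed_packet.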
MyISAM is not 100 times as fast; it is not even 10 times as fast as InnoDB. Today (5.6 or newer), you would be hard pressed to find a well tuned application that is more than a little faster in MyISAM. You are, or will be, I/O-limited.
As for corruption: no engine suffers from corruption except during a crash. A power failure may mangle MyISAM indexes, usually recoverably. Moreover, a MyISAM batch insert could be left half done. InnoDB will be clean: the entire batch is done or none of it is; no corruption.
ARCHIVE saves disk space, but costs CPU.
MEMORY is often faster because it has no I/O. But you have too much data for that Engine, correct?
MariaDB with TokuDB can probably run faster than anything I describe here; but you have not indicated the need for it.
100 rows inserted per second = 8M/day = 3 Billion/year. Will you be purging the data eventually? Will you be querying the data? Purging: Let's talk about PARTITION. Querying: Let's talk about Summary Tables.
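The arithmetic is 100 × 86,400 = 8,640,000 rows per day, or about 3.15 billion per year. For the purging side, a sketch of daily RANGE partitions, assuming a hypothetical `logs` table with a DATE column `created`. MySQL requires the partitioning column to be part of every unique key (including the PRIMARY KEY), so the key definition may need adjusting first.

```python
"""Sketch: size the table and generate daily RANGE partitions for purging.

Assumptions: table `logs` and DATE column `created` are hypothetical.
"""
from datetime import date, timedelta

SECONDS_PER_DAY = 86_400


def rows_per_day(rows_per_second: int) -> int:
    return rows_per_second * SECONDS_PER_DAY


def partition_name(day: date) -> str:
    return f"p{day:%Y%m%d}"


def create_partitions_ddl(first_day: date, n_days: int, table: str = "logs") -> str:
    parts = []
    for i in range(n_days):
        day = first_day + timedelta(days=i)
        upper = day + timedelta(days=1)
        # Each partition holds the rows whose `created` is before the next day.
        parts.append(f"PARTITION {partition_name(day)} "
                     f"VALUES LESS THAN (TO_DAYS('{upper.isoformat()}'))")
    parts.append("PARTITION pfuture VALUES LESS THAN MAXVALUE")
    return (f"ALTER TABLE {table} PARTITION BY RANGE (TO_DAYS(created)) (\n  "
            + ",\n  ".join(parts) + "\n)")


def drop_day_ddl(day: date, table: str = "logs") -> str:
    # Dropping a partition discards a whole day of rows without a huge DELETE.
    return f"ALTER TABLE {table} DROP PARTITION {partition_name(day)}"
```

Because of the catch-all pfuture partition, adding the next day's partition means splitting pfuture with ALTER TABLE ... REORGANIZE PARTITION.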
Indexing: Minimize the number of indexes. If you have a 'random' index, such as a UUID, and you have a billion rows, you will be stuck with 100 rows/second, regardless of which Engine and regardless of any tuning. Do I need to explain further?
If this is a queuing system, I say "Don't queue it, just do it."
Bottom line: Use InnoDB. Tune it. Use batch inserts. Avoid random indexes. Etc.
You are correct that MyISAM is a faster choice if your operational use case is lots of insertions. But that answer can change drastically based on the kind of use you make of the data. If this is an archival application you might consider the ARCHIVE storage engine. It is best for write-once, read-rarely applications.
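You should investigate INSERT DELAYED, as it allows your client programs to fire-and-forget these inserts rather than waiting for completion. This burns RAM in your mysqld process, though. If that style of operation meets your needs, it is a compelling reason to go with MyISAM. (Note that INSERT DELAYED works only with MyISAM, MEMORY, ARCHIVE and BLACKHOLE tables; it was deprecated in MySQL 5.6, and from 5.7 on the server ignores the DELAYED keyword.)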
Beware indexes in the target table of your inserts. Maintaining indexes is a big part of the server's insert workload.
Don't forget to look into MariaDB. It's a compatible fork of MySQL with some more advanced storage engines and features.
I have experience with a similar application. In our case, the application scaled up beyond the original insert rate, and the server could not keep up. (It's always good when an application workload grows!) We ended up doing two things, one after the other:
1. We used a message queuing system, with just a couple of processes doing the actual inserts. The original clients wrote their logging records to the message queue rather than directly to the database. (Amazon AWS's SQS is an example of such a queuing system.)
2. We reworked the insert process to use LOAD DATA INFILE to load great gobs of log rows at once.
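A minimal sketch of such a consumer, assuming Python's in-process queue.Queue as a stand-in for an external queue like SQS. It drains up to a batch of records, writes them using LOAD DATA's default format (tab-separated, backslash escapes, \N for NULL), and returns the statement to run. The table `logs`, its columns, and the file path are hypothetical, and LOCAL assumes local_infile is enabled on both client and server.

```python
"""Sketch: drain queued log records into a file for LOAD DATA INFILE.

Assumptions: queue.Queue stands in for SQS or similar; table `logs` and
columns (ip, host, referrer) are hypothetical.
"""
import queue
from typing import List, Optional, Tuple

Record = Tuple[Optional[str], ...]


def drain(q: "queue.Queue", max_rows: int, timeout: float = 0.5) -> List[Record]:
    """Take up to max_rows records, waiting at most `timeout` seconds for the first."""
    rows: List[Record] = []
    try:
        rows.append(q.get(timeout=timeout))
        while len(rows) < max_rows:
            rows.append(q.get_nowait())
    except queue.Empty:
        pass  # queue is momentarily empty: load what we have
    return rows


def escape_field(value: Optional[str]) -> str:
    """Escape a value for LOAD DATA defaults (tab-separated, backslash escapes)."""
    if value is None:
        return r"\N"  # LOAD DATA reads \N as NULL
    return (str(value).replace("\\", "\\\\").replace("\t", "\\t")
            .replace("\n", "\\n").replace("\r", "\\r"))


def write_batch(rows: List[Record], path: str) -> str:
    """Write rows to `path` and return a LOAD DATA statement for them."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        for row in rows:
            fh.write("\t".join(escape_field(v) for v in row) + "\n")
    return (f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE logs "
            "CHARACTER SET utf8mb4 (ip, host, referrer)")
```

Running a couple of these consumers in a loop keeps the number of writers small no matter how many clients produce log records.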
(You probably have figured out that this kind of workload isn't feasible on a cheap shared hosting service or an AWS micro instance.)

MySQL | Massive Data Insertion

We are using the MySQL 5.5 InnoDB engine for our database. One of our tables, which receives roughly equal SELECT and INSERT traffic, will get 100-150 million insert operations per day.
I have already read about MySQL partitioning and was planning to implement it, but before I do, I'd love to hear some thoughts. What is the best way to deal with this kind of challenge without compromising users' response time?
First of all, make sure the primary key is auto-increment, as it's the clustered index for InnoDB tables. If it's auto-increment, insertion is an append-only operation; if not, it's a random write, and that is a major performance killer. Make sure the PK is small and you don't have unnecessary indexes. If possible, batch inserts, as updating the indexes is a large part of the insert operation.
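A toy illustration of the append-only point, assuming a sorted Python list as a stand-in for InnoDB's clustered index (rows stored in primary-key order). It counts how often a new key lands at the end versus somewhere in the middle; random 128-bit integers stand in for UUIDs. This illustrates the ordering effect only and does not model InnoDB's actual pages or page splits.

```python
"""Sketch: sequential vs random primary keys in a key-ordered structure.

A sorted list stands in for the clustered index; random 128-bit integers
stand in for UUID keys. Illustration only, not a model of InnoDB pages.
"""
import bisect
import random


def append_fraction(keys) -> float:
    """Fraction of inserts whose key lands after every key already stored."""
    ordered = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_left(ordered, key)
        if pos == len(ordered):
            appends += 1  # goes at the right-hand end, like filling the last page
        ordered.insert(pos, key)
    return appends / len(ordered) if ordered else 0.0


sequential = range(1, 10_001)  # like AUTO_INCREMENT ids
rng = random.Random(42)
uuid_like = [rng.getrandbits(128) for _ in range(10_000)]  # stand-in for UUIDs

print(f"auto-increment: {append_fraction(sequential):.0%} of inserts are appends")
print(f"random keys:    {append_fraction(uuid_like):.2%} of inserts are appends")
```

With sequential keys every insert is an append; with random keys almost every insert lands in the middle, which on disk means touching an arbitrary existing page.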
Make sure other I/O settings make sense, like how often the data is actually flushed to the disk; you can put the binary log file on an SSD to ensure it's written as fast as possible.
After all of this, it's common to separate reads from writes with master-slave replication, so spikes in insert queries do not affect reading of the data (assuming it's OK to read potentially stale data).
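A minimal sketch of the read/write split on the application side. The host names are hypothetical, and classifying a statement by its first keyword is a simplification: locking reads (SELECT ... FOR UPDATE) go to the master, but multi-statement transactions and LOCK IN SHARE MODE are not handled here.

```python
"""Sketch: route writes to the master and plain reads to a replica.

Assumptions: hypothetical host names; first-keyword classification is a
simplification and ignores transactions spanning several statements.
"""
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "LOAD",
                  "ALTER", "CREATE", "DROP", "TRUNCATE"}

MASTER = "db-master.example.internal"    # hypothetical host names
REPLICA = "db-replica.example.internal"


def route(sql: str) -> str:
    """Pick a server for one statement based on its first keyword."""
    words = sql.split(None, 1)
    first = words[0].upper() if words else ""
    if first in WRITE_KEYWORDS or "FOR UPDATE" in sql.upper():
        return MASTER  # writes and locking reads must use the master
    return REPLICA     # plain reads can tolerate replication lag
```

Insert spikes then hit only the master, while the per-minute reporting queries read from the replica.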

Performance difference between Innodb and Myisam in Mysql

I have a MySQL table with over 30 million records that was originally stored with MyISAM. Here is a description of the table:
I would run the following query against this table which would generally take around 30 seconds to complete. I would change #eid each time to avoid database or disk caching.
select count(fact_data.id)
from fact_data
where fact_data.entity_id=#eid
and fact_data.metric_id=1
I then converted this table to InnoDB without making any other changes, and afterwards the same query returns in under a second every single time I run it. Even when I randomly set #eid to avoid caching, the query returns in under a second.
I've been researching the differences between the two storage engines to try to explain the dramatic improvement in performance but haven't been able to come up with anything. In fact, much of what I read indicates that MyISAM should be faster.
The queries I'm running are against a local database with no other processes hitting the database at the time of the tests.
That's a surprisingly large performance difference, but I can think of a few things that may be contributing.
MyISAM has historically been viewed as faster than InnoDB, but for recent versions of InnoDB, that is true for a much, much smaller set of use cases. MyISAM is typically faster for table scans of read-only tables. In most other use cases, I typically find InnoDB to be faster. Often many times faster. Table locks are a death knell for MyISAM in most of my usage of MySQL.
MyISAM caches indexes in its key buffer. Perhaps you have set the key buffer too small for it to effectively cache the index for your somewhat large table.
MyISAM depends on the OS to cache table data from the .MYD files in the OS disk cache. If the OS is running low on memory, it will start dumping its disk cache. That could force it to keep reading from disk.
InnoDB caches both indexes and data in its own memory buffer. You can tell the OS not to also use its disk cache if you set innodb_flush_method to O_DIRECT, though this isn't supported on OS X.
InnoDB usually buffers data and indexes in 16KB pages. Depending on how you are changing the value of #eid between queries, it may have already cached the data for one query due to the disk reads from a previous query.
Make sure you created the indexes identically. Use EXPLAIN to check whether MySQL is using the index. Since you included the output of DESCRIBE instead of SHOW CREATE TABLE or SHOW INDEXES FROM, I can't tell if entity_id is part of a composite index. If it was not the first part of a composite index, it wouldn't be used.
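The "first part of a composite index" rule is the leftmost-prefix rule. A small sketch of it, simplified to equality predicates only; it ignores range conditions, index merges, and the skip-scan access method that MySQL 8.0.13 and later can sometimes use. The index layouts in the usage lines are hypothetical.

```python
"""Sketch of the leftmost-prefix rule for composite B-tree indexes.

Simplification: equality predicates only; no ranges, index merge or skip scan.
"""
from typing import Sequence, Set


def usable_prefix(index_columns: Sequence[str], equality_columns: Set[str]) -> int:
    """Return how many leading index columns the WHERE clause can use."""
    used = 0
    for column in index_columns:
        if column not in equality_columns:
            break  # the first gap stops the index from narrowing any further
        used += 1
    return used


where = {"entity_id", "metric_id"}  # from the query in the question
print(usable_prefix(["entity_id", "metric_id"], where))   # both columns usable
print(usable_prefix(["created", "entity_id"], where))     # index not usable
```

An index whose first column is not constrained by the WHERE clause cannot be used for this lookup, which would force a scan of the 30 million rows.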
If you are using a relatively modern version of MySQL, run the following command before running the query:
set profiling = 1;
That will turn on query profiling for your session. After running the query, run
show profiles;
That will show you the list of queries for which profiles are available. By default it keeps the last 15 (controlled by profiling_history_size). Assuming your query was the first one, run:
show profile for query 1;
You will then see the duration of each stage in running your query. This is extremely useful for determining what (e.g., table locks, sorting, creating temp tables, etc.) is causing a query to be slow.
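If you fetch that output programmatically, a small helper can rank the stages. This sketch assumes the rows come from a DB-API cursor after SHOW PROFILE FOR QUERY 1, as (Status, Duration) pairs; the sample values are made up for illustration.

```python
"""Sketch: find the slowest stages in SHOW PROFILE output.

Assumption: rows are (Status, Duration) pairs; the sample data is invented.
"""
from collections import defaultdict
from typing import Iterable, List, Tuple


def slowest_stages(rows: Iterable[Tuple[str, float]], top: int = 3) -> List[Tuple[str, float]]:
    totals = defaultdict(float)
    for status, duration in rows:
        totals[status] += float(duration)  # a stage name can appear more than once
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


sample = [("starting", 0.000061), ("Opening tables", 0.000020),
          ("System lock", 0.000011), ("Sending data", 29.8), ("end", 0.000004)]
print(slowest_stages(sample))
```

float() also accepts the Decimal values some drivers return for the Duration column.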
My first suspicion would be that the original MyISAM table and/or indexes became fragmented over time resulting in the performance slowly degrading. The InnoDB table would not have the same problem since you created it with all the data already in it (so it would all be stored sequentially on disk).
You could test this theory by rebuilding the MyISAM table. The easiest way to do this would be to use a "null" ALTER TABLE statement:
ALTER TABLE mytable ENGINE = MyISAM;
Then check the performance to see if it is better.
Another possibility would be if the database itself is simply tuned for InnoDB performance rather than MyISAM. For example, InnoDB uses the innodb_buffer_pool_size parameter to know how much memory should be allocated for caching data and indexes. But MyISAM uses the key_buffer_size parameter. If your database has a large InnoDB buffer pool and a small key buffer, then InnoDB performance is going to be better than MyISAM performance, especially for large tables.
What are your index definitions? There are ways of creating indexes for MyISAM such that your index fields will not be used when you think they would be.
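A quick way to compare the two numbers, assuming you read Index_length from SHOW TABLE STATUS and key_buffer_size from SHOW VARIABLES. The sizes in the example are hypothetical. Keep in mind that the key buffer is shared by all MyISAM tables and holds index blocks only; MyISAM data pages rely on the OS cache.

```python
"""Sketch: how much of a MyISAM index fits in the key buffer?

Assumptions: Index_length (bytes) from SHOW TABLE STATUS, key_buffer_size
from SHOW VARIABLES; example sizes are hypothetical.
"""


def key_buffer_coverage(index_length: int, key_buffer_size: int) -> float:
    """Fraction of the table's index that can be held in the key buffer (capped at 1.0)."""
    if index_length <= 0:
        return 1.0
    return min(1.0, key_buffer_size / index_length)


# e.g. a 2 GiB index with the 8 MiB default key_buffer_size
print(f"{key_buffer_coverage(2 * 1024**3, 8 * 1024**2):.2%} of the index fits")
```

A coverage far below 1.0 means most index lookups go to disk, which would match the slow MyISAM timings.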

Storage engine for large amounts of constantly inserted data which should be available instantly

Our server (several Java applications on Debian) handles incoming data (GNSS observations) that should be:
immediately (delay <200ms) delivered to other applications,
stored for further use.
Sometimes (maybe several times a day) about a million archived records will be fetched from the database. Each record is about 12 double-precision fields plus a timestamp and some ids. There are no UPDATEs; DELETEs are very rare but massive. The incoming flow is up to a hundred records per second. So I had to choose a storage engine for this data.
I tried using MySQL (InnoDB). One application inserts, and the others constantly check the last record id and, if it has changed, fetch the new records. This part works fine. But I've run into the following issues:
Records are quite large (about 200-240 bytes per record).
Fetching a million archived records is unacceptably slow (tens of minutes or more).
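The polling pattern described above (one writer, readers that remember the last id they have seen) can be sketched like this. `fetch_after` is a hypothetical callable that would run something like SELECT ... WHERE id > %s ORDER BY id LIMIT %s; here a list stands in for the table. With several concurrent writers, InnoDB AUTO_INCREMENT values can become visible out of order and a reader could skip a row; the question has a single inserting application.

```python
"""Sketch: poll for new rows by remembering the last id seen.

Assumption: `fetch_after(last_id, limit)` is hypothetical and returns
(id, payload) tuples ordered by id.
"""
from typing import Callable, List, Tuple

Row = Tuple[int, object]


class NewRowPoller:
    def __init__(self, fetch_after: Callable[[int, int], List[Row]],
                 start_id: int = 0, page: int = 1000):
        self.fetch_after = fetch_after
        self.last_id = start_id
        self.page = page

    def poll(self) -> List[Row]:
        """Return every row with id > last_id, paging through large backlogs."""
        new_rows: List[Row] = []
        while True:
            chunk = self.fetch_after(self.last_id, self.page)
            if not chunk:
                return new_rows
            new_rows.extend(chunk)
            self.last_id = chunk[-1][0]  # rows come back ordered by id
            if len(chunk) < self.page:
                return new_rows


table = [(i, f"obs{i}") for i in range(1, 2501)]
poller = NewRowPoller(lambda after, limit: [r for r in table if r[0] > after][:limit])
print(len(poller.poll()), poller.last_id)
```

Because the lookup is by primary key range, each poll stays cheap regardless of how large the table grows.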
File-based storage would work just fine (since there are no inserts in the middle of the DB and selections are mostly like 'WHERE ID=1 AND TIME BETWEEN 2000 AND 3000'), but there are other problems:
Looking for new data might be not so easy.
Other data like logs and configs are stored in same database and I prefer to have one database for everything.
Can you advise a suitable database engine (SQL preferred, but not required)? Or is it possible to fine-tune MySQL to reduce record size and fetch time for continuous strips of data?
MongoDB is not acceptable since DB size is limited on 32-bit machines. Any engine that does not provide quick access for recently inserted data is not acceptable too.
I'd recommend using the TokuDB storage engine for MySQL. It's free for up to 50GB of user data, and its pricing model isn't terrible, making it a great choice for storing large amounts of data.
It has a higher insert speed than InnoDB and MyISAM and scales much better as the dataset grows (InnoDB tends to deteriorate once the working dataset doesn't fit in RAM, making its performance dependent on the I/O of the disk subsystem).
It's also ACID compliant and supports multiple clustered indexes (which would be a great choice for massive DELETEs you're planning to do). Also, hot schema changes are supported (ALTER TABLE doesn't lock the tables, and changes are quick on huge tables - I'm talking gigabyte-sized tables being altered in mere seconds).
From my personal use, I experienced about 5 - 10 times less disk usage due to TokuDB's compression, and it's much, much faster than MyISAM or InnoDB.
Even though it sounds like I'm trying to advertise this product, I'm not; it's simply amazing, since you can use a monolithic data store without expensive scaling plans like partitioning across nodes to scale the writes.
There really is no getting around how long it takes to load millions of records from disk. Your 32-bit requirement means you are limited in how much RAM you can use for memory based data structures. But, if you want to use MySQL, you may be able to get good performance using multiple table types.
If you need really fast non-blocking inserts, you can use the BLACKHOLE table type and replication. The server where the inserts occur has a BLACKHOLE table that replicates to another server where the table is InnoDB or MyISAM.
Since you don't do UPDATEs, I think MyISAM would be better than InnoDB in this scenario. You can use the MERGE table type for MyISAM (not available for InnoDB). I'm not sure what your data set is like, but you could have one table per day (hour, week?), and your MERGE table would be a superset of those tables. Assuming you want to delete old data by day, just redeclare the MERGE table to not include the old tables. This action is instantaneous. Dropping old tables is also extremely fast.
To check for new data, you can look at today's table directly rather than going through the MERGE table.
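A minimal sketch of the daily rotation, assuming hypothetical names: per-day MyISAM tables logs_YYYYMMDD created from an empty template table logs_template, and a MERGE table logs_all. All underlying tables must be MyISAM tables with identical definitions. The MERGE table is remapped before the expired table is dropped, so it never references a missing table.

```python
"""Sketch: keep a MERGE table over the last N daily MyISAM tables.

Assumptions: table names logs_YYYYMMDD, logs_template and logs_all are
hypothetical; all underlying tables share one MyISAM definition.
"""
from datetime import date, timedelta
from typing import List


def daily_table(day: date) -> str:
    return f"logs_{day:%Y%m%d}"


def window_tables(today: date, days: int) -> List[str]:
    """Daily tables from the oldest in the window up to today, inclusive."""
    return [daily_table(today - timedelta(days=offset))
            for offset in range(days - 1, -1, -1)]


def rotate_merge_sql(today: date, days: int, merge_table: str = "logs_all",
                     template: str = "logs_template") -> List[str]:
    tables = window_tables(today, days)
    expired = daily_table(today - timedelta(days=days))
    return [
        # today's table gets the same definition as the empty template table
        f"CREATE TABLE IF NOT EXISTS {tables[-1]} LIKE {template}",
        # remap the MERGE table first so it never points at a dropped table
        f"ALTER TABLE {merge_table} UNION=({', '.join(tables)})",
        f"DROP TABLE IF EXISTS {expired}",
    ]


for statement in rotate_merge_sql(date(2024, 3, 2), 3):
    print(statement)
```

The last name returned by window_tables() is today's table, which is the one to query directly when checking for new data.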