MySQL optimization for insert and retrieve only

Our applications read data from sensor complexes and write them to a database, together with their timestamps. New data are inserted about 5 times per second per sensor complex (1..10 complexes per database server; each record contains 2 blobs of typically 25 kB and 50 kB, respectively). They are read from 1..3 machines with simple queries like: select * from table where sensorId=?sensorId and timestamp>?lastTimestamp. Rows are never updated; no reports are created on the database side; old rows are deleted after several days. Only one of the tables receives occasional updates.
The primary index of that main table is an autogenerated id, with additional indices for sensorid and timestamp.
The performance is currently abysmal. The deletion of old data takes hours(!), and many data packets are not sent to the database because the insertion process takes longer than the interval between sensor reads. How can we optimize the performance of the database in such a specific scenario?
Setting the transaction isolation level to READ_COMMITTED looks promising, and innodb_lock_wait_timeout also seems useful. Can you suggest further settings useful in our specific scenario?
Can we gain further possibilities when we get rid of the table which receives updates?

Deleting old data -- PARTITION BY RANGE(TO_DAYS(...)) lets you DROP PARTITION, which is a lot faster than doing DELETEs.
More details: http://mysql.rjweb.org/doc.php/partitionmaint
And that SELECT you mentioned needs this 'composite' index:
INDEX(sensorId, timestamp)
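A minimal sketch of both points together (table, column and partition names here are assumptions based on the question, not your actual schema):

CREATE TABLE sensor_data (
  id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  sensorId  INT NOT NULL,
  ts        DATETIME NOT NULL,
  blob_a    MEDIUMBLOB,
  blob_b    MEDIUMBLOB,
  PRIMARY KEY (id, ts),      -- the partition column must be part of every unique key
  INDEX (sensorId, ts)       -- serves: WHERE sensorId = ? AND ts > ?
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(ts)) (
  PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
  PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03'))
  -- a daily job adds tomorrow's partition with ALTER TABLE ... ADD PARTITION
);

-- Purging a day's worth of data becomes a near-instant metadata operation:
ALTER TABLE sensor_data DROP PARTITION p20240101;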

Related

Slow insert statements on SQL Server

A single insert statement is taking, occasionally, more than 2 seconds. The inserts are potentially concurrent, as it depends on our site traffic which can result in 200 inserts per minute.
The table has more than 150M rows, 4 indexes and is accessed using a simple select statement for reporting purposes.
SHOW INDEX FROM ouptut
How to speed up the inserts considering that all indexes are required?
You haven't provided many details but it seems like you need partitions.
An insertion into a database index has, in general, O(log N) time complexity, where N is the number of rows in the table. If your table is really huge, even log N may become too costly.
So, to address that scalability issue you can make use of index partitions to transparently split up your table indexes into smaller internal pieces and reduce that N without changing your application or SQL scripts.
https://dev.mysql.com/doc/refman/5.7/en/partitioning-overview.html
[EDIT]
Considering the information initially added in the comments and now included in the question itself:
200 potentially concurrent inserts per minute
4 indexes
1 select for reporting purposes
There are a few not mutually exclusive improvements:
Check the output of EXPLAIN for that SELECT and remove indexes that are not being used, or, where possible, combine them into a single index.
Make the inserts in batches (see the sketch after this list).
https://dev.mysql.com/doc/refman/5.6/en/insert-optimization.html
https://dev.mysql.com/doc/refman/5.6/en/optimizing-innodb-bulk-data-loading.html
Partitioning is still an option.
Alternatively, change your approach: save the data to a NoSQL store like Redis and populate the MySQL table asynchronously for reporting purposes.
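A hedged sketch of the batching idea (my_table and its columns are placeholders, not your schema): instead of one autocommitted INSERT per row, buffer rows in the application and flush them as a single multi-row statement inside one transaction.

START TRANSACTION;
INSERT INTO my_table (col_a, col_b) VALUES
  (1, 'first'),
  (2, 'second'),
  (3, 'third');
COMMIT;

-- One round trip and one commit for many rows is much cheaper than
-- many autocommitted single-row INSERTs.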

How to synchronize MySQL InnoDB table data to Memory table

We use the MySQL InnoDB engine for insert and update operations. To improve query performance, we are considering using a MEMORY table to store the latest data, e.g. the last two months of data.
We can configure MySQL to load data into the MEMORY table when the server starts, but the business data are updated all the time, so we have to synchronize the data from the InnoDB table to the MEMORY table frequently, and we cannot restart the MySQL server every time we want to synchronize.
Does anybody know how to synchronize the data without restarting MySQL?
You would typically do that with a trigger. My first idea would be to do it in two parts.
1) Create triggers for insert, update and delete (if that ever happens) on the InnoDB table that apply the same change to the memory table; see the sketch after these two steps. Make sure no logic relies on certain rows having already been deleted from the memory table; it will hold the last 2 months and then some.
2) Create a background job to clear old data out of the memory table. If you have a high load against it, consider a frequent job that nibbles off the old rows a few at a time.
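A rough sketch of both parts, assuming hypothetical table and column names (source_innodb, latest_memory, id, created_at, payload):

DELIMITER //
CREATE TRIGGER source_ai AFTER INSERT ON source_innodb
FOR EACH ROW
  INSERT INTO latest_memory (id, created_at, payload)
  VALUES (NEW.id, NEW.created_at, NEW.payload)
//
CREATE TRIGGER source_au AFTER UPDATE ON source_innodb
FOR EACH ROW
  UPDATE latest_memory
     SET created_at = NEW.created_at, payload = NEW.payload
   WHERE id = NEW.id
//
DELIMITER ;

-- Background cleanup, run frequently so each pass only deletes a little:
DELETE FROM latest_memory
 WHERE created_at < NOW() - INTERVAL 2 MONTH
 LIMIT 1000;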
Another solution would be to partition the InnoDB table by time and then make sure your queries include something like where time > subdate(now(), interval 2 month).

How to manage Huge operations on MySql

I have a MySQL database with a lot of records (about 4,000,000,000 rows) that I want to process in order to reduce them (to about 1,000,000,000 rows).
Assume I have the following tables:
table RawData: more than 5000 rows per second arrive that I want to insert into RawData.
table ProcessedData: this table is processed (aggregated) storage for the rows that were inserted into RawData.
minimum row count > 20,000,000
table ProcessedDataDetail: holds the details of table ProcessedData (the data that was aggregated).
Users want to view and search the ProcessedData table, which requires joining more than 8 other tables.
Inserting into RawData and searching in ProcessedData (ProcessedData INNER JOIN ProcessedDataDetail INNER JOIN ...) are very slow. I used a lot of indexes; assume my data length is 1G, but my index length is 4G :). (I want to get rid of these indexes, they slow my process down.)
How can I increase the speed of this process?
I think I need a shadow table of ProcessedData, call it ProcessedDataShadow, then process RawData and aggregate it with ProcessedDataShadow, then insert the result into ProcessedDataShadow and ProcessedData. What do you think?
(I am developing the project in C++.)
thank you in advance.
Without knowing more about what your actual application is, I have these suggestions:
Use InnoDB if you aren't already. InnoDB makes use of row locks and is much better at handling concurrent updates/inserts. It will be slower if you don't work concurrently, but the row locking is probably a must-have for you, depending on how many sources you will have for RawData.
Indexes usually speed things up, but badly chosen indexes can make things slower. I don't think you want to get rid of them, but a lot of indexes can make inserts very slow. It is possible to relax index maintenance when inserting batches of data, in order to avoid updating the indexes on every single insert.
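As a sketch of that last point (RawData is taken from the question; these are the standard bulk-load knobs, and the InnoDB settings apply to the loading session only):

-- InnoDB: relax per-row checks around a bulk load (session scope):
SET autocommit = 0;
SET unique_checks = 0;
SET foreign_key_checks = 0;
-- ... multi-row INSERTs into RawData here ...
COMMIT;
SET unique_checks = 1;
SET foreign_key_checks = 1;

-- MyISAM equivalent: defer non-unique index maintenance until after the load:
ALTER TABLE RawData DISABLE KEYS;
-- ... bulk INSERTs ...
ALTER TABLE RawData ENABLE KEYS;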
If you will be selecting huge amounts of data, which might disturb the data collection, consider using a replicated slave database server that you use only for reading. Even if that locks rows/tables, the primary (master) database won't be affected, and the slave will catch back up as soon as it is free to do so.
Do you need to process data in the database? If possible, maybe collect all data in the application and only insert ProcessedData.
You've not said what the structure of the data is, how it's consolidated, how promptly data needs to be available to users, nor how lumpy the consolidation process can be.
However the most immediate problem will be sinking 5000 rows per second. You're going to need a very big, very fast machine (probably a sharded cluster).
If possible I'd recommend writing a consolidating buffer (using an in-memory hash table - not in the DBMS) to put the consolidated data into - even if it's only partially consolidated - then update from this into the processedData table rather than trying to populate it directly from the rawData.
Indeed, I'd probably consider separating the raw and consolidated data onto separate servers/clusters (the MySQL federated engine is handy for providing a unified view of the data).
Have you analysed your queries to see which indexes you really need? (hint - this script is very useful for this).

Puzzled by the max number records of a table in MySQL

I am working on a web site analyser which will be used to analyse our own site based on the logs from Tomcat.
Now we push the log from Tomcat to the database (MySQL) every day, and it works well so far. However, I have found a potentially fatal problem!
Until now we have pushed the log to a single table in the database, but the number of log items will soon increase rapidly, especially as we gain more users. Obviously a single table cannot hold so many log items (and querying such a large table will also perform poorly).
We use Hibernate as the persistence layer; each row in the log table is mapped to a LogEntry Java object in the application.
I have thought about creating a new table each month, but how do I make LogEntry map to more than one table and query across tables?
Also, the number of log entries per month may not be the same; as an extreme example, what if the number of log records in a table grows beyond the maximum capacity of a table in the database?
Then I thought of setting a property to limit the maximum number of log entries pushed when Hibernate writes logs to the database. Even so, I have no idea how to tell Hibernate to create a new table and query across tables automatically.
Any ideas?
Update to Sandy:
I see what you mean: the maximum capacity of a table is decided by the OS, and if I use partitioning, the maximum capacity may increase up to the capacity of my disk. However, even with partitioning, it seems the real concern is not the maximum capacity of the table but that a table holding too many records performs poorly. (BTW, we have not decided to delete the old logs yet.) Another way I thought of is to create several tables with the same structure, but I am using Hibernate, and all log inserting and querying goes through Hibernate. Can the entity (POJO) be mapped to more than one table?
I have thought about creating a new table each month, but how do I make LogEntry map to more than one table and query across tables?
Have a look at Hibernate Shards (database sharding is a method of horizontal partitioning). Although this subproject is not very active and has some limitations (refer to the documentation), it's stable and usable (Hibernate Shards was contributed by Max Ross from Google, who is using it internally).
Also, the number of log entries per month may not be the same; as an extreme example, what if the number of log records in a table grows beyond the maximum capacity of a table in the database?
Monitor your database/tables and anticipate the required maintenance.
Even so, I have no idea how to tell Hibernate to create a new table and query across tables automatically.
Hibernate won't do that automatically, this will be part of the maintenance of the database and of the sharding configuration (see also the section about Virtual Shards).
I think you should consider horizontal partitioning.
Horizontal Partitioning
This form of partitioning segments table rows so that distinct groups of physical row-based datasets are formed that can be addressed individually (one partition) or collectively (one-to-all partitions). All columns defined to a table are found in each set of partitions, so no actual table attributes are missing. An example of horizontal partitioning might be a table that contains ten years' worth of historical invoice data being partitioned into ten distinct partitions, where each partition contains a single year's worth of data.
Increased performance - during scan operations, the MySQL optimizer knows what partitions contain the data that will satisfy a particular query and will access only those necessary partitions during query execution. For example, a million-row table may be broken up into ten different partitions in range style so that each partition contains 100,000 rows. If a query is issued that only needs data from one of the partitions, and a table scan operation is necessary, only 100,000 rows will be accessed instead of a million. Obviously, it is much quicker for MySQL to sample 100,000 rows than one million, so the query will complete much sooner. The same benefit is derived should index access be possible, as local partitioned indexes are created for partitioned tables. Finally, it is possible to stripe a partitioned table across different physical drives by specifying different file system/directory paths for specific partitions. This allows physical I/O contention to be reduced when multiple partitions are accessed at the same time.
Check out the article Improving Database Performance with Partitioning.
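As a quick illustration of the pruning described above (invoices, invoice_date and amount are hypothetical names; the table is assumed to be partitioned BY RANGE (YEAR(invoice_date)) with one partition per year):

EXPLAIN PARTITIONS
SELECT SUM(amount)
  FROM invoices
 WHERE invoice_date BETWEEN '2009-01-01' AND '2009-12-31';
-- The partitions column of the plan should list only the 2009 partition,
-- so the scan touches one year's rows instead of the whole table.
-- (In MySQL 5.7+ a plain EXPLAIN already includes the partitions column.)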
Update
It seems that horizontal partitioning can handle the large table, but what if the number of records grows beyond the maximum size of the table?
Actually, the maximum size of a MySQL table is determined by operating system constraints. Have a look at this, and decide for yourself.
An alternative option is to purge old log records periodically, provided they are not required for analysis.
Create a cron job or any scheduled task to do the deleting.
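If you prefer to keep it inside MySQL, the event scheduler can do the same job; a sketch with hypothetical table and column names (access_log, logged_at) and an arbitrary 90-day retention:

SET GLOBAL event_scheduler = ON;

CREATE EVENT purge_old_logs
ON SCHEDULE EVERY 1 HOUR
DO
  DELETE FROM access_log
   WHERE logged_at < NOW() - INTERVAL 90 DAY
   LIMIT 10000;   -- small batches keep lock time short; adjust size/frequency to your volume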

What is the best mysql table format for high insert load?

I am in the process of adding a new feature to a system. The process will read live data from PLCs and store it in a database.
The data table will have 4 columns: variable_id (SMALLINT), timestamp (TIMESTAMP), value(FLOAT), quality(TINYINT).
The primary key is (variable_id, timestamp).
The system needs to be able to insert 1000-2000 records every second.
The data table will keep the last 24 hours of data, older data is deleted from the table.
The data table also needs to handle 5-10 select statements every second. The select statement retrieves the latest value from the table for a specific variable and displays it on the web.
Should I use MyISAM or InnoDB table format? Does MyISAM lock the entire table while doing inserts, thus blocking the select statements from the web interface?
Currently all the data tables in the data base are MyISAM tables.
Should I use MyISAM or InnoDB table format?
For any project with frequent concurrent reads and writes, you should use InnoDB.
Does MyISAM lock the entire table while doing inserts, thus blocking the select statements from the web interface?
With concurrent_insert enabled, MyISAM can append inserted rows to the end of the table while another session is still reading from it.
However, if you ever do anything but INSERT, this can fail (i.e. the table will lock).
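The relevant setting, in case it is not already on (2 corresponds to ALWAYS, i.e. allow concurrent inserts even when the table has holes from deleted rows):

SET GLOBAL concurrent_insert = 2;
SHOW VARIABLES LIKE 'concurrent_insert';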
The system needs to be able to insert 1000-2000 records every second.
It will be better to group these inserts and issue them in batches.
InnoDB is much faster in terms of rows per second than in transactions per second.
The data table will keep the last 24 hours of data, older data is deleted from the table.
Note that InnoDB locks all rows examined, not only those affected.
If your DELETE statement ever uses a full scan, concurrent INSERTs will be blocked (since the full scan makes InnoDB place gap locks on all records browsed, including the last one).
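For the 24-hour purge in this question, a hedged sketch (plc_data is a placeholder table name; the timestamp column is from the question): give the DELETE an index it can use, and delete in small chunks so each statement holds its locks only briefly.

ALTER TABLE plc_data ADD INDEX idx_ts (`timestamp`);

-- run repeatedly (e.g. every minute) until it affects 0 rows:
DELETE FROM plc_data
 WHERE `timestamp` < NOW() - INTERVAL 24 HOUR
 LIMIT 5000;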
MyISAM is quicker, but it locks the entire table. InnoDB is transaction-based, so it does row-level locking, but it is slower.