How can I handle a billion records effectively - MySQL

I have a performance issue while handling a billion records with a SELECT query. I have a table defined as:
CREATE TABLE `temp_content_closure2` (
`parent_label` varchar(2000) DEFAULT NULL,
`parent_code_id` bigint(20) NOT NULL,
`parent_depth` bigint(20) NOT NULL DEFAULT '0',
`content_id` bigint(20) unsigned NOT NULL DEFAULT '0',
KEY `code_content` (`parent_code_id`,`content_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY KEY (parent_depth)
PARTITIONS 20 */
I used partitioning, which should improve performance by subdividing the table, but it is not useful in my case. A sample SELECT on this table:
+----------------+----------------+--------------+------------+
| parent_label | parent_code_id | parent_depth | content_id |
+----------------+----------------+--------------+------------+
| Taxonomy | 20000 | 0 | 447 |
| Taxonomy | 20000 | 0 | 2286 |
| Taxonomy | 20000 | 0 | 3422 |
| Taxonomy | 20000 | 0 | 5916 |
+----------------+----------------+--------------+------------+
Here content_id is unique with respect to parent_depth, so I used parent_depth as the partitioning key. But at every depth I have 2,577,833 rows to handle, so partitioning does not help. I got the idea from some websites to use the ARCHIVE storage engine, but it does full table scans and cannot use an index on SELECT. Basically 99% of my queries against this table are SELECTs, and the table grows every day. I am currently on MySQL version 5.0.1. I have heard about using a NoSQL database, but is there any way to handle this in MySQL? If you are suggesting NoSQL, which should I use: Cassandra or Accumulo?

Add an index like this:
ALTER TABLE temp_content_closure2 ADD INDEX content_id (`content_id`);
You can also add multiple indexes for more specific SELECT criteria, which will also speed things up.
Multiple and single indexes
Overall, though, if you have a table like this that's growing so fast, then you should probably be looking at restructuring your SQL design.
Check out "Big Data" solutions as well.

With that size and volume of data, you'd need to either set up sharded MySQL across a cluster of machines (Facebook and Twitter stored massive amounts of data on sharded MySQL setups, so it is possible), or use a BigTable-based solution that natively distributes the data amongst nodes in various clusters - Cassandra and HBase are the most popular alternatives here. You must realize that a billion records on a single machine will hit almost every limit of the system - I/O first, followed by memory, followed by CPU. It's simply not feasible.
If you do go the BigTable way, Cassandra will be the quickest to set up and test. However, if you anticipate map-reduce type analytic needs, then HBase is more tightly integrated with the Hadoop ecosystem and should work out well. Performance-wise they are neck and neck, so take your pick.

Related

Optimizing SQL Query from a Big Table Ordered by Timestamp

We have a big table with the following table structure:
CREATE TABLE `location_data` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`dt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`device_sn` char(30) NOT NULL,
`data` char(20) NOT NULL,
`gps_date` datetime NOT NULL,
`lat` double(30,10) DEFAULT NULL,
`lng` double(30,10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `dt` (`dt`),
KEY `data` (`data`),
KEY `device_sn` (`device_sn`,`data`,`dt`),
KEY `device_sn_2` (`device_sn`,`dt`)
) ENGINE=MyISAM AUTO_INCREMENT=721453698 DEFAULT CHARSET=latin1
We have often performed queries such as the following:
SELECT * FROM location_data WHERE device_sn = 'XXX' AND data = 'location' ORDER BY dt DESC LIMIT 1;
OR
SELECT * FROM location_data WHERE device_sn = 'XXX' AND data = 'location' AND dt >= '2014-01-01 00:00:00' AND dt <= '2014-01-01 23:00:00' ORDER BY dt DESC;
We have been optimizing this in a few ways:
By adding an index and using FORCE INDEX on device_sn.
Separating the table into multiple tables based on the date (e.g. location_data_20140101) and pre-checking if there is data for a certain date, in which case we pull from that particular table alone. These tables are created by cron once a day, and the data in location_data for that particular date is deleted.
The table location_data is HIGH WRITE and LOW READ.
However, a few times the query runs really slowly. I wonder if there are other methods or ways to restructure the data that would allow us to read data in sequential date order for a given device_sn.
Any tips are more than welcomed.
EXPLAIN STATEMENT 1ST QUERY:
+----+-------------+--------------+------+----------------------------+-----------+---------+-------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+----------------------------+-----------+---------+-------------+------+-------------+
| 1 | SIMPLE | location_dat | ref | data,device_sn,device_sn_2 | device_sn | 50 | const,const | 1 | Using where |
+----+-------------+--------------+------+----------------------------+-----------+---------+-------------+------+-------------+
EXPLAIN STATEMENT 2nd QUERY:
+----+-------------+--------------+-------+-------------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+-------------------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | test_udp_new | range | dt,data,device_sn,device_sn_2 | dt | 4 | NULL | 1 | Using where |
+----+-------------+--------------+-------+-------------------------------+------+---------+------+------+-------------+
The index device_sn (device_sn,data,dt) is good. MySQL should use it without any need for FORCE INDEX. You can verify this by running "EXPLAIN SELECT ...".
However, your table is MyISAM, which only supports table-level locks. If the table is write-heavy, it may be slow. I would suggest converting it to InnoDB.
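A minimal sketch of the conversion, assuming you can afford the downtime (rebuilding a table this size takes a long time and temporary disk space):
-- Rebuilds the whole table in the new engine
ALTER TABLE location_data ENGINE=InnoDB;
-- Verify the engine afterwards
SHOW TABLE STATUS LIKE 'location_data';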
OK, I'll provide what I know; this might not answer your question directly, but it could provide some insight.
There are certain differences between InnoDB and MyISAM. Forget about full-text indexing or spatial indexes; the huge difference is in how they operate.
InnoDB has several great features compared to MyISAM.
First off, it can store the data set it works with in RAM. This is why database servers come with a lot of RAM - so that I/O operations can be done quickly. For example, an index scan is faster if you have the indexes in RAM rather than on HDD, because finding data on HDD is several orders of magnitude slower than doing it in RAM. The same applies to full table scans.
The variable that controls this in InnoDB is called innodb_buffer_pool_size. By default it's 8 MB, if I am not mistaken. I personally set this value high, sometimes even up to 90% of available RAM. Usually, when this value is optimized, a lot of people experience incredible speed gains.
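As a rough sketch, assuming MySQL 5.7.5 or newer, where the buffer pool is resizable at runtime (on older versions, set innodb_buffer_pool_size in my.cnf and restart), and with the 4 GB figure purely as an example:
-- Check the current buffer pool size in MB
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;
-- Resize at runtime (5.7.5+ only); size this to your machine
SET GLOBAL innodb_buffer_pool_size = 4 * 1024 * 1024 * 1024;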
The other thing is that InnoDB is a transactional engine. That means it will tell you whether a write to disk succeeded or failed, and that will be 100% correct. MyISAM won't do that, because it doesn't force the OS to force the HDD to commit data permanently. That's why records are sometimes lost when using MyISAM: it thinks the data is written because the OS said it was, when in reality the OS tried to optimize the write and the HDD might lose buffered data, thus never writing it down. The OS tries to optimize write operations by using the HDD's buffers to store larger chunks of data and then flushing them in a single I/O, so you don't have control over how data is being written.
With InnoDB you can start a transaction, execute say 100 INSERT queries, and then commit. That effectively forces the hard drive to flush all 100 inserts at once, using one I/O operation. If each INSERT is 4 KB long, 100 of them are 400 KB. That means you'll utilize 400 KB of your disk's bandwidth with one I/O operation, and the remainder of your I/O is available for other uses. This is how inserts are optimized.
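A minimal sketch of that pattern (the events table and its column are hypothetical):
CREATE TABLE events (id INT AUTO_INCREMENT PRIMARY KEY, payload VARCHAR(100)) ENGINE=InnoDB;
START TRANSACTION;
INSERT INTO events (payload) VALUES ('row 1');
INSERT INTO events (payload) VALUES ('row 2');
-- ... up to ~100 inserts ...
COMMIT; -- everything above is flushed to disk together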
Next are indexes with low cardinality - cardinality is the number of unique values in an indexed column. For a primary key every value is unique, so its cardinality equals the row count - the highest possible value. Indexes with low cardinality are columns where you have only a few distinct values, such as yes or no and similar. If an index is too low in cardinality, MySQL will prefer a full table scan - it's MUCH quicker. Also, forcing an index that MySQL doesn't want to use could (and probably will) slow things down - this is because when using an indexed search, MySQL processes records one by one, whereas during a table scan it can read multiple records at once and discard the non-matching ones cheaply. If those records were written sequentially on a mechanical disk, further optimizations are possible.
TL;DR:
use InnoDB on a server where you can allocate sufficient RAM
set the value of innodb_buffer_pool_size large enough so you can allocate more resources for faster querying
use an SSD if possible
try to wrap multiple INSERTs into transactions so you can better utilize your hard drive's bandwidth and I/O
avoid indexing columns that have low unique value count compared to row count - they just waste space (though there are exceptions to this)

Improving MySQL Performance on a Run-Once Query with a Large Dataset

I previously asked a question on how to analyse large datasets (how can I analyse 13GB of data). One promising response was to add the data into a MySQL database using natural keys, thereby making use of InnoDB's clustered indexing.
I've added the data to the database with a schema that looks like this:
TorrentsPerPeer
+----------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| ip | int(10) unsigned | NO | PRI | NULL | |
| infohash | varchar(40) | NO | PRI | NULL | |
+----------+------------------+------+-----+---------+-------+
The two fields together form the primary key.
This table represents known instances of peers downloading torrents. I'd like to be able to report how many torrents can be found at each peer. I'm going to draw a histogram of the frequencies with which I see given torrent counts (e.g. 20 peers have 2 torrents, 40 peers have 3, ...).
I've written the following query:
SELECT `count`, COUNT(`ip`)
FROM (SELECT `ip`, COUNT(`infohash`) AS `count`
FROM TorrentsPerPeer
GROUP BY `ip`) AS `counts`
GROUP BY `count`;
Here's the EXPLAIN for the sub-select:
+----+-------------+----------------+-------+---------------+---------+------------+--------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_length | ref | rows | Extra |
+----+-------------+----------------+-------+---------------+---------+------------+--------+----------+-------------+
| 1 | SIMPLE | TorrentPerPeer | index | [Null] | PRIMARY | 126 | [Null] | 79262772 | Using index |
+----+-------------+----------------+-------+---------------+---------+------------+--------+----------+-------------+
I can't seem to run EXPLAIN on the full query because it takes way too long. This bug suggests that's because the subquery is run first.
This query is currently running (and has been for an hour). top is reporting that mysqld is only using ~5% of the available CPU whilst its RSIZE is steadily increasing. My assumption here is that the server is building temporary tables in RAM that it's using to complete the query.
My question is then: how can I improve the performance of this query? Should I change the query somehow? I've been altering the server settings in the my.cnf file to increase the InnoDB buffer pool size; should I change any other values?
If it matters, the table is 79'262'772 rows deep and takes up ~8 GB of disk space. I'm not expecting this to be an easy query; maybe 'patience' is the only reasonable answer.
EDIT: Just to add that the query has finished and it took 105 mins. That's not unbearable, I'm just hoping for some improvements.
My hunch is that with an unsigned int and a varchar(40) (especially the varchar!) you now have a HUGE primary key, and it is making your index file too big to fit in whatever RAM you have for innodb_buffer_pool. This would make InnoDB rely on disk to swap index pages as it searches, and that is a LOT of disk seeks and not a lot of CPU work.
One thing I did for a similar issue was to use something in between a truly natural key and a surrogate key. We would take the 2 fields that are actually unique (one of which was also a varchar), compute a fixed-width MD5 hash of them in the application layer, and use THAT as the key. Yes, it means more work for the app, but it makes for a much smaller index file since you are no longer using an arbitrary-length field.
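A hedged sketch of that idea done directly in SQL (the pk_hash column name is made up; in practice we computed the hash in the application layer):
ALTER TABLE TorrentsPerPeer ADD COLUMN pk_hash BINARY(16) NOT NULL;
-- Back-fill a fixed-width 16-byte hash of the two natural-key fields
UPDATE TorrentsPerPeer SET pk_hash = UNHEX(MD5(CONCAT(ip, '|', infohash)));
ALTER TABLE TorrentsPerPeer DROP PRIMARY KEY, ADD PRIMARY KEY (pk_hash);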
OR, you could just use a server with tons of RAM and see if that makes the index fit in memory but I always like to make 'throw hardware at it' a last resort :)

Speed of MySQL query on tables containing BLOB depends on filesystem cache

I have a table with approximately 120k rows, which contains a field with a BLOB (no entry more than 1 MB in size, usually much less). My problem is that whenever I run a query asking for any columns of this table (not including the BLOB one), if the filesystem cache is empty, it takes approximately 40 seconds to complete. All subsequent queries on the same table complete in less than 1 second (testing from the command-line client, on the server itself). The number of rows returned by the queries varies from an empty set to 60k+.
I have eliminated the query cache so it has nothing to do with it.
The table is MyISAM, but I also tried changing it to InnoDB (and setting ROW_FORMAT=COMPACT), without any luck.
If I remove the BLOB column, the query is always fast.
So I would assume that the server reads the blobs (or parts of them) from disk and the filesystem caches them. The problem is that on a server with high traffic and limited memory, the filesystem cache is flushed every once in a while, so this particular query keeps causing me trouble.
So my question is, is there a way to considerably speed things up, without removing the blob column from the table?
Here are 2 example queries, run one after the other, along with EXPLAIN output, indexes and the table definition:
mysql> SELECT ct.score FROM completed_tests ct where ct.status != 'deleted' and ct.status != 'failed' and score < 100;
Empty set (48.21 sec)
mysql> SELECT ct.score FROM completed_tests ct where ct.status != 'deleted' and ct.status != 'failed' and score < 99;
Empty set (1.16 sec)
mysql> explain SELECT ct.score FROM completed_tests ct where ct.status != 'deleted' and ct.status != 'failed' and score < 99;
+----+-------------+-------+-------+---------------+--------+---------+------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+--------+---------+------+-------+-------------+
| 1 | SIMPLE | ct | range | status,score | status | 768 | NULL | 82096 | Using where |
+----+-------------+-------+-------+---------------+--------+---------+------+-------+-------------+
1 row in set (0.00 sec)
mysql> show indexes from completed_tests;
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| completed_tests | 0 | PRIMARY | 1 | id | A | 583938 | NULL | NULL | | BTREE | |
| completed_tests | 1 | users_login | 1 | users_LOGIN | A | 11449 | NULL | NULL | YES | BTREE | |
| completed_tests | 1 | tests_ID | 1 | tests_ID | A | 140 | NULL | NULL | | BTREE | |
| completed_tests | 1 | status | 1 | status | A | 3 | NULL | NULL | YES | BTREE | |
| completed_tests | 1 | timestamp | 1 | timestamp | A | 291969 | NULL | NULL | | BTREE | |
| completed_tests | 1 | archive | 1 | archive | A | 1 | NULL | NULL | | BTREE | |
| completed_tests | 1 | score | 1 | score | A | 783 | NULL | NULL | YES | BTREE | |
| completed_tests | 1 | pending | 1 | pending | A | 1 | NULL | NULL | | BTREE | |
+-----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> show create table completed_tests;
+-----------------+--------------------------------------
| Table | Create Table |
+-----------------+--------------------------------------
| completed_tests | CREATE TABLE `completed_tests` (
`id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
`users_LOGIN` varchar(100) DEFAULT NULL,
`tests_ID` mediumint(8) unsigned NOT NULL DEFAULT '0',
`test` longblob,
`status` varchar(255) DEFAULT NULL,
`timestamp` int(10) unsigned NOT NULL DEFAULT '0',
`archive` tinyint(1) NOT NULL DEFAULT '0',
`time_start` int(10) unsigned DEFAULT NULL,
`time_end` int(10) unsigned DEFAULT NULL,
`time_spent` int(10) unsigned DEFAULT NULL,
`score` float DEFAULT NULL,
`pending` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `users_login` (`users_LOGIN`),
KEY `tests_ID` (`tests_ID`),
KEY `status` (`status`),
KEY `timestamp` (`timestamp`),
KEY `archive` (`archive`),
KEY `score` (`score`),
KEY `pending` (`pending`)
) ENGINE=InnoDB AUTO_INCREMENT=117996 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED
1 row in set (0.00 sec)
I originally posted this on mysql query slow at first fast afterwards, but I now have more information, so I am reposting it as a different question.
I also posted this on the mysql forum, but I haven't heard back
Thanks in advance as always
The design of BLOB (=TEXT) storage in MySQL seems to be totally flawed and counter-intuitive. I have run into the same problem a couple of times and was unable to find any authoritative explanation. The most detailed analysis I've finally found is this post from 2010: http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/
The general belief and expectation is that BLOBs/TEXTs are stored outside the main row storage (e.g., see this answer). This is NOT TRUE, though. There are several issues here (I'm basing this on the article given above):
If the size of a BLOB item is several KB, it is included directly in row data. Consequently, even if you SELECT only non-BLOB columns, the engine still has to load all your BLOBs from disk. Say, you have 1M rows with 100 bytes of non-blob data each and 5000 bytes of blob data. You SELECT all non-blob columns and expect that MySQL would read from disk around 100-120 bytes per row, which is 100-120 MB in total (+20 for BLOB address). However, the reality is that MySQL stores all BLOBs in the same disk blocks as rows, so they all must be read together even if not used, and so the size of data read from disk is around 5100 MB = 5 GB - this is 50 times more than you would expect and means 50 times slower query execution.
Of course, this design has an advantage: when you need all the columns, including the blob one, SELECT query is faster when blobs are stored with the row than when stored externally: you avoid (sometimes) 1 additional page access per row. However, this is not a typical use case for BLOBs and DB engine should not be optimized towards this case. If your data is so small that it fits in a row and you're fine with loading it in every query no matter if needed or not - then you would use VARCHAR type instead of BLOB/TEXT.
Even if for some reason (long row or long blob) the BLOB value is stored externally, its 768-byte prefix is still kept in the row itself. Let's take the previous example: you have 100 bytes of non-blob data in each row, but now the blob column holds items of 1 MB each so they must be kept externally. SELECT of non-blob columns will have to read roughly 800 bytes per row (non-blobs + blob prefix), instead of 100-120 - this is again 7 times larger disk transfer than you'd expect, and 7x slower query execution.
External BLOB storage is ineffective in its usage of disk space: it allocates space in blocks of 16 KB and single block cannot hold multiple items, so if your blobs are small and take, for instance, 8 KB each, the actual space allocated is twice that large.
I hope this design will get fixed one day: MySQL will store ALL blobs - big and small - in external storage, without any prefixes kept in DB, with external storage allocation being efficient for items of all sizes. Before this happens, separating out BLOB/TEXT columns seems the only reasonable solution - separating out to another table or to the filesystem (each BLOB value kept as a file).
[UPDATE 2019-10-15]
InnoDB documentation provides now an ultimate answer to the issue discussed above:
https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html
The case of storing 768-byte prefixes of BLOB/TEXT values inline does indeed hold for the COMPACT row format. According to the docs, "For each non-NULL variable-length field (...) The internal part is 768 bytes".
However, you can use DYNAMIC row format instead. With this format:
"InnoDB can store long variable-length column values (...) fully off-page, with the clustered index record containing only a 20-byte pointer to the overflow page. (...) TEXT and BLOB columns that are less than or equal to 40 bytes are stored in line."
Here, a BLOB value can occupy up to 40 bytes of inline storage, which is much better than the 768 bytes of COMPACT mode, and looks like a much more reasonable approach if you want to mix BLOB and non-BLOB types in a table and still be able to scan many rows quickly. Moreover, the extended (over 20 bytes) inline storage is used ONLY for values sized between 20 and 40 bytes; for larger values, only the 20-byte pointer is stored (no prefix), unlike in COMPACT mode. Hence, the extended 40-byte storage is rarely used in practice, and one can safely assume the average size of inline storage to be just 20 bytes (or less, if you tend to keep many small values of under 20 B in your BLOB). All in all, it seems the DYNAMIC row format, rather than COMPACT, should be the default choice in most cases to achieve good, predictable performance of BLOB columns in InnoDB.
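A hedged sketch of the switch (on MySQL 5.6 and earlier this additionally requires innodb_file_format=Barracuda and innodb_file_per_table=ON):
ALTER TABLE completed_tests ROW_FORMAT=DYNAMIC;
-- Confirm the row format actually in use
SELECT row_format FROM information_schema.tables WHERE table_name = 'completed_tests';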
An example how to check the actual physical storage in InnoDB can be found here:
https://dba.stackexchange.com/a/210430/177276
As to MyISAM, it apparently does NOT provide off-page storage for BLOBs at all (just inline). Check here for more info:
https://dev.mysql.com/doc/refman/5.7/en/dynamic-format.html
https://forums.mysql.com/read.php?24,105964,267596#msg-267596
I researched this issue for a while. Many people recommend storing the blob with only a primary key in a separate table, and storing the blob's metadata in another table with a foreign key to the blob table. With this, performance will be considerably higher.
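A minimal sketch of that split, with hypothetical table names based on the schema above:
CREATE TABLE completed_tests_meta (
  id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
  users_LOGIN VARCHAR(100) DEFAULT NULL,
  status VARCHAR(255) DEFAULT NULL,
  score FLOAT DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
CREATE TABLE completed_tests_blob (
  id MEDIUMINT UNSIGNED NOT NULL,
  test LONGBLOB,
  PRIMARY KEY (id),
  FOREIGN KEY (id) REFERENCES completed_tests_meta (id)
) ENGINE=InnoDB;
-- Scans of the metadata table never touch the blob pages
SELECT score FROM completed_tests_meta
WHERE status != 'deleted' AND status != 'failed' AND score < 100;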
Adding a composite index on the two relevant columns should allow these queries to be executed without accessing the table data directly.
CREATE INDEX `IX_score_status` ON `completed_tests` (`score`, `status`);
If you are able to switch to MariaDB, then you can make the most of its table elimination optimisations. This would allow you to split the BLOB field out into its own table and use a view to recreate your existing table structure using a LEFT JOIN. This way, the BLOB data is only accessed if it is explicitly required by the executing query.
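A hedged sketch, assuming the data has been split into hypothetical completed_tests_meta and completed_tests_blob tables and the original table renamed away:
CREATE VIEW completed_tests AS
SELECT m.*, b.test
FROM completed_tests_meta m
LEFT JOIN completed_tests_blob b ON b.id = m.id;
-- MariaDB's table elimination drops the LEFT JOIN entirely for
-- queries that never reference the blob column:
SELECT score FROM completed_tests WHERE status != 'deleted';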
Just add an index (or indexes) to the fields used in the WHERE clause when querying a table with blobs.
e.g. You have 2 tables with those fields
users : USERID, NAME, ...
userphotos : BLOBID, BLOB, USERNO, ...
select * from userphotos where USERNO=123456;
Normally this works fine. When you have many large images (e.g. BLOB, MEDIUMBLOB or LONGBLOB totalling more than 5 GB), this will take a long time (more than minutes) while BLOBID is the primary key.
Somehow MySQL searches the whole data, including the images, if there is no index in the WHERE clause on a field of the BLOB table. As your data grows larger and larger, that takes more and more time. If you create an index for the field USERNO, this will speed up your database, and the lookup time will be independent of the total data size.
Solution:
Add an index on USERNO in userphotos.
As an answer to your question, you should create an index for ct.status.
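A minimal sketch of the first index (the index name is made up):
CREATE INDEX idx_userno ON userphotos (USERNO);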

MySQL select - improve performance

I am working on an e-shop which sells products only via loans. I display 10 products per page in any category, and each product has 3 different price tags for 3 different loan types. Everything went pretty well during testing; query execution time was perfect. But today, when the changes were transferred to the production server, the site "collapsed" in about 2 minutes. The query used to select loan types sometimes hangs for ~10 seconds, and it happens frequently, so the server can't keep up and is terribly slow. The table used to store the data has approximately 2 million records, and each select looks like this:
SELECT *
FROM products_loans
WHERE KOD IN("X17/Q30-10", "X17/12", "X17/5-24")
AND 369.27 BETWEEN CENA_OD AND CENA_DO;
3 loan types and a price that needs to be in the range between CENA_OD and CENA_DO; thus 3 rows are returned.
But since I need to display 10 products per page, I need to run it through a modified select using OR, since I didn't find any other solution. I have asked about it here, but got no answer. As mentioned in the referenced post, this has to be done separately since there is no column that could be used in a JOIN (except of course price and code, but that ended very, very badly). Here is the SHOW CREATE TABLE; KOD and CENA_OD/CENA_DO are indexed via INDEX.
CREATE TABLE `products_loans` (
`KOEF_ID` bigint(20) NOT NULL,
`KOD` varchar(30) NOT NULL,
`AKONTACIA` int(11) NOT NULL,
`POCET_SPLATOK` int(11) NOT NULL,
`koeficient` decimal(10,2) NOT NULL default '0.00',
`CENA_OD` decimal(10,2) default NULL,
`CENA_DO` decimal(10,2) default NULL,
`PREDAJNA_CENA` decimal(10,2) default NULL,
`AKONTACIA_SUMA` decimal(10,2) default NULL,
`TYP_VYHODY` varchar(4) default NULL,
`stage` smallint(6) NOT NULL default '1',
PRIMARY KEY (`KOEF_ID`),
KEY `CENA_OD` (`CENA_OD`),
KEY `CENA_DO` (`CENA_DO`),
KEY `KOD` (`KOD`),
KEY `stage` (`stage`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Also, selecting all loan types and later filtering them through PHP doesn't work well, since each type has over 50k records and that select takes too much time as well...
Any ideas about improving the speed are appreciated.
Edit:
Here is the explain
+----+-------------+----------------+-------+---------------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------+---------------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | products_loans | range | CENA_OD,CENA_DO,KOD | KOD | 92 | NULL | 190158 | Using where |
+----+-------------+----------------+-------+---------------------+------+---------+------+--------+-------------+
I have tried the combined index and it improved performance on the test server from 0.44 sec to 0.06 sec. I can't access the production server from home, though, so I will have to try it tomorrow.
Your issue is that you are searching for intervals which contain a point (rather than the more usual query for all points within an interval). These queries do not work well with a standard B-tree index, so instead you need an R-tree index. Unfortunately, MySQL doesn't allow you to create an R-tree index on a plain column, but you can get the desired index by changing your column type to GEOMETRY and using the geometric functions to check whether the interval contains the point.
See Quassnoi's article Adjacency list vs. nested sets: MySQL where he explains this in more detail. The use case is different, but the techniques involved are the same. Here's an extract from the relevant part of the article:
There is also a certain class of tasks that require searching for all ranges containing a known value:
Searching for an IP address in the IP range ban list
Searching for a given date within a date range
and several others. These tasks can be improved by using R-Tree capabilities of MySQL.
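A hedged sketch of that technique applied to this table (price_range is a hypothetical column; SPATIAL indexes require MyISAM before MySQL 5.7, where InnoDB gained support):
ALTER TABLE products_loans ADD COLUMN price_range LINESTRING NOT NULL;
-- Encode each [CENA_OD, CENA_DO] interval as a diagonal segment
UPDATE products_loans
SET price_range = LineString(Point(CENA_OD, -1), Point(CENA_DO, 1));
ALTER TABLE products_loans ADD SPATIAL INDEX (price_range);
-- "Which intervals contain 369.27?" becomes a bounding-box test
SELECT * FROM products_loans
WHERE KOD IN ("X17/Q30-10", "X17/12", "X17/5-24")
  AND MBRContains(price_range, Point(369.27, 0));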
Try to refactor your query like:
SELECT * FROM products_loans
WHERE KOD IN("X17/Q30-10", "X17/12", "X17/5-24")
AND CENA_OD <= 369.27
AND CENA_DO >= 369.27;
(MySQL is not very smart when choosing indexes) and check the performance.
The next try is to add a combined key - (KOD,CENA_OD,CENA_DO)
And the next major try is to refactor your base to have products separated from prices. This should really help.
PS: you could also migrate to PostgreSQL; it's smarter than MySQL when choosing the right indexes.
MySQL can only use 1 index per table in a query. If you always look up entries by the 3 columns, then depending on the actual data (ranges) in the columns, one of the following could very well add a serious amount of performance:
ALTER TABLE products_loans ADD INDEX(KOD, CENA_OD, CENA_DO);
ALTER TABLE products_loans ADD INDEX(CENA_OD, CENA_DO, KOD);
Notice that the order of the columns matters! If that doesn't improve performance, give us the EXPLAIN output of the query.

How to optimize mysql indexes so that INSERT operations happen quickly on a large table with frequent writes and reads?

I have a table watchlist containing today almost 3Mil records.
mysql> select count(*) from watchlist;
+----------+
| count(*) |
+----------+
| 2957994 |
+----------+
It is used as a log to record product-page views on a large e-commerce site (50,000+ products). It records the productID of the viewed product, the IP address and USER_AGENT of the viewer, and a timestamp of when it happens:
mysql> show columns from watchlist;
+-----------+--------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+-------------------+-------+
| productID | int(11) | NO | MUL | 0 | |
| ip | varchar(16) | YES | | NULL | |
| added_on | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| agent | varchar(220) | YES | MUL | NULL | |
+-----------+--------------+------+-----+-------------------+-------+
The data is then reported on several pages throughout the site on both the back-end (e.g. checking what GoogleBot is indexing), and front-end (e.g. a side-bar box for "Recently Viewed Products" and a page showing users what "People from your region also liked" etc.).
So that these "report" pages and side-bars load quickly I put indexes on relevant fields:
mysql> show indexes from watchlist;
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| watchlist | 1 | added_on | 1 | added_on | A | NULL | NULL | NULL | | BTREE | |
| watchlist | 1 | productID | 1 | productID | A | NULL | NULL | NULL | | BTREE | |
| watchlist | 1 | agent | 1 | agent | A | NULL | NULL | NULL | YES | BTREE | |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Without the INDEXES, pages with the side-bar for example would spend about 30-45sec executing a query to get the 7 most-recent ProductIDs. With the indexes it takes <0.2sec.
The problem is that with the indexes, the product pages themselves take longer and longer to load, because as the table grows, the write operations take upwards of 5 sec. In addition, there is a spike in the mysqld process amounting to 10-15% of available CPU each time a product page is viewed (roughly once every 2 sec). We already had to upgrade the server hardware, because on a previous server it was reaching 100% and caused mysqld to crash.
My plan is to attempt a 2-table solution. One table for INSERT operations, and another for SELECT operations. I plan to purge the INSERT table whenever it reaches 1000 records using a TRIGGER, and copy the oldest 900 records into the SELECT table. The report pages are a mixture of real-time (recently viewed) and analytics (which region), but the real-time pages tend to only need a handful of fresh records while the analytical pages don't need to know about the most recent trend (i.e. last 1000 views). So I can use the small table for the former and the large table for the latter reports.
My question: Is this an ideal solution to this problem?
Also: with TRIGGERs in MySQL, is it possible to nice the trigger_statement so that it takes longer but doesn't consume much CPU? Would running a niced cron job every 30 min, which performs the purging if required, be a better solution?
Write operations for a single row into a data table should not take 5 seconds, regardless of how big the table gets.
Is your clustered index based on the timestamp field? If not, it should be, so you're not writing into the middle of your table somewhere. Also, make sure you are using InnoDB tables - MyISAM is not optimized for writes.
I would propose writing into two tables: one long-term table, one short-term reporting table with little or no indexing, which is then dumped as needed.
Another solution would be to use memcached or an in-memory database for the live reporting data, so there's no hit on the production database.
One more thought: exactly how "live" must either of these reports be? Perhaps retrieving a new list on a timed basis versus once for every page view would be sufficient.
A quick fix might be to use the INSERT DELAYED syntax, which allows MySQL to queue the inserts and execute them when it has time. That's probably not a very scalable solution, though.
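A minimal sketch (INSERT DELAYED only works with MyISAM, MEMORY and ARCHIVE tables, and was deprecated in MySQL 5.6 and removed in 5.7; the values here are made up):
INSERT DELAYED INTO watchlist (productID, ip, agent)
VALUES (12345, '203.0.113.7', 'Mozilla/5.0');
-- Returns immediately; the row is written when the table is idle.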
I actually think that the principles of what you will be attempting are sound, although I wouldn't use a trigger. My suggested solution would be to let the data accumulate for a day and then purge it to the secondary log table with a batch script that runs at night. This is mainly because such frequent transfers of a thousand rows would still put a rather heavy load on the server, and because I don't really trust the MySQL trigger implementation (although that isn't based on any real substance).
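A hedged sketch of that nightly batch (watchlist_log is a hypothetical name for the secondary table):
-- Move everything older than today into the long-term table
INSERT INTO watchlist_log
SELECT * FROM watchlist WHERE added_on < CURDATE();
DELETE FROM watchlist WHERE added_on < CURDATE();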
Instead of optimizing indexes, you could use some sort of database write offloading. You could delegate the writing to a background process via an asynchronous queue (ActiveMQ, for example). Inserting a message into an ActiveMQ queue is very fast. We are using ActiveMQ and see about 10-20K insert operations on the test platform (and that is a single-threaded test application! So you could have more).
Look for 'shadow tables' when reconstructing tables this way; then you don't need to write to the production table.
I had the same issue even using InnoDB tables or MyISAM (as mentioned before, not optimized for writes), and solved it by using a second table to write temp data that periodically updates the huge master table. The master table has over 18 million records and is used only for reading records, while results are written to the second, small table.
The problem is that inserts/updates onto the big master table take a while, and get worse if there are several updates or inserts waiting in the queue, even with the INSERT DELAYED or UPDATE [LOW_PRIORITY] options enabled.
To make it even faster, read the small secondary table first when searching for a record; if the record is there, work on the second table only. Use the big master table only for reference and for picking up new data records: if the data is not in the secondary small table, you read the record from the master (reads are fast on InnoDB tables or MyISAM schemes) and then insert that record into the small second table.
It works like a charm: reading from the huge 20-million-record master takes much less than 5 seconds, and writing onto the second small table of 100K to 300K records takes less than a second.
This works just fine.
Regards
Something that often helps when doing bulk loads is to drop any indexes, do the bulk load, then recreate the indexes. This is generally much faster than having the database constantly update the indexes for each and every row inserted.
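A minimal sketch against the watchlist table above (the CSV path is made up):
ALTER TABLE watchlist DROP INDEX added_on, DROP INDEX productID, DROP INDEX agent;
LOAD DATA INFILE '/tmp/watchlist.csv' INTO TABLE watchlist;
ALTER TABLE watchlist
  ADD INDEX added_on (added_on),
  ADD INDEX productID (productID),
  ADD INDEX agent (agent);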