Can I set up MySQL to auto-partition?

I want to partition a very large table. As the business is growing, partitioning by date isn't really that good because each year the partitions get bigger and bigger. What I'd really like is a partition for every 10 million records.
The MySQL manual shows this simple example:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
PARTITION p0 VALUES LESS THAN (6),
PARTITION p1 VALUES LESS THAN (11),
PARTITION p2 VALUES LESS THAN (16),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
But this means that every store_id of 16 or higher gets thrown into the last partition. Is there a way to auto-generate a new partition every interval (in my case, every 10 million records) so I won't have to keep modifying an active database? I am running MySQL 5.5.
Thanks!
EDIT: Here is my actual table
CREATE TABLE `my_table` (
`row_id` int(11) NOT NULL AUTO_INCREMENT,
`filename` varchar(50) DEFAULT NULL,
`timestamp` datetime DEFAULT NULL,
`unit_num` int(3) DEFAULT NULL,
`string` int(3) DEFAULT NULL,
`voltage` float(6,4) DEFAULT NULL,
`impedance` float(6,4) DEFAULT NULL,
`amb` float(6,2) DEFAULT NULL,
`ripple_v` float(8,6) DEFAULT NULL,
PRIMARY KEY (`row_id`),
UNIQUE KEY `timestamp` (`timestamp`,`filename`,`string`,`unit_num`),
KEY `index1` (`filename`),
KEY `index2` (`timestamp`),
KEY `index3` (`timestamp`,`filename`,`string`),
KEY `index4` (`filename`,`unit_num`)
) ENGINE=MyISAM AUTO_INCREMENT=690892041 DEFAULT CHARSET=latin1
and an example query for the graph is...
SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num
FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY)
AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2' ORDER BY timestamp asc;
Here is the explain for the query...
mysql> explain SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY) AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2' ORDER BY timestamp asc;
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
| 1 | SIMPLE | unit_tarma | ref | timestamp,index3,index4 | index4 | 58 | const,const | 13440 | Using index condition; Using where; Using filesort |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+

(This answer is directed at the schema and SELECT.)
Since you anticipate millions of rows, first I want to point out some improvements to the schema.
FLOAT(m,n) is usually the 'wrong' thing to do because it leads to two roundings. Either use plain FLOAT (which seems 'right' for metrics like voltage) or use DECIMAL(m,n). FLOAT is 4 bytes; in the cases given, DECIMAL would be 3 or 4 bytes.
When you have both INDEX(a) and INDEX(a,b), the former is unnecessary since the latter covers for it. You have 3 unnecessary KEYs. This slows down INSERTs.
INT(3) -- Are you saying a "3-digit number"? If so consider TINYINT UNSIGNED (values 0..255) for 1 byte instead of INT for 4 bytes. This will save many MB of disk space, hence speed. (See also SMALLINT, etc, and SIGNED or UNSIGNED.)
If filename is repeated a lot, you may want to "normalize" it. This would save many MB.
Use NOT NULL unless you need NULL for something.
AUTO_INCREMENT=690892041 implies that you are about 1/3 of the way to disaster with id, which will top out at about 2 billion. Do you use id for anything? Getting rid of the column would avoid the issue; and change the UNIQUE KEY to PRIMARY KEY. (If you do need id, let's talk further.)
ENGINE=MyISAM -- Switching to InnoDB has some ramifications, both favorable and unfavorable. The table would become 2-3 times as big. The 'right' choice of PRIMARY KEY would further speed up this SELECT significantly. (And may or may not slow down other SELECTs.)
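Putting those schema points together, a rough sketch (untested; it assumes unit_num and string never exceed 255 and that the metric columns contain no NULLs):
ALTER TABLE my_table
    MODIFY `unit_num` TINYINT UNSIGNED NOT NULL,  -- 1 byte instead of 4
    MODIFY `string`   TINYINT UNSIGNED NOT NULL,
    MODIFY `voltage`  FLOAT NOT NULL,             -- plain FLOAT: one rounding, 4 bytes
    MODIFY `impedance` FLOAT NOT NULL,
    MODIFY `amb`      FLOAT NOT NULL,
    MODIFY `ripple_v` FLOAT NOT NULL;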
A note on the SELECT: Since string and unit_num are constants in the query, the last two fields of an ORDER BY timestamp asc, string asc, unit_num asc would be unnecessary. If they are relevant for reasons not apparent in the SELECT, then my advice may be incomplete.
This
WHERE filename = 'foobar'
AND unit_num='40'
AND string='2'
AND timestamp >= ...
is optimally handled by INDEX(filename, unit_num, string, timestamp). The order of the columns is not important except that timestamp needs to be last. Rearranging the current UNIQUE key will give you the optimal index. (Meanwhile, none of the existing indexes is very good for this SELECT.) Making it the PRIMARY KEY and the table InnoDB would make it even faster.
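A rough sketch of that rearrangement (untested; it assumes row_id is not referenced anywhere else and that the UNIQUE columns contain no NULLs):
ALTER TABLE my_table
    DROP COLUMN `row_id`,                         -- the old PK goes with it
    DROP KEY `timestamp`,                         -- superseded by the new PK
    ADD PRIMARY KEY (`filename`, `unit_num`, `string`, `timestamp`),
    -- also consider dropping the redundant index1..index3 (see above)
    ENGINE=InnoDB;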
Partitioning? No advantage. Not for performance; not for anything else you have mentioned. A common use for partitioning is for purging 'old'. If you intend to do such, let's talk further.
In huge tables it is best to look at all the important SELECTs simultaneously so that we don't speed up one while demolishing the speed of others. It may even turn out that partitioning helps in this kind of tradeoff.

First, I must ask what benefit Partitioning gives you? Is there some query that runs faster because of it?
There is no auto-partitioning.
Instead, you should have a job that runs every day and counts the number of rows in the 'last active' partition to see if it is nearing 10M. If so, it adds another partition.
I recommend keeping the "last" partition (the one with MAXVALUE) empty. That way you can REORGANIZE PARTITION to split it into two empty partitions with essentially zero overhead. And I recommend that instead of ADD PARTITION because you might slip up and put something in the last partition.
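For example, the nightly job might run something like this (a sketch; the partition names and the 170M boundary are invented for illustration):
ALTER TABLE my_table
    REORGANIZE PARTITION pmax INTO (
        PARTITION p17  VALUES LESS THAN (170000000),  -- next 10M boundary
        PARTITION pmax VALUES LESS THAN MAXVALUE      -- stays empty
    );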
It is unclear what will trigger the 10M. Are there multiple rows for each store_id? And are there new rows coming in for each store? If so, partitioning on store_id does not help much, since all partitions will be growing all the time.
OK, so store_id was just a lame example from the reference manual. Please provide SHOW CREATE TABLE so we can talk about something concrete, not hand-waving. There are simply too many ways to approach this task.
What is the activity?
If you mostly hit the "recent" partition(s), then an uneven distribution may be warranted -- periodically add a new partition and combine an adjacent pair of old partitions. (I did this successfully in one system.)
If you will be purging "old" data, then clearly you need to use PARTITION BY RANGE(TO_DAYS(...)) and use DROP PARTITION plus REORGANIZE PARTITION.
And there are lots of other scenarios. But I know of only 4 scenarios where Partitioning provides any performance benefit. See my blog.

Related

Trouble getting partitions to make a difference in query time

I'm just experimenting a bit with partitions with some dummy data, and am not having any luck optimizing my queries so far.
I downloaded a dataset from the Internet, which consists of a single table of measurements:
CREATE TABLE `partitioned_measures` (
`measure_timestamp` datetime NOT NULL,
`station_name` varchar(255) DEFAULT NULL,
`wind_mtsperhour` int(11) NOT NULL,
`windgust_mtsperhour` int(11) NOT NULL,
`windangle` int(3) NOT NULL,
`rain_mm` decimal(5,2) DEFAULT NULL,
`temperature_dht11` int(5) DEFAULT NULL,
`humidity_dht11` int(5) DEFAULT NULL,
`barometric_pressure` decimal(10,2) NOT NULL,
`barometric_temperature` decimal(10,0) NOT NULL,
`lux` decimal(7,2) DEFAULT NULL,
`is_plugged` tinyint(1) DEFAULT NULL,
`battery_level` int(3) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY RANGE (TO_DAYS(measure_timestamp))
(PARTITION `slow` VALUES LESS THAN (736634) ENGINE = InnoDB,
PARTITION `fast` VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
Just as a learning exercise I wanted to try to partition the measurements by measure_timestamp (without the help of indexing). Specifically, I thought it would be interesting to try to put the most recent month in a partition by itself. (I understand that it's best to have equally-sized partitions, but I just wanted to experiment.)
I used the following command to add the partition (Note that the dataset ends in Dec of 2016, and the vast majority of the datapoints are in prior months):
ALTER TABLE partitioned_measures
PARTITION BY RANGE(TO_DAYS(measure_timestamp)) (
PARTITION slow VALUES LESS THAN(TO_DAYS('2016-12-01')),
PARTITION fast VALUES LESS THAN (MAXVALUE)
);
To query, I'm looking at all entries from the 2nd onward (just to be sure that I'm only looking in the latest partition):
select SQL_NO_CACHE COUNT(*) FROM partitioned_measures
WHERE measure_timestamp >= '2016-12-02'
AND DAYOFWEEK(measure_timestamp) = 1;
When I add an EXPLAIN to the front of that, I get the following:
+----+-------------+----------------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | partitioned_measures | slow,fast | ALL | NULL | NULL | NULL | NULL | 1835458 | 33.33 | Using where |
+----+-------------+----------------------+------------+------+---------------+------+---------+------+---------+----------+-------------+
But the query time is about the same as it was before the partition (~1.6 seconds). I've never used partitions before so I feel like there's something conceptual that I'm missing.
Tricky, but I found a working solution, or should I say a workaround; it seems to be a MySQL bug?
ALTER TABLE partitioned_measures
PARTITION BY RANGE COLUMNS(measure_timestamp) (
PARTITION slow VALUES LESS THAN('2016-12-01'),
PARTITION fast VALUES LESS THAN(MAXVALUE)
);
See this demo, which uses partition pruning correctly.
I noticed that syntax here.
I still find it weird that the partition pruning does not work correctly with
ALTER TABLE partitioned_measures
PARTITION BY RANGE(TO_DAYS(measure_timestamp)) (
PARTITION slow VALUES LESS THAN(TO_DAYS('2016-12-01')),
PARTITION fast VALUES LESS THAN (MAXVALUE)
);
MySQL 5.7 should be able to do partition pruning with TO_DAYS() just fine:
Pruning can also be applied for tables partitioned on a DATE or
DATETIME column when the partitioning expression uses the YEAR() or
TO_DAYS() function. In addition, in MySQL 5.7
source
See this demo, which does not use partition pruning correctly. I've tried a lot to get it working; all the methods I could think of failed.
The explanation:
It did do the pruning you requested, but it added the first partition. Why? Because that is where bad dates are put.
The workaround is to have a bogus first partition:
/*!50100 PARTITION BY RANGE (TO_DAYS(measure_timestamp))
(PARTITION bogus VALUES LESS THAN (0) ENGINE = InnoDB, -- any small value
PARTITION `slow` VALUES LESS THAN (736634) ENGINE = InnoDB,
PARTITION `fast` VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */
Reference is buried in https://dev.mysql.com/doc/refman/5.7/en/partitioning-handling-nulls.html
If you had more than a trivial number of partitions, it might have been more obvious that it picked the desired partition, plus always the first.
With rare exceptions, partitioning does not provide better performance than you can get from a non-partitioned table with a suitable index. In this case, INDEX(measure_timestamp). (Or a virtual column with INDEX(dow, measure_timestamp).)
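A sketch of the virtual-column idea, using a MySQL 5.7 generated column (the names dow and idx_dow_ts are invented here):
ALTER TABLE partitioned_measures
    ADD COLUMN dow TINYINT AS (DAYOFWEEK(measure_timestamp)) VIRTUAL,
    ADD INDEX idx_dow_ts (dow, measure_timestamp);
-- the test query can then be rewritten to use the index:
SELECT SQL_NO_CACHE COUNT(*)
FROM partitioned_measures
WHERE dow = 1
  AND measure_timestamp >= '2016-12-02';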

MySQL Large Table Sharding to Smaller Table based on Unique ID

We have a large MySQL table (device_data) with the following columns:
ID (int)
dt (timestamp)
serial_number (char(20))
data1 (double)
data2 (double)
... // other columns
The table receives around 10M rows every day.
We have done sharding by separating the table based on the date of the timestamp (device_data_YYYYMMDD). However, we feel this is not effective because most of our queries (shown below) always filter on serial_number and span many dates.
SELECT * FROM device_data WHERE serial_number = 'XXX' AND dt >= '2018-01-01' AND dt <= '2018-01-07';
Therefore, we think that creating the sharding based on the serial number will be more effective. Basically, we will have:
device_data_<serial_number>
device_data_0012393746
device_data_7891238456
Hence, when we want to find data for a particular device, we can easily reference as:
SELECT * FROM device_data_<serial_number> WHERE dt >= '2018-01-01' AND dt <= '2018-01-07';
This approach seems to be effective because:
The application at all times will access the data based on the device first.
We have checked that there is no query that access the data without specifying the device serial number first.
The table for each device will be relatively small (9000 rows per day)
A few challenges that we think we will face are:
We have a lot of devices. This means that there will be a lot of device_data_<serial_number> tables too. I have checked that MySQL does not impose a limit on the number of tables in a database. Will this impact performance compared to keeping them in one table?
How will this impact us later on, when we would like to scale MySQL (e.g. using master / slave, etc.)?
Are there other alternatives / solutions for resolving this?
Update. Below is the show create table result from our existing table:
CREATE TABLE `test_udp_new` (
`id` int(20) unsigned NOT NULL AUTO_INCREMENT,
`dt` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`device_sn` varchar(20) NOT NULL,
`gps_date` datetime NOT NULL,
`lat` decimal(10,5) DEFAULT NULL,
`lng` decimal(10,5) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `device_sn_2` (`dt`,`device_sn`),
KEY `dt` (`dt`),
KEY `data` (`data`) USING BTREE,
KEY `test_udp_new_device_sn_dt_index` (`device_sn`,`dt`),
KEY `test_udp_new_device_sn_data_dt_index` (`device_sn`,`data`,`dt`)
) ENGINE=InnoDB AUTO_INCREMENT=44449751 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC
The most frequent queries being run:
SELECT *
FROM test_udp_new
WHERE device_sn = 'xxx'
AND dt >= 'xxx'
AND dt <= 'xxx'
ORDER BY dt DESC;
The optimal way to handle that query is in a non-partitioned table with
INDEX(serial_number, dt)
Even better is to change the PRIMARY KEY. Assuming you currently have id AUTO_INCREMENT because there is not a unique combination of columns suitable for being a "natural PK",
PRIMARY KEY(serial_number, dt, id), -- to optimize that query
INDEX(id) -- to keep AUTO_INCREMENT happy
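A sketch of that change, using the actual column name device_sn from your SHOW CREATE TABLE (untested):
ALTER TABLE test_udp_new
    ADD INDEX (`id`),                    -- keeps AUTO_INCREMENT happy
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (`device_sn`, `dt`, `id`);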
If there are other queries that are run often, please provide them; this may hurt them. In large tables, it is a juggling task to find the optimal index(es).
Other Comments:
There are very few use cases for which partitioning actually speeds up processing.
Making lots of 'identical' tables is a maintenance nightmare, and, again, not a performance benefit. There are probably a hundred Q&A on stackoverflow shouting not to do such.
By having serial_number first in the PRIMARY KEY, all queries referring to a single serial_number are likely to benefit.
A million serial_numbers? No problem.
One common use case for partitioning involves purging "old" data. This is because big DELETEs are much more costly than DROP PARTITION. That involves PARTITION BY RANGE(TO_DAYS(dt)). If you are interested in that, my PK suggestion still stands. (And the query in question will run about the same speed with or without this partitioning.)
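If purging by month is the goal, the pattern looks roughly like this (partition names and dates invented for illustration; note that dt must be in every unique key, which the PK suggestion above already satisfies):
ALTER TABLE test_udp_new
    PARTITION BY RANGE (TO_DAYS(dt)) (
        PARTITION p2018_01 VALUES LESS THAN (TO_DAYS('2018-02-01')),
        PARTITION p2018_02 VALUES LESS THAN (TO_DAYS('2018-03-01')),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    );
-- later, dropping the oldest month is nearly instantaneous:
ALTER TABLE test_udp_new DROP PARTITION p2018_01;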
How many months before the table outgrows your disk? (If this will be an issue, let's discuss it.)
Do you need 8-byte DOUBLE? FLOAT has about 7 significant digits of precision and takes only 4 bytes.
You are using InnoDB?
Is serial_number fixed at 20 characters? If not, use VARCHAR. Also, CHARACTER SET ascii may be better than the default of utf8?
Each table (or each partition of a table) involves at least one file that the OS must deal with. When you have "too many", the OS groans, often before MySQL groans. (It is hard to make either "die" of overdose.)
Addressing the query
PRIMARY KEY (`id`),
KEY `device_sn_2` (`dt`,`device_sn`),
KEY `dt` (`dt`),
KEY `data` (`data`) USING BTREE,
KEY `test_udp_new_device_sn_dt_index` (`device_sn`,`dt`),
KEY `test_udp_new_device_sn_data_dt_index` (`device_sn`,`data`,`dt`)
-->
PRIMARY KEY(`device_sn`,`dt`, id),
INDEX(id)
KEY `dt_sn` (`dt`,`device_sn`),
KEY `data` (`data`) USING BTREE,
Notes:
By starting the PK with device_sn, dt, you get the benefits of clustering for the query with WHERE device_sn = .. AND dt BETWEEN ...
INDEX(id) is to keep AUTO_INCREMENT happy.
When you have INDEX(a,b), INDEX(a) is redundant.
The (20) is meaningless; id will max out at about 4 billion.
I tossed the last index because it is probably helped enough by the new PK.
lng decimal(10,5) -- You don't need 5 digits to the left of the decimal point; you only need 3 or 2. So: lat decimal(7,5), lng decimal(8,5). This will save a total of 3 bytes per row.
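As a sketch:
ALTER TABLE test_udp_new
    MODIFY `lat` DECIMAL(7,5),   -- latitude is -90..90: 2 digits left of the point
    MODIFY `lng` DECIMAL(8,5);   -- longitude is -180..180: 3 digits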

Poor query with table scans occasionally takes hours on MariaDB

My application uses a MariaDB database which I try to keep isolated, but one particular user goes straight to the database. After 6 weeks without incident, they started complaining today that one of their queries slowed down from 5 minutes (which I thought was bad enough) to over 120 minutes.
Since then, throughout today, it has sometimes been as fast as usual, sometimes slowing down again.
This is their query:
SELECT MAX(last_updated) FROM data_points;
This is the table:
CREATE TABLE data_points (
seriesId INT UNSIGNED NOT NULL,
modifiedDate DATE NOT NULL,
valueDate DATE NOT NULL,
value DOUBLE NOT NULL,
created DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
last_updated DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP()
ON UPDATE CURRENT_TIMESTAMP,
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
CONSTRAINT pk_data PRIMARY KEY (seriesId, modifiedDate, valueDate),
KEY ix_data_modifieddate (modifiedDate),
KEY ix_data_id (id),
CONSTRAINT fk_data_seriesid FOREIGN KEY (seriesId)
REFERENCES series(id)
) ENGINE=InnoDB
DEFAULT CHARSET=utf8mb4
COLLATE=utf8mb4_unicode_ci
MAX_ROWS=222111000;
and this is the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE data_points ALL NULL NULL NULL NULL 224166191
The table has approx 250M rows and is growing relatively fast.
I can coerce the user into doing something more sensible but in the short term I'm keen to understand why the query duration is going crazy today after 6 weeks of calm. I'll accept the first answer that can explain that.
SELECT MAX(last_updated) FROM data_points; is easily optimized:
INDEX(last_updated)
That index will make that MAX be essentially instantaneous. And it will avoid pounding on the disk and cache (see below).
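For example (the index name is arbitrary):
ALTER TABLE data_points ADD INDEX ix_data_last_updated (last_updated);
-- the MAX then reads one entry from the end of the index:
SELECT MAX(last_updated) FROM data_points;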
Two things control the un-indexed speed:
The size of the table, which is "growing relatively fast", and
[This is probably what you are fishing for.] How much of the table is cached when the query is run. This can make a 10x difference in the speed. You can partially test this claim thus:
Restart mysqld; time the query; time it again. The first run had to hit the disk a lot (because of the fresh restart); the second may have found everything in RAM.
Another thing that can mess with the timings: If some other 'big' query is run and it bumps blocks of this table out of cache, then the query will again be slow.
Of relevance: Size of table, value of innodb_buffer_pool_size, and amount of RAM.
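A quick way to see those numbers (a sketch; run on your server):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SELECT table_rows,
       ROUND((data_length + index_length) / 1024 / 1024) AS total_mb
FROM information_schema.tables
WHERE table_name = 'data_points';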
On an unrelated topic... That PRIMARY KEY (seriesId, modifiedDate, valueDate) seems strange. A PK must be unique. Dates (datetime, etc) are likely to have multiple entries for the same day/second; so can you be sure of uniqueness? Especially with 2 dates?
(More)
Please explain the meaning of each of the 4 dates. And ask yourself if they are all needed. (About half the bulk of the table is those dates!)
The table has an AUTO_INCREMENT; is it needed by some other table? If not then either it could be removed, or it could be used to assure that the PK is unique.
To better help you, we need to see more of the queries.

Count the number of rows between unix time stamps for each ID

I'm trying to populate some data for a table. The query is being run on a table that contains ~50 million records. The query I'm currently using is below. It counts the number of rows that match the template id and are BETWEEN two unix timestamps:
SELECT COUNT(*) as count FROM `s_log`
WHERE `time_sent` BETWEEN '1346904000' AND '1346993271'
AND `template` = '1'
While the query above does work, performance is rather slow while looping through each template, which at times can number in the hundreds. The timestamps are stored as int and are properly indexed. Just to test things out, I tried running the query below, omitting the time_sent restriction:
SELECT COUNT(*) as count FROM `s_log`
WHERE `template` = '1'
As expected, it runs very fast, but is obviously not restricting count results inside the correct time frame. How can I obtain a count for a specific template AND restrict that count BETWEEN two unix timestamps?
EXPLAIN:
1 | SIMPLE | s_log | ref | time_sent,template | template | 4 | const | 71925 | Using where
SHOW CREATE TABLE s_log:
CREATE TABLE `s_log` (
`id` int(255) NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
`time_sent` int(25) NOT NULL,
`template` int(55) NOT NULL,
`key` varchar(255) NOT NULL,
`node_id` int(55) NOT NULL,
`status` varchar(55) NOT NULL,
PRIMARY KEY (`id`),
KEY `email` (`email`),
KEY `time_sent` (`time_sent`),
KEY `template` (`template`),
KEY `node_id` (`node_id`),
KEY `key` (`key`),
KEY `status` (`status`),
KEY `timestamp` (`timestamp`)
) ENGINE=MyISAM AUTO_INCREMENT=2078966 DEFAULT CHARSET=latin1
The best index you can have in this case is a composite one: template + time_sent
CREATE INDEX template_time_sent ON s_log (template, time_sent)
PS: Also, as long as all the columns in your query are integers, DON'T enclose their values in quotes (in some cases it could lead to issues, at least with older MySQL versions).
First, you have to create an index that has both of your columns together (not separately). Also check your table type; I think it would work great if your table is InnoDB.
And lastly, use your WHERE clause in this fashion:
WHERE `template` = '1' AND `time_sent` BETWEEN '1346904000' AND '1346993271'
What this does is first check if template is 1; if it is, then it checks the second condition, else it skips the row. This will definitely give you a performance edge.
If you have to call the query for each template maybe it would be faster to get all the information with one query call by using GROUP BY:
SELECT template, COUNT(*) as count FROM `s_log`
WHERE `time_sent` BETWEEN 1346904000 AND 1346993271
GROUP BY template;
It's just a guess that this would be faster, and you would also have to redesign your code a bit.
You could also try to use InnoDB instead of MyISAM. InnoDB uses a clustered index, which may perform better on large tables. From the MySQL site:
Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
There are some questions on Stackoverflow which discuss the performance between InnoDB and MyISAM:
Should I use MyISAM or InnoDB Tables for my MySQL Database?
Migrating from MyISAM to InnoDB
MyISAM versus InnoDB

MySQL slow performance on big table

I have the following table with millions of rows:
CREATE TABLE `points` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`DateNumber` int(10) unsigned DEFAULT NULL,
`Count` int(10) unsigned DEFAULT NULL,
`FPTKeyId` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
KEY `index3` (`FPTKeyId`,`DateNumber`) USING HASH
) ENGINE=InnoDB AUTO_INCREMENT=16755134 DEFAULT CHARSET=utf8$$
As you can see, I have created indexes. I don't know if I did it right; maybe not.
The problem is that queries execute super slowly.
Let's take a simple query
SELECT fptkeyid, count FROM points group by fptkeyid
I can't get a result because the query aborts by timeout (10 min). What am I doing wrong?
Beware MySQL's stupid behaviour: GROUP BYing implicitly executes ORDER BY.
To prevent this, explicitly add ORDER BY NULL, which suppresses the unnecessary ordering.
http://dev.mysql.com/doc/refman/5.0/en/select.html says:
If you use GROUP BY, output rows are sorted according to the GROUP BY
columns as if you had an ORDER BY for the same columns. To avoid the
overhead of sorting that GROUP BY produces, add ORDER BY NULL:
SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
+
http://dev.mysql.com/doc/refman/5.6/en/group-by-optimization.html says:
The most important preconditions for using indexes for GROUP BY are
that all GROUP BY columns reference attributes from the same index,
and that the index stores its keys in order (for example, this is a
BTREE index and not a HASH index).
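Putting those two quotes together, a sketch (InnoDB silently builds a BTREE even when USING HASH is requested, but making it explicit avoids confusion):
ALTER TABLE points
    DROP KEY index3,
    ADD KEY index3 (`FPTKeyId`, `DateNumber`) USING BTREE;
SELECT FPTKeyId, COUNT(*) FROM points GROUP BY FPTKeyId ORDER BY NULL;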
Your query does not make sense:
SELECT fptkeyid, count FROM points group by fptkeyid
You group by fptkeyid, so count is not useful here; there should be an aggregate function, not a bare count field. Note also that COUNT is a MySQL function, which makes it inadvisable to use the same name for a field.
Don't you need something like:
SELECT fptkeyid, SUM(`count`) FROM points group by fptkeyid
If not please explain what result you expect from the query.
I created a database with test data, half a million records, to see if I could find something similar to your issue. This is what the explain tells me:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE points index NULL index3 10 NULL 433756
And on the SUM query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE points index NULL index3 10 NULL 491781
Both queries are done on a laptop (MacBook Air) within a second; nothing takes long. Inserting, though, took some time: a few minutes to get half a million records. But retrieving and calculating does not.
We need more information to answer your question completely. Maybe the configuration of the database is wrong, for example almost no memory allocated?
I would personally start with your AUTO_INCREMENT value. You have set it to increase by 16,755,134 for each new record. Your field value is set to INT UNSIGNED, which means that the range of values is 0 to 4,294,967,295 (or almost 4.3 billion). This means that you would have only 256 values before the field goes beyond the data type limits, thereby compromising the purpose of the PRIMARY KEY INDEX.
You could change the data type to BIGINT UNSIGNED and you would have a value range of 0 to 18,446,744,073,709,551,615 (or slightly more than 18.4 quintillion), which would allow you up to 1,100,960,700,983 (or slightly more than 1.1 trillion) unique values with this AUTO_INCREMENT value.
I would first ask if you really need your AUTO_INCREMENT value set to such a large number, and if not then I would suggest changing it to 1 (or at least some lower number), as storing the field values as INT vs BIGINT will save considerable disk space within larger tables such as this. Either way, you should get a more stable PRIMARY KEY INDEX, which should help improve queries.
I think the problem is your server bandwidth. Having millions of rows would probably need at least high-megabyte bandwidth.