I have the following table with millions of rows:
CREATE TABLE `points` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`DateNumber` int(10) unsigned DEFAULT NULL,
`Count` int(10) unsigned DEFAULT NULL,
`FPTKeyId` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
KEY `index3` (`FPTKeyId`,`DateNumber`) USING HASH
) ENGINE=InnoDB AUTO_INCREMENT=16755134 DEFAULT CHARSET=utf8;
As you can see, I have created indexes. I don't know whether I did this right; maybe not.
The problem is that queries execute extremely slowly.
Let's take a simple query:
SELECT fptkeyid, count FROM points group by fptkeyid
I can't get a result because the query is aborted by a timeout (10 minutes). What am I doing wrong?
Beware MySQL's stupid behaviour: GROUP BYing implicitly executes ORDER BY.
To prevent this, explicitly add ORDER BY NULL, which suppresses the unnecessary ordering.
http://dev.mysql.com/doc/refman/5.0/en/select.html says:
If you use GROUP BY, output rows are sorted according to the GROUP BY
columns as if you had an ORDER BY for the same columns. To avoid the
overhead of sorting that GROUP BY produces, add ORDER BY NULL:
SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
Additionally,
http://dev.mysql.com/doc/refman/5.6/en/group-by-optimization.html says:
The most important preconditions for using indexes for GROUP BY are
that all GROUP BY columns reference attributes from the same index,
and that the index stores its keys in order (for example, this is a
BTREE index and not a HASH index).
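Applying both points to the table in the question might look like the sketch below; SUM(`Count`) is an assumption here, since a bare count column isn't meaningful under GROUP BY:
-- Rebuild index3 as BTREE (InnoDB silently builds a BTREE even when
-- USING HASH is written, but declaring it makes the intent explicit),
-- then group without the implicit sort:
ALTER TABLE `points` DROP INDEX `index3`,
  ADD INDEX `index3` (`FPTKeyId`, `DateNumber`) USING BTREE;
SELECT `FPTKeyId`, SUM(`Count`) AS total
FROM `points`
GROUP BY `FPTKeyId`
ORDER BY NULL;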
Your query does not make sense:
SELECT fptkeyid, count FROM points group by fptkeyid
You group by fptkeyid, so count is not useful here; there should be an aggregate function, not a count field. Note also that COUNT is a MySQL function, which makes it inadvisable to use the same name for a field.
Don't you need something like:
SELECT fptkeyid, SUM(`count`) FROM points group by fptkeyid
If not, please explain what result you expect from the query.
I created a database with test data, half a million records, to see if I could find something similar to your issue. This is what EXPLAIN tells me:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE points index NULL index3 10 NULL 433756
And on the SUM query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE points index NULL index3 10 NULL 491781
Both queries are done on a laptop (MacBook Air) within a second; nothing takes long. Inserting, though, took some time, a few minutes to get half a million records, but retrieving and calculating does not.
We need more information to answer your question completely. Maybe the configuration of the database is wrong, for example almost no memory allocated?
I would personally start with your AUTO_INCREMENT value. You have set it to increase by 16,755,134 for each new record. Your field value is set to INT UNSIGNED which means that the range of values is 0 to 4,294,967,295 (or almost 4.3 billion). This means that you would have only 256 values before the field goes beyond the data type limits thereby compromising the purpose of the PRIMARY KEY INDEX.
You could change the data type to BIGINT UNSIGNED and you would have a value range of 0 to 18,446,744,073,709,551,615 (or slightly more than 18.4 quintillion), which would allow you up to 1,100,960,700,983 (or slightly more than 1.1 trillion) unique values with this AUTO_INCREMENT value.
I would first ask if you really need to have your AUTO_INCREMENT value set to such a large number, and if not I would suggest changing it to 1 (or at least some lower number), as storing the field values as INT vs. BIGINT will save considerable disk space within larger tables such as this. Either way, you should get a more stable PRIMARY KEY INDEX, which should help improve queries.
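If the large value really is needed, a minimal sketch of the suggested change (note that altering a table with millions of rows rebuilds it, so expect it to take a while):
-- Widen the id column; the existing PRIMARY KEY definition carries over.
ALTER TABLE `points`
  MODIFY `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT;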
I think the problem is your server's bandwidth. Returning millions of rows would probably need at least high-megabyte bandwidth.
Related
I have a table, and when making a query,
EXPLAIN SELECT `id`
FROM `tblsender`
WHERE `userid` = '6'
AND `astatus` = '1'
AND `sender` = 'ABCDEF'
I am getting "Using where" even after indexing in all possible ways. Here is my final table structure:
CREATE TABLE IF NOT EXISTS `tblsender` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`sender` varchar(6) NOT NULL,
`astatus` tinyint(1) NOT NULL DEFAULT '0',
`userid` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
KEY `astatus` (`astatus`),
KEY `userid` (`userid`),
KEY `sender` (`sender`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=22975 ;
I even tried a FULLTEXT index on the sender column, but still no luck, and I also tried indexing all the WHERE-clause columns together:
ALTER TABLE `tblsender` ADD INDEX ( `sender` , `astatus` , `userid` ) ;
I am still getting "Using where". How can I properly index this table?
Edit: EXPLAIN output for the above structure:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tblsender ref astatus,userid,sender astatus 1 const 1 Using where
and EXPLAIN output with all 3 columns indexed together:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE tblsender ref astatus,userid,sender,sender_2 astatus 1 const 1 Using where
You can't effectively predict optimizer behavior on large data sets when testing with small data sets.
As illustrated by the query plans, the multi column index is seen as a candidate, but the optimizer is choosing not to use it in this case. That doesn't mean it won't use it when it will be considered more beneficial.
I can only speculate without seeing your actual data set and perhaps using optimizer tracing, but I'll offer a reasonable speculation.
The optimizer in MySQL is cost-based. It tries to resolve your query in the least costly way possible. Note that rows = 1. This means that the optimizer has concluded that -- statistically, at least -- it expects that only 1 row is going to match in the index on astatus. With key_len = 1, meaning that astatus is only 1 byte wide -- as opposed to the multicolumn index, which is 11 bytes wide (1 + 6 + 4) -- the astatus index looks like a really inexpensive solution, so it decides to go with that index. Using the longer index theoretically means more I/O, therefore more costly, though in this case (because of a small data set) we humans recognize that the cost difference isn't particularly meaningful.
Using where means that for each row actually returned by using that index, the server will need to verify that the rows match the remainder of the WHERE clause, but if we're only expecting approximately 1 row to match, it's no big deal.
I suggest, then, that you do not have cause for concern, because the small size of the current data set is not going to be able to give you useful information in predicting future behavior. In this specific case, Using where is an artifact of the small number of rows in the table.
You need more data. But yes, you do want a multicolumn index here.
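As a sketch, a covering variant of that multi-column index (the index name is made up; `id` is appended so that, under MyISAM, SELECT `id` can be answered from the index alone):
ALTER TABLE `tblsender`
  ADD INDEX `idx_user_status_sender_id` (`userid`, `astatus`, `sender`, `id`);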
I have a monitoring table with the following structure:
CREATE TABLE `monitor_data` (
`monitor_id` INT(10) UNSIGNED NOT NULL,
`monitor_data_time` INT(10) UNSIGNED NOT NULL,
`monitor_data_value` INT(10) NULL DEFAULT NULL,
INDEX `monitor_id_data_time` (`monitor_id`, `monitor_data_time`),
INDEX `monitor_data_time` (`monitor_data_time`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;
This is a very high-traffic table with potentially thousands of rows every minute. Each row belongs to a monitor and contains a value and a time (a Unix timestamp).
I have three issues:
1.
Suddenly, after a number of months in dev, the table became very slow. Queries that previously completed in under a second could now take up to a minute. I'm using standard settings in my.cnf since this is a dev machine, but the behavior was very strange to me.
2.
I'm not sure that I have optimal indexes. A "normal" query looks like this:
SELECT DISTINCT(md.monitor_data_time), monitor_data_value
FROM monitor_data md
WHERE md.monitor_id = 165
AND md.monitor_data_time >= 1484076760
AND md.monitor_data_time <= 1487271199
ORDER BY md.monitor_data_time ASC;
An EXPLAIN of the query above looks like this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE md range monitor_id_data_time,monitor_data_time monitor_id_data_time 8 NULL 149799 Using index condition; Using temporary; Using filesort
What do you think about the indexes?
3.
If I leave out the DISTINCT in the query above, I actually get duplicate rows even though there aren't any duplicate rows in the table. Is there any explanation for this behavior?
Any input is greatly appreciated!
UPDATE 1:
New suggestion on table structure:
CREATE TABLE `monitor_data_test` (
`monitor_id` INT UNSIGNED NOT NULL,
`monitor_data_time` INT UNSIGNED NOT NULL,
`monitor_data_value` INT UNSIGNED NULL DEFAULT NULL,
PRIMARY KEY (`monitor_data_time`, `monitor_id`),
INDEX `monitor_data_time` (`monitor_data_time`)
) COLLATE='utf8_general_ci' ENGINE=InnoDB;
SELECT DISTINCT(md.monitor_data_time), monitor_data_value
is the same as
SELECT DISTINCT md.monitor_data_time, monitor_data_value
That is, the pair is distinct. It does not dedup just the time. Is that what you want?
If you are trying to de-dup just the time, then do something like
SELECT time, AVG(value)
...
GROUP BY time;
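Filled in with the table and range from the question, that sketch would read something like this (AVG is only one choice of aggregate):
SELECT monitor_data_time, AVG(monitor_data_value) AS avg_value
FROM monitor_data
WHERE monitor_id = 165
  AND monitor_data_time BETWEEN 1484076760 AND 1487271199
GROUP BY monitor_data_time;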
For optimal performance of
WHERE md.monitor_id = 165
AND md.monitor_data_time >= 1484076760 ...
you need
PRIMARY KEY (monitor_id, monitor_data_time)
and it must be in that order; the opposite order is much less useful. The guiding principle is: start with the '=' column, then move on to the 'range'.
Do you have 4 billion monitor_id values? INT takes 4 bytes; consider using a smaller datatype.
Do you have other queries that need optimizing? It is better to design the index(es) after gathering all the important queries.
Why PK
In InnoDB, the PRIMARY KEY is "clustered" with the data. That is, the data is an ordered list of triples: (id, time, value) stored in a B+Tree. Locating id = 165 AND time = 1484076760 is a basic operation of a BTree. And it is very fast. Then scanning forward (that's the "+" part of "B+Tree") until time = 1487271199 is a very fast operation of "next row" in this ordered list. Furthermore, since value is right there with the id and time, there is no extra effort to get the values.
You can't scan the requested rows any faster. But it requires PRIMARY KEY. (OK, UNIQUE(id, time) would be 'promoted' to be the PK, but let's not confuse the issue.)
Contrast this with an index on (time, id): it would do the scan over the dates fine, but it would have to skip over any entries where id != 165, and it would have to read all those rows to discover they do not apply. A lot more effort.
Since it is unclear what you intended by DISTINCT, I can't continue this detailed discussion of how that plays out. Suffice it to say: The possible rows have been found; now some kind of secondary pass is needed to do the DISTINCT. (It may not even need to do a sort.)
What do you think about the indexes?
The index on (monitor_id,monitor_data_time) seems appropriate for the query. That's suited to an index range scan operation, very quickly eliminating boatloads of rows that need to be examined.
Better would be a covering index that also includes the monitor_data_value column. Then the query could be satisfied entirely from the index, without a need to lookup pages from the data table to get monitor_data_value.
And even better would be having the InnoDB cluster key be the PRIMARY KEY or UNIQUE KEY on the columns, rather than incurring the overhead of the synthetic row identifier that InnoDB creates when an appropriate index isn't defined.
If I wasn't allowing duplicate (monitor_id, monitor_data_time) tuples, then I'd define the table with a UNIQUE index on those non-nullable columns.
CREATE TABLE `monitor_data`
( `monitor_id` INT(10) UNSIGNED NOT NULL
, `monitor_data_time` INT(10) UNSIGNED NOT NULL
, `monitor_data_value` INT(10) NULL DEFAULT NULL
, UNIQUE KEY `monitor_id_data_time` (`monitor_id`, `monitor_data_time`)
) ENGINE=InnoDB
or, equivalently, specify PRIMARY KEY in place of UNIQUE KEY and remove the index name:
CREATE TABLE `monitor_data`
( `monitor_id` INT(10) UNSIGNED NOT NULL
, `monitor_data_time` INT(10) UNSIGNED NOT NULL
, `monitor_data_value` INT(10) NULL DEFAULT NULL
, PRIMARY KEY (`monitor_id`, `monitor_data_time`)
) ENGINE=InnoDB
Is there any explanation for this behavior?
If the query (shown in the question) returns a different number of rows with the DISTINCT keyword, then there must be duplicate (monitor_id,monitor_data_time,monitor_data_value) tuples in the table. There's nothing in the table definition that guarantees us that there aren't duplicates.
There are a couple of other possible explanations, but those explanations are all related to rows being added/changed/removed, and the queries seeing different snapshots, transaction isolation levels, yada, yada. If the data isn't changing, then there are duplicate rows.
A PRIMARY KEY constraint (or a UNIQUE KEY constraint on non-nullable columns) would guarantee us uniqueness.
Note that DISTINCT is a keyword in the SELECT list. It's not a function. The DISTINCT keyword applies to all expressions in the SELECT list. The parens around md.monitor_data_time are superfluous.
Leaving the DISTINCT keyword out would eliminate the need for the "Using filesort" operation. And that can be expensive for large sets, particularly when the set is too large to sort in memory, and the sort has to spill to disk.
It would be much more efficient to have guaranteed uniqueness, omit the DISTINCT keyword, and return rows in order by the index, preferably the cluster key.
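Put together, a sketch of that final form, assuming the PRIMARY KEY version of the table above so each (monitor_id, monitor_data_time) pair is unique:
SELECT monitor_data_time, monitor_data_value
FROM monitor_data
WHERE monitor_id = 165
  AND monitor_data_time >= 1484076760
  AND monitor_data_time <= 1487271199
ORDER BY monitor_data_time ASC;
-- With the cluster key on (monitor_id, monitor_data_time), rows come
-- back in index order and no filesort is needed.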
Also, the secondary index monitor_data_time doesn't benefit this query. (There may be other queries that can make effective use of that index, though one suspects that those queries would also make effective use of a composite index that had monitor_data_time as the leading column.)
I want to partition a very large table. As the business is growing, partitioning by date isn't really that good because each year the partitions get bigger and bigger. What I'd really like is a partition for every 10 million records.
The MySQL manual shows this simple example:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT NOT NULL,
store_id INT NOT NULL
)
PARTITION BY RANGE (store_id) (
PARTITION p0 VALUES LESS THAN (6),
PARTITION p1 VALUES LESS THAN (11),
PARTITION p2 VALUES LESS THAN (16),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
But this means that everything with store_id 16 or larger, up to MAXVALUE, gets thrown into the last partition. Is there a way to auto-generate a new partition at every interval (in my case, every 10 million records) so I won't have to keep modifying an active database? I am running MySQL 5.5.
Thanks!
EDIT: Here is my actual table
CREATE TABLE `my_table` (
`row_id` int(11) NOT NULL AUTO_INCREMENT,
`filename` varchar(50) DEFAULT NULL,
`timestamp` datetime DEFAULT NULL,
`unit_num` int(3) DEFAULT NULL,
`string` int(3) DEFAULT NULL,
`voltage` float(6,4) DEFAULT NULL,
`impedance` float(6,4) DEFAULT NULL,
`amb` float(6,2) DEFAULT NULL,
`ripple_v` float(8,6) DEFAULT NULL,
PRIMARY KEY (`row_id`),
UNIQUE KEY `timestamp` (`timestamp`,`filename`,`string`,`unit_num`),
KEY `index1` (`filename`),
KEY `index2` (`timestamp`),
KEY `index3` (`timestamp`,`filename`,`string`),
KEY `index4` (`filename`,`unit_num`)
) ENGINE=MyISAM AUTO_INCREMENT=690892041 DEFAULT CHARSET=latin1
and an example query for the graph is...
SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num
FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY)
AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2' ORDER BY timestamp asc;
Here is the explain for the query...
mysql> explain SELECT DATE_FORMAT(timestamp,'%Y/%m/%d %H:%i:%s') as mytime,voltage,impedance,amb,ripple_v,unit_num FROM my_table WHERE timestamp >= DATE_SUB('2015-07-31 00:05:59', INTERVAL 90 DAY) AND filename = 'dlrphx10s320upsab3' and unit_num='5' and string='2' ORDER BY timestamp asc;
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
| 1 | SIMPLE | unit_tarma | ref | timestamp,index3,index4 | index4 | 58 | const,const | 13440 | Using index condition; Using where; Using filesort |
+----+-------------+------------+------+-------------------------+--------+---------+-------------+-------+----------------------------------------------------+
(This answer is directed at the schema and SELECT.)
Since you anticipate millions of rows, first I want to point out some improvements to the schema.
FLOAT(m,n) is usually the 'wrong' thing to do because it leads to two roundings. Either use plain FLOAT (which seems 'right' for metrics like voltage) or use DECIMAL(m,n). FLOAT is 4 bytes; in the cases given, DECIMAL would be 3 or 4 bytes.
When you have both INDEX(a) and INDEX(a,b), the former is unnecessary since the latter can cover for such. You have 3 unnecessary KEYs. This slows down INSERTs.
INT(3) -- Are you saying a "3-digit number"? If so consider TINYINT UNSIGNED (values 0..255) for 1 byte instead of INT for 4 bytes. This will save many MB of disk space, hence speed. (See also SMALLINT, etc, and SIGNED or UNSIGNED.)
If filename is repeated a lot, you may want to "normalize" it. This would save many MB.
Use NOT NULL unless you need NULL for something.
AUTO_INCREMENT=690892041 implies that you are about 1/3 of the way to disaster with id, which will top out at about 2 billion. Do you use id for anything? Getting rid of the column would avoid the issue; and change the UNIQUE KEY to PRIMARY KEY. (If you do need id, let's talk further.)
ENGINE=MyISAM -- Switching to InnoDB has some ramifications, both favorable and unfavorable. The table would become 2-3 times as big. The 'right' choice of PRIMARY KEY would further speed up this SELECT significantly. (And it may or may not slow down other SELECTs.)
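A sketch pulling the datatype and normalization points above together; the lookup table and the exact sizes are assumptions, and the MODIFYs only succeed if no existing values fall outside the new types:
-- Hypothetical lookup table so each long, repeated filename is stored once:
CREATE TABLE `filenames` (
  `filename_id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `filename` VARCHAR(50) NOT NULL,
  PRIMARY KEY (`filename_id`),
  UNIQUE KEY (`filename`)
) ENGINE=InnoDB;
-- Shrink the 3-digit columns and replace one FLOAT(m,n) with plain FLOAT:
ALTER TABLE `my_table`
  MODIFY `unit_num` TINYINT UNSIGNED NULL,
  MODIFY `string` TINYINT UNSIGNED NULL,
  MODIFY `voltage` FLOAT NULL;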
A note on the SELECT: Since string and unit_num are constants in the query, the last two fields of ORDER BY timestamp asc, string asc, unit_num asc are unnecessary. If they are relevant for reasons not apparent in the SELECT, then my advice may be incomplete.
This
WHERE filename = 'foobar'
AND unit_num='40'
AND string='2'
AND timestamp >= ...
is optimally handled by INDEX(filename, unit_num, string, timestamp). The order of the columns is not important except that timestamp needs to be last. Rearranging the current UNIQUE key gives you the optimal index. (Meanwhile, none of the existing indexes is very good for this SELECT.) Making it the PRIMARY KEY, and the table InnoDB, would make it even faster.
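As a minimal sketch, adding just that index (promoting it to the PRIMARY KEY would additionally mean dropping row_id and switching engines, per the notes above; the index name is made up):
ALTER TABLE `my_table`
  ADD INDEX `idx_file_unit_string_ts` (`filename`, `unit_num`, `string`, `timestamp`);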
Partitioning? No advantage. Not for performance; not for anything else you have mentioned. A common use for partitioning is for purging 'old'. If you intend to do such, let's talk further.
In huge tables it is best to look at all the important SELECTs simultaneously so that we don't speed up one while demolishing the speed of others. It may even turn out that partitioning helps in this kind of tradeoff.
First, I must ask what benefit Partitioning gives you? Is there some query that runs faster because of it?
There is no auto-partitioning.
Instead, you should have a job that runs every day and it counts the number of rows in the 'last active' partition to see if it is about 10M. If so, add another partition.
I recommend keeping the "last" partition (the one with MAXVALUE) empty. That way you can REORGANIZE PARTITION to split it into two empty partitions with essentially zero overhead. And I recommend that instead of ADD PARTITION because you might slip up and put something in the last partition.
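A sketch of such a split, with made-up partition names and boundary, assuming the table is partitioned by RANGE and pMax is the empty MAXVALUE partition:
ALTER TABLE my_table REORGANIZE PARTITION pMax INTO (
  PARTITION p42 VALUES LESS THAN (420000000),
  PARTITION pMax VALUES LESS THAN MAXVALUE
);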
It is unclear what will trigger 10M rows. Are there multiple rows for each store_id? And are there new rows coming in for each store? If so, partitioning on store_id is a problem, since all partitions will keep growing all the time.
OK, so store_id was just a lame example from the reference manual. Please provide SHOW CREATE TABLE so we can discuss something concrete rather than hand-waving. There are simply too many ways to take this task.
What is the activity?
If you mostly hit the "recent" partition(s), then an uneven distribution may be warranted: periodically add a new partition and combine an adjacent pair of old partitions. (I did this successfully in one system.)
If you will be purging "old" data, then clearly you need to use PARTITION BY RANGE(TO_DAYS(...)) and use DROP PARTITION plus REORGANIZE PARTITION.
And there are lots of other scenarios. But I know of only 4 scenarios where Partitioning provides any performance benefit. See my blog.
I'm trying to populate some data for a table. The query is being run on a table that contains ~50 million records. The query I'm currently using is below. It counts the number of rows that match the template id and are BETWEEN two unix timestamps:
SELECT COUNT(*) as count FROM `s_log`
WHERE `time_sent` BETWEEN '1346904000' AND '1346993271'
AND `template` = '1'
While the query above does work, performance is rather slow while looping through each template, of which at times there can be hundreds. The timestamps are stored as INT and are properly indexed. Just to test things out, I tried running the query below, omitting the time_sent restriction:
SELECT COUNT(*) as count FROM `s_log`
WHERE `template` = '1'
As expected, it runs very fast, but is obviously not restricting count results inside the correct time frame. How can I obtain a count for a specific template AND restrict that count BETWEEN two unix timestamps?
EXPLAIN:
1 | SIMPLE | s_log | ref | time_sent,template | template | 4 | const | 71925 | Using where
SHOW CREATE TABLE s_log:
CREATE TABLE `s_log` (
`id` int(255) NOT NULL AUTO_INCREMENT,
`email` varchar(255) NOT NULL,
`time_sent` int(25) NOT NULL,
`template` int(55) NOT NULL,
`key` varchar(255) NOT NULL,
`node_id` int(55) NOT NULL,
`status` varchar(55) NOT NULL,
PRIMARY KEY (`id`),
KEY `email` (`email`),
KEY `time_sent` (`time_sent`),
KEY `template` (`template`),
KEY `node_id` (`node_id`),
KEY `key` (`key`),
KEY `status` (`status`),
KEY `timestamp` (`timestamp`)
) ENGINE=MyISAM AUTO_INCREMENT=2078966 DEFAULT CHARSET=latin1
The best index you can have in this case is a composite one: template + time_sent
CREATE INDEX template_time_sent ON s_log (template, time_sent)
P.S. Also, as long as all the columns in your query are integers, DON'T enclose their values in quotes (in some cases it could lead to issues, at least with older MySQL versions).
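With that index in place, the original query (integer literals unquoted, per the P.S.) can be resolved with a simple range scan:
SELECT COUNT(*) AS count
FROM `s_log`
WHERE `template` = 1
  AND `time_sent` BETWEEN 1346904000 AND 1346993271;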
First, you have to create an index that has both of your columns together (not separately). Also check your table type; I think it would work well if your table is InnoDB.
And lastly, use your WHERE clause in this fashion:
WHERE `template` = '1' AND `time_sent` BETWEEN '1346904000' AND '1346993271'
What this does is first check whether template is 1; if it is, it then checks the second condition, otherwise it skips the row. This should give you a performance edge.
If you have to call the query for each template, maybe it would be faster to get all the information in one query call by using GROUP BY:
SELECT template, COUNT(*) as count FROM `s_log`
WHERE `time_sent` BETWEEN 1346904000 AND 1346993271
GROUP BY template;
It's just a guess that this would be faster, and you would also have to redesign your code a bit.
You could also try to use InnoDB instead of MyISAM. InnoDB uses a clustered index which maybe performs better on large tables. From the MySQL site:
Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
There are some questions on Stackoverflow which discuss the performance between InnoDB and MyISAM:
Should I use MyISAM or InnoDB Tables for my MySQL Database?
Migrating from MyISAM to InnoDB
MyISAM versus InnoDB
I have several tables in MySQL in which chronological data is stored. I added a covering index to these tables, with the date field at the end. In my queries I select data for some period using a BETWEEN operation on the date field, so my WHERE clause contains all the fields from the covering index.
When I execute EXPLAIN, the Extra column shows "Using where"; as I understand it, this means the date field isn't being searched within the index. When I select data for a single date, using "=" instead of BETWEEN, "Using where" doesn't appear and everything is found in the index.
What can I do so that my whole WHERE clause, including the BETWEEN operation, is resolved from the index?
UPDATE:
table structure:
CREATE TABLE phones_stat (
id_site int(10) unsigned NOT NULL,
`group` smallint(5) unsigned NOT NULL,
day date NOT NULL,
id_phone mediumint(8) unsigned NOT NULL,
sessions int(10) unsigned NOT NULL,
PRIMARY KEY (id_site, `group`, day, id_phone) USING BTREE
) ;
query:
SELECT id_phone,
SUM(sessions) AS cnt
FROM phones_stat
WHERE id_site = 25
AND `group` = 1
AND day BETWEEN '2010-01-01' AND '2010-01-31'
GROUP BY id_phone
ORDER BY cnt DESC
How many rows do you have? Sometimes an index is not used if the optimizer deems it unnecessary (for instance, if the number of rows in your table(s) is very small). Could you give us an idea of what your SQL looks like?
You could try hinting your index usage and seeing what you get in EXPLAIN, just to confirm that your index is being overlooked, e.g.
http://dev.mysql.com/doc/refman/5.1/en/optimizer-issues.html
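For example, a sketch using FORCE INDEX purely as a diagnostic, to compare the EXPLAIN output against the optimizer's own choice:
SELECT id_phone, SUM(sessions) AS cnt
FROM phones_stat FORCE INDEX (PRIMARY)
WHERE id_site = 25
  AND `group` = 1
  AND day BETWEEN '2010-01-01' AND '2010-01-31'
GROUP BY id_phone
ORDER BY cnt DESC;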
If you're GROUPing by id_phone, then a more useful index will be one which starts with that i.e.
... PRIMARY KEY (id_phone, id_site, `group`, day) USING BTREE
If you change the index to that and rerun the query, does it help?