Very slow query on mysql table with 35 million rows - mysql

I am trying to figure out why a query is so slow on my MySQL database. I've read a lot about MySQL performance, including various SO questions, but this one remains a riddle to me.
I am using MySQL 5.6.23-log - MySQL Community Server (GPL)
I have a table with roughly 35 million rows.
Roughly 5 rows per second are inserted into this table.
The table structure is shown in the EDIT below. I have indexes on all the columns except for answer_text.
The query I'm running is:
SELECT answer_id, COUNT(1)
FROM answers_onsite a
WHERE a.screen_id=384
AND a.timestamp BETWEEN 1462670000000 AND 1463374800000
GROUP BY a.answer_id
This query takes roughly 20-30 seconds before it returns a result set.
Any insights?
EDIT
as asked, my show create table:
CREATE TABLE `answers_onsite` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`device_id` bigint(20) unsigned NOT NULL,
`survey_id` bigint(20) unsigned NOT NULL,
`answer_set_group` varchar(255) NOT NULL,
`timestamp` bigint(20) unsigned NOT NULL,
`screen_id` bigint(20) unsigned NOT NULL,
`answer_id` bigint(20) unsigned NOT NULL DEFAULT '0',
`answer_text` text,
PRIMARY KEY (`id`),
KEY `device_id` (`device_id`),
KEY `survey_id` (`survey_id`),
KEY `answer_set_group` (`answer_set_group`),
KEY `timestamp` (`timestamp`),
KEY `screen_id` (`screen_id`),
KEY `answer_id` (`answer_id`)
) ENGINE=InnoDB AUTO_INCREMENT=35716605 DEFAULT CHARSET=utf8

ALTER TABLE answers_onsite ADD key complex_index (screen_id,`timestamp`,answer_id);
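Once that index exists, it is worth double-checking with EXPLAIN that the optimizer actually picks it up (a quick verification step, not part of the original answer):
EXPLAIN SELECT answer_id, COUNT(1)
FROM answers_onsite a
WHERE a.screen_id=384
AND a.timestamp BETWEEN 1462670000000 AND 1463374800000
GROUP BY a.answer_id;
-- key should show complex_index, and Extra should include "Using index",
-- since the index covers every column the query touches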

You can use MySQL partitioning like this:
alter table answers_onsite drop primary key;
alter table answers_onsite add primary key (id, timestamp) partition by HASH(id) partitions 500;
Running the above may take a while depending on the size of your table.

Look at your WHERE clause:
WHERE a.screen_id=384
AND a.timestamp BETWEEN 1462670000000 AND 1463374800000
GROUP BY a.answer_id
I would create a composite index (screen_id, answer_id, timestamp) and run some tests.
You could also try (screen_id, timestamp, answer_id) to see if it performs better.
BETWEEN, like any range predicate, is known to be slow, and so is COUNT over millions of rows. I would count once a day and save the result to a 'Stats' table that you can query whenever you need it; obviously this only works if you do not need live data.
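For what it's worth, here is a minimal sketch of that idea, assuming the timestamp column holds milliseconds (which the BETWEEN values suggest); the answer_stats table and its column names are placeholders, not anything from the question:
CREATE TABLE answer_stats (
stat_date DATE NOT NULL,
screen_id BIGINT UNSIGNED NOT NULL,
answer_id BIGINT UNSIGNED NOT NULL,
answer_count INT UNSIGNED NOT NULL,
PRIMARY KEY (stat_date, screen_id, answer_id)
) ENGINE=InnoDB;

-- run once a day (cron job or MySQL event) to aggregate the previous day
INSERT INTO answer_stats (stat_date, screen_id, answer_id, answer_count)
SELECT DATE(FROM_UNIXTIME(a.timestamp / 1000)), a.screen_id, a.answer_id, COUNT(1)
FROM answers_onsite a
WHERE a.timestamp >= UNIX_TIMESTAMP(CURDATE() - INTERVAL 1 DAY) * 1000
AND a.timestamp < UNIX_TIMESTAMP(CURDATE()) * 1000
GROUP BY DATE(FROM_UNIXTIME(a.timestamp / 1000)), a.screen_id, a.answer_id;
Reports would then read from answer_stats instead of scanning millions of raw rows.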

Related

Why does MySQL query return zero rows, but after defrag it works?

I have an InnoDB MySQL table, and this query returns zero rows:
SELECT id, display FROM ra_table WHERE parent_id=7266 AND display=1;
However, there are actually 17 rows that should match:
SELECT id, display FROM ra_table WHERE parent_id=7266;
ID display
------------------
1748 1
5645 1
...
There is an index on display (int 1), and ID is the primary key. The table also has several other fields which I'm not pulling in this query.
After noticing this query wasn't working, I defragmented the table and then the first query started working correctly, but only for a time. It seems after a few days, the query stops working again and I have to defragment to fix it.
My question is, why does the fragmented table break this query?
Additional info: MySQL 5.6.27 on Amazon RDS.
CREATE TABLE `ra_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(6) NOT NULL,
`display` int(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `parent_id` (`parent_id`),
KEY `display` (`display`)
) ENGINE=InnoDB AUTO_INCREMENT=13302 DEFAULT CHARSET=latin1
ROW_FORMAT=DYNAMIC
There may be a bug in the version you are running.
Meanwhile, change
INDEX(parent_id),
INDEX(display)
to
INDEX(parent_id, display)
By combining them, the query will run faster (and hopefully correctly). An index on a flag column such as display is unlikely ever to be used on its own.
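As a concrete statement (just a sketch; the index name parent_id_display is arbitrary):
ALTER TABLE ra_table
DROP INDEX parent_id,
DROP INDEX display,
ADD INDEX parent_id_display (parent_id, display);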

MySQL partitioning by table rows

I created a table as below:
CREATE TABLE `Archive_MasterLog` (
`LogID` INT(10) NOT NULL AUTO_INCREMENT,
`LogDate` DATETIME NULL,
`AssessorName` VARCHAR(255) NULL,
`TblName` VARCHAR(100) NULL,
PRIMARY KEY (`LogID`),
UNIQUE INDEX `Index_72491D22_3806_4A01` (`LogID`)
)
ENGINE = INNODB;
I want to partition this table by number of rows: every 100K rows should create a new partition.
How can I do this in MySQL?
Why? You will probably gain no benefits from PARTITIONing.
Will you be purging old data? If so, then partition on LogDate. Then we can discuss how to purge.
You have two keys on the same column; keep the PRIMARY KEY and toss the UNIQUE key.
You have an index on RecordID, but that column does not exist??
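If purging is the goal, a rough sketch of partitioning on LogDate could look like the following. The month boundaries are made-up examples; note that the partitioning column has to be part of the primary key, so LogDate must become NOT NULL and join the key:
-- drop the redundant UNIQUE key and widen the PRIMARY KEY to include LogDate
ALTER TABLE Archive_MasterLog
DROP PRIMARY KEY,
DROP INDEX Index_72491D22_3806_4A01,
MODIFY LogDate DATETIME NOT NULL,
ADD PRIMARY KEY (LogID, LogDate);

ALTER TABLE Archive_MasterLog
PARTITION BY RANGE (TO_DAYS(LogDate)) (
PARTITION p201601 VALUES LESS THAN (TO_DAYS('2016-02-01')),
PARTITION p201602 VALUES LESS THAN (TO_DAYS('2016-03-01')),
PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- purging a whole month is then a cheap metadata operation
ALTER TABLE Archive_MasterLog DROP PARTITION p201601;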
The problem comes from the frequency of the data. In some months or weeks we have more than 2M rows/month, but in other months we have fewer than 10K rows. I reviewed the data and found that we should partition by LogID.
The reason also comes from the customer: they don't want to change the key of the table.
Here's my solution
CREATE TABLE `ULPAT`.`MasterLog` (
`LogID` INT(10) NOT NULL AUTO_INCREMENT,
`LogDate` DATETIME NULL,
`AssessorName` VARCHAR(255) NULL,
`TblName` VARCHAR(100) NULL,
PRIMARY KEY (`LogID`),
INDEX `LogID` (`LogID`)
)
ENGINE = INNODB
PARTITION BY HASH(mod(ceiling(LogID*0.0000005), 400))
PARTITIONS 400;
I think this is not the best solution, but it works for me.
Thanks

Optimize MySQL count query with JOIN

I have a query that takes about 20 seconds, and I would like to understand if there is a way to optimize it.
Table 1:
CREATE TABLE IF NOT EXISTS `sessions` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `user_id` (`user_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9845765 ;
And table 2:
CREATE TABLE IF NOT EXISTS `access` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`session_id` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `session_id` (`session_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=9467799 ;
Now, what I am trying to do is count all the access rows connected to all the sessions of one user, so my query is:
SELECT COUNT(*)
FROM access
INNER JOIN sessions ON access.session_id=sessions.id
WHERE sessions.user_id='6';
It takes almost 20 seconds, and for user_id 6 there are about 3 million sessions stored.
Is there anything I can do to optimize that query?
Change this line in the sessions table:
KEY `user_id` (`user_id`)
To this:
KEY `user_id` (`user_id`, `id`)
What this does is allow the query to be completed entirely from the index, without going back to the raw table. As it is, you need to do an index scan on the sessions table for your user_id and, for each entry, go back to the table to find the id for the join to the access table. By including id in the index, you can skip going back to the table.
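Spelled out as a statement (just a sketch, reusing the existing key name):
ALTER TABLE sessions
DROP INDEX user_id,
ADD INDEX user_id (user_id, id);
Note that because the table is MyISAM, the primary key is not implicitly appended to secondary indexes the way it is with InnoDB, so id really does have to be listed explicitly.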
Sadly, this will make your inserts into that table slower, and that may be a big deal given that just one user has 3 million sessions. SQL Server and Oracle would address this by allowing you to include the id column in the index without actually indexing on it, saving a little work at insert time, and also by letting you specify a lower fill factor for the index, reducing the need to rebuild or reorder the index on insert, but MySQL doesn't support these.

Should we use the "LIMIT clause" in following example?

Here is the structure:
CREATE TABLE IF NOT EXISTS `categories` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`parent_id` int(11) unsigned NOT NULL DEFAULT '0',
`title` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Query_1:
SELECT * FROM `categories` WHERE `id` = 1234
Query_2:
SELECT * FROM `categories` WHERE `id` = 1234 LIMIT 1
I need to get just one row. Since I apply WHERE id=1234 (a lookup by PRIMARY KEY), there is obviously only one row with id=1234 in the whole table.
After MySQL has found that row, does the engine continue searching when using Query_1?
Thanks in advance.
Look at this SQLFiddle: http://sqlfiddle.com/#!2/a8713/4 and especially View Execution Plan.
You can see that MySQL recognizes the predicate on a PRIMARY KEY column, so it does not matter whether you add LIMIT 1 or not.
PS: A little more explanation: look at the rows column of the execution plan. The number there is the number of rows the query engine thinks it has to examine. Since the column's content is unique (it is a primary key), this is 1. Compare it to this: http://sqlfiddle.com/#!2/9868b/2 (the same schema, but without a primary key). Here rows says 8. (The execution plan is explained in the German MySQL reference; http://dev.mysql.com/doc/refman/5.1/en/explain.html is the English one, which for some reason is not as detailed.)
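If you want to check this without SQLFiddle, you can run EXPLAIN yourself (this just assumes a row with id = 1234 exists):
EXPLAIN SELECT * FROM `categories` WHERE `id` = 1234;
-- type: const, rows: 1; the engine stops after the single primary-key lookup
EXPLAIN SELECT * FROM `categories` WHERE `id` = 1234 LIMIT 1;
-- identical plan, so LIMIT 1 changes nothing here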

Optimization of a query with GROUP BY clause by using indexes

I need to optimize indexes in a table that stores more than 10 million rows. The particularly time-consuming query takes up to 10 seconds to run (the WHERE clause filters out only about 2 million rows, so 8 million must be grouped). I have created a few indexes (some of them complex, some simpler) and tried to find out how to speed this up. Perhaps I'm doing something wrong. MySQL is using the optimized_5 index (based on EXPLAIN).
Here is the table's structure and the query:
CREATE TABLE IF NOT EXISTS `geo_reverse` (
`fid` mediumint(8) unsigned NOT NULL,
`tablename` enum('table1','table2') NOT NULL default 'table1',
`geo_continent` varchar(2) NOT NULL,
`geo_country` varchar(2) NOT NULL,
`geo_region` varchar(8) NOT NULL,
`geo_city` mediumint(8) unsigned NOT NULL,
`type` varchar(30) NOT NULL,
PRIMARY KEY (`fid`,`tablename`,`geo_continent`,`geo_country`,`geo_region`,`geo_city`),
KEY `geo_city` (`geo_city`),
KEY `fid` (`fid`),
KEY `geo_region` (`geo_region`,`geo_city`),
KEY `optimized` (`tablename`,`type`,`geo_continent`,`geo_country`,`geo_region`,`geo_city`,`fid`),
KEY `optimized_2` (`fid`,`tablename`),
KEY `optimized_3` (`type`,`geo_city`),
KEY `optimized_4` (`geo_city`,`tablename`),
KEY `optimized_5` (`tablename`,`type`,`geo_city`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
An example query:
SELECT type, COUNT(*) AS objects FROM geo_reverse WHERE tablename = 'table1' AND geo_city IN (5847207,5112771,4916894,...) GROUP BY type
Do you have any idea of how to speed the computation up?
I would use the following index: (geo_city, tablename, type). geo_city is obviously more selective than tablename, so it should be on the left. After that condition is applied, the rest can be sorted by type for the grouping.
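As a statement (the name optimized_6 just continues the existing naming scheme and is otherwise arbitrary):
ALTER TABLE geo_reverse ADD INDEX optimized_6 (geo_city, tablename, type);
-- then confirm with EXPLAIN that optimized_6 is chosen instead of optimized_5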