I have this table on one server:
CREATE TABLE `mh` (
`M` char(13) NOT NULL DEFAULT '',
`F` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`D` char(6) DEFAULT NULL,
`A` int(11) DEFAULT NULL,
`DC` char(13) DEFAULT NULL,
`S` char(22) DEFAULT NULL,
`S0` int(11) DEFAULT NULL,
PRIMARY KEY (`F`,`M`),
KEY `IDX_S` (`S`),
KEY `IDX_M` (`M`),
KEY `IDX_A` (`M`,`A`)
) ENGINE=TokuDB DEFAULT CHARSET=latin1;
And the same table, but using the MyISAM engine, on another similar server.
When I execute this query:
CREATE TEMPORARY TABLE temp
(S VARCHAR(22) PRIMARY KEY)
AS
(
SELECT S, COUNT(S) AS HowManyS
FROM mh
WHERE A = 1 AND S IS NOT NULL
GROUP BY S
);
The table has 120 million rows. The server using TokuDB executes the query in 3 hours; the server using MyISAM takes 22 minutes.
The query using TokuDB shows a "Queried about 38230000 rows, Fetched about 303929 rows, loading data still remains" status.
Why does the TokuDB query take so long? TokuDB is a really good engine, but I don't know what I'm doing wrong with this query.
Both servers are running MariaDB 5.5.38.
TokuDB is not currently using its bulk-fetch algorithm on this statement, as noted in https://github.com/Tokutek/tokudb-engine/issues/143. I've added a link to this page so it is considered as part of the upcoming effort.
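In the meantime, a possible workaround (my assumption; not confirmed as a fix for the bulk-fetch issue) is a composite index covering both columns the query touches, so the aggregation can be answered from the index without visiting the 120-million-row table:

-- Hypothetical covering index: the WHERE (A = 1) and the
-- GROUP BY / COUNT on S can then be satisfied from the index alone.
ALTER TABLE mh ADD KEY IDX_A_S (A, S);

With that index in place, EXPLAIN on the SELECT part of the statement should show "Using index".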
Related
I'm trying to denormalize a few MySQL tables I have into a new table that I can use to speed up some complex queries with lots of business logic. The problem I'm having is that there are 2.3 million records I need to add to the new table, and to do that I need to pull data from several tables and do a few conversions too. Here's my query (with names changed):
INSERT INTO database_name.log_set_logs
(offload_date, vehicle, jurisdiction, baselog_path, path,
baselog_index_guid, new_location, log_set_name, index_guid)
(
select STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d') as offload_date,
logset_logs.vehicle, jurisdiction, baselog_path, path,
baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
logset_logs.index_guid
from
(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) as offload_date,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle,
SUBSTRING_INDEX(path, '/', 9) as baselog_path, index_guid,
path, log_set_name
FROM database_name.baselog_and_amendment_guid_to_path_mappings
) logset_logs
left join database_name.log_trees baselog_trees
ON baselog_trees.original_location = logset_logs.baselog_path
left join database_name.baselog_offload_location location
ON location.baselog_index_guid = baselog_trees.baselog_index_guid);
The query itself works: I was able to run it using a filter on log_set_name. However, that filter's condition covers less than 1% of the total records, because one of the values of log_set_name accounts for 2.2 million records, the majority of the table, so there is nothing else I can see to break this query up into smaller chunks. The problem is that the query takes too long to run on the remaining 2.2 million records: it ends up timing out after a few hours, the transaction is rolled back, and nothing is added to the new table. Only the 0.1 million records could be processed, and that was only because I could add a filter that said WHERE log_set_name != 'value with the 2.2 million records'.
Is there a way to make this query more performant? Am I trying to do too many joins at once, and should I perhaps populate the row's columns in their own individual queries? Or is there some way I can page this type of query so that MySQL executes it in batches? I already got rid of all my indexes on the log_set_logs table because I read that those slow down inserts. I also jacked my RDS instance up to a db.r4.4xlarge write node, and since I'm using MySQL Workbench I increased all of its timeout values to their maximums, giving them all nines. All three of these steps helped and were necessary to get the 1% of records into the new table, but they still weren't enough to get the 2.2 million records in without timing out. I'd appreciate any insights, as I'm not adept at this type of bulk insert from a select.
CREATE TABLE `log_set_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`purged` tinyint(1) NOT NULL DEFAULT '0',
`baselog_path` text,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`new_location` text,
`offload_date` date NOT NULL,
`jurisdiction` varchar(20) DEFAULT NULL,
`vehicle` varchar(20) DEFAULT NULL,
`index_guid` varchar(36) NOT NULL,
`path` text NOT NULL,
`log_set_name` varchar(60) NOT NULL,
`protected_by_retention_condition_1` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_2` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_3` tinyint(1) NOT NULL DEFAULT '1',
`protected_by_retention_condition_4` tinyint(1) NOT NULL DEFAULT '1',
`general_comments_about_this_log` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1736707 DEFAULT CHARSET=latin1
CREATE TABLE `baselog_and_amendment_guid_to_path_mappings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`path` text NOT NULL,
`index_guid` varchar(36) NOT NULL,
`log_set_name` varchar(60) NOT NULL,
PRIMARY KEY (`id`),
KEY `log_set_name_index` (`log_set_name`),
KEY `path_index` (`path`(42))
) ENGINE=InnoDB AUTO_INCREMENT=2387821 DEFAULT CHARSET=latin1
...
CREATE TABLE `baselog_offload_location` (
`baselog_index_guid` varchar(36) NOT NULL,
`jurisdiction` varchar(20) NOT NULL,
KEY `baselog_index` (`baselog_index_guid`),
KEY `jurisdiction` (`jurisdiction`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
CREATE TABLE `log_trees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`original_location` text NOT NULL, -- This is what I have to join everything on. Since it's TEXT I cannot index it, and the largest value is over 255 characters, so I cannot change it to a VARCHAR and index it either.
`new_location` text,
`distcp_returncode` int(11) DEFAULT NULL,
`distcp_job_id` text,
`distcp_stdout` text,
`distcp_stderr` text,
`validation_attempt` int(11) NOT NULL DEFAULT '0',
`validation_result` tinyint(1) NOT NULL DEFAULT '0',
`archived` tinyint(1) NOT NULL DEFAULT '0',
`archived_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dir_exists` tinyint(1) NOT NULL DEFAULT '0',
`random_guid` tinyint(1) NOT NULL DEFAULT '0',
`offload_date` date NOT NULL,
`vehicle` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `baselog_index_guid` (`baselog_index_guid`)
) ENGINE=InnoDB AUTO_INCREMENT=1028617 DEFAULT CHARSET=latin1
baselog_offload_location has no PRIMARY KEY; what's up?
GUIDs/UUIDs can be terribly inefficient. A partial solution is to convert them to BINARY(16) to shrink them. More details here: http://mysql.rjweb.org/doc.php/uuid (MySQL 8.0 has similar functions.)
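A sketch of that conversion, shown against log_trees from above (the _bin column name is mine):

-- Portable form for older MySQL/MariaDB: strip the dashes and pack
-- the 32 hex digits into 16 bytes.
ALTER TABLE log_trees ADD COLUMN baselog_index_guid_bin BINARY(16);
UPDATE log_trees
SET baselog_index_guid_bin = UNHEX(REPLACE(baselog_index_guid, '-', ''));
-- MySQL 8.0 has built-ins for the same round trip:
-- UUID_TO_BIN(baselog_index_guid) / BIN_TO_UUID(baselog_index_guid_bin)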
It would probably be more efficient to have a separate (optionally redundant) column for vehicle rather than needing to do
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle
Why JOIN baselog_offload_location? There seems to be no reference to columns in that table. If there are, be sure to qualify them so we know what is where. Preferably use short aliases.
The lack of an index on baselog_index_guid may be critical to performance.
Please provide EXPLAIN SELECT ... for the SELECT in your INSERT and for the original (slow) query.
SELECT MAX(LENGTH(original_location)) FROM .. -- to see if it really is too big to index. What version of MySQL are you using? The limit increased recently.
For the above item, we can talk about having a 'hash'.
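As a sketch of the 'hash' idea, assuming MySQL 5.7+/MariaDB 10.2+ generated columns (on older versions the digest column would need to be maintained by the application or a trigger):

-- Index a fixed-width digest of the long TEXT column instead of the
-- column itself; equal strings have equal digests, so equality joins
-- still work.
ALTER TABLE database_name.log_trees
ADD COLUMN original_location_md5 CHAR(32)
    AS (MD5(original_location)) STORED,
ADD INDEX idx_orig_loc_md5 (original_location_md5);
-- The join then compares digests first, keeping the full TEXT
-- comparison as a guard against (rare) MD5 collisions:
--   ON  baselog_trees.original_location_md5 = MD5(logset_logs.baselog_path)
--   AND baselog_trees.original_location     = logset_logs.baselog_path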
"paging the query". I call it "chunking". See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . That talks about deleting, but it can be adapted to INSERT .. SELECT since you want to "chunk" the select. If you go with chunking, Javier's comment becomes moot. Your code would be chunking the selects, hence batching the inserts:
Loop:
INSERT .. SELECT .. -- of up to 1000 rows (see link)
End loop
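Here is a minimal sketch of that loop as a stored procedure. It assumes chunking on the source table's auto-increment id, and the table qualifications on the previously unqualified columns (location.jurisdiction, baselog_trees.new_location) are my guesses:

DELIMITER //
CREATE PROCEDURE chunked_denormalize()
BEGIN
  DECLARE v_max INT DEFAULT 0;
  DECLARE v_start INT DEFAULT 0;
  SELECT MAX(id) INTO v_max
    FROM database_name.baselog_and_amendment_guid_to_path_mappings;
  WHILE v_start < v_max DO
    INSERT INTO database_name.log_set_logs
      (offload_date, vehicle, jurisdiction, baselog_path, path,
       baselog_index_guid, new_location, log_set_name, index_guid)
    SELECT STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d'),
           logset_logs.vehicle, location.jurisdiction,
           logset_logs.baselog_path, logset_logs.path,
           baselog_trees.baselog_index_guid, baselog_trees.new_location,
           logset_logs.log_set_name, logset_logs.index_guid
    FROM (
      SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) AS offload_date,
             SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) AS vehicle,
             SUBSTRING_INDEX(path, '/', 9) AS baselog_path,
             index_guid, path, log_set_name
        FROM database_name.baselog_and_amendment_guid_to_path_mappings
       WHERE id > v_start AND id <= v_start + 1000  -- the current chunk
    ) logset_logs
    LEFT JOIN database_name.log_trees baselog_trees
           ON baselog_trees.original_location = logset_logs.baselog_path
    LEFT JOIN database_name.baselog_offload_location location
           ON location.baselog_index_guid = baselog_trees.baselog_index_guid;
    SET v_start = v_start + 1000;  -- advance to the next chunk
  END WHILE;
END //
DELIMITER ;

With autocommit on, each chunk commits on its own, so a timeout no longer rolls back everything already copied.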
I have partitioned a MySQL table containing 53 rows. Now when I query the number of records in all partitions, the reported count is almost 3 times what I expect. Even phpMyAdmin thinks there are 156 records.
Have I done something wrong in my table design and partitioning?
This is my table:
CREATE TABLE cl_inbox (
id int(11) NOT NULL AUTO_INCREMENT,
user int(11) NOT NULL,
contact int(11) DEFAULT NULL,
sdate timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
body text NOT NULL,
userstatus tinyint(4) NOT NULL DEFAULT 1 COMMENT '0: new, 1:read, 2: deleted',
contactstatus tinyint(4) NOT NULL DEFAULT 0,
class tinyint(4) NOT NULL DEFAULT 0,
attachtype tinyint(4) NOT NULL DEFAULT 0,
attachsrc varchar(255) DEFAULT NULL,
PRIMARY KEY (id, user),
INDEX i_class (class),
INDEX i_contact_user (contact, user),
INDEX i_contactstatus (contactstatus),
INDEX i_user_contact (user, contact),
INDEX i_userstatus (userstatus)
)
ENGINE = INNODB
AUTO_INCREMENT = 69
AVG_ROW_LENGTH = 19972
CHARACTER SET utf8
COLLATE utf8_general_ci
ROW_FORMAT = DYNAMIC
PARTITION BY KEY (`user`)
(
PARTITION partition1 ENGINE = INNODB,
PARTITION partition2 ENGINE = INNODB,
PARTITION partition3 ENGINE = INNODB,
.....
PARTITION partition128 ENGINE = INNODB
);
Those numbers are approximations, just as with SHOW TABLE STATUS and EXPLAIN.
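To get exact numbers, count the rows yourself; the per-partition form needs MySQL 5.6+ / MariaDB 10.0+:

SELECT COUNT(*) FROM cl_inbox;                        -- exact total
SELECT COUNT(*) FROM cl_inbox PARTITION (partition1); -- one partition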
Meanwhile, you will probably find that PARTITION BY KEY provides no performance improvement. If you find otherwise, I would be very interested to hear about it.
I have a MyISAM table (on MariaDB) with 7 million rows in it.
CREATE TABLE `mytable` (
`id` bigint(100) unsigned NOT NULL AUTO_INCREMENT,
`x` int(5) unsigned NOT NULL DEFAULT '0',
`y` int(5) unsigned NOT NULL DEFAULT '0',
`value` int(5) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=10152508 DEFAULT CHARSET=utf8 PAGE_CHECKSUM=1
When I do
SELECT * FROM mytable WHERE id = 167880;
it takes around 0.272 sec.
When I do
UPDATE mytable SET value = 1 WHERE id = 167880;
it takes anywhere from 0.200 to 2.5 sec, seemingly at random.
I was thinking it's because my table has a lot of rows, but still, it shouldn't take that much time to update a row by its primary key.
I did some research before posting, so here are the checks I've already done:
No duplicate indexes
No indexes other than the primary key "id"
No triggers
Tried switching to the InnoDB engine; it was worse (around 6 sec per update)
Tried switching to the Aria engine; it's even worse
Already ran OPTIMIZE TABLE
Config is the default config of the latest version of MariaDB (fresh install)
Made all these checks while the DB was not used by anything else, so no heavy reads during the tests
I think the problem is the data type you are using for the id column. Using INT rather than BIGINT can make a significant reduction in disk space. Read this article for more detail:
http://ronaldbradford.com/blog/bigint-v-int-is-there-a-big-deal-2008-07-18/
Hope it helps.
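If you want to try that suggestion, the change would look like this; your AUTO_INCREMENT of ~10 million fits comfortably in INT UNSIGNED, which tops out around 4.29 billion:

-- Shrinks the key from 8 bytes to 4. This rebuilds the table, so run
-- it during a quiet period.
ALTER TABLE mytable
MODIFY id INT UNSIGNED NOT NULL AUTO_INCREMENT;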
Hello, this query is producing the EXPLAIN below, which is odd considering I have an index set up on both columns:
id: 1
select_type: SIMPLE
table: vtr_video_transactions
type: ALL
possible_keys: user_standard,user_date
key: NULL
key_len: NULL
ref: NULL
rows: 5
Extra: Using where; Using filesort
CREATE TABLE `vtr_video_transactions` (
`vtr_id` int(11) NOT NULL AUTO_INCREMENT,
`vtr_transaction_id` int(11) unsigned DEFAULT NULL,
`vtr_user_id` int(11) unsigned DEFAULT NULL,
`vtr_standards_id` int(11) unsigned DEFAULT NULL,
`vtr_video_date` datetime DEFAULT NULL,
PRIMARY KEY (`vtr_id`),
KEY `user_standard` (`vtr_user_id`,`vtr_standards_id`),
KEY `user_date` (`vtr_user_id`,`vtr_video_date`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
See user_date for the index. I have it set to DESC in MySQL Workbench, but I still get a filesort in the EXPLAIN. Not sure why. Cheers
Data for the table:
LOCK TABLES `vtr_video_transactions` WRITE;
/*!40000 ALTER TABLE `vtr_video_transactions` DISABLE KEYS */;
INSERT INTO `vtr_video_transactions` VALUES (1,1,1,2,'2015-09-05 17:18:59'),(2,2,1,3,'2015-08-27 19:04:12'),(3,2,1,4,'2015-08-27 18:55:53'),(4,10,1,119,'2015-08-27 19:04:12'),(5,11,1,10,'2015-08-27 19:04:12');
See the manual page here
Indexes are less important for queries on small tables, or big tables
where report queries process most or all of the rows. When a query
needs to access most of the rows, reading sequentially is faster than
working through an index. Sequential reads minimize disk seeks, even
if not all the rows are needed for the query.
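You can confirm that the full scan is purely a size decision by forcing the index and comparing plans (the WHERE and ORDER BY below are my guesses, since the original query isn't shown):

-- With only 5 rows, the table scan plus filesort is cheaper than
-- walking the index; FORCE INDEX shows the alternative plan.
EXPLAIN SELECT *
FROM vtr_video_transactions FORCE INDEX (user_date)
WHERE vtr_user_id = 1
ORDER BY vtr_video_date DESC;

Also note that before MySQL 8.0, ASC/DESC in an index definition is parsed but ignored, so the DESC setting in Workbench has no effect; the optimizer can still read the index in reverse to satisfy ORDER BY ... DESC.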
Basically I am monitoring the slowest queries on a website. It turns out they are something like:
INSERT INTO beststat (bestid,period,rawView) VALUES ( 'idX' , 2012 , 1 )
ON DUPLICATE KEY UPDATE rawView = rawView+1
Basically it's a logging table: if the row is already there, it increments rawView by 1.
beststat is InnoDB, so I have row-level locking, and considering I do a lot of insert-updates it should be faster than MyISAM.
Anyway, that query shouldn't take so long; maybe there is something else wrong. What could it be?
Of course I have a UNIQUE index on (bestid, period).
Additional Info
This table (beststat) currently has ~1 million records and its size is 68MB. I have 4GB RAM and innodb_buffer_pool_size = 104,857,600. MySQL: 5.1.49-3.
CREATE TABLE `beststat` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`bestid` int(11) unsigned NOT NULL,
`period` mediumint(8) unsigned NOT NULL,
`view` mediumint(8) unsigned NOT NULL DEFAULT '0',
`rawView` mediumint(8) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `bestid` (`bestid`,`period`)
) ENGINE=InnoDB AUTO_INCREMENT=2020577 DEFAULT CHARSET=utf8
Note: to speed things up a little, I could do something like this:
UPDATE beststat SET rawView = rawView + 1 WHERE bestid = 'idX' AND period = 2012;
if (mysql_affected_rows() == 0)
    INSERT INTO beststat (bestid, period, rawView) VALUES ('idX', 2012, 1);
So most of the time I would run only the first query, the UPDATE. But I would like to understand why the original, more concise, query is slow.
I found this interesting article... still reading
When dealing with a big number of rows, I suggest using LOAD DATA INFILE to make the load faster.
To further improve the query time, you can consider using a MEMORY table as well.
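A sketch of how those two suggestions combine (the file path and staging-table name are illustrative): accumulate hits in a flat file, bulk-load them into a staging table, then fold them into beststat in one statement:

-- In-memory staging table; layout matches the CSV.
CREATE TABLE beststat_stage (
  bestid  INT UNSIGNED NOT NULL,
  period  MEDIUMINT UNSIGNED NOT NULL,
  rawView MEDIUMINT UNSIGNED NOT NULL
) ENGINE=MEMORY;

-- One bulk load instead of thousands of single-row INSERTs.
LOAD DATA INFILE '/tmp/hits.csv'
INTO TABLE beststat_stage
FIELDS TERMINATED BY ',';

-- Fold the whole batch into the real table; duplicate (bestid, period)
-- pairs are summed rather than inserted twice.
INSERT INTO beststat (bestid, period, rawView)
SELECT bestid, period, SUM(rawView)
  FROM beststat_stage
 GROUP BY bestid, period
ON DUPLICATE KEY UPDATE rawView = beststat.rawView + VALUES(rawView);

TRUNCATE TABLE beststat_stage;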