I am trying to partition a table with 60 million rows of data based on year.
Specifications:
MySQL 5.7.1
OS: Windows
ALTER TABLE full_data PARTITION BY RANGE (YEAR(ProcessDate))(
PARTITION years VALUES LESS THAN (2019)
) ;
The process has been running for the past day. Could you please help me improve the performance?
CREATE TABLE full_data (
Mobile bigint(11) DEFAULT NULL,
Name varchar(200) DEFAULT NULL,
Barcode varchar(200) DEFAULT NULL,
Batch varchar(200) DEFAULT NULL,
Carton varchar(500) DEFAULT NULL,
Doctype varchar(500) DEFAULT NULL,
Rack varchar(500) DEFAULT NULL,
ProcessDate datetime DEFAULT NULL,
KEY Mobile (Mobile,Barcode),
KEY MobileBarcode (Mobile,Barcode)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Such an ALTER will take hours to copy all the data from the existing table into a temp table, then swap it into place. Disk I/O is taking the time.
But why do that partitioning? Only in rare cases will it provide any performance benefit. How many years do you have? What queries do you run against the table? Please provide SHOW CREATE TABLE.
If you are having performance problems, let's start with EXPLAIN SELECT ...
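If the partitioning is kept anyway, one way to judge whether it pays off is to check partition pruning for a real query. The following is only a sketch: the year boundaries, the partition names and the sample date range are assumptions, not taken from the question.

```sql
-- Hypothetical layout: one partition per year plus a catch-all.
-- A single partition, as in the original ALTER, leaves nothing to prune.
ALTER TABLE full_data PARTITION BY RANGE (YEAR(ProcessDate)) (
    PARTITION p2016 VALUES LESS THAN (2017),
    PARTITION p2017 VALUES LESS THAN (2018),
    PARTITION p2018 VALUES LESS THAN (2019),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- The "partitions" column of the plan lists the partitions that will be read.
-- EXPLAIN PARTITIONS is the 5.6/5.7 spelling; in 8.0 plain EXPLAIN shows the column.
EXPLAIN PARTITIONS
SELECT COUNT(*)
FROM full_data
WHERE ProcessDate >= '2018-01-01'
  AND ProcessDate <  '2019-01-01';
```

If the plan lists every partition for the queries that matter, the partitioning is adding cost without reducing the rows scanned.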
Related
I am using MySQL 5.6 with ~150 million records in the Transaction table (InnoDB). As the table grows, it is becoming unmanageable (adding a column or index) and slow even with the required indexes. After searching the internet, I concluded that it is time to partition the table. I am confident that partitioning will achieve the following for me:
Improve DML statements response time (using partitioning pruning)
Improve archival process
But I am not sure whether (and how) it will improve DDL performance for this table. More specifically, the performance of the following DDL statements:
ALTER TABLE ADD/DROP COLUMN
ALTER TABLE ADD/DROP INDEX
I went through the MySQL documentation and searched the internet but was unable to find an answer. Can anyone please help me with this or point me to relevant documentation?
My table structure is as follows:
CREATE TABLE `TRANSACTION` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`parent_uuid` char(36) DEFAULT NULL,
`order_number` varchar(64) DEFAULT NULL,
`order_id` int(11) DEFAULT NULL,
`order_uuid` char(36) DEFAULT NULL,
`order_type` char(1) DEFAULT NULL,
`business_id` int(11) DEFAULT NULL,
`store_id` int(11) DEFAULT NULL,
`store_device_id` int(11) DEFAULT NULL,
`source` char(1) DEFAULT NULL COMMENT 'instore, online, order_ahead, etc',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`flags` int(11) DEFAULT NULL,
`customer_lang` char(2) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `parent_id` (`parent_id`),
KEY `business_id` (`business_id`,`store_id`,`store_device_id`),
KEY `parent_uuid` (`parent_uuid`),
KEY `order_uuid` (`order_uuid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
And I am partitioning using the following statement:
ALTER TABLE TRANSACTION PARTITION BY RANGE (id)
(PARTITION p0 VALUES LESS THAN (5000000) ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN (10000000) ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)
Thanks!
Partitioning is not a performance panacea. Even the items you mentioned will not speed up; they may even slow down.
Instead, I will critique the table to look for ways to speed up some things.
UUIDs are terrible for performance once the index on them becomes too big to be cached. This is because of their randomness. Possible solutions: compact them into BINARY(16); shrink the table in other ways; avoid UUIDs.
Why have both parent_id and parent_uuid??
Shrink the 4-byte INTs to smaller datatypes where practical.
Usually CHAR should be CHARACTER SET ascii (1-byte/character), not utf8mb4 (4 bytes/char).
Caution: 150M is getting remotely close to the 2-billion limit of INT SIGNED. Consider 4B limit of INT UNSIGNED. (Each is 4 bytes.)
Do you ever use created_at or updated_at?
MySQL 8.0.13 has a very fast ADD COLUMN and DROP COLUMN (for limited situations).
5.7.?? has a less-invasive ADD INDEX than previous versions, but I am not sure it applies to partitioned tables.
5.7.4: Online DDL support reduces table rebuild time and permits concurrent DML, which helps reduce user application downtime. For additional information, see Overview of Online DDL.
More importantly, let's see the main queries that are "too slow". There may be composite indexes and/or reformulations of the queries that will speed them up.
There is even a slim chance that partitioning will help but not on the PRIMARY KEY.
I think there are only 4 use cases where partitioning helps performance.
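The BINARY(16) and CHARACTER SET ascii suggestions above could look roughly like this on MySQL 5.6. It is a sketch under assumptions: order_uuid_bin is a hypothetical new column, the id range and the UUID literal are placeholders, and a real lookup would also need an index on the new column.

```sql
-- Hypothetical compact copy of the 36-character UUID.
ALTER TABLE `TRANSACTION`
    ADD COLUMN order_uuid_bin BINARY(16) DEFAULT NULL;

-- Strip the dashes and pack the 32 hex digits into 16 bytes.
-- On 150M rows this would normally be run in batches by id range.
UPDATE `TRANSACTION`
SET order_uuid_bin = UNHEX(REPLACE(order_uuid, '-', ''))
WHERE id BETWEEN 1 AND 100000
  AND order_uuid IS NOT NULL;

-- Look up by UUID: convert the literal the same way.
SELECT id
FROM `TRANSACTION`
WHERE order_uuid_bin = UNHEX(REPLACE('123e4567-e89b-12d3-a456-426614174000', '-', ''));

-- Single-character codes only need a 1-byte character set.
ALTER TABLE `TRANSACTION`
    MODIFY order_type CHAR(1) CHARACTER SET ascii DEFAULT NULL;
```

MySQL 8.0 adds UUID_TO_BIN() and BIN_TO_UUID(), which replace the UNHEX(REPLACE(...)) idiom.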
We have a table that stores data every minute per user. We have about 1,000 users. The table has only 11 columns. As mentioned, a new record is created for each user each minute, so 1,440 records per user per day. The table is highly indexed.
Once created, the data is read and processed via cron jobs every hour.
After 14 days the data is deleted. This is a rolling process.
Generic MySQL wisdom seems to be to use InnoDB for everything, however we have had problems deleting large amounts of data using InnoDB. A memory table is no good as the data must survive a reboot.
Does anyone understand the other MySQL table engines well enough to know if a different type would be better in this scenario?
Here is the table definition:
CREATE TABLE geoc1clo_where.map_data (
MapID bigint(20) NOT NULL AUTO_INCREMENT,
Date date DEFAULT NULL,
DeviceID varchar(128) DEFAULT NULL,
Alarm varchar(255) DEFAULT NULL,
FixTime datetime DEFAULT NULL,
Valid int(1) DEFAULT NULL,
Lat double DEFAULT NULL,
Lon double DEFAULT NULL,
Speed float DEFAULT NULL,
Course float DEFAULT NULL,
Address varchar(512) DEFAULT NULL,
PRIMARY KEY (MapID),
INDEX IDX_map_data (MapID, FixTime, DeviceID),
INDEX IDX_map_data_FixTime (FixTime),
INDEX IDX_map_data2 (DeviceID, FixTime),
INDEX IDX_map_data3 (DeviceID, Date)
)
ENGINE = MYISAM
AUTO_INCREMENT = 98169276
AVG_ROW_LENGTH = 69
CHARACTER SET latin1
COLLATE latin1_swedish_ci;
I have a MySQL database table with more than 34M rows (and growing).
CREATE TABLE `sensordata` (
`userID` varchar(45) DEFAULT NULL,
`instrumentID` varchar(10) DEFAULT NULL,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
`data` varchar(200) DEFAULT NULL,
`dataState` varchar(45) NOT NULL DEFAULT 'Original',
`gps` varchar(45) DEFAULT NULL,
`location` varchar(45) DEFAULT NULL,
`speed` varchar(20) NOT NULL DEFAULT '0',
`unitID` varchar(5) NOT NULL DEFAULT '1',
`parameterID` varchar(5) NOT NULL DEFAULT '1',
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
`status` varchar(7) DEFAULT 'Offline',
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
I access this table from multiple threads (at least 400 threads) every minute to insert data into the table.
As the table grew, reading and writing data got slower. One SELECT query used to take about 25 seconds; then I added a unique index:
UNIQUE INDEX idx_userInsDate ( userID,instrumentID,utcDateTime)
This reduced the read time from 25 seconds to a few milliseconds, but it increased the insert time, since the index has to be updated for each record.
Also, if I run a SELECT query from multiple threads at the same time, the queries take too long to return the data.
This is an example query
Select dateTime from sensordata WHERE userID = 'someUserID' AND instrumentID = 'someInstrumentID' AND dateTime between 'startDate' AND 'endDate' order by dateTime asc;
Can someone please help me improve the table schema or add an effective index to improve the performance?
Thank you in advance
A PRIMARY KEY is a UNIQUE key. Toss the redundant UNIQUE(id) !
Is id referenced by any other tables? If not, then get rid of it altogether. Instead have just
PRIMARY KEY ( userID, instrumentID, utcDateTime)
That is, if that triple is guaranteed to be unique. You mentioned DST -- use the datatype TIMESTAMP instead of DATETIME. Doing that, you can convert to DATETIME if needed, thereby eliminating one of the columns.
That one index (the PK) takes virtually no space since it is "clustered" with the data in InnoDB.
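The index advice above could be applied along these lines. This is a sketch, assuming that id is not referenced anywhere else, that (userID, instrumentID, utcDateTime) really is unique, that none of those three columns currently holds NULL, and that the idx_userInsDate index from the question exists.

```sql
-- Step 1: the PRIMARY KEY already enforces uniqueness of id.
ALTER TABLE sensordata
    DROP INDEX id_UNIQUE;

-- Step 2 (optional, rebuilds the whole table): promote the natural key.
-- Primary key columns must be NOT NULL.
ALTER TABLE sensordata
    MODIFY userID       VARCHAR(45) NOT NULL,
    MODIFY instrumentID VARCHAR(10) NOT NULL,
    MODIFY utcDateTime  DATETIME    NOT NULL,
    DROP PRIMARY KEY,
    DROP COLUMN id,
    DROP INDEX idx_userInsDate,   -- would duplicate the new PRIMARY KEY
    ADD PRIMARY KEY (userID, instrumentID, utcDateTime);
```

Step 2 copies all 34M rows, so it is worth trying on a copy of the table first.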
Your table is awfully fat with all those VARCHARs. For example, status can be reduced to a 1-byte ENUM. Others can be normalized. Things like speed can be either a 4-byte FLOAT or some smaller DECIMAL, depending on how much range and precision you need.
With 34M wide rows, you have probably recently exceeded the cacheability of the RAM you have. By making the row narrower, you will postpone that overflow.
Why attack the indexes? Every UNIQUE (including PRIMARY) index is checked before allowing the row to be inserted. By getting it down to 1 index, that minimizes the cost there. (InnoDB really needs a PRIMARY KEY.)
INT is 4 bytes. Do you have a billion instruments? Maybe instrumentID could be SMALLINT UNSIGNED, which is 2 bytes, with a max of 64K? Think about all the other IDs.
You have 400 INSERTs/minute, correct? That is not bad. If you get to 400/second, we need to have a different talk.
("Fill factor" is not tunable in MySQL because it does not make much difference.)
How much RAM do you have? What is the setting for innodb_buffer_pool_size? Optimal is somewhere around 70% of available RAM.
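To answer the buffer pool question, the current setting and the approximate size of the table can be read from the server. A minimal sketch, assuming the session's default database is the one that holds sensordata:

```sql
-- Current buffer pool size, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Approximate on-disk size of the data and of the secondary indexes.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name = 'sensordata';
```

If data_mb plus index_mb is well above the buffer pool size, reads are likely going to disk.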
Let's see your main queries; there may be other issues to address.
It's not the indexes at fault here. It's your data types. As the size of the data on disk grows, the speed of all operations decreases. Indexes can certainly help speed up selects, provided your data is properly structured, but it appears that it isn't.
CREATE TABLE `sensordata` (
`userID` int, /* shouldn't this have a foreign key constraint? */
`instrumentID` int,
`utcDateTime` datetime DEFAULT NULL,
`dateTime` datetime DEFAULT NULL,
/* what exactly are you putting here? Are you sure it's not causing any redundancy? */
`data` varchar(200) DEFAULT NULL,
/* your states will be a finite number of elements. They can be represented by constants in your code or a set of values in a related table */
`dataState` int,
/* what's this? Sounds like what you are saving in location */
`gps` varchar(45) DEFAULT NULL,
`location` point,
`speed` float,
`unitID` int DEFAULT '1',
/* as above */
`parameterID` int NOT NULL DEFAULT '1',
/* are you sure this is different from data? */
`originalData` varchar(200) DEFAULT NULL,
`comments` varchar(45) DEFAULT NULL,
`channelHashcode` varchar(12) DEFAULT NULL,
`settingHashcode` varchar(12) DEFAULT NULL,
/* as above and isn't this the same as */
`status` int,
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=98772 DEFAULT CHARSET=utf8
First of all: avoid VARCHARs for indexes, and especially for IDs. String keys are wider than integer keys and slower to compare, so the index gets bigger and every lookup and insert costs more.
Second: your SELECT filters on dateTime, but your index is on utcDateTime. The query will use only the userID and instrumentID parts of the index and ignore the utcDateTime part.
Advice: change the data types of the IDs and change your index to match the query (dateTime, not utcDateTime).
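An index that matches the example query could be sketched as follows. The index name and the date literals are placeholders; the column order (the two equality columns first, the range column last) is what lets MySQL use all three parts.

```sql
-- Hypothetical index name; columns follow the WHERE clause of the example query.
ALTER TABLE sensordata
    ADD INDEX idx_user_ins_dt (userID, instrumentID, `dateTime`);

-- Check the plan: "key" should name the new index.
EXPLAIN
SELECT `dateTime`
FROM sensordata
WHERE userID = 'someUserID'
  AND instrumentID = 'someInstrumentID'
  AND `dateTime` BETWEEN '2012-01-01 00:00:00' AND '2012-01-31 23:59:59'
ORDER BY `dateTime` ASC;
```

Because the query selects only dateTime, this index also contains every column the query needs, so the rows themselves do not have to be read.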
Using an index decreases your insert performance; unfortunately, there is no such thing as a fill factor for indexes in MySQL right now. So the best you can do is keep the indexes as small as possible.
Another approach on heavily loaded databases with random access would be: write to an unindexed table, read from an indexed one. At a given time, build the indexes and swap the tables (may require a third table for the index creation while leaving the other ones untouched in between).
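The write-unindexed, read-indexed idea could be sketched like this. All table and column names are hypothetical, and the sketch assumes the incoming rows never collide on the read table's primary key.

```sql
-- Writers insert here: no indexes to maintain.
CREATE TABLE readings_incoming (
    device_id INT UNSIGNED NOT NULL,
    read_at   DATETIME     NOT NULL,
    reading   DOUBLE       NOT NULL
) ENGINE=InnoDB;

-- Readers query here.
CREATE TABLE readings (
    device_id INT UNSIGNED NOT NULL,
    read_at   DATETIME     NOT NULL,
    reading   DOUBLE       NOT NULL,
    PRIMARY KEY (device_id, read_at)
) ENGINE=InnoDB;

-- Periodic job, step 1: freeze the current batch by swapping in an empty table.
CREATE TABLE readings_incoming_new LIKE readings_incoming;
RENAME TABLE readings_incoming     TO readings_batch,
             readings_incoming_new TO readings_incoming;

-- Step 2: build the next read table off to the side.
CREATE TABLE readings_next LIKE readings;
INSERT INTO readings_next SELECT * FROM readings;
INSERT INTO readings_next SELECT * FROM readings_batch;

-- Step 3: swap it in; a multi-table RENAME is atomic.
RENAME TABLE readings      TO readings_old,
             readings_next TO readings;
DROP TABLE readings_old, readings_batch;
```

Copying the full read table on every cycle gets expensive as it grows; inserting the frozen batch straight into the indexed table is a simpler variant of the same idea.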
I am working with MySQL, querying a table that has 12 million records covering one year of data.
The query has to select a certain kind of data (coin, enterprise, type, etc.) and then provide a daily average for certain fields of that data, so we can graph it afterwards.
The dream is to be able to do this in real time, with a response time under 10 seconds; however, at the moment it is not looking bright at all, as it is taking between 4 and 6 minutes.
For example, one of the WHERE clauses matches 150k records, about 500 per day, and then we average three fields (which are not in the WHERE clause) using AVG() and GROUP BY.
Now, to the raw data. The query is:
SELECT
`Valorizacion`.`fecha`, AVG(tir) AS `tir`, AVG(tirBase) AS `tirBase`, AVG(precioPorcentajeValorPar) AS `precioPorcentajeValorPar`
FROM `Valorizacion` USE INDEX (ix_mercado2)
WHERE
(Valorizacion.fecha >= '2011-07-17' ) AND
(Valorizacion.fecha <= '2012-07-18' ) AND
(Valorizacion.plazoResidual >= 365 ) AND
(Valorizacion.plazoResidual <= 3650000 ) AND
(Valorizacion.idMoneda_cache IN ('UF')) AND
(Valorizacion.idEmisorFusionado_cache IN ('ABN AMRO','WATTS', ...)) AND
(Valorizacion.idTipoRA_cache IN ('BB', 'BE', 'BS', 'BU'))
GROUP BY `Valorizacion`.`fecha` ORDER BY `Valorizacion`.`fecha` asc;
248 rows in set (4 min 28.82 sec)
The index is made over all the where clause fields in the order
(fecha, idTipoRA_cache, idMoneda_cache, idEmisorFusionado_cache, plazoResidual)
Selecting the matching records, without GROUP BY or AVG, takes
149670 rows in set (58.77 sec)
And selecting the records, grouping, and just doing a COUNT(*) instead of the average takes
248 rows in set (35.15 sec)
That is probably because it does not need to go to disk for the row data; the result is obtained directly from the index.
So as far as it goes, I am inclined to tell my boss "I'm sorry, but it can't be done", but before doing so I come to you asking whether you think there is something I could do to improve this. I think I could improve the index lookup time by moving the column with the highest cardinality to the front, and so on, but even after that, the time it takes to access the disk for each record and compute the AVG seems too much.
Any ideas?
-- EDIT, the table structure
CREATE TABLE `Valorizacion` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`idInstrumento` int(11) NOT NULL,
`fecha` date NOT NULL,
`tir` decimal(10,4) DEFAULT NULL,
`tirBase` decimal(10,4) DEFAULT NULL,
`plazoResidual` double NOT NULL,
`duracionMacaulay` double DEFAULT NULL,
`duracionModACT365` double DEFAULT NULL,
`precioPorcentajeValorPar` decimal(20,15) DEFAULT NULL,
`valorPar` decimal(20,15) DEFAULT NULL,
`convexidad` decimal(20,15) DEFAULT NULL,
`volatilidad` decimal(20,15) DEFAULT NULL,
`montoCLP` double DEFAULT NULL,
`tirACT365` decimal(10,4) DEFAULT NULL,
`tipoVal` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`idEmisorFusionado_cache` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`idMoneda_cache` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
`idClasificacionRA_cache` int(11) DEFAULT NULL,
`idTipoRA_cache` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`fechaPrepagable_cache` date DEFAULT NULL,
`tasaEmision_cache` decimal(10,4) DEFAULT NULL,
PRIMARY KEY (`id`,`fecha`),
KEY `ix_FechaNemo` (`fecha`,`idInstrumento`) USING BTREE,
KEY `ix_mercado_stackover` (`idMoneda_cache`,`idTipoRA_cache`,`idEmisorFusionado_cache`,`plazoResidual`)
) ENGINE=InnoDB AUTO_INCREMENT=12933194 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Selecting 150K records out of 12M records and performing aggregate functions on them will not be fast no matter what you try to do.
You are probably dealing with primarily historical data as your sample query is for a year of data. A better approach may be to pre-calculate your daily averages and put them into separate tables. Then you may query those tables for reporting, graphs, etc. You will need to decide when and how to run such calculations so that you don't need to re-run them again on the same data.
When your requirement is to do analysis and reporting on millions of historical records you need to consider a data warehouse approach http://en.wikipedia.org/wiki/Data_warehouse rather than a simple database approach.
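The pre-calculation idea could be sketched as a summary table that stores sums and counts rather than averages, so that days and filter combinations can still be combined correctly at report time. The table name Valorizacion_diaria and its columns tir_sum and tir_cnt are hypothetical. The sketch also assumes that the plazoResidual >= 365 filter is fixed and can be applied when the summary is built, and it shows only tir; tirBase and precioPorcentajeValorPar would follow the same pattern.

```sql
-- Hypothetical summary table: one row per day and filter combination.
CREATE TABLE Valorizacion_diaria (
    fecha                   DATE          NOT NULL,
    idMoneda_cache          VARCHAR(20)   NOT NULL,
    idTipoRA_cache          VARCHAR(20)   NOT NULL,
    idEmisorFusionado_cache VARCHAR(20)   NOT NULL,
    tir_sum                 DECIMAL(24,4) NOT NULL,
    tir_cnt                 INT UNSIGNED  NOT NULL,
    PRIMARY KEY (fecha, idMoneda_cache, idTipoRA_cache, idEmisorFusionado_cache)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- Nightly job: fold one finished day into the summary.
-- COUNT(tir) skips NULLs, matching what AVG(tir) does.
INSERT INTO Valorizacion_diaria
SELECT fecha, idMoneda_cache, idTipoRA_cache, idEmisorFusionado_cache,
       COALESCE(SUM(tir), 0), COUNT(tir)
FROM Valorizacion
WHERE fecha = '2012-07-18'
  AND plazoResidual >= 365
  AND idMoneda_cache IS NOT NULL
  AND idEmisorFusionado_cache IS NOT NULL
GROUP BY fecha, idMoneda_cache, idTipoRA_cache, idEmisorFusionado_cache;

-- Report query: re-derive the daily average from sums and counts.
SELECT fecha,
       SUM(tir_sum) / NULLIF(SUM(tir_cnt), 0) AS tir
FROM Valorizacion_diaria
WHERE fecha BETWEEN '2011-07-17' AND '2012-07-18'
  AND idMoneda_cache = 'UF'
  AND idTipoRA_cache IN ('BB', 'BE', 'BS', 'BU')
GROUP BY fecha
ORDER BY fecha;
```

An issuer filter on idEmisorFusionado_cache would be added to the report query in the same way. If the plazoResidual range varies per report, it would have to become a bucketed dimension of the summary table.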
I have a table in my MySQL database that I constantly need to alter and insert rows into, but it runs slowly whenever I make changes, which is a problem because it has over 200k entries. I tested another table with very few rows and it responds quickly, so it is not the server or the database itself but this particular table that struggles. I need all of the table's rows and cannot find a way around the load issues.
DROP TABLE IF EXISTS `articles`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `articles` (
`id` int(11) NOT NULL auto_increment,
`content` text NOT NULL,
`author` varchar(255) NOT NULL,
`alias` varchar(255) NOT NULL,
`topic` varchar(255) NOT NULL,
`subtopics` varchar(255) NOT NULL,
`keywords` text NOT NULL,
`submitdate` timestamp NOT NULL default CURRENT_TIMESTAMP,
`date` varchar(255) NOT NULL,
`day` varchar(255) NOT NULL,
`month` varchar(255) NOT NULL,
`year` varchar(255) NOT NULL,
`time` varchar(255) NOT NULL,
`ampm` varchar(255) NOT NULL,
`ip` varchar(255) NOT NULL,
`score_up` int(11) NOT NULL default '0',
`score_down` int(11) NOT NULL default '0',
`total_score` int(11) NOT NULL default '0',
`approved` varchar(255) NOT NULL,
`visible` varchar(255) NOT NULL,
`searchable` varchar(255) NOT NULL,
`addedby` varchar(255) NOT NULL,
`keyword_added` varchar(255) NOT NULL,
`topic_added` varchar(255) NOT NULL,
PRIMARY KEY (`id`),
KEY `score_up` (`score_up`),
KEY `score_down` (`score_down`),
FULLTEXT KEY `SEARCH` (`content`),
FULLTEXT KEY `asearch` (`author`),
FULLTEXT KEY `topic` (`topic`),
FULLTEXT KEY `keywords` (`content`,`keywords`,`topic`,`author`),
FULLTEXT KEY `content` (`content`,`keywords`),
FULLTEXT KEY `new` (`keywords`),
FULLTEXT KEY `author` (`author`)
) ENGINE=MyISAM AUTO_INCREMENT=290823 DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;
With indexes it depends:
more indexes = faster selecting, slower inserting
fewer indexes = slower selecting, faster inserting
This is because the indexes have to be updated on every insert, and the more data the table holds, the more work MySQL has to do to maintain them.
So maybe you could remove the indexes you do not need; that should speed up your inserts.
Another option is to partition your table into several; this relieves the bottleneck.
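As a starting point for removing unneeded indexes, the table definition in the question already contains one exact duplicate. This is a sketch; which of the two names to keep is an arbitrary choice here.

```sql
-- List every index with its columns.
SHOW INDEX FROM articles;

-- `asearch` and `author` are both FULLTEXT indexes on (author) alone,
-- so one of them is pure overhead on every insert and update.
ALTER TABLE articles
    DROP INDEX asearch;
```

The other FULLTEXT indexes cannot be judged from the definition alone: MATCH() needs an index whose column list matches exactly, so whether each one is needed depends on the searches the application actually runs.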
Just try to pass the changes in an update script. This is slow because it creates tables. Try updating only the tables where changes have been made.
For example, create a variable that collects all the changes in the program, then use it in the query that writes to the tables. That should be fast enough for most programs. But as we all know, speed depends on how much data is processed.
Let me know if you need anything else.
This may or may not help you directly, but I notice that you have a lot of VARCHAR(255) columns in your table. Some of them seem like they might be totally unnecessary — do you really need all those date / day / month / year / time / ampm columns? — and many could be replaced by more compact datatypes:
Dates could be stored as a DATETIME (or TIMESTAMP).
IP addresses could be stored as INTEGERs, or as BINARY(16) for IPv6.
Instead of storing usernames in the article table, you should create a separate user table and reference it using INTEGER keys.
I don't know what the approved, visible and searchable fields are, but I bet they don't need to be VARCHAR(255)s.
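Two of the points above can be sketched in SQL. The column name ip_num and the output aliases are hypothetical, and the second query assumes the separate date / day / month / year / time / ampm columns merely repeat what submitdate already holds.

```sql
-- Store IPv4 addresses as a 4-byte unsigned integer instead of VARCHAR(255).
ALTER TABLE articles
    ADD COLUMN ip_num INT UNSIGNED NULL;

-- INET_ATON() converts dotted-quad text to an integer (NULL if the text is not valid).
UPDATE articles
SET ip_num = INET_ATON(ip);

-- INET_NTOA() converts back for display.
SELECT id, INET_NTOA(ip_num) AS ip_text
FROM articles
LIMIT 10;

-- Date parts can be derived on the fly instead of being stored.
SELECT id,
       DAY(submitdate)                  AS day_part,
       MONTHNAME(submitdate)            AS month_part,
       YEAR(submitdate)                 AS year_part,
       DATE_FORMAT(submitdate, '%h:%i') AS time_part,
       DATE_FORMAT(submitdate, '%p')    AS ampm_part
FROM articles
LIMIT 10;
```

For IPv6, MySQL 5.6.3 and later provide INET6_ATON() and INET6_NTOA(), which work with a binary column such as VARBINARY(16).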
I'd also second Adrian Cornish's suggestion to split your table. In particular, you really want to keep frequently changing and frequently accessed metadata, such as up/down vote scores, separate from rarely changing and infrequently accessed bulk data like article content. See for example http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
"I have a table inside of my mysql database which I constantly need to alter and insert rows into but it continues"
Try InnoDB for this table if your application performs a lot of concurrent updates and inserts; row-level locking pays off there.
I recommend splitting that "big table" (not that big, actually, but for MySQL it may be) into several tables to make the most of the query cache. Any time you update a record in that table, every cached result for that table is invalidated. You can also try lowering the isolation level, but that is a little more complicated.