MySQL: improve performance in a very big table

I have this table
CREATE TABLE llegada (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`emc_id` int(10) unsigned DEFAULT NULL,
`cuartel_id` int(10) unsigned DEFAULT NULL,
`fecha` datetime DEFAULT NULL,
`nro_entrada` int(10) unsigned DEFAULT NULL,
`valor` varchar(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ind_llegada` (`emc_id`,`cuartel_id`,`fecha`,`nro_entrada`)
) ENGINE=MyISAM AUTO_INCREMENT=18822145 DEFAULT CHARSET=latin1
This table has approximately 100,000,000 records. To improve performance I would like to partition this table into 6 parts by year. But two problems come up first: I'm not sure how to do it, and I don't know whether it would change the queries made against the table. Ideally I would not have to modify the query page that accesses the database.
Thanks in advance.
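For reference, year-based partitioning in MySQL is done with PARTITION BY RANGE. A minimal sketch for this table might look like the following; note that MySQL requires the partitioning column to be part of every unique key, so the primary key would have to be widened to (id, fecha), and the year boundaries shown here are assumptions:
-- Assumption: the primary key must include fecha before partitioning on it
ALTER TABLE llegada DROP PRIMARY KEY, ADD PRIMARY KEY (id, fecha);
-- Six partitions by year; the cutoff years are placeholders
ALTER TABLE llegada
PARTITION BY RANGE (YEAR(fecha)) (
  PARTITION p2009 VALUES LESS THAN (2010),
  PARTITION p2010 VALUES LESS THAN (2011),
  PARTITION p2011 VALUES LESS THAN (2012),
  PARTITION p2012 VALUES LESS THAN (2013),
  PARTITION p2013 VALUES LESS THAN (2014),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
Existing queries keep working unchanged on a partitioned table, and queries that filter on fecha can additionally benefit from partition pruning.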

I have never heard of a way to partition SQL itself without just creating multiple databases. As you said, you'll need to modify the query page or the way information is stored across the various databases on your site, which you should do anyway, because that's going to be a lot of wasted processing time. I'm surprised it hasn't already affected your user experience.

Safely alter a column in MySQL 5.5

I have found myself looking after an old testlink installation; all the people responsible have left, and it is years since I did any serious SQL work.
The underlying database is version 5.5.24-0ubuntu0.12.04.1
I do not have all the passwords, but I have enough rights to do a backup without locks:
mysqldump --all-databases --single-transaction -u testlink -p --result-file=dump2.sql
I really do not want to have to attempt to restore the data!
We need to increase the length of the name field in testlink; various pages point me to increasing the length of a field in the nodes_hierarchy table.
The backup yielded this:
CREATE TABLE `nodes_hierarchy` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) DEFAULT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
`node_type_id` int(10) unsigned NOT NULL DEFAULT '1',
`node_order` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `pid_m_nodeorder` (`parent_id`,`node_order`)
) ENGINE=MyISAM AUTO_INCREMENT=184284 DEFAULT CHARSET=utf8;
I really only have one chance to get this right and cannot lose any data. Does this look exactly right?
ALTER TABLE nodes_hierarchy MODIFY name VARCHAR(150) DEFAULT NULL;
That is the correct syntax.
Backup
You should back up the database regardless of how safe this operation is. It seems like you are already planning on it. It is unlikely you will have problems; the backup is just an insurance policy against unlikely occurrences.
Test table
You seem to have ~200K records. I'd recommend you make a copy of this table by just doing:
CREATE TABLE `test_nodes_hierarchy` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) DEFAULT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
`node_type_id` int(10) unsigned NOT NULL DEFAULT '1',
`node_order` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `test_pid_m_nodeorder` (`parent_id`,`node_order`)
) ENGINE=MyISAM AUTO_INCREMENT=184284 DEFAULT CHARSET=utf8;
Populate test table
Populate the test table with:
insert into test_nodes_hierarchy
select *
from nodes_hierarchy;
Run the ALTER statement on the test table
Find out how long the ALTER statement will take on the test table.
ALTER TABLE test_nodes_hierarchy
MODIFY name VARCHAR(150) DEFAULT NULL;
Rename test table
Practice renaming the test table using:
RENAME TABLE test_nodes_hierarchy TO test2_nodes_hierarchy;
Once you know the time it takes, you know what to expect on the main table. If something goes awry, you can drop the nodes_hierarchy table and simply rename the test_nodes_hierarchy table to take its place (see the sketch below).
That'll just build confidence around the operation.
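That fallback might look like this (a sketch only, assuming test_nodes_hierarchy still holds the full copy loaded above and has already had its name column widened):
DROP TABLE nodes_hierarchy;
RENAME TABLE test_nodes_hierarchy TO nodes_hierarchy;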

Alter table to apply partitioning by key in mysql

I have a table with millions of rows, and the rate of growth will probably increase in the future; so far about 4.3 million rows are added each month, causing the database to slow down. I have already applied indexing, but it's not really improving the speed. Is applying partitioning to such data favorable?
Also, how can I apply partitioning to a table with millions of rows? I know it will look something like this:
ALTER TABLE gpsloggs
PARTITION BY KEY(DeviceCode)
PARTITIONS 10;
The problem is that I was partitioning on DeviceCode, which is not part of the primary key, so the partitioning wasn't permitted.
DROP TABLE IF EXISTS `gpslogss`;
CREATE TABLE `gpslogss` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`DeviceCode` varchar(255) DEFAULT NULL,
`Latitude` varchar(255) DEFAULT NULL,
`Longitude` varchar(255) DEFAULT NULL,
`Speed` double DEFAULT NULL,
`rowStamp` datetime DEFAULT NULL,
`Date` varchar(255) DEFAULT NULL,
`Time` varchar(255) DEFAULT NULL,
`AlarmCode` int(11) DEFAULT NULL,
PRIMARY KEY `Id` (`Id`) USING BTREE,
KEY `DeviceCode` (`DeviceCode`) USING BTREE
);
So I changed the definition and created the table in a new database with 0 records this way, and it worked fine:
DROP TABLE IF EXISTS `gpslogss`;
CREATE TABLE `gpslogss` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`DeviceCode` varchar(255) DEFAULT NULL,
`Latitude` varchar(255) DEFAULT NULL,
`Longitude` varchar(255) DEFAULT NULL,
`Speed` double DEFAULT NULL,
`rowStamp` datetime DEFAULT NULL,
`Date` varchar(255) DEFAULT NULL,
`Time` varchar(255) DEFAULT NULL,
`AlarmCode` int(11) DEFAULT NULL,
KEY `Id` (`Id`) USING BTREE,
KEY `DeviceCode` (`DeviceCode`) USING BTREE
)
PARTITION BY KEY(DeviceCode)
PARTITIONS 10;
How should I write the statements so that I can apply partitioning to a table that already has millions of rows? How should I drop keys and alter the table to apply partitioning without damaging the data?
Short answer: Don't.
Long answer: PARTITION BY KEY does not provide any performance benefit (that I know of). And why else use PARTITION?
Other notes:
You should use InnoDB for virtually all tables.
InnoDB tables should have an explicit PRIMARY KEY.
There is a DATETIME datatype; don't use VARCHAR for date or time, and don't split them.
latitude and longitude are numeric; don't use VARCHAR. FLOAT is a likely candidate (precise enough to differentiate vehicles, but not people).
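Putting those notes together, a revised definition might look roughly like this; a sketch only, where the DeviceCode length is an assumption and the split Date/Time columns are folded into the existing rowStamp DATETIME:
CREATE TABLE `gpslogss` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `DeviceCode` varchar(32) DEFAULT NULL,  -- assumed length; size it to the real codes
  `Latitude` float DEFAULT NULL,          -- numeric, not VARCHAR
  `Longitude` float DEFAULT NULL,
  `Speed` double DEFAULT NULL,
  `rowStamp` datetime DEFAULT NULL,       -- single DATETIME instead of split Date/Time strings
  `AlarmCode` int(11) DEFAULT NULL,
  PRIMARY KEY (`Id`),
  KEY `DeviceCode` (`DeviceCode`)
) ENGINE=InnoDB;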
Your real question is about speed. Let's see the slow SELECTs and work backward from them. Adding PARTITIONing is rarely a solution to performance.

Optimizing aggregation on MySQL Table with 850 million rows

I have a query that I'm using to summarize via aggregations.
The table is called 'connections' and has about 843 million rows.
CREATE TABLE `connections` (
`app_id` varchar(16) DEFAULT NULL,
`user_id` bigint(20) DEFAULT NULL,
`time_started_dt` datetime DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`firmware` varchar(255) DEFAULT NULL,
KEY `app_id` (`app_id`),
KEY `time_started_dt` (`time_started_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
When I try to run a query, such as the one below, it takes over 10 hours and I end up killing it. Does anyone see any mistakes that I'm making, or have any suggestions as to how I could optimize the query?
SELECT
app_id,
MAX(time_started_dt),
MIN(time_started_dt),
COUNT(*)
FROM
connections
GROUP BY
app_id
I suggest you create a composite index on (app_id, time_started_dt):
ALTER TABLE connections ADD INDEX(app_id, time_started_dt)
To get that query to perform, you really need a suitable covering index, with app_id as the leading column, e.g.
CREATE INDEX `connections_IX1` ON `connections` (`app_id`,`time_started_dt`);
NOTE: creating the index may take hours, and the operation will prevent insert/update/delete to the table while it is running.
An EXPLAIN will show the proposed execution plan for your query. With the covering index in place, you'll see "Using index" in the plan. (A "covering index" is an index that can be used by MySQL to satisfy a query without having to access the underlying table. That is, the query can be satisfied entirely from the index.)
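For example, once the composite index is in place you can check the plan with (the query is the one from the question):
EXPLAIN
SELECT app_id, MAX(time_started_dt), MIN(time_started_dt), COUNT(*)
FROM connections
GROUP BY app_id;
and confirm that "Using index" appears in the Extra column.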
With the large number of rows in this table, you may also want to consider partitioning.
I have tried your query on randomly generated data (around 1 million rows). Adding a PRIMARY KEY will improve performance of your query by about 10%.
As already suggested by other people, a composite index should be added to the table. An index on time_started_dt alone is useless for this query.
CREATE TABLE `connections` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`app_id` varchar(16) DEFAULT NULL,
`user_id` bigint(20) DEFAULT NULL,
`time_started_dt` datetime DEFAULT NULL,
`device` varchar(255) DEFAULT NULL,
`os` varchar(255) DEFAULT NULL,
`firmware` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `composite_idx` (`app_id`,`time_started_dt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

MySQL partition indexing

I want to create a table from batch data for data mining purposes. I will have about 25 million rows of data a day going into this table. There are several indices defined on the table, so the insertion speed (I do batch insertions) is quite slow: with no indices I can insert around 40K rows, while with indices it is more like 3-4K, which makes this whole thing infeasible.

So the idea is to partition the data by day, disable the keys, do the day's insertions, and then re-enable the indices. Re-enabling the indices on a day's worth of data takes, say, 20 minutes, which is fine.

This brings me to my question: when you re-enable the indices, will MySQL have to recalculate the indices on all partitions, or just for that day? It seems clear that for the index the partitions are on (date in this case), it should be for that day only. But what about the other indices? If it needs to recalculate the indices for all partitions, there is no way it can be done in a reasonable amount of time. Does anyone know?
The SHOW CREATE TABLE output is:
CREATE TABLE `sts` (
`userid` int(10) unsigned DEFAULT NULL,
`urlid` int(10) unsigned DEFAULT NULL,
`geoid` mediumint(8) unsigned DEFAULT NULL,
`cid` mediumint(8) unsigned DEFAULT NULL,
`m` smallint(5) unsigned DEFAULT NULL,
`t` smallint(5) unsigned DEFAULT NULL,
`d` tinyint(3) unsigned DEFAULT NULL,
`requested` int(10) unsigned DEFAULT NULL,
`rate` tinyint(4) DEFAULT NULL,
`mode` varchar(12) DEFAULT NULL,
`session` smallint(5) unsigned DEFAULT NULL,
`sins` smallint(5) unsigned DEFAULT NULL,
`tos` mediumint(8) unsigned DEFAULT NULL,
PRIMARY KEY (userid, urlid, requested),
KEY `id_index` (`m`),
KEY `id_index2` (`t`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
It is not currently partitioned.
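For reference, the disable/re-enable step described in the question uses the MyISAM-specific syntax below; it only affects nonunique indexes, so the primary key is still maintained during the load:
ALTER TABLE sts DISABLE KEYS;  -- stop maintaining nonunique indexes
-- ... run the day's batch insertions here ...
ALTER TABLE sts ENABLE KEYS;   -- rebuild the nonunique indexes in one pass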
You disable/enable indexes on a table as a whole, which means the indexes will be disabled/enabled for all partitions of the table.
Consider this scenario for loading new data:
Create a staging table that defines all the partitions you will need.
Load the data into the staging table without indexes.
Create the indexes on this table.
Move the partition to the target table, which is partitioned the same way as the staging table.
Drop the indexes on the staging table.
To partition your existing data in a controllable manner, you can use the same logic to move the data into a new partitioned table; a rough sketch of the flow follows.
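A minimal sketch of that flow, assuming MySQL 5.6+ (which adds ALTER TABLE ... EXCHANGE PARTITION) and a target table partitioned by day on requested; the staging-table name, file name, and partition name are illustrative only:
-- Staging table: same columns, indexes and engine as the target, but not partitioned
CREATE TABLE sts_staging LIKE sts;
ALTER TABLE sts_staging REMOVE PARTITIONING;
-- Bulk-load one day's data with index maintenance deferred (MyISAM)
ALTER TABLE sts_staging DISABLE KEYS;
LOAD DATA INFILE '/tmp/day.csv' INTO TABLE sts_staging;  -- hypothetical input file
ALTER TABLE sts_staging ENABLE KEYS;
-- Swap the loaded rows into that day's (empty) partition of the target
ALTER TABLE sts EXCHANGE PARTITION p20140101 WITH TABLE sts_staging;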

Is there a better index to speed up this query?

The following query is using temporary and filesort. I'd like to avoid that if possible.
SELECT lib_name, description, count(seq_id), floor(avg(size))
FROM libraries l JOIN sequence s ON (l.lib_id=s.lib_id)
WHERE s.is_contig=0 and foreign_seqs=0 GROUP BY lib_name;
The EXPLAIN says:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,s,ref,libseq,contigs,contigs,4,const,28447,Using temporary; Using filesort
1,SIMPLE,l,eq_ref,PRIMARY,PRIMARY,4,s.lib_id,1,Using where
The tables look like this:
libraries
CREATE TABLE `libraries` (
`lib_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`lib_name` varchar(30) NOT NULL,
`method_id` int(10) unsigned DEFAULT NULL,
`lib_efficiency` decimal(4,2) unsigned DEFAULT NULL,
`insert_avg` decimal(5,2) DEFAULT NULL,
`insert_high` decimal(5,2) DEFAULT NULL,
`insert_low` decimal(5,2) DEFAULT NULL,
`amtvector` decimal(4,2) unsigned DEFAULT NULL,
`description` text,
`foreign_seqs` tinyint(1) NOT NULL DEFAULT '0' COMMENT '1 means the sequences in this library are not ours',
PRIMARY KEY (`lib_id`),
UNIQUE KEY `lib_name` (`lib_name`)
) ENGINE=InnoDB AUTO_INCREMENT=9 DEFAULT CHARSET=latin1;
sequence
CREATE TABLE `sequence` (
`seq_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`seq_name` varchar(40) NOT NULL DEFAULT '',
`lib_id` int(10) unsigned DEFAULT NULL,
`size` int(10) unsigned DEFAULT NULL,
`add_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`sequencing_date` date DEFAULT '0000-00-00',
`comment` text DEFAULT NULL,
`is_contig` int(10) unsigned NOT NULL DEFAULT '0',
`fasta_seq` longtext,
`primer` varchar(15) DEFAULT NULL,
`gc_count` int(10) DEFAULT NULL,
PRIMARY KEY (`seq_id`),
UNIQUE KEY `seq_name` (`seq_name`),
UNIQUE KEY `libseq` (`lib_id`,`seq_id`),
KEY `primer` (`primer`),
KEY `sgitnoc` (`seq_name`,`is_contig`),
KEY `contigs` (`is_contig`,`seq_name`) USING BTREE,
CONSTRAINT `FK_sequence_1` FOREIGN KEY (`lib_id`) REFERENCES `libraries` (`lib_id`)
) ENGINE=InnoDB AUTO_INCREMENT=61508 DEFAULT CHARSET=latin1 ROW_FORMAT=DYNAMIC;
Are there any changes I can do to make the query go faster? If not, when (for a web application) is it worth putting the results of a query like the above into a MEMORY table?
First strategy: make it faster for MySQL to locate the records you want summarized.
You've already got an index on sequence.is_contig. You might try indexing on libraries.foreign_seqs. I don't know if that will help, but it's worth a try.
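A minimal version of that suggestion (the index name is an assumption):
ALTER TABLE libraries ADD INDEX idx_foreign_seqs (foreign_seqs);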
Second strategy: see if you can get your sort to run in memory, rather than in a file. Try making the sort_buffer_size parameter bigger. This will consume RAM on your server, but that's what RAM is for.
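For example (the 4 MB value is only a placeholder; tune it to your data size and available RAM):
SET SESSION sort_buffer_size = 4 * 1024 * 1024;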
Third strategy: if your application needs to run this query a lot but updates the underlying data only a little, take your own suggestion and create a summary table. Perhaps use an EVENT to remake the summary table, running it once every few minutes. If you're going to follow that strategy, start by creating a view for this result and have your app retrieve information from the view. Then get the summary table working, drop the view, and give the summary table the same name as the view. That way your data model work and your application design work can proceed independently of each other.
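A hedged sketch of that approach; the names lib_summary_v and lib_summary, the column aliases, and the 5-minute schedule are all assumptions, and the event scheduler must be enabled (SET GLOBAL event_scheduler = ON):
-- Step 1: a view the application can query right away
CREATE VIEW lib_summary_v AS
SELECT lib_name, description, COUNT(seq_id) AS seq_count, FLOOR(AVG(size)) AS avg_size
FROM libraries l JOIN sequence s ON (l.lib_id = s.lib_id)
WHERE s.is_contig = 0 AND foreign_seqs = 0
GROUP BY lib_name;
-- Step 2: materialize the result and refresh it on a schedule
CREATE TABLE lib_summary AS SELECT * FROM lib_summary_v;
-- DELIMITER is a mysql client command, needed for the multi-statement event body
DELIMITER //
CREATE EVENT refresh_lib_summary
ON SCHEDULE EVERY 5 MINUTE
DO
BEGIN
  DELETE FROM lib_summary;  -- remake the summary on each run
  INSERT INTO lib_summary SELECT * FROM lib_summary_v;
END//
DELIMITER ;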
Final suggestion: if this is truly slowly changing summary data, switch to MyISAM. It's a little faster for this kind of data wrangling.