Partitioning a MySQL table - mysql

I am considering partitioning a MySQL table that has the potential to grow very big. The table as it stands looks like this:
DROP TABLE IF EXISTS `uidlist`;
CREATE TABLE IF NOT EXISTS `uidlist` (
`uid` varchar(9) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`chcs` varchar(16) NOT NULL DEFAULT '',
UNIQUE KEY `uid` (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii;
where
uid is a 9-character id string starting with a lowercase letter
chcs is a checksum that is used internally.
I suspect that the best way to partition this table would be based on the first letter of the uid field. This would give
Partition 1
abcd1234,acbd1234,adbc1234...
Partition 2
bacd1234,bcad1234,bdac1234...
However, I have never done partitioning before, so I have no idea how to go about it. Is the partitioning scheme I have outlined possible? If so, how do I go about implementing it?
I would much appreciate any help with this.

Check out the manual for a start :)
http://dev.mysql.com/tech-resources/articles/partitioning.html
MySQL is pretty feature-rich when it comes to partitioning. Choosing the correct strategy depends on your use case (can partitioning help your sequential scans?) and on the way your data grows, since you don't want any single partition to become too large to handle.
If your data tends to grow somewhat steadily over time, you might want a create-date based partitioning scheme so that (for example) all records generated in a single year end up in the last partition and previous partitions are never written to. For this to happen you may have to introduce another column to regulate it; see http://dev.mysql.com/doc/refman/5.1/en/partitioning-hash.html.
An added optimization benefit of this approach is that you can keep the most recent partition on a disk with fast writes (a solid state drive, for example) and the older partitions on a cheaper disk with decent read speed.
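A minimal sketch of such a create-date scheme, assuming a hypothetical created DATE column is added to the table purely to drive the partitioning (note that on a partitioned InnoDB table every unique key must include the partitioning column):
-- hypothetical variant of the table, partitioned by year of creation
CREATE TABLE `uidlist_by_year` (
`uid` varchar(9) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`chcs` varchar(16) NOT NULL DEFAULT '',
`created` date NOT NULL,
UNIQUE KEY `uid_created` (`uid`, `created`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
PARTITION BY RANGE (YEAR(created))
(
PARTITION p2012 VALUES LESS THAN (2013),
PARTITION p2013 VALUES LESS THAN (2014),
PARTITION pmax  VALUES LESS THAN MAXVALUE
);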
Anyway, knowing more about your use case would help people give you more concrete answers (possibly including sql code)
EDIT, also, check out http://www.tokutek.com/products/tokudb-for-mysql/

The main question you need to ask yourself before partitioning is "why". What is the goal you are trying to achieve by partitioning the table?
Since all the table's data will still exist on a single MySQL server and, I assume, new rows will arrive in "random" order (with respect to the partition they'll be inserted into), you won't gain much by partitioning. Your point-select queries might be slightly faster, but likely not by much.
The main benefit I've seen using MySQL partitioning is for data that needs to be purged according to a set retention policy. Partitioning data by week or month makes it very easy to delete old data quickly.
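As a hedged illustration (table, column, and partition names are hypothetical, not from the question), purging the oldest week then becomes a near-instant metadata operation instead of a large DELETE:
-- hypothetical weekly-partitioned log table
CREATE TABLE `weekly_log` (
`id` int NOT NULL,
`logged_at` datetime NOT NULL
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(logged_at))
(
PARTITION w2014_01 VALUES LESS THAN (TO_DAYS('2014-01-08')),
PARTITION w2014_02 VALUES LESS THAN (TO_DAYS('2014-01-15')),
PARTITION pmax     VALUES LESS THAN MAXVALUE
);
-- dropping the oldest week is much faster than DELETE ... WHERE logged_at < ...
ALTER TABLE `weekly_log` DROP PARTITION w2014_01;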
It sounds more likely to me that you want to be sharding your data (spreading it across many servers), and since your data design as shown is really just key-value then I'd recommend looking at database solutions that include sharding as a feature.

I have upvoted both of the answers here since they both make useful points. #bbozo - a move to TokuDB is planned but there are constraints that stop it from being made right now.
I am going off the idea of partitioning the uidlist table as I had originally wanted to do. However, for the benefit of anyone who finds this thread whilst trying to do something similar, here is the "how to":
DROP TABLE IF EXISTS `uidlist`;
CREATE TABLE IF NOT EXISTS `uidlist` (
`uid` varchar(9) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
`chcs` varchar(16) NOT NULL DEFAULT '',
UNIQUE KEY `uid` (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
PARTITION BY RANGE COLUMNS(uid)
(
-- note: '%' is not a wildcard in RANGE COLUMNS (strings compare literally),
-- so plain letter boundaries are used, with a MAXVALUE catch-all so inserts
-- beyond the last boundary cannot fail
PARTITION p0 VALUES LESS THAN ('f'),
PARTITION p1 VALUES LESS THAN ('k'),
PARTITION p2 VALUES LESS THAN ('p'),
PARTITION p3 VALUES LESS THAN (MAXVALUE)
);
which creates four partitions.
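To confirm that a point lookup is pruned to a single partition, something like this can be used (EXPLAIN PARTITIONS applies to MySQL 5.6 and earlier; later versions show the partitions column with plain EXPLAIN):
EXPLAIN PARTITIONS SELECT * FROM `uidlist` WHERE `uid` = 'bacd1234';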
I suspect that the long term solution here is to use a key-value store as suggested by #tmcallaghan rather than just stuffing everything into a MySQL table. I will probably post back in due course once I have established what would be the right way to accomplish that.

Related

MySQL Partitioning based on ID and Week

I have a table
CREATE TABLE `acme`.`partitioned_table` (
`id` INT NULL,
`client_id` INT NOT NULL,
`create_datetime` INT NOT NULL,
`some_val` VARCHAR(45) NULL);
I'd like to partition this table in such a way that each client's data is stored in its own partition based on the client_id AND each partition can only contain data for 1 week based on the create_datetime. This is done so we can drop one week's worth of data each week according to each client's own retention policy.
For example, some clients would like to have 3 months of data while others may have longer data retention policies.
Being new to MySQL, I am having a hard time coming up with a proper partitioning strategy. How can I partition by week based on the INT column? To throw a curveball, this might be hosted on AWS RDS later.
Many thanks in advance,
M
Your clients x weeks level of partitioning would lead to a lot of partitions, which implies a lot of disk space, and queries will be slower.
Your requirement for "separate storage" would be better handled by either separate tables or separate databases.
If you also need to do queries across all clients, we need to discuss things further.
One of the "guidelines" for partitioning is not to partition a table with less than a million rows.
If a client's table is big enough to justify partitioning, see http://mysql.rjweb.org/doc.php/partitionmaint for more discussion. If not big enough, then either simply do the DELETE, or see this for more options: http://mysql.rjweb.org/doc.php/deletebig .
There are a lot of DATETIME functions that become messy if you store the value as an INT:
`create_datetime` INT
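If weekly partitioning on the existing INT column is still wanted, one hedged sketch is RANGE partitioning with UNIX_TIMESTAMP boundaries (table and column names are taken from the question; the specific week boundaries and partition names are illustrative):
-- sketch: weekly RANGE partitions on the existing INT (unix timestamp) column
ALTER TABLE `partitioned_table`
PARTITION BY RANGE (create_datetime)
(
PARTITION w2014_01 VALUES LESS THAN (UNIX_TIMESTAMP('2014-01-08 00:00:00')),
PARTITION w2014_02 VALUES LESS THAN (UNIX_TIMESTAMP('2014-01-15 00:00:00')),
PARTITION pmax     VALUES LESS THAN MAXVALUE
);
-- weekly maintenance: carve next week out of the catch-all, then drop weeks past retention
ALTER TABLE `partitioned_table` REORGANIZE PARTITION pmax INTO
(
PARTITION w2014_03 VALUES LESS THAN (UNIX_TIMESTAMP('2014-01-22 00:00:00')),
PARTITION pmax     VALUES LESS THAN MAXVALUE
);
ALTER TABLE `partitioned_table` DROP PARTITION w2014_01;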

Implementing a composite index

I've been reading about how a composite index can improve performance but am still a little unclear on a few things. I have an InnoDB database that has over 20 million entries with 8 data points each. Its performance has dropped substantially in the past few months. The server has 6 cores with 4 GB of memory, which will be increased soon, but there's no indication on the server that I'm running low on memory. The InnoDB settings have been changed in my.cnf to:
innodb_buffer_pool_size = 1000M
innodb_log_file_size = 147M
These settings have helped in the past. So, my understanding is that many factors can contribute to the performance decrease, including the fact that originally I had no indexing at all. Indexing methods are predicated on the type of queries that are run. So, this is my table:
cdr_records | CREATE TABLE `cdr_records` (
`dateTimeOrigination` int(11) DEFAULT NULL,
`callingPartyNumber` varchar(50) DEFAULT NULL,
`originalCalledPartyNumber` varchar(50) DEFAULT NULL,
`finalCalledPartyNumber` varchar(50) DEFAULT NULL,
`pkid` varchar(50) NOT NULL DEFAULT '',
`duration` int(11) DEFAULT NULL,
`destDeviceName` varchar(50) DEFAULT NULL,
PRIMARY KEY (`pkid`),
KEY `dateTimeOrigination` (`dateTimeOrigination`),
KEY `callingPartyNumber` (`callingPartyNumber`),
KEY `originalCalledPartyNumber` (`originalCalledPartyNumber`),
KEY `finalCalledPartyNumber` (`finalCalledPartyNumber`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
So, typically, a query will take a value and search callingPartyNumber, originalCalledPartyNumber, and finalCalledPartyNumber to find any entries related to it in the table. So, it wouldn't make any sense to use individual indexes like I have illustrated above because I typically don't run queries like this. However, I have another job in the evenings that is basically;
select * from cdr_records;
In this case, it sounds like it would be a good idea to have another composite index with all columns in it. Am I understanding this correctly?
The benefit of the composite index comes when you need to select/sort/group based on multiple columns, in the same fashion.
I remember there was a very good example with a phone book analogy I read somewhere. As the names in a phone book are ordered alphabetically it is very easy for you to sort through them and find the one you need based on the letters of the name from left to right. You can imagine that is a composite index of the letters in the names.
If the names were ordered only by the first letter and subsequent letters were chaotic (single column index) you would have to go through all records after you find the first letter, which will take a lot of time.
With a composite index, you can start from left to right and very easily find the record you are looking for. This is also the reason why you can't use, for example, only the second or third column of the composite index: you need the previous ones in order for it to work. Imagine trying to find all names whose third letter is "a" in the phone book; it would be a nightmare. You would need a separate index just for that, which is exactly what you have to do if you need to use a column from a composite index without using the columns before it.
Bear in mind that the phone book example assumes that each letter of the names is a separate column, that could be a little confusing.
The other great strength of composite indexes is unique composite indexes, which allow you to apply stricter logical restrictions on your data, which is very handy when you need it. This has nothing to do with performance, but I thought it was worth mentioning.
In your question your SQL has no criteria, so no index will be used. It is always recommended to use EXPLAIN to see what is going on; you can never be sure!
No, it's not a good idea to set a composite index over all fields.
Which fields you put into one or more indexes depends on your queries.
Note:
MySQL generally uses only one index per table in a query, and it can use a composite index only if the leftmost fields of that index are used.
You do not have to use all of the fields.
Example:
If you have an index x on the fields (name, street, number), this index will be used when you query (in WHERE) on
name, or
name and street, or
name, street and number,
but not if you search only on
street, or
street and number.
To find out whether your index works well with your query, put EXPLAIN before your query and you will see which indexes your query uses.
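A minimal sketch of that leftmost-prefix rule, using a hypothetical addresses table (the names are illustrative, not from the question):
CREATE TABLE `addresses` (
`name`   varchar(50) NOT NULL,
`street` varchar(50) NOT NULL,
`number` int NOT NULL,
KEY `idx_name_street_number` (`name`, `street`, `number`)
) ENGINE=InnoDB;
-- can use the composite index: the leading column is present
EXPLAIN SELECT * FROM `addresses` WHERE `name` = 'Smith' AND `street` = 'Main St';
-- cannot use it efficiently: the leading column is missing
EXPLAIN SELECT * FROM `addresses` WHERE `street` = 'Main St' AND `number` = 12;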

Keeping video viewing statistics breakdown by video time in a database

I need to keep a number of statistics about the videos being watched, and one of them is what parts of the video are being watched most. The design I came up with is to split the video into 256 intervals and keep the floating-point number of views for each of them. I receive the data as a number of intervals the user watched continuously. The problem is how to store them. There are two solutions I see.
Row per every video segment
Let's have a database table like this:
CREATE TABLE `video_heatmap` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`video_id` int(11) NOT NULL,
`position` tinyint(3) unsigned NOT NULL,
`views` float NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `idx_lookup` (`video_id`,`position`)
) ENGINE=MyISAM
Then, whenever we have to process a number of views, make sure there are the respective database rows and add appropriate values to the views column. I found out it's a lot faster if the existence of rows is taken care of first (SELECT COUNT(*) of rows for a given video and INSERT IGNORE if they are lacking), and then a number of update queries is used like this:
UPDATE video_heatmap
SET views = views + ?
WHERE video_id = ? AND position >= ? AND position < ?
This seems, however, a little bloated. The other solution I came up with is
Row per video, update in transactions
A table will look (sort of) like this:
CREATE TABLE video (
id INT NOT NULL AUTO_INCREMENT,
heatmap BINARY(1024) NOT NULL, -- 4 bytes x 256 segments
...
) ENGINE=InnoDB
Then, upon every time a view needs to be stored, it will be done in a transaction with consistent snapshot, in a sequence like this:
If the video doesn't exist in the database, it is created.
The row is retrieved, and heatmap, an array of floats stored in binary form, is converted into a form friendlier for processing (in PHP).
Values in the array are increased appropriately and the array is converted back.
Row is changed via UPDATE query.
So far the advantages can be summed up like this:
First approach
Stores data as floats, not as some magical binary array.
Doesn't require transaction support, so doesn't require InnoDB, and we're using MyISAM for everything at the moment, so there won't be any need to mix storage engines. (only applies in my specific situation)
Doesn't require a transaction WITH CONSISTENT SNAPSHOT. I don't know what the performance penalties of those are.
I already implemented it and it works. (only applies in my specific situation)
Second approach
Uses a lot less storage space (the first approach stores the video ID 256 times and stores the position for every segment of the video, not to mention the primary key).
Should scale better, because of InnoDB's per-row locking as opposed to MyISAM's table locking.
Might generally work faster because there are a lot less requests being made.
Easier to implement in code (although the other one is already implemented).
So, what should I do? If it wasn't for the rest of our system using MyISAM consistently, I'd go with the second approach, but currently I'm leaning to the first one. But maybe there are some reasons to favour one approach or another?
Second approach looks tempting at first sight, but it makes queries like "how many views for segment x of video y" unable to use an index on video.heatmap. Not sure if this is a real-life concern for you though. Also, you would have to parse back and forth the entire array every time you need data for one segment only.
But first and foremost, your second solution is hackish (but interesting nonetheless). I wouldn't recommend denormalising your database until you face an actual performance issue.
Also, try populating the video_heatmap table in advance with views = 0 as soon as a video is inserted (a trigger can help).
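A hedged sketch of such a trigger, using the table and column names from the question (the loop assumes the 256 segments described above):
DELIMITER //
CREATE TRIGGER `video_after_insert`
AFTER INSERT ON `video`
FOR EACH ROW
BEGIN
  DECLARE pos INT DEFAULT 0;
  -- pre-create one heatmap row per segment with zero views
  WHILE pos < 256 DO
    INSERT INTO `video_heatmap` (`video_id`, `position`, `views`)
    VALUES (NEW.id, pos, 0);
    SET pos = pos + 1;
  END WHILE;
END//
DELIMITER ;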
If space is really a concern, remove your surrogate key video_heatmap.id and instead make (video_id, position) the primary key (then get rid of the superfluous UNIQUE constraint). But this shouldn't come into the equation: 256 x 12 bytes per video (rough row length with 3 numeric columns; okay, add some for the index) is only an extra 3 KB per video!
Finally, nothing prevents you from switching your current table to InnoDB and leverage its row-level locking capability.
Please note I fail to understand why views cannot be an UNSIGNED INT. I would recommend changing this type.

mysql innodb inner join with longtext very slow

I migrated all MySQL tables of one project from MyISAM to InnoDB last week, in order to support transaction. I used the command of alter table for this.
Most things work fine; however, one particular query runs very, very slowly, and it always gives the error Incorrect key file for table '/tmp/#sql_xxxx_x.MYI'.
Later I narrowed down the problem into the inner join of 2 tables, the user table and agreement table. And the inner join took place between the foreign key field of user (i.e. agreement_id) and primary key field of agreement (i.e. id).
The user table has only 50,000 rows of data, and the agreement table has, well, one single row. And we have set up the index for the agreement_id of user.
In any case, this seems to be a very lightweight query, but it turns out to be the whole bottleneck.
Here is the full schema of agreement:
CREATE TABLE IF NOT EXISTS `agreement` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`remark` varchar(200) NOT NULL,
`content` longtext NOT NULL,
`is_active` tinyint(1) NOT NULL,
`date_stamp` datetime NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8 AUTO_INCREMENT=2 ;
One thing I am suspicious about is the longtext field in the agreement table, but we did NOT use that field for the inner join; in fact, the query is slow even if we do NOT select it in the query result.
Finally, we converted the agreement table from InnoDB back to MyISAM, and then everything became normal. The query finishes in less than 1 second.
Now, my question is: what is actually going on here? Does it mean that once an InnoDB table contains a text field, the table cannot be used in an inner join?
I wish I could know the real reason so that I could avoid the same problems in the future.
thanks a lot.
This is a famous and tricky one. The most likely cause is that you're out of space in /tmp.
Here is a link I keep in my bookmarks that may help you: http://www.mysqlperformancetuning.com/a-fix-for-incorrect-key-file-for-table-mysql
In my experience, limited though it is, the primary reason for seeing this error message is because your tmpdir has run out of space. Like me you'll check how much free space you have: 1Gb, 2Gb, 4Gb. It may not be enough. And here's why: MySQL can create temporary tables bigger than that in a matter of seconds, quickly filling up any free space. Depending on the nature of the query and the size of the database naturally.
You may also try a REPAIR on your table, but to me it is as useful as breakdancing :/
InnoDB has its own settings for buffer sizes, etc. Check those out and, if you can adjust them, go ahead. Just as a test, try doubling them; if it helps, you may want to tune them properly. It can make a big difference.
Some links that may help:
http://www.mysqlperformanceblog.com/2007/11/03/choosing-innodb_buffer_pool_size/
http://dev.mysql.com/doc/refman/5.5/en/innodb-buffer-pool.html
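A quick way to check the relevant settings and whether temporary tables are already spilling to disk (standard MySQL variables and status counters; what counts as "too high" depends on your workload):
SHOW VARIABLES LIKE 'tmpdir';
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL STATUS LIKE 'Created_tmp_disk_tables';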
Maybe the problem here is the remark field defined as varchar(200)? Remember that temporary and memory tables store varchar with a fixed length. So 50k rows with varchar(200) can consume a lot of memory even if the values are all empty.
If this is the problem then you can try one of several things:
If only a few rows have a value in the column, then make the varchar(200) column NULLable and always use NULL instead of an empty string
Change varchar(200) to TEXT (there is of course a drawback: it will always use an on-disk temporary table)
Maybe you don't need 200 characters? Try a smaller VARCHAR size
Try adjusting tmp_table_size and max_heap_table_size so you can handle larger temporary tables in memory (a sketch follows this list): http://dev.mysql.com/doc/refman/5.1/en/internal-temporary-tables.html
Use Percona Server, as it supports a dynamic row format for memory tables: http://www.mysqlperformanceblog.com/2011/09/06/dynamic-row-format-for-memory-tables/
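A minimal sketch of the tmp_table_size / max_heap_table_size adjustment mentioned above (the 256M figure is an illustrative assumption; the in-memory limit is the lower of the two, so raise both together and persist them in my.cnf as well):
SET GLOBAL tmp_table_size      = 256 * 1024 * 1024;  -- applies to new connections only
SET GLOBAL max_heap_table_size = 256 * 1024 * 1024;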
What is the purpose of your query? I see from your comment that you only list the user information and nothing from agreement, leading me to believe you are looking for users that have an agreement?
Since you are converting between engines, it leads me to think you are doing cleanup before adding constraints. If so, consider a left join from the user table instead, like:
select user.* from user left join agreement on user.agreement_id = agreement.id where user.agreement_id != 0;
If it's not cleanup, but simply looking for users with an agreement, we can make it simpler:
select user.* from user where user.agreement_id != 0;
If the purpose is something else, consider adding an index on user.agreement_id since an inner join may need it for speed. Let us know the real purpose and you may get better help.
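If that index turns out to be missing after all, a hedged sketch of adding it and re-checking the join plan (the index name is illustrative):
ALTER TABLE `user` ADD INDEX `idx_agreement_id` (`agreement_id`);
EXPLAIN SELECT `user`.*
FROM `user`
INNER JOIN `agreement` ON `user`.`agreement_id` = `agreement`.`id`;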

Add index on column or not?

I have a table looking like this:
CREATE TABLE `item` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(255),
`my_number` int(10) unsigned default NULL,
PRIMARY KEY (`id`) -- an auto_increment column must be part of a key
);
There are hundreds of thousands of items, and I usually order them by 'my_number'.
Will adding an index on 'my_number' increase performance on queries when I order by this field?
I use MySQL, the table is InnoDB.
I know the answer but it doesn't matter. What you should do in all cases of potential database optimisation is to add the index and test it. And, by that, I mean test all of your affected queries, inserts and updates to ensure you have a performance boost that outdoes any loss of performance elsewhere.
You shouldn't trust my knowledge or anyone else's if the alternative is concrete data that you can gather yourself. Measure, don't guess!
The answer is, by the way: "it should, unless your DBMS is truly brain-dead". Generally, adding indexes will increase the speed of selects but slow down inserts and updates. But I mean "generally" - it's not always the case. A good DBA continuously examines the performance of the DBMS and tunes it where necessary. Databases are not set-and-forget objects, they need to be cared for and nurtured :-)
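A minimal sketch of that add-and-measure approach (the index name is illustrative):
ALTER TABLE `item` ADD INDEX `idx_my_number` (`my_number`);
-- check whether the sort is now served by the index (i.e. no "Using filesort" in Extra)
EXPLAIN SELECT `id`, `title`, `my_number` FROM `item` ORDER BY `my_number` LIMIT 100;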