How can I use MySQL table partitioning on this table?

I have a table that essentially looks like this:
CREATE TABLE myTable (
id INT AUTO_INCREMENT,
field1 TINYINT,
field2 CHAR(2),
field3 INT,
theDate DATE,
otherStuff VARCHAR(20),
PRIMARY KEY (id),
UNIQUE KEY (field1, field2, field3)
)
I'd like to partition the table based on the month and year of theDate, however the manual is telling me I'm not allowed:
All columns used in the partitioning expression for a partitioned table must be part of every unique key that the table may have. In other words, every unique key on the table must use every column in the table's partitioning expression
What are my options here? Can I still partition the table?

I wrote a blog post about this issue: Scaling Rails with MySQL table partitioning (Rails uses integer PKs). The same technique should work in your case, except that, unfortunately, you have to drop the (field1, field2, field3) unique key.
Dropping a unique key was a problem I dealt with too (though not mentioned in the post). I worked around it by implementing an existence check before creating the record, accepting that we'd still get occasional dupes due to the race condition. In practice, that didn't turn out to be a problem. However, the increased query throughput required scaling up the InnoDB buffer pool size, because the I/O was brutal.

No, not in the current form.
You could remove the primary key and instead just make "id" a normal index. The secondary unique index would still be a problem, and you would need to remove it or change the partitioning scheme to involve the theDate column (see the sketch below).
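A minimal sketch of the latter option, assuming monthly RANGE partitions and that losing the strict uniqueness of (field1, field2, field3) is acceptable (partition names and boundary dates are made up):
CREATE TABLE myTable (
id INT AUTO_INCREMENT,
field1 TINYINT,
field2 CHAR(2),
field3 INT,
theDate DATE,
otherStuff VARCHAR(20),
PRIMARY KEY (id, theDate),      -- theDate added so it can appear in the partitioning expression
KEY (field1, field2, field3)    -- demoted from UNIQUE
)
PARTITION BY RANGE (TO_DAYS(theDate)) (
PARTITION p201201 VALUES LESS THAN (TO_DAYS('2012-02-01')),
PARTITION p201202 VALUES LESS THAN (TO_DAYS('2012-03-01')),
PARTITION pmax VALUES LESS THAN MAXVALUE
);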

If you are partitioning based on the year and month of a date column, have you thought about having a table for each year-month combination and then using a MERGE table? I know this doesn't directly answer your question, but I thought it might help...
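A rough sketch of that approach; note that MERGE tables only work over identical MyISAM tables, so you'd give up InnoDB features such as transactions (the names below are illustrative):
CREATE TABLE myTable_2012_01 (
id INT NOT NULL,
theDate DATE,
otherStuff VARCHAR(20),
KEY (id)
) ENGINE=MyISAM;

CREATE TABLE myTable_2012_02 LIKE myTable_2012_01;

CREATE TABLE myTable_all (
id INT NOT NULL,
theDate DATE,
otherStuff VARCHAR(20),
KEY (id)
) ENGINE=MERGE UNION=(myTable_2012_01, myTable_2012_02) INSERT_METHOD=LAST;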

Related

Partition on composite key in MySQL

In MySQL, can I have a composite primary key composed of an auto-increment and another field? Also, please critique my "MySQL partitioning" logic
To explain further:
I have a question about MySQL partitioning.
I have to partition a table in MySQL. It has one primary key, id.
I have to partition by a date field (non-primary, duplicate entries).
Since we cannot partition on duplicate entries, I have created a composite key (id, date).
How can I create partitions on this composite key?
Thanks in advance...
(This answer assumes InnoDB, not MyISAM. There are differences in the implementation of indexes that make some of my comments incorrect for MyISAM.)
In MySQL, a table's PRIMARY KEY can be composed of multiple fields, including an AUTO_INCREMENT.
The only requirement in MySQL for AUTO_INCREMENT is that it be the first column in some index. Let's look at this example of Posts, where there can be many posts for each user:
PRIMARY KEY(user_id, post_id),
INDEX(post_id)
where post_id is AUTO_INCREMENT, but you could benefit from "clustering" the data by user_id. This clustering would make it more efficient to do queries like
SELECT ... FROM Posts
WHERE user_id = 1234;
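A sketch of that Posts table (the body column is invented for illustration):
CREATE TABLE Posts (
user_id INT NOT NULL,
post_id INT NOT NULL AUTO_INCREMENT,
body TEXT,
PRIMARY KEY (user_id, post_id),  -- clusters each user's posts together
INDEX (post_id)                  -- satisfies the AUTO_INCREMENT indexing rule
) ENGINE=InnoDB;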
Back to your question...
The "partition key" does not have to be unique; so, I don't understant your "cannot partition on duplicate entries".
INDEX(id, date), if you also have PRIMARY KEY(id), is essentially useless. When looking up by id, the PRIMARY KEY(id) gives you perfect access; adding date to an index won't help. When looking up by date, but not id, (id, date) is useless since only the "left" part of a composite index can be used.
Perhaps you are leading to a non-partitioned table with
PRIMARY KEY(date, id),
INDEX(id)
to make date ranges efficient? (Note: partitioning won't help.)
Perhaps you will be doing
SELECT ... WHERE x = 123 AND date BETWEEN ...
In that case this is beneficial:
INDEX(x, date)
Only if you do this can we begin to discuss the utility of partitioning:
WHERE x BETWEEN ...
AND date BETWEEN ...
This needs a "two-dimensional" index, which sort of exists with SPATIAL.
See my discussion of partitioning, where I list only 4 use cases for partitioning. It also links to a discussion of how to use partitioning for 2D.
Bottom line: you should not discuss partitioning without a clear picture of which queries it might help. Provide them; then we can discuss further.

What if `auto_increment` gaps caused by MySQL `INSERT...ON DUPLICATE KEY UPDATE` cannot be ignored?

While performing INSERT...ON DUPLICATE KEY UPDATE on InnoDB in MySQL, we are often told to ignore the potential gaps in auto_increment columns. What if such gaps are very likely and cannot be ignored?
As an example, suppose there is one table rating that stores the users' ratings of items. The table scheme is something like
CREATE TABLE rating (
id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT NOT NULL,
item_id INT NOT NULL,
rating INT NOT NULL,
UNIQUE KEY tuple (user_id, item_id),
FOREIGN KEY (user_id) REFERENCES user(id),
FOREIGN KEY (item_id) REFERENCES item(id)
);
It is possible that there are many users and many items, and users may frequently change the ratings of items they have already rated. Every time a rating is changed, a gap is created if we use INSERT...ON DUPLICATE KEY UPDATE. Otherwise, we would have to query twice (do a SELECT first), which hurts performance, or check affected rows, which doesn't accommodate inserting multiple records at once.
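To illustrate the gap with made-up values (this is the behavior under InnoDB's default auto-increment settings):
INSERT INTO rating (user_id, item_id, rating) VALUES (1, 1, 5)
ON DUPLICATE KEY UPDATE rating = 5;  -- new row, gets id = 1
INSERT INTO rating (user_id, item_id, rating) VALUES (1, 1, 4)
ON DUPLICATE KEY UPDATE rating = 4;  -- updates the row, but id = 2 is burned
INSERT INTO rating (user_id, item_id, rating) VALUES (1, 2, 3)
ON DUPLICATE KEY UPDATE rating = 3;  -- new row, gets id = 3, leaving a gap at 2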
For some system where 100K users each has rated 10 items and changes half of the ratings every day, the auto_increment id will be exhausted within two years. Then what should we do to prevent it in practice?
Full answer:
Gaps are OK! Just use a bigger id field, for example BIGINT. Don't try to reuse gaps; that's a bad idea. Don't think about performance or optimization in this case; it's a waste of time.
Another solution is to make the composite key the primary key. In your case, you can remove the id field and use the pair (user_id, item_id) as the primary key.
In the case of "rating", the most frequent operations are "delete by user_id" and inserting, so you don't really need the id primary key for functionality. But a table always needs some primary key.
The only drawback of this method is that when you want to delete just one row from the table, you will need to use a query like:
DELETE FROM rating WHERE user_id = 123 AND item_id=1234
instead of old
DELETE FROM rating WHERE id = 123
But in this case it isn't hard to change one line of code in your application. Furthermore, in most cases applications don't need such functionality.
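A minimal sketch of that composite-key variant:
CREATE TABLE rating (
user_id INT NOT NULL,
item_id INT NOT NULL,
rating INT NOT NULL,
PRIMARY KEY (user_id, item_id),  -- IODKU now conflicts on the PK; no auto_increment to exhaust
FOREIGN KEY (user_id) REFERENCES user(id),
FOREIGN KEY (item_id) REFERENCES item(id)
);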
We work with large tables and have hundreds of millions of records in some of them. We repeatedly use INSERT IGNORE or INSERT .. ON DUPLICATE KEY UPDATE. Making the column an unsigned BIGINT will avoid the id exhaustion issue.
But I would suggest you think about a long-term solution as well, given some known facts:
SELECT and INSERT/UPDATE is quite often faster than INSERT .. ON DUPLICATE KEY UPDATE, depending on your data size and other factors (there is a sketch of this pattern after this list)
If you have two unique keys (or one primary key and one unique key), your query might not always be predictable. It can give replication errors if you use statement-based replication
The id is not the only issue with large tables. If you have a table with more than some 300M records, performance degrades drastically. You need to think about partitioning/clustering/sharding your database/tables pretty soon
Personally, I would suggest not using INSERT .. ON DUPLICATE KEY UPDATE. Read extensively on its usage and performance impact if you are planning a highly scalable service
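A sketch of the SELECT-then-INSERT/UPDATE pattern from the first point (note the race window between reading and writing; wrap it in a transaction or accept the occasional duplicate-key error):
SELECT rating FROM rating WHERE user_id = 1 AND item_id = 2;
-- if a row came back:
UPDATE rating SET rating = 4 WHERE user_id = 1 AND item_id = 2;
-- otherwise:
INSERT INTO rating (user_id, item_id, rating) VALUES (1, 2, 4);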

On Duplicate Update does not work for unique index

My SQL Table I am trying to insert/update has this definition:
CREATE TABLE `place`.`a_table` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`some_id` bigint(20) NOT NULL,
`someOther_id` bigint(20) NOT NULL,
`some_value` text,
`re_id` bigint(20) NOT NULL DEFAULT '0',
`up_id` bigint(20) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `some_id_key` (`some_id`),
KEY `some_id_index1` (`some_id`,`someOther_id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4;
As you can see, some_id and someOther_id share an index.
I am trying to perform my insert/update statement like the following:
INSERT INTO `a_table` (`re_id`,`some_id`,`someOther_id`,`up_id`,`some_value`) VALUES
(100,181,7,101,'stuff in the memo wow') On DUPLICATE KEY UPDATE
`up_id`=101,`some_value`='sampleValues'
I expected that, since I did not specify the id, the statement would fall back on the index key (some_id_index1) for its insert/update rule. However, it is only ever inserting.
Obviously this is incorrect. What am I doing wrong here?
OK, firstly, to help you ask better questions: the literal answer to your question "What am I doing wrong here?" is nothing.
You have a table with an auto-increment primary key and two non-unique secondary indexes. You are inserting rows to that table without specifying the value of the primary key, so MySQL will use the auto-increment rule to give it a unique key, and therefore the ON DUPLICATE KEY condition will never trigger. In fact one could argue that it is redundant.
So the question you have to ask yourself is what do you think should happen. Now Stack Overflow is not a forum, so don't come back adding comments to this question trying to clarify your original question. Instead formulate a new question that makes it clear to those trying to answer exactly what it is that you are asking.
As I see it, there are a number of possibilities:
You want to modify the secondary indexes to be unique. Once you do that, it will trigger the ON DUPLICATE KEY rule and switch to updating (see the sketch after this list)
You actually don't want the auto-increment column at all, and some_id should actually be your primary key
You don't understand how database indexes work. Specifically, at least one of your secondary indexes is likely unnecessary, since the database can typically combine several indexes or use a partial index to optimize your queries anyway. The second secondary index would be more useful as an index on only the someOther_id field, unless there is a specific uniqueness constraint you are enforcing. For example, if you expect multiple rows with the same some_id but only ever one row with a specific someOther_id for any specific some_id, then the second secondary index would be required; in that case, however, the first would not, as the database can use the second as a partial index to achieve the same performance optimizations
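For the first possibility, the change would be something like this (assuming the pair (some_id, someOther_id) really should be unique):
ALTER TABLE `a_table`
DROP KEY `some_id_index1`,
ADD UNIQUE KEY `some_id_index1` (`some_id`, `someOther_id`);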
I suggest you sit down with a pen and piece of paper away from your computer and try to write down exactly what it is that you want to do, in such a way that [pick one of: your grandmother; an eleven year old] can understand. Scrunch up the piece of paper and throw it away until you can write it down in one go without making any mistakes. Then return to the computer and try to code what you have just written.
99 times out of 100 you will actually find that this helps you solve your problem without the need to ask others questions, because 99 times out of 100 our problems are due to our own lack of understanding of the problem itself. Trying to (virtually) explain your problem to either your grandmother or an eleven year old forces you to throw away some assumptions that are blinding you and get to the core of the problem real fast before you hit the eyes glaze over look when they stop paying attention. [I am not saying you actually pair-program with your grandmother/an eleven year old]
Here is one example of such a problem statement that I have imagined for you. It is likely incorrect as I do not know what your specific problem is:
We need a table that provides cross-reference notes about two other tables.
There are different types of cross-reference (we use the column 're_id' to
identify the type of cross-reference) and there are different types of notes
(we use the columns 'up_id' as well as 'some_value' to store the actual notes).
The cross-reference is indicated with two columns 'some_id' which is the id
of the row in the 'some' table and 'someOther_id' which is the id of the row
in the 'someOther' table. There can be only one cross-reference between any one
row in the 'some' table and any one row in the 'someOther' table, but there can
be multiple cross-references from one specific row in the 'some' table to different
rows in the 'someOther' table.
With the above problem statement I would switch the primary key from an auto-increment to instead be a two column primary key on (some_id,someOther_id) and remove all the secondary keys!
But I hope you realize that your actual solution is likely different, as your problem statement will differ from my guess.
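Under that guessed problem statement, the change might look like this (only valid if the statement above actually matches your requirements):
ALTER TABLE `a_table`
DROP PRIMARY KEY,
DROP COLUMN `id`,
DROP KEY `some_id_key`,
DROP KEY `some_id_index1`,
ADD PRIMARY KEY (`some_id`, `someOther_id`);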
From the MySQL documentation
If a table contains an AUTO_INCREMENT column and INSERT ... UPDATE inserts a row, the LAST_INSERT_ID() function returns the AUTO_INCREMENT value. If the statement updates a row instead, LAST_INSERT_ID() is not meaningful. However, you can work around this by using LAST_INSERT_ID(expr). Suppose that id is the AUTO_INCREMENT column. To make LAST_INSERT_ID() meaningful for updates, insert rows as follows:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id), c=3;
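A minimal self-contained example of that workaround (the table and values are made up):
CREATE TABLE t (
id INT AUTO_INCREMENT PRIMARY KEY,
a INT NOT NULL UNIQUE,
c INT
);
INSERT INTO t (a, c) VALUES (1, 3)
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id), c = 3;
SELECT LAST_INSERT_ID();  -- returns the id of the affected row whether it was inserted or updated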
The same query works fine; change columns_value in the DUPLICATE KEY UPDATE clause into the actual column some_value.
fiddle demo
The problem is that the only UNIQUE constraint on your table should be defined over the column (up_id). I altered your table with a unique constraint for the column up_id:
Fiddle_demo

Faster selects when using GUID vs. Select WHERE?

I have a table with thousands of records. I do a lot of selects like this to find if a person exists.
SELECT * from person WHERE personid='U244A902'
Because the person ID is not purely numerical, I didn't use it as the primary key and went with auto-increment instead. But now I'm rethinking my strategy, because I think the SELECTs are getting slower as the table fills up. I suspect the reason for this slowness is that personid is not the primary key.
So my question: if I were to go through the trouble of restructuring the table and using personid as the primary key instead, without an auto-increment, would that significantly speed up the selects? I'm talking about a table that has 200,000 records now and will grow to about 5 million.
The slowness is due indirectly to the fact that the personid is not a primary key, in that it isn't indexed because it wasn't defined as a key. The quickest fix is to simply index it:
CREATE UNIQUE INDEX `idx_personid` ON `person` (`personid`);
However, if it is a unique value, it should be the table's primary key. There is no real need for a separate auto_increment key.
ALTER TABLE person DROP COLUMN the_auto_increment_column;
ALTER TABLE person ADD PRIMARY KEY (personid);
Note however, that if you were also using the_auto_increment_column as a FOREIGN KEY in other tables and dropped it in favor of personid, you would need to modify all your other tables to use personid instead. The difficulty of doing so may not be completely worth the gain for you.
You can create an index on personid:
CREATE INDEX id_index ON person(personid);
ALTER TABLE `person` ADD INDEX `index1` (`personid`);
Try to index the columns that you use in your WHERE clauses or that you select by.

using index with mysql table

My MySQL database has a table with the following structure:
CREATE TABLE `Table` (
`value1` VARCHAR(50) NOT NULL DEFAULT '',
`value2` VARCHAR(50) NOT NULL DEFAULT '',
`value3` TEXT NULL,
`value4` VARCHAR(50) NULL DEFAULT NULL,
`value5` VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (`value1`, `value2`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
ROW_FORMAT=DEFAULT
The first and the second columns are VARCHAR(50), and together they form the primary key. The third column is TEXT.
The table contains about 1,000,000 records. I do my searches using the first column, and it takes minutes to find a specific item.
How can I index this table to speed up my search, and what index type should I use?
A primary key of 50+50 characters? What does it contain? Are you sure the table is in 3rd normal form? It sounds like the key itself might contain some information; that's an alarm bell to me.
If you can change your primary key with something else much shorter and manageable, there are a few things you can try:
externalise value3 (the TEXT column) to a different table, matched by the new primary key
analyse your table to determine a more optimised length, rather than 50 chars, with SELECT * FROM xcve_info PROCEDURE ANALYSE()
change the size of the fields accordingly, and if you can afford the extra space, change VARCHAR to CHAR
add an index to value1, which probably shouldn't be part of the primary key
Always check the performance of the changes, to see if they were worth it or not.
What is the actual query you're executing? The index will only help if you're searching for a prefix (or exact) match. For example:
SELECT * FROM Table WHERE value1='Foo%'
will find anything that starts with Foo, and should use the index and be relatively quick. On the other hand:
SELECT * FROM `Table` WHERE value1 LIKE '%Foo%'
will not use the index and you'll be forced to do a full table scan. If you need to do that, you should use a full-text index and query: http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html
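A sketch of the full-text route; note that before MySQL 5.6, FULLTEXT indexes were available only on MyISAM, and the table above is InnoDB:
ALTER TABLE `Table` ADD FULLTEXT INDEX ft_value1 (value1);
SELECT * FROM `Table` WHERE MATCH(value1) AGAINST('Foo');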
The only thing I can see that might possibly improve things would be to add a unique index to the first column. This obviously does not work if the first column is not actually unique, and it is questionable if it would be at all more efficient than the already existing primary key. The way I thought this might possibly help is if the unique index on the first column was smaller than the primary key (index scans would be quicker).
Also, you might be able to create an index on parts of your first column, maybe only the 5 or 10 first characters, that could be more efficient.
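For example, a prefix index on the first 10 characters (check the selectivity first, e.g. with SELECT COUNT(DISTINCT LEFT(value1, 10)) FROM `Table`, before settling on a length):
ALTER TABLE `Table` ADD INDEX idx_value1_prefix (value1(10));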
Also, after deleting and/or inserting lots of values, remember to run ANALYZE TABLE on the affected table, or even OPTIMIZE TABLE. That way, the stats for the MySQL query optimizer are updated.
It is always a bad idea to use such long strings as indexes, but if you really need to search that way, consider how you are filtering the query. MySQL can't use an index for LIKE conditions with a leading wildcard, so a condition like WHERE value1 LIKE '%mytext%' will never use an index. Instead, try searching on an exact or prefix string so MySQL can turn the operation into an equality or range match, for example value1 = 'XXXXX' where 'XXXXX' is the exact value. To determine the best length for the comparison string, analyse the selectivity of your value1 field.
Consider too that a multi-column index like (value1, value2) won't use the second column unless the first is matched exactly. That's not a bad index; this is just so you know and understand how it works.
If that doesn't work, another solution could be to store value1 and value2 in a new table (table2, for example) with an auto-incremental id field, then add a foreign key from Table to table2 using ids (e.g. my_long_id), and finally create an index on table2 like my_idx (value1, value2). The search would be something like:
SELECT t1.*
FROM
table2 AS t2
INNER JOIN `Table` AS t1 ON (t1.my_long_id = t2.id)
WHERE
t2.value1 = 'your_string'
Ensure that table2 has an index like (value1, value2) and that Table has an index on (my_long_id).
As a final recommendation, add an id field with AUTO_INCREMENT as the PRIMARY KEY and make (value1, value2) a unique key. This helps a lot because a B-tree stores its entries sorted, so using a 100-char string as the key wastes I/O on that sorting: InnoDB determines the best position for the new entry at insert time, and will probably need to move some entries to other pages to make room for it. With an auto-incremental value this is easier and cheaper, because it never needs to do such movements.
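A sketch of that final recommendation:
ALTER TABLE `Table`
DROP PRIMARY KEY,
ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT FIRST,
ADD PRIMARY KEY (id),
ADD UNIQUE KEY uk_value1_value2 (value1, value2);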
But why are you searching for a unique item on a non-unique column? Why can't you make queries based on your primary key? If for some reason you cannot, then I would index value1, the column you are searching on:
CREATE INDEX index_name ON `Table` (value1);