Partition on composite key in MySQL - mysql

In MySQL, can I have a composite primary key composed of an auto-increment column and another field? Also, please critique my "MySQL partitioning" logic.
To explain further:
I have a question about MySQL partitioning.
I have to partition a table in MySQL; it has one primary key, id.
I have to partition by a date field (non-primary, with duplicate entries).
Since we cannot partition on duplicate entries, I have created a composite key (id, date).
How can I create partitions on this composite key?
Thanks in advance.

(This answer assumes InnoDB, not MyISAM. There are differences in the implementation of indexes that make some of my comments incorrect for MyISAM.)
In MySQL, a table's PRIMARY KEY can be composed of multiple fields, including an AUTO_INCREMENT.
The only requirement in MySQL for AUTO_INCREMENT is that it be the first column in some index. Let's look at this example of Posts, where there can be many posts for each user:
PRIMARY KEY(user_id, post_id),
INDEX(post_id)
where post_id is AUTO_INCREMENT, but you could benefit from "clustering" the data by user_id. This clustering would make it more efficient to do queries like
SELECT ... FROM Posts
WHERE user_id = 1234;
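For concreteness, a minimal sketch of such a Posts table (the column types and the body column are assumptions for illustration; only the key layout comes from the discussion above):

CREATE TABLE Posts (
    user_id INT UNSIGNED NOT NULL,
    post_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    body TEXT,                      -- hypothetical payload column
    PRIMARY KEY (user_id, post_id), -- clusters each user's posts together
    INDEX (post_id)                 -- satisfies the "first column in some index" rule for AUTO_INCREMENT
) ENGINE=InnoDB;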
Back to your question...
The "partition key" does not have to be unique; so, I don't understant your "cannot partition on duplicate entries".
INDEX(id, date), if you also have PRIMARY KEY(id), is essentially useless. When looking up by id, the PRIMARY KEY(id) gives you perfect access; adding date to an index won't help. When looking up by date, but not id, (id, date) is useless since only the "left" part of a composite index can be used.
Perhaps you are leading to a non-partitioned table with
PRIMARY KEY(date, id),
INDEX(id)
to make date ranges efficient? (Note: partitioning won't help.)
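If that is where you are heading, a non-partitioned layout along those lines might look like this (the table name and column types are assumptions for illustration):

CREATE TABLE my_data (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    `date` DATE NOT NULL,
    payload VARCHAR(255),      -- hypothetical column
    PRIMARY KEY (`date`, id),  -- clusters rows by date, making date-range scans efficient
    INDEX (id)                 -- keeps the AUTO_INCREMENT rule satisfied and allows lookups by id
) ENGINE=InnoDB;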
Perhaps you will be doing
SELECT ... WHERE x = 123 AND date BETWEEN ...
In that case this is beneficial:
INDEX(x, date)
Only if you do this can we begin to discuss the utility of partitioning:
WHERE x BETWEEN ...
AND date BETWEEN ...
This needs a "two-dimensional" index, which sort of exists with SPATIAL.
See my discussion of partitioning, where I list only 4 use cases for partitioning. It also links to a discussion on how to use partitioning for 2D.
Bottom Line: You must not discuss partitioning without having a clear picture of what queries it might help. Provide them; then we can discuss further.

Related

Add index when there is already a primary key

This is myTable:
clientId, itemId, sellId
The primary key is on clientId, and there is also a non-unique BTREE index on sellId.
Now I have part of a very slow query:
LEFT OUTER JOIN myTable wl ON wl.itemId = ld.itemId AND wl.clientId = #clientId
The question is: should I create a combined index here on both clientId and itemId, or, since clientId is the primary key, one on itemId only?
Your question is this: should I put the primary key (PK) of the table into a multi-column index as the last column?
If you use the InnoDB storage engine, the answer is no. Why not? InnoDB already includes the PK as part of the index.
If you don't use InnoDB -- that is, if you use MyISAM or Aria (in MariaDB) -- the answer is yes.
Still, you should evaluate how well the index helps your query.
ON wl.itemId= ld.itemId and wl.clientId= #clientId
begs for
INDEX(itemId, clientId) -- in either order
However, if either of those columns is the PRIMARY KEY of wl, then no index is needed, nor useful. The PK will provide immediate access to the needed row. The other column cannot do anything other than verify that it matches -- that is eliminate the row from the JOIN.
should i create here index combined for both clientId and itemId
Yes, if neither is UNIQUE (Keep in mind that the PK is 'unique'.)
or since clientId is primary then only for itemId
Almost never will MySQL use two separate indexes. (Keep in mind that the PK is an index.)
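For the case where neither column alone is the PRIMARY KEY, a sketch of the suggested composite index, using the names from the question (the index name is made up):

ALTER TABLE myTable ADD INDEX idx_item_client (itemId, clientId);  -- either column order works for this JOIN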

How to Create MySql Partitions without primary key?

I have a table which uses a sequence number as its primary key. Other than that, there is a column called "date_time" which can contain duplicate values.
Now I need to make partitions by using date_time as follows.
ALTER TABLE data
PARTITION BY RANGE (TO_DAYS(date_time)) (
PARTITION p20220103 VALUES LESS THAN (TO_DAYS('2022-01-04 00:00:00')),
PARTITION p20220104 VALUES LESS THAN (TO_DAYS('2022-01-05 00:00:00')),
PARTITION p20220105 VALUES LESS THAN MAXVALUE
);
Since date_time is not part of the primary key of the data table, I couldn't create the partitions.
ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function (prefixed columns are not considered).
How should I create partitions without adding date_time as a primary key?
You cannot. The rule is simple, stated in https://dev.mysql.com/doc/refman/8.0/en/partitioning-limitations-partitioning-keys-unique-keys.html:
Every unique key on the table must use every column in the table's partitioning expression.
If the table has a primary key or unique key but that key does not include the column(s) in the partitioning expression, then it cannot enforce uniqueness when you insert a new row without checking every partition for duplicates.
The only way around this, to allow a column like your date_time to be the partitioning expression, is to define the table with no primary or unique key.
This has its own hazards. You may need a unique key so you can address rows individually to update or delete them. Also row-based replication becomes very inefficient if your table has no primary key.
This usually means you cannot partition the table by date_time, or even that you cannot partition the table at all. But this isn't always a bad thing. Partitioning doesn't necessarily give a great benefit. Partitioning can even cause more complexity, because you may have queries that would be bound to search every partition anyway.
Partitioning is not a cure-all, and frequently is a liability.
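If, despite the hazards above, you did go the no-unique-key route, the sequence would look roughly like this (a sketch only, assuming the primary key column is an AUTO_INCREMENT id; dropping the primary key is exactly the risky step described above):

ALTER TABLE data
    DROP PRIMARY KEY,
    ADD INDEX (id);   -- a non-unique index keeps AUTO_INCREMENT legal and does not block partitioning

ALTER TABLE data
PARTITION BY RANGE (TO_DAYS(date_time)) (
    PARTITION p20220103 VALUES LESS THAN (TO_DAYS('2022-01-04 00:00:00')),
    PARTITION p20220104 VALUES LESS THAN (TO_DAYS('2022-01-05 00:00:00')),
    PARTITION p20220105 VALUES LESS THAN MAXVALUE
);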

Mysql table with composite index but not primary key

I need a table to store some ratings. In this table I have a composite index (user_id, post_id) and another column to identify different rating systems.
user_id - bigint
post_id - bigint
type - varchar
...
Composite Index (user_id, post_id)
In this table I don't have a primary key, because a primary key needs to be unique and the INDEX does not; in my case uniqueness is a problem.
For example I can have
INSERT INTO tbl_rate
(user_id,post_id,type)
VALUES
(24,1234,'like'),
(24,1234,'love'),
(24,1234,'other');
Could the missing PRIMARY KEY cause performance problems? Is my table structure good, or do I need to change it?
Thank you
A few points:
It sounds like you are just taking what is currently unique about the table and making that the primary key. That works. Natural keys have some advantages when it comes to querying because of locality (the data for each user is stored in the same area), and because the table is clustered by that key, which eliminates lookups to the data if you are searching by the columns in the primary key.
But using a natural primary key like the one you chose has disadvantages for performance as well.
Using a very large primary key will make all other indexes very large in InnoDB, because the primary key is included in each secondary index entry.
Using a natural primary key isn't as fast as a surrogate key for INSERTs because, in addition to being bigger, it can't just insert at the end of the table each time; it has to insert into the section for that user and post, etc.
Also, if you are searching by time, you will most likely be seeking all over the table with a natural key unless time is your first column. Surrogate keys tend to be local in time and can often be just right for some queries.
Using a natural key like yours as a primary key can also be annoying. What if you want to refer to a particular vote? You need several fields. It is also a little awkward to use with many ORMs.
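For reference, the natural-key variant being discussed would look something like this (a sketch; it has to include type to stay unique, and the VARCHAR length is an assumption):

CREATE TABLE tbl_rate (
    user_id BIGINT NOT NULL,
    post_id BIGINT NOT NULL,
    type    VARCHAR(20) NOT NULL,
    PRIMARY KEY (user_id, post_id, type)   -- natural composite primary key
) ENGINE=InnoDB;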
Here's the Answer
I would create your own surrogate key and use it as the primary key, rather than relying on InnoDB's internal hidden primary key, because you'll be able to use it for updates and lookups.
ALTER TABLE tbl_rate
ADD id INT UNSIGNED NOT NULL AUTO_INCREMENT,
ADD PRIMARY KEY(id);
But if you do create a surrogate primary key, I'd also make the natural combination a UNIQUE key. Same cost, but it enforces correctness.
ALTER TABLE tbl_rate
ADD UNIQUE ( user_id, post_id, type );
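Putting the two statements together, the finished table might look something like this (column types other than the keys, and the key name, are assumptions):

CREATE TABLE tbl_rate (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id  BIGINT NOT NULL,
    post_id  BIGINT NOT NULL,
    type     VARCHAR(20) NOT NULL,
    PRIMARY KEY (id),                                       -- compact surrogate key
    UNIQUE KEY uq_user_post_type (user_id, post_id, type)   -- enforces one row per user/post/type
) ENGINE=InnoDB;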
Could the missing PRIMARY KEY cause performance problems?
Yes, in InnoDB for sure, as InnoDB will use an internal algorithm to generate its own "ROW_ID",
which is defined in dict0boot.ic:
/** Returns a new row id.
@return the new id */
UNIV_INLINE
row_id_t
dict_sys_get_new_row_id(void)
/*=========================*/
{
    row_id_t    id;

    mutex_enter(&(dict_sys->mutex));

    id = dict_sys->row_id;

    if (0 == (id % DICT_HDR_ROW_ID_WRITE_MARGIN)) {
        dict_hdr_flush_row_id();
    }

    dict_sys->row_id++;

    mutex_exit(&(dict_sys->mutex));

    return(id);
}
The main problem in that code is mutex_enter(&(dict_sys->mutex));, which blocks other threads from entering if one thread is already running this code.
Meaning it will effectively lock the table, much like MyISAM would.
% may take a few nanoseconds. That is insignificant compared to
everything else. Anyway #define DICT_HDR_ROW_ID_WRITE_MARGIN 256
Indeed, Rick James, that is insignificant compared to what was mentioned above.
The C/C++ compiler would micro-optimize it further to get even more performance out of it by emitting cheaper CPU instructions.
Still, the main performance concern is the mutex mentioned above.
Also, the modulo operator (%) is a relatively heavy CPU instruction.
But depending on the C/C++ compiler (and/or configuration options) it might be optimized when DICT_HDR_ROW_ID_WRITE_MARGIN is a power of two, e.g. (0 == (id & (DICT_HDR_ROW_ID_WRITE_MARGIN - 1))), as bitmasking is much faster; I believe DICT_HDR_ROW_ID_WRITE_MARGIN is indeed a power of 2 (256).

SQL Index on foreign key

When I join 2 tables tbl1 and tbl2 on column1, where column1 is the primary key on tbl1: assuming that column1 is not automatically indexed, should I create an index on both tbl1.column1 and tbl2.column1, or just on tbl2.column1? Does the number of rows of each table affect that choice?
A primary key is automatically indexed. There is no way around that (this is how the "unique" part of the unique constraint is implemented). So, tbl1.column1 has an index. No other index is needed.
As for tbl2.column1, you should probably have an index on that. MySQL does create an index if you explicitly declare a foreign key relationship. So, with the explicit declaration, no other index is necessary. Note: this is not true of all databases.
The presence of indexes does not change the results of queries nor the number of rows in the table, so I don't understand your final question. Indexes implement relational integrity and improve (hopefully!) performance on certain types of queries.
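A sketch of what that explicit declaration could look like, using the generic names from the question (the constraint and index names are made up):

ALTER TABLE tbl2
    ADD CONSTRAINT fk_tbl2_column1
    FOREIGN KEY (column1) REFERENCES tbl1 (column1);   -- InnoDB will create an index on tbl2.column1 if a usable one does not already exist

-- Or, without a foreign key, simply index the joining column yourself:
ALTER TABLE tbl2 ADD INDEX idx_column1 (column1);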
Generally yes, because often you'll want to do the reverse of the join at some point.

Should I create index on a SQL table column used frequently in WHERE select clause?

So I wonder: should I add a non-clustered index to a column with non-unique values in a SQL Server 2008 R2 table?
Simplified Example:
SELECT Id, FirstName, LastName, City
FROM Customers
WHERE City = 'MyCity'
My understanding is that the primary key [Id] should be the clustered index.
Can a non-clustered index be added to the non-unique column [City]?
Is this going to improve performance, or should I not bother at all?
Thanks.
I was thinking of creating the clustered index as:
CREATE UNIQUE CLUSTERED INDEX IDX_Customers_City
ON Customers (City, Id);
or a non-clustered one, assuming there is already a clustered index on that table:
CREATE NONCLUSTERED INDEX IX_Customers_City
ON Customers (City, Id);
In reality I am dealing with a table of millions of records. The SELECT statement returns 0.1% to 5% of the records.
Generally yes - you would usually make the clustered index on the primary key.
The exception to this is when you never make lookups based on the primary key, in which case putting the clustered index on another column might be more pertinent.
You should generally add non-clustered indexes to columns that are used as foreign keys, provided there's a reasonable amount of diversity in that column, which I'll explain with an example.
The same applies to columns being used in where clauses, order by etc.
Example
CREATE TABLE Gender (
    GenderId INT NOT NULL PRIMARY KEY CLUSTERED,
    Value NVARCHAR(50) NOT NULL)
INSERT Gender(GenderId, Value) VALUES (1, 'Male'), (2, 'Female')
CREATE TABLE Person (
PersonId INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
Name NVARCHAR(50) NOT NULL,
GenderId INT NOT NULL FOREIGN KEY REFERENCES Gender(GenderId)
)
CREATE TABLE [Order] (
OrderId INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
OrderDate DATETIME NOT NULL DEFAULT GETDATE(),
OrderTotal DECIMAL(14,2) NOT NULL,
OrderedByPersonId INT NOT NULL FOREIGN KEY REFERENCES Person(PersonId)
)
In this simple set of tables it would be a good idea to put an index on the OrderedByPersonId column of the Order table, as you are very likely to want to retrieve all the orders for a given person, and the column is likely to have a high amount of diversity.
By a high amount of diversity (or selectivity) I mean that if you have, say, 1000 customers, each customer is only likely to have 1 or 2 orders, so looking up all the rows in the Order table with a given OrderedByPersonId will return only a very small proportion of the total records in that table.
By contrast, there's not much point in putting an index on the GenderId column of the Person table, as it will have very low diversity. The query optimiser would not use such an index, and INSERT/UPDATE statements would be marginally slower because of the extra need to maintain the index.
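For completeness, the index suggested above for the Order table could be created like this (the index name is made up):

CREATE NONCLUSTERED INDEX IX_Order_OrderedByPersonId
ON [Order] (OrderedByPersonId);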
Now to go back to your example - the answer would have to be "it depends". If you have hundreds of cities in your database then yes, it might be a good idea to index that column.
If, however, you only have 3 or 4 cities, then no - don't bother. As a guideline I might say that if the selectivity of the column is 0.9 or higher (i.e. a WHERE clause selecting a single value in the column would return only 10% or less of the rows) an index might help, but this is by no means a hard and fast figure!
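A quick way to sanity-check that guideline is to see what share of the table the single most common value in the column accounts for; a hypothetical query, assuming the Customers table from the question (per the guideline, you would want this fraction to be roughly 0.1 or lower):

SELECT TOP 1 COUNT(*) * 1.0 / (SELECT COUNT(*) FROM Customers) AS fraction_of_rows
FROM Customers
GROUP BY City
ORDER BY COUNT(*) DESC;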
Even if the column is very selective/diverse you might not bother indexing it if queries are only made very infrequently on it.
One of the easiest things to do, though, is to try your queries with the execution plan displayed in SQL Server Management Studio. It will suggest indexes for you if the query optimiser thinks that they'll make a positive impact.
Hope that helps!
If you use the query frequently, or if you sort by city regularly in online applications, especially if your table is dense or has a large row size, it makes sense to add an index. Too many indexes slow down your inserts and updates. An evaluation of the actual benefit can only be made once you have a significant amount of data in the table.