The MySQL table I am trying to insert/update has this definition:
CREATE TABLE `place`.`a_table` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`some_id` bigint(20) NOT NULL,
`someOther_id` bigint(20) NOT NULL,
`some_value` text,
`re_id` bigint(20) NOT NULL DEFAULT '0',
`up_id` bigint(20) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `some_id_key` (`some_id`),
KEY `some_id_index1` (`some_id`,`someOther_id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8mb4;
As you can see, some_id and someOther_id share an index.
I am trying to perform my insert/update statement like the following:
INSERT INTO `a_table` (`re_id`,`some_id`,`someOther_id`,`up_id`,`some_value`) VALUES
(100,181,7,101,'stuff in the memo wow') ON DUPLICATE KEY UPDATE
`up_id`=101,`some_value`='sampleValues'
I expected that since I did not specify the id, the statement would fall back on the index key (some_id_index1) as the insert/update rule. However, it is only inserting.
Obviously this is incorrect. What am I doing wrong here?
OK, firstly, to help you ask better questions: the literal answer to your question "What am I doing wrong here?" is nothing.
You have a table with an auto-increment primary key and two non-unique secondary indexes. You are inserting rows to that table without specifying the value of the primary key, so MySQL will use the auto-increment rule to give it a unique key, and therefore the ON DUPLICATE KEY condition will never trigger. In fact one could argue that it is redundant.
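That mechanic can be sketched quickly. The snippet below uses Python's sqlite3 as a stand-in for MySQL (an assumption for illustration only; SQLite's INTEGER PRIMARY KEY auto-increments much like MySQL's AUTO_INCREMENT):

```python
import sqlite3

# Why ON DUPLICATE KEY never fires here: every insert gets a fresh
# auto-increment id, so no unique-key conflict can ever occur.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE a_table (
        id INTEGER PRIMARY KEY,
        some_id INTEGER NOT NULL,
        someOther_id INTEGER NOT NULL,
        some_value TEXT
    )
""")
# Non-unique secondary index, as in the question's schema.
con.execute("CREATE INDEX some_id_index1 ON a_table (some_id, someOther_id)")

# Insert the "same" logical row twice without specifying id.
for _ in range(2):
    con.execute(
        "INSERT INTO a_table (some_id, someOther_id, some_value) VALUES (?, ?, ?)",
        (181, 7, "stuff in the memo wow"),
    )

# Both inserts succeed as separate rows.
count = con.execute("SELECT COUNT(*) FROM a_table").fetchone()[0]
print(count)  # 2
```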
So the question you have to ask yourself is what do you think should happen. Now Stack Overflow is not a forum, so don't come back adding comments to this question trying to clarify your original question. Instead formulate a new question that makes it clear to those trying to answer exactly what it is that you are asking.
As I see it, there are a number of possibilities:
You want to modify the secondary indexes to be unique. Once you do that, it will trigger the ON DUPLICATE KEY rule and switch to updating
You actually don't want the auto-increment column at all and some_id should actually be your primary key
You don't understand how database indexes work. Specifically, at least one of your secondary indexes is likely unnecessary, because the database can typically combine several indexes, or use the leading columns of a composite index, to optimize your queries anyway. The second secondary index would be more useful as an index on only the someOther_id field, unless there is a specific uniqueness constraint you are enforcing. For example, if you expect multiple rows with the same some_id but only ever one row for a specific someOther_id within any given some_id, then the second secondary index would be required; in that case, however, the first would not, because the database can use the leading column of the second to achieve the same optimizations.
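The first option can be sketched as follows, again with sqlite3 standing in for MySQL (SQLite's INSERT ... ON CONFLICT ... DO UPDATE, available since SQLite 3.24, is the analogue of MySQL's INSERT ... ON DUPLICATE KEY UPDATE):

```python
import sqlite3

# Make the (some_id, someOther_id) index UNIQUE; now the upsert
# updates in place instead of inserting a second row.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE a_table (
        id INTEGER PRIMARY KEY,
        some_id INTEGER NOT NULL,
        someOther_id INTEGER NOT NULL,
        up_id INTEGER NOT NULL DEFAULT 0,
        some_value TEXT
    )
""")
con.execute(
    "CREATE UNIQUE INDEX some_id_index1 ON a_table (some_id, someOther_id)"
)

upsert = """
    INSERT INTO a_table (some_id, someOther_id, up_id, some_value)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (some_id, someOther_id)
    DO UPDATE SET up_id = excluded.up_id, some_value = excluded.some_value
"""
con.execute(upsert, (181, 7, 101, "first"))
con.execute(upsert, (181, 7, 202, "second"))  # conflicts -> updates in place

rows = con.execute(
    "SELECT some_id, someOther_id, up_id, some_value FROM a_table"
).fetchall()
print(rows)  # [(181, 7, 202, 'second')]
```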
I suggest you sit down with a pen and a piece of paper, away from your computer, and try to write down exactly what it is that you want to do, in such a way that [pick one of: your grandmother; an eleven-year-old] can understand it. Scrunch up the piece of paper and throw it away until you can write it down in one go without making any mistakes. Then return to the computer and try to code what you have just written.
99 times out of 100 you will find that this helps you solve your problem without needing to ask others, because 99 times out of 100 our problems are due to our own lack of understanding of the problem itself. Trying to (virtually) explain your problem to your grandmother or an eleven-year-old forces you to throw away some assumptions that are blinding you and get to the core of the problem fast, before their eyes glaze over and they stop paying attention. [I am not saying you should actually pair-program with your grandmother or an eleven-year-old.]
Here is one example of such a problem statement that I have imagined for you. It is likely incorrect as I do not know what your specific problem is:
We need a table that provides cross-reference notes about two other tables.
There are different types of cross-reference (we use the column 're_id' to
identify the type of cross-reference) and there are different types of notes
(we use the columns 'up_id' as well as 'some_value' to store the actual notes).
The cross-reference is indicated with two columns 'some_id' which is the id
of the row in the 'some' table and 'someOther_id' which is the id of the row
in the 'someOther' table. There can be only one cross-reference between any one
row in the 'some' table and any one row in the 'someOther' table, but there can
be multiple cross-references from one specific row in the 'some' table to different
rows in the 'someOther' table.
With the above problem statement I would switch the primary key from an auto-increment to instead be a two column primary key on (some_id,someOther_id) and remove all the secondary keys!
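That reworked schema might look like the sketch below (sqlite3 standing in for MySQL as an illustration; MySQL would say ON DUPLICATE KEY UPDATE where SQLite says ON CONFLICT ... DO UPDATE):

```python
import sqlite3

# Composite primary key on (some_id, someOther_id); no surrogate id,
# no secondary indexes needed for the cross-reference lookup.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE a_table (
        some_id INTEGER NOT NULL,
        someOther_id INTEGER NOT NULL,
        re_id INTEGER NOT NULL DEFAULT 0,
        up_id INTEGER NOT NULL DEFAULT 0,
        some_value TEXT,
        PRIMARY KEY (some_id, someOther_id)
    )
""")
upsert = """
    INSERT INTO a_table (re_id, some_id, someOther_id, up_id, some_value)
    VALUES (?, ?, ?, ?, ?)
    ON CONFLICT (some_id, someOther_id)
    DO UPDATE SET up_id = excluded.up_id, some_value = excluded.some_value
"""
con.execute(upsert, (100, 181, 7, 101, "stuff in the memo wow"))
con.execute(upsert, (100, 181, 7, 999, "updated note"))  # same key -> update

row = con.execute("SELECT up_id, some_value FROM a_table").fetchone()
print(row)  # (999, 'updated note')
```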
But I hope you realize that your actual solution is likely different, as your real problem statement will differ from my guess.
From the MySQL documentation
If a table contains an AUTO_INCREMENT column and INSERT ... UPDATE inserts a row, the LAST_INSERT_ID() function returns the AUTO_INCREMENT value. If the statement updates a row instead, LAST_INSERT_ID() is not meaningful. However, you can work around this by using LAST_INSERT_ID(expr). Suppose that id is the AUTO_INCREMENT column. To make LAST_INSERT_ID() meaningful for updates, insert rows as follows:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id), c=3;
The same query works fine; just change columns_value in the ON DUPLICATE KEY UPDATE clause to the actual column some_value.
fiddle demo
The problem is that the only UNIQUE constraint on your table should be defined over the column (up_id).
I altered your table with a unique constraint on the up_id column:
Fiddle_demo
Related
I found this old code and I'm not sure if it's optimized or just doing something silly.
I have a SQL create statement like this:
CREATE TABLE `wp_pmpro_memberships_categories` (
`membership_id` int(11) unsigned NOT NULL,
`category_id` int(11) unsigned NOT NULL,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY `membership_category` (`membership_id`,`category_id`),
UNIQUE KEY `category_membership` (`category_id`,`membership_id`)
);
Is that second UNIQUE KEY there redundant with the PRIMARY KEY on the same 2 columns? Or would the second one help for queries that filter by the category_id first then by the membership_id? Is it being ignored?
I'm trying to remember why I coded it that way, way back when. Seems similar to what this comment is describing: https://dba.stackexchange.com/a/1793/245678
Thanks!
It depends on your query patterns. If you do SELECT, UPDATE, DELETE only on the category_id column, then the 2nd index makes sense but you should omit the membership_id column (redundant) and the UNIQUE constraint.
MySQL will automatically use the PRIMARY KEY index if you use either membership_id or both columns. It doesn't matter in which order these columns appear in your WHERE clauses.
The secondary index does improve performance when going from a "category" to a "membership".
You coded it with those two indexes because some queries start with a "membership" and need to locate a "category"; some queries go the 'other' direction.
That's a well-coded "many-to-many mapping table".
InnoDB provides better performance than MyISAM.
The "Uniqueness" constraint in the UNIQUE key is redundant.
Checking for uniqueness slows down writes by a very small amount. (The constraint must be checked before finishing the update to the index's BTree. A non-unique index can put off the update until later; see "change buffering".)
I like to write it this way to indicate that I have some reason for the pair of columns being together in the index:
INDEX(`category_id`,`membership_id`)
I discuss the schema pattern here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
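The mapping-table pattern above can be sketched end to end (sqlite3 standing in for MySQL here, as an illustration only):

```python
import sqlite3

# Many-to-many mapping table: composite PRIMARY KEY covers one lookup
# direction; a plain (non-unique) secondary index covers the other.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE memberships_categories (
        membership_id INTEGER NOT NULL,
        category_id INTEGER NOT NULL,
        PRIMARY KEY (membership_id, category_id)
    )
""")
# Reverse-direction index: no UNIQUE needed, the PK already guarantees it.
con.execute(
    "CREATE INDEX cat_mem ON memberships_categories (category_id, membership_id)"
)

con.executemany(
    "INSERT INTO memberships_categories VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 10)],
)

# membership -> categories is served by the primary key...
cats = [r[0] for r in con.execute(
    "SELECT category_id FROM memberships_categories "
    "WHERE membership_id = 1 ORDER BY category_id")]
# ...category -> memberships is served by the secondary index.
mems = [r[0] for r in con.execute(
    "SELECT membership_id FROM memberships_categories "
    "WHERE category_id = 10 ORDER BY membership_id")]
print(cats, mems)  # [10, 20] [1, 2]
```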
While performing INSERT...ON DUPLICATE KEY UPDATE on InnoDB in MySQL, we are often told to ignore the potential gaps in auto_increment columns. What if such gaps are very likely and cannot be ignored?
As an example, suppose there is a table rating that stores users' ratings of items. The table schema is something like
CREATE TABLE rating (
id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT NOT NULL,
item_id INT NOT NULL,
rating INT NOT NULL,
UNIQUE KEY tuple (user_id, item_id),
FOREIGN KEY (user_id) REFERENCES user(id),
FOREIGN KEY (item_id) REFERENCES item(id)
);
It is possible that there are many users and many items, and users may frequently change the ratings of items they have already rated. Every time a rating is changed, a gap is created if we use INSERT...ON DUPLICATE KEY UPDATE. Otherwise we would have to query twice (do a SELECT first), which harms performance, or check affected rows, which cannot accommodate a multi-row INSERT.
For a system where 100K users have each rated 10 items and change half of their ratings every day, the auto_increment id will be exhausted within two years. What should we do to prevent this in practice?
Full answer: gaps are OK! Just use a bigger id field, for example BIGINT. Don't try to reuse gaps; that is a bad idea. Don't think about performance or optimization in this case. It's a waste of time.
Another solution is to make a composite key the primary key. In your case, you can remove the id field and use the pair (user_id, item_id) as the primary key.
In the case of "rating", the most frequent queries are "delete by user_id" and inserting, so you don't really need this id primary key for functionality. But a table always needs some primary key.
The only drawback of this method is that when you want to delete just one row from the table, you will need a query like:
DELETE FROM rating WHERE user_id = 123 AND item_id=1234
instead of old
DELETE FROM rating WHERE id = 123
But in this case it isn't hard to change one line of code in your application. Furthermore, in most cases people don't need such functionality.
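The composite-key suggestion can be sketched like this (sqlite3 standing in for MySQL; SQLite's ON CONFLICT ... DO UPDATE plays the role of ON DUPLICATE KEY UPDATE):

```python
import sqlite3

# (user_id, item_id) as the primary key: no auto-increment column,
# so repeated upserts cannot burn through ids at all.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE rating (
        user_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL,
        rating INTEGER NOT NULL,
        PRIMARY KEY (user_id, item_id)
    )
""")
upsert = """
    INSERT INTO rating (user_id, item_id, rating) VALUES (?, ?, ?)
    ON CONFLICT (user_id, item_id) DO UPDATE SET rating = excluded.rating
"""
# A user re-rates the same item many times; still exactly one row.
for score in (1, 3, 5, 2):
    con.execute(upsert, (123, 1234, score))

count = con.execute("SELECT COUNT(*) FROM rating").fetchone()[0]
final = con.execute(
    "SELECT rating FROM rating WHERE user_id = 123 AND item_id = 1234"
).fetchone()[0]
print(count, final)  # 1 2
```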
We work with large tables and have hundreds of millions of records in some of them. We repeatedly use INSERT IGNORE or INSERT ... ON DUPLICATE KEY UPDATE. Making the column an unsigned BIGINT will avoid the id exhaustion issue.
But I would suggest you think of a long-term solution as well, with some known facts:
SELECT followed by INSERT/UPDATE is quite often faster than INSERT ... ON DUPLICATE KEY UPDATE, again depending on your data size and other factors.
If you have two unique keys (or one primary key and one unique key), your query might not always be predictable, and it can give replication errors if you use statement-based replication.
The id is not the only issue with large tables. If you have a table with more than roughly 300M records, performance degrades drastically. You need to think about partitioning/clustering/sharding your database/tables fairly soon.
Personally, I would suggest not using INSERT ... ON DUPLICATE KEY UPDATE. Read extensively on its usage and performance impact if you are planning a highly scalable service.
Here is my current table:
CREATE TABLE `linkler` (
`link` varchar(256) NOT NULL,
UNIQUE KEY `link` (`link`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I will only use these two queries on the table: SELECT EXISTS (SELECT 1 FROM linkler WHERE link = ?) and INSERT INTO linkler (link) VALUES (?).
I don't know much about indexing databases. Since I won't be adding the same thing twice, I thought marking it unique would be a good idea. Is there anything I can do to increase performance? For example, can I arrange for the rows to always be sorted so that MySQL can do a binary search or something similar?
Adding a unique index is perfect. Also, since you have a unique index, you don't need to check for existence before you do an insert. You can simply use INSERT IGNORE to insert the row if it doesn't exist (or ignore the error if it does):
INSERT IGNORE INTO linkler (link) VALUES (?)
Whether that will be faster than doing a SELECT/INSERT combination depends on how often you expect to have duplicates.
ETA: If that is the only column in this table, you might want to make it a PRIMARY KEY instead of just a UNIQUE KEY although I don't think it really matters much other than for clarity.
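The INSERT IGNORE pattern can be sketched with sqlite3 (SQLite's INSERT OR IGNORE is its equivalent of MySQL's INSERT IGNORE; this is an illustration, not the original MySQL setup):

```python
import sqlite3

# A unique column plus INSERT OR IGNORE: duplicates are silently
# skipped, so no existence check is needed before inserting.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE linkler (link TEXT NOT NULL UNIQUE)")

for url in ("http://a.example", "http://b.example", "http://a.example"):
    con.execute("INSERT OR IGNORE INTO linkler (link) VALUES (?)", (url,))

count = con.execute("SELECT COUNT(*) FROM linkler").fetchone()[0]
print(count)  # 2: the duplicate insert was silently skipped
```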
It is my understanding that when I make a table without a primary key that MySQL creates a sort of underlying primary key that it uses internally.
I am working with a table that does not have a primary key, but it would be very useful for my application if I could somehow access this value, assuming it does in fact exist and is retrievable.
So, I want to know whether I am correct in believing that such a value exists somewhere, and whether it is possible to retrieve it.
Edit: just to make it clear, it would be very useful for my application for this table to have an incrementing int attribute. Unfortunately, it was not implemented that way. So, I am sort of grasping at straws to find a solution. What I am trying to do is select every nth row in the table (n changes). So, as you can see if there was this key, this would be very simple.
If a table has no primary key then there's no way of specifying a specific row within it because there is no way to uniquely identify an item. Even if you use a query that specifies a specific value for every column that still wouldn't be certain to only return a single row as without a primary key there's nothing to prevent duplicate rows.
However, a primary key is simply a unique index. If a table has a unique index on one or more of its columns and those columns don't accept NULLs then this is the primary key for the table in all but name.
If your table has no unique columns, then you've got nothing to go on. You'll have to either make one column or a combination of columns (a composite key) unique, or add a column that serves as the primary key for the table. Fortunately, it's relatively easy to add columns to a MySQL table: just add an auto-increment primary key column to the existing table.
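As an aside, SQLite actually does expose its hidden row key as rowid, whereas InnoDB's internal row id is not accessible, which is exactly why the advice here is to add an explicit column. Once such a key exists, the every-nth-row query from the edit is simple; a sketch with sqlite3:

```python
import sqlite3

# With an incrementing key available, selecting every nth row is a
# simple modulo filter. SQLite's implicit rowid stands in for the
# auto-increment column you would add in MySQL.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (val TEXT)")
con.executemany("INSERT INTO t (val) VALUES (?)",
                [(c,) for c in "abcdefghij"])

n = 3  # every 3rd row
every_nth = [r[0] for r in con.execute(
    "SELECT val FROM t WHERE rowid % ? = 0", (n,))]
print(every_nth)  # ['c', 'f', 'i']
```

Note that this only gives stable results because rowid is assigned in insertion order here; gaps from deletes would change which rows are "nth".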
This is probably a common situation, but I couldn't find a specific answer on SO or Google.
I have a large table (>10 million rows) of friend relationships on a MySQL database that is very important and needs to be maintained such that there are no duplicate rows. The table stores the user's uids. The SQL for the table is:
CREATE TABLE possiblefriends(
id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY(id),
user INT,
possiblefriend INT)
The way the table works is that each user has around 1000 or so "possible friends" that are discovered and need to be stored, but duplicate "possible friends" need to be avoided.
The problem is that, due to the design of the program, over the course of a day I need to add 1 million rows or more to the table, which may or may not be duplicate entries. The simple answer would seem to be to check each row to see whether it is a duplicate and, if not, insert it. But this technique will probably get very slow as the table grows to 100 million rows, 1 billion rows, or more (which I expect it to soon).
What is the best (i.e. fastest) way to maintain this unique table?
I don't need to have a table with only unique values always on hand. I just need it once-a-day for batch jobs. In this case, should I create a separate table that just inserts all the possible rows (containing duplicate rows and all), and then at the end of the day, create a second table that calculates all the unique rows in the first table?
If not, what is the best way for this table long-term?
(If indexes are the best long-term solution, please tell me which indexes to use)
Add a unique index on (user, possiblefriend) then use one of:
INSERT ... ON DUPLICATE KEY UPDATE ...
INSERT IGNORE
REPLACE
to ensure that you don't get errors when you try to insert a duplicate row.
You might also want to consider if you can drop your auto-incrementing primary key and use (user, possiblefriend) as the primary key. This will decrease the size of your table and also the primary key will function as the index, saving you from having to create an extra index.
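That composite-primary-key design can be sketched as follows (sqlite3 standing in for MySQL; INSERT OR IGNORE is SQLite's equivalent of INSERT IGNORE):

```python
import sqlite3

# (user, possiblefriend) as the primary key: the key itself enforces
# uniqueness, and bulk loads simply skip duplicate pairs.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE possiblefriends (
        user INTEGER NOT NULL,
        possiblefriend INTEGER NOT NULL,
        PRIMARY KEY (user, possiblefriend)
    )
""")

batch = [(1, 2), (1, 3), (1, 2), (2, 3), (1, 3)]  # contains duplicates
con.executemany(
    "INSERT OR IGNORE INTO possiblefriends (user, possiblefriend) VALUES (?, ?)",
    batch,
)
count = con.execute("SELECT COUNT(*) FROM possiblefriends").fetchone()[0]
print(count)  # 3 unique pairs survive
```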
See also:
“INSERT IGNORE” vs “INSERT … ON DUPLICATE KEY UPDATE”
A unique index will let you be sure that the field is indeed unique. You can add a unique index like so:
CREATE TABLE possiblefriends(
id INT NOT NULL AUTO_INCREMENT,
user INT,
possiblefriend INT,
PRIMARY KEY (id),
UNIQUE INDEX DefUserID_UNIQUE (user ASC, possiblefriend ASC))
This will also speed up your table access significantly.
Your other issue with the mass insert is a little trickier. You could use the built-in ON DUPLICATE KEY UPDATE clause shown below:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
If a row with a=1 already exists, the statement above is equivalent to:
UPDATE table SET c=c+1 WHERE a=1;
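The counter pattern from the MySQL documentation can be sketched with sqlite3 (ON CONFLICT ... DO UPDATE standing in for ON DUPLICATE KEY UPDATE; an illustration only):

```python
import sqlite3

# The upsert inserts on the first run, then behaves exactly like
# UPDATE t SET c = c + 1 WHERE a = 1 on every subsequent run.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (a INTEGER PRIMARY KEY, b INTEGER, c INTEGER)")

upsert = """
    INSERT INTO t (a, b, c) VALUES (1, 2, 3)
    ON CONFLICT (a) DO UPDATE SET c = c + 1
"""
con.execute(upsert)  # row absent  -> inserts (1, 2, 3)
con.execute(upsert)  # row present -> increments c to 4
con.execute(upsert)  # row present -> increments c to 5

row = con.execute("SELECT a, b, c FROM t").fetchone()
print(row)  # (1, 2, 5)
```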