Our MySQL web analytics database contains a summary table which is updated throughout the day as new activity is imported. We use ON DUPLICATE KEY UPDATE in order that the summarization overwrites earlier calculations, but are having difficulty because one of the columns in the summary table's UNIQUE KEY is an optional FK, and contains NULL values.
These NULLs are intended to mean "not present, and all such cases are equivalent". Of course, MySQL usually treats NULLs as meaning "unknown, and all such cases are not equivalent".
Basic structure is as follows:
An "Activity" table containing an entry for each session, each belonging to a campaign, with optional filter and transaction IDs for some entries.
CREATE TABLE `Activity` (
`session_id` INTEGER AUTO_INCREMENT
, `campaign_id` INTEGER NOT NULL
, `filter_id` INTEGER DEFAULT NULL
, `transaction_id` INTEGER DEFAULT NULL
, PRIMARY KEY (`session_id`)
);
A "Summary" table containing daily rollups of total number of sessions in activity table, an d the total number of those sessions which contain a transaction ID. These summaries are split up, with one for every combination of campaign and (optional) filter. This is a non-transactional table using MyISAM.
CREATE TABLE `Summary` (
`day` DATE NOT NULL
, `campaign_id` INTEGER NOT NULL
, `filter_id` INTEGER DEFAULT NULL
, `sessions` INTEGER UNSIGNED DEFAULT NULL
, `transactions` INTEGER UNSIGNED DEFAULT NULL
, UNIQUE KEY (`day`, `campaign_id`, `filter_id`)
) ENGINE=MyISAM;
The actual summarization query is something like the following, counting up the number of sessions and transactions, then grouping by campaign and (optional) filter.
INSERT INTO `Summary`
(`day`, `campaign_id`, `filter_id`, `sessions`, `transactions`)
SELECT `day`, `campaign_id`, `filter_id`
, COUNT(`session_id`) AS `sessions`
, COUNT(`transaction_id`) AS `transactions`
FROM Activity
GROUP BY `day`, `campaign_id`, `filter_id`
ON DUPLICATE KEY UPDATE
`sessions` = VALUES(`sessions`)
, `transactions` = VALUES(`transactions`)
;
Everything works great, except for the summary of cases where the filter_id is NULL. In these cases, the ON DUPLICATE KEY UPDATE clause does not match the existing row, and a new row is written every time. This is due to the fact that "NULL != NULL". What we need, however, is "NULL = NULL" when comparing the unique keys.
I am looking for ideas for workarounds or feedback on those we have come up with so far. Workarounds we have thought of so far follow.
1. Delete all summary entries containing a NULL key value prior to running the summarization. (This is what we are doing now; a minimal sketch appears after this list.)
This has the negative side effect of returning results with missing data if a query is executed during the summarization process.
2. Change the DEFAULT NULL column to DEFAULT 0, which allows the UNIQUE KEY to be matched consistently.
This has the negative side effect of overly complicating the development of queries against the summary table. It forces us to use a lot of "CASE filter_id = 0 THEN NULL ELSE filter_id END", and makes for awkward joining since all of the other tables have actual NULLs for the filter_id.
3. Create a view which returns "CASE filter_id = 0 THEN NULL ELSE filter_id END", and use this view instead of the table directly.
The summary table contains a few hundred thousand rows, and I've been told view performance is quite poor.
4. Allow the duplicate entries to be created, and delete the old entries after summarization completes.
This has similar problems to deleting them ahead of time.
5. Add a surrogate column which contains 0 for NULL, and use that surrogate in the UNIQUE KEY (actually we could use PRIMARY KEY if all columns are NOT NULL).
This solution seems reasonable, except that the example above is only an example; the actual database contains half a dozen summary tables, one of which contains four nullable columns in the UNIQUE KEY. Some of us are concerned that the overhead is too much.
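For illustration, workaround #1 boils down to something like the following sketch:
-- Purge the NULL-filter rows so the summarization query above can re-insert them.
DELETE FROM `Summary` WHERE `filter_id` IS NULL;
-- ...then the INSERT ... ON DUPLICATE KEY UPDATE shown above is run; until it
-- completes, readers see no NULL-filter summary rows, hence the side effect noted in #1.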
Do you have a better workaround, table structure, update process or MySQL best practice which can help?
EDIT: To clarify the "meaning of null"
The data in the summary rows containing NULL columns are considered to belong together only in the sense of being a single "catch-all" row in summary reports, summarizing those items for which that data point does not exist or is unknown. So within the context of the summary table itself, the meaning is "the sum of those entries for which no value is known". Within the relational tables, on the other hand, these truly are NULL results.
The only reason for putting them into a unique key on the summary table is to allow for automatic update (by ON DUPLICATE KEY UPDATE) when re-calculating the summary reports.
Maybe a better way to describe it is with a specific example: one of the summary tables groups results geographically by the zip code prefix of the business address given by the respondent. Not all respondents provide a business address, so the relationship between the transaction and addresses table is quite correctly NULL. In the summary table for this data, a row is generated for each zip code prefix, containing the summary of data within that area. An additional row is generated to show the summary of data for which no zip code prefix is known.
Altering the rest of the data tables to have an explicit "THERE_IS_NO_ZIP_CODE" 0-value, and placing a special record in the ZipCodePrefix table representing this value, is improper; that relationship truly is NULL.
I think something along the lines of (2) is really the best bet — or, at least, it would be if you were starting from scratch. In SQL, NULL means unknown. If you want some other meaning, you really ought to use a special value for that, and 0 is certainly an OK choice.
You should do this across the entire database, not just this one table. Then you shouldn't wind up with weird special cases. In fact, you should be able to get rid of a lot of your current ones (example: currently, if you want the summary row where there is no filter, you have the special case "filter is null" as opposed to the normal case "filter = ?".)
You should also go ahead and create a "not present" entry in the referred-to table as well, to keep the FK constraint valid (and avoid special cases).
PS: Tables w/o a primary key are not relational tables and should really be avoided.
edit 1
Hmmm, in that case, do you actually need the ON DUPLICATE KEY UPDATE? If you're doing an INSERT ... SELECT, then you probably do. But if your app is supplying the data, just do it by hand — do the update (mapping zip = null to zip is null), check how many rows were changed (MySQL returns this), and if 0, do an insert.
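For illustration, a minimal sketch of that update-then-insert pattern against the Summary table above; the @-prefixed variables are hypothetical placeholders for values supplied by the application, and <=> is MySQL's NULL-safe equality operator, which matches NULL to NULL in the WHERE clause:
UPDATE `Summary`
SET `sessions` = @sessions
, `transactions` = @transactions
WHERE `day` = @day
AND `campaign_id` = @campaign_id
AND `filter_id` <=> @filter_id;  -- NULL-safe comparison, so NULL matches NULL

-- In application code: if ROW_COUNT() is 0, no summary row existed yet,
-- so fall back to a plain INSERT of the same values.
INSERT INTO `Summary` (`day`, `campaign_id`, `filter_id`, `sessions`, `transactions`)
VALUES (@day, @campaign_id, @filter_id, @sessions, @transactions);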
With modern versions of MySQL and MariaDB, upserts can be done simply with INSERT ... ON DUPLICATE KEY UPDATE statements if you go with surrogate column route #5. Adding MySQL's STORED generated columns or MariaDB's PERSISTENT virtual columns to apply the uniqueness constraint on the nullable fields indirectly keeps nonsense data out of the database in exchange for some bloat.
e.g.
CREATE TABLE IF NOT EXISTS bar (
id INT PRIMARY KEY AUTO_INCREMENT,
datebin DATE NOT NULL,
baz1_id INT DEFAULT NULL,
vbaz1_id INT AS (COALESCE(baz1_id, -1)) STORED,  -- surrogate: NULL stored as -1
baz2_id INT DEFAULT NULL,
vbaz2_id INT AS (COALESCE(baz2_id, -1)) STORED,  -- surrogate: NULL stored as -1
blam DOUBLE NOT NULL,
UNIQUE(datebin, vbaz1_id, vbaz2_id)              -- uniqueness enforced via the surrogates
);
INSERT INTO bar (datebin, baz1_id, baz2_id, blam)
VALUES ('2016-06-01', null, null, 777)
ON DUPLICATE KEY UPDATE
blam = VALUES(blam);
For MariaDB, replace STORED with PERSISTENT; columns used in indexes must be persistent rather than purely virtual.
MySQL Generated Columns
MariaDB Virtual Columns
Change the DEFAULT NULL column to DEFAULT 0, which allows the UNIQUE KEY to be matched consistently. This has the negative side effect of overly complicating the development of queries against the summary table. It forces us to use a lot of "CASE filter_id = 0 THEN NULL ELSE filter_id END", and makes for awkward joining since all of the other tables have actual NULLs for the filter_id.
Create a view which returns "CASE filter_id = 0 THEN NULL ELSE filter_id END", and use this view instead of the table directly. The summary table contains a few hundred thousand rows, and I've been told view performance is quite poor.
View performance in MySQL 5.x will be fine, as the view does nothing but replace a zero with a null. Unless you use aggregates/sorts in a view, most any query against the view will be re-written by the query optimizer to just hit the underlying table.
And of course, since it's an FK, you'll have to create an entry in the referred-to table with an id of zero.
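As a hedged illustration of what that option-3 view might look like (the view name here is made up), it is a pure column-level rewrite that the optimizer can merge into queries on the base table:
CREATE VIEW `SummaryReport` AS
SELECT `day`
, `campaign_id`
, CASE WHEN `filter_id` = 0 THEN NULL ELSE `filter_id` END AS `filter_id`  -- or NULLIF(`filter_id`, 0)
, `sessions`
, `transactions`
FROM `Summary`;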
I'm more than a decade late, but I feel my solution should be an answer on here as I had this exact same problem, and this worked for me. If you know what's got to be updated, you can update them manually just before your existing summarization query, then ignore all cases where filter_id is null in your existing query so it won't get inserted as a record again.
For your example:
UPDATE `Summary` s
JOIN (
    SELECT `day`, `campaign_id`
    , COUNT(`session_id`) AS `sessions`
    , COUNT(`transaction_id`) AS `transactions`
    FROM `Activity`
    WHERE `filter_id` IS NULL
    GROUP BY `day`, `campaign_id`
) a ON s.`day` = a.`day`
AND s.`campaign_id` = a.`campaign_id`
SET s.`sessions` = a.`sessions`
, s.`transactions` = a.`transactions`
WHERE s.`filter_id` IS NULL;
INSERT INTO `Summary`
(`day`, `campaign_id`, `filter_id`, `sessions`, `transactions`)
SELECT `day`, `campaign_id`, `filter_id`
, COUNT(`session_id`) AS `sessions`
, COUNT(`transaction_id`) AS `transactions`
FROM Activity
WHERE `filter_id` IS NOT NULL
GROUP BY `day`, `campaign_id`, `filter_id`
ON DUPLICATE KEY UPDATE
`sessions` = VALUES(`sessions`)
, `transactions` = VALUES(`transactions`);
Related
Suppose I have an append-only table:
CREATE TABLE IF NOT EXISTS `states` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) NOT NULL,
`start_date` date DEFAULT NULL,
`end_date` date DEFAULT NULL,
`person_id` int(10) unsigned default NULL,
PRIMARY KEY (`id`)
);
There is an index on name and another on person_id (person_id is a fkey reference to another table).
For each name, we store a mapping to person_id for a given date range. The mapping from name -> person_id is many to one (this is a contrived example, but think of it as storing how a person could change their name). We never want to delete history so when altering the mapping, we insert a new entry. The last entry for a given name is the source of truth. We end up wanting to ask two different types of questions on the dataset, for which I have some general questions.
What is the current mapping for a given name/list of names?
If there is only one name, the most straightforward query is:
select * from states where name = 'name' ORDER BY `id` DESC LIMIT 1;
If there is more than one name, the best way I could figure out is to do:
select * from states as a
left join states as b on a.name = b.name and a.id < b.id
where isnull(b.id);
Is this actually the best way to batch query? For a batch of 1, how much worse would the second query be than the first? Using explain, I can tell we end up doing two index lookups instead of one. Given we care a lot about the performance of this individual lookup, my gut is to run different queries depending on the number of names we are querying for. I'd prefer it if there were a way to defer to MySQL's optimizer, though. Is there a way to write this query so MySQL figures out what to do for me?
What are the current mappings that map to person_id / a list of person_ids?
The way I would query for that is:
select * from states as a
left join states as b on a.name = b.name and a.id < b.id
where isnull(b.id) and a.person_id in (person_id_list)
I am slightly concerned about the performance for small lists though, because my understanding of how MySQL works is limited. Using explain, I know that MySQL filters by person_id via the index on a before filtering by isnull(b.id). But does it do this before the join or after the join? Could we end up wasting a lot of time joining these two tables? How could I figure this out in general?
The code in (1) is "groupwise-max", but done in a very inefficient way. (Follow the tag I added for more discussion.)
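For reference, one common way to phrase the "latest row per name" lookup more efficiently is a correlated subquery on the primary key; this is only a sketch against the states table above, and the name literals are made up:
select a.*
from states as a
where a.id = (select max(b.id) from states as b where b.name = a.name)
and a.name in ('name1', 'name2');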
May I suggest you have two tables; one that is append-only, like you have. Let's call this table History. Then have another table called Current. When you add a new entry, INSERT into History, but replace into Current.
If you do take this approach, consider what differences you might have in the two tables. The PRIMARY KEY will certainly be different; other indexes may be different, and even some columns may be different.
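A minimal sketch of that two-table arrangement, under the assumption that only the name -> person_id mapping needs to be "current" (names and columns here are illustrative, not prescriptive):
CREATE TABLE History (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(255) NOT NULL,
person_id INT UNSIGNED DEFAULT NULL,
start_date DATE DEFAULT NULL,
end_date DATE DEFAULT NULL,
PRIMARY KEY (id),
KEY (name),
KEY (person_id)
);

CREATE TABLE Current (
name VARCHAR(255) NOT NULL,
person_id INT UNSIGNED DEFAULT NULL,
PRIMARY KEY (name),   -- exactly one row per name: the latest mapping
KEY (person_id)
);

-- On every change, append to History and overwrite Current:
INSERT INTO History (name, person_id) VALUES ('alice', 42);
REPLACE INTO Current (name, person_id) VALUES ('alice', 42);

-- Both question types then become simple indexed lookups:
SELECT * FROM Current WHERE name = 'alice';
SELECT * FROM Current WHERE person_id IN (42, 43);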
I am using SELECT...FOR UPDATE to enforce a unique key. My table looks like:
CREATE TABLE tblProductKeys (
pkKey varchar(100) DEFAULT NULL,
fkVendor varchar(50) DEFAULT NULL,
productType varchar(100) DEFAULT NULL,
productKey bigint(20) DEFAULT NULL,
UNIQUE KEY pkKey (pkKey,fkVendor,productType,productKey)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
So rows might look like:
{'Ace-Hammer','Ace','Hammer',121},
{'Ace-Hammer','Ace','Hammer',122},
...
{'Menards-Hammer','Menards','Hammer',121},
...
So note that 'Ace-Hammer' and 'Menards-Hammer' can have the same productKey; only the product+key combination needs to be unique. The requirement that it is an integer defined in this way is organizational; I don't think this is something I can do with auto_increment using InnoDB, hence the question.
So if a vendor creates a new version of an existing product, we give it a distinct key for that vendor/product combination (I realize the pkKey column is redundant in these examples).
My stored procedure is like:
CREATE PROCEDURE getNewKey(IN vkey varchar(50), IN vvendor varchar(50), IN vkeyType varchar(50)) BEGIN
start transaction;
set @newKey=(select max(productKey) from tblProductKeys where pkKey=vkey and fkVendor=vvendor and productType=vkeyType FOR UPDATE);
set @newKey=coalesce(@newKey,0);
set @newKey=@newKey+1;
insert into tblProductKeys values (vkey,vvendor,vkeyType,@newKey);
commit;
select @newKey as keyMax;
END
That's all! During periods of heavy use (1000s of users), I see:
Duplicate entry 'Ace-Hammer-Ace-Hammer-44613' for key 'pkKey'.
I can retry the transaction, but this is not an error I was expecting and I'd like to understand why it happens. I could understand the row locking causing deadlock but in this case it seems like the rows are not locked at all. I wonder if the issue is with max() in this context, or possibly the table index. This sproc is the only transaction that is performed on this table.
Any insight is appreciated. I have read several MySql/SO posts on the subject, most concerns and issues seem to be with over-locking or deadlocks. E.g. here: When using MySQL's FOR UPDATE locking, what is exactly locked?
To achieve "only the product+key combination needs to be unique", say
UNIQUE(pkKey, productKey)
in either order. Then, your 4-column UNIQUE is redundant. It could be turned into a plain INDEX if needed for some particular query.
Furthermore, you really ought to have a PRIMARY KEY. It may as well be
PRIMARY KEY(pkKey, productKey) -- in either order
and then get rid of my suggested UNIQUE key.
There is no good reason to make productKey depend on pkKey, if that is what you are thinking of. Instead, simply do
productKey INT UNSIGNED AUTO_INCREMENT
There needs to be at least INDEX(productKey).
Now, I am unclear on whether you need the 'Menards' and 'Ace' hammers to both be number 121. Summary:
PRIMARY KEY(pkKey, productKey),
INDEX(productKey)
Case 1: Both need to be "121". You need some way to explicitly insert a new row with an existing auto-inc value. This is not a problem; you simply specify '121' instead of letting it acquire the next auto-inc value.
Case 2: There is no need for both to be "121". Then simply use the full force of AUTO_INCREMENT:
PRIMARY KEY(productKey)
But if you really like your SP, let's shorten it down to a single statement, even tossing the transaction:
BEGIN;
INSERT
INTO tblProductKeys
SELECT vkey, vvendor, vkeyType,
@new_id := COALESCE(MAX(productKey), 0) + 1
FROM tblProductKeys
WHERE pkKey = vkey
AND fkVendor = vvendor
AND productType = vkeyType;
END //
Now, you will need
INDEX(pkKey, fkVendor, productType, -- in any order
productKey) -- last
PRIMARY KEY(pkKey, productKey) -- in either order (as previously discussed)
Then use @new_id outside the SP.
I'm a little embarrassed but it is a pretty obvious problem. The issue is that 'FOR UPDATE' only locks the current row. So you can UPDATE it. But I am doing an INSERT! Not an update.
If 2 queries collide, the row is locked, but after the transaction is complete the row is unlocked and it can be read. So you are still reading a stale value. To get the behavior I expected, you'd need to lock the whole table.
So I think auto-increment would work for me, although I need a way to get LAST_INSERT_ID(), so I need to be in the context of a procedure anyway (I am using the C# driver).
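For completeness, a minimal sketch of that auto-increment route, assuming the schema can be rearranged so productKey is a plain AUTO_INCREMENT column (this revised table definition is an assumption, not the original design):
CREATE TABLE tblProductKeys (
productKey BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- assigned by MySQL, no FOR UPDATE needed
pkKey varchar(100) NOT NULL,
fkVendor varchar(50) NOT NULL,
productType varchar(100) NOT NULL,
PRIMARY KEY (productKey),
UNIQUE KEY (pkKey, productKey)
) ENGINE=InnoDB;

INSERT INTO tblProductKeys (pkKey, fkVendor, productType)
VALUES ('Ace-Hammer', 'Ace', 'Hammer');
SELECT LAST_INSERT_ID();  -- per-connection, so it is safe under concurrency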
I have a table with 2 foreign keys. I'm somewhat new to MySQL, can someone tell me which is the right way in applying an INDEX to tables?
# Sample 1
CREATE TABLE IF NOT EXISTS `my_table` (
`topic_id` INT UNSIGNED NOT NULL ,
`course_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`topic_id`, `course_id`) ,
INDEX `topic_id_idx` (`topic_id` ASC) ,
INDEX `course_id_idx` (`course_id` ASC) )
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci;
# Sample 2
CREATE TABLE IF NOT EXISTS `my_table` (
`topic_id` INT UNSIGNED NOT NULL ,
`course_id` INT UNSIGNED NOT NULL ,
PRIMARY KEY (`topic_id`, `course_id`) ,
INDEX `topic_id_idx` (`topic_id`, `course_id`) )
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci;
I guess what I'm really asking is what's the difference between defining both as separate indexes and the other as combined?
The reason why you might want one of these over the other has to do with how you plan on querying the data. Getting this determination right can be a bit of trick.
Think of the combined key in terms of, for example, looking up a student's folder in a filing cabinet, first by the student's last name, and then by their first name.
Now, in the case of the two single indexes in your example, you could imagine, in the student example, having two different sets of organized folders, one with every first name in order, and another with every last name in order. In this case, you'll always have to work through the greatest number of similar records, but that doesn't matter so much if you only have one name or the other anyway. In such a case, this arrangement gives you the greatest flexibility while still only maintaining indexes over two columns.
In contrast, if given both first and last name, it's a lot easier for us as humans to look up a student first by last name, then by first name (within a smaller set of potentials). However, when the last name is not known, it makes it very difficult to find the student by first name alone, because the students with the same first name are potentially interleaved with every variation of last name (table scan). This is all true for the algorithms the computer uses to look up the information too.
So, as a rule of thumb, add the extra key to a single index if you are going to be filtering the data by both values at once. If at times you will have one and not the other, make sure whichever value that is, it's the leftmost key in the index. If the value could be either, you'll probably want both indexes (one of these could actually have both keys for the best of both worlds, but even that comes at a cost in terms of writes). Getting this stuff right can be pretty important, as this often amounts to an all or nothing game. If all the data the dbms requires to perform the indexed lookup isn't present, it will probably resort to a table scan. MySQL's EXPLAIN feature is one tool which can be helpful in checking your configuration and identifying optimizations.
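As a rough illustration against the my_table definitions above (the literal id values are made up), these are the kinds of lookups each arrangement serves:
-- Served by the composite PRIMARY KEY (topic_id, course_id) or the
-- combined index in Sample 2:
SELECT * FROM my_table WHERE topic_id = 1 AND course_id = 2;
SELECT * FROM my_table WHERE topic_id = 1;   -- leftmost prefix still usable

-- NOT served by an index that leads with topic_id; this needs the separate
-- course_id index from Sample 1 (otherwise expect a scan):
SELECT * FROM my_table WHERE course_id = 2;

-- EXPLAIN shows which index, if any, the optimizer actually picks:
EXPLAIN SELECT * FROM my_table WHERE course_id = 2;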
If you create an index using only one key, then when the data is searched it will be found through that key alone.
INDEX `topic_id_idx` (`topic_id` ASC) ,
INDEX `course_id_idx` (`course_id` ASC)
In this situation the data is searched by topic_id and course_id separately, but if you combine them the data is searched using both together.
For example, if you have some data as follows:
topic_id   course_id
--------   ---------
abc        1
pqr        2
abc        3
If you want to search for abc - 3 and you have separate indexes, then it will search these two columns separately and find the result.
But if you combine them, then it will search for abc+3 directly.
I have a "tasks" table with a priority column, which has a unique constraint.
I'm trying to swap the priority value of two rows, but I keep violating the constraint. I saw this statement somewhere in a similar situation, but it wasn't with MySQL.
UPDATE tasks
SET priority =
CASE
WHEN priority=2 THEN 3
WHEN priority=3 THEN 2
END
WHERE priority IN (2,3);
This will lead to the error:
Error Code: 1062. Duplicate entry '3' for key 'priority_UNIQUE'
Is it possible to accomplish this in MySQL without using bogus values and multiple queries?
EDIT:
Here's the table structure:
CREATE TABLE `tasks` (
`id` int(11) NOT NULL,
`name` varchar(200) DEFAULT NULL,
`priority` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `priority_UNIQUE` (`priority`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Is it possible to accomplish this in MySQL without using bogus values and multiple queries?
No. (none that I can think of).
The problem is how MySQL processes updates. Unlike other DBMSs that implement UPDATE properly, MySQL processes updates in a broken manner: it enforces checking of UNIQUE (and other) constraints after every single row update and not, as it should, after the whole UPDATE statement completes. That's why you don't have this issue with (most) other DBMSs.
For some updates (like increasing all or some ids, id=id+1), this can be solved by using - another non-standard feature - an ORDER BY in the update.
For swapping the values from two rows, that trick can't help. You'll have to use NULL or a bogus value (that doesn't exist but is allowed in your column) and 2 or 3 statements.
You could also temporarily remove the unique constraint but I don't think that's a good idea really.
So, if the unique column is a signed integer and there are no negative values, you can use 2 statements wrapped up in a transaction:
START TRANSACTION ;
UPDATE tasks
SET priority =
CASE
WHEN priority = 2 THEN -3
WHEN priority = 3 THEN -2
END
WHERE priority IN (2,3) ;
UPDATE tasks
SET priority = - priority
WHERE priority IN (-2,-3) ;
COMMIT ;
I bumped into the same issue. Had tried every possible single-statement query using CASE WHEN and TRANSACTION - no luck whatsoever. I came up with three alternative solutions. You need to decide which one makes more sense for your situation.
In my case, I'm processing a reorganized collection (array) of small objects returned from the front-end, new order is unpredictable (this is not a swap-two-items deal), and, on top of everything, change of order (usually made in English version) must propagate to 15 other languages.
1st method: Completely DELETE existing records and repopulate entire collection using the new data. Obviously this can work only if you're receiving from the front-end everything that you need to restore what you just deleted.
2nd method: This solution is similar to using bogus values. In my situation, my reordered collection also includes the original item position before it moved. Also, I had to preserve the original index value in some way while the UPDATEs are running. The trick was to manipulate bit 15 of the index column, which is UNSIGNED SMALLINT in my case. If you have a (signed) INT/SMALLINT data type, you can just invert the sign of the value instead of using bitwise operations.
The first UPDATE must run only once per call. This query sets bit 15 of the current index fields (I have UNSIGNED SMALLINT). The lower bits still reflect the original index value, which is never going to come close to the 32K range.
UPDATE *table* SET `index`=(`index` | 32768) WHERE *condition*;
Then iterate your collection extracting original and new index values, and UPDATE each record individually.
foreach( ... ) {
UPDATE *table* SET `index`=$newIndex WHERE *same_condition* AND `index`=($originalIndex | 32768);
}
This last UPDATE must also run only once per call. This query clears bit 15 of the index fields, effectively restoring the original index value for records where it hasn't changed, if any.
UPDATE *table* SET `index`=(`index` & 32767) WHERE *same_condition* AND `index` > 32767;
3rd method: Move the relevant records into a temporary table that doesn't have a primary key, UPDATE all the indexes there, then move all the records back to the first table.
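A hedged sketch of that 3rd method against the tasks table from the question, assuming only the rows being reordered need to move (the temporary table name is made up):
-- CREATE ... AS SELECT copies data but not the UNIQUE constraint,
-- so the swap is unconstrained inside the temporary table.
CREATE TEMPORARY TABLE tasks_tmp AS
SELECT * FROM tasks WHERE priority IN (2, 3);

UPDATE tasks_tmp
SET priority = CASE WHEN priority = 2 THEN 3 ELSE 2 END;

-- Move the rows back inside one transaction so readers never see them missing.
START TRANSACTION;
DELETE FROM tasks WHERE priority IN (2, 3);
INSERT INTO tasks SELECT * FROM tasks_tmp;
COMMIT;
DROP TEMPORARY TABLE tasks_tmp;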
Bogus value option:
Okay, so my query is similar and I've found a way to update in "one" query. My id column is PRIMARY and position is part of a UNIQUE group. This is my original query that doesn't work for swapping:
INSERT INTO `table` (`id`, `position`)
VALUES (1, 2), (2, 1)
ON DUPLICATE KEY UPDATE `position` = VALUES(`position`);
.. but position is an unsigned integer and it's never 0, so I changed the query to the following:
INSERT INTO `table` (`id`, `position`)
VALUES (2, 0), (1, 2), (2, 1)
ON DUPLICATE KEY UPDATE `position` = VALUES(`position`);
.. and now it works! Apparently, MySQL processes the VALUES groups in order.
Perhaps this would work for you (not tested, and I know almost nothing about MySQL):
UPDATE tasks
SET priority =
CASE
WHEN priority=3 THEN 0
WHEN priority=2 THEN 3
WHEN priority=0 THEN 2
END
WHERE priority IN (2,3,0);
Good luck.
Had a similar problem.
I wanted to swap two IDs that were unique AND were referenced as a FK from another table.
The fastest solution for me to swap two unique entries was the following (a rough SQL sketch follows the steps):
Create a ghost entry in my FK table.
Go back to my table where I want to switch the id's.
Turned off the FK check: SET FOREIGN_KEY_CHECKS=0;
Set my first (A) id to the ghost (X) fk (frees A)
Set my second (B) id to A (frees B)
Set A to B (frees X)
Delete ghost record and turn checks back on. SET FOREIGN_KEY_CHECKS=1;
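Roughly, those steps translate to something like the following; the table and column names (parent, child, parent_id) and the ghost value are placeholders, not the original schema:
SET FOREIGN_KEY_CHECKS=0;
INSERT INTO parent (id) VALUES (999999);                       -- ghost entry X
UPDATE child SET parent_id = 999999 WHERE parent_id = @A;      -- A -> X (frees A)
UPDATE child SET parent_id = @A     WHERE parent_id = @B;      -- B -> A (frees B)
UPDATE child SET parent_id = @B     WHERE parent_id = 999999;  -- X -> B (frees X)
DELETE FROM parent WHERE id = 999999;                          -- drop the ghost
SET FOREIGN_KEY_CHECKS=1;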
Not sure if this would violate the constraints, but I have been trying to do something similar and eventually came up with this query by combining a few of the answers I found:
UPDATE tasks as T1,tasks as T2 SET T1.priority=T2.priority,T2.priority=T1.priority WHERE (T1.task_id,T2.task_id)=($T1_id, $T2_id)
The column I was swapping did not use a unique, so I am unsure if this will help...
You can achieve the swap with your above-mentioned UPDATE statement, with a slight change to your key indexes.
CREATE TABLE `tasks` ( `id` int(11) NOT NULL, `name` varchar(200) DEFAULT NULL, `priority` varchar(45) DEFAULT NULL, PRIMARY KEY (`id`,`priority`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
This will have a primary key index as a combination of id and priority. You can then swap values.
UPDATE tasks
SET priority =
CASE
WHEN priority=2 THEN 3
WHEN priority=3 THEN 2
END
WHERE priority IN (2,3);
I don't see any need for user variables or temp variables here.
Hope this solves your issue :)
Scenario: WAMP server, InnoDB Table, auto-increment Unique ID field [INT(10)], 100+ concurrent SQL requests. VB.Net should also be used if needed.
My database has an auto-increment field which is used to generate a unique ticket/protocol number for each new piece of information stored (service order).
The issue is that this number must be reset each year. I.e. it starts at 000001/12 on 01/01/2012 00:00:00, goes up to a maximum of 999999/12, and then at 01/01/2013 00:00:00 it must start over again at 000001/13.
Obviously it could be easily accomplished using some type of algorithm, but I'm trying to find a more efficient way to do that. Consider:
It must (?) use auto-increment since the database has some concurrency (+100).
Each ticket must be unique. So 000001 on 2012 is not equal to 000001 on 2013.
It must be automatic. (no human interaction needed to make the reset, or whatever)
It should be reasonably efficient. (A watch program could check the database daily (?), but that seems far from the best solution, since it would 'fail' 364 times to succeed only once.)
The 'best' approach I could think of is to store the ticket number using year, such as:
12000001 - 12999999 (it never should reach the 999.999, anyway)
and then a watch program should set the auto-increment field to 13000000 at 01/01/2013.
Any Suggestions?
PS: Thanks for reading... ;)
So, for further reference, I've adopted the following solution:
I create n tables in the database (one for each year) with only one auto-increment field, which is responsible for generating each year's unique id.
New inserts are done into the corresponding table according to the event date. After that, the algorithm takes the last_insert_id() and stores that value in the main table using the format 000001/12 (ticket/year).
That's because each year must have its own counter, since a 2012 event could be inserted even when the current date is already 2013.
That way events can be retroactive, no reset is needed, and it's simple to implement.
Sample code for insertion:
$eventdate="2012-11-30";
$eventyear="2012";
$sql = "INSERT INTO tbl$eventyear VALUES (NULL)";
mysql_query ($sql);
$sql = "LAST_INSERT_ID()";
$row = mysql_fetch_assoc(mysql_query($sql));
$eventID = $row(0)
$sql = "INSERT INTO tblMain VALUES ('$eventID/$eventYear', ... ";
mysql_query($sql)
MongoDB uses something very similar to this that encodes the date, process id and host that generated an id along with some random entropy to create UUIDs. Not something that fulfills your requirement of monotonic increase, but something interesting to look at for some ideas on approach.
If I were implementing it, I would create a simple ID broker server that would perform the logic processing on date and create a unique slug for the id like you described. As long as you know how it's constructed, you have native MySql equivalents to get your sorting/grouping working, and the representation serializes gracefully this will work. Something with a UTC datestring and then a monotonic serial appended as a string.
Twitter had some interesting insights into custom index design here as they implemented their custom ID server Snowflake.
The idea of a broker endpoint that generates UUIDs that are not just simple integers, but also contain some business logic is becoming more and more widespread.
You can set up a combined PRIMARY KEY over the two columns id and year; this way a given id can only occur once per year.
MySQL:
CREATE TABLE IF NOT EXISTS `ticket` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`year` YEAR NOT NULL DEFAULT '2012',
`data` TEXT NOT NULL,
PRIMARY KEY (`id`, `year`)
)
ENGINE = InnoDB DEFAULT CHARACTER SET = utf8 COLLATE = utf8_unicode_ci
UPDATE: (to the comment from @Paulo Bueno)
How to reset the auto-increment value can be found in the MySQL documentation: mysql> ALTER TABLE ticket AUTO_INCREMENT = 1;.
If you also increase the default value of the year column when resetting the auto-increment value, you'll have a continuous two-column primary key.
But I think you still need some sort of trigger-program to execute the reset. Maybe a yearly cron-job, which is launching a batch-script to do so on each first of January.
UPDATE 2:
OK, I've tested that right now and one cannot set the auto-increment value to a number lower than any existing ID in that specific column. My mistake – I thought it would work on combined primary keys…
INSERT INTO `ticket` (`id`, `year`, `data`) VALUES
(NULL , '2012', 'dtg htg het')
-- , ... some more rows in 2012
;
-- this works of course
ALTER TABLE `ticket` CHANGE `year` `year` YEAR( 4 ) NOT NULL DEFAULT '2013';
-- this does not reset the auto-increment
ALTER TABLE `ticket` AUTO_INCREMENT = 1;
INSERT INTO `ticket` (`id`, `year`, `data`) VALUES
(NULL , '2013', 'sadfadf asdf a')
-- , ... some more rows in 2013
;
-- this will result in continuously counted IDs
UPDATE 3:
The MySQL documentation has a working example, which uses a grouped primary key on a MyISAM table. It uses a table similar to the one above, but with reversed column order, because for this trick the AUTO_INCREMENT column must be a secondary column of the multi-column index, not the first. It seems this works only with MyISAM (or BDB), not InnoDB. If MyISAM still fits your needs, you don't need to reset the ID, but merely increase the year's default, and you still get the result you asked about.
See: http://dev.mysql.com/doc/refman/5.0/en/example-auto-increment.html (second example, after "For MyISAM and BDB tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index.")
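A minimal sketch of that documented MyISAM behaviour, reusing the ticket columns from above (the table name and sample values here are just for illustration):
CREATE TABLE `ticket_myisam` (
`year` YEAR NOT NULL,
`id` INT(11) NOT NULL AUTO_INCREMENT,
`data` TEXT NOT NULL,
PRIMARY KEY (`year`, `id`)   -- AUTO_INCREMENT as the second column of the key
) ENGINE = MyISAM;

INSERT INTO `ticket_myisam` (`year`, `data`) VALUES
('2012', 'a'), ('2012', 'b'), ('2013', 'c');
-- id restarts for each year: (2012, 1), (2012, 2), (2013, 1)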