Preventing duplicate data with locking or transactions? - mysql

In our application, when a user creates an order we get the next order # as follows:
SELECT MAX(CAST(REPLACE(orderNum, 'SO', '') AS SIGNED)) + 1 FROM orders
The problem is that as the customer gets busier, we are starting to see orders created at exactly the same time, which results in duplicate order #s.
What is the best way to handle this? Should we lock the whole orders table or just the row? Or should we be doing transactions?

You could create an extra table with just one AUTO_INCREMENT column.
Every time you need a new order number, insert a row into this table and use the result of LAST_INSERT_ID() as the order number (just make sure to set the table's AUTO_INCREMENT counter higher than your greatest existing order number).
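A minimal sketch of that sequence-table approach (the table and column names here are illustrative, not from the original answer):

```sql
CREATE TABLE order_seq (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
) ENGINE=InnoDB;

-- Bump the counter past the highest existing order number,
-- e.g. if the current maximum is SO1000:
ALTER TABLE order_seq AUTO_INCREMENT = 1001;

-- To reserve a new, guaranteed-unique order number:
INSERT INTO order_seq VALUES (NULL);
SELECT CONCAT('SO', LAST_INSERT_ID()) AS orderNum;
```

LAST_INSERT_ID() is per-connection, so two clients inserting at the same moment each get their own value without any explicit locking.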

Related

Count(*) Vs. Max(Id)

I have a table into which I do bulk imports from CSV files.
First column is the Id field with autoincrement.
What bothers me is:
When I do a
Select count(*)
And a
Select max(Id)
I get different values. I would have expected them to be identical.
What am I missing?
If you insert 10 rows, delete 5, then insert 10 more then your COUNT(*) will not match MAX(id).
You can also insert an id way ahead of where it should be, like in an empty table INSERT ... (id) VALUES (9000000) will kick up your MAX(id) significantly despite having only 1 row.
Rolled-back transactions can also interfere with this.
If you want to know the next increment, check the AUTO_INCREMENT value, but be aware that this is only a guess, the actual value used may differ by the time you actually get around to inserting.
If you want them to match then you need to:
Start with a table where AUTO_INCREMENT=1, as in it's either brand new or has been cleared with TRUNCATE.
Insert using auto-generated id values as one transaction, or as a series of transactions where all of them have been fully committed.
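The divergence is easy to reproduce (a sketch; the table name is illustrative):

```sql
-- COUNT(*) and MAX(id) diverge as soon as rows are deleted
CREATE TABLE demo (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY);
INSERT INTO demo VALUES (NULL),(NULL),(NULL),(NULL),(NULL),
                        (NULL),(NULL),(NULL),(NULL),(NULL);  -- ids 1..10
DELETE FROM demo WHERE id <= 5;                              -- 5 rows remain
INSERT INTO demo VALUES (NULL),(NULL),(NULL),(NULL),(NULL),
                        (NULL),(NULL),(NULL),(NULL),(NULL);  -- ids 11..20
SELECT COUNT(*), MAX(id) FROM demo;                          -- 15 vs. 20

-- To inspect the next auto-increment value (only a hint, as noted above):
SELECT AUTO_INCREMENT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'demo';
```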

delete rows and return an id of the deleted rows

Is it possible to delete a row and return the values of the deleted rows?
example
DELETE FROM table where time <= -1 week
SELECT all id that were deleted
If you want to track operations in the database, you could consider using JOURNAL tables. There's a question here on SO about this.
A journal table is a mirror of the main table, usually populated by a trigger with the operation performed (update, delete). It stores the "old" values (the current values you can always get from the main table).
If implemented this way, you could then SELECT from the journal table and have exactly what you need.
Trying to give you an example:
Table USER
CREATE TABLE USER (
  id INT,
  name VARCHAR(100)
)
Table USER_JN
CREATE TABLE USER_JN (
  id INT,
  name VARCHAR(100),
  operation VARCHAR(10)
)
Then, for every operation you can populate USER_JN and keep a history of all changes (the journal table should not have constraints on it).
If you delete, your operation column would have the delete value and you could use your select to check that.
It's not exactly "selecting the deleted row", but a way to make it possible.
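For completeness, the delete case could be captured with a trigger along these lines (a sketch; the trigger name and the column lengths are assumptions):

```sql
DELIMITER //
CREATE TRIGGER user_after_delete
AFTER DELETE ON USER
FOR EACH ROW
BEGIN
    -- OLD.* holds the values of the row being removed
    INSERT INTO USER_JN (id, name, operation)
    VALUES (OLD.id, OLD.name, 'delete');
END//
DELIMITER ;
```

After that, SELECT id FROM USER_JN WHERE operation = 'delete' gives you the ids of the deleted rows.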
Hope it's somehow useful.
SELECT id FROM table WHERE time <= -1 week
and then simply
DELETE FROM table WHERE time <= -1 week
I would not scan a non-indexed column twice. Capture the ids first. (Note that SELECT ... INTO #tID is SQL Server syntax; in MySQL, where multiple rows can match, a temporary table does the same job:)
CREATE TEMPORARY TABLE tID SELECT id FROM table WHERE time <= -1 week;
DELETE FROM table WHERE id IN (SELECT id FROM tID);
You may then use the ids saved in tID as you wish.

How to ensure SELECTing new records doesn't miss any records?

I have a table mytable that is continuously being updated with new records.
I'm trying to get the most recent records using the method below ([lastId] is largest id of the previous select):
SELECT *
FROM mytable
WHERE id > [lastId]
ORDER BY id DESC
But I believe ids (and timestamps, if I use them) are not necessarily inserted in order, so there is a (small) possibility of missing some records. How can I get around this?
--EDIT--
The id field is AUTOINCREMENT.
Can you add a timestamp column to the table and have the process that inserts records set it to the current time?
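If you do add such a column, one way to avoid missing late-committing rows is to re-read a small overlapping window rather than a strict cutoff (a sketch; the column name and the 5-second overlap are assumptions, and [lastSeen] stands for the newest timestamp from the previous select):

```sql
SELECT *
FROM mytable
WHERE created_at > [lastSeen] - INTERVAL 5 SECOND
ORDER BY created_at;
```

Duplicates from the overlap can then be filtered out on the client by remembering which ids have already been seen.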

MySQL - How do I efficiently get the row with the lowest ID?

Is there a faster way to update the oldest row of a MySQL table that matches a certain condition than using ORDER BY id LIMIT 1 as in the following query?
UPDATE mytable SET field1 = '1' WHERE field1 = 0 ORDER BY id LIMIT 1;
Note:
Assume the primary key is id and there is also an index on field1.
We are updating a single row.
We are not updating strictly the oldest row, we are updating the oldest row that matches a condition.
We want to update the oldest matching row, i.e. the lowest id, i.e. the head of the FIFO queue.
Questions:
Is the ORDER BY id necessary? How does MySQL order by default?
Real world example
We have a DB table being used for a email queue. Rows are added when we want to queue emails to send to our users. Rows are removed by a cron job, run each minute, processing as many as possible in that minute and sending 1 email per row.
We plan to ditch this approach and use something like Gearman or Resque to process our email queue. But in the meantime I have a question about how we can efficiently mark the oldest item in the queue for processing, a.k.a. the row with the lowest ID. This query does the job:
mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 ORDER BY id LIMIT 1");
However, it appears in the MySQL slow log a lot due to scaling issues. The query can take more than 10s when the table has 500,000 rows. The problem is that this table has grown massively since it was first introduced and now sometimes holds half a million rows and an overhead of 133.9 MiB. For example, we INSERT 6000 new rows perhaps 180 times a day and DELETE roughly the same number.
To stop the query appearing in the slow log we removed the ORDER BY id to stop a massive sort of the whole table. i.e.
mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 LIMIT 1");
... but the new query no longer always gets the row with the lowest id (although it often does). Is there a more efficient way of getting the row with the lowest id other than using ORDER BY id ?
For reference, this is the structure of the email queue table:
CREATE TABLE IF NOT EXISTS `email_queue` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`time_queued` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Time when item was queued',
`mem_id` int(10) NOT NULL,
`email` varchar(150) NOT NULL,
`processingID` int(2) NOT NULL COMMENT 'Indicate if row is being processed',
PRIMARY KEY (`id`),
KEY `processingID` (`processingID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Give this a read:
ORDER BY … LIMIT Performance Optimization
Sounds like you have other processes locking the table, preventing your update from completing in a timely manner. Have you considered using InnoDB?
I think the 'slow part' comes from
WHERE processingID = 0
It's slow because it's not indexed. But, indexing this column (IMHO) seems incorrect too.
The idea is to change above query to something like :
WHERE id = 0
Which theoretically will be faster since it uses index.
How about creating another table that contains the ids of rows that haven't been processed? Insertion then does double duty: first insert into the real table, then insert the id into the 'not yet processed' table. The processing part also does double duty: first retrieve an id from the 'not yet processed' table and delete it; then do the actual processing.
Of course, the id column in the 'not yet processed' table needs an index, to ensure that selecting and deleting stay fast.
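That idea might look like this (a sketch; the pending-table name and the elided email_queue columns are assumptions):

```sql
CREATE TABLE email_queue_pending (
    id INT NOT NULL PRIMARY KEY
);

-- On insert, write to both tables:
-- INSERT INTO email_queue (...) VALUES (...);
INSERT INTO email_queue_pending VALUES (LAST_INSERT_ID());

-- To claim the oldest unprocessed row: MIN() on a primary key is cheap,
-- then remove the claimed id from the pending table
SELECT MIN(id) INTO @next FROM email_queue_pending;
DELETE FROM email_queue_pending WHERE id = @next;
```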
This question is old, but for reference for anyone ending up here:
You have a condition on processingID (WHERE processingID = 0), and within that constraint you want to order by ID.
What's happening with your current query is that it scans the table from the lowest ID to the greatest, stopping when it finds 1 record matching the condition. Presumably, it will first find a ton of old records, scanning almost the entire table until it finds an unprocessed one near the end.
How do we improve this?
Consider that you have an index on processingID. In InnoDB, the primary key is always appended to every secondary index (which is how the index can "point" to anything in the first place). So you really have an index on (processingID, id). That means ordering on that pair will be fast.
Change your ordering to: ORDER BY processingID, id
Since you have fixed processingID to a single value with your WHERE clause, this does not change the resulting order. However, it does make it easy for the database to apply both your condition and your ordering without scanning any records that do not match.
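Concretely, the original query becomes (unchanged apart from the ORDER BY clause):

```sql
UPDATE email_queue
SET processingID = '1'
WHERE processingID = 0
ORDER BY processingID, id
LIMIT 1;
```

The (processingID, id) index then satisfies both the filter and the ordering, so MySQL can stop at the first matching entry instead of sorting or scanning the whole table.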
One funny thing is that MySQL, by default, often returns rows ordered by ID rather than in the arbitrary order that relational theory allows (I am not sure whether this behaviour has changed in the latest versions). So the last row you get from a select should be the last inserted row. I would not rely on this, of course.
As you said, the best solution is to use something like Resque, or RabbitMQ & co.
You could use an in-memory table, which is volatile but much faster, to store the latest ID there, or use a MyISAM table to add persistence. It is simple, performs well, and takes little time to implement.

Index counter shared by multiple tables in mysql

I have two tables, each one has a primary ID column as key. I want the two tables to share one increasing key counter.
For example, when the two tables are empty, and counter = 1. When record A is about to be inserted to table 1, its ID will be 1 and the counter will be increased to 2. When record B is about to be inserted to table 2, its ID will be 2 and the counter will be increased to 3. When record C is about to be inserted to table 1 again, its ID will be 3 and so on.
I am using PHP as the outside language. Now I have two options:
Keep the counter in the database as a single-row-single-column table. But every time I add things to table A or B, I need to update this counter table.
I can keep the counter as a global variable in PHP. But then I need to initialize the counter from the maximum key of the two tables at the start of apache, which I have no idea how to do.
Any suggestion for this?
The background is that I want to display a mix of records from the two tables in either ASC or DESC order of creation time. Furthermore, the records will be displayed in pages, say, 50 records per page. Records are only ever added to the database, never removed. With my implementation above, I can just perform a "select ... where key between 1 and 50" on the two tables, merge the result sets, sort the 50 records by ID, and display them.
Is there any other idea of implementing this requirement?
Thank you very much
Well, you will gain next to nothing with this setup; if you just keep the datetime of the insert you can easily do
SELECT * FROM
(
SELECT columnA, columnB, inserttime
FROM table1
UNION ALL
SELECT columnA, columnB, inserttime
FROM table2
) AS t
ORDER BY inserttime
LIMIT 0, 50
And it will perform decently.
Alternatively (if chasing the last drop of performance): if you are merging the results anyway, that can be an indicator that the tables themselves should be merged (why have two tables at all if you always merge the results?).
Or model it as an SQL supertype/subtype: one parent table maintains the IDs and other common attributes, and the two existing tables reference the common ID sequence as a foreign key.
If you need creation time, won't it be easier to add a timestamp field to your tables and sort according to that field?
I believe using ids as a reference for creation order is bad practice.
If you really must do this, there is a way. Create a one-row, one-column table to hold the last-used row number, and set it to zero. On each of your two data tables, create a BEFORE INSERT trigger to read that table, increment it, and set the newly-inserted row's id to that value (it must be BEFORE INSERT, since the row can no longer be modified after the insert has happened). I can't remember the exact syntax because I haven't created a trigger for years; see here http://dev.mysql.com/doc/refman/5.0/en/triggers.html
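A rough sketch of that setup (table, column, and trigger names are illustrative; check the exact syntax against the manual):

```sql
CREATE TABLE shared_counter (last_id INT NOT NULL);
INSERT INTO shared_counter VALUES (0);

DELIMITER //
CREATE TRIGGER table1_before_insert
BEFORE INSERT ON table1
FOR EACH ROW
BEGIN
    -- reserve the next value, then stamp it onto the incoming row
    UPDATE shared_counter SET last_id = last_id + 1;
    SET NEW.id = (SELECT last_id FROM shared_counter);
END//
DELIMITER ;
-- ...and create the identical trigger on table2
```

A BEFORE INSERT trigger is required here because NEW.id can only be assigned before the row is written.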