I have a server that uses a MySQL database for storing users. The primary key is an integer with auto-increment.
The problem is that when registration fails (on the website provided by my server), the counter still increases by 1. For example: if a user succeeds in signing up, the user gets id 1, just as it should be. But if a user then fails to register (the username already being taken, for example), and afterwards a user succeeds in registering, that user gets id 3, because the failed attempt consumed id 2.
I am looking for syntax like: UserId-- or UserId = UserId - 1.
Auto-increment values are not intended to be consecutive. They must be unique, that's all.
If you try to decrement the auto-increment value on error, you'll create a race condition in your app. For example:
1. Your session tries to register a user. Suppose this generates id 41.
2. A second session running at the same time also tries to register a user. This generates id 42.
3. Your session returns an error, because the username you tried to register already exists. So you force MySQL to decrement the auto-increment back to 41.
4. A third session registers another user, using id 41. The auto-increment advances, so the next registration will use id 42.
5. The next session tries to register with id 42, but this id has already been used, resulting in mass hysteria, the stock market crashes, and dogs and cats start living together.
Lesson: Don't get obsessed with ids being consecutive. They're bound to have gaps from time to time. Either an insert fails, or you roll back a transaction, or you delete a record, etc. There are even some bugs in MySQL that cause auto-increment to skip numbers sometimes, but there's no actual consequence to those bugs.
The way to avoid the race condition described above is that auto-increment must not decrement, even if an insert fails. It only increases. This does result in "lost" values sometimes.
I helped a company in exactly the same situation you are in, where registration of new usernames caused errors if the username was already taken. In their case, we found that someone using their site was creating a new account every day, and they were automating it. They would try "user1" and get an error, then try "user2" and "user3" and so on, until they found one that was available. But it was causing auto-increment values to be discarded, and every day the gap became larger and larger. Eventually, they ran out of integers, because the gaps were losing 1500+ id values for each registration.
(The site was an online gambling site, and we guessed that some user was trying to create daily throwaway accounts, and they had scripted a loop to register user1 through userN, but they start over at 1 each day, not realizing the consequences for the database.)
I recommended to them this fix: Change the registration code in their app to SELECT for the username first. If it exists, then return a friendly warning to the user and ask them to choose another name to register. This is done without attempting the INSERT.
If the SELECT finds no such username, then try the INSERT, but be ready to handle the error anyway, in case some other session "stole" that username in the moment between the SELECT and the INSERT. Hopefully this will be rare, because there is only a slim chance for someone to sneak their registration in between those two steps.
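As a rough sketch of the SELECT-then-INSERT pattern (using SQLite here so it runs anywhere; the table and column names are hypothetical, but the shape is the same in MySQL). The UNIQUE constraint remains the real guarantee; the SELECT is only a friendly pre-check that avoids burning an auto-increment value in the common case:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY AUTOINCREMENT, username TEXT UNIQUE)"
)

def register(conn, username):
    # Pre-check: if the name is taken, return without attempting the INSERT,
    # so no auto-increment value is consumed.
    row = conn.execute(
        "SELECT 1 FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row:
        return None  # caller shows "name taken, please choose another"
    try:
        cur = conn.execute(
            "INSERT INTO users (username) VALUES (?)", (username,)
        )
        conn.commit()
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Another session stole the name between the SELECT and the INSERT.
        return None
```

Note that the duplicate-name path returns before any INSERT runs, so repeated failed attempts no longer widen the gap.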
In any case, do not feel obliged to have consecutive id values.
Related
I have a central database containing millions of IDs. And I have a group of users (50-100 users), all being able to request extraction of IDs from this big database.
At the moment, when a user sends a GET request, I SELECT 100 ids, then UPDATE them with the flag USED and return the 100. The problem is that if I get too many requests at the same time, multiple users will receive the same ids (because I don't lock the db when doing the select and then the update).
If I lock the database my problem will be solved, but it will also be slower.
What other alternative I have?
Thanks!
Look ahead another step... What if a "user" gets 100 rows, then keels over dead. Do you have a way to release those 100 for someone else to work on?
You need an extra table to handle "check out" and "check in". Also, use that table to keep track of the "next" 100 to assign to a user.
When a user checks out the 100, a record of that is stored in the table, together with a timestamp and "who" checked them out. If they don't "check them back in" within, say, an hour, then you assign that 100 to another user.
Back on something more mundane... How to pick 100. If there is an auto_increment id with no gaps, then use simple math to chunk up the list. If there are a lot of gaps, then use SELECT id FROM tbl WHERE id > $leftoff ORDER BY id LIMIT 100, 1 to get the end of the next 100.
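A minimal sketch of the check-out/check-in table described above, using SQLite and invented table and column names. A real multi-user implementation would also need the SELECT and UPDATE on the checkout table to run atomically (e.g. inside a transaction with row locking), which is omitted here:

```python
import sqlite3
import time

LEASE_SECONDS = 3600  # chunks not returned within an hour can be reassigned

conn2 = sqlite3.connect(":memory:")
conn2.execute("""CREATE TABLE checkout (
    start_id INTEGER, end_id INTEGER, user TEXT,
    taken_at REAL, returned INTEGER DEFAULT 0)""")

def check_out(conn, user, now=None):
    now = time.time() if now is None else now
    # Prefer reassigning an expired, unreturned chunk.
    row = conn.execute(
        """SELECT start_id, end_id FROM checkout
           WHERE returned = 0 AND taken_at < ?
           ORDER BY start_id LIMIT 1""",
        (now - LEASE_SECONDS,)).fetchone()
    if row:
        conn.execute(
            "UPDATE checkout SET user = ?, taken_at = ? WHERE start_id = ?",
            (user, now, row[0]))
        return row
    # Otherwise hand out the next fresh chunk of 100.
    last = conn.execute(
        "SELECT COALESCE(MAX(end_id), 0) FROM checkout").fetchone()[0]
    conn.execute(
        "INSERT INTO checkout (start_id, end_id, user, taken_at) VALUES (?,?,?,?)",
        (last + 1, last + 100, user, now))
    return last + 1, last + 100

def check_in(conn, user, start_id):
    conn.execute(
        "UPDATE checkout SET returned = 1 WHERE user = ? AND start_id = ?",
        (user, start_id))
```

The `returned` flag is the "check in"; expired leases are recycled on the next check-out rather than by a background job, which keeps the sketch small.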
If each user has their own key, you could pull from the millions of IDs starting from their key*10000. For example, user #9 would first get IDs #90000 to #90099, then #90100 to #90199 next time.
You could set the IDs as "Used" before they get sent back, so one user requesting IDs multiple times will never get duplicates. This needn't lock the database for other users.
If they don't request keys more than 100 times before the database can update, this should avoid collisions. You might need to add logic to allow users who request often not to run out, like by having a pool of IDs that can repopulate their supply, but that depends on particulars that aren't clear from the original question.
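The per-user arithmetic above can be sketched like this. The block size of 10000, the batch size of 100, and the in-memory counter are all assumptions; in practice the per-user request count would live in a table:

```python
BLOCK = 10000  # each user key owns a block this wide
BATCH = 100    # ids handed out per request

request_counts = {}  # user_key -> number of batches already taken

def next_batch(user_key):
    # user k's n-th batch starts at k*BLOCK + n*BATCH.
    n = request_counts.get(user_key, 0)
    request_counts[user_key] = n + 1
    start = user_key * BLOCK + n * BATCH
    return start, start + BATCH - 1
```

Because each user draws only from their own block, no locking is needed to prevent two users from receiving the same ids; the limit is that a user exhausts their block after BLOCK / BATCH requests.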
I am using a MySQL table to store a session record for the currently logged-in user. Once the user logs off, I update a few fields in the same record and flag it (revoked) so that it should not be used again. So for every login a new record is created. This serves my purpose, but it turns out that the table is going to grow huge.
What should be the standard approach for storing sessions? Should the revoked ones be stored in a separate table, should they be deleted, or should they be left in the same table?
I am considering leaving the data in the same session table. When querying for a particular record, I query on two fields: idPeople (not unique) and revoked (0 or 1), for example SELECT * FROM session WHERE idPeople = "someValue" AND revoked = 0, and then update the record if needed while the user is logged in or logging out. Will the huge size of the table affect this, or will MySQL handle it? And what other ramifications are there that I am unable to see?
First, it may be a good idea to add a unique field to your table (e.g. SESSION_ID, which could be a running auto-increment number), define this field as a unique ID, and use it to quickly find the record to be updated (i.e. set revoked=1).
Second, this type of table always triggers the question you are asking, and the best answer can only be given after you assess and answer some preliminary questions, for instance:
When you wish to check the activities of a user, how far into the past does it make sense to go? One month? One year?
What is the longest period for which you may wish to keep this information available (even using non-routine queries to retrieve it)?
What types of questions (queries) do you expect to be asked of this table?
Once you answer those questions, you can consider the following options:
Have a routine process that would run once a day (at midnight or any other time your system can afford it) which would delete rows whose timestamp is older than, say, one month (or any other period suiting your needs), OR
Same as above, but first copy those records to a "history" table, OR
Change the structure of your table to a more efficient one, by adding some fields (as suggested above) and indices that would provide good answers for your "SELECT" needs.
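A sketch of the nightly cleanup with a history table (the second option above), using SQLite and assumed table and column names (`last_seen` stands in for whatever timestamp your session rows carry):

```python
import sqlite3
import time

conn4 = sqlite3.connect(":memory:")
conn4.executescript("""
CREATE TABLE session (session_id INTEGER PRIMARY KEY, idPeople INTEGER,
                      revoked INTEGER, last_seen REAL);
-- same columns, starts empty: the archive target
CREATE TABLE session_history AS SELECT * FROM session WHERE 0;
""")

def purge_old_sessions(conn, max_age_seconds, now=None):
    """Archive then delete sessions older than the cutoff; returns rows purged."""
    now = time.time() if now is None else now
    cutoff = now - max_age_seconds
    conn.execute(
        "INSERT INTO session_history SELECT * FROM session WHERE last_seen < ?",
        (cutoff,))
    cur = conn.execute("DELETE FROM session WHERE last_seen < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Scheduled once a day (cron or an event scheduler), this keeps the live session table small while the history table absorbs the growth, which is where an index on last_seen earns its keep.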
I want to make a login system. I want confirmation by sending an activation code (a clickable link) by email. I considered storing the activation key in a separate table from the user information, since it is only relevant for non-activated users. When a user registers, a row containing the user information would be inserted in the users table and the activation key in the activation table. Once the link is clicked, I remove the record from the activation table.
But since I have no way of using InnoDB on my hosting, this is not fault-proof, since I can't use transactions. I have two options.
Option A:
I keep the key in the activation table.
I store a boolean in the user table to check whether activation is necessary. If activation is needed and no record is found in the activation table, there can be a new attempt to add the record and resend an email to the user.
* more checks (php, in case no record was found)
* joins in selects for checking
* more inserts/deletes/updates
Option B:
Or I can store the activation key in the user table, using more space that is not always needed.
* Does unused storage always take up space with MyISAM?
* What is the recommended length for an activation key?
* Is the boolean still needed, or can I set the activation key to NULL to check whether a user has been activated or not?
What is the best solution and why? Speed, space, ...?
The heart of your question seems to relate to the moment of activation. You seem to be concerned that you'll get a transactional race condition around activation, and you won't be able to prevent it because you must use MyISAM instead of InnoDB.
This doesn't seem to be a critical problem. No harm will be done if a new user attempts to activate multiple times with the same correct token, or if she attempts to activate with an incorrect token at the same time as a correct token.
What is a critical success factor? The performance of a normal authentication operation (login) for an active user. If your query must join to a separate activation-token table for every user that logs in, that's not going to give you ideal performance. Neither is a query containing a clause like this:
AND user.activation_token IS NOT NULL
You may want to use a Boolean column value (a short integer) to indicate "activation pending" in your user table. If that column comes up true during normal login, you can invoke the extra, infrequently used, logic for activation. If, for example, you need to be able to accelerate this operation:
SELECT hashed_password, activation_pending
FROM user
WHERE username = ?
you can create a compound index on (username, hashed_password, activation_pending) and make that operation very efficient. Then when you successfully complete the pending activation (a relatively infrequent operation), you can do these two operations.
UPDATE user
SET activation_pending = 0
WHERE username = ?;
DELETE
FROM activation_token
WHERE username = ?;
Once activation_pending is set to zero, that's enough for the race condition: your logic won't look at your activation_token table.
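Putting the pieces together as a sketch (SQLite here for runnability; the table and column names follow the answer, while the index name is invented). The point is that the hot path, login, reads only the user table via the compound index, and the activation_token table is touched only on the rare activation path:

```python
import sqlite3

conn5 = sqlite3.connect(":memory:")
conn5.executescript("""
CREATE TABLE user (username TEXT PRIMARY KEY, hashed_password TEXT,
                   activation_pending INTEGER);
CREATE INDEX idx_login ON user (username, hashed_password, activation_pending);
CREATE TABLE activation_token (username TEXT, token TEXT);
""")

def login(conn, username):
    # Hot path: covered by the compound index, never joins to activation_token.
    return conn.execute(
        "SELECT hashed_password, activation_pending FROM user WHERE username = ?",
        (username,)).fetchone()

def complete_activation(conn, username):
    # Infrequent path: clear the flag, then drop the token row.
    conn.execute(
        "UPDATE user SET activation_pending = 0 WHERE username = ?", (username,))
    conn.execute(
        "DELETE FROM activation_token WHERE username = ?", (username,))
    conn.commit()
```

Even if complete_activation runs twice for the same user (the MyISAM race the question worries about), both runs leave the same end state, so the lack of transactions is harmless here.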
varchar columns don't take much space if they contain zero-length strings. char columns do.
My application is generating the ID numbers when registering a new customer then inserting it into the customer table.
The method for generating the ID is by reading the last ID number then incrementing it by one then inserting it into the table.
The application will be used in a network environment with more than 30 users, so there is a possibility (probability?) that at least two users will read the same last ID number at the saving stage, which means both will get the same ID number.
Also, I'm using transactions. I need a logical solution that I couldn't find on other sites.
Please reply with a description so I can understand it very well.
Use an auto-increment; you can get the last id issued with LAST_INSERT_ID() (exposed in the client API as mysql_insert_id).
If for some reason that's not doable, you can create another table to hold the last id used, increment that in a transaction, and then use it as the key for your insert into the main table. It has to be two transactions though, otherwise you'll have the same issue you have now. That can get messy and is an extra level of maintenance. (Reset your next-id table to zero while there are still rows in the related table, and things go belly up quickly.)
Short of putting an exclusive lock on the table during the insert operation (not even slightly recommended), your current solution just can't work.
Okay, here's an expanded answer based on leaving the schema as it is.
Option 1 in pseudocode:

    StartTransaction
    try
        NextId = GetNextId(...)
        AddRecord(NextId, ...)
        commit transaction
    catch PrimaryKeyViolation
        rollback transaction
        do the entire thing again
    end
Obviously you could end up in an infinite loop here; it's unlikely but possible, and you'd probably run out of stack space first.
You could somehow queue the requests and then attempt to process them, removing each from the queue on success.
BUT make CustomerId an auto-increment and the entire problem disappears.
It will still be the primary key; you just don't have to work out what it needs to be any more. In fact you don't supply it in the insert statement at all: MySQL will just take care of it for you.
The only thing you have to remember, if you need the id that was automatically created, is to request it in the same transaction.
So your insert query needs to be of the form:

    INSERT INTO SomeTable (SomeColumns) VALUES (SomeValues);
    SELECT LAST_INSERT_ID();

or, if multiple statements get in the way, wrap the two statements in a START TRANSACTION ... COMMIT pair.
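A minimal sketch of letting the database assign the id, using SQLite so it runs standalone; cursor.lastrowid plays the same role as MySQL's LAST_INSERT_ID(), and the customer table is hypothetical:

```python
import sqlite3

conn6 = sqlite3.connect(":memory:")
conn6.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)

def add_customer(conn, name):
    # No id supplied: the database assigns the next value atomically,
    # so two concurrent sessions can never read the same "last id".
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", (name,))
    conn.commit()
    return cur.lastrowid  # the id the database just assigned to this insert
```

The read-then-increment race disappears because the application never reads the last id at all; it only receives the id its own insert was given.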
Currently I am having a hard time deciding/weighing the pros/cons of tracking login information for a member website.
Currently
I have two tables, login_i and login_d.
login_i contains the member's id, password, last login datetime, and total count of logins. (member id is primary key and obviously unique so one row per member)
login_d contains a list of all login data in history, which tracks each and every time a login occurs: the member's id, the datetime of the login, and the IP address it came from. This table's primary key is simply an auto-incremented INT field, really purposeless, but I need a primary key and it's the only unique single field (an index, on the other hand, is different, but I'm still not concerned).
In many ways I see these tables as being very similar but the benefit of having the latter is to view exactly when a member logged in, how many times, and which IP it came from. All of the information in login_i (last login and count) truthfully exists in login_d but in a more concise form without ever needing to calculate a COUNT(*) on the latter table.
Does anybody have advice on which method is preferred? Two tables will exist regardless but should I keep record of last_login and count in login_i at all if login_d exists?
added thought/question
A good comment was made below: what about also tracking login attempts based on a username/email/ip? Should this ALSO be stored in a table (a third table, I assume)?
this is called denormalization.
you ideally would never denormalize.
it is sometimes done anyway to save on computationally expensive results - possibly like your total login count value.
the downside is that you may at some point get into a situation where the value in one table does not match the values in the other table(s). Of course you will try your best to keep them properly up to date, but sometimes things happen. In that case, you may introduce bugs in application logic wherever it receives an incorrect value from one of the sources.
In this specific case, a count of logins is probably not that critical to the successful running of the app - so not a big risk - although you will still have the overhead of maintaining the value.
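A sketch of keeping the denormalized columns in step with the detail table (SQLite here, with schema names following the question; in MySQL the UPSERT would be INSERT ... ON DUPLICATE KEY UPDATE). The key point is that both writes commit together, so login_i can only drift from login_d if someone writes to one table outside this path:

```python
import sqlite3

conn7 = sqlite3.connect(":memory:")
conn7.executescript("""
CREATE TABLE login_i (member_id INTEGER PRIMARY KEY,
                      last_login REAL, login_count INTEGER);
CREATE TABLE login_d (id INTEGER PRIMARY KEY AUTOINCREMENT,
                      member_id INTEGER, login_time REAL, ip TEXT);
""")

def record_login(conn, member_id, when, ip):
    # Detail row: the source of truth.
    conn.execute(
        "INSERT INTO login_d (member_id, login_time, ip) VALUES (?,?,?)",
        (member_id, when, ip))
    # Denormalized summary: upserted in the same transaction.
    conn.execute(
        """INSERT INTO login_i (member_id, last_login, login_count)
           VALUES (?, ?, 1)
           ON CONFLICT(member_id) DO UPDATE
           SET last_login = excluded.last_login,
               login_count = login_count + 1""",
        (member_id, when))
    conn.commit()  # both writes become visible together
```

This buys a cheap read of last_login and login_count at the cost of a second write per login; if the summary ever does drift, it can be rebuilt from login_d with a GROUP BY.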
Do you often need the last login and count? If yes, then you should store them in login_i as well. If they're rarely used, then you can take the time to process the query against the giant table of all logins instead of storing duplicated data.