Please give me advice on what covering index should be created for the following query:
SELECT id
FROM user
WHERE email_address = '...' AND
hashed_password = SHA2(CONCAT(salt, '...'), 256);
Because salt is also a column in the table, I'm unsure which of these is correct:
INDEX (email_address, hashed_password, id)
INDEX (email_address, hashed_password, salt, id)
Then again, since email address returns one row, putting hashed_password in the index seems redundant.
You are correct, it's a bit redundant.
As long as one of the indexes begins with email that is fine. You'll probably want a unique key on email alone to prevent duplicate email addresses being entered.
An InnoDB index (and probably those of other engines) implicitly ends in the primary key (assumed to be id), so there's never a need to add it explicitly.
While adding hashed_password and salt to the index can improve performance by allowing the entire query to be served from the index, it does increase the index size slightly, and I'm not sure you'll gain much from it, particularly as this just looks like a login.
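A minimal sketch of both variants, assuming MySQL with InnoDB and id as the primary key. Only the column names come from the question; the column types, the index names uq_user_email and ix_user_login, and the literal values are illustrative assumptions.

```sql
-- Assumed schema; only the column names are taken from the question.
CREATE TABLE user (
    id              INT UNSIGNED NOT NULL AUTO_INCREMENT,
    email_address   VARCHAR(255) NOT NULL,
    salt            CHAR(32)     NOT NULL,
    hashed_password CHAR(64)     NOT NULL,  -- SHA2(..., 256) returns 64 hex characters
    PRIMARY KEY (id),
    -- Enough for the login lookup, and it also rejects duplicate addresses.
    UNIQUE KEY uq_user_email (email_address)
) ENGINE=InnoDB;

-- Optional covering variant. id is not listed because InnoDB
-- stores the primary key in every secondary index entry anyway.
ALTER TABLE user
    ADD INDEX ix_user_login (email_address, hashed_password, salt);

-- Placeholder values. If the optimizer picks ix_user_login, the Extra column
-- shows "Using index"; with the unique key in place it may prefer that key.
EXPLAIN
SELECT id
FROM user
WHERE email_address = 'someone@example.com'
  AND hashed_password = SHA2(CONCAT(salt, 'secret'), 256);
```

Comparing the EXPLAIN output with and without ix_user_login on realistic data is the quickest way to see whether the wider index is worth its size.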
As a general rule for index selection, you should look at all the queries that you will be running against that table and select an index using the smallest number of columns common to all of those queries.
A common choice might be the primary key (PK). In the case of a table of users, the PK is probably a good choice.
For example, if there is another use of the "user" table that, say, just returns the user's details (e.g. department, phone number, full name etc) that is used by a system administrator, would the system administrator have that user's password?
Answer: probably not. So I would be inclined in this case to just use the email_address, assuming that this is equivalent to the user ID.
Having said that, I note that you are selecting "id" in your query. What is the id column? Is that the real user ID, i.e. is that the value they would normally enter when logging in or changing their password? If so, then "id" is probably a better candidate for the index.
FWIW, as written, if the user gets the password (or the email_address) wrong, then the query will return no records. If they get it right, assuming email_address is unique in the table, then it will return one record. Is that what you were expecting?
I have a users table, that contains many attributes like email, username, password, phone, etc.
I would like to save a new type of data (an integer), let's call it "superpower", but only very few users will have it. The users table contains 10K+ records, while fewer than 10 users will have a superpower (for all others it will be null).
So my question is which of the following options is more correct and better in terms of performance:
add another column in the users table called "superpower", which will be null for almost all users
have a new table called users_superpower, which will contain at most 10 records and will map users to superpowers.
Some things I have thought about:
a. the first option seems wasteful of space, but it's really just an integer...
b. the second option will require a left join every time I query the users...
c. will the answer change if the "superpower" data was 5 columns, for example?
Note: I'm using Hibernate and MySQL, if it changes the answer
This might be a matter of opinion. My viewpoint on this follows:
If superpower is an attribute of users and you are not in the habit of adding attributes, then you should add it as a column. 10,000*4 additional bytes is not very much overhead.
If superpower is just one attribute and you might add others, then I would suggest using JSON or an EAV (entity-attribute-value) table to store the value.
If superpower is really a new type of user with other attributes and dates and so on, then create another table. In this table, the primary key can be the user_id, making the joins between the tables even more efficient.
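The first and third options could look roughly like this. It is only a sketch: it assumes the existing table is called users, has an INT primary key named id and a username column, and uses the users_superpower name from the question.

```sql
-- Option 1: a nullable column on the existing table (NULL for almost every user).
ALTER TABLE users ADD COLUMN superpower INT NULL;

-- Option 3: a separate table keyed by the user's id,
-- holding one row per user who actually has a superpower.
CREATE TABLE users_superpower (
    user_id    INT NOT NULL,          -- must match the type of users.id (assumed INT)
    superpower INT NOT NULL,
    PRIMARY KEY (user_id),
    FOREIGN KEY (user_id) REFERENCES users (id)
) ENGINE=InnoDB;

-- With the separate table, reading users together with their
-- optional superpower needs a LEFT JOIN.
SELECT u.id, u.username, s.superpower
FROM users AS u
LEFT JOIN users_superpower AS s ON s.user_id = u.id;
```

Because user_id is the primary key of users_superpower, the join is a primary-key lookup on both sides.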
I would go with just adding a new boolean field in your user entity which keeps track of whether or not that user has superpowers.
Appreciate that adding a new table and linking it requires the creation of a foreign key in your current users table, and this key will be another column taking up space. So it doesn't really get around the storage cost. If you just want a really small column to store whether a user has superpowers, you can use a boolean variable, which would map to a MySQL BIT(1) column. Because this is a fixed-width column, NULL values would still take up a single bit of space, but this is most likely not a big storage concern compared to the rest of your table.
I have an application that authenticates with LDAP and returns a JWT with the sAMAccountName of the logged-in user.
This application has a MySQL database where I'd like to store the user in different tables (fields like createdBy, updatedBy, etc.), and I was wondering what the correct way of handling this is:
using the sAMAccountName as identifier (so createdBy will be a VARCHAR(25))
using a link table to match the sAMAccountName with an auto-incremented identifier
Normally I would choose the "id" way, it's faster and easier to read in my opinion, but I'm not really into linking users from the LDAP directory and changing their id in my database, so honestly I would choose the first option.
What are the pros/cons of using a string as uid? In my case it's likely to be only for statuses like updatedBy, createdBy, deletedBy etc., so I won't have hard links between multiple tables using a user identifier.
I think you should create a user table with a surrogate primary key (an auto-incrementing one) and put a unique index on the sAMAccountName column.
Natural primary keys are good because they naturally describe the record they point to. The downside of using them is that they consume more space in the index, so index lookups and rebuilds are slower. Tables consume more space as well.
I'd connect everything using an id as primary key.
One thing is that the sAMAccountName is not necessarily stable. Think of a user changing her or his name. The sAMAccountName might then change, but it's still the same user. When you connect everything via an ID you can change the sAMAccountName field without breaking everything.
But that's just my 2 cents
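A rough sketch of the surrogate-key layout suggested in these answers. The table names app_user and document, the column names, and the account names 'jdoe' and 'jsmith' are all made up for illustration; only the VARCHAR(25) length comes from the question.

```sql
-- One row per LDAP account; the integer id is what other tables point at.
CREATE TABLE app_user (
    id               INT UNSIGNED NOT NULL AUTO_INCREMENT,
    sam_account_name VARCHAR(25)  NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_app_user_sam (sam_account_name)
) ENGINE=InnoDB;

-- Hypothetical business table with audit columns.
CREATE TABLE document (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title      VARCHAR(200) NOT NULL,
    created_by INT UNSIGNED NOT NULL,
    updated_by INT UNSIGNED NULL,
    PRIMARY KEY (id),
    FOREIGN KEY (created_by) REFERENCES app_user (id),
    FOREIGN KEY (updated_by) REFERENCES app_user (id)
) ENGINE=InnoDB;

-- On login: register the account name from the JWT if it is not known yet.
INSERT IGNORE INTO app_user (sam_account_name) VALUES ('jdoe');

-- A rename in the directory becomes a single UPDATE; document rows are untouched.
UPDATE app_user SET sam_account_name = 'jsmith' WHERE sam_account_name = 'jdoe';
```

The trade-off is one extra lookup (or join) whenever the account name has to be shown next to an audit column.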
I want to create a table with accounts. It should contain an AccountName (primary key), a Password and an Email.
But should I put an identification number in it as the primary key, or is the AccountName enough?
What are the standards, benefits, drawbacks?
If the "AccountName" field is a string, I recommend you create an AUTO_INCREMENT ID. Suppose that one day you have a lot of data in your table and you need to select 100,000 rows and order all of them.
Having an INT AUTO_INCREMENT field to order by is nice in that case.
It can help when you need to DELETE an item too: identifying rows by an ID column is faster and easier than by VARCHAR fields.
My two cents would be to always have an id number as the primary key since it is independent of the actual business rules. What I mean is the only reason you have the option is that the Account Name today is going to be unique. But what if down the line that changes? If you have a separate field as the primary key, you won't run into any hassle down the line if your requirements change.
You must have a primary key. Whether it's an ID, a name or an email is your decision. You must take the data model and requirements into consideration.
An ID column is NOT required of course. But I have not seen any professional software that has no id column.
It is better to add an auto incremented primary key in order to have better performance on related tables. IDs are never changed once they are created. But account names, passwords or email addresses may be changed by the user.
Another point is that an integer occupies less memory than a string.
If you are using InnoDB you will probably need indexes. So it is good to define a primary key, and make it numeric. The primary key should be sequential and small.
A couple of years ago, one of my mistakes was indexing tables on e-mail addresses. I was thinking that an e-mail address can only belong to one person. An e-mail address (e.g. x@c.com) takes at least 7 bytes. What if I had to relate 10 tables? That makes 70 bytes for a single row. So any new row would require 63 more bytes in vain.
You can make any column the primary key, it's up to you, but it is always better to use an auto-increment ID column as the primary key.
In the future, as the number of records grows, you will run into problems if two accounts have the same AccountName.
I've got the following situation: I want to store data that represents whether a user is following another user. Another table, which I cannot touch, stores the users, with the username as the primary key (unfortunately no id...).
If one user follows another one, that doesn't mean the other one is following the first one.
Right now, I have designed the table with two VARCHAR(128) columns, which represent the usernames, and a UNIQUE INDEX on those two columns.
The problem is that I need to parse some old-style system now; I have finished about 15% and already have 550k entries in this table.
The index is bigger than 16MB, and the data is just 14MB.
What could I do to store this data in a better way? As said, I cannot use ids instead of the usernames, because the user table uses the username as primary key.
As you have noticed, creating a separate index on all columns essentially forces MySQL to duplicate all data in the index.
Instead of creating a separate unique index, you can create a primary key consisting of both of your fields. MySQL (InnoDB) uses the primary key as the clustered index, so your uniqueness constraint is still satisfied without increasing the size of your database.
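As a sketch, assuming InnoDB and made-up names (user_follow, follower, followed), the composite primary key would be declared like this:

```sql
-- The two usernames together form the primary key, so InnoDB stores the
-- rows in that order and no separate unique index is needed.
CREATE TABLE user_follow (
    follower VARCHAR(128) NOT NULL,  -- the user who follows
    followed VARCHAR(128) NOT NULL,  -- the user being followed
    PRIMARY KEY (follower, followed)
) ENGINE=InnoDB;

-- "Whom does alice follow?" uses the leftmost column of the primary key.
SELECT followed FROM user_follow WHERE follower = 'alice';
```

The reverse lookup ("who follows alice?") filters on the second column only, so it would need its own index on followed, which brings back some of the index size.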
You might consider building your own lookup table that maps ID -> username.
You could then use the IDs to map the followers.
This will cause some extra overhead if you want to retrieve all the data.
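A possible shape for that lookup table, with hypothetical names (username_map, user_follow_by_id); the original user table stays untouched:

```sql
-- Maps each username from the untouchable user table to a compact integer.
CREATE TABLE username_map (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    username VARCHAR(128) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_username_map_username (username)
) ENGINE=InnoDB;

-- Follower relation stored as two 4-byte integers instead of two strings.
CREATE TABLE user_follow_by_id (
    follower_id INT UNSIGNED NOT NULL,
    followed_id INT UNSIGNED NOT NULL,
    PRIMARY KEY (follower_id, followed_id)
) ENGINE=InnoDB;

-- The extra overhead mentioned above: two joins to get back to usernames.
SELECT f.username AS follower, t.username AS followed
FROM user_follow_by_id AS uf
JOIN username_map AS f ON f.id = uf.follower_id
JOIN username_map AS t ON t.id = uf.followed_id;
```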
I have a MySQL table that records classified listings. We don't force users to join to post a listing, and therefore the listing will not always have a user_id associated with it.
I therefore need a method of recording the poster's email if they are not signed in.
Is it bad practice to create a column email that will sometimes be blank and sometimes be filled?
Or is there a better way to go about this that I don't realize?
"Is it bad practice to create a column email that will sometimes be blank and sometimes be filled?"
It is not a bad practice, no: just use a NULL column -- that's why they exist ;-)
See 12.1.17. CREATE TABLE Syntax : in the column_definition part of the create table query, you can specify NULL or NOT NULL.
BTW: Using NULL, which literally means "no value" is better than using some kind of "impossible value", like an empty string : NULL really means "no value", and make your point obvious -- while an empty string could mean an error in your code.
And I don't really see another "logical" way, actually...
Note, though, that you'll have to handle a NULL value for the email, in your application's code, of course ;-)
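For illustration, a listing table along those lines might look like the sketch below. The table name, column types and sample rows are assumptions; only user_id and email come from the question.

```sql
CREATE TABLE listing (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title   VARCHAR(200) NOT NULL,
    user_id INT UNSIGNED NULL,   -- NULL when the poster is not signed in
    email   VARCHAR(255) NULL,   -- NULL when the poster is a registered user
    PRIMARY KEY (id)
) ENGINE=InnoDB;

INSERT INTO listing (title, user_id, email) VALUES
    ('Bike for sale', 42, NULL),
    ('Sofa, free to collect', NULL, 'guest@example.com');

-- NULL is tested with IS NULL / IS NOT NULL; a comparison such as
-- "user_id = NULL" never matches any row.
SELECT id, title, email FROM listing WHERE user_id IS NULL;
```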
this is exactly what NULL is for. but you already knew that because your user_id column will also sometimes be NULL, right?
I think the approach you have laid out is perfectly acceptable. As longneck points out, that's what NULL is for in SQL databases.
However, if you're truly concerned about it, you could save space (possibly a significant amount, depending on the column type and number of rows) if you use the user_id column for the userid and the email address, and then have another boolean column, say is_email to distinguish which type of value is stored in the user_id column. This may simplify things for you because it is likely that your application does not care, in many places, whether the data is actually a user_id or an email address.
What is the business key of your user entity? Or, more directly: what is your user entity? Is every distinct email address a key for a User entity with some users having registered and their email set in some profile, and others not registered and giving an email address every time they post? Or do you have two distinct entities, RegisteredUser and UnknownPosterWithEmailAddress, with their attributes stored in separate places?
In the latter case, you would use a NULLable user_id and a NULLable email field, like you suggested, but then queries like "for a given post, find the email address the reply should be sent to" are going to be awkward, e.g. a list of all posts with their respective reply addresses will look like this:
select post.id,
       case when post.user_id is not null then user.email
            else post.email end as email
from post
left join user on user.id = post.user_id;
This can get real messy after a while.
I'd rather use the former approach: each row in User is a distinct poster, with a non-NULLable unique email address, and a surrogate key as foreign key in posts:
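create table user(
    id integer primary key,
    email varchar(255) not null unique,  -- varchar: MySQL cannot index a text column without a prefix length
    is_registered boolean default false
);
create table post(
    id integer primary key,
    user_id integer not null,
    content text,
    foreign key (user_id) references user(id)  -- MySQL ignores an inline "references" clause
);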
If a non-registered user enters an email address, you look it up in the user table, and retrieve the user.id, adding a new entry in user if necessary. As a result, you can answer questions like: for a given email address, how many posts has this user made in the past week? via the foreign key field, without having to compare strings in some NULLable attribute field.
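The look-up-or-create step can be sketched in MySQL as follows. It assumes user.id and post.id are AUTO_INCREMENT, which the schema above does not state, and it uses a hypothetical created_at column on post for the "past week" question; the email address and post content are placeholders.

```sql
-- Create the poster if the address is new; if it already exists, the
-- no-op update makes LAST_INSERT_ID() return the existing row's id.
INSERT INTO user (email) VALUES ('guest@example.com')
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);

-- LAST_INSERT_ID() here still refers to the user row from the statement above.
INSERT INTO post (user_id, content) VALUES (LAST_INSERT_ID(), 'Selling my bike');

-- Posts by this address in the past week, resolved through the foreign key.
-- created_at is hypothetical; it is not part of the post table above.
SELECT COUNT(*)
FROM post
JOIN user ON user.id = post.user_id
WHERE user.email = 'guest@example.com'
  AND post.created_at >= NOW() - INTERVAL 7 DAY;
```

The unique constraint on email is what makes the first statement safe under concurrent posts from the same address.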
When a user chooses to register, you can add the registration data either in user itself or in some separate table (again with user.id as a foreign key, some might argue that a boolean field is_registered is actually redundant then). Added benefits:
If he has posted before under the same email address, now all of his old posts become associated with his new registered identity automatically.
If the user changes his email address in his profile, all replies to older posts of his "see" the new updated email address.