I'm creating a file hosting service, but right now I am creating the account email activation part of registering. So I had to come up with a database structure.
And right now it's:
users
id
first_name
last_name
email
password
since
active
hash_activate
But I can do it like a relational database too:
users
id
first_name
last_name
email
password
since
activation
id
user_id
hash
active
What would be the best way to go about it? And why?
If every person has only one activation hash active at at time, then it's feasible to store it in same table with users.
However, one advantage of separating it is that users only have an activation hash for a brief period of time, so to keep the user records smaller, you could store the hashes in a separate table. Keeping the user records small keeps it more performant. In this case, you wouldn't have active column. You'd just delete inactive hashes.
If you do store the activation columns in the user table, just be sure to select the columns by name. E.g. in most cases, you'll want do this:
SELECT id, first_name, last_name, email, password
FROM users
Instead of:
SELECT *
FROM users
You'd only want to select the activation columns when you needed them.
The second would only be sensible if one user could have multiple activations. You don't say whether this is true or false, so I couldn't possibly advise you.
If activations are a temporary thing, or having a hash defines someone as active, then make them different. Otherwise, that really won't matter.
However, neither is necessarily more or less relational than the other, without much more information. If you put a unique constraint on the combination of values in each row, and set each column up with a NOT NULL constraint, your first one would be quite relational.
You use a relational design when correctness of data, over time, is as important, if not more important, than what the application does with that data, and/or when data structure correctness/consistency is critical to the correct operation of an application, but might not necessarily be guaranteed by the application's own operation.
Related
I know that there is not a "good" answer to my question, and it is opinion based. But since I am now learning those things on my own, I need advice.
I have a table on Mysql about "Customer". In this table there are columns referred to customer's info like name, surname, date of birth, address, and so on.
Each customer has his own credentials (username, password).
Now my question is: It is better to keep credentials in "customer" table, or it has sense to create a separate table, in order to guarantee the protection of these credentials, and also keep track of the changes of them along time, without wasting space repeating all the others customers' info?
You need to answer some questions about your data:
Do the columns change? People change names, addresses, and so on.
The credentials will change, at least the password.
What sort of history do you need?
My recommendation would be different sets of tables for different purposes:
One table that defines the customer id and whatever other immutable information there is (perhaps the date of becoming a customer and related information).
One or more tables with PII (personally identifiable information). You want to keep PII separate for regulatory and privacy reasons.
Tables for history. How you do this depends on your data model and what you need. A simple method is a single archive table per table in your data model. However, I might recommend type-2 tables (i.e. those having version effective and end dates).
Separate tables for credentials. These are even more sensitive than PII and you will want to control access.
Remember to never store clear-text passwords. And often you want to keep a history of passwords to prevent users from using the previous one.
It is better to create personal information in the person table and additional customer information in another table that has a relationship with the customer, and if you have any other information about the person in another table and link it to the table of persons.
I have research some question in stackoverflow, but what I want is for later query purpose, not for logging purose.
I have a project that needs to get value from certain moment.
For example
I have a user table
User:
id
name
address
Pet:
id
name
type
Adoption:
id
user_id
pet_id
Data:
User:
1, John, One Street
Pet:
1, Lucy, Cat
Adoption:
1, 1, 1
Let's say the user change address so it look like
User:
1, John, Another Street
And what I need is
What is the address(or other field) of the user when they adopt the pet.
What I am thinking of is always create a new row in same table(in this case user) and refer the new row to the previous row
User:
2, 1, John, Another Street ( where 1 is referring to the previous id / updated from)
1, NULL, John, One Street, deleted (NULL means this is newly created data)
The advantage of using this is, it's easy to query(I just query like usual
The downside is the table will be so huge to record every update. Is there any solution?
Thank you
This is what i do sometimes:
For any field that i need to track value changes, i design a separate changes table.
For example, for the address field that is a concept associated with the user entity and is not a direct property of the adoption entity, i define the table:
UserAddressChanges(UserID, Address, ChangeDateTime, ChangerPersonID)
This way, the changes data may be used in any other sub-system or system, independent of your current adoption use-case.
I use in-table change tracking for very simple tables like:
UniversityManagers(PersonID, AssignDateTime, AssignorPersonID)
For more complex tables with frequent changes (and usually, few refers to previous data) where i need full record logging, i separate the main table (of current records) and the log table which have extra fields such as LogID, ChangeDateTime, ChangerPersonID, ChangerIP, ...
There are different approaches to this.
Perhaps the simplest is to denormalize the data. If there is data you need at the point of adoption, include it as columns in the adoption table. This address is the "point-in-time" address.
This method is useful for simple things, but it does not scale well. And you have to pre-define the columns you want.
The next step is to create audit tables for all your tables, or at least all tables of interest. Every time a record changes in user, a new record is added into userAudit. Audit tables are usually maintained using triggers.
The advantage of audit tables is that they do not clutter the existing table (and logic). The same queries work on the existing tables.
Finally, you can just cave in and realize that your data model is overly simplified. You really have slowly changing dimensions. This data can be represented using version effective dates and version end dates for each row. The user table ends up looking like:
user_id name address version_eff_dt version_end_dt
Because user_id is no longer a primary key, you might want two tables users and userHistory, or something like that.
This is a "correct" representation of the data at any point in time. However, it usually requires restructuring queries because a single user appears multiple times in the table -- and user_id is no longer the primary key.
I hope someone can help me with this:
I have a simple query combining a list of names and basic details with another table containing more specific information. Some names will necessarily appear more than once and arbitrary distinctions like "John Smith 1" and "John Smith 2" are not an option, so I have been using an autonumber to keep the records distinct.
The problem is that my query is creating two records for each name that appears more than once. For example, there are two clients named 'Sophoan', each with a different id number, and the query has picked up each one twice resulting in four records (in total there are 122 records when there should only be 102). 'Unique values' is set to 'yes'.
I've researched as much as I can and am completely stuck. I've tried to tinker with sql but it always comes back with errors, I presume because there are too many fields in the query.
What am I missing? Or is a query the wrong approach and I need to find another way to combine my tables?
Project in detail: I'm building a database for a charity which has two main activities: social work and training. The database is to record their client information and the results of their interactions with clients (issues they asked for help with, results of training workshops etc.). Some clients will cross over between activities which the organisation wants to track, hence all registered clients go into one list and individual tables spin of that to collect data for each specific activity the client takes part in. This query is supposed to be my solution for combining these tables for data entry by the user.
At present I have the following tables:
AllList (master list of client names and basic contact info; 'Social Work Register' and 'Participant Register' join to this table by
'Name')
Social Work Register (list of social work clients with full details
of each case)
Social Work Follow-up Table (used when staff call social work clients
to see how their issue is progressing; the register has too many
columns to hold this as well; joined to Register by 'Client Name')
Participants Register (list of clients for training and details of
which workshops they were attended and why they were absent if they
missed a session)
Individual workshop tables x14 (each workshop includes a test and
these tables records the clients answers and their score for each
individual test; there will be more than 20 of these when the
database is finished; all joined to the 'Participants Register' by
'Participant Name')
Queries:
Participant Overview Query (links the attendance data from the 'Register' with the grading data from each Workshop to present a read-only
overview; this one seems to work perfectly)
Social Work Query (non-functional; intended to link the 'Client
Register' to the 'AllList' for data entry so that when a new client
is registered it creates a new record in both tables, with the
records matched together)
Participant Query (not yet attempted; as above, intended to link the
'Participant Register' to the 'AllList' for data entry)
BUT I realised that queries can't be used for data entry, so this approach seems to be a dead end. I have had some success with using subforms for data entry but I'm not sure if it's the best way.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
[N.B. There are more tables that store secondary information but aren't relevant to the issue as they are not and will not be linked to any other tables.]
I realised that queries can't be used for data entry
Actually, non-complex queries are usually editable as long as the table whose data you want to edit remains 'at the core' of the query. Access applies a number of factors to determine if a query is editable or not.
Most of the time, it's fairly easy to figure out why a query has become non-editable.
Ask yourself the question: if I edit that data, how will Access ensure that exactly that data will be updated, without ambiguity?
If your tables have defined primary keys and these are part of your query, and if there are no grouping, calculated fields (fields that use some function to change or test the value of that field), or complex joins, then the query should remain editable.
You can read more about that here:
How to troubleshoot errors that may occur when you update data in Access queries and in Access forms
Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
This remark actually proves that you have design issues in your database.
A basic tenet of Database Design is to remove redundancy as much as possible. One of the reasons is actually to avoid having to update the same data in multiple places.
Another remark: you are using the Client's name as a Natural Key. Frankly, it is not a very good idea. Generally, you want to make sure that what constitutes a Primary key for a table is reliably unique over time.
Using people's names is generally the wrong choice because:
people change name, for instance in many cultures, women change their family name after they get married.
There could also have been a typo when entering the name and now it can be hard to correct it if that data is used as a Foreign Key all in different tables.
as your database grows, you are likely to end up with some people having the same name, creating conflicts, or forcing the user to make changes to that name so it doesn't create a duplicate.
The best way to enforce uniqueness of records in a table is to use the default AutoNumber ID field proposed by Access when you create a new table. This is called a Surrogate key.
It's not mean to be edited, changed or even displayed to the user. It's sole purpose is to allow the primary key of a table to be unique and non-changing over time, so it can reliably be used as a way to reference a record from one table to another (if a table needs to refer to a particular record, it will contain a field that will hold that ID. That field is called a Foreign Key).
The names you have for your tables are not precise enough: think of each table as an Entity holding related data.
The fact that you have a table called AllList means that its purpose isn't that well-thought of; it sounds like a catch-all rather than a carefully crafted entity.
Instead, if this is your list of clients, then simply call it Client. Each record of that table holds the information for a single client (whether to use plural or singular is up to you, just stick to your choice though, being consistent is hugely important).
Instead of using the client's name as a key, create an ID field, an Autonumber, and set it as Primary Key.
Let's also rename the "Social Work Register", which holds the Client's cases, simply as ClientCase. That relationship seems clear from your description of the table but it's not clear in the table name itself (by the way, I know Access allows spaces in table and field names, but it's a really bad idea to use them if you care at least a little bit about the future of your work).
In that, create a ClientID Number field (a Foreign Key) that will hold the related Client's ID in the ClientCase table.
You don't talk about the relationship between a Client and its Cases. This is another area where you must be clear: how many cases can a single Client have?
At most 1 Case ? (0 or 1 Case)
exactly 1 Case?
at least one Case? (1 or more Cases)
any number of Cases? (0 or more Cases)
Knowing this is important for selecting the right type of JOIN in your queries. It's a crucial part of the design assumptions when building your database.
For instance, in the most general case, assuming that a Client can have 0 or more cases, you could have a report that displays the Client's Name and the number of cases related to them like this:
SELECT Client.Name,
Count(ClientCase.ID) AS CountOfCases
FROM Client
LEFT JOIN ClientCase
ON Client.ID = ClienCase.ClientID
GROUP BY Client.Name
You've described your basic design a bit more, but that's not enough. Show us the actual table structures and the SQL of the queries you tried. From the description you give, it's hard to really understand the actual details of the design and to tell you why it fails and how to make it work.
Currently I am having a hard time deciding/weighing the pros/cons of tracking login information for a member website.
Currently
I have two tables, login_i and login_d.
login_i contains the member's id, password, last login datetime, and total count of logins. (member id is primary key and obviously unique so one row per member)
login_d contains a list of all login data in history which tracks each and every time a login occurs. It contains member's id, datetime of login, ip_address of login. This table's primary key is simply an auto-incremented INT field, really purposeless but need a primary and the only unique single field (an index on the otherhand is different but still not concerned).
In many ways I see these tables as being very similar but the benefit of having the latter is to view exactly when a member logged in, how many times, and which IP it came from. All of the information in login_i (last login and count) truthfully exists in login_d but in a more concise form without ever needing to calculate a COUNT(*) on the latter table.
Does anybody have advice on which method is preferred? Two tables will exist regardless but should I keep record of last_login and count in login_i at all if login_d exists?
added thought/question
good comment made below - what about also tracking login attempts based on a username/email/ip? Should this ALSO be stored in a table (a 3rd table I assume).
this is called denormalization.
you ideally would never denormalize.
it is sometimes done anyway to save on computationally expensive results - possibly like your total login count value.
the downside is that you may at some point get into a situation where the value in one table does not match the values in the other table(s). of course you will try your best to keep them properly up to date, but sometimes things happen. In this case, you will possibly generate bugs in application logic if they receive an incorrect value from one of the sources.
In this specific case, a count of logins is probably not that critical to the successful running of the app - so not a big risk - although you will still have the overhead of maintaining the value.
Do you often need last login and count? If Yes, then you should store it in login_i aswell. If it's rarely used then you can take your time process the query in the giant table of all logins instead of storing duplicated data.
I am doing the design of a database, that will have eventually thousands of users. Each user has your profile and specific data associated.
In your opinion, it is best practice a table for id, username, activationLink and hash and another for address, age, photo, job, or it is best a unique table for all stuff?
thanks for your time
If:
All (or almost all) users have all data filled
Most of the time you query for all fields
then keep them in a single table, otherwies split them.
In your model, activationLink seems to be queried for only once per activation, so I'd move it into a separate table (which would allow deleting it after the account had been activated).
Address, age, photo and job are usually shown along with the username, so it would be better to merge them into a single table.
Don't allow your initial design to limit the ability (or just make it difficult) to expand your requirements in the future.
At the moment, a user may have one address so you might put it in the users table - what if you want them to be able to store "work" and "home" addresses in future, or a history of past addresses?
A user may only be allowed to have a single photo, but if you put it (or a URL for it) in users.photo, then you'd have to change your data structure to allow a user to have a history of profile photos
As Quassnoi mentions, there are performance implications for each of these decisions - more tables means more complexity, and more potential for slow queries. Don't create new tables for the sake of it, but consider your data model carefully as it quickly becomes very hard to change it.
Any values that are a strict 1-to-1 relationship with a user entity, and are unlikely to ever change and require a history for (date of birth is a good example) should go in the table with the core definition. Any potential 1-to-many relationships (even if they aren't right now) are good candidates for their own tables.