Database structure - most common queries span 3-4 tables. Should I reduce tables? - mysql

I am creating a new DB in MySQL for an application and wondered if anyone could provide some advice on the following set up. I'll try and simplify things as best as I can.
This DB is designed to store alerts which are related to specific items created by a user. In turn there is the need to store notes related to the items and/or alerts. At first I considered the following structure...
USERS table - to store basic app user info (e.g. user_id. name, email) - this is the only bit I'm fairly certain does not need to be changed
ITEMS table: contains info on particular item (4 fields or so). Contains user_id to indicate which user created/owns this item
ALERTS table: contains info on the alert, item_id to indicate which item the alert is related to, contains user_id to indicate which user created alert
NOTES table: contains note info, user_id of note owner, item_id if associated with an item, alert_id if associated with alert
Relationships:
An item does not always have an an alert associated with it
An item or alert does not always have a note associated with it
An alert is always associated with an item. More than one alert can be associated with the same item.
A note is always associated with an item or alert. More than one note can be associated with the same item or alert.
Once first created item info is unlikely to be updated by a user.
For arguments sake let's say that each user will create an average of 10 items, each item will have an average of 2 alerts associated with it. There will be an average of 2 notes per item/alert.
Very common queries that will be run:
1) Return all items created by a particular user with any associated alerts and notes. Given a user_id this query would span 3 tables
2) Checking each day for alerts that need to be sent to a user's email address. WHERE alert date==today, return user's email address, item name and any associated notes. This would require a query spanning 4 tables which is why I'm wondering if I need to take a different approach...
Option 1) one table to cover items, alerts and notes. user_id owner for each row. Every time you add a note to an item or alert you are repeating the alert and/or item info. Seems a bit wasteful but item and alert info won't be large.
Option 2) I don't foresee the need to query notes (famous last words?) so how about serializing note data so multiple notes are stored in one row in either the item or alert table (or just a combined alert/item table)
Option 3) Anything else you can think of? I'm asking this question as each option I've considered doesn't feel quite right.
I appreciate this is currently a small project and so performance shouldn't be of great concern and I should just go with the 4 tables. It's more that my common queries will end up being relatively complex that makes me think I need to re-evaluate the structure.

I would say that the common wisdom is to normalize to start and denormalize only when performance data suggest that it's necessary.
Make sure that your tables are indexed properly, with foreign key relationships for JOINs.
If you think you'll end up with a lot of data, this might be a good time to think about a partitioning strategy. Partitioning your fast-growing tables by time would be a good first step.

Four tables is not complex. I commonly write report queries that hit 15 or more tables in a database structure that has hundreds of tables (most with millions of records) and I wouldn't even say our dbs are anything more than medium sized (a typical db in our system might have around 200 gigs of data, so not large at all as databases go). Because they are properly indexed, they still run fast unless I am doing very complex calculations. Normalize, don't even consider denormalizing until you are an experienced database designer who knows better than to worry about the number of tables.

Related

Should I use more columns or more rows?

I need to create a table where each user (approx 60 atm) would have a defined task for each day. Right now the database have one column for each user with the task name in it (which is bad in my opinion as each new user would need to change the scheme of the table) and a "date" column.
A solution would be to have a "user" column and add a "task" column but that would mean there would be 60 (number of current users) rows per day.
I don't really know what's the best situation in this case.
Should I use more columns or more rows?
They're two completely different things, so this comparison doesn't make much sense...
Right now the database have one column for each user
Bad idea. Full stop. A user is a record of data, not a structural element of the database itself. For example, a table of users might contain columns like Username, Email, RegistrationDate, etc. It would not be a single row of data in which you add a column for each new user.
This would be a nightmare to maintain, would render things like Foreign Keys useless (and, honestly, render the entire concept of a relational database useless), would reach resource limits very quickly, etc.
Each record of information is a row, not a column (or table). In this case, each row in your table is a "User Task". It defines (or has a Feorign Key to) a User and defines (or has a Foreign Key to) a Task.
but that would mean there would be 60 (number of current users) rows per day
If the number of records in the table starts to become a problem, you can start looking into things like sharding and partitioning, archiving old data, etc. You've got time though, because "dozens of records per day" is sustainable for thousands of years. (And by then I imagine the hardware will be at least twice as good as it is today.)
Right now the database have one column for each user with the task
name in it (which is bad in my opinion as each new user would need > to change the scheme of the table)
You're right, this is very bad. Using one column for user, one for the task and one for the date, will be much better.
60 rows per day is not much. This means 21.900 rows per years and 219.000 rows in ten years. Mysql is able to handle millions of rows in a table
If you have two indexes, one for user and one for the date, searching for data will be fast enough.
Knowing nothing else about your database or schema, why not create a dimension table to store your users and fact table to track your task details?
That way you can more easily add new users and the tasks table would continue to grow as new facts are added. It would also be very easy to denormalize this model for query and/or reporting purposes.
Adding columns is a nuisance and can be slow. Instead have a table with columns (user, task, etc)
Even "60 rows per second" is not a problem. 600/second might be.
See the tag [pivot-table] for how to turn rows into columns for output display.

having trouble normalizing this database

Currently, I have 48 fields.
I'm completely new to access. This is how I decided to connect everything together.
It doesn't seem to be very effective. Could somebody help me understand how to normalize this database?
Should I try to put employee information in one table, job information in another table and then have an equipment lookup table?
The current job, last job, and previous job can all the SAME table. If you sort this table by descending job start date, then then you have current, last and previous. You thus don’t need nor want a separate table for each of these which really amounts to the concept of a “job”. If sorting by date is not enough, then you could add a column called Job Type (current, previous, etc.). Again, we still only using the one table.
The same goes for Equipment. You really don’t care if the limit is 3 last, or 300 last. By building a normalized table, then ONE form can edit all types and you save MASSIVE amounts of coding and building of tables, User interface software, and that of building quires to retrieve + show the last 3 jobs in a form.
The fact that your design with FAR LESS cost of development allows 3 or 300 last jobs is really moot. More important if some manager comes along and now wants you to save the last 4 jobs, you don’t have some massive re-design here. And you can on the fly add new job types. So in place of current, and say previous, you can also have un-completed, or failed jobs. So adding new business rules means again you don’t add a new type of job table, but only a “type” to the one column you already using to define the job as current or previous.
Identify like objects and make one table to store all of them. In your design you have three tables for equipment but each item of equipment has the same fields; they should be one table. Similarly for jobs, each job is pretty much the same; they should be one table. The same for departments.
Figure out one or more column in each table that can uniquely identify the row in the table (that is, if you know the values for those columns it is impossible for there ever to be two rows with those values). These are your primary keys for your tables.
Identify cases in which an item in one table needs to "point to" (refer to) an item in another table. In this case, make sure that the referring table has a set of columns that match the referred-to table.
When you've done that, you'll have the beginnings of a correctly factored relational database design.

Access query is duplicating unique records / Linked table issues

I hope someone can help me with this:
I have a simple query combining a list of names and basic details with another table containing more specific information. Some names will necessarily appear more than once and arbitrary distinctions like "John Smith 1" and "John Smith 2" are not an option, so I have been using an autonumber to keep the records distinct.
The problem is that my query is creating two records for each name that appears more than once. For example, there are two clients named 'Sophoan', each with a different id number, and the query has picked up each one twice resulting in four records (in total there are 122 records when there should only be 102). 'Unique values' is set to 'yes'.
I've researched as much as I can and am completely stuck. I've tried to tinker with sql but it always comes back with errors, I presume because there are too many fields in the query.
What am I missing? Or is a query the wrong approach and I need to find another way to combine my tables?
Project in detail: I'm building a database for a charity which has two main activities: social work and training. The database is to record their client information and the results of their interactions with clients (issues they asked for help with, results of training workshops etc.). Some clients will cross over between activities which the organisation wants to track, hence all registered clients go into one list and individual tables spin of that to collect data for each specific activity the client takes part in. This query is supposed to be my solution for combining these tables for data entry by the user.
At present I have the following tables:
AllList (master list of client names and basic contact info; 'Social Work Register' and 'Participant Register' join to this table by
'Name')
Social Work Register (list of social work clients with full details
of each case)
Social Work Follow-up Table (used when staff call social work clients
to see how their issue is progressing; the register has too many
columns to hold this as well; joined to Register by 'Client Name')
Participants Register (list of clients for training and details of
which workshops they were attended and why they were absent if they
missed a session)
Individual workshop tables x14 (each workshop includes a test and
these tables records the clients answers and their score for each
individual test; there will be more than 20 of these when the
database is finished; all joined to the 'Participants Register' by
'Participant Name')
Queries:
Participant Overview Query (links the attendance data from the 'Register' with the grading data from each Workshop to present a read-only
overview; this one seems to work perfectly)
Social Work Query (non-functional; intended to link the 'Client
Register' to the 'AllList' for data entry so that when a new client
is registered it creates a new record in both tables, with the
records matched together)
Participant Query (not yet attempted; as above, intended to link the
'Participant Register' to the 'AllList' for data entry)
BUT I realised that queries can't be used for data entry, so this approach seems to be a dead end. I have had some success with using subforms for data entry but I'm not sure if it's the best way.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
[N.B. There are more tables that store secondary information but aren't relevant to the issue as they are not and will not be linked to any other tables.]
I realised that queries can't be used for data entry
Actually, non-complex queries are usually editable as long as the table whose data you want to edit remains 'at the core' of the query. Access applies a number of factors to determine if a query is editable or not.
Most of the time, it's fairly easy to figure out why a query has become non-editable.
Ask yourself the question: if I edit that data, how will Access ensure that exactly that data will be updated, without ambiguity?
If your tables have defined primary keys and these are part of your query, and if there are no grouping, calculated fields (fields that use some function to change or test the value of that field), or complex joins, then the query should remain editable.
You can read more about that here:
How to troubleshoot errors that may occur when you update data in Access queries and in Access forms
Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
This remark actually proves that you have design issues in your database.
A basic tenet of Database Design is to remove redundancy as much as possible. One of the reasons is actually to avoid having to update the same data in multiple places.
Another remark: you are using the Client's name as a Natural Key. Frankly, it is not a very good idea. Generally, you want to make sure that what constitutes a Primary key for a table is reliably unique over time.
Using people's names is generally the wrong choice because:
people change name, for instance in many cultures, women change their family name after they get married.
There could also have been a typo when entering the name and now it can be hard to correct it if that data is used as a Foreign Key all in different tables.
as your database grows, you are likely to end up with some people having the same name, creating conflicts, or forcing the user to make changes to that name so it doesn't create a duplicate.
The best way to enforce uniqueness of records in a table is to use the default AutoNumber ID field proposed by Access when you create a new table. This is called a Surrogate key.
It's not mean to be edited, changed or even displayed to the user. It's sole purpose is to allow the primary key of a table to be unique and non-changing over time, so it can reliably be used as a way to reference a record from one table to another (if a table needs to refer to a particular record, it will contain a field that will hold that ID. That field is called a Foreign Key).
The names you have for your tables are not precise enough: think of each table as an Entity holding related data.
The fact that you have a table called AllList means that its purpose isn't that well-thought of; it sounds like a catch-all rather than a carefully crafted entity.
Instead, if this is your list of clients, then simply call it Client. Each record of that table holds the information for a single client (whether to use plural or singular is up to you, just stick to your choice though, being consistent is hugely important).
Instead of using the client's name as a key, create an ID field, an Autonumber, and set it as Primary Key.
Let's also rename the "Social Work Register", which holds the Client's cases, simply as ClientCase. That relationship seems clear from your description of the table but it's not clear in the table name itself (by the way, I know Access allows spaces in table and field names, but it's a really bad idea to use them if you care at least a little bit about the future of your work).
In that, create a ClientID Number field (a Foreign Key) that will hold the related Client's ID in the ClientCase table.
You don't talk about the relationship between a Client and its Cases. This is another area where you must be clear: how many cases can a single Client have?
At most 1 Case ? (0 or 1 Case)
exactly 1 Case?
at least one Case? (1 or more Cases)
any number of Cases? (0 or more Cases)
Knowing this is important for selecting the right type of JOIN in your queries. It's a crucial part of the design assumptions when building your database.
For instance, in the most general case, assuming that a Client can have 0 or more cases, you could have a report that displays the Client's Name and the number of cases related to them like this:
SELECT Client.Name,
Count(ClientCase.ID) AS CountOfCases
FROM Client
LEFT JOIN ClientCase
ON Client.ID = ClienCase.ClientID
GROUP BY Client.Name
You've described your basic design a bit more, but that's not enough. Show us the actual table structures and the SQL of the queries you tried. From the description you give, it's hard to really understand the actual details of the design and to tell you why it fails and how to make it work.

Indefinite number of tables vs indefinite number of row with multiple columns

Which one would be better (performance wise and maintenance), a database which creates table dynamically or just adding rows dynamically?
Suppose I am building a project in which I let users to register. Say I have a table which store only basic personal infos, like name, dob, Date of joining, address, phone, etc. Say 10 columns.
Now is the tricky part.
Scene 1: Creating multiple tables
When a user complete registration, a message table is created. So each table is created for each users. The rows of each message table varies for each user.
In the same way there is a cart table for each user like the message table.
For this scene 1, 2 tables are created with every registration.
Scene 2: Adding Rows
The scenario is same here as well, but in this case I have 2 tables for message and cart. Rows are added only when there is an activity.
Note:
You must assume that the number of users is more than 2000 and expect 50+ users to be active all the time. Which means the message and cart tables are always busy for both the cases. Like there is always a query for update, add, delete, insert, select etc. simultaneously.
Also which scene will consume more disk space.
While writing this, it make me wonder what technique would Facebook and others use. If they use the Scene 2 style (all users (billions) use the same big long message table)... Just wondering
Databases has some basic rules defined for Database Design called
"Database Normalization", These basic rules allow us eliminating
redundant data.
1st Normal Form
Store One piece of information in only One Column, A column should store only One piece of information.
2ns Normal Form
A Table should have only the columns that are related to each other. All the related columns should be in One table.
Now if you look at your advised design, A Separate Table for each USER
will split SAME information/Columns about all the user in 1000's of
tables. Which violates the 2nd Normal Form.
You need to Create One Table and put all the related Columns in that
one table for all the users. and you can make use of normal t-sql to
query your data but if you have a table for each user my guess is your
every query that you execute from your application will be built
dynamically and for every query you will be using dynamic sql. which
is one of the Sql Devils and you want to avoid using it whenever
possible.
My suggestion would be read more about Database Design. Once you have
some basic understanding of database design. Draw it on a piece of
paper and see if it provides you everything that your business
requires / expects from this application , Spend sometime on it now it
will save you a lot of pain later.

Database user table design, for specific scenario

I know this question has been asked and answered many times, and I've spent a decent amount of time reading through the following questions:
Database table structure for user settings
How to handle a few dozen flags in a database
Storing flags in a DB
How many database table columns are too many?
How many columns is too many columns?
The problem is that there seem to be a somewhat even distribution of supporters for a few classes of solutions:
Stick user settings in a single table as long as it's normalized
Split it into two tables that are 1 to 1, for example "users" and "user_settings"
Generalize it with some sort of key-value system
Stick setting flags in bitfield or other serialized form
So at the risk of asking a duplicate question, I'd like to describe my specific scenario, and hopefully get a more specific answer.
Currently my site has a single user table in mysql, with around 10-15 columns(id, name, email, password...)
I'd like to add a set of per-user settings for whether to send email alerts for different types of events (notify_if_user_follows_me, notify_if_user_messages_me, notify_when_friend_posts_new_stuff...)
I anticipate that in the future I'd be infrequently adding one off per-user settings which are mostly 1 to 1 with users.
I'm leaning towards creating a second user_settings table and stick "non-essential" information such as email notification settings there, for the sake of keeping the main user table more readable, but is very curious to hear what expects have to say.
Seems that your dilemma is to vertically partition the user table or not. You may want to read this SO Q/A too.
i'm gonna cast my vote for adding two tables... (some sota key-value system)
it is preferable (to me) to add data instead of columns... so,
add a new table that links users to settings, then add a table for the settings...
these things: notify_if_user_follows_me, notify_if_user_messages_me, notify_when_friend_posts_new_stuff. would then become row insertions with an id, and you can reference them at any time and extend them as needed without changing the schema.