I recently took over a website where people can register to help tutor kids. Part of the user's details is which areas they could work, represented by postal codes. The problem is, my predecessor designed the site such that in the database there is a Boolean column for every postal code. As such, the user table has almost 270 columns and can be quite slow at times (plus it's a nightmare to administer).
Most users select only a few postal codes so there is surely a better way to do it. I was thinking about a varchar that could save the selected areas comma separated, e.g. 6043,8811,1234
Any advice from somebody who's had the same problem?
Both your predecessor's solution and yours are... strange.
You should simply have a relationship table between user and localities (assuming you have a locality table, with a postalCode field and a surrogate key (id)).
UserLocality(userId int, localityId int)
So a locality could have many users, and a user could have many localities.
Comma-separated fields are a really bad idea when query time comes.
You should throw that entire idea out of your head and look into properly normalized data.
A possible solution to this problem would be a table for tutors, which has an id column to uniquely identify one tutor.
Then you would have a table for just Postal Codes (each with unique ids as well) and finally a tutor_availability table that creates one record of (t_id, pc_id) for each postal code a tutor wishes to offer their services, again with a unique id to avoid duplication risks in the case they can select the same location twice.
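A minimal sketch of the three-table layout described above, assuming SQLite (through Python's sqlite3 module) as a stand-in for the site's real database. The table and column names and the sample rows are illustrative, and the composite primary key on (t_id, pc_id) is one assumed way of ruling out duplicate selections.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tutor (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE postal_code (
        id   INTEGER PRIMARY KEY,
        code TEXT NOT NULL UNIQUE
    );
    -- One row per (tutor, postal code) pair. The composite primary key
    -- stops the same area being stored twice for one tutor.
    CREATE TABLE tutor_availability (
        t_id  INTEGER NOT NULL REFERENCES tutor(id),
        pc_id INTEGER NOT NULL REFERENCES postal_code(id),
        PRIMARY KEY (t_id, pc_id)
    );
""")
conn.executemany("INSERT INTO tutor (id, name) VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO postal_code (id, code) VALUES (?, ?)",
                 [(1, "6043"), (2, "8811"), (3, "1234")])
conn.executemany("INSERT INTO tutor_availability (t_id, pc_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def tutors_for(code):
    """Return the names of tutors who cover the given postal code."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM tutor AS t
        JOIN tutor_availability AS ta ON ta.t_id = t.id
        JOIN postal_code AS pc ON pc.id = ta.pc_id
        WHERE pc.code = ?
        ORDER BY t.name
        """,
        (code,),
    ).fetchall()
    return [name for (name,) in rows]


print(tutors_for("8811"))
```

With a comma-separated varchar, the same lookup would need string matching on every row; here it is an ordinary indexed join.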
quick question.
consider the following table (UK):
CustomerID (PK)
First Name
Surname
House_No/name
street
City
Postcode
Would you split off address into another table?
basic business assumption is that a customer cannot have more than one address.
originally i separated this off to look something like this:
Customer Table
CustomerID (PK)
FirstName
Surname
AddressID (FK)
Address Table
AddressID(PK)
Postcode(FK)
House_Number_name
Postcode Table:
Postcode (PK)
StreetName
CityID(FK)
City Table
CityID (PK)
CityName
unless my assumption that a postcode uniquely identifies a street name and city is wrong, is this not in 3NF?
personally, i would put address in another table, and link them together.
the business assumption/rule may change and when you split these things you have the best chance of accommodating any possible business rule without a major redo.
for instance - oops, the customer has a different billing address than their shipping address, or oops, we need to know where something actually shipped last year even though the customer changed their address for this year, etc.
basic business assumption is that a customer cannot have more than one address.
If this is an actual rule and not an assumption, I'd just keep them in the one table.
However, to assume makes an 'ass' of 'u' and 'me'.
So play safe and separate the address into another table.
But it looks like you are taking normalisation too far in your example.
Yes, I would split off the address into a separate table.
However, the reason is not normalization per se (under most circumstances). The primary reason is that it is a slowly changing dimension and it might be useful to look up previous addresses.
Whether you go ahead and normalize things like postal code is a matter of taste. In a more "amateur" database, I don't think it is necessary. However, for a large database of real customers, I would be inclined to split it off. It helps ensure that the postal codes are accurate. Also, they change over time. And, you might be purchasing additional information at the postal code level, for instance.
It all depends on your requirements, but as you mentioned above, a customer can't have more than one address, so there's no need for a separate one-to-one relationship because you can put it in the same relation. But I suggest you break it out into a one-to-many relationship to allow for future requirements.
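To illustrate the "look up a previous address" point, here is a rough sketch, assuming SQLite via Python's sqlite3 and hypothetical valid_from / valid_to columns (neither is in the original schema). Each address row records the period in which it applied, so old addresses stay queryable after a move.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        surname     TEXT NOT NULL
    );
    -- Assumed history columns: valid_from / valid_to hold ISO dates,
    -- and valid_to IS NULL marks the current address.
    CREATE TABLE customer_address (
        address_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        house       TEXT NOT NULL,
        street      TEXT NOT NULL,
        city        TEXT NOT NULL,
        postcode    TEXT NOT NULL,
        valid_from  TEXT NOT NULL,
        valid_to    TEXT
    );
    INSERT INTO customer VALUES (1, 'Jane', 'Doe');
    INSERT INTO customer_address VALUES
        (1, 1, '12', 'Old Road', 'Leeds', 'LS1 1AA', '2019-01-01', '2021-06-01'),
        (2, 1, '7',  'New Lane', 'York',  'YO1 1BB', '2021-06-01', NULL);
""")


def address_on(customer_id, day):
    """Return (house, street, city, postcode) valid on the ISO date `day`."""
    return conn.execute(
        """
        SELECT house, street, city, postcode
        FROM customer_address
        WHERE customer_id = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)
        """,
        (customer_id, day, day),
    ).fetchone()


print(address_on(1, "2020-03-15"))  # where the customer lived in 2020
print(address_on(1, "2022-01-01"))  # the current address
```

The sample addresses are made up. ISO-formatted date strings compare correctly as text, which is why plain TEXT columns are enough for this sketch.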
I hope someone can help me with this:
I have a simple query combining a list of names and basic details with another table containing more specific information. Some names will necessarily appear more than once and arbitrary distinctions like "John Smith 1" and "John Smith 2" are not an option, so I have been using an autonumber to keep the records distinct.
The problem is that my query is creating two records for each name that appears more than once. For example, there are two clients named 'Sophoan', each with a different id number, and the query has picked up each one twice resulting in four records (in total there are 122 records when there should only be 102). 'Unique values' is set to 'yes'.
I've researched as much as I can and am completely stuck. I've tried to tinker with sql but it always comes back with errors, I presume because there are too many fields in the query.
What am I missing? Or is a query the wrong approach and I need to find another way to combine my tables?
Project in detail: I'm building a database for a charity which has two main activities: social work and training. The database is to record their client information and the results of their interactions with clients (issues they asked for help with, results of training workshops etc.). Some clients will cross over between activities, which the organisation wants to track, hence all registered clients go into one list and individual tables spin off from that to collect data for each specific activity the client takes part in. This query is supposed to be my solution for combining these tables for data entry by the user.
At present I have the following tables:
AllList (master list of client names and basic contact info; 'Social Work Register' and 'Participant Register' join to this table by 'Name')
Social Work Register (list of social work clients with full details of each case)
Social Work Follow-up Table (used when staff call social work clients to see how their issue is progressing; the register has too many columns to hold this as well; joined to Register by 'Client Name')
Participants Register (list of clients for training and details of which workshops they attended and why they were absent if they missed a session)
Individual workshop tables x14 (each workshop includes a test and these tables record the clients' answers and their score for each individual test; there will be more than 20 of these when the database is finished; all joined to the 'Participants Register' by 'Participant Name')
Queries:
Participant Overview Query (links the attendance data from the 'Register' with the grading data from each Workshop to present a read-only overview; this one seems to work perfectly)
Social Work Query (non-functional; intended to link the 'Client Register' to the 'AllList' for data entry so that when a new client is registered it creates a new record in both tables, with the records matched together)
Participant Query (not yet attempted; as above, intended to link the 'Participant Register' to the 'AllList' for data entry)
BUT I realised that queries can't be used for data entry, so this approach seems to be a dead end. I have had some success with using subforms for data entry but I'm not sure if it's the best way.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
[N.B. There are more tables that store secondary information but aren't relevant to the issue as they are not and will not be linked to any other tables.]
I realised that queries can't be used for data entry
Actually, non-complex queries are usually editable as long as the table whose data you want to edit remains 'at the core' of the query. Access applies a number of factors to determine if a query is editable or not.
Most of the time, it's fairly easy to figure out why a query has become non-editable.
Ask yourself the question: if I edit that data, how will Access ensure that exactly that data will be updated, without ambiguity?
If your tables have defined primary keys and these are part of your query, and if there are no grouping, calculated fields (fields that use some function to change or test the value of that field), or complex joins, then the query should remain editable.
You can read more about that here:
How to troubleshoot errors that may occur when you update data in Access queries and in Access forms
Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
This remark actually proves that you have design issues in your database.
A basic tenet of Database Design is to remove redundancy as much as possible. One of the reasons is actually to avoid having to update the same data in multiple places.
Another remark: you are using the Client's name as a Natural Key. Frankly, it is not a very good idea. Generally, you want to make sure that what constitutes a Primary key for a table is reliably unique over time.
Using people's names is generally the wrong choice because:
People change names; for instance, in many cultures women change their family name when they get married.
There could also have been a typo when entering the name, and it can be hard to correct if that data is used as a Foreign Key in many different tables.
As your database grows, you are likely to end up with some people having the same name, creating conflicts, or forcing the user to alter the name so it doesn't create a duplicate.
The best way to enforce uniqueness of records in a table is to use the default AutoNumber ID field proposed by Access when you create a new table. This is called a Surrogate key.
It's not meant to be edited, changed or even displayed to the user. Its sole purpose is to allow the primary key of a table to be unique and non-changing over time, so it can reliably be used as a way to reference a record from one table to another (if a table needs to refer to a particular record, it will contain a field that will hold that ID. That field is called a Foreign Key).
The names you have for your tables are not precise enough: think of each table as an Entity holding related data.
The fact that you have a table called AllList suggests that its purpose isn't that well thought out; it sounds like a catch-all rather than a carefully crafted entity.
Instead, if this is your list of clients, then simply call it Client. Each record of that table holds the information for a single client (whether to use plural or singular is up to you, just stick to your choice though, being consistent is hugely important).
Instead of using the client's name as a key, create an ID field, an Autonumber, and set it as Primary Key.
Let's also rename the "Social Work Register", which holds the Client's cases, simply as ClientCase. That relationship seems clear from your description of the table but it's not clear in the table name itself (by the way, I know Access allows spaces in table and field names, but it's a really bad idea to use them if you care at least a little bit about the future of your work).
In that, create a ClientID Number field (a Foreign Key) that will hold the related Client's ID in the ClientCase table.
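As a rough illustration of the Client / ClientCase layout suggested here, the sketch below uses SQLite via Python's sqlite3 rather than Access, with AUTOINCREMENT playing the role of an AutoNumber. The Issue column and the sample values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE Client (
        ID   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        Name TEXT NOT NULL                       -- not unique on purpose
    );
    CREATE TABLE ClientCase (
        ID       INTEGER PRIMARY KEY AUTOINCREMENT,
        ClientID INTEGER NOT NULL REFERENCES Client(ID),  -- foreign key
        Issue    TEXT NOT NULL                            -- hypothetical column
    );
""")

# Two different people with the same name get two different IDs.
first = conn.execute(
    "INSERT INTO Client (Name) VALUES (?)", ("John Smith",)).lastrowid
second = conn.execute(
    "INSERT INTO Client (Name) VALUES (?)", ("John Smith",)).lastrowid

# The case points at the client by ID, never by name.
conn.execute("INSERT INTO ClientCase (ClientID, Issue) VALUES (?, ?)",
             (second, "Housing"))

# Correcting the name later does not break the link.
conn.execute("UPDATE Client SET Name = ? WHERE ID = ?", ("John Smyth", second))

owner = conn.execute(
    """
    SELECT Client.Name
    FROM ClientCase
    JOIN Client ON Client.ID = ClientCase.ClientID
    """
).fetchone()[0]
print(first, second, owner)
```

Because the case row stores only the ID, the name can be corrected in one place and every related record follows automatically.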
You don't talk about the relationship between a Client and its Cases. This is another area where you must be clear: how many cases can a single Client have?
At most 1 Case ? (0 or 1 Case)
exactly 1 Case?
at least one Case? (1 or more Cases)
any number of Cases? (0 or more Cases)
Knowing this is important for selecting the right type of JOIN in your queries. It's a crucial part of the design assumptions when building your database.
For instance, in the most general case, assuming that a Client can have 0 or more cases, you could have a report that displays the Client's Name and the number of cases related to them like this:
SELECT Client.Name,
       Count(ClientCase.ID) AS CountOfCases
FROM Client
LEFT JOIN ClientCase
       ON Client.ID = ClientCase.ClientID
GROUP BY Client.Name;
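One caveat worth checking with that query: grouping by the name alone will merge two clients who share a name. The sketch below (SQLite through Python's sqlite3, with made-up sample rows) runs the same LEFT JOIN both ways so the difference is visible; grouping by the ID as well is an assumed fix, not something stated above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Client (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE ClientCase (
        ID       INTEGER PRIMARY KEY,
        ClientID INTEGER NOT NULL REFERENCES Client(ID)
    );
    -- Two distinct clients named John Smith, plus one client with no cases.
    INSERT INTO Client VALUES (1, 'John Smith'), (2, 'John Smith'), (3, 'Sophoan');
    INSERT INTO ClientCase VALUES (1, 1), (2, 1), (3, 2);
""")

# Grouping by name only: both John Smiths collapse into one row.
by_name = conn.execute("""
    SELECT Client.Name, COUNT(ClientCase.ID) AS CountOfCases
    FROM Client
    LEFT JOIN ClientCase ON Client.ID = ClientCase.ClientID
    GROUP BY Client.Name
    ORDER BY Client.Name
""").fetchall()

# Grouping by the surrogate key keeps each client separate.
by_id = conn.execute("""
    SELECT Client.ID, Client.Name, COUNT(ClientCase.ID) AS CountOfCases
    FROM Client
    LEFT JOIN ClientCase ON Client.ID = ClientCase.ClientID
    GROUP BY Client.ID, Client.Name
    ORDER BY Client.ID
""").fetchall()

print(by_name)
print(by_id)
```

In both versions the LEFT JOIN keeps the client with no cases, reported with a count of 0, which is the "0 or more cases" situation.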
You've described your basic design a bit more, but that's not enough. Show us the actual table structures and the SQL of the queries you tried. From the description you give, it's hard to really understand the actual details of the design and to tell you why it fails and how to make it work.
I am developing a database to store test data. Each piece of data has 11 tags of metadata. Currently I have a separate table for each of the metadata options. I have seen a few questions on here regarding best practices for numerous small tables, but I thought I'd pose the question for my own project because I didn't get a clear answer from the other questions asked.
Here is my table list, with the fields in each table:
Source Type - id, name, description
For Flight - id, name, description
Site - id, name, abrv, description
Stand - id, site (FK site table), name, abrv, description
Sensor Type - id, name, channels, description
Vehicle - id, name, abrv, description
Zone - id, vehicle (FK vehicle table), name, abrv, description
Event Type - id, name, description
Event - id, event type (FK to event type Table), name, description
Analysis - id, name, description
Bandwidth - id, name, description
You can see the fields are more or less the same in each of these tables. There are three tables that reference another table.
Would it be better to have just one large table called something like Meta with the following fields:
Meta: id, metavalue, name, abrv, FK, value, description
where metavalue = one of the above table names
and FK = a reference to another row in the Meta table in place of a foreign key?
I am new to databases and multiple tables seems most intuitive, but one table makes the programming easier.
So questions are:
Is it good practice to reduce the number of tables and put all static values in one table?
Is it bad to have a self-referencing table?
FYI I am making this web database using django and mysql on a windows server with NTFS formatting.
Tips and best practices appreciated.
thanks.
"Would it be better to have just one large table" - emphatically and categorically, NO!
This anti-pattern is sometimes referred to as "The one table to rule them all"!
Ten Common Database Design Mistakes: One table to hold all domain values.
Using the data in a query is much easier
Data can be validated using foreign key constraints very naturally, something not feasible for the other solution unless you implement ranges of keys for every table – a terrible mess to maintain.
If it turns out that you need to keep more information about a ShipViaCarrier than just the code, 'UPS', and description, 'United Parcel Service', then it is as simple as adding a column or two. You could even expand the table to be a full blown representation of the businesses that are carriers for the item.
All of the smaller domain tables will fit on a single page of disk. This ensures a single read (and likely a single page in cache). If the other case, you might have your domain table spread across many pages, unless you cluster on the referring table name, which then could cause it to be more costly to use a non-clustered index if you have many values.
You can still have one editor for all rows, as most domain tables will likely have the same base structure/usage. And while you would lose the ability to query all domain values in one query easily, why would you want to? (A union query could easily be created of the tables easily if needed, but this would seem an unlikely need.)
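The foreign-key point can be sketched with two of the tables from the question (Site and Stand). This assumes SQLite via Python's sqlite3 rather than the MySQL/Django stack mentioned, and the sample rows are invented. With separate tables, the database itself rejects a stand that points at a non-existent site.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE site (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        abrv        TEXT,
        description TEXT
    );
    CREATE TABLE stand (
        id          INTEGER PRIMARY KEY,
        site        INTEGER NOT NULL REFERENCES site(id),
        name        TEXT NOT NULL,
        abrv        TEXT,
        description TEXT
    );
""")
conn.execute("INSERT INTO site (id, name, abrv) VALUES (1, 'North Field', 'NF')")
conn.execute("INSERT INTO stand (site, name) VALUES (1, 'Stand A')")

# There is no site 99, so the constraint refuses this row.
try:
    conn.execute("INSERT INTO stand (site, name) VALUES (99, 'Stand B')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stand_count = conn.execute("SELECT COUNT(*) FROM stand").fetchone()[0]
print(rejected, stand_count)
```

In a single Meta table, the FK column could point at any row, for instance a Vehicle row where a Site is expected, and the database would have no simple way to object.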
Most of these look like they won't do anything but expand codes into descriptions. Do you even need the tables? Just define a bunch of constants, or codes, and then have a dictionary of long descriptions for the codes.
The field in the referring table just stores the code. eg: "SRC_FOO", "EVT_BANG" etc.
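A tiny sketch of that idea in Python, with made-up codes and descriptions (only "SRC_FOO" and "EVT_BANG" come from the answer). Note the trade-off: the database can no longer validate the codes itself, so the application has to.

```python
# Hypothetical codes and long descriptions kept in application code.
DESCRIPTIONS = {
    "SRC_FOO": "Foo source",
    "EVT_BANG": "Bang event",
}


def describe(code):
    """Expand a stored code into its long description."""
    try:
        return DESCRIPTIONS[code]
    except KeyError:
        raise ValueError(f"unknown code: {code!r}") from None


print(describe("SRC_FOO"))
```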
This is also often known as the One True Lookup Table (OTLT) - see my old blog entry OTLT and EAV: the two big design mistakes all beginners make.
i'm designing a web site for a friend and i'm not sure what the best way to go is with regard to one of my database tables.
To give you an idea, this is roughly what i have
Table: member_profile
`UserID`
`PlanID`
`Company`
`FirstName`
`LastName`
`DOB`
`Phone`
`AddressID`
`website`
`AllowNonUserComments`
`AllowNonUserBlogComments`
`RequireCaptchaForNonUserComments`
`DisplayMyLocation`
the last four
AllowNonUserComments
AllowNonUserBlogComments
RequireCaptchaForNonUserComments
DisplayMyLocation
(and possibly more such boolean fields to be added in the future) will control certain website functionality based on user preference.
Basically i'm not sure if i should move those fields to a
new table : member_profile_settings
`UserID`
`AllowNonUserComments`
`AllowNonUserBlogComments`
`RequireCaptchaForNonUserComments`
`DisplayMyLocation`
or if i should just leave it be part of the member_profile table since every member is going to have their own settings.
The target is roughly 100000 members in the long run and 10k to 20k in the short run. My main concern is database performance.
And while i'm at it question #2) would it make sense to move contact information of the member such as address street, city, state, zip , phone etc into the member_profile table instead of having address table and having the AddressID like i currently have.
Thank you
I would say "no" and "yes, but" as the answers to 1) and 2). For #1, your queries are going to be a lot easier to manage if you create columns for each preference. The best systems I've worked with were done that way. Moving the preferences into a separate table with "user, preference, value" triples leads to complex queries that join multiple tables just to check a setting.
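To make the comparison concrete, here is a sketch of the same question ("which members allow non-user comments without a captcha?") asked against both layouts. It assumes SQLite via Python's sqlite3; the member_setting table and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Layout A: one column per preference.
    CREATE TABLE member_profile (
        UserID                           INTEGER PRIMARY KEY,
        AllowNonUserComments             INTEGER NOT NULL DEFAULT 0,
        RequireCaptchaForNonUserComments INTEGER NOT NULL DEFAULT 1
    );
    INSERT INTO member_profile VALUES (1, 1, 0), (2, 1, 1), (3, 0, 1);

    -- Layout B: (user, preference, value) triples.
    CREATE TABLE member_setting (
        UserID     INTEGER NOT NULL,
        Preference TEXT NOT NULL,
        Value      TEXT NOT NULL,
        PRIMARY KEY (UserID, Preference)
    );
    INSERT INTO member_setting VALUES
        (1, 'AllowNonUserComments', '1'),
        (1, 'RequireCaptchaForNonUserComments', '0'),
        (2, 'AllowNonUserComments', '1'),
        (2, 'RequireCaptchaForNonUserComments', '1'),
        (3, 'AllowNonUserComments', '0'),
        (3, 'RequireCaptchaForNonUserComments', '1');
""")

# Layout A: a plain WHERE clause on two columns.
QUERY_COLUMNS = """
    SELECT UserID
    FROM member_profile
    WHERE AllowNonUserComments = 1
      AND RequireCaptchaForNonUserComments = 0
"""

# Layout B: one self-join per preference being tested.
QUERY_TRIPLES = """
    SELECT a.UserID
    FROM member_setting AS a
    JOIN member_setting AS c ON c.UserID = a.UserID
    WHERE a.Preference = 'AllowNonUserComments' AND a.Value = '1'
      AND c.Preference = 'RequireCaptchaForNonUserComments' AND c.Value = '0'
"""

from_columns = conn.execute(QUERY_COLUMNS).fetchall()
from_triples = conn.execute(QUERY_TRIPLES).fetchall()
print(from_columns, from_triples)
```

Both queries return the same member, but the triple layout needs an extra join for every additional preference in the condition and stores every value as text.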
For #2: there's no reason to put the address in another table, because the single "AddressID" column means there's just one address per member, anyway, and again, it's just going to complicate the queries. If you turn it around backwards and have an address table that embeds userids then that might make sense; it makes even more sense to do phone numbers that way, since people often have multiple phone numbers.
If each member in the database has exactly ONE value for each of the attributes you have listed, then your database is already normalized and thus in a quite convenient form. So, to answer #1, moving these fields to a different table would improve nothing and just make querying more difficult.
As for #2, if you wanted to contemplate the possibility of a member having multiple addresses or phone numbers, you should definitely put those in different tables, allowing many-to-one relationships. This might also make sense if you expect that a number of users will share the same address; this way, you will not be duplicating information by having to store all the same address information for multiple users, you would just reference an addresses table that would have the relevant information one time per address.
However, if you need neither multiple addresses per member nor multiple members per address, then putting the addresses information in another table is just unnecessary complexity. Which solution is more convenient depends on the needs of your specific application.
Since each member has exactly one value in this table, it's already normalized. However, considering query efficiency, sometimes denormalization should be considered.
Apart from the ID field, the others could be separated into 2 groups: a profile group and a settings group. If your website usually uses these two groups of data separately, you should consider having separate tables for the different usages.
For example, if the profile fields are only shown on the profile page and the settings fields are used across the whole site, it's not necessary to look up the profile fields all the time.
I'm planning a database with a couple of tables that contain plenty of address information: city, zip code, email address, phone #, fax #, and so on (about 11 columns' worth). One is an organizations table containing (up to) 2 addresses (legal contacts and the contacts that should actually be used), plus every user has the same information tied to them.
We are going to have to run some geolocation stuff on those addresses too (like every address that's within X Kilometers from another address).
I have a bunch of options, each with its own problem:
1. I could put all the information inside every table, but that would make for tables with a very large number of columns, which I'd have problems indexing, and if I change my address format it'll take a while to fix it.
2. I could put all the information inside an array and serialize it, then store the serialized information in one field. This has the same problem as the previous method, with slightly fewer columns and much less availability through MySQL queries.
3. I could create a separate table with address information and link it to the other tables either by
3.1 putting an address_id column in the users and organizations tables
3.2 putting related_id and related_table columns in the addresses table
That should keep stuff tidier, but it might create some unforeseen problems with excessive joining or whatever.
Personally I think that solution 3.2 is the best, but I'm not too confident about it, so I'm asking for opinions.
Option 2 is definitely out, as it would put the filtering logic into your code instead of letting the DBMS handle it.
Option 1 or 3 will depend on your needs.
If you need fast access to all the data, and you usually access both addresses along with the organization information, then you might consider option 1. But this will make it difficult (i.e. slow) to query if the table gets too big in MySQL.
option 3 is good provided you index the tables correctly.
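A rough sketch of option 3 with an index and a distance filter, assuming SQLite via Python's sqlite3 instead of MySQL, an address_id column on users, and hypothetical lat / lon columns that have already been geocoded. The haversine formula gives the great-circle distance; the cities, coordinates and user names are sample data.

```python
import math
import sqlite3


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(min(1.0, math.sqrt(a)))


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the same name.
conn.create_function("haversine_km", 4, haversine_km)
conn.executescript("""
    CREATE TABLE addresses (
        id   INTEGER PRIMARY KEY,
        city TEXT NOT NULL,
        zip  TEXT,
        lat  REAL NOT NULL,   -- assumed: filled in by a geocoding step
        lon  REAL NOT NULL
    );
    CREATE TABLE users (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        address_id INTEGER REFERENCES addresses(id)
    );
    -- Index the linking column so the join stays cheap.
    CREATE INDEX idx_users_address ON users(address_id);

    INSERT INTO addresses (id, city, lat, lon) VALUES
        (1, 'Paris',      48.8566, 2.3522),
        (2, 'Versailles', 48.8049, 2.1204),
        (3, 'Lyon',       45.7640, 4.8357);
    INSERT INTO users VALUES (1, 'Ann', 1), (2, 'Ben', 2), (3, 'Cat', 3);
""")


def users_within_km(address_id, radius_km):
    """Names of users whose address lies within radius_km of address_id."""
    rows = conn.execute(
        """
        SELECT u.name
        FROM addresses AS origin
        JOIN addresses AS other ON other.id <> origin.id
        JOIN users AS u ON u.address_id = other.id
        WHERE origin.id = ?
          AND haversine_km(origin.lat, origin.lon, other.lat, other.lon) <= ?
        ORDER BY u.name
        """,
        (address_id, radius_km),
    ).fetchall()
    return [name for (name,) in rows]


print(users_within_km(1, 50))
```

This compares the origin with every other address, which is fine for a sketch; on a large table you would probably pre-filter by a bounding box on lat / lon before computing exact distances.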