I have a database that I am trying to set up and I would like it to be in at least 3NF. However, some fields are not needed in every situation, and whether such a field is needed (not its value) depends on another field.
In essence, I want to keep track of jobs that are on hold for one reason or another.
My main table right now includes these fields:
Job No (primary Key) | Short Text | Storage Location | Coordinator
I have other tables for employee list and storage locations. Now my problem is if the job is in the storage location "LAB," then it will have an associated Lab Ticket number that I want to track. I will have another table of Lab Tickets that contains status, ECD, etc. If the storage location is "MR" then the job should have a Notification number, and a separate table will contain info about the Notifications.
Although a job can only have 1 storage location at any given time, it can move. For instance, if a job goes to "LAB" and fails the test, it will get moved to "MR" and have a Notification created.
Is it a violation of 3NF, or otherwise just bad form, to have my tblJobs have fields:
Job No (primary Key) | Short Text | Storage Location | Coordinator | Lab Ticket | Notification | ...
even if not all fields are populated or used for every job? BTW I'm using MS Access, though I don't think that matters.
Edit: I see the related posts about Null values, but my question is less about the programming (I can easily enter a non-null value [ex. "N/A"] in the not-applicable fields), and more on an abstract database design level: In short, is it bad form to have fields that may not apply to a majority of the records? I normally hate seeing a bunch of N/A fields in any table, but I'm starting to think some well thought-out queries will allow me to see only the relevant information for a specific subset. Ex. for all items in "LAB", show the lab number status.
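To make the single-table layout from the question concrete, here is a minimal sketch using Python's built-in sqlite3 instead of MS Access. The snake_case column names, the trimmed-down lab_tickets table and the sample rows are all assumptions made for illustration. It leaves not-applicable fields NULL rather than "N/A" and shows the kind of query mentioned above: for all items in "LAB", show the lab ticket status.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lab_tickets (
    lab_ticket TEXT PRIMARY KEY,
    status     TEXT NOT NULL
);
CREATE TABLE jobs (
    job_no           INTEGER PRIMARY KEY,
    short_text       TEXT NOT NULL,
    storage_location TEXT NOT NULL,
    lab_ticket       TEXT REFERENCES lab_tickets(lab_ticket),  -- NULL when not applicable
    notification     TEXT                                      -- NULL when not applicable
);
-- Invented sample data
INSERT INTO lab_tickets VALUES ('LT-1', 'Testing');
INSERT INTO jobs VALUES (100, 'Pump rebuild', 'LAB', 'LT-1', NULL);
INSERT INTO jobs VALUES (101, 'Valve repair', 'MR', NULL, 'N-7');
""")

# Only jobs currently in LAB, with the status of their lab ticket
lab_jobs = conn.execute("""
    SELECT j.job_no, j.lab_ticket, t.status
    FROM jobs AS j
    JOIN lab_tickets AS t ON t.lab_ticket = j.lab_ticket
    WHERE j.storage_location = 'LAB'
""").fetchall()
print(lab_jobs)
```

The MR job never shows up in this result, so its empty lab ticket field is not visible to anyone looking at the LAB subset.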
I have researched some questions on Stack Overflow, but what I want is for later querying purposes, not for logging purposes.
I have a project that needs to get a value as it was at a certain moment.
For example
I have a user table
User:
id
name
address
Pet:
id
name
type
Adoption:
id
user_id
pet_id
Data:
User:
1, John, One Street
Pet:
1, Lucy, Cat
Adoption:
1, 1, 1
Let's say the user changes address, so it looks like
User:
1, John, Another Street
And what I need is
What was the address (or other field) of the user when they adopted the pet?
What I am thinking of is to always create a new row in the same table (in this case user) and have the new row refer to the previous row
User:
2, 1, John, Another Street ( where 1 is referring to the previous id / updated from)
1, NULL, John, One Street, deleted (NULL means this is newly created data)
The advantage of this is that it's easy to query (I just query as usual).
The downside is that the table will become huge from recording every update. Is there any solution?
Thank you
This is what I do sometimes:
For any field whose value changes I need to track, I design a separate changes table.
For example, for the address field, which is a concept associated with the user entity and is not a direct property of the adoption entity, I define the table:
UserAddressChanges(UserID, Address, ChangeDateTime, ChangerPersonID)
This way, the changes data may be used in any other sub-system or system, independent of your current adoption use-case.
I use in-table change tracking for very simple tables like:
UniversityManagers(PersonID, AssignDateTime, AssignorPersonID)
For more complex tables with frequent changes (and usually few references to previous data) where I need full record logging, I separate the main table (of current records) from the log table, which has extra fields such as LogID, ChangeDateTime, ChangerPersonID, ChangerIP, ...
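As a rough sketch of how the UserAddressChanges table could answer the original question, the example below uses Python's sqlite3. The adopted_at column on Adoption is an assumption (the question's Adoption table has no date), as are the sample dates and the helper name address_at_adoption. It picks the latest address change made on or before the adoption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE UserAddressChanges (
    UserID          INTEGER NOT NULL,
    Address         TEXT    NOT NULL,
    ChangeDateTime  TEXT    NOT NULL,   -- ISO-8601 text sorts chronologically
    ChangerPersonID INTEGER NOT NULL,
    PRIMARY KEY (UserID, ChangeDateTime)
);
CREATE TABLE Adoption (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    pet_id     INTEGER NOT NULL,
    adopted_at TEXT    NOT NULL          -- assumed column, not in the question
);
-- Invented sample data
INSERT INTO UserAddressChanges VALUES (1, 'One Street',     '2020-01-01 00:00:00', 1);
INSERT INTO UserAddressChanges VALUES (1, 'Another Street', '2021-06-01 00:00:00', 1);
INSERT INTO Adoption VALUES (1, 1, 1, '2020-03-15 00:00:00');
""")

def address_at_adoption(adoption_id):
    """Return the adopter's address as of the adoption, or None if unknown."""
    row = conn.execute("""
        SELECT c.Address
        FROM Adoption AS a
        JOIN UserAddressChanges AS c
          ON c.UserID = a.user_id AND c.ChangeDateTime <= a.adopted_at
        WHERE a.id = ?
        ORDER BY c.ChangeDateTime DESC
        LIMIT 1
    """, (adoption_id,)).fetchone()
    return row[0] if row else None

print(address_at_adoption(1))
```

Only actual changes add rows, so the table grows with the number of address changes rather than with every update to any user field.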
There are different approaches to this.
Perhaps the simplest is to denormalize the data. If there is data you need at the point of adoption, include it as columns in the adoption table. This address is the "point-in-time" address.
This method is useful for simple things, but it does not scale well. And you have to pre-define the columns you want.
The next step is to create audit tables for all your tables, or at least all tables of interest. Every time a record changes in user, a new record is added into userAudit. Audit tables are usually maintained using triggers.
The advantage of audit tables is that they do not clutter the existing table (and logic). The same queries work on the existing tables.
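A trigger-maintained audit table might look roughly like this in SQLite, driven from Python's sqlite3. The userAudit columns, the trigger name and the choice to store the old values on update are assumptions; trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    address TEXT NOT NULL
);
CREATE TABLE userAudit (
    audit_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    name       TEXT NOT NULL,
    address    TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Copy the previous values into the audit table on every update
CREATE TRIGGER user_after_update AFTER UPDATE ON user
BEGIN
    INSERT INTO userAudit (user_id, name, address)
    VALUES (OLD.id, OLD.name, OLD.address);
END;
INSERT INTO user VALUES (1, 'John', 'One Street');
UPDATE user SET address = 'Another Street' WHERE id = 1;
""")

# The main table still holds only the current row; history lives in userAudit
current_address = conn.execute("SELECT address FROM user WHERE id = 1").fetchone()[0]
audit_rows = conn.execute("SELECT user_id, name, address FROM userAudit").fetchall()
print(current_address, audit_rows)
```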
Finally, you can just cave in and realize that your data model is overly simplified. You really have slowly changing dimensions. This data can be represented using version effective dates and version end dates for each row. The user table ends up looking like:
user_id | name | address | version_eff_dt | version_end_dt
Because user_id is no longer a primary key, you might want two tables users and userHistory, or something like that.
This is a "correct" representation of the data at any point in time. However, it usually requires restructuring queries because a single user appears multiple times in the table -- and user_id is no longer the primary key.
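Here is a small sketch of the versioned layout and a point-in-time lookup, again with Python's sqlite3. The table name userHistory comes from the suggestion above. The half-open interval convention, the '9999-12-31' end date for the current version, the sample dates and the helper name address_on are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userHistory (
    user_id        INTEGER NOT NULL,
    name           TEXT    NOT NULL,
    address        TEXT    NOT NULL,
    version_eff_dt TEXT    NOT NULL,
    version_end_dt TEXT    NOT NULL,   -- '9999-12-31' marks the current version
    PRIMARY KEY (user_id, version_eff_dt)
);
-- Invented sample data
INSERT INTO userHistory VALUES (1, 'John', 'One Street',     '2020-01-01', '2021-06-01');
INSERT INTO userHistory VALUES (1, 'John', 'Another Street', '2021-06-01', '9999-12-31');
""")

def address_on(user_id, as_of):
    """Return the address valid on the given ISO date (eff <= date < end)."""
    row = conn.execute("""
        SELECT address FROM userHistory
        WHERE user_id = ? AND version_eff_dt <= ? AND ? < version_end_dt
    """, (user_id, as_of, as_of)).fetchone()
    return row[0] if row else None

print(address_on(1, '2020-03-15'))
```

Every query that wants "the" user now has to pick a version, which is the restructuring cost mentioned above.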
I've got an annoying design issue when designing a database and its models. Essentially, the database has clients and customers, who should be able to make appointments with each other. The clients should have their availability (on a general weekly basis) stored in the database, and this needs to be added to the appointment model. The solution does not require or want precise hours for the availability, just one value for each day, ranging from "not available" to "maybe available" to "available". The only solution I've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
So here's some of what I got so far:
Client model:
ClientId
Service,
Fee
Customer-that-uses-Client model:
CustomerId
ServiceNeed
Availability-model:
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
And finally, appointment model:
AppointmentId
ClientID
CustomerID
StartDate
Hours
Problem: is there any way I can redesign the availability model to ... well, need fewer fields and still get each day stored with a (1-3) value depending on the client's availability? It would also be really good if the appointment model wouldn't need to reference all that data from the availability model...
Problem
Answering the narrow question is easy. However, noting the Relational Database tag, there are a few problems in your model, that render it somewhat less than Relational.
Eg. the data content in each logical row needs to be unique. (Uniqueness on the Record id, which is physical, system-generated, and not from the data, cannot provide row uniqueness.) The Primary Key must be "made up from the data", which is of course the only way to make the data row unique.
Eg. values such as Day of availability and AvailabilityType are not constrained, and they need to be.
Relational Data Model
With the issues fixed, the answer looks like this:
Notation
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.
My IDEF1X Introduction is essential reading for those who are new to the Relational Model or data modelling.
Content
In the Relational Model, there is a large emphasis on constraining the data, such that the database as a whole contains only valid data.
The only solution i've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
Yes. What you had was a repeating attribute (they are named Monday..Sunday, which may not look like a repeating attribute, but it is one, no less than a CSV list). That breaks Codd's Second Normal Form.
The solution is to place the single element in a subordinate table ProviderAvailable.
Day of availability and AvailabilityType are now constrained to a set of values.
The rows in Provider (sorry, the use of "Client" in this context grates on me) and Customer are now unique, due to addition of a Name. The users will not use an internal number to identify such entities, they will use a name, usually a ShortName.
Once the model is tightened up, and all the columns are defined, if Name (not a combination of LastName, FirstName, Initial) is unique, you can eliminate the RecordId, and elevate the Name AK to the PK.
Not Modelled
You have not asked, and I have not modelled these items, but I suspect they will come up as you progress in the development.
A Provider (Client) provides 1 Service. There may be more than 1 in future.
A Customer, seeking 1 Service, can make an Appointment with any Provider (who may or may not provide that Service). You may want to constrain each Appointment to a Provider who provides the sought Service.
As per my comment. It depends on how tight you want this Availability/Reservation system to be. Right now, there is nothing to prevent more than one Customer reserving one Provider on a particular Day, ie. a double-booking.
Normalize that availability table: instead of
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
go with
ClientID (PK/FK)
weekday integer value (0-6 or maybe 1-7) (PK)
availability integer value 1-3
This table has a compound primary key, made of (ClientID, weekday) because each client may have either zero or one entry for each of the seven weekdays.
In this table, you might have these rows:
43 2 3 (on Tuesdays = 2, client 43 is Available =3)
43 3 2 (on Wednesdays = 3, client 43 is MaybeAvailable =2)
If the row is missing, it means the client is unavailable. An availability value of 1 also means that.
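The normalized table could be sketched like this with Python's sqlite3. The table name availability, the choice of 1-7 for weekday, the CHECK constraints and the helper name availability_for are assumptions layered on the description above. The foreign key to the client table is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE availability (
    ClientID     INTEGER NOT NULL,
    weekday      INTEGER NOT NULL CHECK (weekday BETWEEN 1 AND 7),
    availability INTEGER NOT NULL CHECK (availability BETWEEN 1 AND 3),
    PRIMARY KEY (ClientID, weekday)      -- zero or one row per client per weekday
);
INSERT INTO availability VALUES (43, 2, 3);  -- Tuesday: Available
INSERT INTO availability VALUES (43, 3, 2);  -- Wednesday: MaybeAvailable
""")

def availability_for(client_id, weekday):
    """Return 1-3 for the given client and weekday; a missing row counts as 1."""
    row = conn.execute(
        "SELECT availability FROM availability WHERE ClientID = ? AND weekday = ?",
        (client_id, weekday),
    ).fetchone()
    return row[0] if row else 1

print(availability_for(43, 2), availability_for(43, 5))
```

With this layout the appointment model only needs ClientID. It can look up the single row for the weekday of StartDate when needed.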
I am creating a website where users can post a listing of their home. I have checkboxes where users can check the characteristics their home contains such as a pool, fireplace, attached/detached garage etc.
I had two designs in mind, but I was wondering which is more correct:
Create a column in the home listing table for each characteristics and give it a type of enum('0','1') where 0 stands for not checked and 1 stands for checked
Create a table which holds all the characteristics a property can have like: garage, pool, fireplace etc.. and then create a second table that pulls the characteristic id and pairs it with a home listing id
For eg: home_1 has a pool so a row will be created like this:
| home_1 | 1 |
where home_1 is the listing id and 1 is the id of pool in the characteristics table
Which option should I go with?
Option 1 seems good, because the 2nd option requires joins when querying the database, and joins can be expensive and time-consuming in MySQL.
More can be found here: https://www.percona.com/blog/2013/07/19/what-kind-of-queries-are-bad-for-mysql/
If you want to query the data like "count all detached houses", enum with separate columns will be faster and make db operations easier to handle.
If you only intend to query houses on address, price and the like, NOT on those features, the 2nd method is easier to develop and maintain.
In short, use the 2nd method if you are not going to query those house characteristics individually.
It all depends on how you will use the data after you save it. But the basic idea should be to consider the mappings in these ways:
Go with the second option when:
The two entities are many-to-many (many homes, many characteristics). The second option fits here, even though it adds a small cost for joins later.
Since your full db mapping is not known, here is one more case: the characteristics are independent of the property. That is, if you plan to reference the characteristics from other entities in other tables, the second option is again the best choice.
Go with the first option when:
The relationship is just one-to-many (one home, many characteristics). The first option works well because it not only reduces the cost of fetching, but also updates or removes the home's dependent characteristics whenever the home record is updated or deleted.
Lastly, it is up to you to decide the mapping type and the dependencies of your data models.
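For reference, the second option from the question (a characteristics table plus a pairing table) might be sketched as follows, using Python's sqlite3 rather than MySQL. The table names, the sample rows and the helper name count_listings_with are assumptions. It also shows that a query such as "count all detached houses" stays possible with this design, at the cost of one join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE characteristics (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE listing_characteristics (
    listing_id        TEXT    NOT NULL,
    characteristic_id INTEGER NOT NULL REFERENCES characteristics(id),
    PRIMARY KEY (listing_id, characteristic_id)   -- each pair stored once
);
-- Invented sample data
INSERT INTO characteristics VALUES (1, 'pool'), (2, 'fireplace'), (3, 'detached garage');
INSERT INTO listing_characteristics VALUES ('home_1', 1), ('home_1', 3), ('home_2', 3);
""")

def count_listings_with(name):
    """Count the listings that have the named characteristic checked."""
    return conn.execute("""
        SELECT COUNT(*)
        FROM listing_characteristics AS lc
        JOIN characteristics AS c ON c.id = lc.characteristic_id
        WHERE c.name = ?
    """, (name,)).fetchone()[0]

print(count_listings_with('detached garage'))
```

Adding a new characteristic later is an INSERT into characteristics rather than a schema change on the listing table.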
I am currently working on restructuring my site's database. As the schema I have now is not one of the best, I thought it would be useful to hear some suggestions from you.
To start off, my site actually consists of widgets. For each widget I need a table for settings (where each instance of the widget has its user defined settings), a table for common (shared items between instances of the same widget) and userdata (users' saved data within an instance of a widget).
Until now, I had the following schema, consisting of 2 databases:
the first database, where I had all site-maintenance tables (e.g. users, widgets installed, logs, notifications, messages etc.) PLUS a table where I joined each widget instance to the user who instantiated it, with a unique ID assigned (so, I have the following columns: user_id, widget_id and unique_id).
the second database, where I kept all widget-related data. That means, for each widget (unique by its widget_id) I had three tables: [widget_id]_settings, [widget_id]_common and [widget_id]_userdata. In each of these tables, each row held that unique_id of the users' widget. Actually here was all the users' data stored within a widget.
To give a short example of how my databases worked:
First database:
In the users table I have user_id = 1
In the widgets table I have widget_id = 1
In the users_widgets table I have user_id = 1, widget_id = 1, unique_id = 1
Second database:
In the 1_settings I have unique_id = 1, ..., where ... represents the user's widget settings
In the 1_common I have several rows which represent shared data between instances of the same widget (so, no user specific data here)
In the 1_userdata I have unique_id = 1, ..., where ... represents the user's widget data. An important note here is that this table may contain several rows with the same unique_id (e.g. for a tasks widget, a user can have several tasks for a widget instance)
I hope this gives you a rough understanding of my database schema.
Now, I want to develop a 'cleaner' schema, so that it won't be necessary to have 2 databases and switch from one to the other each time in my application. It would also be great if I found a way NOT to dynamically generate tables in the second database (1_settings, 2_settings, ... , n_settings).
I will greatly appreciate any effort in suggesting any better way of achieving this. Thank you very much in advance!
EDIT:
Should I have databases like MongoDB or CouchDB in mind when restructuring my databases? I mean, for the second database, where it would be better if I didn't have a fixed schema.
Also, how would traditional SQL and NoSQL databases get along on the same site?
A possible schema for the users_widgets table could be:
id | user_id | widget_id
You don't need the unique_id field in the users_widgets table, unless you want to hide the primary key for some reason. In fact, I would rename this table to something a little more memorable like widget_instances, and use widget_instance_id in the remaining tables of the second database.
One way to handle the second set of tables is by using a metadata style:
widget_instance_settings
id | widget_instance_id | key | value
This would include the userdata, because user_id is related to the widget_instance_id, unless you want to allow a user to create multiple instances of the same widget, and have the same data across all instances for some reason.
widget_common_settings
id | widget_id | key | value
This type of schema can be seen in packages like Elgg.
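A minimal sketch of the metadata-style tables, using Python's sqlite3. Table and column names follow the answer. The sample settings, the decision to let a key repeat (to cover cases like several tasks in one widget instance) and the helper name settings_for are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE widget_instances (
    id        INTEGER PRIMARY KEY,
    user_id   INTEGER NOT NULL,
    widget_id INTEGER NOT NULL
);
CREATE TABLE widget_instance_settings (
    id                 INTEGER PRIMARY KEY,
    widget_instance_id INTEGER NOT NULL REFERENCES widget_instances(id),
    key                TEXT NOT NULL,
    value              TEXT
);
-- Invented sample data: one instance with a setting and two tasks
INSERT INTO widget_instances VALUES (1, 1, 1);
INSERT INTO widget_instance_settings (widget_instance_id, key, value) VALUES
    (1, 'color', 'blue'),
    (1, 'task',  'buy milk'),
    (1, 'task',  'call Bob');
""")

def settings_for(widget_instance_id):
    """Return {key: [values...]} for one widget instance, in insertion order."""
    settings = {}
    rows = conn.execute(
        "SELECT key, value FROM widget_instance_settings "
        "WHERE widget_instance_id = ? ORDER BY id",
        (widget_instance_id,),
    )
    for key, value in rows:
        settings.setdefault(key, []).append(value)
    return settings

print(settings_for(1))
```

One fixed pair of tables serves every widget, so no per-widget tables such as 1_settings or 2_settings need to be generated.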
Do you know the settings a widget class and widget instance could have? In this case these settings could be made columns of the widget_class table (for common settings) and widget_instance (for instance specific settings).
If you don't know them, then you could have a widget_class_settings table that has a many to one relation with the widget_class table and a widget_instance_settings that has a many to one relation to the widget_instance table. Between the widget_instance and the widget_class you could, again, have a many to one relation. The widget_instance could also have a foreign key in the users table, so that you know which user created a specific widget.
What's the best storage mechanism (from the view of the database to be used and system for storing all the records) for a system built to track whois record changes? The program will be run once a day and a track should be kept of what the previous value was and what the new value is.
Suggestions on database and thoughts on how to store the different records/fields so that data is not redundant/duplicated
(Added) My thoughts on one mechanism to store data
Example case showing sale of one domain "sample.com" from personA to personB on 1/1/2010
Table_DomainNames
DomainId | DomainName
1 | example.com
2 | sample.com
Table_ChangeTrack
DomainId | DateTime | RegistrarId | RegistrantId | (others)
2 | 1/1/2009 | 1 | 1
2 | 1/1/2010 | 2 | 2
Table_Registrars
RegistrarId | RegistrarName
1 | GoDaddy
2 | 1&1
Table_Registrants
RegistrantId | RegistrantName
1 | PersonA
2 | PersonB
All tables are "append-only". Does this model make sense? Table_ChangeTrack should be "added to" only when there is any change in ANY of the monitored fields.
Is there any way of making this more efficient / tighter from the size point of view?
The primary data is the existence or changes to the whois records. This suggests that your primary table be:
<id, domain, effective_date, detail_id>
where the detail_id points to actual whois data, likely normalized itself:
<detail_id, registrar_id, admin_id, tech_id, ...>
But do note that most registrars consider the information their property (whether it is or not) and have warnings like:
TERMS OF USE: You are not authorized
to access or query our Whois database
through the use of electronic
processes that are high-volume and
automated except as reasonably
necessary to register domain names or
modify existing registrations...
From which you can expect that they'll cut you off if you read their databases too much.
You could
store the checksum of a normalized form of the whois record data fields for comparison.
store the original and current version of the data (possibly in compressed form), if required.
store diffs of each detected change (possibly in compressed form), if required.
It is much like how incremental backup systems work. Maybe you can get further inspiration from there.
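The checksum idea could be sketched as follows with Python's hashlib. What counts as the "normalized form" is an assumption here: field names and values are lowercased, whitespace is collapsed, and fields are sorted by name. The function name whois_checksum and the sample fields are hypothetical.

```python
import hashlib

def whois_checksum(fields):
    """Return a SHA-256 hex digest of a normalized whois record.

    fields maps field names to values. Normalization (an assumption):
    lowercase names and values, collapse whitespace, sort by field name.
    """
    pairs = sorted(
        (key.strip().lower(), " ".join(str(value).split()).lower())
        for key, value in fields.items()
    )
    normalized = "\n".join(f"{key}={value}" for key, value in pairs)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

yesterday = whois_checksum({"Registrar": "GoDaddy", "Registrant": "PersonA"})
today = whois_checksum({"registrant": " persona ", "registrar": "GODADDY"})
after_sale = whois_checksum({"Registrar": "1&1", "Registrant": "PersonB"})
print(yesterday == today, yesterday == after_sale)
```

Comparing the stored digest with today's digest is enough to decide whether anything needs to be written. The full record or a diff only has to be stored when the two differ.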
You can write VBScript in an Excel file to query a web page (in this case, the particular 'whois' URL for a specific site) and then store the results back to a worksheet in Excel.