I need to design and implement a database for an employee attendance system. The DB does not have to be non-relational; I can go with whatever best suits the requirements. The requirements are simple: I need to store employee information along with their clock-in and clock-out times.
Data requirements are as follows:
The number of employees will be small (20-50).
Ability to retrieve all attendance times for all employees for a specific day or a range of days (for example, a month).
Ability to add/modify/remove attendance times for specific employees.
Ability to retrieve the calculated late attendance for each employee (an employee is considered late according to business rules based on attendance times and the employee's information).
-Is using MongoDB better than using a relational SQL database (like MySQL)?
-What is the suggested high-level design of the DB that best simplifies DB implementation, data access, and application development?
This design can be achieved with either MongoDB or a relational database, and each has strengths and weaknesses. The schema design by user641887 is a perfectly valid approach with MongoDB, although I wouldn't use "date" as the "_id" in attendance: two employees on the same day would have the same "_id", which is invalid. I would leave the "_id" of each attendance document as an ObjectId. However, be aware of MongoDB's limitations with joining collections; you will need to look into the '$lookup' aggregation stage (https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/), which was only added in MongoDB 3.2. The advantage of a MongoDB design is that each document in the attendance collection that user641887 proposed can be dynamic, and should the database grow very large, it shouldn't be too hard to scale. But I doubt that will be a concern with only 50 staff and 1 entry per day (50*365 = 18,250 per year); even 10 years of data is a very small amount.
The above requirements can also be achieved using a relational structure, where you would again have 2 tables as described by user641887. Depending on how many additional pieces of information you want to store in the "other attributes/parameters", you have a couple of options. If there are only a few known possible attributes, you can add a few nullable fields to each table. But if there are many fields that could exist, or you don't know what to expect before you add them, you can have two additional tables associated with employee:
employee_attributes:
employee_id : the _id code that matches the _id in the employee table
attribute_code : an integer code that links to the attribute_code_description table (below)
attribute_value: the value of the attribute
NOTE: This approach with a single attribute table limits attribute_value to one data type (most likely string). If you need multiple data types, you can resolve that by having one employee attribute table per data type, e.g. employee_attribute_i (for ints), employee_attribute_s (for strings), employee_attribute_b (for booleans).
attribute_code_description:
attribute_code : the int code of this attribute
attribute_meaning: a string description of what this attribute is for (e.g. "allergies", "probation", "start_time", ...)
This same approach can be used for the "other attendance parameters".
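To make the two attribute tables concrete, here is a minimal sketch using Python's built-in sqlite3 in place of whatever RDBMS you pick. The table names follow the answer; the composite primary key, the sample codes (10, 11) and the employees are my own hypothetical additions. It shows how an attribute is looked up by its meaning rather than by a dedicated column.

```python
import sqlite3

# In-memory database; sample codes, values and employees are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE attribute_code_description (
    attribute_code    INTEGER PRIMARY KEY,
    attribute_meaning TEXT NOT NULL UNIQUE
);
CREATE TABLE employee_attributes (
    employee_id     INTEGER NOT NULL REFERENCES employee(employee_id),
    attribute_code  INTEGER NOT NULL REFERENCES attribute_code_description(attribute_code),
    attribute_value TEXT,                      -- single (string) type, as the NOTE says
    PRIMARY KEY (employee_id, attribute_code)  -- assumed: one value per attribute per employee
);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO attribute_code_description VALUES (?, ?)",
                 [(10, "start_time"), (11, "probation")])
conn.executemany("INSERT INTO employee_attributes VALUES (?, ?, ?)",
                 [(1, 10, "09:00"), (2, 10, "08:30"), (2, 11, "true")])

def employee_attribute(meaning):
    """Return {employee name: value} for one attribute, found via its description."""
    rows = conn.execute("""
        SELECT e.name, a.attribute_value
        FROM employee e
        JOIN employee_attributes a        ON a.employee_id = e.employee_id
        JOIN attribute_code_description d ON d.attribute_code = a.attribute_code
        WHERE d.attribute_meaning = ?
        ORDER BY e.name
    """, (meaning,)).fetchall()
    return dict(rows)

print(employee_attribute("start_time"))  # {'Alice': '09:00', 'Bob': '08:30'}
print(employee_attribute("probation"))   # {'Bob': 'true'}
```

Employees without a given attribute simply have no row, which is what lets the attribute set grow without schema changes.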
With regard to "calculated late attendance for each employee", you can set up triggers/rules that fire automatically and add to a per-employee counter to monitor lateness. This works by firing a trigger upon insert into the attendance table: the new in_time is compared with the employee's "start_time", and if it is later, the counter that logs how often they are late is incremented by 1. I know this can be done in several relational databases (certainly PostgreSQL and Ingres, and I'm sure many others). I don't know whether it can be done on a MongoDB server.
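A rough sketch of that trigger, written for SQLite only because it ships with Python (PostgreSQL trigger syntax differs). Assumptions: times are stored as 'HH:MM' text so string comparison orders them, "late" means strictly after start_time, and the names and rows are hypothetical. It also shows the alternative of computing lateness on read for a date range, which covers the "month" requirement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    start_time  TEXT NOT NULL,              -- e.g. '09:00'
    late_count  INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE attendance (
    attendance_id INTEGER PRIMARY KEY,      -- surrogate key, not the date
    employee_id   INTEGER NOT NULL REFERENCES employee(employee_id),
    work_date     TEXT NOT NULL,            -- 'YYYY-MM-DD'
    in_time       TEXT NOT NULL,
    out_time      TEXT
);
-- After each insert, bump the counter only if in_time is later than start_time.
CREATE TRIGGER count_late AFTER INSERT ON attendance
BEGIN
    UPDATE employee SET late_count = late_count + 1
    WHERE employee_id = NEW.employee_id AND NEW.in_time > start_time;
END;
""")
conn.executemany("INSERT INTO employee (employee_id, name, start_time) VALUES (?, ?, ?)",
                 [(1, "Alice", "09:00"), (2, "Bob", "08:30")])
conn.executemany(
    "INSERT INTO attendance (employee_id, work_date, in_time, out_time) VALUES (?, ?, ?, ?)",
    [(1, "2024-03-01", "08:55", "17:00"),   # on time
     (1, "2024-03-04", "09:10", "17:05"),   # late
     (2, "2024-03-01", "08:45", "16:30"),   # late
     (2, "2024-03-04", "08:30", "16:30")])  # exactly on time, not late

late = dict(conn.execute("SELECT name, late_count FROM employee ORDER BY name"))

def late_between(first_day, last_day):
    """Recompute lateness on read for a date range instead of trusting the counter."""
    return dict(conn.execute("""
        SELECT e.name, COUNT(a.attendance_id)
        FROM employee e
        LEFT JOIN attendance a
               ON a.employee_id = e.employee_id
              AND a.in_time > e.start_time
              AND a.work_date BETWEEN ? AND ?
        GROUP BY e.employee_id
        ORDER BY e.name
    """, (first_day, last_day)))

print(late)                                      # {'Alice': 1, 'Bob': 1}
print(late_between("2024-03-02", "2024-03-31"))  # {'Alice': 1, 'Bob': 0}
```

Note that the counter only reacts to inserts; since attendance times can also be modified or removed, you would need matching UPDATE/DELETE triggers, which is one reason computing lateness at query time may be simpler at this data volume.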
You could have 2 collections, one for the employees and one for the attendance.
The employee collection can have attributes related to the employee:
_id : Object_id
name : string
email : string
... other employee attributes
and the attendance collection can have attributes related to attendance:
_id : date (you can store the date as a string or any other format to make it unique per day)
in_time : date
out_time : date
... other attendance parameters
employee_id : (_id for employee)
HTH.
Related
I've got an annoying design issue while designing a database and its models. Essentially, the database has clients and customers who should be able to make appointments with each other. The clients should have their availability (on a general weekly basis) stored in the database, and this needs to be added to the appointment model. The solution does not require or want precise hours for the availability, just one value for each day, ranging from "not available" to "maybe available" to "available". The only solution I've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
So here's some of what I've got so far:
Client model:
ClientId
Service,
Fee
Customer-that-uses-Client model:
CustomerId
ServiceNeed
Availability-model:
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
And finally, appointment model:
AppointmentId
ClientID
CustomerID
StartDate
Hours
Problem: is there any way I can redesign the availability model to need fewer fields and still store each day with a (1-3) value depending on the client's availability? It would also be really good if the appointment model didn't need to reference all that data from the availability model...
Problem
Answering the narrow question is easy. However, noting the Relational Database tag, there are a few problems in your model that render it somewhat less than Relational.
E.g. the data content in each logical row needs to be unique. (Uniqueness on the Record id, which is physical, system-generated, and not from the data, cannot provide row uniqueness.) The Primary Key must be "made up from the data", which is of course the only way to make the data row unique.
E.g. values such as Day of availability and AvailabilityType are not constrained, and they need to be.
Relational Data Model
With the issues fixed, the answer looks like this:
Notation
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.
My IDEF1X Introduction is essential reading for those who are new to the Relational Model or data modelling.
Content
In the Relational Model, there is a large emphasis on constraining the data, such that the database as a whole contains only valid data.
The only solution I've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
Yes. What you had was a repeating attribute (they are named Monday..Sunday, which may not look like a repeating attribute, but it is one, no less than a CSV list). That breaks Codd's Second Normal Form.
The solution is to place the single element in a subordinate table ProviderAvailable.
Day of availability and AvailabilityType are now constrained to a set of values.
The rows in Provider (sorry, the use of "Client" in this context grates on me) and Customer are now unique, due to addition of a Name. The users will not use an internal number to identify such entities, they will use a name, usually a ShortName.
Once the model is tightened up, and all the columns are defined, if Name (not a combination of LastName, FirstName, Initial) is unique, you can eliminate the RecordId, and elevate the Name AK to the PK.
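The diagram itself is not reproduced here, so the following is only a hedged SQLite sketch of the constraints it describes. The column names (ProviderId, DayOfWeek), the 'Mon'..'Sun' codes and the sample provider are my assumptions, not the answer's exact model. Each invalid insert is rejected by the database itself rather than by application code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE AvailabilityType (
    AvailabilityType TEXT PRIMARY KEY       -- the only allowed values
);
INSERT INTO AvailabilityType VALUES ('NotAvailable'), ('MaybeAvailable'), ('Available');
CREATE TABLE Provider (
    ProviderId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL UNIQUE         -- alternate key taken from the data
);
CREATE TABLE ProviderAvailable (
    ProviderId       INTEGER NOT NULL REFERENCES Provider (ProviderId),
    DayOfWeek        TEXT NOT NULL
                     CHECK (DayOfWeek IN ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun')),
    AvailabilityType TEXT NOT NULL REFERENCES AvailabilityType (AvailabilityType),
    PRIMARY KEY (ProviderId, DayOfWeek)     -- at most one row per Provider per Day
);
""")

def attempt(sql, params):
    """Run one insert; report whether the constraints accepted it."""
    try:
        with conn:
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

add_provider = "INSERT INTO Provider VALUES (?, ?)"
add_day = "INSERT INTO ProviderAvailable VALUES (?, ?, ?)"
results = [
    attempt(add_provider, (1, "Smith Plumbing")),
    attempt(add_provider, (2, "Smith Plumbing")),    # duplicate Name
    attempt(add_day, (1, "Tue", "Available")),
    attempt(add_day, (1, "Funday", "Available")),    # Day outside the allowed set
    attempt(add_day, (1, "Wed", "Busy")),            # unknown AvailabilityType
    attempt(add_day, (1, "Tue", "MaybeAvailable")),  # second row for the same Day
]
print(results)  # ['ok', 'rejected', 'ok', 'rejected', 'rejected', 'rejected']
```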
Not Modelled
You have not asked, and I have not modelled these items, but I suspect they will come up as you progress in the development.
A Provider (Client) provides 1 Service. There may be more than 1 in future.
A Customer, seeking 1 Service, can make an Appointment with any Provider (who may or may not provide that Service). You may want to constrain each Appointment to a Provider who provides the sought Service.
As per my comment, it depends on how tight you want this Availability/Reservation system to be. Right now, there is nothing to prevent more than one Customer reserving one Provider on a particular Day, i.e. a double-booking.
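One simple way to close the double-booking gap, sketched in SQLite under the assumption that a Provider takes at most one Appointment per Day (the column names are hypothetical and foreign keys are omitted for brevity), is a uniqueness constraint on (ProviderId, AppointmentDate):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointment (
    AppointmentId   INTEGER PRIMARY KEY,
    ProviderId      INTEGER NOT NULL,
    CustomerId      INTEGER NOT NULL,
    AppointmentDate TEXT NOT NULL,              -- 'YYYY-MM-DD'
    UNIQUE (ProviderId, AppointmentDate)        -- one booking per Provider per Day
);
""")

def book(provider_id, customer_id, day):
    """Try to book; return True on success, False if that Provider is taken that Day."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Appointment (ProviderId, CustomerId, AppointmentDate) "
                "VALUES (?, ?, ?)",
                (provider_id, customer_id, day))
        return True
    except sqlite3.IntegrityError:
        return False

print(book(1, 100, "2024-05-07"))  # True
print(book(1, 200, "2024-05-07"))  # False: double-booking rejected
print(book(1, 200, "2024-05-08"))  # True: different Day
```

If Appointments later get start times and durations, a plain unique key is no longer enough and overlap checks become necessary.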
Normalize that availability table: instead of
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
go with
ClientID (PK/FK)
weekday integer value (0-6 or maybe 1-7) (PK)
availability integer value 1-3
This table has a compound primary key, made of (ClientID, weekday) because each client may have either zero or one entry for each of the seven weekdays.
In this table, you might have these rows:
43 2 3 (on Tuesdays = 2, client 43 is Available =3)
43 3 2 (on Wednesdays = 3, client 43 is MaybeAvailable =2)
If the row is missing, it means the client is unavailable. An availability value of 1 also means that.
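A small SQLite sketch of this normalized table, assuming weekdays 1-7 with Monday = 1 (which matches Tuesday = 2 in the sample rows). It uses the sample rows for client 43 and rebuilds the full week, treating a missing row as 1 (unavailable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Availability (
    ClientID     INTEGER NOT NULL,
    Weekday      INTEGER NOT NULL CHECK (Weekday BETWEEN 1 AND 7),       -- 1 = Monday (assumed)
    Availability INTEGER NOT NULL CHECK (Availability BETWEEN 1 AND 3),  -- 1 no, 2 maybe, 3 yes
    PRIMARY KEY (ClientID, Weekday)
);
INSERT INTO Availability VALUES (43, 2, 3), (43, 3, 2);
""")

def week_for(client_id):
    """Seven availability values, Monday first; a missing row counts as 1."""
    rows = conn.execute("""
        WITH RECURSIVE days(d) AS (SELECT 1 UNION ALL SELECT d + 1 FROM days WHERE d < 7)
        SELECT days.d, COALESCE(a.Availability, 1)
        FROM days
        LEFT JOIN Availability a ON a.ClientID = ? AND a.Weekday = days.d
        ORDER BY days.d
    """, (client_id,)).fetchall()
    return [value for _, value in rows]

print(week_for(43))  # [1, 3, 2, 1, 1, 1, 1]
```

The appointment model then only needs ClientID and a date; the weekday lookup happens in the query, not in extra appointment columns.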
and each customer has basic info (name, age, address) and can have a few solutions that the website offers. Currently there is only 1; maybe there will be 2 or 3 in the future (not something that will be big).
The one solution that the website currently offers is:
PaymentSolution
Now PaymentSolution has its own services:
PersonalPaymentService
OfficePaymentService
Both of those services have an active parameter that can be true/false,
and each service has 3 payment options that can also be active true/false.
For example:
PersonalPaymentService(active true)
paymentType startPaymentDate(date value) finishPaymentData(date value) active(boolean)
apartment payment 1/1/2018 1/1/2019 true
car payment 1/1/2018 1/1/2019 true
jetSki payment 1/1/2018 1/1/2019 false
OfficePaymentService(active false)
paymentType startPaymentDate(date value) finishPaymentData(date value) active(boolean) contactEmail(string value)
office payment 1/1/2018 1/1/2019 true john#rentOffice.com
computers payment 1/1/2018 1/1/2019 false john#rentComputers.com
I'm trying to figure out how to model such an information structure as SQL tables. Can someone please offer some directions?
I'm having a hard time breaking this into the right relations and making the model able to scale to more solutions in the future.
The queries I want to be able to perform are something like:
give me PersonalPaymentService data for customer id 35.
give me OfficePaymentService for account id 43.
give me all data for customer id 67
Anything can help! Thanks!
I think you should revisit your design, based on your comment:
"for example give me all PersonalPaymentService data for customer id 35. or give me OfficePaymentService for account id 43... or of course give me all data for customer id 67"
I think PersonalPaymentService and OfficePaymentService are logical representations of data, and your actual payments are only the 6 types you described.
If you want to keep your tables clean and you are using MySQL as a datastore, I would say just create a table like
CustomerID, PaymentType, StartPaymentDate, EndPaymentDate
where the PaymentType belongs to only the 6 subtypes.
and use your application logic (PHP, Java, whatever) to categorize the payments into office and personal types.
The advantage of using this schema is that in future if you decide to introduce another payment type, you will not require any schema changes.
However, if your constraint is that SQL itself should answer whether a payment is personal or office, you could introduce another column called PaymentCategory, holding either personal or office.
As mentioned in another post, creating one table and adding a type discriminator column to it would be the better approach for now. Writing queries against one table is also easy. However, in some situations you have to add new types with data (columns) specific to that type, like contactEmail in your case, and later you could end up with a table with a lot of columns, most of them null.
In that kind of situation, a better solution could be to create one table (PaymentService) with the common columns and a separate table for each type (PersonalPayment, OfficePayment, etc.), with a foreign key constraint maintaining a one-to-one relationship between the common table and each type table. Writing queries against this is not as easy as with the previous approach, because you have to join the type table with the common table.
Another option is to create separate tables for each type.
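A minimal SQLite sketch of the single-table approach with a PaymentCategory discriminator. The Active column comes from the question; the composite key, the ISO date format and the rows for customer 35 are hypothetical. The same table answers both "PersonalPaymentService data for customer 35" and "all data for customer 35":

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Payment (
    CustomerID       INTEGER NOT NULL,
    PaymentType      TEXT NOT NULL,          -- 'apartment', 'car', 'office', ...
    PaymentCategory  TEXT NOT NULL CHECK (PaymentCategory IN ('personal', 'office')),
    StartPaymentDate TEXT NOT NULL,
    EndPaymentDate   TEXT NOT NULL,
    Active           INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (CustomerID, PaymentType)    -- assumed: one row per type per customer
);
""")
conn.executemany("INSERT INTO Payment VALUES (?, ?, ?, ?, ?, ?)", [
    (35, "apartment", "personal", "2018-01-01", "2019-01-01", 1),
    (35, "car",       "personal", "2018-01-01", "2019-01-01", 1),
    (35, "jetSki",    "personal", "2018-01-01", "2019-01-01", 0),
    (35, "office",    "office",   "2018-01-01", "2019-01-01", 1),
])

def payments(customer_id, category=None):
    """Payment types for a customer, optionally restricted to one category."""
    sql = "SELECT PaymentType FROM Payment WHERE CustomerID = ?"
    params = [customer_id]
    if category is not None:
        sql += " AND PaymentCategory = ?"
        params.append(category)
    return [row[0] for row in conn.execute(sql + " ORDER BY PaymentType", params)]

print(payments(35, "personal"))  # ['apartment', 'car', 'jetSki']
print(payments(35))              # ['apartment', 'car', 'jetSki', 'office']
```

Adding a new payment type is just a new row; only a new category would touch the CHECK constraint.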
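For the option of one common table plus one table per type, here is a hedged SQLite sketch. Column names such as PaymentServiceID and ServiceType are my assumptions, dates are left out for brevity, and the sample emails are copied from the question. Sharing the primary key between PaymentService and OfficePayment is what makes the relationship one-to-one, and reading office data needs the join mentioned above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PaymentService (            -- columns common to every payment
    PaymentServiceID INTEGER PRIMARY KEY,
    CustomerID       INTEGER NOT NULL,
    ServiceType      TEXT NOT NULL CHECK (ServiceType IN ('personal', 'office')),
    PaymentType      TEXT NOT NULL,
    Active           INTEGER NOT NULL
);
CREATE TABLE OfficePayment (             -- columns only office payments have
    PaymentServiceID INTEGER PRIMARY KEY -- PK that is also the FK: one-to-one
        REFERENCES PaymentService (PaymentServiceID),
    ContactEmail     TEXT NOT NULL
);
INSERT INTO PaymentService VALUES
    (1, 43, 'personal', 'car', 1),
    (2, 43, 'office', 'office', 1),
    (3, 43, 'office', 'computers', 0);
INSERT INTO OfficePayment VALUES
    (2, 'john#rentOffice.com'),
    (3, 'john#rentComputers.com');
""")

rows = conn.execute("""
    SELECT s.PaymentType, s.Active, o.ContactEmail
    FROM PaymentService s
    JOIN OfficePayment o ON o.PaymentServiceID = s.PaymentServiceID
    WHERE s.CustomerID = ?
    ORDER BY s.PaymentServiceID
""", (43,)).fetchall()
print(rows)  # [('office', 1, 'john#rentOffice.com'), ('computers', 0, 'john#rentComputers.com')]
```

One caveat of this shape: nothing in these two tables alone stops an OfficePayment row from pointing at a 'personal' PaymentService row; that needs an extra constraint or application checks.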
I'm creating a database to keep track of various statistics about myself, and I'm wondering if there's a better way to store multiple entries for a single date.
E.g. in my table I have AllergyMedicine, which can track multiple medicines taken on the same date; is there a better way to do this?
Also, the tables Food and Allergy seem unnecessary; is there a better way to group tables?
Any suggestions are appreciated!
I find it helps to state the problem in a semi structured way, as below.
The system monitors one or more **persons**.
Each person consumes zero or more **items**. Each consumption has an attribute of date and time.
Items can be **food**, or **medicines**.
Food can be of the types **snack**, **fruit** or **meal**.
A meal has a **type**.
A person may report **symptoms**. Each report will cover a period of time, and be reported at a specific date/time.
Symptoms may be associated with zero or more **allergies**.
I do not believe that "date" is an entity in your schema - it's an attribute of events that occur, e.g. consuming something, or noticing a symptom.
If the statements above are true, the schema might be:
Persons
ID
name
...
FoodItemType
ID
Name
FoodItem
ID
Name
FoodItemTypeID (FK)
Medicine
ID
Name
FoodConsumption
PersonID
FoodID
ConsumptionDateTime
MedicineConsumption
PersonID
MedicineID
ConsumptionDateTime
Symptom
ID
Name
....
SymptomObservation
PersonID
SymptomID
SymptomStartDateTime
SymptomEndDateTime
SymptomReportDateTime
Allergy
ID
Name
AllergySymptom
AllergyID
SymptomID
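To show that several entries on one date need no special structure under this schema, here is a trimmed SQLite sketch of Persons, Medicine and MedicineConsumption. The composite primary key, the 'YYYY-MM-DD HH:MM' format and the medicine names are hypothetical. Multiple doses on one day are simply multiple rows, and the date is derived from ConsumptionDateTime:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons  (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Medicine (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE MedicineConsumption (
    PersonID            INTEGER NOT NULL REFERENCES Persons (ID),
    MedicineID          INTEGER NOT NULL REFERENCES Medicine (ID),
    ConsumptionDateTime TEXT NOT NULL,   -- 'YYYY-MM-DD HH:MM'
    PRIMARY KEY (PersonID, MedicineID, ConsumptionDateTime)
);
INSERT INTO Persons VALUES (1, 'Me');
INSERT INTO Medicine VALUES (1, 'Antihistamine'), (2, 'Nasal spray');
INSERT INTO MedicineConsumption VALUES
    (1, 1, '2024-04-10 08:00'),
    (1, 2, '2024-04-10 08:05'),
    (1, 1, '2024-04-10 20:00'),
    (1, 1, '2024-04-11 08:00');
""")

def medicines_on(person_id, day):
    """Medicine names a person took on a 'YYYY-MM-DD' day, in time order."""
    return [name for (name,) in conn.execute("""
        SELECT m.Name
        FROM MedicineConsumption c JOIN Medicine m ON m.ID = c.MedicineID
        WHERE c.PersonID = ? AND date(c.ConsumptionDateTime) = ?
        ORDER BY c.ConsumptionDateTime
    """, (person_id, day))]

print(medicines_on(1, "2024-04-10"))  # ['Antihistamine', 'Nasal spray', 'Antihistamine']
```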
Of course, if you take more than one medicine on one day, why not isolate that day (=date) in its own table?
So you'll have a table "days" with only dates, that you either prefill (like a calendar) or only fill with those days when you really took that medicine.
That way, you save a lot of space by "centering" the date in one table and relating everything else to it, which is actually a very precise model of reality.
All your "FoodSnack", "FoodMeal", "AllergyMedicine" etc. with a date in them will become plain N:M mapping tables then.
You could even abstract further, reduce tables and make just three tables:
symptoms
causes
treatment
All of those are related to the central "day" table (I wouldn't call it "Date", because that's a keyword and easily mistaken), and to each other where applicable.
On Meetup.com, when you join a meetup group, you are usually required to complete a profile for that particular group. For example, if you join a movie meetup group, you may need to list the genres of movies you enjoy, etc.
I'm building a similar application, wherein users can join various groups and complete different profile details for each group. Assume the 2 possibilities:
Users can create their own groups and define what details to ask users that join that group (so, something a bit dynamic -- perhaps suggesting that at least an EAV design is required)
The developer decides now which groups to create and specify what details to ask users who join that group (meaning that the profile details will be predefined and "hard coded" into the system)
What's the best way to model such data?
More elaborate example:
The "Movie Goers" group requests that its members specify the following:
Name
Birthdate (to be used to compute member's age)
Gender (must select from "male" or "female")
Favorite Genres (must select 1 or more from a list of specified genres)
The "Extreme Sports" group requests that its members specify the following:
Name
Description of Activities Enjoyed (narrative form)
Postal Code
The bottom line is that each group may require different details from members joining it. Ideally, I would like anyone to be able to create a group (à la Meetup.com). However, I also need the ability to query for members fairly well (e.g. find all women movie goers between the ages of 25 and 30).
For something like this, you'd want maximum normalization so that you don't have duplicate data anywhere. Because your user-defined tables could possibly contain the same type of record, I think you might have to go above 3NF for this.
My suggestion would be this - explode your tables so that you have something close to 6NF with EAV, so that each question that users must answer will have its own table. Then, your user-created tables will all reference one of your question tables. This avoids the duplication of data issue. (For instance, you don't want an entry in the "MovieGoers" group with the name "John Brown" and one in the "Extreme Sports" group with the name "Johnny B." for the same user; you also don't want his "what is your favorite color" answer to be "Blue" in one group and "Red" in another. Any data that can span across groups, like common questions, would be normalized in this form.)
The main drawback to this is that you'd end up with a lot of tables, and you'd probably want to create views for your statistical queries. However, in terms of pure data integrity, this would work well.
Note that you could probably get away with only factoring out the common fields, if you really wanted to. Examples of common fields would include Name, Location, Gender, and others; you could also do the same for common questions, like "what is your favorite color" or "do you have pets" or something to that extent. Group-specific questions that don't span across groups could be stored in a separate table for that group, un-exploded. I wouldn't advise this because it wouldn't be as flexible as the pure 6NF option and you run the risk of duplication (how do you predetermine which questions won't be common questions?) but if you really wanted to, you could do this.
There's a good question about 6NF here: Would like to Understand 6NF with an Example
I hope that made some sense and I hope it helps. If you have any questions, leave a comment.
Really, this is exactly a problem for which SQL is not the right solution. Forget normalization. This is exactly the job for NoSQL document stores. Every user is a document, with some essential fields like id, name, pwd etc., and every group adds the possibility of adding some fields. Unique fields can have group-id-prefixed names; shared fields (that capture some more general concept) can use the plain field name.
Besides users (and groups), you will then have field descriptions with name, type, possible values, ..., which is also a very good fit for a document store.
If you use key-value document store from the beginning, you gain this freeform possibility of structuring your data plus querying them (though not by SQL, but by the means this or that NoSQL database provides).
First I'd like to note that the following structure is just a basis for your DB, and you will need to expand or reduce it.
The DB has the following entities:
user (just user)
group (any group)
template (a list of requirements united into a template to simplify assignment)
requirement (single requirement. For example: date of birth, gender, favorite sport)
"Modeling":
**User**
user_id
user_name
**Group**
name
group_id
**user_group**
user_id (FK)
group_id (FK)
**requirement**:
requirement_id
requirement_name
requirement_type (FK) (the type: combo, free string, date; should refer to a dictionary table)
**template**
template_id
template_name
**template_requirement**
r_id (FK)
t_id (FK)
The next step is to model an appropriate schema for storing restrictions, i.e. the validation rule for any requirement in any template. We have to separate it because the same restriction can differ between groups (for example: "age"). You can use the following table:
**restrictions**
group_id
template_id
requirement_id (must be stored along with template_id, because the same requirement can exist in different templates and a group can consist of many templates)
restriction_type (FK) (points to another dictionary: value, length, regexp, at_least_one_value_chosen and so on)
So, as I said, this is the basis. Feel free to simplify this schema (remove tables, or drop multiple templates per group). Or you can make it more general by adding the ability to create and publish templates, requirements and so on.
Hope you find this idea useful
You could save such data as JSON or XML (Structure, Data)
User Table
Userid
Username
Password
Groups -> JSON Array of all Groups
GroupStructure Table
Groupid
Groupname
Groupstructure -> JSON Structure (with specified Fields)
GroupData Table
Userid
Groupid
Groupdata -> JSON Data
I think this covers most of your constraints:
users
user_id, user_name, password, birth_date, gender
1, Robert Jones, *****, 2011-11-11, M
group
group_id, group_name
1, Movie Goers
2, Extreme Sports
group_membership
user_id, group_id
1, 1
1, 2
group_data
group_data_id, group_id, group_data_name
1, 1, Favorite Genres
2, 2, Favorite Activities
group_data_value
id, group_data_id, group_data_value
1,1,Comedy
2,1,Sci-Fi
3,1,Documentaries
4,2,Extreme Cage Fighting
5,2,Naked Extreme Bike Riding
user_group_data
user_id, group_id, group_data_id, group_data_value_id
1,1,1,1
1,1,1,2
1,2,2,4
1,2,2,5
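Loading the sample rows above into SQLite shows how one user's per-group answers come back with a single join chain. The password column is left out and the profile() helper is hypothetical; "group" has to be quoted because GROUP is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT, birth_date TEXT, gender TEXT);
CREATE TABLE "group" (group_id INTEGER PRIMARY KEY, group_name TEXT);  -- GROUP is a keyword
CREATE TABLE group_membership (user_id INTEGER, group_id INTEGER, PRIMARY KEY (user_id, group_id));
CREATE TABLE group_data (group_data_id INTEGER PRIMARY KEY, group_id INTEGER, group_data_name TEXT);
CREATE TABLE group_data_value (id INTEGER PRIMARY KEY, group_data_id INTEGER, group_data_value TEXT);
CREATE TABLE user_group_data (user_id INTEGER, group_id INTEGER,
                              group_data_id INTEGER, group_data_value_id INTEGER);
INSERT INTO users VALUES (1, 'Robert Jones', '2011-11-11', 'M');
INSERT INTO "group" VALUES (1, 'Movie Goers'), (2, 'Extreme Sports');
INSERT INTO group_membership VALUES (1, 1), (1, 2);
INSERT INTO group_data VALUES (1, 1, 'Favorite Genres'), (2, 2, 'Favorite Activities');
INSERT INTO group_data_value VALUES (1, 1, 'Comedy'), (2, 1, 'Sci-Fi'), (3, 1, 'Documentaries'),
    (4, 2, 'Extreme Cage Fighting'), (5, 2, 'Naked Extreme Bike Riding');
INSERT INTO user_group_data VALUES (1, 1, 1, 1), (1, 1, 1, 2), (1, 2, 2, 4), (1, 2, 2, 5);
""")

def profile(user_id):
    """Every (group, question, answer) triple recorded for one user."""
    return conn.execute("""
        SELECT g.group_name, d.group_data_name, v.group_data_value
        FROM user_group_data u
        JOIN "group" g          ON g.group_id = u.group_id
        JOIN group_data d       ON d.group_data_id = u.group_data_id
        JOIN group_data_value v ON v.id = u.group_data_value_id
        WHERE u.user_id = ?
        ORDER BY v.id
    """, (user_id,)).fetchall()

for row in profile(1):
    print(row)
```

Because birth_date and gender live on users, a query such as "women movie goers aged 25-30" only needs users joined to group_membership; the group-specific answers are only needed for questions like favorite genres.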
I've had similar issues to this. I'm not sure if this would be the best recommendation for your specific situation but consider this.
Provide a means of storing data as XML, JSON, or some other format that delimits the data, but basically stores it in a field that has no specific format.
Provide a way to store the definition of that data
Provide a lookup/index table for the data.
This is a combination of techniques indicated already.
Essentially, you would create some interface for your clients to create a "form" for what they want saved. This form would indicate what pieces of information they want from the user. It would also indicate what pieces of information you want to search on.
Save this information to the definition table.
The definition table is then used to describe the user interface for entering data.
Once user data is entered, save the data (as xml or whatever) to one table with a unique id. At the same time, another table will be populated as an index with
id where the xml data was saved
name of field data is stored in
value of field data stored.
id of data definition.
Now when a search commences, there should be no issue searching for the information in the index table by name, value and definition id, and getting back the id of the XML/JSON (or whatever) data stored in the table where the form data was saved.
That data should be transformable once it is retrieved.
I was seriously sketchy on the details here, I hope this is enough of an answer to get you started. If you would like any explanation or additional details, let me know and I'll be happy to help.
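Here is one way those pieces could fit together, as a hedged sketch using SQLite and JSON. The table names (form_definition, form_data, form_index), the "searchable" key in the definition, and the Movie Goers sample are all hypothetical. The whole record is stored as one JSON payload, and only the fields the definition marks as searchable get rows in the index table:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE form_definition (definition_id INTEGER PRIMARY KEY, definition TEXT NOT NULL);
CREATE TABLE form_data (data_id INTEGER PRIMARY KEY, definition_id INTEGER NOT NULL,
                        payload TEXT NOT NULL);             -- the whole record as JSON
CREATE TABLE form_index (                                   -- one row per searchable field
    data_id       INTEGER NOT NULL REFERENCES form_data (data_id),
    field_name    TEXT NOT NULL,
    field_value   TEXT NOT NULL,
    definition_id INTEGER NOT NULL
);
CREATE INDEX form_index_lookup ON form_index (definition_id, field_name, field_value);
""")

# Hypothetical "Movie Goers" form: which fields exist and which are searchable.
movie_form = {"fields": ["name", "gender", "favorite_genres"], "searchable": ["gender"]}
conn.execute("INSERT INTO form_definition VALUES (1, ?)", (json.dumps(movie_form),))

def save(definition_id, data):
    """Store the record as JSON and index each field the definition marks searchable."""
    spec = json.loads(conn.execute(
        "SELECT definition FROM form_definition WHERE definition_id = ?",
        (definition_id,)).fetchone()[0])
    cur = conn.execute("INSERT INTO form_data (definition_id, payload) VALUES (?, ?)",
                       (definition_id, json.dumps(data)))
    for field in spec["searchable"]:
        conn.execute("INSERT INTO form_index VALUES (?, ?, ?, ?)",
                     (cur.lastrowid, field, str(data[field]), definition_id))
    return cur.lastrowid

def search(definition_id, field, value):
    """Find matching ids through the index table, then decode their JSON payloads."""
    rows = conn.execute("""
        SELECT d.payload
        FROM form_index i JOIN form_data d ON d.data_id = i.data_id
        WHERE i.definition_id = ? AND i.field_name = ? AND i.field_value = ?
        ORDER BY d.data_id
    """, (definition_id, field, value))
    return [json.loads(payload) for (payload,) in rows]

save(1, {"name": "Ann", "gender": "female", "favorite_genres": ["Comedy"]})
save(1, {"name": "Bob", "gender": "male", "favorite_genres": ["Sci-Fi"]})
print([r["name"] for r in search(1, "gender", "female")])  # ['Ann']
```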
If you're not tied to MySQL, I suggest you use PostgreSQL, which provides built-in array data types.
You can define an array-of-varchar field in your groups table to store the group-specific fields. To store the values, you can do the same in the membership table.
Compared to XML types, which rely on string parsing, this array approach will be really fast.
If you don't like the array approach, you can check out the XML data type and the optional hstore data type, which is a key-value store.
Looking for a scalable, flexible and fast database design for a 'Build your own form' style website, e.g. Wufoo.
Rules:
User has only 1 Form they can build
User can create their own fields or choose from 'standard' fields
User's 1 Form has as many fields as the user wants
Values can be siblings of other values, e.g. a photo value could have name, location, width, height as sibling values
Special Rules:
User can submit their form a maximum of 5 times a day
Value's Date is important
Flexibility to report on values (for single user, across all users, 1 field, many fields) is very important -- data visualization (most will be chronologically based e.g. all photos for July 2009 for all users).
Table "users"
uid
Table "field_user" - assign a field to a users form
fid
uid
weight - int - used to order the fields on the users form
Table "fields"
fid
creator_uid - int - the field 'creator'
label - varchar - e.g. Email
value_type - varchar - used to determine what field in the 'values' table will be filled in (e.g. if 'int' then values of this field will submit data into the values.type_int field - and all other .type_x fields will be NULL).
field_type - varchar - e.g. 'email' - used for special conditions e.g. validation rules
Table "values"
vid
parent_vid
fid
uid
date - date
date_group - int - value 1-5 (user may submit max of 5 forms per day)
type_varchar - varchar
type_text - text
type_int - int
type_float - float
type_bool - bool
type_date - date
type_timestamp - timestamp
I understand that this approach means records in the 'values' table will only hold 1 piece of data, with the other type_x fields containing NULLs... but from my understanding this design will be the 'fastest' solution (fewer queries, fewer join tables).
At OSCON yesterday, Josh Berkus gave a good tutorial on DB design, and he spent a good fraction of it mercilessly tearing into such "EAV"il tables; you should be able to find his slides on the OSCON site soon, and eventually the audio recording of his whole tutorial online (the latter will probably take a while).
You'll need a join per attribute (multiple instances of the values table, one per attribute you're fetching or updating) so I don't know what you mean by "less join tables". Joining many instances of the same table isn't a particularly fast operation, and your design makes indices nearly unfeasible and unusable.
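To illustrate the "join per attribute" point, here is a cut-down SQLite version of the question's fields/values tables. Only two type_x columns are kept, and the users and emails are hypothetical. Reading just two attributes side by side already takes two aliases of "values" (the name is quoted because VALUES is an SQL keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fields (fid INTEGER PRIMARY KEY, label TEXT, value_type TEXT);
CREATE TABLE "values" (
    vid INTEGER PRIMARY KEY, fid INTEGER, uid INTEGER,
    type_varchar TEXT, type_int INTEGER            -- other type_x columns omitted
);
INSERT INTO fields VALUES (1, 'Email', 'varchar'), (2, 'Age', 'int');
INSERT INTO "values" (fid, uid, type_varchar, type_int) VALUES
    (1, 10, 'ann@example.com', NULL), (2, 10, NULL, 31),
    (1, 11, 'bob@example.com', NULL), (2, 11, NULL, 27);
""")

# One alias of "values" per attribute, each pinned to its own fid;
# a third attribute would mean a third self-join, and so on.
rows = conn.execute("""
    SELECT e.uid, e.type_varchar AS email, a.type_int AS age
    FROM "values" e
    JOIN "values" a ON a.uid = e.uid AND a.fid = 2
    WHERE e.fid = 1 AND a.type_int > 30
    ORDER BY e.uid
""").fetchall()
print(rows)  # [(10, 'ann@example.com', 31)]
```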
At least as a minor improvement use per-type separate tables for your attributes' values (maybe some indexing might be applicable in that case, though with MySQL's limitation to one index per query per table even that's somewhat dubious).
You should really look into schema-free DBs like CouchDB; problems like this are exactly what these types of DBs set out to solve.
Y'know, create table, alter table, add a column, etc. are operations you can do at run time in many modern RDBMS implementations. Why be EAVil? Especially if you are using dynamic SQL.
It's not for the fainthearted. I recall an implementation at Boeing which resulted in 70,000 tables in a database.
Obviously there are pitfalls in dynamic table creation, but they also exist for EAV tables. Things like two attributes for the same key expressing the same fact. Or transitive dependencies and other normalization gotchas. So why not at least leverage the power of the RDBMS on your behalf?
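As a rough illustration of run-time schema changes, here is a SQLite sketch. The form_submission table, add_field() helper, type map and name pattern are all hypothetical. Column names cannot be passed as bound parameters, so the user-supplied name is validated before it is spliced into the ALTER TABLE statement; each user field then becomes a real, typed, indexable column.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE form_submission ("
             "id INTEGER PRIMARY KEY, uid INTEGER NOT NULL, submitted TEXT NOT NULL)")

ALLOWED_TYPES = {"text": "TEXT", "int": "INTEGER", "float": "REAL", "date": "TEXT"}
SAFE_NAME = re.compile(r"[a-z][a-z0-9_]{0,62}")  # hypothetical naming rule

def add_field(name, value_type):
    """Add a user-defined field as a real column after validating name and type."""
    if not SAFE_NAME.fullmatch(name) or value_type not in ALLOWED_TYPES:
        raise ValueError(f"rejected field {name!r} of type {value_type!r}")
    conn.execute(f'ALTER TABLE form_submission ADD COLUMN "{name}" {ALLOWED_TYPES[value_type]}')

def columns():
    """Current column names, read from SQLite's table_info pragma."""
    return [row[1] for row in conn.execute("PRAGMA table_info(form_submission)")]

add_field("email", "text")
add_field("photo_width", "int")
conn.execute("INSERT INTO form_submission (uid, submitted, email, photo_width) "
             "VALUES (1, '2009-07-01', 'a@example.com', 640)")
print(columns())  # ['id', 'uid', 'submitted', 'email', 'photo_width']
```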
I agree with john owen.
Dynamically creating a query from the schema is a small price to pay compared to querying EAV tables, especially if the tables are large.
Usually table columns are considered an "interface". A design that relies on a dynamically changing interface is usually bad, but EAV data is a special case where you don't have many options. You have to choose between slow unintuitive queries or dynamic schema.