Preferable database design for job posts - MySQL

I have two types of job posts -- company job posts and project job posts. The following would be the input fields (on the front-end) for each:
*company*
- company name
- company location
- company description
*project*
- project name
- project location
- project description
- project type
What would be the best database design for this? One table, keeping all fields separate:
`job_post`
- company_name
- company_location
- company_description
- project_name
- project_description
- project_type
- project_location
One table combining the fields:
`job_post`
- name
- location
- description
- project_type
- is_company (i.e., whether the post is for a company (True) or a project (False))
Or two tables, or something else? And why would one way be preferable over the others?

Depending on a lot of factors, including the maximum eventual size of this job, I would normalize the data even further than 2 separate tables, perhaps having a company name table, etc., since querying a set of narrow, well-indexed tables is generally faster than scanning one never-ending table full of all of your information. What if you want to add more fields to projects but not companies?
In short, I would definitely use multiple tables.
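A minimal sketch of that multi-table layout in MySQL (table and column names here are illustrative, not prescriptive):

```sql
-- Hypothetical two-table split: each post type keeps only its own fields.
CREATE TABLE company_job_post (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    location    VARCHAR(255),
    description TEXT
);

CREATE TABLE project_job_post (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name         VARCHAR(255) NOT NULL,
    location     VARCHAR(255),
    description  TEXT,
    project_type VARCHAR(100)  -- the one field companies don't need
);
```

Adding a project-only field later then touches only project_job_post.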

You have identified 3 major objects in your OP: the job, the project, and the company. Each of these objects will have its own attributes, none of which are associated with the other objects, so I would recommend something akin to the following (demonstrative only):
job
- id
- name
company
- id
- name
project
- id
- name
link_job_company_project
- job_id
- company_id
- project_id
This type of schema will allow you to add object specific attributes without affecting the other objects, yet combine them all together via the linking table into a 'job post'.
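One way to render that sketch as MySQL DDL (a demonstration only; the types, nullability, and the choice of a nullable project_id are my assumptions):

```sql
CREATE TABLE job (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE company (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE project (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

-- The linking table assembles a 'job post' out of the three objects.
CREATE TABLE link_job_company_project (
    job_id     INT UNSIGNED NOT NULL,
    company_id INT UNSIGNED NOT NULL,
    project_id INT UNSIGNED NULL,  -- NULL for a company-only post (assumption)
    PRIMARY KEY (job_id),
    FOREIGN KEY (job_id)     REFERENCES job (id),
    FOREIGN KEY (company_id) REFERENCES company (id),
    FOREIGN KEY (project_id) REFERENCES project (id)
);
```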

This surely has to do with the volume of data stored in the table. In the abstract, one table looks pretty simple. But, as Ryan says, what if requirements change tomorrow and more columns need to be added for one type only, either company or project? If you expect a large amount of data to be stored in the table in the future, I too would prefer 2 tables, which avoids the performance overhead of unnecessarily filtering through a large amount of data.

Related

How to structure database of an application that will utilize multiple datasources for its models?

I'm creating an application that will use at minimum two different APIs for the data of its models. For example, I'll have an entity/model of a person. The data for this person comes from at minimum two different APIs, and more may be relevant in the future. For example, email and phone number may come from one data source, and first name and last name from another.
How would one structure a database for this that is flexible enough to handle adding/removing data sources?
I initially thought of a database structure looking like this:
[Person]
- ID
- FirstName
- LastName
- Email
- PhoneNumber
[Person_FIRST_SOURCE]
- Person_ID
- Person_FIRST_SOURCE_ID
[Person_SECOND_SOURCE]
- Person_ID
- Person_SECOND_SOURCE_ID
In the data synchronization script, I would then be able to look up which "Person" entity in my database the record currently being iterated from a source represents. This would allow me to easily connect/add more sources as I go.
EDIT: I would like to add that it's not always the case that Person A exists within both sources of data.
EDIT 2: The main question is not really dependent on the "Person" model. This could have been any type of model: Book, School, Company, etc. -- just some form of model whose data comes from multiple sources.
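For concreteness, here is the structure described above as MySQL DDL (a sketch only; the column types and the VARCHAR source keys are assumptions):

```sql
CREATE TABLE person (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    first_name   VARCHAR(100),
    last_name    VARCHAR(100),
    email        VARCHAR(255),
    phone_number VARCHAR(50)
);

-- One mapping table per source; a person may appear in any subset of them.
CREATE TABLE person_first_source (
    person_id              INT UNSIGNED NOT NULL PRIMARY KEY,
    person_first_source_id VARCHAR(64) NOT NULL,  -- the source system's own key
    FOREIGN KEY (person_id) REFERENCES person (id)
);

CREATE TABLE person_second_source (
    person_id               INT UNSIGNED NOT NULL PRIMARY KEY,
    person_second_source_id VARCHAR(64) NOT NULL,
    FOREIGN KEY (person_id) REFERENCES person (id)
);
```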

Is it typical to have the same value for FKs in my fact table across all FK columns?

I'm new to multidimensional data warehousing and have been tasked by my workplace with developing a data warehousing solution for reporting purposes, so this might be a stupid question, but here it goes...
Each record in my fact table has FK columns that link out to their respective dimension tables (e.g. dimCustomer, dimGeography, dimProduct).
When loading the data warehouse during the ETL process, I first loaded up the dimension tables with the details, then I loaded the fact table and did lookup transformations to find the FK values to put in the fact table. In doing so it seems each row in the fact table has FKs of the same value (ex. row1 has a FK of 1 in each column across the board, row2 has value 2... etc.)
I'm just wondering if this is typical or if I need to rethink the design of the warehouse and ETL process.
Any suggestions would be greatly appreciated.
Thanks
Based on your comments, it sounds like there's a missed step in your ETL process.
For a call center / contact center, I might start out with a fact table like this:
CallFactID - unique key just for ETL purposes only
AssociateID - call center associate who initially took the call
ProductID - product that the user is calling about
CallTypeID - General, Complaint, Misc, etc
ClientID - company / individual that is calling
CallDateID - linked to your Date (by day) Dimension
CallTimeOfDayID - bucketed id for call time based on business rules
CallStartTimestamp - ANSI timestamp of start time
CallEndTimestamp - ANSI timestamp of end time
CallDurationTimestamp - INTERVAL data type, or integer in seconds, call duration
Your dimension tables would then be:
AssociateDim
ProductDim
CallTypeDim
ClientDim
DateDim
TimeOfDayDim
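In MySQL terms, the fact table might be sketched like this (abbreviated, and the types are assumptions; note MySQL has no INTERVAL type, so duration is stored as integer seconds):

```sql
CREATE TABLE call_fact (
    call_fact_id        BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    associate_id        INT UNSIGNED NOT NULL,  -- FK to associate_dim
    product_id          INT UNSIGNED NOT NULL,  -- FK to product_dim
    call_type_id        INT UNSIGNED NOT NULL,  -- FK to call_type_dim
    client_id           INT UNSIGNED NOT NULL,  -- FK to client_dim
    call_date_id        INT UNSIGNED NOT NULL,  -- FK to date_dim
    call_time_of_day_id INT UNSIGNED NOT NULL,  -- FK to time_of_day_dim
    call_start_ts       DATETIME NOT NULL,
    call_end_ts         DATETIME NOT NULL,
    call_duration_sec   INT UNSIGNED NOT NULL   -- duration in integer seconds
);
```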
Your ETL will need to build the dimensions first. If you have a relational model in your source system, you would typically just go to the "lookup" tables for various things, such as the "Products" table or "Associates" table, and denormalize any relationships that make sense to be included as attributes. For example, a relational product table might look like:
PRODUCTS: ProductKey,
ProductName,
ProductTypeKey,
ProductManufacturerKey,
SKU,
UPC
You'd denormalize this into a general product dimension by looking up the product types and manufacturer to end up with something like:
PRODUCTDIM: PRODUCTID (DW surrogate key),
ProductKey,
ProductName,
ProductTypeDesc,
ManufacturerDesc,
ManufacturerCountry,
SKU,
UPC
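That lookup-and-flatten step might look like this in SQL (the product_types and manufacturers lookup tables and their columns are assumptions for the example; product_dim's surrogate key is assumed to be AUTO_INCREMENT):

```sql
-- Populate the product dimension by denormalizing the source lookups.
INSERT INTO product_dim
    (product_key, product_name, product_type_desc,
     manufacturer_desc, manufacturer_country, sku, upc)
SELECT p.ProductKey,
       p.ProductName,
       pt.Description,   -- resolved from the product-type lookup
       m.Name,           -- resolved from the manufacturer lookup
       m.Country,
       p.SKU,
       p.UPC
FROM   products p
JOIN   product_types pt ON pt.ProductTypeKey = p.ProductTypeKey
JOIN   manufacturers m  ON m.ManufacturerKey = p.ProductManufacturerKey;
```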
For attributes that are only on your transaction (call record) tables but are low cardinality, you can create dimensions by doing SELECT DISTINCT on these tables.
Once you have loaded all the dimensions, you then load the fact by doing a lookup against each of the dimensions based on the natural keys (which you've preserved in the dimension), and then assign that key to the fact row.
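A sketch of that fact-load step, assuming a hypothetical staging_calls source table and the natural-key column names shown (all of these names are illustrative):

```sql
-- Resolve each natural key to its dimension surrogate key, then insert.
INSERT INTO call_fact
    (associate_id, product_id, call_type_id, client_id,
     call_date_id, call_time_of_day_id,
     call_start_ts, call_end_ts, call_duration_sec)
SELECT a.associate_id,
       pd.product_id,
       ct.call_type_id,
       cl.client_id,
       d.date_id,
       tod.time_of_day_id,
       s.call_start,
       s.call_end,
       TIMESTAMPDIFF(SECOND, s.call_start, s.call_end)
FROM   staging_calls s
JOIN   associate_dim   a   ON a.associate_key   = s.associate_key
JOIN   product_dim     pd  ON pd.product_key    = s.product_key
JOIN   call_type_dim   ct  ON ct.call_type_code = s.call_type_code
JOIN   client_dim      cl  ON cl.client_key     = s.client_key
JOIN   date_dim        d   ON d.calendar_date   = DATE(s.call_start)
JOIN   time_of_day_dim tod ON HOUR(s.call_start) BETWEEN tod.start_hour AND tod.end_hour;
```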
For a more detailed guide on ETL with DW Star Schemas, I highly recommend Ralph Kimball's book The Data Warehouse ETL Toolkit.

How to structure table Activities in a database?

I have a site written in CakePHP with a MySQL database.
On my site I want to track the activities of every user; for example (like this site), if a user inserts a product I want to record this activity in my database.
I have 2 ways:
1) One table called Activities with:
- id
- user_id
- title
- text
- type (the type of activity: comment, post edit)
2) More tables, differentiated by activity:
- table activities_comment
- table activities_post
- table activities_badges
The problem is that when I go to a user's activities page I can have different types of activities, and I don't know which of these solutions is better, because a comment has a title and a text, a post has only a text, a badge has an external id into its own table (for example), etc.
Help me please
I'm not familiar with CakePHP, but from a purely database perspective your data model should probably look similar to this:
[ER diagram omitted]
The symbol in it denotes a category (a.k.a. inheritance, subclass, subtype, generalization hierarchy, etc.). Take a look at "Subtype Relationships" in the ERwin Methods Guide for more info.
There are generally 3 strategies for implementing the category:
1. All types in a single table. This requires a lot of NULLs and requires CHECKs to make sure separate subtypes are not inappropriately "intermingled".
2. All concrete types in separate tables (excluding the base, which is ACTIVITY in your case), which means common fields and relationships must be repeated in all child tables.
3. All types in separate tables (including the base). This implementation requires a little more JOINing, but is flexible and clean. It should be your default, unless there are strong reasons against it.
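A minimal MySQL sketch of option 3 for this case (names and types are illustrative only):

```sql
-- Base table holds the common fields; one table per concrete subtype.
CREATE TABLE activity (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id INT UNSIGNED NOT NULL,
    created DATETIME NOT NULL
);

CREATE TABLE activity_comment (
    activity_id INT UNSIGNED NOT NULL PRIMARY KEY,
    title       VARCHAR(255) NOT NULL,
    body        TEXT NOT NULL,
    FOREIGN KEY (activity_id) REFERENCES activity (id)
);

CREATE TABLE activity_post (
    activity_id INT UNSIGNED NOT NULL PRIMARY KEY,
    body        TEXT NOT NULL,
    FOREIGN KEY (activity_id) REFERENCES activity (id)
);

CREATE TABLE activity_badge (
    activity_id INT UNSIGNED NOT NULL PRIMARY KEY,
    badge_id    INT UNSIGNED NOT NULL,  -- the external id into your badges table
    FOREIGN KEY (activity_id) REFERENCES activity (id)
);
```

The user's activity page is then a query against activity alone, joining out to a subtype table only when the details are needed.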

Is it good practice to consolidate small static tables in a database?

I am developing a database to store test data. Each piece of data has 11 tags of metadata. Currently I have a separate table for each of the metadata options. I have seen a few questions on here regarding best practices for numerous small tables, but I thought I'd pose the question for my own project because I didn't get a clear answer from the other questions asked.
Here is my table list, with the fields in each table:
Source Type - id, name, description
For Flight - id, name, description
Site - id, name, abrv, description
Stand - id, site (FK site table), name, abrv, description
Sensor Type - id, name, channels, description
Vehicle - id, name, abrv, description
Zone - id, vehicle (FK vehicle table), name, abrv, description
Event Type - id, name, description
Event - id, event type (FK to event type table), name, description
Analysis - id, name, description
Bandwidth - id, name, description
You can see the fields are more or less the same in each of these tables. There are three tables that reference another table.
Would it be better to have just one large table called something like Meta with the following fields:
Meta: id, metavalue, name, abrv, FK, value, description
where metavalue = one of the above table names
and FK = a reference to another row in the Meta table in place of a foreign key?
I am new to databases, and multiple tables seem the most intuitive approach to me, but one table makes the programming easier.
So questions are:
Is it good practice to reduce the number of tables and put all static values in one table?
Is it bad to have a self-referencing table?
FYI, I am making this web database using Django and MySQL on a Windows server with NTFS formatting.
Tips and best practices appreciated.
Thanks.
"Would it be better to have just one large table" - emphatically and categorically, NO!
This anti-pattern is sometimes referred to as "the one table to rule them all"!
Ten Common Database Design Mistakes: One table to hold all domain values.
- Using the data in a query is much easier.
- Data can be validated using foreign key constraints very naturally, something not feasible for the other solution unless you implement ranges of keys for every table -- a terrible mess to maintain.
- If it turns out that you need to keep more information about a ShipViaCarrier than just the code, 'UPS', and description, 'United Parcel Service', then it is as simple as adding a column or two. You could even expand the table to be a full-blown representation of the businesses that are carriers for the item.
- All of the smaller domain tables will fit on a single page of disk. This ensures a single read (and likely a single page in cache). In the other case, you might have your domain table spread across many pages, unless you cluster on the referring table name, which then could cause it to be more costly to use a non-clustered index if you have many values.
- You can still have one editor for all rows, as most domain tables will likely have the same base structure/usage. And while you would lose the ability to query all domain values in one query easily, why would you want to? (A union query could easily be created of the tables if needed, but this would seem an unlikely need.)
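To make the quoted foreign-key point concrete, a small domain table gives you validation for free (a sketch using the article's ShipViaCarrier example; names and types are illustrative):

```sql
CREATE TABLE ship_via_carrier (
    carrier_code VARCHAR(10)  NOT NULL PRIMARY KEY,
    description  VARCHAR(100) NOT NULL
);

CREATE TABLE shipment (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    carrier_code VARCHAR(10)  NOT NULL,
    -- the database itself rejects any code not present in the domain table
    FOREIGN KEY (carrier_code) REFERENCES ship_via_carrier (carrier_code)
);
```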
Most of these look like they won't do anything but expand codes into descriptions. Do you even need the tables? Just define a bunch of constants, or codes, and then have a dictionary of long descriptions for the codes.
The field in the referring table just stores the code. eg: "SRC_FOO", "EVT_BANG" etc.
This is also often known as the One True Lookup Table (OTLT) - see my old blog entry OTLT and EAV: the two big design mistakes all beginners make.

Critique my MySQL Database Design for Unlimited DYNAMIC Fields

Looking for a scalable, flexible and fast database design for a 'build your own form' style website -- e.g. Wufoo.
Rules:
User has only 1 Form they can build
User can create their own fields or choose from 'standard' fields
User's 1 Form has as many fields as the user wants
Values can be the sibling of another value -- e.g. a photo value could have name, location, width, and height as sibling values
Special Rules:
User can submit their form a maximum of 5 times a day
Value's Date is important
Flexibility to report on values (for single user, across all users, 1 field, many fields) is very important -- data visualization (most will be chronologically based e.g. all photos for July 2009 for all users).
Table "users"
uid
Table "field_user" - assign a field to a users form
fid
uid
weight - int - used to order the fields on the users form
Table "fields"
fid
creator_uid - int - the field 'creator'
label - varchar - e.g. Email
value_type - varchar - used to determine what field in the 'values' table will be filled in (e.g. if 'int' then values of this field will submit data into the values.type_int field - and all other .type_x fields will be NULL).
field_type - varchar - e.g. 'email' - used for special conditions e.g. validation rules
Table "values"
vid
parent_vid
fid
uid
date - date
date_group - int - value 1-5 (user may submit max of 5 forms per day)
type_varchar - varchar
type_text - text
type_int - int
type_float - float
type_bool - bool
type_date - date
type_timestamp - timestamp
I understand that this approach means records in the 'values' table will only ever hold one piece of data, with the other .type_x fields containing NULLs... but from my understanding this design will be the 'fastest' solution (fewer queries, fewer join tables).
At OSCON yesterday, Josh Berkus gave a good tutorial on DB design, and he spent a good fraction of it mercilessly tearing into such "EAV"il tables; you should be able to find his slides on the OSCON site soon, and eventually the audio recording of his whole tutorial online (the latter will probably take a while).
You'll need a join per attribute (multiple instances of the values table, one per attribute you're fetching or updating), so I don't know what you mean by "fewer join tables". Joining many instances of the same table isn't a particularly fast operation, and your design makes indexes nearly infeasible and unusable.
At least as a minor improvement use per-type separate tables for your attributes' values (maybe some indexing might be applicable in that case, though with MySQL's limitation to one index per query per table even that's somewhat dubious).
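To illustrate the join-per-attribute point, fetching just three sibling values from the values table sketched in the question would look something like this (the fid literals are hypothetical placeholders for the photo's name/width/height fields):

```sql
-- One self-join of `values` per attribute fetched; cost grows with every field.
SELECT v_name.type_varchar AS name,
       v_width.type_int    AS width,
       v_height.type_int   AS height
FROM   users u
LEFT JOIN `values` v_name   ON v_name.uid   = u.uid AND v_name.fid   = 1
LEFT JOIN `values` v_width  ON v_width.uid  = u.uid AND v_width.fid  = 2
LEFT JOIN `values` v_height ON v_height.uid = u.uid AND v_height.fid = 3;
```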
You should really look into schema-free DBs like CouchDB; problems like this are exactly the kind these types of DBs aim to solve.
Y'know, CREATE TABLE, ALTER TABLE ... ADD COLUMN, etc. are operations you can do at run time in many modern RDBMS implementations. Why be EAVil? Especially if you are using dynamic SQL.
It's not for the fainthearted. I recall an implementation at Boeing which resulted in 70,000 tables in a database.
Obviously there are pitfalls in dynamic table creation, but they also exist for EAV tables. Things like two attributes for the same key expressing the same fact. Or transitive dependencies and other normalization gotchas. So why not at least leverage the power of the RDBMS on your behalf?
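The run-time DDL being suggested is nothing exotic in MySQL; adding a user-defined field becomes an ordinary statement rather than an EAV row (user_form and the column here are hypothetical):

```sql
-- Add the user's new field as a real, typed, indexable column.
ALTER TABLE user_form ADD COLUMN favourite_colour VARCHAR(50) NULL;

-- Normal indexing then works, unlike with generic EAV value columns.
CREATE INDEX idx_user_form_fav_colour ON user_form (favourite_colour);
```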
I agree with John Owen.
Dynamically creating a query from the schema is a small price to pay compared to querying EAV tables, especially if the tables are large.
Usually table columns are considered an "interface". A design that relies on a dynamically changing interface is usually bad, but EAV data is a special case where you don't have many options. You have to choose between slow, unintuitive queries or a dynamic schema.