A friend and I are working on a database that stores information about cPanel hosting accounts, such as what settings, apps, and features each account is using.
Most of the fields are boolean, such as whether or not the account has any wordpress sites, any php 5.4 driven sites, any ruby on rails sites, etc...
A small number of fields are non-boolean data like disk usage in MB, hostname of the server the account resides on, and the username of the account, etc...
In my mind, it makes sense to store ALL this information in one single table.
So the table might have the following columns:
php54 boolean,
wordpress boolean,
ror boolean,
username varchar(8),
hostname varchar(20),
usage_mb int(9),
I figure that the primary key could be (username,hostname).
However, my friend has already set up the database with multiple tables that look like this:
Fact Table:
id int(11),
php54 boolean,
wordpress boolean,
ror boolean,
usage_mb int(9),
User Table:
id int(11),
factid int(11),
hostid int(11),
username varchar(8)
Hostname Table:
id int(11),
hostname varchar(20),
ip varchar(15),
Where each table's primary key is "id" and the user table references the hostname table and fact table using 'hostid' and 'factid' foreign keys (respectively).
I believe my friend's rationale behind multiple tables is to organize the data based on the type of data, despite all the data being related to one single, unique account.
My rationale is that since all the data belongs to one unique account, and therefore every single row is 1:1, does it make sense to have multiple tables?
I would think multiple tables would be sensible if a row in one table can reference multiple rows in another table... But in this case each row from each table can only be associated with one single row from any other table... so i think one table is fine.
Should this data be in multiple tables, or in one single table?
We're both sort of noobs figuring things out as we go.
At which point does it make sense to use multiple tables?
Currently its really difficult to write an API to add the data associated with one single account to three separate tables, as all the primary keys auto increment, and other than that there isn't any key that is unique to the account which would make it easy to update existing data.
Sorry if none of this makes sense
In your case, I dont't think having multiple tables with one to one relationships is the right way.
It is not forbidden and in some cases it can be helpfull (
Is there ever a time where using a database 1:1 relationship makes sense?), but you'll have to deal with unecessary joins in your requests.
Ignoring ids, the way you find out what your CKs (candidate keys) are and whether you should decompose is the topic of normalization to higher NFs (normal forms). This formalizes your notion of "a row in one table can reference multiple rows in another" (among others). Guessing using common sense here, there's no particular need to decompose. Introducing ids not visible at the business level is always technically unnecessary but happens per its own practical/ergonomic reasons. Further explanation/justification is information modeling & database design textbook chapters on design, CKs, NFs & surrogates--read some. Vague notions like "same type of data" are not helpful.
(TL;DR "At what point should I create a separate table?" is a basic question with a complex answer that requires learning some stuff.)
Related
I am a total novice to this whole database world and I have a question. I am building a database for my final project for my masters class. The database includes cities, counties, and demographic data for the state of Colorado. The database ultimately will be used as a spatial database. At this point I have all my tables built in Access, and have a ODBC connection to PostgreSQL to import the tables after they are created. Access does not allow for shapefiles to be added to the database, PostgreSQL does.
My question is about primary keys, each of my tables in Access share an FIPS code (this code allows me to join the demographic data to a shapefile and display the data in ArcMap with the proper coordinates). I have a many demographic data tables with this FIPS code. Is it acceptable to set the FIPS as the primary key for each table? Or does each table need its own individual primary key that is different from the others?
Thanks for the help!
The default PK is “ID”, so there really no problem with using this default for all tables.
In fact it means for any table or code you write you can now always rest easy as to what the primary key is going to be.
And if you copy or re-name a table, then again you know the ID.
Some people do prefer having the table name as part of the PK, but that does violate normalizing of data since now your attaching an external attribute to that PK column.
However for a FK (foreign key), since the VERY definition of the column is an external dependency, then I tend to include the table name like this:
Customers_ID
And once again due to this naming convention, then you can always “guess” or “know” the name of a FK column (table name + ID).
At the end of the day, there is not really a convention on this issue. However I will recommend for all tables you create, you do allow access to create that default PK of “id”. This of course assumes your database design is not using natural keys. And the debate of natural keys vs surrogate key (an auto number pk “id”) has many pros and cons. You can google natural keys vs surrogate keys for endless discussions on this issue.
I had one single table that had lots of problems. I was saving data separated by commas in some fields, and afterwards I wasn't able to search them. Then, after search the web and find a lot of solutions, I decided to separate some tables.
That one table I had, became 5 tables.
First table is called agendamentos_diarios, this is the table that I'm gonna be storing the schedules.
Second Table is the table is called tecnicos, and I'm storing the technicians names. Two fields, id (primary key) and the name (varchar).
Third table is called agendamento_tecnico. This is the table (link) I'm goona store the id of the first and the second table. Thats because there are some schedules that are gonna be attended by one or more technicians.
Forth table is called veiculos (vehicles). The id and the name of the vehicle (two fields).
Fith table is the link between the first and the vehicles table. Same thing. I'm gonna store the schedule id and the vehicle id.
I had an image that can explain better than I'm trying to say.
Am I doing it correctly? Is there a better way of storing data to MySQL?
I agree with #Strawberry about the ids, but normally it is the Hibernate mapping type that do this. If you are not using Hibernate to design your tables you should take the ID out from agendamento_tecnico and agendamento_veiculos. That way you garantee the unicity. If you don't wanna do that create a unique key on the FK fields on thoose tables.
I notice that you separate the vehicles table from your technicians. On your model the same vehicle can be in two different schedules at the same time (which doesn't make sense). It will be better if the vehicle was linked on agendamento_tecnico table which will turn to be agendamento_tecnico_veiculo.
Looking to your table I note (i'm brazilian) that you have a column called "servico" which, means service. Your schedule table is designed to only one service. What about on the same schedule you have more than one service? To solve this you can create a table services and create a m-n relationship with schedule. It will be easier to create some reports and have the services well separated on your database.
There is also a nome_cliente field which means the client for that schedule. It would be better if you have a cliente (client) table and link the schedule with an FK.
As said before, there is no right answer. You have to think about your problem and on the possible growing of it. Model a database properly will avoid lot of headache later.
Better is subjective, there's no right answer.
My natural instinct would be to break that schedule table up even more.
Looks like data about the technician and the client is duplicated.
There again you might have made a decisions to de-normalise for perfectly valid reasons.
Doubt you'll find anyone on here who disagrees with you not having comma separated fields though.
Where you call a halt to the changes is dependant on your circumstances now. Comma separated fields caused you an issue, you got rid of them. So what bit of where you are is causing you an issue now?
looks ok, especially if a first try
one comment: I would name PK/FK (ids) the same in all tables and not using 'id' as name (additionaly we use '#' or '_' as end char of primary / foreighn keys: example technicos.technico_ and agendamento_tecnico has fields agend_tech_ and technico_. But this is not common sense. It makes queries a bit more coplex (because you must fully qualify the fields), but make the databse schema mor readable (you know in the moment wich PK belong to wich FK)
other comment: the two assotiative (i never wrote that word before!) tables, joining technos and agendamento_tecnico have an own ID field, but they do not need that, because the two (primary/unique) keys of the two tables they join, are unique them selfes, so you can use them as PK for this tables like:
CREATE TABLE agendamento_tecnico (
technico_ int not null,
agend_tech_ int not null,
primary key(technico_,agend_tech_)
)
I am trying to create a simple Registration Program using VB.Net and MySQL for its database. Here's my simple table for the basic Information
However, I am attempting to improve my basic knowledge in normalization of table and that's why I separated the Date field to avoid, let say in one day, the repeated insertion of the same date. I mean, when 50 individuals registered in one day, it will simply add a single date(record) in tblRegDate table instead of adding it up for 50 times in a table. Is there any way to do this? Is it possible in VB.Net and MySQL? Or rather, should I add or modify some field? or should I make a condition in VB.Net? The table above is what my friend taught me but I discovered that it doesn't eliminate the redundancy. Kindly give me any instruction or direct me to site where there's a simple tutorial for this. Thanks in advance!
here's my MySQL codes:
CREATE TABLE tblInfo(
Number INT AUTO_INCREMENT,
LastName VARCHAR(45),
FirstName VARCHAR(45),
MiddleName VARCHAR(45),
Gender ENUM(M,F),
BirthDate DATE,
PRIMARY KEY(Number));
CREATE TABLE tblRegDate(
IDRegDate INT AUTO_INCREMENT,
Date TIMESTAMP,
Number INT,
PRIMARY KEY(IDRegDate),
FOREIGN KEY(Number) REFERENCES tblInfo(Number));
As I see it in this case you don't have advanages of seperating a single field. You'll loose a lot of performance.
Table normalization isn't about don't having any redundant value. It's more about "Seperating the concerns"
Also it is important to not have an exploding complexity in your database. seperating single fills would end up in a database no one would be able to understand.
The Question is: Are there more informations on registration ? For Example Webpage, IP, .....
Than you should have two tables for example "Person" and "Registration". Then you would have two semantic different things which shouldn't be mixed up.
There are a lot of examples and information you can find via google. and wikipedia
http://en.wikipedia.org/wiki/Database_normalization
Actually it is not a good idea to seperate timestamp from the table.
You would need another table namely i.e timeTable. It would have two columns id and timestmap and you should reference this id in your tblRegDate table as foreign key.
Foreign key is an integer and has the size 4 bytes. Date on the other hand 3 bytes.
Therefore I would recommend you to keep date in your tblRegDate and not in a extra table
When you normalize DB structures, always keep it mind of ACID - http://en.wikipedia.org/wiki/ACID
Based on the fields you have, you should just keep it as a single table. Separating out the registration date is not a good design because you'll have to do a look up every time. In real life, you can consider indexing the reg date if your app always search or sort by regdate.
And if you FK RegDate table to the user table, it is also not efficient.
p.s. Also keep in mind that there are 4 levels of DB normalization. If you are new to DB design, you should consider learning how to move a DB design from 1st to 2nd, and 2nd to 3rd normal forms.
We rarely use 4th normal form in real life situation. Transaction systems usually stay at 3rd most of the time.
Hope that make sense.
I am attempting to build a database for inventory control using a large number of tables and enforced relationships, and I just ran into the 32-relationship (index) limit for an Access table (using Access 2007).
Just to clarify: the problem isn't that the Employees table has 32 explicit indexes. Rather, the problem is the limitation on the number of times the Employee table can be referenced in FOREIGN KEY constraints. For example:
CREATE TABLE Employees (employee_number INTEGER NOT NULL UNIQUE)
;
CREATE TABLE Table01 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table02 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table03 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
...
CREATE TABLE Table30 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table31 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
CREATE TABLE Table32 (employee_number INTEGER NOT NULL REFERENCES Employees (employee_number))
;
An exception is thrown on the last line above, "Could not create index; too many indexes defined.
What options do I have to work around this limitation?
I've heard that creating a duplicate table with a 1:1 relationship is one method. I'm new to database design, so please correct me if I'm wrong; but given a table Employees with 31 indexes, I would create a table Employees2(with one field?) with a 1:1 relationship to Employees and relationships to this new table from any remaining relations in which EmployeeID is a foreign key. What's the best way to ensure the second table is populated alongside the first?
Is there another approach?
Based on the lack of information available, it seems this may be a rare problem with a properly-designed database, or the solution is common knowledge. Forgive the noob!
Update: Immediate consensus is that my design is borked or far too ambitious. This could very well be the case. However, I'd rather have a general design discussion within a separate question, so for the sake of argument, can someone answer this one? If the answer is simply "Don't ever do that" I'll have to accept it.
I've run into this limitation a number of times with my apps. And I can assure the other posters that my apps are very well designed.
One problem is that Access creates indexes due to relationships and lookup fields that aren't viewable on the main index property box but they are accessible via DAO collections. These indexes are frequently duplicate indexes to indexes you have created as well.
I have a tool consisting of several forms you import into your BE MDB that allows you to remove the duplicate indexes. As I haven't yet made this available on my website please email me for it.
I'd suggest just not defining all the relationships/indexes to implementing a 1:1 relationship to get around it. Neither solution is optimal, but the later is going to create a much higher maintenance burden and data anomaly potential.
I am not going to decry the design as quicky as some of the others, but it does have me intrigued. Could you list the fields of the employee table that are foreign keys? There is a good liklihood that some normalization is in order and maybe some of the smart people on SO could make some design suggestions to work around the issue.
It is hard for me to believe that an Employee table would need 32 indexes; if it actually does you should consider migrating to at least SQL Express.
... I would create a table
Employees2(with one field?) with a 1:1
relationship to Employees and
relationships to this new table from
any remaining relations in which
EmployeeID is a foreign key.
That is workable. Presumably your main table might have an Autonumber field as the primary key, or you generate an index number. Your Employees2 table obviously must echo that.
What's the best way to ensure the
second table is populated alongside
the first?
That depends somewhat on how you are adding records. But in general, of course you must comply with the rules for integrity. This usually comes down to appending to tables in the correct order and ensuring each record is saved before trying to add a related record elsewhere.
I'm creating a user-based website. For each user, I'll need a few MySQL tables to store different types of information (that is, userInfo, quotesSubmitted, and ratesSubmitted). Is it a better idea to:
a) Create one database for the site (that is, "mySite") and then hundreds or thousands of tables inside this (that is, "userInfo_bob", "quotessubmitted_bob", "userInfo_shelly", and"quotesSubmitted_shelly")
or
b) Create hundreds or thousands of databases (that is, "Bob", "Shelly", etc.) and only a couple tables per database (that is, Inside of "Bob": userInfo, quotesSubmitted, ratesSubmitted, etc.)
Should I use one database, and many tables in that database, or many databases and few tables per database?
Edit:
The problem is that I need to keep track of who has rated what. That means if a user has rated 300 quotes, I need to be able to know exactly which quotes the user has rated.
Maybe I should do this?
One table for quotes. One table to list users. One table to document ALL ratings that have been made (that is, Three columns: User, Quote, rating). That seems reasonable. Is there any problem with that?
Use one database.
Use one table to hold users and one table to hold quotes.
In between those two tables you have a table that contains information to match users to quotes, this table will hold the rating that a user has given a quote.
This simple design will allow you to store a practically unlimited number of quotes, unlimited users, and you will be able to match up each quote to zero or more users and vice versa.
The table in the middle will contain foreign keys to the user and quote tables.
You might find it helpful to review some database design basics, there are plenty of related questions here on stackoverflow.
Start with these...
What is normalisation?
What is important to keep in mind when designing a database
How many fields is 'too many'?
More tables or more columns?
Should I use one database, and many
tables in that database, or many
databases and few tables per database?
Neither, you should use one database, with a table for users, a table for quotes, a table for rates, etc.
You then have a column in (e.g.) your quotes table which says which user the quote is for.
CREATE TABLE user (
user INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
...
);
CREATE TABLE quote (
quote INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
user INT(10) UNSIGNED NOT NULL,
...
);
CREATE TABLE rate (
rate INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
user INT(10) UNSIGNED NOT NULL,
...
);
You then use SQL JOINs in your SELECT statements to link the tables together.
EDIT - the above was assuming a many-to-one relationship between users and rates - where there are 'many-to-many' relationships you need a table for each sort of data, and then another table with rows for each User <-> Rate pair.
Two problems with many databases (actually many more, but start with these.)
You can't use parameters for database names.
What will you do whe you make your first change to a table? (Hint: Work X (# databases) ).
And "many tables" suggests you're thinking about tables per user. That's another equally problematic idea.
Neither. You create a single database with a single table for each type of data, then use foreign keys to link the data for each user together. If you had only a few, fixed users, this wouldn't be so bad, but what you're suggesting is simply not at all scalable.
we've got a similar system, having many users and their relevant datas. We've followed a single database and common tables approach. This way you would have a single table holding user information and a table holding all their data. Along with the data we have a reference to the userid which helps us segregate the information.
For comparison, a time when you actually do want separate databases would be when you have multiple webhosting clients that want to do their own things with the database - then you can set up security so they can access only their own data.
But if you are writing the code to interface with the data base, not them, then what you want is normalized tables as described in several other answers here.
Your design is flawed.. You should constrain the tuples using foreign keys to the correct user, rather than adding new entities for each account.
What is the difference between many tables in one database or many databases with same tables? Is it for better security or for different types of backups?
I m not sure about mySQL but in MSSQL it is like this:
If you need to backup databases in different way you need to consider keeping tables in different data files. By default they all are in PRIMARY file. You can specify different storage.
All transactions are hold in tempdb. This is not very good because if it transaction log becomes full then all databases stop functioning. Then you can end up with separate SQL servers for each user. Which is sort of nonsense if you are talking about thousands of clients.
One table with properly created indexes per each required entity set (one table for submitted quotes, one table for submitted rates).
CREATE TABLE quotesSubmtited (
userid INTEGER,
submittime DATETIME,
quote INTEGER,
quotedata INTEGER,
PRIMARY KEY (userid, submittime),
FOREIGN KEY quote REFERENCES quotesList (quoteId),
FOREIGN KEY userid REFERENCES userList (userId)
);
CREATE INDEX idx1 ON quotesSubmitted (quote);
Remember: more indexes you create, slower the updating. So take a closer look at what you use in queries and create indexes for that. A good database optimization tutorial will be of invaluable help in understanding what indexes you need to create (I cannot summarize it in this answer).
I also presume you don't know about JOINs and FOREIGN KEYs so make sure you read about them as well. Quite useful!
Use one database and one table. You will require a table "User" and this table will be linked (Primary--Foreign key) to these table.