Here's how I do it:
1. Table names are lower case, use underscores to separate words, and are singular (e.g. foo, foo_bar, etc.).
2. I generally (not always) have an auto-increment PK. I use the following convention: tablename_id (e.g. foo_id, foo_bar_id, etc.).
3. When a table contains a column that is a foreign key, I just copy the column name of that key from whatever table it came from. For example, say table foo_bar has the FK foo_id (where foo_id is the PK of foo).
4. When defining FKs to enforce referential integrity, I use the following: tablename_fk_columnname (e.g. furthering example 3, it would be foo_bar_foo_id). Since this is a table name/column name combination, it is guaranteed to be unique within the database.
5. I order the columns like this: PKs, FKs, then the rest of the columns alphabetically.
Is there a better, more standard way to do this?
I would say that first and foremost: be consistent.
I reckon you are almost there with the conventions that you have outlined in your question. A couple of comments though:
Points 1 and 2 are good I reckon.
Point 3 - sadly this is not always possible. Think about how you would cope with a single table foo_bar that has columns foo_id and another_foo_id both of which reference the foo table foo_id column. You might want to consider how to deal with this. This is a bit of a corner case though!
Point 4 - Similar to Point 3. You may want to introduce a number at the end of the foreign key name to cater for having more than one referencing column.
Point 5 - I would avoid this. It provides you with little and will become a headache when you want to add or remove columns from a table at a later date.
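The corner case from Points 3 and 4 can be sketched as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the constraint names (foo_bar_foo_id_fk1, foo_bar_foo_id_fk2) are a hypothetical numbering scheme, not an established convention.

```python
import sqlite3

# SQLite stands in for MySQL here; all object names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE foo (
    foo_id INTEGER PRIMARY KEY
);
CREATE TABLE foo_bar (
    foo_bar_id     INTEGER PRIMARY KEY,
    foo_id         INTEGER NOT NULL,
    another_foo_id INTEGER NOT NULL,
    -- numbered suffixes keep the two constraint names distinct
    CONSTRAINT foo_bar_foo_id_fk1 FOREIGN KEY (foo_id)         REFERENCES foo (foo_id),
    CONSTRAINT foo_bar_foo_id_fk2 FOREIGN KEY (another_foo_id) REFERENCES foo (foo_id)
);
""")
conn.execute("INSERT INTO foo (foo_id) VALUES (1), (2)")
conn.execute("INSERT INTO foo_bar (foo_id, another_foo_id) VALUES (1, 2)")

# Each row describes one FK: (id, seq, table, from, to, on_update, on_delete, match)
fk_columns = sorted(row[3] for row in conn.execute("PRAGMA foreign_key_list(foo_bar)"))
print(fk_columns)
```

Both columns reference foo.foo_id, so the second column cannot simply reuse the referenced column's name, and the constraint names need something extra (here a number) to stay unique.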
Some other points are:
Index Naming Conventions
You may wish to introduce a naming convention for indexes - this will be a great help for any database metadata work that you might want to carry out. For example you might just want to call an index foo_bar_idx1 or foo_idx1 - totally up to you but worth considering.
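As a rough sketch of why a predictable index name helps with metadata work, the snippet below creates two indexes following the foo_bar_idxN pattern and then finds them by name. It uses Python's sqlite3 module and SQLite's sqlite_master catalog as a stand-in; in MySQL the comparable metadata is exposed through information_schema. The table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo_bar (
    foo_bar_id INTEGER PRIMARY KEY,
    foo_id     INTEGER NOT NULL,
    label      TEXT
);
CREATE INDEX foo_bar_idx1 ON foo_bar (foo_id);
CREATE INDEX foo_bar_idx2 ON foo_bar (label);
""")

# With a predictable pattern, a metadata query can find a table's indexes by name.
# '_' is a LIKE wildcard, so it is escaped with '!' to match a literal underscore.
index_names = [
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND name LIKE 'foo!_bar!_idx%' ESCAPE '!' "
        "ORDER BY name"
    )
]
print(index_names)
```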
Singular vs Plural Column Names
It might be a good idea to address the thorny issue of plural vs singular in your column names as well as your table name(s). This subject often causes big debates in the DB community. I would stick with singular forms for both table names and columns. There. I've said it.
The main thing here is of course consistency!
Consistency is the key to any naming standard. As long as it's logical and consistent, you're 99% there.
The standard itself is very much personal preference - so if you like your standard, then run with it.
To answer your question outright - no, MySQL doesn't have a preferred naming convention/standard, so rolling your own is fine (and yours seems logical).
MySQL has a short description of their more or less strict rules:
https://dev.mysql.com/doc/internals/en/coding-style.html
A widely used SQL coding style guide, by Simon Holywell:
http://www.sqlstyle.guide/
See also this question:
Are there any published coding style guidelines for SQL?
Thankfully, PHP developers aren't "Camel case bigots" like some development communities I know.
Your conventions sound fine.
Just so long as they're a) simple, and b) consistent - I don't see any problems :)
PS:
Personally, I think 5) is overkill...
Simple Answer: NO
Well, at least there is no naming convention as such that is encouraged by Oracle or the community. However, you do have to follow the rules and limits for identifiers, as described in the MySQL documentation: https://dev.mysql.com/doc/refman/8.0/en/identifiers.html
As for the naming convention you follow, I think it is OK; only number 5 is a little unnecessary. I think most visual tools for managing databases offer an option for sorting column names (I use DBeaver, and it has one), so if the purpose is a nice visual presentation of your table, you can use that option.
From personal experience, I would recommend this:
Use lower case. This almost ensures interoperability when you migrate your databases from one server to another. Sometimes lower_case_table_names is not configured correctly and your server starts throwing errors simply because it does not recognize your camelCase or PascalCase names (a case sensitivity problem).
Short names. Simple and clear. The easier and faster it is to identify your tables or columns, the better. Trust me, when you write a lot of different queries in a short amount of time, it is better to have everything simple to write (and read).
Avoid prefixes. Unless you are using the same database for tables of different applications, don't use prefixes. They only add verbosity to your queries. There are situations where a prefix can be useful, for example when you want to identify primary keys and foreign keys, where table names are usually used as a prefix for id columns.
Use underscores for separating words. If you still want to use more than one word for naming a table, column, etc., use underscores for separating_the_words. This helps legibility (your eyes and your stressed brain are going to thank you).
Be consistent. Once you have your own standard, follow it. Don't be the person who creates the rules and is the first to break them; that is shameful.
And what about "Plural vs Singular" naming? Well, this is mostly a matter of personal preference. In my case I try to use plural names for tables because I think of a table as a collection of elements or a package containing elements, so a plural name makes sense to me; and singular names for columns because I see columns as attributes that each describe a single one of those table elements.
Consistency is what everyone strongly suggests; the rest is up to you as long as it works.
For beginners it's easy to get carried away and name things whatever we want at the time. This makes sense at that point but is a headache later.
foo, foobar or foo_bar is great.
We name our tables as straightforwardly as possible and only use an underscore if they are two different words: studentregistration becomes student_registration.
Like #Zbyszek says, having a simple id is more than enough for the auto-increment. The simpler the better. Why do you need foo_id? We had the same problem early on: we named all our columns with the table prefix, like foo_id, foo_name, foo_age. We have dropped the table name now and keep only the column name, as short as possible.
Since we are using just an id for the PK, we use foo_bar_fk (the table name, which is unique, followed by _fk) as the foreign key. We don't add id to the column name because it is understood that the name 'id' is always the PK of the given table. So we have just the table name and the _fk at the end.
For constraints we remove all underscores and join with camelCase (tablename + Colname + Fk): foobarUsernameFk (for the username_fk column). It's just the approach we follow. We keep documentation for every naming structure.
When keeping column names short, we should also keep an eye on reserved words.
+------------------------------------+
| foobar                             |
+------------------------------------+
| id (PK for the current table)      |
| username_fk (PK of username table) |
| location (other column)            |
| tel (other column)                 |
+------------------------------------+
As #fabrizio-valencia said, use lower case. On Windows, if you export a MySQL database (phpMyAdmin), the table names will be converted to lower case, and this leads to all sorts of problems.
See: Are table names in MySQL case sensitive?
I am aware of the benefits of using integers (amount of space, performance, indexes) as primary keys as opposed to strings.
Considering situation below...
I have a lookup table called ap_habitat (habitat values are also unique):
id  habitat
1   Forest 1
2   Forest 2
Referencing table (fauna):
Especie  habitat
X        1
Y        1
The referencing table is not very human readable (I know end users should not care about that, but for me it would be useful to see the NAME of the habitat directly in the fauna table).
To get a list of fauna and its habitat name I have to do a join...
SELECT fauna.habitat, fauna.especie, AP_h.habitat FROM fauna INNER JOIN ap_habitat AS AP_h ON AP_h.id = fauna.habitat
I could create a view, but if I have to create a view for each table referencing a foreign key...
Just wanna check what more experienced people recommend me.
Databases and, in general, computers are not designed to make your life more simple. They are designed to handle more data than a human mind can ever hope to remember in less time than it takes a human to blink. ;-)
Readability (especially in ideas conceived in the pre-Apple age) is not an issue at all.
On top of that: If you enjoy strange problems, data mapping impedance and spending endless nights writing workarounds for problems that using real-world names as primary keys get you for free, then be our guest. But please, don't ask for our help. We already know all the problems that you'll run into and it will be very hard for us to restrain our spite.
So: Never, ever use anything but an ID (UUID or long sequence) for a primary key. There are no (good) reasons to do it and if you found one, then you simply don't see the whole picture.
Yes, it makes a couple of things harder (like understanding what your data actually means). But as I said above, computers are meant to solve "lots of data" and "too slow" and nothing else.
Create a view or write a small helper application that can run your most important queries at the click of a button.
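A minimal sketch of the view suggestion, using the ap_habitat and fauna tables from the question. It runs against SQLite through Python's sqlite3 module as a stand-in for the asker's database; the view name fauna_readable and the column types are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ap_habitat (
    id      INTEGER PRIMARY KEY,
    habitat TEXT NOT NULL UNIQUE
);
CREATE TABLE fauna (
    especie TEXT PRIMARY KEY,
    habitat INTEGER NOT NULL REFERENCES ap_habitat (id)
);
INSERT INTO ap_habitat (id, habitat) VALUES (1, 'Forest 1'), (2, 'Forest 2');
INSERT INTO fauna (especie, habitat) VALUES ('X', 1), ('Y', 1);

-- hypothetical view name; it hides the join behind a readable relation
CREATE VIEW fauna_readable AS
SELECT f.especie, h.habitat AS habitat_name
FROM fauna AS f
INNER JOIN ap_habitat AS h ON h.id = f.habitat;
""")

rows = conn.execute(
    "SELECT especie, habitat_name FROM fauna_readable ORDER BY especie"
).fetchall()
print(rows)
```

The fauna table keeps its integer key, while anyone who wants readable output queries the view instead of writing the join each time.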
That said, I had some success with an application which runs a query and then displays a list of check boxes where I can pull in the foreign key relations to the data that the query returns (i.e. one checkbox per FK).
You ask about number or string as primary key. But based on your example if you use a string it wouldn't be a primary key at all, because you would no longer have a lookup table for it to be the primary key of. Perhaps you would still have the table for reasons not shown, like populating a drop down or storing extended descriptions beyond just the name.
Doing needless joins is not a good thing for performance. And having needless tables might be bad for storage size as well, depending on the length of the strings and the ratio of the sizes of the two tables.
You could also consider enumerated types, in which the data is stored as numbers (more or less) but the database translates them to and from strings automatically.
We are looking to extend MySQL by adding extra columns to each "column". Right now you have the following.
Field, Type, Null, Key, Default, Extra
We want to be able to add to the "column" definition an extra column like, Attributes. Our system has certain design specifications that we need to describe more data per "column". How can we accomplish this in MySQL?
The query to return back all of the columns is as follows.
SHOW COLUMNS FROM MyDB.MyTable;
EDIT 1
I should have added this to begin with, and I apologize for not doing so. We are currently describing attributes in the Comments section for each column type, and we understand that this is a very dirty solution, but it was the only one we could think of at the time. We have built a code generator that revolves around the DB structure and is what really stems from this initiative. We want to describe code attributes for a column so the code generator can pick up the changes and refresh the code base on each change or run.
First, terminology: "field" and "column" are basically synonyms in this context. There is no distinction between fields and columns. Some MySQL commands even allow you to use these two words interchangeably (e.g. SHOW FIELDS FROM MyDB.MyTable).
We want to assign attributes to each column in a table. Adding "field_foo" for "field" would repeat the same data over and over again for each row.
Simple answer:
If you want more attributes that pertain to a given column foo, then you should create another table, where foo is its primary key, so each distinct value gets exactly one row. This is part of the process of database normalization. This allows you to have attributes to describe a given value of foo without repeating data, even when you use that value many times in your original table.
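The normalization step described above might look roughly like this. It is a sketch only, run on SQLite via Python's sqlite3 module rather than MySQL, and the names foo_detail, my_table and attribute are placeholders invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- one row per distinct foo value, so each attribute is stored once
CREATE TABLE foo_detail (
    foo       TEXT PRIMARY KEY,
    attribute TEXT NOT NULL
);
-- the original table repeats only the key, not the attribute
CREATE TABLE my_table (
    my_table_id INTEGER PRIMARY KEY,
    foo         TEXT NOT NULL REFERENCES foo_detail (foo)
);
INSERT INTO foo_detail (foo, attribute) VALUES ('a', 'attr for a'), ('b', 'attr for b');
INSERT INTO my_table (foo) VALUES ('a'), ('a'), ('b');
""")

# A join brings the attribute back next to every row that uses the value.
rows = conn.execute(
    "SELECT t.my_table_id, t.foo, d.attribute "
    "FROM my_table AS t JOIN foo_detail AS d ON d.foo = t.foo "
    "ORDER BY t.my_table_id"
).fetchall()
detail_count = conn.execute("SELECT COUNT(*) FROM foo_detail").fetchone()[0]
print(rows, detail_count)
```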
It sounds like you might also need to allow for extensibility and you want to allow new columns at some future time, but you don't know which columns or how many right now. This is a pretty common project requirement.
You might be interested in my presentation Extensible Data Modeling, in which I give an overview of different solutions in SQL for this type of problem.
Extra Columns
Entity-Attribute-Value
Class Table Inheritance
Serialized LOB
Inverted Indexes
Online Schema Changes
Non-Relational Databases
None of these solutions is foolproof; each has its strengths and weaknesses. So it is worth learning about all of them, and then deciding which ones have strengths that matter to your specific project, while their weaknesses are something that doesn't inconvenience you too much (that's the decision process for many software design choices).
We are currently describing attributes in the Comments section for each column type
So you're using something like the "Serialized LOB" solution.
Would the following relationships between the tables work out?
There are over 4000 rows for Airline Data, 150k rows for RAW DATA and about 2000 rows for Airports.
I cannot create a primary key for RAW DATA because there are many repeated values.
http://i108.photobucket.com/albums/n32/lurker3345/ACCESSHELP-1.png
The relationships look fine. I assume many things -- for starters, that the data types match where they are linked. The diagram doesn't communicate much, and there could be many reasons why the schema shown is not optimal.
You certainly can create a PK for RAW DATA, and you had better because it is voluminous.
A common approach is to select multiple fields to serve as the key because together they form a unique value. This is called a compound key. It's helpful (even essential) because it naturally ensures the unique combination is not unintentionally duplicated. (In most situations you will want to make sure all key fields are set to not allow a zero-length or null entry.)
There is a simpler approach that serves many situations. Maybe you don't need this kind of data integrity, or you aren't sure yet what would make up a compound key, or you just want to get a provisional PK in place. Merely add an autonumber field and declare that as PK.
Some developers take that easy approach and accomplish data validation outside the table...and some ignore data validation needs, which can result in a disaster.
Once you have the PK declared, making sure the table has indexes on critical fields (in addition to the PK) is important for efficiency.
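The two approaches can be compared in a small sketch. It uses SQLite through Python's sqlite3 module instead of Access, and every column name below (airline_code, airport_code, flight_date, passengers) is hypothetical, since the real RAW DATA fields are only visible in the linked diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 1: compound key; the combination of fields must be unique
CREATE TABLE raw_data_compound (
    airline_code TEXT NOT NULL,
    airport_code TEXT NOT NULL,
    flight_date  TEXT NOT NULL,
    passengers   INTEGER,
    PRIMARY KEY (airline_code, airport_code, flight_date)
);
-- Option 2: autonumber-style surrogate key plus an index on a critical field
CREATE TABLE raw_data_surrogate (
    raw_data_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    airline_code TEXT NOT NULL,
    airport_code TEXT NOT NULL,
    flight_date  TEXT NOT NULL,
    passengers   INTEGER
);
CREATE INDEX raw_data_surrogate_idx1 ON raw_data_surrogate (airline_code);
""")

row = ("AA", "JFK", "2020-01-01", 100)
conn.execute("INSERT INTO raw_data_compound VALUES (?, ?, ?, ?)", row)
try:
    conn.execute("INSERT INTO raw_data_compound VALUES (?, ?, ?, ?)", row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the compound key blocks the repeated combination

# The surrogate-key table accepts the same row twice; validation must happen elsewhere.
for _ in range(2):
    conn.execute(
        "INSERT INTO raw_data_surrogate "
        "(airline_code, airport_code, flight_date, passengers) VALUES (?, ?, ?, ?)",
        row,
    )
surrogate_count = conn.execute("SELECT COUNT(*) FROM raw_data_surrogate").fetchone()[0]
print(duplicate_rejected, surrogate_count)
```

The compound key enforces uniqueness inside the table, while the autonumber key is quicker to set up but leaves duplicate detection to the application.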
Really, before all else, do yourself a favor and rename all tables and fields so there are no spaces. While at it, rethink every name and aim for the most descriptive and standardized name possible. Access is cruel when it comes to renaming things later on. Avoiding spaces is a practice that will help you greatly further down the road.
I have a doubt about best practices and how the database engine works.
Suppose I create a table called Employee, with the following columns:
SS ID (Primary Key)
Name
Sex
Age
The thing is... I see a lot of databases where every table has an additional column called ID, which is a sequential number. Should I put an ID field in my table here? I mean, it already has a primary key to be indexed. Will the database work faster with a sequential ID field? I don't see how it helps if I won't use it to link or search any table.
Does it help? If so, why, and what happens in the database?
thanks!
EDIT -----
This is just a silly example. Forget about the SS_ID; I know there are better ways of choosing a primary key. The main topic is that some people I know just ask me to add the column named ID, even if I know we won't use it in any SQL query. They just think it helps the database's performance in some way, especially because some database tools like Microsoft Access always ask us if we want to add this new column.
This is wrong, right?
If SS means "Social Security", I'd strongly advise against using that as a PK. An auto-incremented identity is the way to go.
Using keys with business logic built in is a bad idea. Lots of people are sensitive about giving out SS information. Your app could be eliminating part of its audience if it uses SS as the primary key. Laws like HIPAA can make it impossible for you to use.
The actual performance gain in having a sequential id is going to depend a lot on how you use the table.
If you're using some ORM framework, these generally work better with a sequential ID of an integral type [1], which is typically achieved with a sequential id column.
If you don't use an ORM framework, having an id key that you never use alongside a natural ss_id key, which is effectively what you always use, makes little sense.
If you're referencing employees from other database table (foreign-key), then it'll probably be more efficient to have an id column, as storing that integer is going to consume less space in the child tables than storing the ss_id (which I assume is a CHAR or VARCHAR) everywhere.
On the ss_id, assuming it's a social security number (looks like it would be), there might be legal & privacy concerns attached to it that you should care about - my answer assumes you do have valid reasons to have social security numbers in your database, and that you would be legally allowed to use & store them.
[1] This is usually explained by the fact that ORM frameworks rely on highly specialized cache mechanisms that are tailored for typical ORM use, which usually implies having a sequential id primary key and letting the application deal with actual business identity. This is in fact related to considerations very similar to the "foreign key" considerations.
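The foreign-key point can be sketched as follows: the employee table gets a sequential id, ss_id stays in that one table under a UNIQUE constraint, and child tables carry only the integer. This is an illustration on SQLite via Python's sqlite3 module; the timesheet table, its columns and the sample values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE employee (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    ss_id TEXT NOT NULL UNIQUE,               -- natural key, stored in one place only
    name  TEXT NOT NULL
);
CREATE TABLE timesheet (                      -- hypothetical child table
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    employee_id INTEGER NOT NULL REFERENCES employee (id),
    hours       REAL NOT NULL
);
""")

cur = conn.execute(
    "INSERT INTO employee (ss_id, name) VALUES (?, ?)", ("000-00-0000", "Alice")
)
employee_id = cur.lastrowid
conn.execute(
    "INSERT INTO timesheet (employee_id, hours) VALUES (?, ?)", (employee_id, 7.5)
)

# Column names of the child table: (cid, name, type, notnull, dflt_value, pk)
child_columns = [row[1] for row in conn.execute("PRAGMA table_info(timesheet)")]
print(employee_id, child_columns)
```

The child table stores a small integer per row and never sees the ss_id value, which covers both the storage argument and the concern about spreading personal data across the schema.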
US Social Security numbers are not sufficiently identifying. And banks certainly do not use them in that way. Not everybody has one. Errors result in duplicates. Foreigners don't have them. They are far too fragile to use as a database PK.
Most importantly: they are reused after death.
Do some research: SSN as Primary Key
What's more important (obviously) is that you have a primary key, as long as the data you use for that primary key is uniquely identifying. In your example, SSNs are uniquely identifying, which is why banks use them, and they will work. The problem with this example is that your Employee ID is likely to be used as a foreign key in other tables, which means you're taking personal information (that is legally protected) and spraying it across your data model. You might do better using an auto-incremented field in this case.
I am working on a database schema, and am trying to make some decisions about table names. I like at least somewhat descriptive names, but then when I use suggested foreign key naming conventions, the result seems to get ridiculous. Consider this example:
Suppose I have table
session_subject_mark_item_info
And it has a foreign key that references
sessionSubjectID
in the
session_subjects
table.
Now when I create the foreign key name based on fk_[referencing_table]__[referenced_table]_[field_name] I end up with this madness:
fk_session_subject_mark_item_info__session_subjects_sessionSubjectID
Would this type of a foreign key name cause me problems down the road, or is it quite common to see this?
Also, how do the more experienced database designers out there handle the conflict between descriptive naming for readability vs. the long names that result?
I am using MySQL and MySQL Workbench if that makes any difference.
UPDATE
Received the answers I needed below, but I wanted to mention that after some testing, I discovered that MySQL does have a limit on how long the FK name can be. So using the naming convention I mentioned, and descriptive table names, meant that in two instances in my db I had to shorten the names to avoid the MySQL 1059 error
http://dev.mysql.com/doc/refman/5.1/en/error-messages-server.html#error_er_too_long_ident
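A quick way to catch this before MySQL does is to build the name and check its length. The sketch below assumes the 64-character maximum that the MySQL manual gives for most identifiers; the helper names fk_name and fits_mysql_limit are made up for the example.

```python
# Assumed limit: the MySQL manual lists 64 characters for most identifiers.
MYSQL_MAX_IDENTIFIER_LENGTH = 64


def fk_name(referencing_table: str, referenced_table: str, field_name: str) -> str:
    """Build a name using the fk_[referencing]__[referenced]_[field] pattern."""
    return f"fk_{referencing_table}__{referenced_table}_{field_name}"


def fits_mysql_limit(identifier: str) -> bool:
    """Return True when the identifier is within the assumed length limit."""
    return len(identifier) <= MYSQL_MAX_IDENTIFIER_LENGTH


name = fk_name("session_subject_mark_item_info", "session_subjects", "sessionSubjectID")
print(name, len(name), fits_mysql_limit(name))
```

Running a check like this over all generated names shows up front which ones need to be shortened.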
Why do you care what the FK names are? You never see them in code or use them. We also name our tables quite descriptively and commonly have names like this, using SQL Server. It doesn't matter to us, because we never see them. They are just there to enforce data integrity.
FK names are important for maintenance. Generally I only reference the FK and the two table names, not the fields, in the names. If you have named your fields correctly, it will be obvious what the fields are.
Although it probably makes no difference, I will say that I've had table names both ways. In my opinion, long descriptive table names are overkill, and when working in code or even at the command line they become burdensome and tedious. I mean seriously, who in their right mind would have a nearly 30-character table name, e.g. stationchangelogmasterreport? Now imagine tens or even hundreds of these in a database system. From a developer's point of view, this is just dumb!! My recommendation: put some thought into it, use abbreviations (when you can) and keep it short and to the point. For example, the above table name could be shortened to stnchangelog, and if someone absolutely NEEDS a huge description explaining every meaning and use case for the table, then put this description in the table metadata, i.e. the comments on the table. This keeps us developers from going crazy (and hating you for it), and offers the "meaning" of the table if needed.