Does MySQL database require a unique identifier? - mysql

I'm really new to databases in general but could use some advice. I have a database that has about 6000 records (and growing but not crazy amounts). I'd like to build an API so that I can retrieve a property's price history, but have been advised I need a unique ID but I'm not so sure. Can anyone advise?
DB looks like this:
| address | price | date_created | status |
| --- | --- | --- | --- |
| Address one, main street | £150,000 | 13/10/2022 | new data |
| Address one, main street | £140,000 | 16/10/2022 | update data |
| Address two, side road | £350,000 | 13/10/2022 | new data |

Maybe you will need to add/edit your data using a CRUD, maybe reference other tables, make foreign keys, etc. I recommend that you add a primary key. I've never worked on tables without a primary key.
I can hear my database teacher in 2001: you should always have a primary key and another unique key apart from the primary one (compound or on a single column) if you don't want your table to be just another Excel sheet. And I remember that there was a mathematical demonstration of this rule using injective functions.
Of course, you can ignore these rules, but they are best practice advice for your future self :)

You might find an integer primary key useful, but it's not mandatory.
If the address is sufficient to be the unique column by which you can reference any row, then that's fine. It's called a "natural key." In practice, most of us have experienced that any column you think should be unique ends up not being 100% unique eventually. So a lot of developers recommend adding a "pseudokey" which has no reason not to be unique.
I wrote a chapter called "ID Required" describing the pros and cons of pseudokeys and natural keys in my book SQL Antipatterns, Volume 1: Avoiding the Pitfalls of Database Programming.
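To make the trade-off concrete, here is a minimal sketch of the table from the question with an integer pseudokey added. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name, column names and the price_history helper are all assumptions made for illustration. Note that in the sample data the same address appears on two rows, so the address alone could not serve as the key of this particular table.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE price_history (
        id           INTEGER PRIMARY KEY,   -- pseudokey assigned by the database
        address      TEXT NOT NULL,
        price_gbp    INTEGER NOT NULL,      -- whole pounds, e.g. 150000
        date_created TEXT NOT NULL,         -- ISO 8601 so text order matches date order
        status       TEXT NOT NULL
    )
    """
)
rows = [
    ("Address one, main street", 150000, "2022-10-13", "new data"),
    ("Address one, main street", 140000, "2022-10-16", "update data"),
    ("Address two, side road", 350000, "2022-10-13", "new data"),
]
conn.executemany(
    "INSERT INTO price_history (address, price_gbp, date_created, status) "
    "VALUES (?, ?, ?, ?)",
    rows,
)


def price_history(address):
    """Return (id, price, date) tuples for one address, oldest first."""
    cur = conn.execute(
        "SELECT id, price_gbp, date_created FROM price_history "
        "WHERE address = ? ORDER BY date_created, id",
        (address,),
    )
    return cur.fetchall()


history = price_history("Address one, main street")
print(history)
```

An API endpoint could look rows up by address like this, while the id column gives every individual price record a stable handle for updates or deletes.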

Related

Modify database table best practices

As my title states, I'm curious about the best practices for modifying an existing table in a (MySQL) database. In my scenario, I have a table that is already full of data and has a column named product_id that is currently the primary key for the table. I'm working on a feature where I'm finding product_id doesn't necessarily need to be unique or the primary key, since I want to allow multiple records for the same product. Database design isn't a strength of mine yet, but in my head I feel like what I would want to do is run the command DROP PRIMARY KEY for the product_id column, then add a column called id and make this the new primary key. Then I would need to update the id column for each record with a unique id for it to be a valid primary key. As far as database design is concerned, is this the best practice, or is it better to create a new table with the updated structure and copy the current records into the new table?
EDIT:
More about the feature I'm working on. The products are books and I'm trying to allow multiple sections of these books to be previewed. In order to do this, I'm storing page ranges that can be previewed. Right now, only one page range is allowed per product; allowing several is why the product id doesn't need to be unique anymore.
A primary key is ALWAYS unique.
Why don't you want it to be unique? It sounds like you are exposing the key outside the database: the PK is visible somehow and some users think it should behave differently. If this is the case, then this is a really bad practice.
This is the typical case of the notorious "natural keys". They are a disaster waiting to happen; I don't like big time bombs. I've been strongly opposed to them for some time now. It's good they teach them in schools so you know what not to use in the real world.
Now for the solution. If product_id is exposed, then it shouldn't be the PK at all. Solution?
Create a new column (id maybe?) that is internal, that is unique, and not exposed to the user, while keeping product_id. This new column could have the exact same value as product_id at first.
Change all FK references from other tables to the new id column.
Then, remove the PK constraint from product_id and do whatever you want to do with it.
Add the PK constraint to the new id column.
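As a rough illustration of where those steps end up, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite cannot drop a primary key in place, so this follows the "copy into a new table" route from the question; in MySQL the in-place route would go through ALTER TABLE with the DROP PRIMARY KEY clause the question mentions. The table name preview and its page-range columns are assumptions, and updating foreign keys in other tables is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical starting point: product_id is the primary key, one page range per product.
conn.executescript(
    """
    CREATE TABLE preview (
        product_id INTEGER PRIMARY KEY,
        first_page INTEGER NOT NULL,
        last_page  INTEGER NOT NULL
    );
    INSERT INTO preview VALUES (101, 1, 10), (102, 5, 20);

    -- New structure: internal id as primary key, product_id kept as a plain column.
    CREATE TABLE preview_new (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        first_page INTEGER NOT NULL,
        last_page  INTEGER NOT NULL
    );
    -- Seed id with the old product_id value, as the answer suggests.
    INSERT INTO preview_new (id, product_id, first_page, last_page)
        SELECT product_id, product_id, first_page, last_page FROM preview;

    DROP TABLE preview;
    ALTER TABLE preview_new RENAME TO preview;
    CREATE INDEX idx_preview_product ON preview (product_id);

    -- A second page range for product 101 is now allowed.
    INSERT INTO preview (product_id, first_page, last_page) VALUES (101, 50, 60);
    """
)
ranges = conn.execute(
    "SELECT id, first_page, last_page FROM preview WHERE product_id = 101 ORDER BY id"
).fetchall()
print(ranges)
```

Existing rows keep their old numbers as id, and new rows simply get the next free id, so product_id is free to repeat.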

Can I create a composite key with an extra character?

I'm building a new DB using MySQL to store lessons learned across a variety of projects. When we talk about this in the office, we refer to lessons by the Project Number and Lesson Number, i.e. PR12-81, where PR12 refers to the project and 81 refers to the specific lesson within that project. I want the primary key in my DB to have a hyphen in it as well.
When defining a composite key in SQL, I can make it reference the project and lesson but without the hyphen, i.e. PR1281. I've also considered creating a separate column of data type CHAR(1), putting a hyphen in every row and declaring that the PK is made of 3 columns.
Is there another way that I can specify the primary key to be formatted in the preferred way?
Let your table's primary key be a nonsensical auto-increment number with no "meaning" whatsoever. Then, within that table, define two columns: project_number and lesson_number. If the two need to be unique, define a UNIQUE index encompassing the two fields.
Don't(!) create database keys which embed information into them, even if the business does so. If the business needs to refer to strings like PR12, so be it ... create a column to store the appropriate value, or use a one-to-many table. Use indexes as needed to enforce uniqueness.
Notice(!) that I've now described four columns:
The auto-increment based "actual" primary key, which contains no information.
The project_number column, probably a foreign key to a projects table.
Ditto the lesson_number. (With a UNIQUE composite index if needed.)
The column (or table) which contains "the string that the business uses."
Over time, business practices do change. And someday you just might... no, you will... encounter a "business-used string" that was incorrectly assigned by the human beings who do such things! Your database design needs to handle this gracefully. The schema I've described is in so-called third normal form. Do a Google search on "normal forms" if you haven't already.
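A minimal sketch of that layout, using Python's built-in sqlite3 module as a stand-in for MySQL. The table name lesson, the title column and the sample row are assumptions. As a simplification, the office-style label is built at query time here rather than stored in a fourth column as described above; in MySQL the concatenation would be written with CONCAT rather than the || operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE lesson (
        id             INTEGER PRIMARY KEY,      -- meaningless auto-assigned key
        project_number TEXT    NOT NULL,         -- e.g. 'PR12'
        lesson_number  INTEGER NOT NULL,         -- e.g. 81
        title          TEXT    NOT NULL,
        UNIQUE (project_number, lesson_number)   -- one lesson number per project
    )
    """
)
insert = "INSERT INTO lesson (project_number, lesson_number, title) VALUES (?, ?, ?)"
conn.execute(insert, ("PR12", 81, "Hypothetical lesson title"))

# Build the hyphenated label for display instead of baking it into the key.
label = conn.execute(
    "SELECT project_number || '-' || lesson_number FROM lesson WHERE id = 1"
).fetchone()[0]

# A second PR12-81 is rejected by the UNIQUE constraint.
try:
    conn.execute(insert, ("PR12", 81, "Accidental duplicate"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(label, duplicate_rejected)
```

The hyphen lives only in the presentation, so a change in how the office writes lesson references would not touch any key.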

Is it okay to use the same column as a primary key for different tables?

I am a total novice to this whole database world and I have a question. I am building a database for my final project for my masters class. The database includes cities, counties, and demographic data for the state of Colorado. The database ultimately will be used as a spatial database. At this point I have all my tables built in Access, and have an ODBC connection to PostgreSQL to import the tables after they are created. Access does not allow for shapefiles to be added to the database, PostgreSQL does.
My question is about primary keys. Each of my tables in Access shares a FIPS code (this code allows me to join the demographic data to a shapefile and display the data in ArcMap with the proper coordinates). I have many demographic data tables with this FIPS code. Is it acceptable to set the FIPS as the primary key for each table? Or does each table need its own individual primary key that is different from the others?
Thanks for the help!
The default PK is “ID”, so there is really no problem with using this default for all tables.
In fact it means for any table or code you write you can now always rest easy as to what the primary key is going to be.
And if you copy or re-name a table, then again you know the ID.
Some people do prefer having the table name as part of the PK, but that does violate normalization of data, since now you're attaching an external attribute to that PK column.
However for a FK (foreign key), since the VERY definition of the column is an external dependency, then I tend to include the table name like this:
Customers_ID
And once again due to this naming convention, then you can always “guess” or “know” the name of a FK column (table name + ID).
At the end of the day, there is not really a convention on this issue. However, I recommend that for all tables you create, you allow Access to create that default PK of “id”. This of course assumes your database design is not using natural keys. The debate of natural keys vs surrogate keys (an autonumber PK “id”) has many pros and cons. You can google natural keys vs surrogate keys for endless discussions on this issue.
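For the natural-key side of that debate, here is a small sketch of what the question describes: several tables that each use the same FIPS column as their primary key. It uses Python's built-in sqlite3 module rather than Access or PostgreSQL, and every table name, column name and value is a made-up placeholder. Column names are scoped to their table, so nothing prevents each table from having a primary key called fips.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE county (
        fips TEXT PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Each demographic table reuses fips as its own primary key
    -- and as a foreign key back to county.
    CREATE TABLE population (
        fips      TEXT PRIMARY KEY REFERENCES county (fips),
        residents INTEGER NOT NULL
    );
    CREATE TABLE income (
        fips          TEXT PRIMARY KEY REFERENCES county (fips),
        median_income INTEGER NOT NULL
    );
    -- Placeholder values, not real census data.
    INSERT INTO county VALUES ('08999', 'Example County');
    INSERT INTO population VALUES ('08999', 1000);
    INSERT INTO income VALUES ('08999', 50000);
    """
)
row = conn.execute(
    """
    SELECT c.name, p.residents, i.median_income
    FROM county c
    JOIN population p ON p.fips = c.fips
    JOIN income i ON i.fips = c.fips
    """
).fetchone()
print(row)
```

This only works while each table holds at most one row per FIPS code; a table with several rows per county (for example one per year) would need a different key.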

Do I need a primary key for every table in MS Access?

I am new to MSAccess so I'm not sure about this; do I have to have a primary key for every single table in my database? I have one table which looks something like this:
(http://i108.photobucket.com/albums/n32/lurker3345/ACCESSHELP.png?t=1382688844)
In this case, every field/column has a repeating term. I have tried assigning the primary key to every field but it returns with an error saying that there is a repeated field.
How do I go about this?
Strictly speaking, yes: every row in a relational database should have a primary key (a unique identifier). If doing quick-and-dirty work, you may be able to get away without one.
Internal Tracking ID
Some databases generate a primary key under the covers if you do not assign one explicitly. Every database needs some way to internally track each row.
Natural Key
A natural key is an existing field with meaningful data that happens to identify each row uniquely. For example, if you were tracking people assigned to teams, you might have an "employee_id" column on the "person" table.
Surrogate Key
A surrogate key is an extra column you add to a table, just to assign an arbitrary value as the unique identifier. You might assign a serial number (1, 2, 3, …), or a UUID if your database (such as Postgres) supports that data type. Assigning a serial number or UUID is so common that nearly every database engine provides a built-in facility to help you automatically create such a value and assign to new rows.
My Advice
In my experience, any serious long-term project should always use a surrogate key because every natural key I've ever been tempted to use eventually changes. People change their names (get married, etc.). Employee IDs change when a company gets acquired by another.
If, on the other hand, you are doing a quick-and-dirty job, such as analyzing a single batch of data to produce a chart once and never again, and your data happens to have a natural key then use it. Beware: One-time jobs often have a way of becoming recurring jobs.
Further advice… When importing data from a source outside your control, assign your own identifier even if the import contains a candidate key.
Composite Key
Some database engines offer a composite key feature, also called a compound key, where two or more columns in the table are combined to create a single value which, once combined, should prove unique. For example, in a "person" table, the "first_name", "last_name", and "phone_number" fields might be unique when considered together. Unless, that is, two people are married and share the same home phone number while also happening to each be named "Alex" with a shared last name! Because of such collisions, as well as the tendency for meaningful data to change and the overhead of calculating such combined values, it is advisable to stick with simple (single-column) keys unless you have a special situation.
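The "two people named Alex" collision can be reproduced in a few lines. This sketch uses Python's built-in sqlite3 module as a stand-in for whichever engine is in use; the person table follows the example above and the sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        first_name   TEXT NOT NULL,
        last_name    TEXT NOT NULL,
        phone_number TEXT NOT NULL,
        PRIMARY KEY (first_name, last_name, phone_number)  -- composite key
    )
    """
)
insert = "INSERT INTO person (first_name, last_name, phone_number) VALUES (?, ?, ?)"
conn.execute(insert, ("Alex", "Smith", "555-0100"))

# A second, different Alex Smith at the same home number cannot be stored.
try:
    conn.execute(insert, ("Alex", "Smith", "555-0100"))
    collision = False
except sqlite3.IntegrityError:
    collision = True

people = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(collision, people)
```

The database has no way to tell a genuine duplicate from two real people who happen to share all three values, which is the weakness described above.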
If the data doesn't naturally have a unique field to use as the primary key, add an auto-generated integer column called "Id" or similar.
Read the "how to organize my data" section of this page:
http://www.htmlgoodies.com/primers/database/article.php/3478051
This page shows you how to create one (under "add an autonumber primary key"):
http://office.microsoft.com/en-gb/access-help/create-or-remove-a-primary-key-HA010014099.aspx
If you use a DataAdapter and a CurrencyManager, your tables must have a primary key in order to push updates, additions and deletions back to the database. Otherwise, they will not register and you will receive an error.
I lost one week figuring that one out until I added this to the Try-Catch-End Try block: MsgBox(er.ToString) which mentioned "key". From there, I figured it out.
(NB : Having a primary key was not a requisite in VB6)
Not having a primary key usually means your data is poorly structured. However, it looks like you're dealing with summary/aggregate data there, so it probably doesn't matter.

Always create unique keys whenever possible?

Should you always create unique keys whenever possible?
For example let's say I have a table with three fields, student ID, first name, last name and the student ID is the primary key.
If no two students have the same first & last name, should I create a unique key for those two fields?
Yes, you should use unique indexes even when you already have a primary key, provided the column or combination of columns is unique. It's good to have constraints in your database to prevent bad data. However, this is not what you have in your case. Even if you currently have no students with duplicate names, that can easily happen in the future. Names are not unique in the world.
U.S. Social Security numbers are almost always unique (they can be reused after a number of years, but it's unlikely to ever happen in your case), so they might make for a good candidate for a unique index. If you have non-U.S. students though then you would need to make the column nullable.
Yes, usually having unique IDs (surrogate keys) is best. In this case, last name and first name are not enough for a primary key. Even if you have no duplicate names now, you can't be sure you won't have two John Smiths in the future.
Don't make the assumption that no two students will have the same name.
When the underlying model suggests it, it is a good idea to create unique keys. Constraints like these will ensure cohesive data and prevent errors. But in your case the underlying model does not suggest this to be the case.
Unique keys should follow business definitions; if the studentID is a "semi-natural" key (it has unique meaning that exists beyond your specific database), then that should suffice as your unique key.
If the studentID is simply an identity value that is assigned by the database as a row-number, then you probably need some other unique key to avoid entering the same student twice.
A primitive primary key with no relation to the data domain is a widely accepted best practice
(just imagine: one of your students decides to marry).
Another good practice (though from the NoSQL world) is to use a GUID. This way keys are unique, and different datasets can be mixed in the same table without collisions.
PS: you could save some storage space, but today it is cheap and there is no need to sacrifice good practices for it
Yes!
If you ever need to update or delete rows from the table, it is very advantageous to have something to uniquely identify each row in the table.
With your example, I don't think it's possible to guarantee no two students will share the same name. Even adding a date of birth still can't guarantee they'll always be unique. I'd recommend adding an auto incrementing INT or BIGINT as the primary key.
You can always add the Unique constraint as well and remove it if it becomes an issue.
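Here is a short sketch of that "add it now, drop it later" approach, using Python's built-in sqlite3 module as a stand-in for the real database. The index name uq_student_name and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,   -- auto-assigned integer key
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE UNIQUE INDEX uq_student_name ON student (first_name, last_name);
    INSERT INTO student (first_name, last_name) VALUES ('John', 'Smith');
    """
)
insert = "INSERT INTO student (first_name, last_name) VALUES (?, ?)"

# While the unique index exists, a second John Smith is refused.
try:
    conn.execute(insert, ("John", "Smith"))
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

# Once a real second John Smith enrols, drop the constraint and insert again.
conn.execute("DROP INDEX uq_student_name")
conn.execute(insert, ("John", "Smith"))
john_smiths = conn.execute(
    "SELECT COUNT(*) FROM student WHERE first_name = 'John' AND last_name = 'Smith'"
).fetchone()[0]
print(blocked, john_smiths)
```

Because student_id is the primary key throughout, removing the name constraint does not affect how rows are identified.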
A simple way to do it is to use an auto-generated GUID (Globally Unique Identifier) to identify a student. It is "guaranteed" to be unique every time it is generated. Names can change (like when somebody gets married), but an auto-generated value has no meaning, so it should never need to be changed.
http://en.wikipedia.org/wiki/Globally_unique_identifier
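As a rough sketch of the GUID idea, the example below generates random (version 4) UUIDs with Python's standard uuid module and stores them as text keys in an in-memory SQLite table. The student table and the add_student helper are assumptions; a production database may offer its own UUID type or generator.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "id TEXT PRIMARY KEY, first_name TEXT NOT NULL, last_name TEXT NOT NULL)"
)


def add_student(first_name, last_name):
    """Insert a student under a freshly generated random UUID and return that key."""
    student_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO student VALUES (?, ?, ?)", (student_id, first_name, last_name)
    )
    return student_id


first = add_student("John", "Smith")
second = add_student("John", "Smith")  # same name, different key

# A name change touches only the name column; the key stays the same.
conn.execute("UPDATE student SET last_name = ? WHERE id = ?", ("Jones", first))
renamed = conn.execute(
    "SELECT first_name, last_name FROM student WHERE id = ?", (first,)
).fetchone()
print(first != second, renamed)
```

Two students with identical names coexist without conflict, and the row for the renamed student is still found under its original key.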
Your database constraints should be DBMS understood business rules. Is there a business rule that states that no two students may have the same first and last name combination? I presume not, therefore do not create a unique key for those two fields. Perhaps best not to presume, though, and ask a business domain expert e.g. the enrolment officer.
Note that a row in this table is a proposition, i.e. that there exists a student enrolled with first name 'x' and last name 'y' and student ID 'z'. Clearly the DBMS has no concept of whether this proposition is true in the real world. What normally happens is that there will be a trusted source to verify data. The enterprise will authorize an officer (director etc) in this role. Let's say it is the enrolment officer who is responsible for verifying that 'x y' is a real person, that they are eligible to be enrolled, and that the person is who they say they are. Typically, they will require sight of documents (certificates, passport, etc), take up references, interview the person, check public records, etc. Of course, the enrolment officer may delegate their responsibility to other members of staff or engage an agent.
At some point they will be satisfied and for convenience will issue their own identifier, the student ID. Mistakes do happen and it may turn out that this value is not unique, in which case it would be the enrolment officer's responsibility to resolve the problem and issue a new student ID. Perhaps they will use software to generate the value to mitigate against such problems. The student ID will be issued to the student and will be used within the enterprise to identify the person for the convenience of all concerned. They may even be issued with a document (e.g. photo ID card) to assist in identification, based on the level of trust in a given context (e.g. may need to produce photo ID to sit an exam). If the student forgets their ID, loses their issued documents, etc then the enrolment office will be able to retrieve it from records e.g. with reference to copy documents taken during the verification process; they are unlikely to use first name and last name alone.
The point is, the trusted source for the identifier is the enrolment officer on behalf of the enterprise, rather than the database, the DBMS or any other kind of software involved in the process. Therefore, it probably is acceptable to make student ID the sole identifier for students within the database. Consider, however, that an auto-increment column generated on one hardware build of a single DBMS within the enterprise is probably not suitable for the allocation of such significant identifier values.