I need to create a table that stores 'events' from different clients. Each event has an event_id that is unique only within a specific client, which implies that the combination of event_id (integer) and client (varchar) can be made a primary key. I intend to use this table as a data provider for my Java application, which uses Hibernate. The use cases are adding events, updating events, and processing events to generate reports.
I want to ensure fast and accurate updates, which requires fetching the exact row and updating it in Hibernate.
Please advise what the primary key should be:
Create a primary key using the event_id and client columns
Create an additional id column with auto_increment and create a unique index on event_id and client
I am unsure whether or not to create an id column with auto_increment.
Based on comments from JB Nizet
Prefer non-functional, single-column, purely technical, autogenerated primary keys.
Because the rest of your application will be able to reference the client event by a single numerical ID rather than by a combination of two pieces of information, one of them textual. Because if you later add a third piece of information to the functional key of an event, or change the value of your textual client IDs, you won't have to alter all the tables that have a foreign key to the client event table. Because accessing an event by a single-column, numeric PK is faster than accessing it by a composite, textual one. Because the Hibernate mapping and the code using it will be much easier to write, etc.
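To make the second option concrete, here is a minimal sketch of a surrogate id plus a unique index on (event_id, client). It is only an illustration: Python's built-in sqlite3 stands in for MySQL so that it runs anywhere, and the table name client_event and the payload column are made up for the example. In MySQL the id column would typically be declared with AUTO_INCREMENT instead.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE client_event (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        event_id  INTEGER NOT NULL,
        client    TEXT    NOT NULL,
        payload   TEXT,                               -- hypothetical column
        UNIQUE (event_id, client)                     -- functional key stays enforced
    )
""")

cur = conn.execute(
    "INSERT INTO client_event (event_id, client, payload) VALUES (?, ?, ?)",
    (42, "acme", "created"),
)
row_id = cur.lastrowid  # the single numeric ID the rest of the app can reference

# Update the exact row through the surrogate key.
conn.execute("UPDATE client_event SET payload = ? WHERE id = ?", ("updated", row_id))

# The same event_id is fine for a different client...
conn.execute("INSERT INTO client_event (event_id, client) VALUES (?, ?)", (42, "globex"))

# ...but a repeated (event_id, client) pair is rejected by the unique index.
try:
    conn.execute("INSERT INTO client_event (event_id, client) VALUES (?, ?)", (42, "acme"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

payload = conn.execute(
    "SELECT payload FROM client_event WHERE id = ?", (row_id,)
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM client_event").fetchone()[0]
```

The unique index keeps the guarantee that a composite primary key would have given, while foreign keys and the Hibernate mapping only ever have to deal with the single id column.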
Related
I started to design a database that tracks system events by following some online tutorials, and some of the easy examples start by assigning auto-incrementing IDs as primary keys. I looked at my database, and I don't really need IDs. Out of all my columns, the timestamp and device ID are the two columns that together identify a unique event.
What my program does right now is pull events from the system log for the past x minutes and insert them into the database. However, I could be going so far into the past that the events overlap with what's already in the database. As I mentioned before, timestamp and device ID are the two fields that uniquely identify an event. My question is, should I use these two fields as my primary key and use "INSERT IGNORE" from now on so I can avoid having duplicate records?
It is good practice never to use business values as a table's primary key and always to use synthetic values, e.g. autoincrement, instead. It will make your life easier in the future when business requirements change :)
We are currently struggling with exactly this situation. We have had a column with business values as a primary key for 2 years and are now painfully introducing an autoincrement one.
You may need a foreign key from another table to this one in the future to link rows between the two tables. That is easier with a one-column primary key.
But if you don't need it now, there is no need to create a column just for the index. The table can be altered in the future to add such an autoincrement column and move the primary key to it.
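For the approach described in the question, a rough sketch of the overlap handling could look like this. It runs on Python's built-in sqlite3, where the statement is spelled INSERT OR IGNORE; MySQL spells it INSERT IGNORE. The table and column names (system_event, event_ts, device_id, message) are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("""
    CREATE TABLE system_event (
        event_ts   TEXT NOT NULL,   -- hypothetical column names
        device_id  TEXT NOT NULL,
        message    TEXT,
        PRIMARY KEY (event_ts, device_id)
    )
""")

def load(events):
    """Insert events, silently skipping any (event_ts, device_id) already stored."""
    conn.executemany(
        "INSERT OR IGNORE INTO system_event (event_ts, device_id, message) "
        "VALUES (?, ?, ?)",
        events,
    )

# Two polling windows that overlap on one event.
first_window = [
    ("2013-10-25 10:00:00", "dev-1", "boot"),
    ("2013-10-25 10:01:00", "dev-2", "boot"),
]
second_window = [
    ("2013-10-25 10:01:00", "dev-2", "boot"),       # already stored
    ("2013-10-25 10:02:00", "dev-1", "disk full"),
]

load(first_window)
load(second_window)

stored = conn.execute("SELECT COUNT(*) FROM system_event").fetchone()[0]
```

The same statement works if the pair is moved to a unique index next to an autoincrement primary key, so starting with the composite key does not rule out adding a surrogate key later.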
I have a table that does not require a primary key. It consists of 4 columns: email, date, time, and message. Each time a user logs in, logs out, or performs any particular action that is important, I log the email, date, time and action (message). Currently the table is set up with email as the primary key, but I am unable to insert more than one record with the same email. I suppose I could use time as the PK, but there is the possibility that two actions fall on the same time. How can I just use the table without a PK? I have tried turning it off for the email column but it does not allow me to.
Yes, since you have defined the email field as your primary key, it can hold only unique values and no duplicates are allowed.
So you have two options:
1: Remove email field as a primary key
2: Add new integer field as a Primary key with auto increment (I would prefer this one)
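A small sketch of option 2, using Python's built-in sqlite3 in place of MySQL (an assumption, so the example runs as-is). The table name activity_log and the index name are made up; the four data columns follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("""
    CREATE TABLE activity_log (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- new surrogate key
        email    TEXT NOT NULL,                      -- no longer the PK, may repeat
        date     TEXT NOT NULL,
        time     TEXT NOT NULL,
        message  TEXT NOT NULL
    )
""")
# A plain (non-unique) index keeps lookups by email fast.
conn.execute("CREATE INDEX idx_activity_log_email ON activity_log (email)")

conn.executemany(
    "INSERT INTO activity_log (email, date, time, message) VALUES (?, ?, ?, ?)",
    [
        ("a@example.com", "2013-10-25", "09:00:00", "login"),
        ("a@example.com", "2013-10-25", "09:30:00", "logout"),
        ("b@example.com", "2013-10-25", "09:30:00", "login"),
    ],
)

rows = conn.execute(
    "SELECT id, message FROM activity_log WHERE email = ? ORDER BY id",
    ("a@example.com",),
).fetchall()
```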
You could use a natural primary key that would be a combination of Email + Date + Time + Action. That combination would be unique. It is impossible for the same user to do 2 different actions at the same time. That will help you to keep integrity of your information.
Hope this helps you.
To decide on a table's primary key, one may start by considering these points (applicable to InnoDB):
How the data is going to be accessed after it is written (if you don't query it, why store it?). If you care about read performance, you should query your data by the primary key, since for InnoDB the primary key is the only possible clustered index.
The data is stored ordered by the primary key, so if you care about write performance, you should ideally write data in primary key order, which happens automatically with an auto_increment key. Also, a table for which you don't explicitly specify a primary key (and which has no unique index on NOT NULL columns) gets a hidden auto-incrementing row ID that you won't be able to access, i.e. you get less for the same cost.
I am currently rebuilding a database which is used to store patient records. In the current database, the primary key for a patient is their name and date of birth, (a single column, ie "John Smith 1970-01-01", it is not composite). This is also a foreign key in many other tables to reference the patients table. I am planning to replace this key with an auto-generated integer key (since there will obviously be duplicate keys one day under the current system). How can I add a new primary key to this table and add appropriate foreign keys on all the other tables? Keep in mind that there is already a very large amount of data (~500,000 records) and these data references cannot be broken.
Thanks!
If it were up to me:
Add a new future-PK column as a non-null unique index (it must be a KEY, but not necessarily the PK) with auto_increment.
Add the appropriate new-FK columns to all the related tables, these should be initially nullable.
Set the new-FK value to the appropriate future-PK value based on the current-PK/FK relationships. Use an "UPDATE .. JOIN" for this step.
Enable the Referential Integrity Constraints (DRI) on the relevant tables. It only needs to be KEY/FK, not PK/FK, which is why the future-PK can be used. Every existing DRI constraint using the current-PK should likely be updated during this step.
Remove the new-FK column nullability based on modeling requirements.
Remove any residual old-FK columns as they are now redundant data.
Switch the old-PK and the new/future-PK (this can be done in one command and may take a while to physically reorganize all the rows). Remove the old PK column as applicable, or perhaps simply remove the KEY status.
I would also offline the database during the process, review and test the process (use a testing database for dry-runs), and maintain backups.
The Data-Access Layer and any Views/etc will also need to be updated. These should be done at the same time, again through a review and testing process.
Also, even when adding an auto-increment PK, the table should generally still have an appropriate covering natural key enforced with unique constraints.
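The first three steps above can be sketched roughly as follows. This is only an illustration on Python's built-in sqlite3, not MySQL: the visits table and all column names are made up, rowid is used to fill the new key where MySQL would use auto_increment, and a correlated subquery stands in for MySQL's "UPDATE .. JOIN". The later steps (constraints, dropping the old columns, switching the PK) are engine-specific and are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.executescript("""
    -- Legacy layout: a textual key that other tables reference.
    CREATE TABLE patients (name_dob TEXT PRIMARY KEY);
    CREATE TABLE visits   (patient_key TEXT NOT NULL, visit_date TEXT);
    INSERT INTO patients VALUES ('John Smith 1970-01-01'), ('Jane Doe 1980-05-05');
    INSERT INTO visits VALUES
        ('John Smith 1970-01-01', '2013-01-10'),
        ('Jane Doe 1980-05-05',   '2013-02-20'),
        ('John Smith 1970-01-01', '2013-03-30');

    -- Step 1: add the future PK as a unique column and fill it.
    ALTER TABLE patients ADD COLUMN patient_id INTEGER;
    UPDATE patients SET patient_id = rowid;
    CREATE UNIQUE INDEX ux_patients_patient_id ON patients (patient_id);

    -- Step 2: add the new FK column, nullable at first.
    ALTER TABLE visits ADD COLUMN patient_id INTEGER;

    -- Step 3: populate the new FK from the old key relationship.
    UPDATE visits
       SET patient_id = (SELECT p.patient_id FROM patients p
                          WHERE p.name_dob = visits.patient_key);
""")

# Sanity checks before going any further: no row may be left unlinked,
# and the new key must resolve to the same patient as the old one.
orphans = conn.execute(
    "SELECT COUNT(*) FROM visits WHERE patient_id IS NULL"
).fetchone()[0]
mismatches = conn.execute("""
    SELECT COUNT(*) FROM visits v
    JOIN patients p ON p.patient_id = v.patient_id
    WHERE p.name_dob <> v.patient_key
""").fetchone()[0]
```

Checks like the last two queries are worth running in the dry-run database before the old key columns are removed, since after that point a bad mapping can no longer be detected.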
I solved the problem using the following method:
1- Added a new primary key to the patients table and assigned unique values to all existing records
2- Created materialized views (without triggers) for each of the referencing tables that included all fields in the referencing table as well as the newly created id field in the patients table (via a join).
3- Deleted the source referencing tables
4- Renamed the materialized views to the names of the original source tables
The materialized views are now the dependent tables.
A reference for materialized views: http://www.fromdual.com/mysql-materialized-views
I am new to MSAccess so I'm not sure about this; do I have to have a primary key for every single table in my database? I have one table which looks something like this:
(http://i108.photobucket.com/albums/n32/lurker3345/ACCESSHELP.png?t=1382688844)
In this case, every field/column has a repeating term. I have tried assigning the primary key to every field but it returns with an error saying that there is a repeated field.
How do I go about this?
Strictly speaking, Yes, every row in a relational database should have a Primary Key (a unique identifier). If doing quick-and-dirty work, you may be able to get away without one.
Internal Tracking ID
Some databases generate a primary key under the covers if you do not assign one explicitly. Every database needs some way to internally track each row.
Natural Key
A natural key is an existing field with meaningful data that happens to identify each row uniquely. For example, if you were tracking people assigned to teams, you might have an "employee_id" column on the "person" table.
Surrogate Key
A surrogate key is an extra column you add to a table, just to assign an arbitrary value as the unique identifier. You might assign a serial number (1, 2, 3, …), or a UUID if your database (such as Postgres) supports that data type. Assigning a serial number or UUID is so common that nearly every database engine provides a built-in facility to help you automatically create such a value and assign to new rows.
My Advice
In my experience, any serious long-term project should always use a surrogate key because every natural key I've ever been tempted to use eventually changes. People change their names (get married, etc.). Employee IDs change when a company gets acquired by another.
If, on the other hand, you are doing a quick-and-dirty job, such as analyzing a single batch of data to produce a chart once and never again, and your data happens to have a natural key then use it. Beware: One-time jobs often have a way of becoming recurring jobs.
Further advice… When importing data from a source outside your control, assign your own identifier even if the import contains a candidate key.
Composite Key
Some database engines offer a composite key feature, also called a compound key, where two or more columns in the table are combined to create a single value that should prove unique. For example, in a "person" table, the "first_name", "last_name", and "phone_number" fields might be unique when considered together. That is, unless two people are married and share the same home phone number while each also happening to be named "Alex" with a shared last name! Because of such collisions, the tendency for meaningful data to change, and the overhead of calculating such combined values, it is advisable to stick with simple (single-column) keys unless you have a special situation.
If the data doesn't naturally have a unique field to use as the primary key, add an auto-generated integer column called "Id" or similar.
Read the "how to organize my data" section of this page:
http://www.htmlgoodies.com/primers/database/article.php/3478051
This page shows you how to create one (under "add an autonumber primary key"):
http://office.microsoft.com/en-gb/access-help/create-or-remove-a-primary-key-HA010014099.aspx
If you use a DataAdapter and a CurrencyManager, your tables must have a primary key in order to push updates, additions and deletions back to the database. Otherwise, they will not register and you will receive an error.
I lost one week figuring that one out until I added this to the Try-Catch-End Try block: MsgBox(er.ToString) which mentioned "key". From there, I figured it out.
(NB : Having a primary key was not a requisite in VB6)
Not having a primary key usually means your data is poorly structured. However, it looks like you're dealing with summary/aggregate data there, so it probably doesn't matter.
I'm using before and after insert triggers to generate ids (primary keys) of the form "ID_NAME-000001" in several tables. At the moment, the Hibernate generator class for these POJOs is set to "assigned". A random string is assigned to the object to be persisted, and when it's inserted by Hibernate, the trigger assigns the correct id value.
The problem with this approach is that I'm unable to retrieve the persisted object because the id only exists in the database, not in the object I just saved.
I guess I need to create a custom generator class that could retrieve the id value assigned by the trigger. I've seen an example of this for oracle (https://forum.hibernate.org/viewtopic.php?f=1&t=973262) but I haven't been able to create something similar for MySQL. Any ideas?
Thanks,
update:
Seems that this is a common and, yet, not solved problem. I ended up creating a new column to serve as a unique key to use a select generator class.
Hope this won't spark a holy war for whether using surrogate key or not. But it's time to open the conversation here.
Another approach would be to use a generated key as the surrogate key and add a new field for your trigger-assigned id. The surrogate key is the primary key, and you also have the logically named key (such as "ID_NAME-000001" in your example). So your database rows will have 2 keys, and the primary key is the surrogate key (which could be a UUID, a GUID, or a running number).
Usually this approach is preferable, because it can adapt to new changes better.
Say you have these rows, one using a surrogate key and the other using the generated id as a natural key.
Surrogate key:
id: "2FE6E772-CDD7-4ACD-9506-04670D57AA7F", logical_id: "ID_NAME-000001", ...
Natural key:
id: "ID_NAME-000001", ...
When a later requirement needs the logical_id to be editable, auditable (was it changed, who changed it and when) or transferable, having the logical_id as the primary key will put you in trouble. Usually you cannot change your primary key. That is a serious disadvantage when you already have lots of data in your database and have to migrate it because of the new requirement.
With the surrogate key solution it is easy: you just need to add a new row and mark the old one as no longer valid:
id: "2FE6E772-CDD7-4ACD-9506-04670D57AA7F", logical_id: "ID_NAME-000001", valid: "F", ...
id: "0A33BF97-666A-494C-B37D-A3CE86D0A047", logical_id: "ID_NAME-000001", valid: "T", ...
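One way to read the two rows above is as versions of the same logical record. A rough sketch of that idea follows, on Python's built-in sqlite3 rather than MySQL, with a made-up table name (item) and helper (add_version); it is an illustration of the pattern, not the trigger-based setup from the question.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")  # SQLite stand-in for MySQL (assumption)
conn.execute("""
    CREATE TABLE item (
        id          TEXT PRIMARY KEY,   -- surrogate key (UUID string)
        logical_id  TEXT NOT NULL,      -- business ID such as "ID_NAME-000001"
        valid       TEXT NOT NULL CHECK (valid IN ('T', 'F'))
    )
""")

def add_version(logical_id):
    """Mark the current row for logical_id as invalid, then insert a new valid row."""
    new_id = str(uuid.uuid4()).upper()
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE item SET valid = 'F' WHERE logical_id = ? AND valid = 'T'",
            (logical_id,),
        )
        conn.execute(
            "INSERT INTO item (id, logical_id, valid) VALUES (?, ?, 'T')",
            (new_id, logical_id),
        )
    return new_id

first = add_version("ID_NAME-000001")
second = add_version("ID_NAME-000001")

versions = dict(
    conn.execute(
        "SELECT id, valid FROM item WHERE logical_id = ?", ("ID_NAME-000001",)
    ).fetchall()
)
```

Because other tables would reference the surrogate id, the old row keeps its references intact while the new row carries the current state.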
MySQL doesn't support sequences (IMO auto_increment isn't comparable to a sequence), whereas Oracle and PostgreSQL do. I guess that's why it's difficult to port the solution from Oracle to MySQL.