Issues with Slowly Changing Dimension Table with Primary Key

I have a package designed by another developer who used to work for the company. The package takes data from the source and inserts it into the destination. The Slowly Changing Dimension task has 4 columns set as historical attributes, meaning it inserts a new row when any of their values changes. The business key is called PropertyID.
In the destination table, PropertyID is the primary key. When the package runs, we get a primary key violation error. That is understandable, because the destination table cannot accept a duplicate PropertyID when a historical attribute changes. It may not be the best design.
I want to correct this, but I am not sure of the right approach. I tried adding a new INT IDENTITY column to the destination table (to use as the business key in the SCD wizard) and removing the primary key from the current PropertyID column. But the INT IDENTITY column does not show up in the SCD wizard.
If someone can show me the right approach, I would be very grateful.
Thanks.

In a slowly changing dimension, the destination table has two types of keys: the surrogate key, which ties out to the fact table, and the business key, which identifies the record from the source.
You do not want the business key as the primary key on the destination in a slowly changing dimension. That is the point of the SCD: you will have multiple rows per business key, since you are tracking changes. If you do not want this, and your table is all Type 1 changes (overwrites with the current value), then the SCD transform is not what you want.
See this link ... https://en.wikipedia.org/wiki/Surrogate_key

It looks like you are trying to design a Type 2 SCD, since changed records are being inserted. In that case there should be a date field that tracks when a particular record changed and identifies the current record. The primary key in the destination table should then be a composite of property_id and that date field. See the link below for how a Type 2 SCD is designed.
http://datawarehouse4u.info/SCD-Slowly-Changing-Dimensions.html
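The table layout these answers describe can be sketched roughly as follows. This is not what the SSIS SCD wizard generates; it is a hand-written illustration using Python's built-in sqlite3 module as a stand-in for the real destination database. The names DimProperty, PropertySK, Owner, ValidFrom and ValidTo are assumptions made for the example; only PropertyID comes from the question.

```python
import sqlite3

def build_dimension(conn):
    # PropertySK is a surrogate key (hypothetical name); PropertyID stays
    # in the table as the business key but is no longer the primary key.
    conn.execute("""
        CREATE TABLE DimProperty (
            PropertySK INTEGER PRIMARY KEY AUTOINCREMENT,
            PropertyID INTEGER NOT NULL,
            Owner      TEXT NOT NULL,      -- hypothetical historical attribute
            ValidFrom  TEXT NOT NULL,
            ValidTo    TEXT,               -- NULL marks the current version
            UNIQUE (PropertyID, ValidFrom)
        )
    """)

def apply_change(conn, property_id, owner, change_date):
    """Close the current version (if the attribute changed) and add a new one."""
    current = conn.execute(
        "SELECT PropertySK, Owner FROM DimProperty "
        "WHERE PropertyID = ? AND ValidTo IS NULL",
        (property_id,),
    ).fetchone()
    if current is not None:
        if current[1] == owner:
            return  # attribute unchanged, keep the current version
        conn.execute(
            "UPDATE DimProperty SET ValidTo = ? WHERE PropertySK = ?",
            (change_date, current[0]),
        )
    conn.execute(
        "INSERT INTO DimProperty (PropertyID, Owner, ValidFrom) VALUES (?, ?, ?)",
        (property_id, owner, change_date),
    )

conn = sqlite3.connect(":memory:")
build_dimension(conn)
apply_change(conn, 100, "Alice", "2020-01-01")
apply_change(conn, 100, "Bob", "2021-06-15")   # same PropertyID, new row

versions = conn.execute(
    "SELECT PropertySK, PropertyID, Owner, ValidFrom, ValidTo "
    "FROM DimProperty ORDER BY PropertySK"
).fetchall()
```

Both rows share PropertyID 100 without any key violation, because uniqueness is enforced on the surrogate key and on the (PropertyID, ValidFrom) pair instead.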

Related

Auto-increment a primary key in MySql

When creating tables with MySQL in phpMyAdmin, I always run into an issue with primary keys and their auto-increments. When I insert rows into my table, AUTO_INCREMENT works perfectly, adding 1 to the primary key for each new row. But when I delete a row, for example the row where the primary key is 'id = 4', and then add a new row to the table, the primary key of the new row gets the value 'id = 5' instead of 'id = 4'. It acts as if the old row was never deleted.
Here is an example of the SQL statement:
CREATE TABLE employe(
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(30) NOT NULL
)
ENGINE = INNODB;
How do I solve this problem?
Thank you.

I'm pretty sure this is by design. If you had IDs up to 6 in your table and you deleted ID 2, would you want the next input to be an ID of 2? That doesn't seem to follow the ACID properties. Also, if there was a dependence on that data, for example, if it was user data, and the ID determined user IDs, it would invalidate pre-existing information, since if user X was deleted and the same ID was assigned to user Y, that could cause integrity issues in dependent systems.
Also, imagine a table with 50 billion rows. Should the table run an O(n) search for the smallest missing ID every time you're trying to insert a new record? I can see that getting out of hand really quickly.
Some links you might like to read:
Principles of Transaction-Oriented Database Recovery (1983)
How can we re-use the deleted id from any MySQL-DB table?

Why do you care?
Primary keys are internal row identifiers that are not supposed to be sexy or good looking. As long as they are able to identify each row uniquely, they serve their purpose.
Now, if you care about its value, then you probably want to expose the primary key value somewhere, and that's a big red flag. If you need an external, visible identifier, you can create a secondary column with any formatting sequence and values you want.
As a side note, the term AUTO_INCREMENT is a bit misleading. It doesn't mean values increase one by one all the time. It just means the database will try to produce sequential numbers as long as that is possible. In multi-threaded apps that's usually not possible, since batches of numbers are reserved per thread, so the row insertion sequence may end up not following the natural numbering. Row deletions have a similar effect, as do INSERTs that are rolled back.
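The gap left by a deleted row can be reproduced in a few lines. This sketch uses Python's built-in sqlite3 module with SQLite's AUTOINCREMENT keyword as a stand-in for MySQL's AUTO_INCREMENT; the two engines are not identical, so treat it as an illustration of the behaviour in the question rather than a statement about InnoDB internals. The employee names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employe ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL)"
)

# Four inserts receive ids 1 to 4.
for name in ("Ann", "Bob", "Cid", "Dee"):
    conn.execute("INSERT INTO employe (name) VALUES (?)", (name,))

# Delete the row with id 4, then insert another row.
conn.execute("DELETE FROM employe WHERE id = 4")
new_id = conn.execute(
    "INSERT INTO employe (name) VALUES (?)", ("Eve",)
).lastrowid

# The counter keeps moving forward: the new row gets 5, and 4 stays unused.
ids = [row[0] for row in conn.execute("SELECT id FROM employe ORDER BY id")]
```

The remaining ids still identify their rows uniquely, which is all a primary key has to do.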

Primary keys are meant to be used for joining tables together and for indexing; they are not meant for human use. Renumbering primary key columns could orphan data and wreak havoc on your queries.
Tip: add another column to your table and renumber that column as you see fit (show that column to your users instead of the primary key).

How to track changes to a boolean column in a MySQL database?

My application serves customers which are online stores. One of the tables in my DB is "Product" and it has a column "In_Stock". This is a boolean (bit(1)) column. My customers send data feeds of their product catalog and each customer has their own version of this table. I would like to track changes to this In Stock column, something to the effect of...
11/13/2016 true
12/26/2016 false
01/07/2017 true
Just so that when I do some auditing, I can see for a given time period what was the state of a given product.
How best can I do this?
It seems overkill to create a separate history table and have it updated by a trigger just for one boolean column. Would a history column suffice? I can save the data there in some kind of JSON string.

Sorry, any workable solution will require a second table.
One such solution is Version Normal Form (vnf) which is a special case of 2nf. Consider your table containing the boolean field (assuming it is properly normalized to at least 3nf). Now you want to track the changes made to the boolean field. One way is to turn the rows into versions by adding an EffectiveDate column then, instead of updating the row, write a new row with the current date in the date field (or updating if the boolean field is unchanged).
This allows the field to be tracked, with a new version written every time the field changes. But there are severe disadvantages, not least of which is the fact that a row is no longer an entity, but a version of an entity. This makes it impossible to use a foreign key to this table, as foreign keys need to refer to an entity.
But look carefully at the design. Before the change, you had a good, normalized table with no tracking of changes. After adding the EffectiveDate column, there has been a subtle change. All the fields except the boolean field are, as before, dependent only on the PK. The boolean field is dependent not only on the PK but on the new date field as well. The table is no longer in 2nf.
Normalizing the table requires moving the boolean field and the date field to a new table:
create table NewTable(
EntityID int not null references OriginalTable( ID ),
EffDate date not null,
TrackedCol boolean,
constraint PK_NewTable primary key( EntityID, EffDate )
);
The first version is inserted when a new row is inserted into the original table. From then on, another version is added only when an update to the original table changes the value of the boolean field.
Here is a previous answer that includes the query to get the current and any past values of the versioned data. I've discussed this design many times here.
Also, there is a way to structure the design so the application code doesn't need to be changed. That is, the redesign will be completely transparent to existing code. The answer linked above contains another link to more documentation to show how that is done.
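A minimal sketch of how the versioned table could be queried, using Python's built-in sqlite3 module in place of MySQL. This is not the query from the linked answer, which is not reproduced here. The table names Product and ProductStock, the InStock column, and the helper in_stock_as_of are assumptions for illustration; the sample dates are the ones from the question, rewritten in ISO format so that they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    ID   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
-- One row per change of the tracked boolean, keyed by entity and date.
CREATE TABLE ProductStock (
    EntityID INTEGER NOT NULL REFERENCES Product(ID),
    EffDate  TEXT NOT NULL,
    InStock  INTEGER NOT NULL,
    PRIMARY KEY (EntityID, EffDate)
);
INSERT INTO Product VALUES (1, 'Widget');
INSERT INTO ProductStock VALUES
    (1, '2016-11-13', 1),
    (1, '2016-12-26', 0),
    (1, '2017-01-07', 1);
""")

def in_stock_as_of(conn, product_id, date):
    """Return the tracked value in effect on `date`, or None if no version exists yet."""
    row = conn.execute(
        "SELECT InStock FROM ProductStock "
        "WHERE EntityID = ? AND EffDate <= ? "
        "ORDER BY EffDate DESC LIMIT 1",
        (product_id, date),
    ).fetchone()
    return None if row is None else bool(row[0])

mid_december = in_stock_as_of(conn, 1, "2016-12-01")   # version from 2016-11-13 applies
new_year = in_stock_as_of(conn, 1, "2016-12-31")       # version from 2016-12-26 applies
```

The current value is the same query with today's date, so no separate "current" flag is needed in this layout.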

I would go with the trigger. But don't replicate the whole row: log only the row's unique ID, a timestamp and the boolean value.
Sometimes having good logs is priceless :)
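The trigger idea could look something like this. The sketch runs on Python's built-in sqlite3 module, so the trigger uses SQLite syntax; MySQL's CREATE TRIGGER syntax differs and would need adapting. The names In_Stock_Log and trg_in_stock_change are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    id       INTEGER PRIMARY KEY,
    In_Stock INTEGER NOT NULL
);
-- The log keeps only the row id, a timestamp and the new boolean value.
CREATE TABLE In_Stock_Log (
    product_id INTEGER NOT NULL REFERENCES Product(id),
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    In_Stock   INTEGER NOT NULL
);
-- Fire only when the value actually changes.
CREATE TRIGGER trg_in_stock_change
AFTER UPDATE OF In_Stock ON Product
WHEN OLD.In_Stock <> NEW.In_Stock
BEGIN
    INSERT INTO In_Stock_Log (product_id, In_Stock)
    VALUES (NEW.id, NEW.In_Stock);
END;

INSERT INTO Product VALUES (1, 1);
UPDATE Product SET In_Stock = 0 WHERE id = 1;  -- logged
UPDATE Product SET In_Stock = 0 WHERE id = 1;  -- same value, not logged
UPDATE Product SET In_Stock = 1 WHERE id = 1;  -- logged
""")

log = conn.execute(
    "SELECT product_id, In_Stock FROM In_Stock_Log ORDER BY rowid"
).fetchall()
```

Only the two real changes end up in the log, so the table stays small even when feeds rewrite the same value repeatedly.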

I've written an audit trail module for this purpose. It basically duplicates the table, adds some information to each row, and leaves the original data table untouched except for the triggers.

Create MySQL table without Primary Key

I have a table that does not require a primary key. It consists of 4 columns which are email,date,time,message. Each time a user logs in, logs out, or does any particular action that is important, I log the email, date, time and action (message). Currently the table is setup with email as the Primary Key but I am unable to insert more than one record with the same email. I suppose I could use time as the PK but there is the possibility that two actions fall on the same time. How can I just use the table without a PK? I have tried turning it off for the email column but it does not allow me to.

Yes. Since you have defined the email field as your primary key, it can only hold unique values; no duplicates are allowed.
So you have two options:
1: Remove email field as a primary key
2: Add new integer field as a Primary key with auto increment (I would prefer this one)
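Option 2 might look like the sketch below, written against Python's built-in sqlite3 module rather than MySQL. The table name activity_log, the index name and the sample rows are assumptions; the four data columns are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE activity_log (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        email   TEXT NOT NULL,
        date    TEXT NOT NULL,
        time    TEXT NOT NULL,
        message TEXT NOT NULL
    )
""")
# A plain (non-unique) index keeps lookups by email fast.
conn.execute("CREATE INDEX idx_activity_email ON activity_log (email)")

events = [
    ("a@example.com", "2020-01-01", "09:00:00", "login"),
    ("a@example.com", "2020-01-01", "09:00:00", "viewed page"),  # same second
    ("a@example.com", "2020-01-01", "17:30:00", "logout"),
]
conn.executemany(
    "INSERT INTO activity_log (email, date, time, message) VALUES (?, ?, ?, ?)",
    events,
)

count = conn.execute(
    "SELECT COUNT(*) FROM activity_log WHERE email = ?", ("a@example.com",)
).fetchone()[0]
```

All three rows are accepted, including the two that share an email and a timestamp, because uniqueness now rests on the id column alone.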

You could use a natural primary key that would be a combination of Email + Date + Time + Action. That combination would be unique. It is impossible for the same user to do 2 different actions at the same time. That will help you to keep integrity of your information.
Hope this helps you.

To decide on a table's primary key, one may start by considering these points (applicable to InnoDB):
How the data is going to be accessed after it is written (if you don't query it, why store it?). If you care about read performance you should query your data by the primary key, since for InnoDB the primary key is the only possible clustered index.
The data is stored ordered by the primary key, so if you care about write performance you should ideally write data in primary key order, which happens automatically if you have an auto_increment column. Also, tables for which you don't explicitly specify a primary key are going to have a hidden auto_increment field which you won't be able to access, i.e. you get less for the same cost.

Do I need a primary key for every table in MS Access?

I am new to MSAccess so I'm not sure about this; do I have to have a primary key for every single table in my database? I have one table which looks something like this:
(http://i108.photobucket.com/albums/n32/lurker3345/ACCESSHELP.png?t=1382688844)
In this case, every field/column has a repeating term. I have tried assigning the primary key to every field but it returns with an error saying that there is a repeated field.
How do I go about this?

Strictly speaking, Yes, every row in a relational database should have a Primary Key (a unique identifier). If doing quick-and-dirty work, you may be able to get away without one.
Internal Tracking ID
Some databases generate a primary key under the covers if you do not assign one explicitly. Every database needs some way to internally track each row.
Natural Key
A natural key is an existing field with meaningful data that happens to identify each row uniquely. For example, if you were tracking people assigned to teams, you might have an "employee_id" column on the "person" table.
Surrogate Key
A surrogate key is an extra column you add to a table, just to assign an arbitrary value as the unique identifier. You might assign a serial number (1, 2, 3, …), or a UUID if your database (such as Postgres) supports that data type. Assigning a serial number or UUID is so common that nearly every database engine provides a built-in facility to help you automatically create such a value and assign to new rows.
My Advice
In my experience, any serious long-term project should always use a surrogate key, because every natural key I've ever been tempted to use eventually changes. People change their names (get married, etc.). Employee IDs change when a company gets acquired by another.
If, on the other hand, you are doing a quick-and-dirty job, such as analyzing a single batch of data to produce a chart once and never again, and your data happens to have a natural key then use it. Beware: One-time jobs often have a way of becoming recurring jobs.
Further advice… When importing data from a source outside your control, assign your own identifier even if the import contains a candidate key.
Composite Key
Some database engines offer a composite key feature, also called a compound key, where two or more columns in the table are combined to create a single value that should prove unique once combined. For example, in a "person" table, the "first_name", "last_name" and "phone_number" fields might be unique when considered together. That holds unless two people are married and share the same home phone number while both happening to be named "Alex" with a shared last name! Because of such collisions, as well as the tendency for meaningful data to change and the overhead of calculating such combined values, it is advisable to stick with simple (single-column) keys unless you have a special situation.
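The composite key behaviour, including the "two people named Alex" collision, can be tried out with a short sketch. It uses Python's built-in sqlite3 module rather than MS Access, and the helper try_insert and the sample values are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        first_name   TEXT NOT NULL,
        last_name    TEXT NOT NULL,
        phone_number TEXT NOT NULL,
        PRIMARY KEY (first_name, last_name, phone_number)  -- composite key
    )
""")

def try_insert(conn, first, last, phone):
    """Return True if the row was accepted, False on a key violation."""
    try:
        conn.execute("INSERT INTO person VALUES (?, ?, ?)", (first, last, phone))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(conn, "Alex", "Smith", "555-0100"),  # first Alex Smith
    try_insert(conn, "Alex", "Smith", "555-0199"),  # same name, other phone: accepted
    try_insert(conn, "Alex", "Smith", "555-0100"),  # second Alex at the same number: rejected
]
```

Each column may repeat on its own; only the full combination must be unique, which is exactly why the married-couple case is refused.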

If the data doesn't naturally have a unique field to use as the primary key, add an auto-generated integer column called "Id" or similar.
Read the "how to organize my data" section of this page:
http://www.htmlgoodies.com/primers/database/article.php/3478051
This page shows you how to create one (under "add an autonumber primary key"):
http://office.microsoft.com/en-gb/access-help/create-or-remove-a-primary-key-HA010014099.aspx

If you use a DataAdapter and a CurrencyManager, your tables must have a primary key in order to push updates, additions and deletions back to the database. Otherwise, the changes will not register and you will receive an error.
I lost one week figuring that one out until I added this to the Try-Catch-End Try block: MsgBox(er.ToString) which mentioned "key". From there, I figured it out.
(NB : Having a primary key was not a requisite in VB6)

Not having a primary key usually means your data is poorly structured. However, it looks like you're dealing with summary/aggregate data there, so it probably doesn't matter.

Dealing with a table whose composite primary key should include a nullable column

Assume a table that may look like this:
userId INT (foreign key to a users table)
profileId INT (foreign key to a profiles table)
value INT
Say that in this table preferences for users are saved. The preference should be loaded according to the current user and the profile that the current user has selected. That means that the combination of userId and profileId is unique and can be used as a composite primary key.
But then I want to add the ability to also save a default value that should be used if no value for a specific profileId is saved in the database. My first idea would be to make the profileId column nullable and say that the row with null as profileId contains the default value. But then I can't use a composite primary key on this table, because nullable columns can't be part of a primary key.
So what's the "best" way to work around this? Just drop the primary key completely and go without primary key? Generate an identity column as primary key that I never need? Create a dummy profile to link to in the profile table? Create a separate table for default values (which is the only option that guarantees that no userId has multiple default values??)?
Update: I thought about Dmitry's answer but after all it has the drawback that I can't even create a unique constraint on the two columns userId and profileId (MySQL will allow duplicate values if profileId is null and DB2 will refuse to even create a unique constraint on a nullable column). So with Dmitry's solution I will have to live without this consistency check of the DB. Is that acceptable? Or is that not acceptable (after all consistency checks are a major feature of relational DBs). What is your reasoning?

Create ID autoincrement field for your primary key.
AND
Create unique index for (userId, profileId) pair. If necessary create dummy profile instead of null.
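The difference between a NULL profileId and a dummy profile can be seen in a small experiment. It runs on Python's built-in sqlite3 module, which, like the MySQL behaviour described in the question, treats NULLs as distinct inside a unique index; other engines may behave differently. The table name preference, the index name, and the dummy profile id 0 are assumptions.

```python
import sqlite3

DEFAULT_PROFILE = 0  # hypothetical id of the dummy "default" profile

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE preference (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate primary key
    userId    INTEGER NOT NULL,
    profileId INTEGER,          -- left nullable only to show the problem
    value     INTEGER
);
CREATE UNIQUE INDEX ux_user_profile ON preference (userId, profileId);
""")

def try_insert(conn, user_id, profile_id, value):
    """Return True if the row was accepted, False on a uniqueness violation."""
    try:
        conn.execute(
            "INSERT INTO preference (userId, profileId, value) VALUES (?, ?, ?)",
            (user_id, profile_id, value),
        )
        return True
    except sqlite3.IntegrityError:
        return False

# Two "default" rows with NULL profileId both slip past the unique index.
null_results = [try_insert(conn, 1, None, 10), try_insert(conn, 1, None, 20)]

# With a dummy profile id the second default row is rejected.
dummy_results = [
    try_insert(conn, 1, DEFAULT_PROFILE, 10),
    try_insert(conn, 1, DEFAULT_PROFILE, 20),
]
```

In a real schema the profileId column would then be declared NOT NULL, so the unique index covers every row.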

Dmitry's answer is a good one, but since your case involves what is essentially an intersection table, there is another good way to solve this. For your situation I also like the idea of creating a default user profile that you can use in your code to establish default settings. This is good because it keeps your data model clean without introducing extra candidate keys. You would need to be clear in this dummy/default profile that this is what it is. You can give it a clear name like "Default User" and make sure that nobody but the administrator has access to the user credentials.
One other advantage of this solution is that you can sign on as the default user and use your system's GUI to modify the defaults rather than having to fiddle with the data through DB access tools. Depending on the policies in your shop, direct access to the data tables by programmers may be hard or impossible. Using the tested/approved GUIs for modifying defaults removes a lot of red tape and prevents some kinds of accidental damage to the data.
Bottom Line: Primary keys are important. In a transactional system every table should have at least one unique index, one of which should be the primary key. You can always enforce this by adding a surrogate (auto increment) key to every table. Even if you do, you still generally want a natural unique index whenever possible. This is how you will generally find what you're looking for in a table.
Creating a Default User entry in your user table isn't a cheat or a hack. It uses your table structure the way it's meant to be used, and it allows you to put a usable unique constraint on the combination of user ID and profile ID, regardless of whether you invent an additional, arbitrary unique constraint with a surrogate key.
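Once the default lives in an ordinary row, loading a preference with a fallback is a single query. The sketch below uses Python's built-in sqlite3 module and models the dummy/default entry as a profile with id 0; that choice, the table name preference, and the helper load_preference are assumptions for illustration, not something the answer specifies.

```python
import sqlite3

DEFAULT_PROFILE = 0  # hypothetical id of the dummy/default entry

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE preference (
    userId    INTEGER NOT NULL,
    profileId INTEGER NOT NULL,
    value     INTEGER,
    PRIMARY KEY (userId, profileId)   -- composite key works again, no NULLs
);
INSERT INTO preference VALUES (1, 0, 50);   -- user 1, default value
INSERT INTO preference VALUES (1, 7, 80);   -- user 1, profile 7
""")

def load_preference(conn, user_id, profile_id):
    """Return the value for the profile, else the user's default, else None."""
    row = conn.execute(
        "SELECT value FROM preference "
        "WHERE userId = ? AND profileId IN (?, ?) "
        "ORDER BY profileId = ? DESC "   # an exact profile match sorts first
        "LIMIT 1",
        (user_id, profile_id, DEFAULT_PROFILE, profile_id),
    ).fetchone()
    return None if row is None else row[0]

specific = load_preference(conn, 1, 7)   # row for profile 7 exists
fallback = load_preference(conn, 1, 9)   # no row for profile 9, default is used
```

Because the default is a normal row, the database itself guarantees that a user has at most one default value.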

This is the normal behaviour of a UNIQUE constraint on a nullable column. It allows one row of data with a NULL value. However, that is not the behaviour we want for this column. We want the column to accept unique values and also accept multiple NULL values.
This can be achieved by using a computed column and adding the constraint to the computed column instead of to the column that defaults to NULL.
The article below covers this in more detail:
UNIQUE Column with multiple NULL values

I always always always use a primary auto_increment key on a table, even if it's redundant; it just gives me a fantastically simple way to identify a record I want to access later or refer to elsewhere. I know it doesn't directly answer your question, but it does make the primary key situation simpler.
create table UserProfile (
UserProfileID int auto_increment primary key, -- etc.
UserID int not null,
ProfileID int
);
Then create a secondary index UserProfileIDX(UserID, ProfileID) that's unique, but not the primary key.
