What is Surrogate key ? Where we use it
explain with examples???
Surrogate key is a substitution for the natural primary key.
It is just a unique identifier or number for each row that can be used
for the primary key to the table. The only requirement for a surrogate primary
key is that it is unique for each row in the table.
Data warehouses typically use a surrogate, (also known as artificial or
identity key), key for the dimension tables primary keys. They can use Infa
sequence generator, or Oracle sequence, or SQL Server Identity values for the
surrogate key.
It is useful because the natural primary key (i.e. Customer Number in
Customer table) can change and this makes updates more difficult.
Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated
as the primary keys (according to the business users) but ,not only can these
change, indexing on a numerical value is probably better and you could consider
creating a surrogate key called, say, AIRPORT_ID. This would be internal to the
system and as far as the client is concerned you may display only the
AIRPORT_NAME.
Another benefit you can get from surrogate keys (SID) is :
Tracking the SCD - Slowly Changing Dimension.
Example:
On the 1st of January
2002 , Employee 'E1' belongs to Business Unit 'BU1' (that's what
would be in your Employee Dimension). This employee has a turnover allocated to
him on the Business Unit 'BU1' But on the 2nd of June the Employee 'E1' is
muted from Business Unit 'BU1' to Business Unit 'BU2.' All the new turnover
have to belong to the new Business Unit 'BU2' but the old one should Belong to
the Business Unit 'BU1.'
If you used the natural business key 'E1' for your employee within your
datawarehouse everything would be allocated to Business Unit 'BU2' even what
actually belongs to 'BU1.'
If you use surrogate keys, you could create on the 2nd of June a new
record for the Employee 'E1' in your Employee Dimension with a new surrogate
key.
This way, in your fact table, you have your old data (before 2nd of June)
with the SID of the Employee 'E1' + 'BU1.' All new data (after 2nd of June)
would take the SID of the employee 'E1' + 'BU2.'
You could consider Slowly Changing Dimension as an enlargement of your
natural key: natural key of the Employee was Employee Code 'E1' but for you it
becomes
Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2.' But the
difference with the natural key enlargement process, is that you might not have
all part of your new key within your fact table, so you might not be able to do
the join on the new enlarge key -> so you need another id.
No comments:
Post a Comment