what will be benefit of surrogate key in data warehouse layer?

Hello Experts,

OLTP source tables are having surrogate keys (numeric values) and natural keys (alphanumeric values), then can I skip creating surrogate keys in target OLAP DB (Dimensional Model) for dimension tables.

I know that I will need surrogate keys for fact tables as unique key for fact table will be a large set and I will need a single columns with numeric values as primary key there.

I am joining multiple source tables for populating data into one dimension target then I am wondering to use unique id (numeric values) of driving table (this id is inherited from OLTP source) as primary key , provided that data granularity of resulted record is at driving tables' id level (resulted record is the record after main source driving table is joined with other source tables).

what will be benefit of surrogate key in data warehouse layer?

Thanks,

Rajneesh

rajneesh4u

posted on 30 Apr 20

Enjoy great content like this and a lot more !

Signup for a free account to write a post / comment / upvote posts. Its simple and takes less than 5 seconds




nVector02-May-20

With Modern cloud data warehouse platforms having natural keys are as good as surrogate keys. So if you are designing a system from scratch, stay away from generating surrogate keys. Use Natural keys where ever possible

rajneesh4u03-May-20

Hello Npack,

Thanks for your input, it helps !!!

May I know a bit more-

 1) Why in Modern cloud data warehouse platforms having natural keys are as good as surrogate keys?

2) Surrogate key will allow me to join tables using numeric keys (than alphanumeric natural keys) , hence better performance?

3) For Type-2 SCD , I must have surrogate key to uniquely identify historical records?

Thanks,

Rajneesh

nVector03-May-20

1) Legacy datawarehouse systems relied heavily on numeric keys, as they had better performance over varchar columns. Numeric keys takes lesser memory and therefore easier and faster to traverse through. All Modern Data warehouses today use a variety of data pruning mechanisms to quickly retrieve the data, They do have clever metadata layers which is widely used to prune the partitions. Using Numeric keys is a thing of the past, Its painful to generate and maintain these keys, Its of no business value to the end users and often confuses them

2) I think i answered this one too as part of the above reply

3) Instead of manufacturing a numeric key, you can always concat the natural key values. like accountnumber|eff_dt