Friday, July 27, 2012

Database: Unique Identifier or Identity

In most of the RDBMS, DBA or DB developer has always considering whether he/she should use Unique Identifier (GUID) or Identity (Auto-Increment, sequence etc).


To be frank, many experienced DBAs have already known the answer long long time ago. Perhaps, some of them might have already posted this topic somewhere else on the Internet.


So, what do experts recommend? IDENTITY, if possible.


Why?
1. IDENTITY column is always unique and sorted in a sequence that we want (ascending or descending)
2. With these unique characteristics, it saves a lot of resources when INSERTING data into the table
    i.   new data always goes to the bottom of the BTree/Heap
    ii.  database engine itself does not need to adjust the BTree when new data come in
    iii. it is much easier to partition the data (well, it depends on your data partitioning strategy)
    iv. if this column is a PK or clustered index, it will impact the storage requirement of your non-clustered indexes. (For MSSQL, Unique Identifier is 16 bytes while BIGINT is 8 bytes) As you know, each non-clustered index will contain your clustered index key.
3. You hardly perform UPDATE on IDENTITY column (and it is not a good practice to update IDENTITY column)


However, there are some challenges that we might face when using IDENTITY
1. Limit of BIGINT is 9,223,372,036,854,775,807
    - in BIG DATA era, it becomes possible to hit the limit
2. Data migration between databases is not straight forward
    - we have to put extra effort to ensure 100% data integrity