Wednesday, June 12, 2019

Initial Data Analysis (IDA)

Introduction
  • IDA aims to inspect the data, and correct it where necessary, before the main data analysis stage.
  • Stages:
    1. Data quality check
    2. Data transformation
    3. Data randomization
    4. Data characteristics documentation

Data Quality Check
  • Assessment types:
    1. Frequency counts
    2. Descriptive statistics (mean, standard deviation, median)
    3. Normality (skewness, kurtosis, frequency histograms)
  • Types of data issues:
    1. Duplicate records
    2. Inconsistent date and time stamps
    3. Outliers
    4. Missing values
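
The checks listed above can be sketched with Python's standard library alone. This is a minimal illustration, not a definitive procedure: the `records` data, the field names, the helper names `describe` and `quality_issues`, and the 1.5 * IQR outlier rule are all assumptions made for the example. Timestamp consistency is not covered here.

```python
import statistics
from collections import Counter


def describe(values):
    """Descriptive statistics plus moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population (biased) central moments; no small-sample correction (assumption).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "n": n,
        "mean": mean,
        "stdev": statistics.stdev(values),  # sample standard deviation
        "median": statistics.median(values),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


def quality_issues(records, key):
    """Count duplicates and missing values, and list IQR outliers for one field."""
    # Two records are duplicates when all of their fields are equal.
    seen = Counter(tuple(sorted(r.items())) for r in records)
    duplicates = sum(c - 1 for c in seen.values())
    present = [r[key] for r in records if r.get(key) is not None]
    missing = len(records) - len(present)
    # Outlier criterion (assumption): more than 1.5 * IQR beyond the quartiles.
    q1, _, q3 = statistics.quantiles(present, n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    outliers = [x for x in present if x < low or x > high]
    return {"duplicates": duplicates, "missing": missing, "outliers": outliers}


# Hypothetical records, invented for illustration only.
records = [
    {"id": 1, "age": 34, "group": "a"},
    {"id": 2, "age": 29, "group": "b"},
    {"id": 2, "age": 29, "group": "b"},    # duplicate record
    {"id": 3, "age": None, "group": "a"},  # missing value
    {"id": 4, "age": 31, "group": "a"},
    {"id": 5, "age": 120, "group": "b"},   # suspicious value
    {"id": 6, "age": 27, "group": "a"},
    {"id": 7, "age": 33, "group": "b"},
]

group_counts = Counter(r["group"] for r in records)  # frequency counts
issues = quality_issues(records, "age")
stats = describe([r["age"] for r in records if r["age"] is not None])

print(group_counts)
print(issues)
print(stats)
```

Skewness and excess kurtosis near zero are consistent with a roughly normal distribution. A frequency histogram remains a useful visual complement to these two numbers.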

Data Transformation
  • Transformation types:
    1. Square root transformation
    2. Log-transformation
    3. Inverse transformation
    4. Conversion to a categorical variable
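
The four transformations above might look as follows in plain Python. This is only a sketch: the function names, the sample `values`, the cut points, and the labels are hypothetical. The first three are typically applied to positive, right-skewed data, and each has a domain restriction noted in the comments.

```python
import bisect
import math


def sqrt_transform(values):
    return [math.sqrt(x) for x in values]  # requires x >= 0


def log_transform(values):
    return [math.log(x) for x in values]   # natural log, requires x > 0


def inverse_transform(values):
    return [1.0 / x for x in values]       # requires x != 0


def make_categorical(values, cut_points, labels):
    """Bin each value: labels[i] covers [cut_points[i-1], cut_points[i])."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    # cut_points must be sorted in ascending order for bisect to work.
    return [labels[bisect.bisect_right(cut_points, x)] for x in values]


# Hypothetical right-skewed sample, invented for illustration.
values = [1.0, 4.0, 9.0, 25.0, 400.0]

print(sqrt_transform(values))
print(log_transform(values))
print(inverse_transform(values))
print(make_categorical(values, [5, 50], ["low", "mid", "high"]))
```

Note that the inverse transformation reverses the ordering of positive values, which matters when interpreting results afterwards.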

Data Randomization
  • Randomize the data and check that the resulting sample agrees with the original intentions
  • Methods:
    1. Generate random permutation of the data
    2. Select random sample of the data
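
Both methods can be sketched with the standard `random` module. The data, the seed, and the sample size below are assumptions chosen for the example. A fixed seed is used only so that the run is reproducible.

```python
import random

rng = random.Random(42)      # fixed seed (assumption) for reproducibility
data = list(range(1, 11))    # hypothetical data set: the integers 1..10

# 1. Random permutation: every element appears exactly once, in random order.
permutation = rng.sample(data, k=len(data))

# 2. Random sample: 4 elements drawn without replacement.
sample = rng.sample(data, k=4)

print(permutation)
print(sample)
```

`rng.sample` leaves `data` untouched and returns a new list. `rng.shuffle(data)` would instead permute the list in place.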

Data Characteristics Documentation
  • Changes (modified/removed/manipulated) to the original data
  • Shape of the distribution of variables
  • Error rates/patterns
  • Criteria to detect abnormality
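
One possible way to keep these four items together is a small record that is filled in during IDA and saved alongside the data. The class name `IdaReport`, its fields, and every value below are hypothetical and serve only to illustrate the idea.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class IdaReport:
    changes: list = field(default_factory=list)               # edits to the original data
    distribution_shape: dict = field(default_factory=dict)    # per-variable shape notes
    error_rates: dict = field(default_factory=dict)           # per-check error rates
    abnormality_criteria: dict = field(default_factory=dict)  # rules used to flag values


# Illustrative entries only; not results from any real data set.
report = IdaReport()
report.changes.append({"action": "removed", "target": "duplicate rows", "count": 1})
report.distribution_shape["age"] = "right-skewed"
report.error_rates["age_missing"] = 1 / 8
report.abnormality_criteria["age"] = "more than 1.5 * IQR beyond the quartiles"

print(json.dumps(asdict(report), indent=2))
```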