Introduction
- Initial data analysis (IDA) aims to inspect and prepare the data before the main data analysis stage.
- Stages:
  - Data quality check
  - Data transformation
  - Data randomization
  - Data characteristics documentation
Data Quality Check
- Assessment types:
  - Frequency counts
  - Descriptive statistics (mean, standard deviation, median)
  - Normality (skewness, kurtosis, frequency histograms)
- Types of data issues:
  - Duplicate records
  - Inconsistent date and time stamps
  - Outliers
  - Missing values
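
The checks above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed procedure: the `records` data, the `quality_report` helper, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
import statistics
from collections import Counter

# Hypothetical (id, age) records; None marks a missing value.
records = [
    (1, 23), (2, 25), (3, None), (4, 24), (2, 25),
    (5, 26), (6, 22), (7, 27), (8, 24), (9, 95),
]


def quality_report(rows):
    """Summarize duplicates, missing values, and basic statistics."""
    row_counts = Counter(rows)
    duplicates = [row for row, n in row_counts.items() if n > 1]
    unique_rows = list(row_counts)  # duplicates dropped, first-seen order kept
    values = [v for _, v in unique_rows if v is not None]
    missing = sum(1 for _, v in unique_rows if v is None)

    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    # Normality indicators: population skewness and excess kurtosis.
    z = [(v - mean) / sd for v in values]
    skewness = statistics.fmean([x ** 3 for x in z])
    excess_kurtosis = statistics.fmean([x ** 4 for x in z]) - 3.0

    # Assumed outlier rule: more than 1.5 * IQR beyond the quartiles.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

    return {
        "frequencies": Counter(values),
        "mean": mean,
        "sd": sd,
        "median": statistics.median(values),
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
        "duplicates": duplicates,
        "missing": missing,
        "outliers": outliers,
    }


report = quality_report(records)
for name, value in report.items():
    print(name, value)
```

Inconsistent date and time stamps are not covered here; checking them depends on the formats that actually occur in the data.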
Data Transformation
- Transformation types:
  - Square root transformation
  - Log transformation
  - Inverse transformation
  - Conversion to a categorical variable
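
As a rough sketch, the four transformations could look like this in plain Python. The function names, the sample `values`, and the cutoffs and labels used for categorization are illustrative assumptions, not part of any particular library.

```python
import bisect
import math


def sqrt_transform(xs):
    """Square root transformation; every value must be >= 0."""
    return [math.sqrt(x) for x in xs]


def log_transform(xs):
    """Natural log transformation; every value must be > 0."""
    return [math.log(x) for x in xs]


def inverse_transform(xs):
    """Inverse (reciprocal) transformation; no value may be 0."""
    return [1.0 / x for x in xs]


def make_categorical(xs, cutoffs, labels):
    """Map each value to a label; cutoffs are sorted lower bounds of the upper bins."""
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    return [labels[bisect.bisect_right(cutoffs, x)] for x in xs]


# Hypothetical right-skewed measurements.
values = [1, 4, 25, 100, 400]

print(sqrt_transform(values))
print(log_transform(values))
print(inverse_transform(values))
print(make_categorical(values, cutoffs=[10, 100], labels=["low", "medium", "high"]))
```

Which transformation fits depends on the shape of the variable, so it is worth re-checking skewness and kurtosis after transforming.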
Data Randomization
- Randomize the data and verify that the sample data agree with the original intentions
- Methods:
  - Generate a random permutation of the data
  - Select a random sample of the data
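
Both methods can be sketched with Python's standard `random` module. The helper names and the fixed seed are assumptions for the example; the seed only makes the run reproducible.

```python
import random


def random_permutation(data, seed=None):
    """Return a shuffled copy of data, leaving the original untouched."""
    rng = random.Random(seed)
    return rng.sample(data, k=len(data))


def random_sample(data, k, seed=None):
    """Return k distinct items drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(data, k=k)


# Hypothetical record identifiers.
data = list(range(1, 11))

permuted = random_permutation(data, seed=42)
subset = random_sample(data, k=3, seed=42)
print(permuted)
print(subset)
```

Recording the seed alongside the result lets the same permutation or sample be regenerated later.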
Data Characteristics Documentation
- Changes to the original data (values modified, removed, or manipulated)
- Shape of the distribution of variables
- Error rates/patterns
- Criteria used to detect abnormality
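
One possible way to keep these four items together is a small structured record that can be saved next to the dataset. The `IDARecord` class, its field names, and every value filled in below are hypothetical and only show the idea.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class IDARecord:
    """Hypothetical container for the documented data characteristics."""
    changes: List[str] = field(default_factory=list)
    distribution_shape: Dict[str, str] = field(default_factory=dict)
    error_rates: Dict[str, float] = field(default_factory=dict)
    abnormality_criteria: Dict[str, str] = field(default_factory=dict)

    def to_json(self):
        """Serialize the record so it can be stored alongside the data."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative entries only; real values come from the earlier IDA stages.
record = IDARecord()
record.changes.append("Removed 1 duplicate record")
record.distribution_shape["age"] = "right-skewed"
record.error_rates["age_missing"] = 0.1
record.abnormality_criteria["age"] = "more than 1.5 * IQR beyond the quartiles"

print(record.to_json())
```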