Sunday, March 29, 2020

TOGAF v9.2

The TOGAF® Standard, Version 9.2 Overview

The TOGAF® Standard, a standard of The Open Group, is a proven Enterprise Architecture methodology and framework used by the world's leading organizations to improve business efficiency. It is the most prominent and reliable Enterprise Architecture standard, ensuring consistent standards, methods, and communication among Enterprise Architecture professionals. Those fluent in the TOGAF approach enjoy greater industry credibility, job effectiveness, and career opportunities. The TOGAF approach helps practitioners avoid being locked into proprietary methods, utilize resources more efficiently and effectively, and realize a greater return on investment.

Structure of TOGAF Standard

  • TOGAF Capability Framework
  • TOGAF ADM and Content Framework
  • TOGAF Enterprise Continuum & Tools
Figure 1: Structure of the TOGAF Standard

Enterprise Architecture - References



Figure 1. The Relationship between TOGAF, IT4IT, ArchiMate, and ITIL.





Wednesday, June 12, 2019

Initial Data Analysis (IDA)

Reference
Introduction
  • IDA aims to inspect and prepare the data before the main data analysis stage.
  • Stages:
    1. Data quality check
    2. Data transformation
    3. Data randomization
    4. Data characteristics documentation

Data Quality Check
  • Assessment types:
    1. Frequency counts
    2. Descriptive statistics (mean, standard deviation, median)
    3. Normality (skewness, kurtosis, frequency histograms)
  • Types of data issues:
    1. Duplicate records
    2. Inconsistent date and time stamps
    3. Outliers
    4. Missing values
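
The checks listed above can be sketched with the standard library alone. This is a minimal illustration, not a complete quality check: the `records` sample is hypothetical, and the outlier rule (more than 3 median absolute deviations from the median) is an assumption chosen for the example.

```python
from collections import Counter
import statistics

# Hypothetical sample of (id, age) records containing one duplicate,
# one missing value and one implausible age.
records = [(1, 34), (2, 29), (2, 29), (3, None), (4, 31), (5, 240)]

# Frequency counts: any record seen more than once is a duplicate.
duplicates = [rec for rec, n in Counter(records).items() if n > 1]

# Missing values: count the records whose age is None.
missing = sum(1 for _, age in records if age is None)

# Descriptive statistics on de-duplicated, non-missing ages.
ages = [age for _, age in dict.fromkeys(records) if age is not None]
mean = statistics.mean(ages)
median = statistics.median(ages)
stdev = statistics.stdev(ages)

# Outliers (assumed rule): farther than 3 median absolute deviations
# from the median.
mad = statistics.median(abs(a - median) for a in ages)
outliers = [a for a in ages if abs(a - median) > 3 * mad]

print(duplicates, missing, mean, median, outliers)
```

The median-based rule is used here because a single extreme value inflates the mean and standard deviation, which can hide the very outlier being looked for.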

Data Transformation
  • Transformation types:
    1. Square root transformation
    2. Log-transformation
    3. Inverse transformation
    4. Make categorical
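
The four transformations above could look like this in plain Python. The `values` list and the cut points used for the categories are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical right-skewed, strictly positive measurements.
values = [1.0, 4.0, 9.0, 100.0]

# 1. Square root transformation
sqrt_t = [math.sqrt(v) for v in values]

# 2. Log-transformation (base 10 here; the base is a free choice)
log_t = [math.log10(v) for v in values]

# 3. Inverse transformation (values must be non-zero)
inv_t = [1.0 / v for v in values]

# 4. Make categorical, using assumed cut points at 5 and 50
def to_category(v):
    if v < 5:
        return "low"
    if v < 50:
        return "medium"
    return "high"

categories = [to_category(v) for v in values]

print(sqrt_t)
print(log_t)
print(categories)
```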

Data Randomization
  • Randomize the data and verify that the resulting sample still agrees with the intentions of the original study design
  • Methods:
    1. Generate random permutation of the data
    2. Select random sample of the data
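
Both methods are available in the standard `random` module. The sketch below uses a hypothetical data set of 20 values and a fixed seed (an assumption made here so the run is repeatable).

```python
import random

data = list(range(1, 21))  # hypothetical data set

# A dedicated generator with a fixed seed makes the run repeatable.
rng = random.Random(42)

# 1. Random permutation: shuffle a copy so the original order is kept.
permutation = data[:]
rng.shuffle(permutation)

# 2. Random sample: 5 distinct elements drawn without replacement.
sample = rng.sample(data, k=5)

print(permutation)
print(sample)
```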

Data Characteristics Documentation
  • Changes (modified/removed/manipulated) to the original data
  • Shape of the distribution of variables
  • Error rates/patterns
  • Criteria to detect abnormality

Wednesday, May 22, 2019

Python Dictionary

Links: Journey to Data Scientist


Characteristics
  • Insertion-ordered since Python 3.7 (keys keep the sequence in which they were added)
  • Mutable
  • Can be nested
  • Access via key

#create an empty dictionary
d = {}
print(type(d))

Output: <class 'dict'>


d = {'key1':'value1',2:'value2',0.01:201}
print(d)
print(d['key1'])
print(d[2])
print(d[0.01])

Output:
{'key1': 'value1', 2: 'value2', 0.01: 201}
value1
value2
201


d[2] = 'update value'
print(d[2])

Output: update value


for k, v in d.items():
  print(k, ":", v)

Output:
key1 : value1
2 : update value
0.01 : 201


if 'key1' in d:
  print(d['key1'])

Output: value1


d['add'] = 1990
print(d)

Output: {'key1': 'value1', 2: 'update value', 0.01: 201, 'add': 1990}


d.pop('add')
print(d)

Output: {'key1': 'value1', 2: 'update value', 0.01: 201}


print(d)
d.clear()
print(d)

Output:
{'key1': 'value1', 2: 'update value', 0.01: 201}
{}
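
The examples above do not show nesting, so here is a small sketch of it. The `profile` data is hypothetical. It also shows `get()` for a key that may be absent, and that a mutable object such as a list cannot be used as a key.

```python
# Hypothetical nested dictionary
profile = {"name": "Ann", "address": {"city": "Oslo", "zip": "0150"}}

# Chain the keys to reach a nested value.
city = profile["address"]["city"]

# get() returns a default instead of raising KeyError for a missing key.
country = profile["address"].get("country", "unknown")

# Keys must be hashable, so a list is rejected as a key.
try:
    bad = {[1, 2]: "list as key"}
except TypeError as err:
    key_error = str(err)

print(city, country)
print(key_error)
```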


Tuesday, May 21, 2019

Python Sets

Links: Journey to Data Scientist


Characteristics
  • Unordered
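  • Elements must be hashable
  • Elements are unique
  • Mutable (the set itself can change, but its elements must be immutable; frozenset is the immutable variant)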
  • Support mathematical operations like union, intersection, difference, and symmetric difference

1. Unordered (in fact, output is ordered by hashed value)
Example #1
s = {"abc",123,9.99,"DDf",3.142,100}
print(s)
print(type(s))

Output:
{3.142, 100, 9.99, 'abc', 'DDf', 123}
<class 'set'>

Example #2
s = {"abc","123","9.99","DDf","3.142","100"}
print(s)
print(type(s))

Output:
{'100', 'abc', 'DDf', '3.142', '9.99', '123'}
<class 'set'>

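Examples #1 and #2 above show the output in a different order from the original definition. This is because Python places each element according to its hash value, and each data type computes its hash differently. String hashes are also randomized per interpreter run by default, so the order of string elements can change between runs.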
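
The element rules can be checked directly. This is a minimal sketch with made-up values: duplicates collapse, an unhashable element such as a list is rejected, and `frozenset` gives an immutable set that can itself be stored inside another set.

```python
# Duplicates collapse to a single element.
s = {1, 2, 2, 3, 3, 3}

# A list is unhashable, so it cannot be a set element.
try:
    invalid = {[1, 2], 3}
except TypeError:
    list_allowed = False
else:
    list_allowed = True

# The set itself is mutable.
s.add(4)

# frozenset is immutable and hashable, so it can be nested in a set.
fs = frozenset(s)
nested = {fs, frozenset({9})}

print(s)
print(list_allowed)
print(fs in nested)
```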


2. Common Operations
s = set() #empty set ({} would create an empty dictionary)
print(s)
s = {"abc",123,9.99,"DDf",3.142,100}
print(s)
print("Length of s:",len(s)) #number of elements in the set
s.add("one item") #add one element
print("Added 'one item':",s)
s.update(("i1","i2","i3")) #add multiple items
print("Updated 'i1, i2, i3':",s)
s.remove(3.142)
print("Removed 3.142:",s)
s.discard(99999) #this will not raise an error
print("Discarded 99999 (no effect):",s)
s.discard("DDf")
print("Discarded DDf:",s)
s.pop() #removes an arbitrary element (100 in this run)
print("Popped an arbitrary item:",s)
s.clear() #make it an empty set
print(s)

s.remove(99999) #this raises KeyError because 99999 does not exist in the set
del s #this unbinds the name 's'; using it afterwards raises NameError

Output:
set()
{3.142, 100, 9.99, 'abc', 'DDf', 123}
Length of s: 6
Added 'one item': {3.142, 100, 'one item', 9.99, 'abc', 'DDf', 123}
Updated 'i1, i2, i3': {3.142, 100, 'one item', 'i2', 9.99, 'abc', 'DDf', 'i1', 123, 'i3'}
Removed 3.142: {100, 'one item', 'i2', 9.99, 'abc', 'DDf', 'i1', 123, 'i3'}
Discarded 99999 (no effect): {100, 'one item', 'i2', 9.99, 'abc', 'DDf', 'i1', 123, 'i3'}
Discarded DDf: {100, 'one item', 'i2', 9.99, 'abc', 'i1', 123, 'i3'}
Popped an arbitrary item: {'one item', 'i2', 9.99, 'abc', 'i1', 123, 'i3'}
set()


3. Mathematical Operations
s1 = {1,3,4,8,9}
s2 = {5,7,6,8,9}

print("Difference:",s1.difference(s2))
print("Intersection:",s1.intersection(s2))
print("IsDisjoint:",s1.isdisjoint(s2))
print("IsSubSet:",s1.issubset(s2))
print("IsSuperSet:",s1.issuperset(s2))
print("Symmetric Difference:",s1.symmetric_difference(s2))
print("Union:",s1.union(s2))

Output:
Difference: {1, 3, 4}
Intersection: {8, 9}
IsDisjoint: False
IsSubSet: False
IsSuperSet: False
Symmetric Difference: {1, 3, 4, 5, 6, 7}
Union: {1, 3, 4, 5, 6, 7, 8, 9}

s1 = {1,3,4,8,9}
s2 = {5,7,6,8,9}

s1.difference_update(s2)
print("After Difference Update:",s1)

Output: After Difference Update: {1, 3, 4}

s1 = {1,3,4,8,9}
s2 = {5,7,6,8,9}

s1.intersection_update(s2)
print("After Intersection Update:",s1)

Output: After Intersection Update: {8, 9}

s1 = {1,3,4,8,9}
s2 = {5,7,6,8,9}

s1.symmetric_difference_update(s2)
print("After Symmetric Difference Update:",s1)

Output: After Symmetric Difference Update: {1, 3, 4, 5, 6, 7}
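
The same operations also have operator forms. The sketch below reuses `s1` and `s2` from above. The variable names for the results are only illustrative.

```python
s1 = {1, 3, 4, 8, 9}
s2 = {5, 7, 6, 8, 9}

difference = s1 - s2       # same as s1.difference(s2)
intersection = s1 & s2     # same as s1.intersection(s2)
symmetric = s1 ^ s2        # same as s1.symmetric_difference(s2)
union = s1 | s2            # same as s1.union(s2)
is_subset = s1 <= s2       # same as s1.issubset(s2)
is_superset = s1 >= s2     # same as s1.issuperset(s2)

# The augmented forms update the left-hand set in place,
# like the *_update() methods.
s3 = set(s1)
s3 -= s2                   # same as s3.difference_update(s2)

print(difference, intersection, symmetric, union)
print(is_subset, is_superset, s3)
```

One difference worth knowing: the methods accept any iterable as the argument, while the operators require both operands to be sets.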

Wednesday, May 15, 2019

Python Tuples

Links: Journey to Data Scientist


Characteristics
  • Ordered
  • Can contain arbitrary objects
  • Accessible via index
  • Can be nested
  • Immutable

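A tuple shares most of its characteristics with a list, except that it is immutable.
Because a tuple never changes size, it needs less memory than a list with the same elements and is typically slightly faster to create and iterate over; the difference becomes more noticeable as the number of elements grows.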
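
A rough way to see one consequence of immutability is to compare the container sizes with `sys.getsizeof`. This is a sketch only: the exact byte counts are CPython implementation details and vary between versions. The second part shows that a tuple of hashable elements can be used as a dictionary key, which a list cannot. The `coordinates` data is hypothetical.

```python
import sys

as_list = [21, 5, 3, 18, 34]
as_tuple = tuple(as_list)

# Size of the container itself (not of the elements it refers to).
list_size = sys.getsizeof(as_list)
tuple_size = sys.getsizeof(as_tuple)
print(list_size, tuple_size)

# A tuple of hashable elements is hashable, so it can be a dict key.
coordinates = {(59.91, 10.75): "Oslo"}
print(coordinates[(59.91, 10.75)])
```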

Immutable
t = ("apple","moon","3",57,9.99,0x3e)
print(t)
print(t[1])
print(t[3:6])
print(t[::4])

Output:
('apple', 'moon', '3', 57, 9.99, 62)
moon
(57, 9.99, 62)
('apple', 9.99)

If we try to modify the value:
t[0] = "orange"
Output:
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-9-9456d7af7fd3> in <module>
----> 1 t[0] = "orange"

TypeError: 'tuple' object does not support item assignment

Unpacking Tuples
t = ("apple","moon","3",57,9.99,0x3e)
(u1,u2,u3,u4,u5,u6) = t
print(u1,u2,u3,u4,u5,u6)

Output: apple moon 3 57 9.99 62
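
Unpacking does not need one name per element. A starred name collects the remaining elements into a list, and the same mechanism allows swapping two variables without a temporary. A short sketch using the same tuple:

```python
t = ("apple", "moon", "3", 57, 9.99, 0x3e)

# The starred name collects everything between first and last as a list.
first, *middle, last = t
print(first, middle, last)

# Swapping two values is tuple packing followed by unpacking.
a, b = 1, 2
a, b = b, a
print(a, b)
```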




Tuesday, May 14, 2019

Journey to Data Scientist

Python

  • Basics
    • List | Tuple | Set | Dictionary
    • Lambda
    • Array
    • Object Oriented Programming (OOP)
    • Iterator
    • JSON
    • Regular Expression
    • File Operations
    • Database Operations
  • Numpy
    • Array
    • Universal Functions
    • Broadcasting
  • Pandas
    • Series
    • DataFrame
    • Pivot Table
  • Visualization
    • matplotlib.pyplot
    • seaborn
    • bokeh

Statistics

  • Initial Data Analysis (IDA)
  • Exploratory Data Analysis (EDA)
  • Confirmatory Data Analysis (CDA)
  • Probabilistic Logic and Statistical Inference
  • Parameter estimation by optimization
  • Dimension Reduction

Machine Learning

  • Supervised
  • Unsupervised
  • Reinforcement

References

Python Lists

Links: Journey to Data Scientist


Characteristics
  • Ordered
  • Can contain arbitrary objects
  • Access via index
  • Can be nested
  • Mutable

1. Ordered
l = [21,5,3,18,34,67,9,12,88,73,65,13]
print(l)

Output: [21, 5, 3, 18, 34, 67, 9, 12, 88, 73, 65, 13]

2. Can contain arbitrary objects
l = [21,0x1e2f,'this is a string',3.142,8.176J,len('99999')]
print(l)

Output: [21, 7727, 'this is a string', 3.142, 8.176j, 5]

3. Access via index
l = [21,5,3,18,34,67,9,12,88,73,65,13]
print(l[0],l[-1])

Output: 21 13

4. Can be nested
l = [21,5,[3,18,34,[67,9,12],88,73],65,13]
print(l)
print(l[2][2])
print(l[2][3][2])

Output:
[21, 5, [3, 18, 34, [67, 9, 12], 88, 73], 65, 13]
34
12

5. Mutable
l = [21,5,3]
print(l)
del l[0]
print("New l[0] after deletion:",l[0])
print(l)
l[1] = 99
print("Updated l[1] to 99:",l)
l += ["new"]
print("Added 'new':",l)
l += "hello" #a string is iterable, so each character is added separately
print(l)
l.append("append")
print("Appended 'append':",l)
print(l + ["another","list"]) #creates a new list; l is unchanged
l.extend("extend")
print("Extended 'extend':",l)
l.insert(2,"insert")
print("Inserted 'insert':",l)
l.remove("l")
print("Removed first occurrence of 'l':",l)
l.pop()
print("Popped last item:",l)
l.pop(-4)
print("Popped 4th item from the end:",l)
print("Location of first 'e':",l.index("e"))
print("Number of 'e':",l.count("e"))
l.clear()
print("Empty the list:",l)

Output:
[21, 5, 3]
New l[0] after deletion: 5
[5, 3]
Updated l[1] to 99: [5, 99]
Added 'new': [5, 99, 'new']
[5, 99, 'new', 'h', 'e', 'l', 'l', 'o']
Appended 'append': [5, 99, 'new', 'h', 'e', 'l', 'l', 'o', 'append']
[5, 99, 'new', 'h', 'e', 'l', 'l', 'o', 'append', 'another', 'list']
Extended 'extend': [5, 99, 'new', 'h', 'e', 'l', 'l', 'o', 'append', 'e', 'x', 't', 'e', 'n', 'd']
Inserted 'insert': [5, 99, 'insert', 'new', 'h', 'e', 'l', 'l', 'o', 'append', 'e', 'x', 't', 'e', 'n', 'd']
Removed first occurrence of 'l': [5, 99, 'insert', 'new', 'h', 'e', 'l', 'o', 'append', 'e', 'x', 't', 'e', 'n', 'd']
Popped last item: [5, 99, 'insert', 'new', 'h', 'e', 'l', 'o', 'append', 'e', 'x', 't', 'e', 'n']
Popped 4th item from the end: [5, 99, 'insert', 'new', 'h', 'e', 'l', 'o', 'append', 'e', 't', 'e', 'n']
Location of first 'e': 5
Number of 'e': 3
Empty the list: []

Slicing
l = [21,5,3,18,34,67,9,12,88,73,65,13]
print(l[3:7])

Output: [18, 34, 67, 9]
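
Slices also accept a step and negative indices. A few more forms on the same list, as a sketch. The variable names are only illustrative.

```python
l = [21, 5, 3, 18, 34, 67, 9, 12, 88, 73, 65, 13]

every_third = l[::3]    # start at index 0, take every 3rd element
last_three = l[-3:]     # negative index counts from the end
reversed_l = l[::-1]    # negative step walks backwards
copy_l = l[:]           # full slice makes a shallow copy

print(every_third)   # [21, 18, 9, 73]
print(last_three)    # [73, 65, 13]
print(reversed_l[0]) # 13
print(copy_l is l)   # False
```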

List Comprehension
l = [x for x in range(10)]
print(l)

Output: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Note:
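A list comprehension comes at a cost: it runs a loop at runtime, so it is slower than writing the literal, although still faster than list(map(lambda ...)) in the timings below.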
  • %timeit l = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    • 63.7 ns ± 1.29 ns per loop
  • %timeit l = [x for x in range(10)]
    • 536 ns ± 11.1 ns per loop
  • %timeit l = list(map(lambda x: x, range(10)))
    • 959 ns ± 12.1 ns per loop
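
`%timeit` is an IPython magic. Outside IPython, a similar comparison can be sketched with the standard `timeit` module. The repetition count of 100,000 is an arbitrary choice, and the absolute numbers will differ from those above depending on machine and Python version.

```python
import timeit

runs = 100_000  # arbitrary number of repetitions per statement

statements = {
    "literal": "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]",
    "comprehension": "[x for x in range(10)]",
    "map + lambda": "list(map(lambda x: x, range(10)))",
}

# timeit.timeit returns the total seconds for all runs,
# so convert to nanoseconds per single execution.
results = {
    name: timeit.timeit(stmt, number=runs) / runs * 1e9
    for name, stmt in statements.items()
}

for name, ns in results.items():
    print(f"{name}: {ns:.1f} ns per loop")
```

When the goal is simply a list of the numbers 0 to 9, `list(range(10))` avoids the per-element Python-level work altogether.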