Course 1: A Crash Course in Data Science

What is statistics good for?

Statistics is about exploratory data analysis, inference, prediction, and experimental design.

Inference

Inference is the process of drawing conclusions about a population from a sample.
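To make the idea concrete, here is a minimal sketch of inference in Python. It is not from the course. The simulated population, the sample size of 200, and the helper name mean_confidence_interval are all assumptions made for illustration. It estimates a population mean from a sample and attaches an approximate 95% confidence interval using the normal approximation.

```python
import math
import random
import statistics


def mean_confidence_interval(sample, z=1.96):
    """Return (mean, lower, upper) for the sample mean.

    Uses a normal approximation; z=1.96 corresponds to roughly 95% coverage.
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, based on the sample standard deviation (n - 1).
    sem = statistics.stdev(sample) / math.sqrt(n)
    return mean, mean - z * sem, mean + z * sem


# Hypothetical population (an assumption for illustration only).
rng = random.Random(0)
population = [rng.gauss(170, 10) for _ in range(100_000)]

# In practice we only observe a sample, not the whole population.
sample = rng.sample(population, 200)

mean, lower, upper = mean_confidence_interval(sample)
print(f"sample mean: {mean:.2f}")
print(f"approx. 95% CI for the population mean: ({lower:.2f}, {upper:.2f})")
```

The interval is the inferential part: it says how far the sample mean could plausibly be from the population mean, given only the sample.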

Two structures of a data science project

Question -> Data -> EDA (exploratory data analysis) -> Formal modeling -> Interpretation -> Communication

Data -> EDA (exploratory data analysis) -> Question -> More Data -> EDA -> Formal modeling -> Interpretation -> Communication

Course 2: Build a Data Science Team

Roles in a Data Science Team

Usually there are 3 roles: Data Engineer, Data Scientist, and Data Manager.

Embedded vs Dedicated Data Science Team

An embedded data scientist sits with a diverse team and has deeper knowledge of the substantive problems. A dedicated team has better communication, support, and empowerment.

Empowerment is a little tricky and sounds political. Why? The instructor explained that data won’t always give people the result they want. For example, a data scientist sits with a marketing team that proposes a new marketing method expected to boost sales. After data analysis, modeling, and interpretation, the data scientist finds that the new method doesn’t work as well as expected. For the marketing team that proposed it, the result is frustrating to accept, and they may simply ignore the data scientist’s conclusion and stick with their gut feeling. So empowerment is necessary to give weight to the data science team’s words and prevent them from being ignored.

There are still 3 courses left: Managing Data Analysis, Data Science in Real Life, and Executive Data Science Capstone. But this course is “executive” and aimed at team leaders as a whole, so I plan to put it off. Next I plan to move on to the course Data Science, also by Johns Hopkins University, which is more practical for me.