# Data Analysis and Interpretation by Wesleyan University

- COURSE 1: Data Management and Visualization
- COURSE 2: Data Analysis Tools
- COURSE 3: Regression Modeling in Practice
- COURSE 4: Machine Learning for Data Analysis
- COURSE 5: Data Analysis and Interpretation Capstone

# COURSE 1: Data Management and Visualization

# COURSE 2: Data Analysis Tools

**Central Limit Theorem**

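With an adequately large sample size, the sampling distribution of a statistic such as the sample mean is approximately normal, whatever the shape of the population distribution. Drawing an adequately large number of such samples makes this visible: the sample means pile up in a bell shape around the population mean.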
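A quick simulation can make this concrete. The sketch below is only an illustration under assumed settings: the exponential population, the sample size of 50, and the 2000 repetitions are choices made here, not values from the course. It draws repeated samples from a strongly skewed population and checks that the sample means behave like a normal distribution.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 50      # observations per sample (assumed "adequately large")
NUM_SAMPLES = 2000    # number of repeated samples (assumed)

# Exponential population with rate 1: strongly right-skewed, mean 1, SD 1.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Share of sample means within one SD of their centre; a normal
# distribution puts roughly 68% of its mass there.
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / NUM_SAMPLES

print(f"mean of sample means: {mean_of_means:.3f}")
print(f"SD of sample means:   {sd_of_means:.3f}")
print(f"share within 1 SD:    {within_one_sd:.3f}")
```

The mean of the sample means should land close to the population mean of 1, and their spread close to 1 / sqrt(50), even though individual observations are far from normal.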

**Hypothesis Testing**

Definition: Assessing the evidence provided by the data, in favor of or against each hypothesis about the population.

Methods:

- ANOVA - Analysis of Variance
- \(\chi^2\) - Chi-Square Test of Independence

Steps:

- Specify the null (\(H_0\)) and the alternate (\(H_a\)) hypotheses
- Choose a sample
- Assess the evidence
- Draw conclusions
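The four steps above can be traced in a small sketch. The counts are hypothetical, invented purely for illustration, and the test is a Chi-Square Test of Independence on a 2x2 table computed by hand with the standard library, without a continuity correction. The `erfc` shortcut for the p-value only holds for one degree of freedom, which is the case for a 2x2 table.

```python
import math

# Step 1: H0: the two categorical variables are independent.
#         Ha: the two categorical variables are associated.

# Step 2: a sample, summarised as a 2x2 table of counts.
# These numbers are hypothetical and exist only for illustration.
observed = [
    [30, 20],   # group A: outcome yes / no
    [15, 35],   # group B: outcome yes / no
]

# Step 3: assess the evidence with the chi-square statistic.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        # Count expected in this cell if the variables were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

# With 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

# Step 4: draw a conclusion at the 0.05 significance level.
ALPHA = 0.05
reject_null = p_value < ALPHA
print(f"chi-square = {chi_square:.3f}, p = {p_value:.4f}, reject H0: {reject_null}")
```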

**p value**

The p-value is compared with the “significance level of a test”, denoted α and usually set to 0.05. If p-value < α, the data provide significant evidence against the null hypothesis (\(H_0\)), so we reject the null hypothesis and accept the alternate hypothesis (\(H_a\)).

The significance level α is also known as the “Type One Error Rate”: the proportion of times we would wrongly reject a null hypothesis that is actually true (5 times out of 100 at α = 0.05). The p-value itself is the probability of seeing data at least as extreme as those observed if the null hypothesis were true.
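A small simulation can make the link between the 0.05 cutoff and the Type One Error Rate concrete. The sketch below is an assumption-laden illustration: it uses a two-sided z-test with a known standard deviation as a simple stand-in (it is not one of the tests listed in these notes), and the sample size and number of experiments are arbitrary choices. Every sample is generated with the null hypothesis true, so any rejection is a mistake.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the sketch is reproducible

ALPHA = 0.05            # significance level
SAMPLE_SIZE = 30        # observations per experiment (assumed)
NUM_EXPERIMENTS = 5000  # number of simulated experiments (assumed)
standard_normal = NormalDist()  # mean 0, SD 1

def two_sided_p_value(sample):
    """Two-sided z-test p-value for H0: population mean is 0, SD known to be 1."""
    z = fmean(sample) * len(sample) ** 0.5
    return 2 * (1 - standard_normal.cdf(abs(z)))

# Every sample is drawn with H0 true, so each rejection is a Type One Error.
false_rejections = 0
for _ in range(NUM_EXPERIMENTS):
    sample = [random.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    if two_sided_p_value(sample) < ALPHA:
        false_rejections += 1

type_one_error_rate = false_rejections / NUM_EXPERIMENTS
print(f"share of true nulls rejected: {type_one_error_rate:.3f}")
```

The share of wrongly rejected null hypotheses should sit near 0.05, which is the threshold chosen in advance rather than a property of any single p-value.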

**Bivariate Statistical tools**

- ANOVA - Analysis of Variance
- \(\chi^2\) - Chi-Square Test of Independence
- r - Correlation Coefficient

How to choose a statistical test?

- C->Q: if you have a categorical explanatory variable and a quantitative response variable, choose ANOVA
- C->C: if both the explanatory and response variables are categorical, choose the \(\chi^2\) (Chi-Square) Test of Independence
- Q->Q: if both the explanatory and response variables are quantitative, choose Pearson Correlation
- Q->C: if you have a quantitative explanatory variable and a categorical response variable, first categorize the explanatory variable into two levels, then use the Chi-Square Test of Independence as your inferential test.
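The decision rules above amount to a lookup on the pair of variable types. A minimal sketch follows; the function name `choose_test` and the `"C"` / `"Q"` codes are hypothetical conventions introduced here for illustration.

```python
def choose_test(explanatory: str, response: str) -> str:
    """Map variable types ('C' categorical, 'Q' quantitative) to a test name."""
    tests = {
        ("C", "Q"): "ANOVA",
        ("C", "C"): "Chi-Square Test of Independence",
        ("Q", "Q"): "Pearson Correlation",
        # Q->C: the explanatory variable is first binned into two levels.
        ("Q", "C"): "Chi-Square Test of Independence (after binning the explanatory variable)",
    }
    key = (explanatory.upper(), response.upper())
    if key not in tests:
        raise ValueError("variable types must be 'C' or 'Q'")
    return tests[key]

print(choose_test("C", "Q"))
print(choose_test("Q", "C"))
```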