Course

Course Summary

Credit Type:

Course

ACE ID:

STAT-0046

Organization's ID:

613

Organization:

Statistics.com

Length:

4 weeks (60 hours)

Dates Offered:

Credit Recommendation & Competencies

Level	Credits (SH)	Subject
Graduate	3	statistics

Description

Objective:

The course objective is to teach students how to use machine learning and statistical methods to identify clusters in multivariate data, i.e., groups of cases with relatively high within-group similarity. Using those same methods, along with additional ones, students will also learn how to identify cases that are unusual relative to the rest of the data: anomalies (also called outliers). Students will first cover the building blocks: measuring distance between records and distance between clusters. They will then learn how to use hierarchical clustering and k-means clustering algorithms, as well as normal mixture models, to identify clusters (and, by extension, anomalies). Students will also learn additional statistical methods for identifying anomalies.
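
As an illustration of the building blocks named in the objective, the sketch below standardizes records and then measures distance between two clusters. It is not taken from the course materials: the z-score scaling, the Euclidean metric, the three linkage rules, the function names, and the sample records are all assumptions chosen for the example.

```python
import math
from statistics import mean, pstdev

def zscore_columns(rows):
    """Standardize each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 for constant columns
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]

def linkage_distance(cluster_a, cluster_b, method="single"):
    """Distance between two clusters, built from pairwise Euclidean distances."""
    pairwise = [math.dist(a, b) for a in cluster_a for b in cluster_b]
    if method == "single":      # closest pair of records
        return min(pairwise)
    if method == "complete":    # farthest pair of records
        return max(pairwise)
    if method == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")

# Hypothetical records: (income in dollars, age in years).
records = [[30000, 25], [32000, 27], [90000, 60], [95000, 58]]

# Without scaling, income would dominate the distance because of its units.
scaled = zscore_columns(records)

d_single = linkage_distance(scaled[:2], scaled[2:], "single")
d_average = linkage_distance(scaled[:2], scaled[2:], "average")
d_complete = linkage_distance(scaled[:2], scaled[2:], "complete")
print(d_single, d_average, d_complete)
```

For any pair of clusters, single linkage gives the smallest of the three values and complete linkage the largest, which is one reason the choice of linkage changes the shape of the clusters that hierarchical clustering produces.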

Learning Outcomes:

normalize data appropriately and calculate distances between records
use different metrics to calculate distances between clusters
conduct hierarchical cluster analysis and k-means clustering to identify clusters in multivariate data
fit normal mixture models for continuous variables
interpret/diagnose the output of different clustering procedures
identify the assignment of individual cases to clusters
use the clustering output to identify potential anomalies
use exploratory and model-based statistical methods to identify anomalies (termed "outliers" in statistics)
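
Several of the outcomes above (k-means clustering, assigning cases to clusters, and using the clustering output to flag potential anomalies) can be sketched together. The code below is a minimal illustration under stated assumptions, not the course's method: the data points are made up, the first k points are used as initial centroids for reproducibility, and distance to the assigned centroid is used as one simple anomaly score among many possible ones.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd's algorithm. Assumption: the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

# Hypothetical two-dimensional data: two tight groups plus one stray case.
points = [[1.0, 1.0], [8.0, 8.0], [1.2, 0.8], [0.9, 1.1],
          [8.2, 7.9], [7.9, 8.1], [4.5, 12.0]]

labels, centroids = kmeans(points, k=2)

# Anomaly score (an assumed heuristic): distance from each case to its own centroid.
scores = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
most_anomalous = max(range(len(points)), key=lambda i: scores[i])

print("cluster labels:", labels)
print("most anomalous case:", points[most_anomalous])
```

In practice the data would normally be standardized first, and the result of k-means depends on the initial centroids, so several random starts are usually compared.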