Table of Contents #
- Exam Overview
- Linear Models and Regression Analysis
- Generalized Linear Models (GLMs)
- Time Series Analysis
- Advanced Regression Techniques
- Model Validation Techniques
- Advanced Statistical Concepts
- Model Selection Techniques
- Study Strategies
- Essential R Functions
- Exam Tips
Exam Overview #
The Advanced Statistics for Actuarial Modeling (ASTAM) exam tests candidates on advanced statistical techniques used in actuarial work, with a focus on model selection, validation, and advanced regression techniques. The exam is 3 hours and 15 minutes long with a mix of multiple-choice and written-answer questions.
Key Focus Areas:
- Advanced regression modeling techniques
- Time series analysis and forecasting
- Model validation and selection methodologies
- Generalized linear models and their applications
- Statistical learning methods for actuarial applications
Linear Models and Regression Analysis #
Multiple Linear Regression #
The fundamental equation for multiple linear regression forms the backbone of statistical modeling:
Model Equation:
y = Xβ + ε
Where:
- y is the n×1 vector of responses (dependent variable)
- X is the n×p design matrix (independent variables)
- β is the p×1 vector of parameters (coefficients)
- ε is the n×1 vector of errors (residuals)
Parameter Estimation (Ordinary Least Squares):
β̂ = (X'X)⁻¹X'y
Variance of Parameter Estimates:
Var(β̂) = σ²(X'X)⁻¹
Residual Variance Estimate:
σ̂² = RSS/(n-p)
Where RSS = Σ(yᵢ - ŷᵢ)² (Residual Sum of Squares); the residual standard error is its square root, σ̂ = √(RSS/(n-p)).
Key Assumptions:
- Linearity in parameters
- Independence of errors
- Homoscedasticity (constant variance)
- Normality of errors
- No perfect multicollinearity
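To ground these formulas, here is a minimal R sketch on simulated data (all names illustrative); the normal-equations solution should match lm():
# OLS from the normal equations versus lm()
set.seed(1)
n <- 100
X <- cbind(1, x1 = rnorm(n), x2 = rnorm(n))   # design matrix with intercept
y <- X %*% c(1, 2, -0.5) + rnorm(n)
beta_hat <- solve(t(X) %*% X, t(X) %*% y)     # (X'X)^(-1) X'y
sigma2_hat <- sum((y - X %*% beta_hat)^2) / (n - ncol(X))   # RSS / (n - p)
beta_hat
coef(lm(y ~ X[, "x1"] + X[, "x2"]))           # same estimates via lm()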
Model Evaluation Metrics #
R-squared (Coefficient of Determination):
R² = 1 - RSS/TSS
Where TSS = Σ(yᵢ - ȳ)² (Total Sum of Squares)
Adjusted R-squared:
R²ₐdⱼ = 1 - (RSS/(n-p))/(TSS/(n-1))
Information Criteria for Model Comparison:
Akaike Information Criterion (AIC):
AIC = -2ln(L) + 2p
Bayesian Information Criterion (BIC):
BIC = -2ln(L) + p ln(n)
Where L is the likelihood and p is the number of parameters.
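To make these metrics concrete, here is a short R sketch computing R² and AIC by hand and checking them against base R (simulated data, names illustrative):
# R-squared and AIC by hand versus summary() and AIC()
set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)
fit <- lm(y ~ x)
rss <- sum(resid(fit)^2)
tss <- sum((y - mean(y))^2)
1 - rss / tss                              # matches summary(fit)$r.squared
ll <- logLik(fit)
-2 * as.numeric(ll) + 2 * attr(ll, "df")   # matches AIC(fit); df counts sigma too
AIC(fit)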
Generalized Linear Models (GLMs) #
GLMs extend linear regression to handle non-normal response variables and non-linear relationships through link functions.
Model Components #
Three Essential Components:
Random Component: Y ~ Distribution from exponential family
- Normal, Binomial, Poisson, Gamma, etc.
Systematic Component: Linear predictor
η = Xβ
Link Function: Connects the mean to the linear predictor
g(μ) = η
Common Link Functions #
Logistic Regression (Binary/Binomial Response):
g(μ) = ln(μ/(1-μ)) = logit(μ)
Poisson Regression (Count Data):
g(μ) = ln(μ)
Gamma Regression (Continuous Positive Data):
g(μ) = 1/μ (inverse link)
or
g(μ) = ln(μ) (log link)
Deviance #
Deviance Formula:
D = 2[l(y;y) - l(μ̂;y)]
Where l(y;y) is the saturated log-likelihood and l(μ̂;y) is the fitted log-likelihood.
Scaled Deviance:
D* = D/φ
Where φ is the dispersion parameter.
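As a quick illustration, this sketch fits a Poisson GLM to simulated data and reads the deviance quantities off base R accessors:
# Deviance of a fitted Poisson GLM
set.seed(8)
x <- rnorm(200)
y <- rpois(200, lambda = exp(0.3 + 0.6 * x))
fit <- glm(y ~ x, family = poisson(link = "log"))
deviance(fit)             # D = 2[l(y; y) - l(mu_hat; y)]
fit$null.deviance         # deviance of the intercept-only model, for comparison
summary(fit)$dispersion   # phi; fixed at 1 for the Poisson family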
Parameter Estimation #
Maximum Likelihood Estimation through Iteratively Reweighted Least Squares (IRLS):
β̂ₜ₊₁ = (X'WₜX)⁻¹X'Wₜzₜ
Where:
- Wₜ is the weight matrix at iteration t
- zₜ = η̂ₜ + (y - μ̂ₜ)g'(μ̂ₜ) is the working response at iteration t
- The process continues until convergence
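The following is a minimal IRLS sketch for Poisson regression with a log link on simulated data, checked against glm(); it is illustrative rather than production code:
# IRLS for Poisson regression (log link), by hand
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.8 * x))   # true coefficients: 0.5, 0.8
X <- cbind(1, x)
beta <- c(0, 0)                              # starting values
for (iter in 1:25) {
  eta <- X %*% beta                          # current linear predictor
  mu  <- exp(eta)                            # inverse of the log link
  W   <- diag(as.vector(mu))                 # Poisson weights: Var(Y) = mu
  z   <- eta + (y - mu) / mu                 # working response, g'(mu) = 1/mu
  beta_new <- solve(t(X) %*% W %*% X, t(X) %*% W %*% z)
  done <- max(abs(beta_new - beta)) < 1e-8
  beta <- beta_new
  if (done) break                            # stop at convergence
}
drop(beta)                                   # should match coef(glm(...))
coef(glm(y ~ x, family = poisson(link = "log")))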
Time Series Analysis #
Time series analysis deals with data collected sequentially over time, requiring special techniques to handle temporal dependencies.
Stationarity Tests #
Augmented Dickey-Fuller (ADF) Test:
ΔYₜ = α + βt + γYₜ₋₁ + δ₁ΔYₜ₋₁ + ... + δₚ₋₁ΔYₜ₋ₚ₊₁ + εₜ
Null Hypothesis: Series has a unit root (non-stationary)
Alternative: Series is stationary
KPSS Test: Null hypothesis is stationarity, so the hypotheses are reversed relative to the ADF test; using both tests together gives a more reliable verdict (see the sketch below).
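A hedged sketch, assuming the tseries package is installed, applied to a simulated random walk:
# Unit-root vs. stationarity tests (tseries package assumed installed)
library(tseries)
set.seed(42)
rw <- cumsum(rnorm(200))   # random walk: non-stationary by construction
adf.test(rw)               # H0: unit root; expect a large p-value here
kpss.test(rw)              # H0: stationarity; expect a small p-value here
adf.test(diff(rw))         # the first difference should look stationary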
ARIMA Models #
ARIMA(p,d,q) Model Structure:
φ(B)(1-B)ᵈYₜ = θ(B)εₜ
Where:
- φ(B) is the AR polynomial of order p
- θ(B) is the MA polynomial of order q
- B is the backshift operator: BYₜ = Yₜ₋₁
- d is the degree of differencing
Components:
- AR(p): Autoregressive terms
- I(d): Integration (differencing)
- MA(q): Moving average terms
Model Identification:
- Use ACF and PACF plots
- Box-Jenkins methodology
- Information criteria for order selection
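To see identification in practice, here is an illustrative sketch on simulated data using base R's acf/pacf plots and an AIC comparison of candidate orders:
# Box-Jenkins identification on a simulated ARMA(1,1) series
set.seed(7)
y <- arima.sim(model = list(ar = 0.7, ma = -0.3), n = 300)
acf(y)                                # gradual decay: autoregressive behaviour
pacf(y)                               # spikes suggest candidate AR order
fit1 <- arima(y, order = c(1, 0, 0))
fit2 <- arima(y, order = c(1, 0, 1))
fit3 <- arima(y, order = c(2, 0, 1))
c(AIC(fit1), AIC(fit2), AIC(fit3))    # prefer the lowest AIC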
Forecasting #
One-step-ahead Forecast:
Ŷₜ(1) = E(Yₜ₊₁ | Yₜ, Yₜ₋₁, ...)
h-step-ahead Forecast:
Ŷₜ(h) = E(Yₜ₊ₕ | Yₜ, Yₜ₋₁, ...)
Forecast Error Variance:
Var[eₜ(h)] = σ²[1 + ψ₁² + ψ₂² + ... + ψₕ₋₁²]
Where the ψⱼ are the coefficients of the model's MA(∞) representation.
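A minimal forecasting sketch with base R's arima() and predict(); the intervals below are approximate normal intervals built from the forecast standard errors:
# h-step forecasts and approximate 95% intervals
set.seed(7)
y <- arima.sim(model = list(ar = 0.7), n = 300)
fit <- arima(y, order = c(1, 0, 0))
fc <- predict(fit, n.ahead = 12)   # point forecasts and standard errors
fc$pred                            # Ŷₜ(h) for h = 1, ..., 12
fc$pred + 1.96 * fc$se             # approximate 95% upper bounds
fc$pred - 1.96 * fc$se             # approximate 95% lower bounds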
Advanced Regression Techniques #
Principal Component Analysis (PCA) #
Eigenvalue Decomposition:
Σ = PΛP'
Where:
- Σ is the covariance matrix
- Λ is the diagonal matrix of eigenvalues (λ₁ ≥ λ₂ ≥ … ≥ λₚ)
- P is the matrix of eigenvectors (principal components)
Principal Components:
Z = XP
Proportion of Variance Explained:
λⱼ / Σᵢλᵢ
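A brief sketch connecting prcomp() to the eigendecomposition above, on simulated correlated data:
# PCA via prcomp() versus a direct eigendecomposition
set.seed(3)
X <- matrix(rnorm(200 * 4), ncol = 4)
X[, 2] <- X[, 1] + 0.1 * rnorm(200)   # induce correlation between columns
pca <- prcomp(X, scale. = TRUE)
summary(pca)                          # proportion of variance per component
ev <- eigen(cor(X))$values            # eigenvalues of the correlation matrix
ev / sum(ev)                          # matches the summary(pca) proportions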
Ridge Regression #
Parameter Estimation:
β̂_ridge = (X'X + λI)⁻¹X'y
Where:
- λ ≥ 0 is the regularization parameter
- Shrinks coefficients toward zero
- Handles multicollinearity effectively
Cross-Validation for λ Selection: Choose λ that minimizes CV error.
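A minimal sketch of the closed-form ridge solution on standardized simulated predictors (no intercept for simplicity), showing shrinkage as λ grows:
# Ridge closed form: beta = (X'X + lambda*I)^(-1) X'y
set.seed(11)
n <- 100; p <- 5
X <- scale(matrix(rnorm(n * p), ncol = p))   # standardized predictors
y <- X %*% c(2, -1, 0.5, 0, 0) + rnorm(n)
ridge <- function(lambda) solve(t(X) %*% X + lambda * diag(p), t(X) %*% y)
cbind(ols        = as.vector(ridge(0)),      # lambda = 0 recovers OLS
      lambda_1   = as.vector(ridge(1)),
      lambda_100 = as.vector(ridge(100)))    # coefficients shrink toward zero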
Lasso Regression #
Objective Function:
minimize: RSS + λΣ|βⱼ|
Properties:
- L1 penalty performs variable selection
- Some coefficients become exactly zero
- Produces sparse models
Elastic Net #
Objective Function:
minimize: RSS + λ[(1-α)Σβⱼ² + αΣ|βⱼ|]
Where:
- α ∈ [0,1] controls the mix of ridge and lasso penalties
- α = 0: pure ridge regression
- α = 1: pure lasso regression
- λ ≥ 0 controls overall regularization strength
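A hedged R sketch, assuming the glmnet package is available; its alpha argument mixes the two penalties exactly as in the objective above (simulated data, names illustrative):
# Ridge, lasso, and elastic net via glmnet (assumed installed)
library(glmnet)
set.seed(5)
X <- matrix(rnorm(100 * 10), ncol = 10)
y <- drop(X[, 1:3] %*% c(3, -2, 1) + rnorm(100))   # only 3 true signals
coef(glmnet(X, y, alpha = 1),   s = 0.1)   # lasso: many coefficients exactly zero
coef(glmnet(X, y, alpha = 0.5), s = 0.1)   # elastic net: intermediate sparsity
coef(glmnet(X, y, alpha = 0),   s = 0.1)   # ridge: all shrunk, none exactly zero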
Model Validation Techniques #
Cross-Validation #
K-fold Cross-Validation Error:
CVₖ = (1/k)Σᵢ₌₁ᵏ MSEᵢ
Leave-One-Out Cross-Validation (LOOCV):
LOOCV = (1/n)Σᵢ₌₁ⁿ (yᵢ - ŷᵢ⁽⁻ⁱ⁾)²
Advantages of CV:
- Provides an approximately unbiased estimate of prediction error
- Uses all data for both training and validation
- Helps select optimal hyperparameters
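To make the K-fold formula concrete, here is a minimal hand-rolled CV sketch for a linear model on simulated data:
# 10-fold cross-validation by hand
set.seed(2)
n <- 200
df <- data.frame(x = rnorm(n))
df$y <- 1 + 2 * df$x + rnorm(n)
k <- 10
fold <- sample(rep(1:k, length.out = n))   # random fold assignment
mse <- numeric(k)
for (i in 1:k) {
  fit    <- lm(y ~ x, data = df[fold != i, ])   # train on the other k-1 folds
  test   <- df[fold == i, ]
  mse[i] <- mean((test$y - predict(fit, newdata = test))^2)
}
mean(mse)                                  # CV_k estimate of prediction error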
Bootstrap Methods #
Bootstrap Estimate:
θ̂ᵦₒₒₜ = (1/B)Σᵦ₌₁ᴮ θ̂*ᵦ
Bootstrap Standard Error:
SEᵦₒₒₜ = √[(1/(B-1))Σᵦ₌₁ᴮ (θ̂*ᵦ - θ̂ᵦₒₒₜ)²]
Bootstrap Confidence Intervals:
- Percentile method
- Bias-corrected and accelerated (BCa)
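A package-free bootstrap sketch, estimating the standard error of a sample median:
# Nonparametric bootstrap for the median
set.seed(4)
x <- rexp(100, rate = 0.5)
B <- 1000
theta_star <- replicate(B, median(sample(x, replace = TRUE)))
mean(theta_star)                        # bootstrap estimate of theta
sd(theta_star)                          # bootstrap standard error
quantile(theta_star, c(0.025, 0.975))   # percentile 95% confidence interval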
Advanced Statistical Concepts #
Mixed Effects Models #
Linear Mixed Model:
y = Xβ + Zu + ε
Where:
- β are the fixed effects (population-level parameters)
- u are the random effects (individual-level deviations)
- Z is the random-effects design matrix
- u ~ N(0, G) and ε ~ N(0, R)
Applications:
- Longitudinal data analysis
- Hierarchical/clustered data
- Panel data models
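A hedged sketch, assuming the lme4 package is installed, fitting a random-intercept model to lme4's built-in sleepstudy data:
# Random-intercept linear mixed model (lme4 assumed installed)
library(lme4)
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
summary(fit)   # fixed effect for Days plus subject-level variance components
ranef(fit)     # estimated random intercepts u, one per subject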
Survival Analysis #
Survival Function:
S(t) = P(T > t) = exp(-∫₀ᵗ h(u)du)
Hazard Function:
h(t) = f(t)/S(t) = -d/dt ln(S(t))
Cox Proportional Hazards Model:
h(t|X) = h₀(t)exp(Xβ)
Where:
- h₀(t) is the baseline hazard
- No parametric assumptions about h₀(t)
- Focus on relative risks
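A hedged sketch, assuming the survival package is installed, using its built-in lung data (time, status, age, sex):
# Kaplan-Meier curves and a Cox model (survival package assumed installed)
library(survival)
km <- survfit(Surv(time, status) ~ sex, data = lung)
summary(km, times = c(180, 365))   # estimated S(t) by group at 180 and 365 days
cox <- coxph(Surv(time, status) ~ age + sex, data = lung)
summary(cox)                       # the exp(coef) column gives hazard ratios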
Model Selection Techniques #
Stepwise Selection #
Forward Selection:
- Start with null model
- Add variables based on F-statistic or p-value
- Continue until no improvement
Backward Elimination:
- Start with full model
- Remove variables with highest p-value above threshold
- Continue until all remaining variables are significant
Stepwise (Forward/Backward): Combination of both approaches with entry and removal criteria.
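To see the selection mechanics, here is a short sketch using base R's step(), which ranks models by AIC (simulated data; only x1 truly matters):
# Stepwise selection with step() on simulated data
set.seed(9)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
df$y <- 1 + 2 * df$x1 + rnorm(100)
full <- lm(y ~ x1 + x2 + x3, data = df)
step(full, direction = "backward")                 # backward elimination
step(lm(y ~ 1, data = df), scope = ~ x1 + x2 + x3,
     direction = "forward")                        # forward selection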
Information Criteria Comparison #
Model Selection Rule: Choose model that minimizes:
Akaike Information Criterion (AIC):
AIC = -2ln(L) + 2p
Bayesian Information Criterion (BIC):
BIC = -2ln(L) + p ln(n)
Hannan-Quinn Information Criterion (HQIC):
HQIC = -2ln(L) + 2p ln(ln(n))
Key Points:
- BIC penalizes complexity more heavily than AIC
- BIC is consistent (selects true model as n→∞)
- AIC optimizes prediction accuracy
Study Strategies #
1. Understanding Theoretical Foundations #
Focus Areas:
- Understand assumptions behind each model and their implications
- Know when each model is appropriate for different data types
- Understand relationships between different techniques (e.g., GLM as extension of linear regression)
- Master the mathematical foundations without getting lost in proofs
Study Approach:
- Create concept maps linking related techniques
- Practice identifying appropriate methods for given scenarios
- Understand the “why” behind each formula
2. Practical Application Skills #
Key Competencies:
- Practice interpreting model outputs and parameter estimates
- Learn to identify violations of model assumptions
- Develop intuition for model selection and validation
- Understand practical implications of statistical results
Exercises:
- Work through case studies with real actuarial data
- Practice diagnostic plotting and interpretation
- Compare different modeling approaches on same dataset
3. Common Pitfalls to Avoid #
Statistical Errors:
- Overlooking multicollinearity in regression models
- Ignoring model assumptions (normality, homoscedasticity, independence)
- Misinterpreting statistical significance vs. practical significance
- Overfitting with too many parameters relative to sample size
Conceptual Mistakes:
- Confusing correlation with causation
- Inappropriate extrapolation beyond data range
- Ignoring temporal dependencies in time series data
- Misapplying techniques to inappropriate data types
Essential R Functions #
While you won’t be coding in the exam, understanding these R functions helps grasp the concepts and their implementation:
# Linear Models
lm(y ~ x1 + x2 + x3, data = df)
summary(model)
anova(model)
plot(model) # Diagnostic plots
# Generalized Linear Models
glm(y ~ x, family = binomial(link = "logit"))
glm(y ~ x, family = poisson(link = "log"))
glm(y ~ x, family = Gamma(link = "inverse"))
# Model Selection
step(model, direction = "both")
AIC(model1, model2, model3)
BIC(model1, model2, model3)
# Time Series Analysis
ts(data, frequency = 12, start = c(2020, 1))
arima(ts_data, order = c(p, d, q))
auto.arima(ts_data) # Automatic ARIMA
forecast(model, h = 12)
# Advanced Regression Techniques
prcomp(X, scale = TRUE) # Principal Component Analysis
glmnet(X, y, alpha = 0) # Ridge Regression
glmnet(X, y, alpha = 1) # Lasso Regression
glmnet(X, y, alpha = 0.5) # Elastic Net
# Cross-Validation
cv.glmnet(X, y, alpha = 1, nfolds = 10)
boot(data, statistic, R = 1000) # Bootstrap
# Mixed Effects Models
lmer(y ~ x1 + x2 + (1|group), data = df) # Random intercept
glmer(y ~ x + (1|group), family = binomial) # Mixed GLM
# Survival Analysis
survfit(Surv(time, status) ~ group)
coxph(Surv(time, status) ~ x1 + x2)
Exam Tips #
1. Time Management Strategy #
Before the Exam:
- Practice with timed mock exams
- Identify your strongest and weakest topic areas
- Develop a systematic approach to problem-solving
During the Exam:
- Read all questions quickly first to gauge difficulty
- Start with questions you’re most confident about
- Allocate time proportionally based on point values
- Reserve 15-20 minutes at the end for review
2. Calculation Strategy #
Mathematical Approach:
- Write out formulas clearly before plugging in numbers
- Show all intermediate steps for partial credit
- Double-check units and scaling factors
- Verify answers make practical sense
Organization Tips:
- Use clear notation and label variables
- Create organized workspace on scratch paper
- Circle or box final answers
- Keep calculations neat for easier checking
3. Conceptual Understanding #
Analytical Thinking:
- Explain why you chose specific methods
- Consider practical implications of statistical results
- Reference key assumptions when relevant
- Compare alternative approaches when appropriate
Communication:
- Write clear, concise explanations
- Use proper statistical terminology
- Justify your reasoning process
- Address limitations of your analysis
4. Common Exam Topics #
High-Priority Areas:
- Linear regression diagnostics and remedies
- GLM selection and interpretation
- Time series model identification
- Cross-validation and bootstrap methods
- Information criteria for model selection
Practice Focus:
- Work through past exam problems repeatedly
- Master the most commonly tested formulas
- Understand when to apply each technique
- Practice explaining statistical concepts clearly
Final Preparation:
- Review formula sheet thoroughly
- Practice mental math and calculator efficiency
- Prepare for both computational and conceptual questions
- Stay calm and trust your preparation
Quick Reference Formulas #
Regression Basics:
- β̂ = (X'X)⁻¹X'y
- R² = 1 - RSS/TSS
- AIC = -2ln(L) + 2p
GLM Essentials:
- Link: g(μ) = η = Xβ
- Deviance: D = 2[l(y;y) - l(μ̂;y)]
Time Series:
- ARIMA(p,d,q): φ(B)(1-B)ᵈYₜ = θ(B)εₜ
Regularization:
- Ridge: β̂ = (X'X + λI)⁻¹X'y
- Lasso: min{RSS + λΣ|βⱼ|}
Validation:
- CV: (1/k)Σ MSEᵢ
- Bootstrap SE: √[(1/(B-1))Σ(θ̂*ᵦ - θ̂)²]
Remember: This cheat sheet provides formulas and concepts, but exam success requires deep understanding of when and how to apply these techniques in actuarial contexts. Focus on building intuition alongside memorizing formulas.