proc-reg-vs-proc-glm-sas

Use when choosing between PROC REG and PROC GLM in SAS for running a linear regression, particularly when your model includes categorical predictors or interaction terms.

Install

mkdir -p .claude/skills/proc-reg-vs-proc-glm-sas && curl -L -o skill.zip "https://agentskills.codes/api/skills/download/14939" && unzip -o skill.zip -d .claude/skills/proc-reg-vs-proc-glm-sas && rm skill.zip

Installs to .claude/skills/proc-reg-vs-proc-glm-sas

About this skill

Overview

Both PROC REG and PROC GLM fit linear regression models. The difference is how they handle categorical variables and what diagnostics they provide. Choosing the wrong one creates extra work or missing output.

Short version: PROC REG offers richer diagnostics but requires manual dummy coding. PROC GLM handles categorical variables natively and makes interaction terms easier to specify.

When to Use

  • Use PROC REG when you need detailed regression diagnostics (residual plots, influence statistics, VIF) or when all your predictors are already numeric.
  • Use PROC GLM when you have unordered categorical variables and don't want to manually create dummy variables, or when you need interaction terms.

Core Pattern

PROC REG:

proc reg data=analytic;
  /* every predictor is numeric; the marital-status dummies were created beforehand */
  model BPXSY1 = age bmi gender widowed divorced separated never_married living_partner;
run;
quit;

Outputs regression coefficients by default. All variables must be numeric. Categorical variables need manual dummy coding before this step.

PROC GLM:

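proc glm data=analytic;
  /* categorical predictors; ref='1' makes level 1 of DMDMARTL the reference */
  class DMDMARTL(ref='1') gender;
  /* solution prints the parameter estimates; clparm adds their confidence limits */
  model BPXSY1 = age bmi gender DMDMARTL / solution clparm;
run;
quit;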

The CLASS statement tells SAS which variables are categorical. SAS creates the dummies internally. The ref= option sets the reference category. The / solution option is required to see the regression coefficients, which PROC GLM does NOT print by default once the model contains CLASS variables; clparm adds confidence limits for those coefficients.

Step-by-Step Process

For PROC GLM with categorical variables:

  1. Identify which variables are unordered categorical.
  2. List them in the CLASS statement. Specify ref= for each if you want a specific reference group.
  3. Include them in the MODEL statement like any other variable.
  4. Add /solution clparm to the MODEL statement. Without this, you get ANOVA-style output only, no regression coefficients.
  5. Interpret output: each class level coefficient is the difference from the reference group, same as manually-coded dummies.

For PROC REG with categorical variables:

  1. Create k-1 binary dummy variables for each k-level categorical variable.
  2. Choose which category to exclude (the reference).
  3. Include all k-1 dummies in the MODEL statement.
  4. Coefficients appear automatically.
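
The PROC REG steps above can be sketched as a DATA step followed by the model. This is a minimal sketch, not a definitive recipe: it assumes DMDMARTL is numeric with codes 1 to 6, that 1 is the reference, and that codes 2 to 6 map, in order, to the dummy names used in the Core Pattern model. That mapping and the output data set name analytic_dummies are assumptions, so check them against your codebook.

```sas
/* Sketch: build k-1 = 5 dummies for a 6-level variable. */
data analytic_dummies;              /* hypothetical output data set name */
  set analytic;
  /* Assumed codes: 1 = reference, 2-6 = the five non-reference levels. */
  if DMDMARTL in (1, 2, 3, 4, 5, 6) then do;
    widowed        = (DMDMARTL = 2);   /* a comparison evaluates to 1 or 0 */
    divorced       = (DMDMARTL = 3);
    separated      = (DMDMARTL = 4);
    never_married  = (DMDMARTL = 5);
    living_partner = (DMDMARTL = 6);
  end;
  /* Any other value (missing, refused, and so on) leaves the dummies
     missing, so PROC REG drops the row instead of silently counting
     it as the reference group. */
run;

proc reg data=analytic_dummies;
  model BPXSY1 = age bmi gender widowed divorced separated never_married living_partner;
run;
quit;
```

Guarding the assignments with the IN list is a deliberate choice: a bare `widowed = (DMDMARTL = 2);` would code every unexpected value as 0 on all five dummies, which folds those rows into the reference group.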

Judgment & Heuristics

If you need serious diagnostics, use PROC REG. It has built-in options for outlier detection, Cook's D, leverage, VIF for multicollinearity, and residual plots. PROC GLM's diagnostics are limited by comparison.
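
As a rough illustration of those diagnostics, the sketch below requests them on the Core Pattern model. It assumes the dummy variables already exist in the analytic data set and that ODS Graphics is available in your SAS session.

```sas
ods graphics on;

proc reg data=analytic plots=diagnostics;   /* diagnostics panel of residual plots */
  model BPXSY1 = age bmi gender widowed divorced separated
                 never_married living_partner
        / vif          /* variance inflation factors for multicollinearity */
          r            /* residual analysis, including Cook's D */
          influence;   /* leverage (hat diagonal), DFFITS, DFBETAS */
run;
quit;
```

The r and influence options print one row per observation, so on a large data set it may be more practical to request them only after the model is otherwise settled.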

If you have categorical variables and don't need advanced diagnostics, PROC GLM is less error-prone. Manual dummy coding works, but forgetting a level or miscoding a reference group is a real risk. Let SAS handle it.

PROC GLM is better for interactions. Adding an interaction between a continuous and a categorical variable (or two categoricals) is cleaner with the CLASS statement and the | operator in the MODEL statement.
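
A minimal sketch of an age-by-marital-status interaction, reusing the variable names from the Core Pattern example (whether this interaction is substantively sensible for your data is an assumption):

```sas
proc glm data=analytic;
  class DMDMARTL(ref='1') gender;
  /* age|DMDMARTL expands to age, DMDMARTL, and age*DMDMARTL */
  model BPXSY1 = bmi gender age|DMDMARTL / solution clparm;
run;
quit;
```

The PROC REG equivalent would need one product variable per dummy (for example, age multiplied by widowed) created in a DATA step first, which is where most of the extra effort comes from.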

The /solution option is easy to forget in PROC GLM. In a model with CLASS variables, omitting it leaves you with F-tests and sums of squares but no parameter estimates, which is not useful for most regression work.

Both produce the same estimates when coded correctly. This is a good sanity check. If you build the same model in both procedures and get different coefficients, something went wrong in your dummy coding.
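
One way to run that check is to capture the coefficient table from each procedure and print them together. This is a sketch under several assumptions: the dummies already exist in analytic with level 1 of DMDMARTL as the omitted category, gender is a 0/1 numeric variable, and reg_est and glm_est are hypothetical data set names.

```sas
/* Capture the coefficient table from each procedure. */
ods output ParameterEstimates=reg_est;   /* hypothetical data set name */
proc reg data=analytic;
  model BPXSY1 = age bmi gender widowed divorced separated never_married living_partner;
run;
quit;

ods output ParameterEstimates=glm_est;   /* hypothetical data set name */
proc glm data=analytic;
  class DMDMARTL(ref='1');
  /* gender is left out of CLASS here so it enters as the same numeric
     column PROC REG uses (assumes a 0/1 numeric variable) */
  model BPXSY1 = age bmi gender DMDMARTL / solution clparm;
run;
quit;

proc print data=reg_est; run;
proc print data=glm_est; run;
```

The comparison only holds if both fits use the same observations. If the number of observations used differs between the two procedures, check how missing or special codes were handled when the dummies were built.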

Common Mistakes

  • Running PROC GLM without /solution clparm. You'll see the ANOVA table but no regression coefficients.
  • Not specifying ref= in the CLASS statement. SAS defaults to the last level in sort order as the reference (sorted by formatted value unless the ORDER= option says otherwise). This may not be what you want.
  • Including a categorical variable in PROC REG without creating dummies. SAS will treat a numerically coded variable as continuous, implying an ordering and equal spacing that don't exist; a character variable will simply cause an error.
  • Creating dummies for PROC REG but also listing the original variable. Don't include both the original categorical variable and its dummies in the same model.
  • Expecting the same diagnostic output from both procedures. PROC REG has more. Plan your analysis accordingly.

Quick Reference

| Feature | PROC REG | PROC GLM |
| --- | --- | --- |
| Categorical variable handling | Manual dummy coding required | CLASS statement, automatic |
| Regression coefficients shown by default | Yes | No, requires /solution clparm |
| Setting reference category | Exclude that dummy | ref= option in CLASS statement |
| Interaction terms | Harder to specify | Easier with the bar operator in the MODEL statement |
| Regression diagnostics | Comprehensive | Limited |
| Best for | Diagnostics-heavy models, all-numeric predictors | Models with categorical vars or interactions |
