If you're already comfortable with correlation and regression, you might be wondering:
"Can I model how multiple variables influence each other and the final outcome?"
That’s exactly what Path Analysis allows you to do.
In this post, we’ll walk through how to perform Path Analysis in R, using an example from plant research: how sunlight, water, and fertilizer affect plant height, either directly or indirectly.
🌱 What is Path Analysis?
Path Analysis is an extension of multiple regression. It lets you model:
- Direct effects (e.g., sunlight → plant height)
- Indirect effects (e.g., sunlight → water absorption → plant height)
- Causal chains among multiple variables
It’s typically visualized with arrows showing cause-and-effect relationships.
🛠 Step 1: Install and Load Required Packages
We’ll use the lavaan package for path modeling.
# Install if not already installed
if (!requireNamespace("lavaan", quietly = TRUE)) install.packages("lavaan")
# Load the library
library(lavaan)
🌾 Step 2: Create Your Dataset
Let’s use an example dataset:
plant_data <- data.frame(
  sunlight   = c(5, 6, 6, 7, 8, 9, 10, 11),
  water      = c(300, 350, 400, 450, 500, 550, 600, 650),
  fertilizer = c(2, 3, 3, 4, 5, 5, 6, 7),
  height     = c(45, 50, 55, 60, 65, 70, 75, 80)
)
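One thing worth checking before fitting anything: in this toy dataset, height is an exact linear function of water, so the covariance matrix is singular and lavaan is likely to stop with an error or warning about a matrix that is not positive-definite. The sketch below confirms the problem with base R and then builds a larger simulated dataset as a stand-in. The name `sim_data` and every coefficient in the simulation are assumptions made up for illustration, not measurements.

```r
# Toy data from the post, repeated so this block runs on its own
plant_data <- data.frame(
  sunlight   = c(5, 6, 6, 7, 8, 9, 10, 11),
  water      = c(300, 350, 400, 450, 500, 550, 600, 650),
  fertilizer = c(2, 3, 3, 4, 5, 5, 6, 7),
  height     = c(45, 50, 55, 60, 65, 70, 75, 80)
)

# A correlation of exactly 1 off the diagonal signals perfect collinearity
round(cor(plant_data), 3)

# Every row satisfies height = 15 + water / 10
exact_fit <- isTRUE(all.equal(plant_data$height, 15 + plant_data$water / 10))
exact_fit

# Hypothetical simulated stand-in: same variables, more rows, added noise.
# All numbers below are invented for illustration only.
set.seed(42)
n          <- 200
sunlight   <- runif(n, min = 4, max = 12)
fertilizer <- runif(n, min = 1, max = 8)
water      <- 100 + 50 * sunlight + rnorm(n, sd = 40)
height     <- 10 + 2 * sunlight + 0.05 * water + 1.5 * fertilizer +
              rnorm(n, sd = 4)
sim_data   <- data.frame(sunlight, water, fertilizer, height)

# No pair of variables is perfectly correlated any more
round(cor(sim_data), 3)
```

If you have real measurements, use those instead. The simulation is only there so that the later steps have something that can actually be fitted.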
Let’s say you want to model that:
- Sunlight, water, and fertilizer directly affect plant height.
- Sunlight also affects water (e.g., more sun → more transpiration → more water needed).
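Written as equations, these two statements might look like the block below. The coefficient names a, b, c, d and the error terms are notation introduced here for illustration, and intercepts are left out to keep things short.

```latex
\begin{align*}
\text{water}  &= a \cdot \text{sunlight} + \varepsilon_1 \\
\text{height} &= c \cdot \text{sunlight} + b \cdot \text{water}
                 + d \cdot \text{fertilizer} + \varepsilon_2 \\
\text{indirect effect of sunlight on height} &= a \cdot b \\
\text{total effect of sunlight on height}    &= c + a \cdot b
\end{align*}
```

Here c is the direct path from sunlight to height, and the product a · b is the part of sunlight's effect that travels through water.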
🧠 Step 3: Define the Path Model
Use lavaan’s model syntax:
model <- '
  # Direct effects on height
  height ~ sunlight + water + fertilizer
  # First leg of the indirect path: sunlight affects water
  water ~ sunlight
'
📈 Step 4: Fit the Model
Now fit the path model using sem():
fit <- sem(model, data = plant_data)
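Because this model contains only observed variables and no feedback loops, each equation can also be estimated on its own with ordinary least squares. As far as I know, the path coefficients from sem() should agree closely with these, so the sketch below works as a quick sanity check rather than a replacement. It uses base R only, and the simulated `sim_data` is a hypothetical stand-in with invented coefficients.

```r
# Hypothetical simulated data (all numbers invented for illustration)
set.seed(42)
n          <- 200
sunlight   <- runif(n, min = 4, max = 12)
fertilizer <- runif(n, min = 1, max = 8)
water      <- 100 + 50 * sunlight + rnorm(n, sd = 40)
height     <- 10 + 2 * sunlight + 0.05 * water + 1.5 * fertilizer +
              rnorm(n, sd = 4)
sim_data   <- data.frame(sunlight, water, fertilizer, height)

# One lm() call per equation in the path model
eq_height <- lm(height ~ sunlight + water + fertilizer, data = sim_data)
eq_water  <- lm(water ~ sunlight, data = sim_data)

# Compare these slopes with the regression estimates reported by lavaan
coef(eq_height)
coef(eq_water)
```

What lm() does not give you is a single overall fit test for the whole system or a ready-made indirect effect, which is where lavaan earns its keep.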
📊 Step 5: Summarize the Results
Check parameter estimates and model fit:
summary(fit, standardized = TRUE, fit.measures = TRUE)
Key outputs to look for:
- Standardized estimates (effect sizes)
- p-values (significance of paths)
- Model fit indices (like RMSEA, CFI)
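If you only want a handful of fit indices rather than the full summary, lavaan's fitMeasures() can pull them out by name. The sketch below fits the same model to a hypothetical simulated dataset (`sim_data`, with invented coefficients) so that it runs on its own. The cutoffs mentioned in the comments are commonly cited rules of thumb, not hard limits.

```r
library(lavaan)

# Hypothetical simulated data (all numbers invented for illustration)
set.seed(42)
n          <- 200
sunlight   <- runif(n, min = 4, max = 12)
fertilizer <- runif(n, min = 1, max = 8)
water      <- 100 + 50 * sunlight + rnorm(n, sd = 40)
height     <- 10 + 2 * sunlight + 0.05 * water + 1.5 * fertilizer +
              rnorm(n, sd = 4)
sim_data   <- data.frame(sunlight, water, fertilizer, height)

model <- '
  height ~ sunlight + water + fertilizer
  water  ~ sunlight
'
# lavaan may warn that the variables differ widely in scale (water is in
# the hundreds); expressing water in larger units is one way around that
fit <- sem(model, data = sim_data)

# Rules of thumb often quoted: CFI around 0.95 or higher,
# RMSEA around 0.06 or lower, SRMR around 0.08 or lower
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "rmsea", "srmr"))
```

This model leaves out one possible path (fertilizer → water), so it has one degree of freedom, and that is what makes the fit indices informative. A model that includes every possible path fits perfectly by construction.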
🧭 Step 6: Visualize the Path Diagram (Optional but Awesome)
Install and use semPlot for a diagram:
install.packages("semPlot")
library(semPlot)
semPaths(fit, "std", layout = "tree", edge.label.cex = 1.2)
You’ll get a diagram of the variables with the standardized path coefficients printed on the arrows, which is handy for presentations or papers.
✅ Interpretation Example
Let’s say the output shows:
- Sunlight → Height = 0.50 (p < 0.01)
- Water → Height = 0.40 (p < 0.05)
- Fertilizer → Height = 0.30 (p = 0.10)
- Sunlight → Water = 0.60 (p < 0.01)
You can conclude that:
- Sunlight has both a direct effect on plant height and an indirect effect via increased water uptake.
- Fertilizer’s effect is not statistically significant at the 0.05 level (p = 0.10), which is not surprising in a dataset this small.
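To put numbers on the first conclusion, the indirect effect is the product of the two paths along the chain, and the total effect adds the direct path. The values below are the illustrative coefficients from the list above, not real output, and the variable names are my own.

```r
# Illustrative standardized coefficients from the example above
sun_to_height   <- 0.50  # direct path: sunlight -> height
sun_to_water    <- 0.60  # first leg of the indirect path
water_to_height <- 0.40  # second leg of the indirect path

# Indirect effect: multiply the coefficients along sunlight -> water -> height
indirect_effect <- sun_to_water * water_to_height

# Total effect: direct path plus indirect path
total_effect <- sun_to_height + indirect_effect

round(c(indirect = indirect_effect, total = total_effect), 2)
```

So in this made-up example, roughly a third of sunlight's total effect on height (0.24 out of 0.74) runs through water.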
🧪 Bonus: Check Indirect Effects
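Use parameterEstimates() to get every path estimate as a data frame:
parameterEstimates(fit, standardized = TRUE)
This lists the individual paths, not the indirect effect itself. To get the indirect effect of sunlight on height, multiply the coefficients along the chain: (sunlight → water) × (water → height).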
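lavaan can also do the multiplication for you, and attach a standard error to the result, if you label the paths and define new parameters with the `:=` operator. Here is a minimal sketch of that approach. The labels a, b, c, d, the names `indirect` and `total`, and the simulated `sim_data` are all choices made for this illustration.

```r
library(lavaan)

# Hypothetical simulated data (all numbers invented for illustration)
set.seed(42)
n          <- 200
sunlight   <- runif(n, min = 4, max = 12)
fertilizer <- runif(n, min = 1, max = 8)
water      <- 100 + 50 * sunlight + rnorm(n, sd = 40)
height     <- 10 + 2 * sunlight + 0.05 * water + 1.5 * fertilizer +
              rnorm(n, sd = 4)
sim_data   <- data.frame(sunlight, water, fertilizer, height)

model_labeled <- '
  # Label each path so it can be referred to by name
  height ~ c*sunlight + b*water + d*fertilizer
  water  ~ a*sunlight

  # Defined parameters: computed from the labeled paths
  indirect := a * b
  total    := c + a * b
'
fit_labeled <- sem(model_labeled, data = sim_data)

# Keep only the rows for the defined parameters
pe <- parameterEstimates(fit_labeled, standardized = TRUE)
pe[pe$op == ":=", c("lhs", "est", "se", "pvalue", "std.all")]
```

The default standard error for a product like this relies on a normal approximation. For a real analysis you may prefer bootstrapped intervals, which lavaan supports through `se = "bootstrap"` in the sem() call, at the cost of a longer run time.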
📌 Conclusion
Path Analysis gives you a deeper understanding of how variables interact. In our plant example, it shows not just that sunlight matters but how it affects plant growth, both directly and indirectly through water.
With just a few lines of code in R, you can:
- Model complex causal relationships
- Test direct/indirect effects
- Visualize structural models