Whether you're a student, researcher, or just a curious plant lover, understanding the relationships between plant height and other variables (like sunlight, water, or soil type) is crucial. One powerful statistical tool for this is correlation analysis—and R makes it easy.
In this blog post, we’ll walk through how to perform a correlation analysis in R using plant height data. Let's dive in!
📌 What is Correlation?
Correlation measures the strength and direction of a relationship between two numerical variables. It ranges from -1 (perfect negative relationship) to 1 (perfect positive relationship), with 0 meaning no correlation.
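To make the definition concrete, here is a minimal sketch that computes Pearson's correlation coefficient by hand and compares it with R's built-in cor(). The numbers are the height and sunlight values from the sample dataset in Step 2; the variable names (height, sunlight, r_manual, r_builtin) are only illustrative.

```r
# Height and sunlight values from the sample dataset in Step 2
height   <- c(45, 50, 55, 60, 65, 70, 75, 80)
sunlight <- c(5, 6, 6, 7, 8, 9, 10, 11)

# Pearson's r: the covariance of the two variables divided by
# the product of their standard deviations
r_manual <- cov(height, sunlight) / (sd(height) * sd(sunlight))

# cor() uses the Pearson method by default, so the two values should agree
r_builtin <- cor(height, sunlight)

print(r_manual)
print(r_builtin)
print(all.equal(r_manual, r_builtin))
```

Dividing by the standard deviations is what keeps the result between -1 and 1, whatever units the two variables are measured in.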
🛠 Step 1: Set Up Your Environment
First, open R or RStudio and load the required libraries. For this analysis, we’ll use tidyverse for data handling and ggpubr for visualization.
# Install packages if not already installed
install.packages("tidyverse")
install.packages("ggpubr")
# Load libraries
library(tidyverse)
library(ggpubr)
🌱 Step 2: Load or Create Your Data
Let’s assume you have a dataset with plant height and other variables like sunlight, water, and fertilizer.
Here’s a sample dataset:
# Sample dataset
plant_data <- data.frame(
  height_cm = c(45, 50, 55, 60, 65, 70, 75, 80),
  sunlight_hours = c(5, 6, 6, 7, 8, 9, 10, 11),
  water_ml = c(300, 350, 400, 450, 500, 550, 600, 650),
  fertilizer_score = c(2, 3, 3, 4, 5, 5, 6, 7)
)
📊 Step 3: Explore the Data
Before running a correlation analysis, it’s good practice to look at the data:
head(plant_data)
summary(plant_data)
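Two more checks are often worth running at this stage, sketched below in base R. The snippet recreates the plant_data data frame from Step 2 so it runs on its own; the names missing_per_column and all_numeric are just illustrative.

```r
# Sample dataset from Step 2, recreated so this snippet is self-contained
plant_data <- data.frame(
  height_cm = c(45, 50, 55, 60, 65, 70, 75, 80),
  sunlight_hours = c(5, 6, 6, 7, 8, 9, 10, 11),
  water_ml = c(300, 350, 400, 450, 500, 550, 600, 650),
  fertilizer_score = c(2, 3, 3, 4, 5, 5, 6, 7)
)

# Show the type of every column; cor() needs all columns to be numeric
str(plant_data)
all_numeric <- all(sapply(plant_data, is.numeric))
print(all_numeric)

# Count missing values per column; by default cor() returns NA
# for any pair of variables that contains a missing value
missing_per_column <- colSums(is.na(plant_data))
print(missing_per_column)
```

If your own data has non-numeric columns or missing values, deal with them before moving on to the correlation step.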
📈 Step 4: Compute the Correlation Matrix
Use the cor() function to calculate the correlation coefficients:
cor_matrix <- cor(plant_data)
print(cor_matrix)
This returns a matrix of Pearson correlation coefficients (the default method) for every pair of variables.
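Often you only care about how each variable relates to plant height. The sketch below pulls that single column out of the matrix and, as an optional extra, computes a rank-based (Spearman) matrix using the method argument of cor(). It recreates the sample data so it runs on its own; height_cor and spearman_matrix are illustrative names.

```r
# Sample dataset from Step 2, recreated so this snippet is self-contained
plant_data <- data.frame(
  height_cm = c(45, 50, 55, 60, 65, 70, 75, 80),
  sunlight_hours = c(5, 6, 6, 7, 8, 9, 10, 11),
  water_ml = c(300, 350, 400, 450, 500, 550, 600, 650),
  fertilizer_score = c(2, 3, 3, 4, 5, 5, 6, 7)
)

cor_matrix <- cor(plant_data)

# Keep only the correlations with height, dropping height's
# correlation with itself (which is always 1)
height_cor <- cor_matrix[, "height_cm"]
height_cor <- height_cor[names(height_cor) != "height_cm"]
print(round(sort(height_cor, decreasing = TRUE), 3))

# Optional: Spearman correlation works on ranks, so it is less
# sensitive to outliers and to non-linear but monotonic relationships
spearman_matrix <- cor(plant_data, method = "spearman")
print(round(spearman_matrix, 3))
```

In this sample data, water_ml rises in perfect lockstep with height_cm, so that pair has a correlation of 1. Real measurements will rarely be that tidy.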
🔍 Step 5: Visualize the Correlations
Let’s visualize the correlation between height_cm and other variables using scatter plots:
# Scatter plot with regression line
ggscatter(plant_data, x = "sunlight_hours", y = "height_cm",
          add = "reg.line", conf.int = TRUE,
          cor.coef = TRUE, cor.method = "pearson",
          xlab = "Sunlight (hours)", ylab = "Plant Height (cm)")
Repeat for the other variables (water_ml, fertilizer_score) by changing the x argument and updating xlab to match.
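If you would rather not copy and paste the call for each variable, one option is to loop over the column names. This is a sketch, assuming ggpubr is installed and using the same ggscatter() settings as the plot above; predictors and plots are illustrative names, and the raw column name is used as the x-axis label for simplicity.

```r
library(ggpubr)

# Sample dataset from Step 2, recreated so this snippet is self-contained
plant_data <- data.frame(
  height_cm = c(45, 50, 55, 60, 65, 70, 75, 80),
  sunlight_hours = c(5, 6, 6, 7, 8, 9, 10, 11),
  water_ml = c(300, 350, 400, 450, 500, 550, 600, 650),
  fertilizer_score = c(2, 3, 3, 4, 5, 5, 6, 7)
)

# Every column except plant height
predictors <- setdiff(names(plant_data), "height_cm")

# Build one scatter plot per predictor, each with a regression line,
# confidence band, and Pearson correlation coefficient
plots <- lapply(predictors, function(v) {
  ggscatter(plant_data, x = v, y = "height_cm",
            add = "reg.line", conf.int = TRUE,
            cor.coef = TRUE, cor.method = "pearson",
            xlab = v, ylab = "Plant Height (cm)")
})
names(plots) <- predictors

# Draw the plots one at a time
for (p in plots) print(p)
```

To see all three plots in a single figure, ggpubr's ggarrange(plotlist = plots) is one way to arrange them.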
✅ Step 6: Interpret the Results
- A positive correlation means that as one variable increases, the other tends to increase.
- A negative correlation means that as one variable increases, the other tends to decrease.
- A value close to 0 means no clear linear relationship.
Example interpretation:
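The correlation between plant height and sunlight in the sample data is about 0.99, indicating a very strong positive linear relationship: plants that received more sunlight tended to be taller. Keep in mind that correlation alone does not show that sunlight causes the extra growth.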
📌 Bonus: Significance Testing
You can test whether a correlation is statistically significant using cor.test():
cor.test(plant_data$height_cm, plant_data$sunlight_hours)
Among other output, this reports a p-value. If p < 0.05, the correlation is conventionally considered statistically significant at the 5% level.
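The printed output of cor.test() is convenient to read, but you may want the individual numbers for a report. The sketch below pulls them out of the returned object; it recreates the sample data so it runs on its own, and result, r_value, p_value, and ci are illustrative names.

```r
# Sample dataset from Step 2, recreated so this snippet is self-contained
plant_data <- data.frame(
  height_cm = c(45, 50, 55, 60, 65, 70, 75, 80),
  sunlight_hours = c(5, 6, 6, 7, 8, 9, 10, 11),
  water_ml = c(300, 350, 400, 450, 500, 550, 600, 650),
  fertilizer_score = c(2, 3, 3, 4, 5, 5, 6, 7)
)

result <- cor.test(plant_data$height_cm, plant_data$sunlight_hours)

# The test result is a list; pull out the pieces of interest
r_value <- unname(result$estimate)  # correlation coefficient
p_value <- result$p.value           # p-value of the test
ci      <- result$conf.int          # 95% confidence interval by default

cat("r =", round(r_value, 2), "\n")
cat("p-value =", signif(p_value, 3), "\n")
cat("95% CI =", round(ci[1], 2), "to", round(ci[2], 2), "\n")
```

With only eight observations the confidence interval is worth reporting alongside the p-value, since it shows how precisely the correlation is actually estimated.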
🌟 Conclusion
Correlation analysis is a quick and effective way to uncover patterns in your plant growth data. With just a few lines of R code, you can:
- Quantify relationships
- Visualize patterns
- Support your plant biology research with real stats!
So next time you're wondering whether watering more really makes your plants taller—R has your back!