All contents are licensed under CC BY-NC-ND 4.0.

Data preparations

We use the ddply function implemented in the plyr (Wickham 2011) add-on package (we will introduce ddply and the ‘split-apply-combine’ strategy in a bit more detail in Session 05).

library("plyr")
d_breaks_cut <- quantile(df$d, probs = seq(0, 1, by = 0.05))
df$d_cut <- cut(df$d, breaks = d_breaks_cut, include.lowest = TRUE)
dd <- ddply(df, c("d_cut"), summarise, 
            h_mean = mean(h),
            h_q25 = quantile(h, probs = 0.25),
            h_q75 = quantile(h, probs = 0.75))
dd$d_lb <- d_breaks_cut[-length(d_breaks_cut)]
dd$d_ub <- d_breaks_cut[-1]
dd$d_mean <- apply(dd[, c("d_lb", "d_ub")], MARGIN = 1, FUN = mean)
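To see what the split-apply-combine step produces without the original data or plyr, here is a minimal base R sketch. The simulated data frame df_sim, its height-diameter relationship, and the use of quartiles instead of 5% quantiles are assumptions made purely for illustration.

```r
## Simulated stand-in for the tree data (assumption: d = diameter, h = height)
set.seed(1)
df_sim <- data.frame(d = runif(200, min = 5, max = 40))
df_sim$h <- 1.3 + 20 * (1 - exp(-0.06 * df_sim$d)) + rnorm(200, sd = 1.5)

## Split: cut d into four classes at its quartiles
breaks_sim <- quantile(df_sim$d, probs = seq(0, 1, by = 0.25))
df_sim$d_cut <- cut(df_sim$d, breaks = breaks_sim, include.lowest = TRUE)

## Apply and combine: one summary row per diameter class
dd_sim <- data.frame(
  d_cut  = levels(df_sim$d_cut),
  h_mean = as.numeric(tapply(df_sim$h, df_sim$d_cut, mean)),
  h_q25  = as.numeric(tapply(df_sim$h, df_sim$d_cut, quantile, probs = 0.25)),
  h_q75  = as.numeric(tapply(df_sim$h, df_sim$d_cut, quantile, probs = 0.75))
)
dd_sim
```

Each row of dd_sim plays the same role as a row of dd above: one diameter class with the mean and the quartiles of the heights observed in it.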

1 Store graphics

File format:

  • pdf(): ‘portable document format’
  • jpeg(): ‘joint photographic experts group’
  • tiff(): ‘tagged image file format’
  • png(): ‘portable network graphics’

Options:

  • width: width (for pdf in inches)
  • height: height (for pdf in inches)
  • onefile: logical value (should several graphics be stored as separate pages in one file?)

Usage:

pdf(file='<file name>.pdf', height = 6, width = 9)
...
dev.off()
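A minimal, runnable sketch of this pattern with base graphics follows. The file name example_plots.pdf, the temporary directory, and the built-in cars data are assumptions chosen so that the example does not touch your working directory.

```r
## Write two plots as separate pages of one PDF file
out_file <- file.path(tempdir(), "example_plots.pdf")
pdf(file = out_file, width = 9, height = 6, onefile = TRUE)
plot(cars, main = "Page 1: scatter plot")
hist(cars$speed, main = "Page 2: histogram")
dev.off() ## close the device, otherwise the file stays incomplete

file.exists(out_file)
```

Everything plotted between pdf() and dev.off() goes to the file instead of the screen.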

2 ggplot (Wickham 2016) basics

Books:

From R Graphics Cookbook, section ‘Some terminology and theory’:

  • “The data is what we want to visualize. It consists of variables, which are stored as columns in a data frame.
  • Geoms are the geometric objects that are drawn to represent the data, such as bars, lines, and points.
  • Aesthetic attributes, or aesthetics, are visual properties of geoms, such as x and y position, line color, point shapes, etc.
  • There are mappings from data values to aesthetics.
  • Scales control the mapping from the values in the data space to values in the aesthetic space. A continuous y scale maps larger numerical values to vertically higher positions in space.
  • Guides show the viewer how to map the visual properties back to the data space. The most commonly used guides are the tick marks and labels on an axis.”

ggplot2 requirement: data must be organised in data frames!

The syntax of the ggplot2 package differs from that of base R, and in order to create a visualization of your data using ggplot2, you need to specify three key components:

  • data,
  • aesthetics, and
  • geometry (geoms).

From ?aes:

“Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms. Aesthetic mappings can be set in ggplot() and in individual layers.”

Visual properties of geoms that are constant are set outside of aes (illustrated later)!

We always begin by defining a plotting object using a ggplot(data = ...) call:

library("ggplot2")
ggplot(data = drought)

This indicates the (main) data set we will be using. Often, you will want to plot two variables, one on the x-axis and one on the y-axis (of course, there are situations where you might want to specify only one, or three, or even more variables). These are known as positional aesthetics and are added to the ggplot() call using aes(x = var1, y = var2), where aes() stands for aesthetics.

ggplot(data = drought, aes(x = elev, y = bair))

… no data is plotted yet, however the stage is prepared.

So we need a third key component, which is a geometry (geom for short):

ggplot(data = drought, aes(x = elev, y = bair)) + 
  geom_point()

… where we can also move the aesthetics into the geom layer:

ggplot(data = drought) + 
  geom_point(aes(x = elev, y = bair))

Note: The data is specified outside of aes(), while the variables that ggplot maps to aesthetics are added inside aes().
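The difference between mapping a variable inside aes() and setting a constant outside of it can be sketched as follows. The built-in mtcars data and the object names p_mapped and p_const are assumptions for illustration; they are not part of the course data.

```r
library("ggplot2")

## Mapped: color depends on a variable, so it goes inside aes()
p_mapped <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

## Constant: one fixed color and size for all points, set outside aes()
p_const <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "coral1", size = 3)

p_mapped
p_const
```

Only the mapped version gets a legend, because only there does the color carry information from the data.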

3 Most important geoms

3.1 One continuous variable

3.1.1 Histogram

ggplot(data = df) + geom_histogram(aes(x = d))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
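The message above asks for an explicit bin width. A minimal sketch, assuming the built-in faithful data and a bin width of 5 minutes (both chosen only for illustration):

```r
library("ggplot2")

## Set the bin width explicitly instead of relying on the default of 30 bins
p_hist <- ggplot(data = faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5)
p_hist
```

With binwidth given, the message is no longer printed. A sensible value depends on the scale of the variable.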

3.1.2 Boxplot

ggplot(data = df) + geom_boxplot(aes(x = d))

3.1.3 Kernel density estimation

ggplot(data = df) + geom_density(aes(x = d))

In combination with a histogram:

ggplot(data = df, aes(x = d)) + 
  geom_histogram(aes(y = after_stat(density))) + 
  geom_density()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Layer by layer (order of geoms matters):
ggplot(data = df, aes(x = d)) + 
  geom_density() +
  geom_histogram(aes(y = after_stat(density)))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

3.2 One categorical variable

ggplot(data = df, aes(y = d_cut)) + geom_bar()

3.3 Two continuous variables

Try the following:

ggplot(data = dd) + geom_point(aes(x = (d_lb + d_ub)/2, y = h_mean))

ggplot(data = dd) + geom_line(aes(x = (d_lb + d_ub)/2, y = h_mean))

ggplot(data = dd, aes(x = (d_lb + d_ub)/2, y = h_mean)) + 
  geom_line() + 
  geom_point()

ggplot(data = df) + geom_line(aes(x = d, y = h))

ggplot(data = df) + geom_path(aes(x = d, y = h))

ggplot(data = df) + geom_rug(aes(x = d, y = h))

ggplot(data = df, aes(x = d, y = h)) + 
  geom_rug() +
  geom_point() + 
  geom_line(data = dd, ## 2nd data-set with ...
            aes(x = (d_lb + d_ub)/2, y = h_mean), ## ... different aesthetics
            color = "aquamarine4", linewidth = 2) ## 

ggplot(data = df, aes(x = d, y = h)) + 
  geom_rug() +
  geom_bin_2d() +
  geom_point(alpha = .4, color = "coral1")

4 Colors with colorspace (Stauffer et al. 2009)

Rather than starting somewhere else, start with the colorspace package:

Remember from R Graphics Cookbook, section ‘Some terminology and theory’:

  • Scales control the mapping from the values in the data space to values in the aesthetic space. A continuous y scale maps larger numerical values to vertically higher positions in space.

library("colorspace")
ggplot(data = df, aes(x = d, y = h)) + 
  geom_rug() +
  geom_bin_2d() +
  scale_fill_continuous_divergingx(palette = "Earth")

ggplot(data = df, aes(x = d, y = h)) + 
  geom_rug() +
  geom_density_2d(aes(color=after_stat(level))) +
  scale_color_continuous_sequential(palette = "ag_GrnYl", rev = FALSE)
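To inspect which colors a named palette contains before using it in a scale, the palette functions of colorspace can be called directly. This is a small sketch; the choice of five colors and the object names cols_seq and cols_div are assumptions.

```r
library("colorspace")

## Five colors from the sequential palette used for the contour lines
cols_seq <- sequential_hcl(n = 5, palette = "ag_GrnYl")

## Five colors from the flexible diverging palette used for the 2d bins
cols_div <- divergingx_hcl(n = 5, palette = "Earth")

cols_seq
cols_div
```

Both calls return character vectors of hex color codes, which can also be passed to base R graphics.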

5 Titles

Titles comprise the axis labels as well as the main title, subtitle, caption, and tag. They are all set using the function labs:

ggplot(data = df, aes(x = d, y = h)) + 
  geom_rug() + 
  geom_point(aes(color = as.factor(plot))) + 
  labs(x = "Diameter at breast height [cm]", 
       y = "Tree height [m]", 
       title = "Heights and diameters of Scots pine trees in Ilomantsi, Finland", 
       subtitle = "Plot symbols show individual tree measurements at 56 plots", 
       caption = "Data from 'spati2' dataset shipping with 'lmfor'.", 
       tag = "A)") + 
  scale_color_discrete_qualitative(pal = "Dark 2") + 
  theme(legend.position = "none")

6 Facetting

Conditional on one or more categorical grouping variables, facet_wrap and facet_grid provide conditional subplots:

ggplot(data = df, aes(x = d, y = h)) + 
  geom_rug() + 
  geom_point(aes(color = as.factor(plot))) +
  facet_wrap(~ plot) + 
  labs(x = "Diameter at breast height [cm]", 
       y = "Tree height [m]", 
       title = "Heights and diameters of Scots pine trees in Ilomantsi, Finland", 
       subtitle = "Plot symbols show individual tree measurements at 56 plots", 
       caption = "Data from 'spati2' dataset shipping with 'lmfor'.", 
       tag = "A)") + 
  scale_color_discrete_qualitative(pal = "Dark 2") + 
  theme(legend.position = "none")

ggplot(data = subset(df, (plot < 6) & (d > 15)), 
       aes(x = d, y = h)) + 
  geom_rug() + 
  geom_point(aes(color = as.factor(plot))) +
  facet_grid(rows = vars(plot), cols = vars(d_cut)) + 
  labs(x = "Diameter at breast height [cm]", 
       y = "Tree height [m]", 
       title = "Heights and diameters of Scots pine trees in Ilomantsi, Finland", 
       subtitle = "Plot symbols show individual tree measurements at 56 plots", 
       caption = "Data from 'spati2' dataset shipping with 'lmfor'.", 
       tag = "A)") + 
  scale_color_discrete_qualitative(pal = "Dark 2") + 
  theme(legend.position = "none")
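The difference between the two faceting functions can be sketched on a small example. The built-in mtcars data and the grouping variables cyl and am are assumptions used only for illustration.

```r
library("ggplot2")

p_base <- ggplot(data = mtcars, aes(x = wt, y = mpg)) + geom_point()

## facet_wrap: one panel per level of a single variable, wrapped into rows
p_wrap <- p_base + facet_wrap(~ cyl)

## facet_grid: rows and columns are defined by two variables
p_grid <- p_base + facet_grid(rows = vars(cyl), cols = vars(am))

p_wrap
p_grid
```

Here facet_wrap gives three panels (one per number of cylinders), while facet_grid gives a 3 x 2 layout.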

7 Axes

Manipulate axis range in three different ways:

ggplot(data = df, aes(x = d, y = h)) + 
  geom_point() + 
  xlim(c(10, 20))
## Warning: Removed 1005 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggplot(data = df, aes(x = d, y = h)) + 
  geom_point() + 
  coord_cartesian(xlim = c(10, 20))

ggplot(data = df, aes(x = d, y = h)) + 
  geom_point() + 
  expand_limits(y = 0)
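The first two variants differ in more than the warning: limits set on the scale discard out-of-range values before any statistic is computed, whereas coord_cartesian only zooms into the finished plot. A minimal sketch with a made-up data frame toy (an assumption for illustration) makes this visible through a plotted group mean:

```r
library("ggplot2")

## Three small values and one large value
toy <- data.frame(g = "a", y = c(1, 2, 3, 100))
p_mean <- ggplot(data = toy, aes(x = g, y = y)) +
  stat_summary(fun = mean, geom = "point")

## Scale limits: y = 100 is dropped first, so the mean is that of 1, 2, 3
p_scale <- p_mean + ylim(c(0, 10))

## Coordinate limits: the mean uses all four values, only the view is clipped
p_coord <- p_mean + coord_cartesian(ylim = c(0, 10))

suppressWarnings(layer_data(p_scale)$y)
layer_data(p_coord)$y
```

For plain scatter plots the two look alike, but for boxplots, smoothers, and other summaries the choice changes the result.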

See ?scale_continuous for axis transformations:

## Log-log plot:
ggplot(data = df, aes(x = d, y = h)) + 
  geom_point() + 
  scale_x_log10() +
  scale_y_log10()

frost$bud_burst_doy <- as.numeric(strftime(as.Date(frost$bud_burst), format = "%j"))
frost$end_1st_dev_stage_doy <- as.numeric(strftime(as.Date(frost$end_1st_dev_stage), format = "%j"))
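foo <- function(x) {
  ## Convert day of year to a date in 2024 (day 1 = January 1st)
  x <- as.Date(x, origin = as.Date("2023-12-31"))
  ## Use English month names and restore the previous locale on exit
  old_locale <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old_locale))
  Sys.setlocale("LC_TIME", "en_US.UTF-8")
  lab <- months(x)
  return(paste0(lab, ", 1st"))
}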
## Apply function for axis labels:
p1 <- ggplot(data = frost, aes(x = bud_burst_doy, y = end_1st_dev_stage_doy)) + 
  geom_point(aes(color = year)) +
  scale_x_continuous(# name = "time",
     breaks = as.numeric(strftime(as.Date(paste0("2024-0", 4:7, "-01")), format = "%j")),
     minor_breaks = NULL,
     labels = foo, 
     limits = as.numeric(strftime(as.Date(paste0("2024-0", c(4, 7), "-01")), format = "%j"))) +
  scale_y_continuous(# name = "time",
     breaks = as.numeric(strftime(as.Date(paste0("2024-0", 4:7, "-01")), format = "%j")),
     minor_breaks = NULL,
     labels = foo, 
     limits = as.numeric(strftime(as.Date(paste0("2024-0", c(4, 7), "-01")), format = "%j"))) + 
  labs(y = "End of 1st development stage", x = "Bud burst") + 
  scale_color_continuous_sequential(pal = "Rocket", begin = .1)
p1
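The labels argument of a scale accepts any function that turns the break values into character strings. As a sketch of a variant that does not depend on the system locale, the month can be looked up in base R's month.name constant. The function name doy_to_month_label is hypothetical, and the code assumes that the breaks are days of the year in 2024 (a leap year).

```r
## Convert a day of year (in 2024) to a label such as "April, 1st"
doy_to_month_label <- function(x) {
  dates <- as.Date(x, origin = as.Date("2023-12-31")) ## day 1 = January 1st
  paste0(month.name[as.integer(format(dates, "%m"))], ", 1st")
}

## Days of the year of the first of April to July 2024
breaks_doy <- as.numeric(strftime(as.Date(paste0("2024-0", 4:7, "-01")), format = "%j"))
doy_to_month_label(breaks_doy)
```

Such a function could be passed as labels = doy_to_month_label in scale_x_continuous or scale_y_continuous.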

8 ggrepel (Slowikowski 2024)

library("ggrepel")
p2 <- ggplot(data = frost, aes(x = bud_burst_doy, y = end_1st_dev_stage_doy)) + 
  geom_point(aes(color = year)) +
  geom_label_repel(aes(label = year, color = year), min.segment.length = 0) +
  labs(y = "End of 1st development stage [day of year]", x = "Bud burst [day of year]") + 
  scale_color_continuous_sequential(pal = "Rocket", begin = .1)
p2
## Warning: ggrepel: 46 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
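As the warning suggests, labels that overlap too many other elements are dropped unless max.overlaps is raised. A minimal sketch, assuming the built-in mtcars data with the row names copied into a column model (both chosen only for illustration):

```r
library("ggplot2")
library("ggrepel")

cars_df <- data.frame(mtcars, model = rownames(mtcars))

## max.overlaps = Inf asks ggrepel to place every label
p_lab <- ggplot(data = cars_df, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_label_repel(aes(label = model), min.segment.length = 0,
                   max.overlaps = Inf)
p_lab
```

With many points this can lead to a crowded plot, so labeling only a subset of the data is often the better choice.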

9 cowplot (Wilke 2024)

library("cowplot")
guide_box_top <- get_plot_component(p1 + 
                                      labs(color = "Year:") +
                                      theme(legend.position = "top"), pattern = "guide-box-top")
plot_grid(guide_box_top, 
          plot_grid(p1 + theme(legend.position = "none"), p2 + theme(legend.position = "none"), ncol = 2), 
          nrow = 2, rel_heights = c(.1, .9))
## Warning: ggrepel: 63 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

References

Slowikowski, Kamil. 2024. ggrepel: Automatically Position Non-Overlapping Text Labels with ’ggplot2’. https://CRAN.R-project.org/package=ggrepel.
Stauffer, Reto, Georg J. Mayr, Markus Dabernig, and Achim Zeileis. 2009. “Somewhere over the Rainbow: How to Make Effective Use of Colors in Meteorological Visualizations.” Bulletin of the American Meteorological Society 96 (2): 203–16. https://doi.org/10.1175/BAMS-D-13-00155.1.
Wickham, Hadley. 2011. “The Split-Apply-Combine Strategy for Data Analysis.” Journal of Statistical Software 40 (1): 1–29. http://www.jstatsoft.org/v40/i01/.
———. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wilke, Claus O. 2024. cowplot: Streamlined Plot Theme and Plot Annotations for ’ggplot2’. https://CRAN.R-project.org/package=cowplot.