Inserting subplots

Posted on Updated on

Creating subplots in R

Let's discuss how to insert a subplot in an existing plot. Most of the following is taken from Hadley Wickham's book ggplot2: Elegant Graphics for Data Analysis.

Suppose we have two graphs, plotlm500 (LHS graph) and plotablm (RHS graph) as shown in the figure below:

Figure 1

Now let's assume we want to embed the right-hand side graph in the left-hand side graph. The way to proceed is to use viewports as described below.

## Load ggplot2 and grid (grid provides viewport() and unit())
library(ggplot2)
library(grid)

## Create plot + subplots using viewports
subvp <- viewport(width = 0.4, height = 0.4, x = 0.72, y = 0.3)

## Shrink plot margins of small graph
plotablmsmall <- plotablm + theme_gray(9) + ## labs(x = NULL, y = NULL)+
  theme(plot.margin = unit(rep(0, 4), "lines"), legend.position = "none")

## Plot bigger graph..
print(plotlm500)

## ..and insert subplot
print(plotablmsmall, vp = subvp)
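For anyone who wants to try the viewport trick without the two plots above, here is a minimal self-contained sketch. The plots big and small are hypothetical stand-ins built from the mtcars data set that ships with R; only the viewport settings are the same as in this post.

```r
library(ggplot2)
library(grid)

# Hypothetical stand-ins for the main plot and the inset (not the plots of Figure 1)
big <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
small <- ggplot(mtcars, aes(hp)) +
  geom_histogram(bins = 10) +
  theme_gray(9) +
  theme(plot.margin = unit(rep(0, 4), "lines"))

# x and y give the centre of the inset, width and height its size,
# all as fractions of the plotting page (the default "npc" units)
subvp <- viewport(width = 0.4, height = 0.4, x = 0.72, y = 0.3)

print(big)                # draw the main plot first
print(small, vp = subvp)  # then draw the inset inside the viewport
```

Moving the inset is just a matter of changing x and y; values close to 1 push it towards the top-right corner of the page.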



Multiple plots


Plotting multiple curves and datasets on the same graph

It is quite common to want to display several curves on the same graph, and/or several data sets, in order to investigate the parameter space being covered.

This is often done when looking at various galaxy cluster datasets and comparing, for instance, the scaling relations derived for each sample. Let's consider, as an example, 3 samples which I will call A, B and C. Let's assume further that the L-M scaling relation has been derived for each sample. The results for the normalisation Alm and slope blm are summarised in the table below.

Sample   Alm            blm
A        0.76 ± 0.08    1.34 ± 0.05
B        0.59 ± 0.11    1.64 ± 0.23
C        1.13 ± 0.35    2.42 ± 0.46


First, let's generate three data sets…

dat1 <- data.frame(

dat2 <- data.frame(

dat3 <- data.frame(

To display these three datasets easily with ggplot, we have to create a new data frame which holds all three data frames plus an extra column (which I will call sample) giving the name of the sample each row belongs to.

dat <- data.frame(

N.B. There is a simpler way to merge different data frames using dplyr but let's leave this for the moment. I might update the post later and/or explain how to do this in another post.
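As a rough illustration of the stacking step, here is a sketch using base R only. The data are simulated, and the column names logM and logL, the helper make_sample and the scatter of 0.2 are my own assumptions for the example, not the values used for the plots in this post.

```r
set.seed(1)

# Hypothetical helper: n points scattered around a straight line in log space
make_sample <- function(n, norm, slope, pivot) {
  logM <- runif(n, 13.5, 15.5)
  logL <- norm + slope * (logM - pivot) + rnorm(n, sd = 0.2)
  data.frame(logM = logM, logL = logL)
}

toyA <- make_sample(20, 44.76, 1.34, 15)
toyB <- make_sample(20, 44.77, 1.64, 15)
toyC <- make_sample(20, 44.75, 2.42, 14.7)

# Stack the three data frames and tag each row with the sample it came from
dat <- rbind(
  data.frame(toyA, sample = "A"),
  data.frame(toyB, sample = "B"),
  data.frame(toyC, sample = "C")
)
```

With the sample column in place, a single aesthetic such as colour = sample is enough for ggplot to separate the three sets.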

Fitting parameters

In the same way that we created a combined data frame to hold the data values, we are going to create a combined data frame containing the slope and normalisation values for each of our samples.

##plot limits
mmin <- 13.5
mmax <- 15.5
lmin <- 43.5
lmax <- 45.5

##create a function to hold errors in log space
err <- function(x,logAe,Be) {
  ##form is y = B*logx + logA
  sqrt( (Be*x)^2 + logAe^2 )
}

##these lines are created from real data so I correct here for 
##the pivot mass and luminosity
xlinem <- seq(mmin,mmax,0.1)
ylinem1 <- 44.76 + 1.34*(xlinem - 15)
ylinem2 <- 44.77 + 1.64*(xlinem - 15)
ylinem3 <- 44.75 + 2.42*(xlinem - 14.7)
em1 <- err(xlinem-15,0.08/0.76/log(10),0.05)
em2 <- err(xlinem-15,0.11/0.59/log(10),0.23)
em3 <- err(xlinem-14.7,0.35/1.13/log(10),0.46)

fitdat <- data.frame(
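One possible way to assemble such a combined data frame is sketched below. The column names x, y, ylow, yup and fit are the ones the plotting call in this post uses; the helper make_fit, the fit labels and the choice of a symmetric band y ± err are my assumptions. The error on log10(A) follows from standard error propagation: sigma_logA = sigma_A / (A ln 10).

```r
# Error on y = logA + B * x, adding the two contributions in quadrature
err <- function(x, logAe, Be) sqrt((Be * x)^2 + logAe^2)

xlinem <- seq(13.5, 15.5, 0.1)

# Hypothetical helper: best-fit line and error band for one sample
make_fit <- function(norm, slope, pivot, A, Ae, Be, label) {
  y <- norm + slope * (xlinem - pivot)
  e <- err(xlinem - pivot, Ae / A / log(10), Be)
  data.frame(x = xlinem, y = y, ylow = y - e, yup = y + e, fit = label)
}

fitdat <- rbind(
  make_fit(44.76, 1.34, 15.0, 0.76, 0.08, 0.05, "A"),
  make_fit(44.77, 1.64, 15.0, 0.59, 0.11, 0.23, "B"),
  make_fit(44.75, 2.42, 14.7, 1.13, 0.35, 0.46, "C")
)
```

The band is narrowest at the pivot mass, where only the error on the normalisation contributes, and widens away from it as the slope error takes over.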


Let's create the plot!!


p1 <- ggplot(data=fitdat, aes(x=x, y=y,group=fit))+
    geom_line(data=fitdat, aes(linetype=factor(fit)))+
    geom_ribbon(data=fitdat, aes(ymin=ylow, ymax=yup,fill=factor(fit)),
    xlab(expression(paste("log(",M,"/",h[70],M[sol],"])"))) +


Add a nice legend

##add nice colours
mycolours <- c("#aaaaf0","#f0aaf0","#f0f0aa","#aaf0aa")
mycolours2 <- c("#7f7fe8","#e87fe8","#e8e87f","#7fe87f")

p1+theme_bw(16) +
    theme(legend.justification=c(1,0), legend.position=c(1,0))+
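For reference, here is a small self-contained sketch of the lines + ribbons + legend combination. The data frame toy, the constant band half-width of 0.1, the axis labels and the legend title are all assumptions made for the example; only the colours and the theme size are the ones used above.

```r
library(ggplot2)

x <- seq(13.5, 15.5, 0.1)

# Toy fit lines with a constant-width band (assumption, for illustration only)
toy <- rbind(
  data.frame(x = x, y = 44.76 + 1.34 * (x - 15), fit = "A"),
  data.frame(x = x, y = 44.77 + 1.64 * (x - 15), fit = "B")
)
toy$ylow <- toy$y - 0.1
toy$yup <- toy$y + 0.1

p <- ggplot(toy, aes(x = x, y = y, group = fit)) +
  # draw the ribbons first so that the lines sit on top of them
  geom_ribbon(aes(ymin = ylow, ymax = yup, fill = factor(fit)), alpha = 0.4) +
  geom_line(aes(linetype = factor(fit))) +
  # giving both scales the same name merges them into a single legend
  scale_fill_manual(name = "Fit", values = c("#aaaaf0", "#f0aaf0")) +
  scale_linetype_discrete(name = "Fit") +
  xlab(expression(log(M / M[sol]))) +
  ylab(expression(log(L))) +
  theme_bw(16)

print(p)
```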



Note that in creating these plots, I have assumed that the L-M relations were not derived from the 3 datasets… which seems obvious considering the poor agreement between the data and the fits!

Marginal distribution plots


Marginal distribution plots

Often, it is useful to visualize how two variables are correlated with each other. For instance, I recently carried out a Bayesian analysis to get constraints on cluster scaling relations. Immediately, I wanted to look at the marginal samples from the posterior distribution for the normalisation and slope of my relation.

Apparently, there is no obvious way to do so in ggplot, but after much searching I finally managed to plot what I intended to…

To start off, let's generate some data:



mydata <- data.frame(
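If you just want something to play with, a toy data set with the same layout can be simulated as below. The means and scatters are invented for the example; what matters is the structure the rest of the post relies on: columns A, B and sample, with rows 1 to 100 belonging to the first run and rows 101 to 200 to the second.

```r
set.seed(42)
n <- 100  # points per run

# Simulated values (assumption): two Gaussian clouds in the (A, B) plane
mydata <- data.frame(
  A = c(rnorm(n, mean = 44.0, sd = 0.15), rnorm(n, mean = 44.3, sd = 0.15)),
  B = c(rnorm(n, mean = 1.3, sd = 0.10), rnorm(n, mean = 1.5, sd = 0.10)),
  sample = rep(c("run1", "run2"), each = n)
)
```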

Plot the data

First Method

There are two different options to display the data. One is based on the code given here, which I show below…

empty <- ggplot()+geom_point(aes(1,1), colour="white") +
     theme(
       plot.background = element_blank(), 
       panel.grid.major = element_blank(), 
       panel.grid.minor = element_blank(), 
       panel.border = element_blank(), 
       panel.background = element_blank(),
       axis.title.x = element_blank(),
       axis.title.y = element_blank(),
       axis.text.x = element_blank(),
       axis.text.y = element_blank(),
       axis.ticks = element_blank()
     )

#scatterplot of x and y variables
scatter <- ggplot(mydata,aes(A, B)) + 
  geom_point(aes(color=factor(sample))) + 
  scale_color_manual(name="Data",labels=c("run1","run2"),values = c("pink", "blue")) + 

#marginal density of x - plot on top
plot_top <- ggplot(mydata, aes(A, fill=factor(sample))) + 
  geom_density(alpha=.5) + 
  scale_fill_manual(values = c("pink", "blue")) + 
  theme(legend.position = "none")

#marginal density of y - plot on the right
plot_right <- ggplot(mydata, aes(B, fill=factor(sample))) + 
  geom_density(alpha=.5) + 
  coord_flip() + 
  scale_fill_manual(values = c("pink", "blue")) + 
  theme(legend.position = "none") 

#arrange the plots together, with appropriate height and width for each row and column
#(grid.arrange() comes from the gridExtra package)
library(gridExtra)
grid.arrange(plot_top, empty, scatter, plot_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))


Now that's one way to view the data. Another one involves plotting probability contours, say 1σ, 2σ or 3σ contours, on top of the scatter plot.

Second Method

To start with, let's create a contour function to generate the probability contours.

library(MASS)  ## provides kde2d()

panel.conf.lines <- function(x, y, ...) {
    ## extend the grid by half the data range on each side
    lims = c(min(x) - 0.5 * (max(x) - min(x)), max(x) + 0.5 * (max(x) - min(x)), 
        min(y) - 0.5 * (max(y) - min(y)), max(y) + 0.5 * (max(y) - min(y)))
    kd <- kde2d(x, y, n = 100, lims = lims)
    ## store density at each A,B pair
    pp <- array()
    for (i in 1:length(x)) {
        z.x <- max(which(kd$x < x[i]))
        z.y <- max(which(kd$y < y[i]))
        pp[i] <- kd$z[z.x, z.y]
        ## cat(i,' pp',pp[i],'\n')
    }
    ## find density level enclosing fraction of pairs
    c1 <- quantile(pp, 0.32, na.rm = TRUE)
    c2 <- quantile(pp, 0.05, na.rm = TRUE)
    c3 <- quantile(pp, 0.0025, na.rm = TRUE)
    cl1 <- contourLines(kd, levels = c1)
    cl2 <- contourLines(kd, levels = c2)
    cl3 <- contourLines(kd, levels = c3)
    cat("c1=", c1, " c2=", c2, " c3=", c3, "\n")
    cl <- mapply(function(x, piece) {
        rbind(data.frame(x, piece = piece), c(NA, NA, NA))
    }, c(cl3, cl2, cl1), seq_along(c(cl3, cl2, cl1)), SIMPLIFY = FALSE)
    ## stack the contour pieces into a single data frame and return it
    cl_df <- do.call("rbind", cl)
    cl_df
}
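The idea behind the contour levels can be checked on a toy example: evaluate the kernel density estimate at every data point, then take the 32% quantile of those densities as the contour level, so that about 68% of the points sit at a higher density, i.e. inside the 1σ contour. The sketch below uses simulated Gaussian points and findInterval() for the grid lookup; both are my choices for the example and not part of the function above.

```r
library(MASS)  # provides kde2d()

set.seed(1)
x <- rnorm(500)
y <- rnorm(500)

# 2D kernel density estimate on a 100 x 100 grid spanning the data range
kd <- kde2d(x, y, n = 100)

# Density at each point, read from the grid cell the point falls in
ix <- findInterval(x, kd$x)
iy <- findInterval(y, kd$y)
dens <- kd$z[cbind(ix, iy)]

# Level below which the 32% lowest-density points lie
level68 <- quantile(dens, 0.32)

# Fraction of points at or above that level (should be close to 0.68)
frac_inside <- mean(dens >= level68)

# The corresponding contour, as a list of pieces with elements level, x and y
cl68 <- contourLines(kd, levels = level68)
```

The same recipe with quantiles of 0.05 and 0.0025 gives the 2σ and 3σ levels used in this post.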

Now let's generate contours for the two subsets of our data, i.e. sample B (rows 1 to 100) and sample P (rows 101 to 200).

conf <- panel.conf.lines(mydata$A[1:100],mydata$B[1:100])
conf2 <- panel.conf.lines(mydata$A[101:200],mydata$B[101:200])

And finally let's create the plot…

plotablm <- ggplot(mydata, aes(x = A, y = B)) + geom_polygon(aes(x = x, y = y, 
    group = piece, fill = as.factor(level)), data = conf, alpha = I(1/2)) + 
    geom_path(aes(x = x, y = y, group = piece, linetype = as.factor(level)), 
        size = 0.5, data = conf2) + geom_point(aes(colour = factor(sample)), 
    size = 1.5) + coord_cartesian(xlim = c(43.25, 45), ylim = c(0.75, 2)) + 
    xlab(expression(A)) + ylab(expression(B)) + theme_bw(18) + theme(legend.justification = c(1, 
    1), legend.position = c(1, 1)) + scale_colour_manual(name = "Data", label = c("B", 
    "P"), values = c("#7f7fe8", "#e87fe8")) + scale_fill_manual(name = "Sample B", 
    label = c(expression(paste(1, "", sigma)), expression(paste(2, "", sigma)), 
        expression(paste(3, "", sigma))), values = c("#7f7fe8", "#e87fe8", "#e8e87f")) + 
    scale_linetype_discrete(name = "Sample P", label = c(expression(paste(1, 
        "", sigma)), expression(paste(2, "", sigma)), expression(paste(3, "", 
