# ggplot

### Inserting subplots

Posted on Updated on

# Creating subplots in R

Let's discuss how to insert a subplot in an existing plot. Most of the following is actually taken from Hadley Wickam's book ggplot2: Elegant Graphics for Data Analysis.

Suppose we have two graphs, plotlm500 (LHS graph) and plotablm (RHS graph) as shown in the figure below:

Now let's assume we want to embed the right-hand side graph in the left-hand side graph. The way to procede is to use viewports as described below.

library(ggplot2)
library(grid)
## Create plot + subplots using viewports
subvp <- viewport(width = 0.4, height = 0.4, x = 0.72, y = 0.3)

## Shrink plot margins of small graph
plotablmsmall <- plotablm + theme_gray(9) + ## labs(x = NULL, y = NULL)+
theme(plot.margin = unit(rep(0, 4), "lines"), legend.position = "none")

## Plot bigger graph..
plotlm500

## ..and insert subset
print(plotablmsmall, vp = subvp)


### Multiple plots

Posted on Updated on

# Plotting multiple curves and datasets on the same graph

It is quite common that we want to display different curves on the same graph, and / or display different sets of data in order to investigate the parameter space being covered.

This is often done when looking at various galaxy cluster datasets and comparing, for instance, the scaling relations derived for each sample. Let's consider, as an example, 3 samples which I will call A, B and C. Let's assume further that the L-M scaling relation has been derived for each sample. The results for the normalisation Alm and slope blm are summarised in the table below.

Sample Alm blm
A $0.76\pm 0.08$ $1.34\pm 0.05$
B $0.59\pm 0.11$ $1.64\pm 0.23$
C $1.13\pm 0.35$ $2.42\pm 0.46$

## Data

First, let's generate three data sets…

dat1 <- data.frame(
"M"=seq(13.5,14.6,0.1),
"L"=rnorm(12,43*(seq(13.5,14.6,0.1)/14)**1.33,0.4)
)

dat2 <- data.frame(
"M"=seq(14,15.5,length.out=25),
"L"=rnorm(25,44.5*(seq(14,15.5,length.out=25)/14.75)**1.4,0.32)
)

dat3 <- data.frame(
"M"=seq(14,15,length.out=8),
"L"=rnorm(8,43*(seq(14,15,length.out=8)/14.5)**1.33,0.4)
)


To display these three datasets easily with ggplot, we have to create a new data frame which holds all these data frames + an extra parameter (which I will call sample) which refers to the name of the data sample for each index value.

dat <- data.frame(
"M"=c(dat1$M,dat2$M,dat3$M), "L"=c(dat1$L,dat2$L,dat3$L),
"sample"=c(rep("dat1",length(dat1$M)),rep("dat2",length(dat2$M)),rep("dat3",length(dat3$M))) )  N.B. There is a simpler way to merge different data frames using dplyr but let's leave this for the moment. I might update the post later and/or explain how to do this in another post. ## Fitting parameters In the same way that we created a combined data frames to hold the data values, we are going to create a combined data frame containing the slope and normalisation values for each of our samples. ##plot limits mmin <- 13.5 mmax <- 15.5 lmin <- 43.5 lmax <- 45.5 ##create a function to hold errors in log space err <- function(x,logAe,Be) { ##form is y = B*logx + logA err <- sqrt( (Be*x)**2 + logAe**2 ) return(err) } ##these lines are created from real data so I correct here for ##the pivot mass and luminosity xlinem <- seq(mmin,mmax,0.1) ylinem1 <- 44.76 + 1.34*(xlinem - 15) ylinem2 <- 44.77 + 1.64*(xlinem - 15) ylinem3 <- 44.75 + 2.42*(xlinem - 14.7) em1 <- err(xlinem-15,0.08/0.76/log(10),0.05) em2 <- err(xlinem-15,0.11/0.59/log(10),0.23) em3 <- err(xlinem-14.7,0.35/1.13/log(10),0.46) fitdat <- data.frame( "x"=rep(xlinem,3), "y"=c(ylinem1,ylinem2,ylinem3), "yup"=c(ylinem1+em1,ylinem2+em2,ylinem3+em3), "ylow"=c(ylinem1-em1,ylinem2-em2,ylinem3-em3), "fit"=c(rep(1,length(xlinem)),rep(2,length(xlinem)),rep(3,length(xlinem))) )  ## Plot Let's create the plot!! library(ggplot2) p1 <- ggplot(data=fitdat, aes(x=x, y=y,group=fit))+ geom_line(data=fitdat, aes(linetype=factor(fit)))+ geom_ribbon(data=fitdat, aes(ymin=ylow, ymax=yup,fill=factor(fit)), alpha=0.55)+ geom_point(data=dat,aes(x=M,y=L,group=sample,colour=factor(sample), shape=factor(sample),size=factor(sample)))+ xlab(expression(paste("log(",M,"/",h[70],M[sol],"])"))) + ylab(expression(paste("log(",L,"/",h[70]^2,"/[",erg,~s^{-1},"])")))  ### Add a nice legend ##add nice colours mycolours <- c("#aaaaf0","#f0aaf0","#f0f0aa","#aaf0aa") mycolours2 <- c("#7f7fe8","#e87fe8","#e8e87f","#7fe87f") p1+theme_bw(16) + theme(legend.justification=c(1,0), legend.position=c(1,0))+ scale_colour_manual(name="Data",labels=c("Dat1","Dat2","Dat3"), values=c(mycolours2[1],mycolours2[2],mycolours2[4]))+ scale_shape_manual(name="Data",labels=c("Dat1","Dat2","Dat3"), values=c(16,1,8))+ scale_size_manual(name="Data",labels=c("Dat1","Dat2","Dat3"), values=c(2.8,2.8,2.8))+ scale_linetype_manual(name="Fit",breaks=c(1,2,3),label=c("A","B","C"), values=c("dashed","twodash","solid"))+ scale_fill_manual(name="Fit",breaks=c(1,2,3),label=c("A","B","C"), values=c(mycolours[1],mycolours[2],mycolours[4]))  < p>Note that in creating these plots, I have assumed that the L-M relations were not derived from the 3 datasets… which seems obvious considering the poor agreement between the data and the fits! ### Marginal distribution plots Posted on Updated on # Marginal distribution plots Often, it is useful to visualize graphically how two variables are correlated with each other. For instance, I recently carried out a Bayesian analysis to get constraints on cluster scaling relations. Immediately, I wanted look at the marginal samples from the posterior distribution for the normalisation and slope of my relation. Apparently, there is no obvious way to so in ggplot, but after much search I finally managed to plot what I intended to… To start off, let's generate some data: ## Data set.seed(888) mydata <- data.frame( "A"=c(rnorm(100,44,0.1),rnorm(100,43.8,0.12)), "B"=c(rnorm(100,1.33,0.15),rnorm(100,1.4,0.15)), "sample"=c(rep("s",100),rep("p",100)) )  ## Plot the data ### First Method There are two different options to display the data. One is based on the code given here, which I show below… library(ggplot2) library(gridExtra) empty <- ggplot()+geom_point(aes(1,1), colour="white") + theme( plot.background = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), panel.border = element_blank(), panel.background = element_blank(), axis.title.x = element_blank(), axis.title.y = element_blank(), axis.text.x = element_blank(), axis.text.y = element_blank(), axis.ticks = element_blank() ) #scatterplot of x and y variables scatter <- ggplot(mydata,aes(A, B)) + geom_point(aes(color=factor(sample))) + scale_color_manual(name="Data",labels=c("run1","run2"),values = c("pink", "blue")) + theme(legend.position=c(1,1),legend.justification=c(1,1)) #marginal density of x - plot on top plot_top <- ggplot(mydata, aes(A, fill=factor(sample))) + geom_density(alpha=.5) + scale_fill_manual(values = c("pink", "blue")) + theme(legend.position = "none") #marginal density of y - plot on the right plot_right <- ggplot(mydata, aes(B, fill=factor(sample))) + geom_density(alpha=.5) + coord_flip() + scale_fill_manual(values = c("pink", "blue")) + theme(legend.position = "none") #arrange the plots together, with appropriate height and width for each row and column grid.arrange(plot_top, empty, scatter, plot_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))  Now that's one way to view the data. Another one involves plotting probability contours, say $1\sigma$, $2\sigma$ or $3\sigma$ contours on top of the scatter plot. ### Second Method To start with, let's create a contour function to generate the probability contours. library(MASS) panel.conf.lines <- function(x, y, ...) { lims = c(min(x) - 0.5 * (max(x) - min(x)), max(x) + 0.5 * (max(x) - min(x)), min(y) - 0.5 * (max(y) - min(y)), max(y) + 0.5 * (max(y) - min(y))) kd <- kde2d(x, y, n = 100, lims = lims) ## store density at each A,B pair pp <- array() for (i in 1:length(x)) { z.x <- max(which(kd$x < x[i]))
z.y <- max(which(kd$y < y[i])) pp[i] <- kd$z[z.x, z.y]
## cat(i,' pp',pp[i],'\n')
}
## find density level enclosing fraction of pairs
c1 <- quantile(pp, 0.32, na.rm = TRUE)
c2 <- quantile(pp, 0.05, na.rm = TRUE)
c3 <- quantile(pp, 0.0025, na.rm = TRUE)
cl1 <- contourLines(kd, levels = c1)
cl2 <- contourLines(kd, levels = c2)
cl3 <- contourLines(kd, levels = c3)
cat("c1=", c1, " c2=", c2, " c3=", c3, "\n")
cl <- mapply(function(x, piece) {
rbind(data.frame(x, piece = piece), c(NA, NA, NA))
}, c(cl3, cl2, cl1), seq_along(c(cl3, cl2, cl1)), SIMPLIFY = FALSE)
cl_df <- do.call("rbind", cl)
return(cl_df)
}


Now let's generate contours for the two subsets of our data, i.e. sample S and sample P.

conf <- panel.conf.lines(mydata$A[1:100],mydata$B[1:100])
conf2 <- panel.conf.lines(mydata$A[101:200],mydata$B[101:200])


And finally let's create the plot…

plotablm <- ggplot(mydata, aes(x = A, y = B)) + geom_polygon(aes(x = x, y = y,
group = piece, fill = as.factor(level)), data = conf, alpha = I(1/2)) +
geom_path(aes(x = x, y = y, group = piece, linetype = as.factor(level)),
size = 0.5, data = conf2) + geom_point(aes(colour = factor(sample)),
size = 1.5) + coord_cartesian(xlim = c(43.25, 45), ylim = c(0.75, 2)) +
xlab(expression(A)) + ylab(expression(B)) + theme_bw(18) + theme(legend.justification = c(1,
1), legend.position = c(1, 1)) + scale_colour_manual(name = "Data", label = c("B",
"P"), values = c("#7f7fe8", "#e87fe8")) + scale_fill_manual(name = "Sample B",
label = c(expression(paste(1, "", sigma)), expression(paste(2, "", sigma)),
expression(paste(3, "", sigma))), values = c("#7f7fe8", "#e87fe8", "#e8e87f")) +
scale_linetype_discrete(name = "Sample P", label = c(expression(paste(1,
"", sigma)), expression(paste(2, "", sigma)), expression(paste(3, "",
sigma))))
plotablm