# Statistics

### Multiple plots

# Plotting multiple curves and datasets on the same graph

It is quite common that we want to display different curves on the same graph, and / or display different sets of data in order to investigate the parameter space being covered.

This is often done when looking at various galaxy cluster datasets and comparing, for instance, the scaling relations derived for each sample. Let's consider, as an example, **3** samples which I will call **A**, **B** and **C**. Let's assume further that the *L-M* scaling relation has been derived for each sample. The results for the normalisation *A _{lm}* and slope

*b*are summarised in the table below.

_{lm}Sample | A_{lm} |
b_{lm} |
---|---|---|

A | ||

B | ||

C |

## Data

First, let's generate three data sets…

```
dat1 <- data.frame(
"M"=seq(13.5,14.6,0.1),
"L"=rnorm(12,43*(seq(13.5,14.6,0.1)/14)**1.33,0.4)
)
dat2 <- data.frame(
"M"=seq(14,15.5,length.out=25),
"L"=rnorm(25,44.5*(seq(14,15.5,length.out=25)/14.75)**1.4,0.32)
)
dat3 <- data.frame(
"M"=seq(14,15,length.out=8),
"L"=rnorm(8,43*(seq(14,15,length.out=8)/14.5)**1.33,0.4)
)
```

To display these three datasets *easily* with **ggplot**, we have to create a new data frame which holds all these data frames + an extra parameter (which I will call *sample*) which refers to the name of the data sample for each index value.

```
dat <- data.frame(
"M"=c(dat1$M,dat2$M,dat3$M),
"L"=c(dat1$L,dat2$L,dat3$L),
"sample"=c(rep("dat1",length(dat1$M)),rep("dat2",length(dat2$M)),rep("dat3",length(dat3$M)))
)
```

*N.B.* There is a simpler way to merge different data frames using *dplyr* but let's leave this for the moment. I might update the post later and/or explain how to do this in another post.

## Fitting parameters

In the same way that we created a combined data frames to hold the data values, we are going to create a combined data frame containing the slope and normalisation values for each of our samples.

```
##plot limits
mmin <- 13.5
mmax <- 15.5
lmin <- 43.5
lmax <- 45.5
##create a function to hold errors in log space
err <- function(x,logAe,Be) {
##form is y = B*logx + logA
err <- sqrt( (Be*x)**2 + logAe**2 )
return(err)
}
##these lines are created from real data so I correct here for
##the pivot mass and luminosity
xlinem <- seq(mmin,mmax,0.1)
ylinem1 <- 44.76 + 1.34*(xlinem - 15)
ylinem2 <- 44.77 + 1.64*(xlinem - 15)
ylinem3 <- 44.75 + 2.42*(xlinem - 14.7)
em1 <- err(xlinem-15,0.08/0.76/log(10),0.05)
em2 <- err(xlinem-15,0.11/0.59/log(10),0.23)
em3 <- err(xlinem-14.7,0.35/1.13/log(10),0.46)
fitdat <- data.frame(
"x"=rep(xlinem,3),
"y"=c(ylinem1,ylinem2,ylinem3),
"yup"=c(ylinem1+em1,ylinem2+em2,ylinem3+em3),
"ylow"=c(ylinem1-em1,ylinem2-em2,ylinem3-em3),
"fit"=c(rep(1,length(xlinem)),rep(2,length(xlinem)),rep(3,length(xlinem)))
)
```

## Plot

Let's create the plot!!

```
library(ggplot2)
p1 <- ggplot(data=fitdat, aes(x=x, y=y,group=fit))+
geom_line(data=fitdat, aes(linetype=factor(fit)))+
geom_ribbon(data=fitdat, aes(ymin=ylow, ymax=yup,fill=factor(fit)),
alpha=0.55)+
geom_point(data=dat,aes(x=M,y=L,group=sample,colour=factor(sample),
shape=factor(sample),size=factor(sample)))+
xlab(expression(paste("log(",M,"/",h[70],M[sol],"])"))) +
ylab(expression(paste("log(",L,"/",h[70]^2,"/[",erg,~s^{-1},"])")))
```

### Add a nice legend

```
##add nice colours
mycolours <- c("#aaaaf0","#f0aaf0","#f0f0aa","#aaf0aa")
mycolours2 <- c("#7f7fe8","#e87fe8","#e8e87f","#7fe87f")
p1+theme_bw(16) +
theme(legend.justification=c(1,0), legend.position=c(1,0))+
scale_colour_manual(name="Data",labels=c("Dat1","Dat2","Dat3"),
values=c(mycolours2[1],mycolours2[2],mycolours2[4]))+
scale_shape_manual(name="Data",labels=c("Dat1","Dat2","Dat3"),
values=c(16,1,8))+
scale_size_manual(name="Data",labels=c("Dat1","Dat2","Dat3"),
values=c(2.8,2.8,2.8))+
scale_linetype_manual(name="Fit",breaks=c(1,2,3),label=c("A","B","C"),
values=c("dashed","twodash","solid"))+
scale_fill_manual(name="Fit",breaks=c(1,2,3),label=c("A","B","C"),
values=c(mycolours[1],mycolours[2],mycolours[4]))
```

<

p>Note that in creating these plots, I have assumed that the *L-M* relations were not derived from the 3 datasets… which seems obvious considering the poor agreement between the data and the fits!

### Marginal distribution plots

# Marginal distribution plots

Often, it is useful to visualize graphically how two variables are correlated with each other. For instance, I recently carried out a Bayesian analysis to get constraints on cluster scaling relations. Immediately, I wanted look at the marginal samples from the posterior distribution for the normalisation and slope of my relation.

Apparently, there is no obvious way to so in *ggplot*, but after much search I finally managed to plot what I intended to…

To start off, let's generate some data:

## Data

```
set.seed(888)
mydata <- data.frame(
"A"=c(rnorm(100,44,0.1),rnorm(100,43.8,0.12)),
"B"=c(rnorm(100,1.33,0.15),rnorm(100,1.4,0.15)),
"sample"=c(rep("s",100),rep("p",100))
)
```

## Plot the data

### First Method

There are two different options to display the data. One is based on the code given here, which I show below…

```
library(ggplot2)
library(gridExtra)
empty <- ggplot()+geom_point(aes(1,1), colour="white") +
theme(
plot.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank()
)
#scatterplot of x and y variables
scatter <- ggplot(mydata,aes(A, B)) +
geom_point(aes(color=factor(sample))) +
scale_color_manual(name="Data",labels=c("run1","run2"),values = c("pink", "blue")) +
theme(legend.position=c(1,1),legend.justification=c(1,1))
#marginal density of x - plot on top
plot_top <- ggplot(mydata, aes(A, fill=factor(sample))) +
geom_density(alpha=.5) +
scale_fill_manual(values = c("pink", "blue")) +
theme(legend.position = "none")
#marginal density of y - plot on the right
plot_right <- ggplot(mydata, aes(B, fill=factor(sample))) +
geom_density(alpha=.5) +
coord_flip() +
scale_fill_manual(values = c("pink", "blue")) +
theme(legend.position = "none")
#arrange the plots together, with appropriate height and width for each row and column
grid.arrange(plot_top, empty, scatter, plot_right, ncol=2, nrow=2, widths=c(4, 1), heights=c(1, 4))
```

Now that's one way to view the data. Another one involves plotting probability contours, say , or contours on top of the scatter plot.

### Second Method

To start with, let's create a contour function to generate the probability contours.

```
library(MASS)
panel.conf.lines <- function(x, y, ...) {
lims = c(min(x) - 0.5 * (max(x) - min(x)), max(x) + 0.5 * (max(x) - min(x)),
min(y) - 0.5 * (max(y) - min(y)), max(y) + 0.5 * (max(y) - min(y)))
kd <- kde2d(x, y, n = 100, lims = lims)
## store density at each A,B pair
pp <- array()
for (i in 1:length(x)) {
z.x <- max(which(kd$x < x[i]))
z.y <- max(which(kd$y < y[i]))
pp[i] <- kd$z[z.x, z.y]
## cat(i,' pp',pp[i],'\n')
}
## find density level enclosing fraction of pairs
c1 <- quantile(pp, 0.32, na.rm = TRUE)
c2 <- quantile(pp, 0.05, na.rm = TRUE)
c3 <- quantile(pp, 0.0025, na.rm = TRUE)
cl1 <- contourLines(kd, levels = c1)
cl2 <- contourLines(kd, levels = c2)
cl3 <- contourLines(kd, levels = c3)
cat("c1=", c1, " c2=", c2, " c3=", c3, "\n")
cl <- mapply(function(x, piece) {
rbind(data.frame(x, piece = piece), c(NA, NA, NA))
}, c(cl3, cl2, cl1), seq_along(c(cl3, cl2, cl1)), SIMPLIFY = FALSE)
cl_df <- do.call("rbind", cl)
return(cl_df)
}
```

Now let's generate contours for the two subsets of our data, *i.e.* sample **S** and sample **P**.

```
conf <- panel.conf.lines(mydata$A[1:100],mydata$B[1:100])
conf2 <- panel.conf.lines(mydata$A[101:200],mydata$B[101:200])
```

And finally let's create the plot…

```
plotablm <- ggplot(mydata, aes(x = A, y = B)) + geom_polygon(aes(x = x, y = y,
group = piece, fill = as.factor(level)), data = conf, alpha = I(1/2)) +
geom_path(aes(x = x, y = y, group = piece, linetype = as.factor(level)),
size = 0.5, data = conf2) + geom_point(aes(colour = factor(sample)),
size = 1.5) + coord_cartesian(xlim = c(43.25, 45), ylim = c(0.75, 2)) +
xlab(expression(A)) + ylab(expression(B)) + theme_bw(18) + theme(legend.justification = c(1,
1), legend.position = c(1, 1)) + scale_colour_manual(name = "Data", label = c("B",
"P"), values = c("#7f7fe8", "#e87fe8")) + scale_fill_manual(name = "Sample B",
label = c(expression(paste(1, "", sigma)), expression(paste(2, "", sigma)),
expression(paste(3, "", sigma))), values = c("#7f7fe8", "#e87fe8", "#e8e87f")) +
scale_linetype_discrete(name = "Sample P", label = c(expression(paste(1,
"", sigma)), expression(paste(2, "", sigma)), expression(paste(3, "",
sigma))))
plotablm
```