Category Archives: R

More R and ggplot2

You would think to annotate a graph by stringing on more geom calls, like so:

p <- ggplot(ve, aes(x=V1)) + stat_ecdf() + geom_vline(xintercept=10)

However, this applies geom_vline to all data in the frame. Instead, keep the plot object returned by ggplot and add the annotation layer to it:

p <- ggplot(ve, aes(x=V1)) + stat_ecdf()
p + geom_vline(xintercept=10)

You can also use geom_hline for horizontal lines and geom_abline to specify slope and intercept. And since this is R, you can compute results from your data and use that to place lines or text objects.
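As a rough sketch of placing a line from a computed value: the ve data frame below is simulated as a stand-in (the real data is not shown here), and the median is just one example of something you might compute.

```r
library(ggplot2)

# Hypothetical stand-in for the ve data frame used above
set.seed(1)
ve <- data.frame(V1 = rexp(200, rate = 0.1))

# Compute a value from the data, then use it to place a line
med <- median(ve$V1)

p <- ggplot(ve, aes(x = V1)) + stat_ecdf()
p <- p +
  geom_vline(xintercept = med, linetype = "dashed") +  # vertical line at the median
  geom_hline(yintercept = 0.5, linetype = "dotted")    # horizontal line at 50%
print(p)
```

The two lines should cross on the ECDF curve, since half of the samples lie at or below the median.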

The intention is to use annotate(), but ggplot2 originally didn’t have annotate, and geom_vline etc. were modified to handle annotations properly.

annotate("text") adds text annotations.

annotate("segment") adds line segments.

annotate("rect") adds shaded rectangles.
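Here is a small sketch that uses all three on one plot. The data and the coordinates are made up for illustration; only the annotate() calls themselves matter.

```r
library(ggplot2)

# Hypothetical data standing in for ve
set.seed(1)
ve <- data.frame(V1 = rexp(200, rate = 0.1))

p <- ggplot(ve, aes(x = V1)) + stat_ecdf()
p <- p +
  # shaded band between x = 5 and x = 15, spanning the full height
  annotate("rect", xmin = 5, xmax = 15, ymin = -Inf, ymax = Inf, alpha = 0.2) +
  # line segment from (25, 0.4) to (10, 0.6)
  annotate("segment", x = 25, xend = 10, y = 0.4, yend = 0.6) +
  # text label just below the start of the segment
  annotate("text", x = 25, y = 0.37, label = "x = 10")
print(p)
```

Each annotate() call adds exactly one object at the coordinates you give it, independent of the rows in the data frame.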

coord_cartesian(xlim=c(..), ylim=c(..)) zooms the viewport in or out to show the specified range. This is generally what you want instead of setting limits with scale_y_continuous; with the latter, data outside the range is dropped before any statistics are computed, which can change what is drawn.
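The difference is easy to see on an ECDF. This sketch uses simulated data and x limits rather than y limits, but the same reasoning applies to either axis.

```r
library(ggplot2)

# Hypothetical data; many samples fall above x = 10
set.seed(1)
ve <- data.frame(V1 = rexp(200, rate = 0.1))

base <- ggplot(ve, aes(x = V1)) + stat_ecdf()

# Zoom only: the ECDF is still computed from all 200 samples
p_zoom <- base + coord_cartesian(xlim = c(0, 10))

# Scale limits: samples above 10 are dropped first, so the ECDF is
# recomputed from what is left and climbs to 1 at the right edge
p_limits <- base + scale_x_continuous(limits = c(0, 10))

print(p_zoom)
print(p_limits)   # expect a warning about removed rows
```

The second plot looks plausible but answers a different question, which is why coord_cartesian is usually the safer choice for zooming.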

R and ggplot2

The relatively new graphing package ggplot2 looks better and is more fully featured than R's built-in plot. Here’s a translation of some typical graphs from plot to ggplot2. ggplot2 takes its direction from Leland Wilkinson’s grammar of graphics: there is data (what we want to visualize), geometry (the geometric objects used to represent data), and aesthetic attributes, which are visual properties of geoms such as position, color, and shape.

FYI, you’ll need to install it (which you can do inside R with install.packages("ggplot2")), and then in each R session you need to load it.

library(ggplot2)

Line graphs

In ggplot2

ggplot(two50, aes(x=Days, y=Likelihood)) + geom_line()
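For comparison, the base-graphics version might look like the sketch below. The two50 data frame here is invented for illustration; only the column names Days and Likelihood come from the call above.

```r
# Hypothetical stand-in for two50
two50 <- data.frame(Days = 1:250)
two50$Likelihood <- 1 - exp(-two50$Days / 60)

# type = "l" draws a connected line instead of points
plot(two50$Days, two50$Likelihood, type = "l",
     xlab = "Days", ylab = "Likelihood")
```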

Cumulative probability distribution graph

In plot

n <- length(data$var)
plot(sort(data$var), (1:n)/n, type="s")

In ggplot2, the literal translation would be

n <- length(data$var)
qplot(sort(data$var), (1:n)/n, stat="ecdf", geom="step")

but the idiomatic version is

ggplot(data, aes(x=var)) + stat_ecdf()

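The stat_ecdf function here is from the ggplot2 package, and is the same function used in the qplot line above. It computes the empirical CDF itself, so there is no need to sort the data or compute (1:n)/n by hand. Note that recent ggplot2 releases deprecate qplot and its stat argument, so prefer the ggplot() form.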
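To convince yourself that the hand-rolled heights and an empirical CDF agree, here is a quick check using base R's ecdf() as a stand-in for what stat_ecdf computes. The data is simulated and assumed to have no repeated values; with ties the two differ at the tied points.

```r
set.seed(42)
x <- rnorm(50)          # hypothetical data; values are distinct
n <- length(x)

manual <- (1:n) / n     # step heights used in the plot() version
Fn <- ecdf(x)           # base R's empirical CDF, returned as a function
builtin <- Fn(sort(x))  # evaluate it at the sorted data points

all.equal(manual, builtin)
```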

Beginner R

I have a data table on disk. I want to load it into R. R accepts Unix-style paths even on Windows, so use forward slashes no matter what platform you are on.

tz <- read.table("E:/time-hash/results.txt", header=FALSE)
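If you want to see what read.table gives you back, here is a self-contained sketch that writes a tiny made-up file to a temporary location and reads it in. The values are invented; the point is the column naming.

```r
# Write a small hypothetical table to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("1.21", "1.18", "1.25"), path)

tz <- read.table(path, header = FALSE)

names(tz)   # with header=FALSE, columns are named V1, V2, ...
str(tz)     # structure of the data frame
head(tz)    # first few rows
```

That automatic V1 name is why the examples here refer to tz$V1.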

Now I want to plot it with a basic scatter plot.

plot(tz$V1)

Now I want to plot a range of it – the first 100 samples, and limit the Y axis to the range 1.1 to 1.3.

plot(tz$V1, xlim=c(0,100), ylim=c(1.1,1.3))
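Another way to get the same picture is to subset the vector before plotting instead of limiting the X axis. A sketch with simulated data standing in for tz:

```r
# Hypothetical data standing in for tz
set.seed(7)
tz <- data.frame(V1 = runif(500, min = 1.0, max = 1.4))

first100 <- tz$V1[1:100]            # keep only the first 100 samples
plot(first100, ylim = c(1.1, 1.3))  # points outside ylim are simply not drawn
```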

R recipes

Cumulative probability distribution graph

Assume data is in the first variable of the data frame tz

n <- length(tz$V1)
plot(sort(tz$V1), (1:n)/n, type="s")

This will plot a no-frills cumulative probability distribution graph.

Histogram

Assume data is in the first variable of the data frame tz

hist(tz$V1, nclass=60)

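This will plot a histogram with roughly 60 buckets; R treats the count as a suggestion and rounds to "pretty" breakpoints. If you leave nclass off, then R will compute what it thinks a reasonable bucket count and bucket width are. Note that you can also supply a vector of bucket boundaries through the breaks argument, which lets you set the width of each bucket.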
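A short sketch of both forms, using simulated data in place of tz. The choice of 30 equal-width buckets is arbitrary.

```r
# Hypothetical data standing in for tz
set.seed(3)
tz <- data.frame(V1 = rnorm(1000, mean = 1.2, sd = 0.05))

# breaks as a single number: a suggested bucket count
h1 <- hist(tz$V1, breaks = 60)

# breaks as a vector: explicit bucket boundaries, which must span the data
edges <- seq(min(tz$V1), max(tz$V1), length.out = 31)
h2 <- hist(tz$V1, breaks = edges)

length(h1$breaks) - 1   # number of buckets R actually used for h1
length(h2$breaks) - 1   # number of buckets for h2
```

hist returns its breakpoints and counts, so you can check what R decided on rather than guessing from the picture.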