Today Scott Chamberlain tweeted asking for a better/faster solution to building an m x n matrix with randomly assigned 0/1. He already had a working version:

r <- 1000 c <- 1000 m0 <- matrix(0, r, c) apply(m0, c(1,2), function(x) sample(c(0,1),1))

Now, I’m the first to acknowledge that I’ve never got the ‘apply’ family of functions and that—thanks Hadley—if I need to do something like that I go for the plyr package. Nevertheless, if you think about it, the starting point is a sugary version of a loop; I like loops because they are explicit and clear, but they can be very slow in R. The loop could be written as:

m00 <- matrix(0, r, c) for(i in 1:r) { for(j in 1:c) { m00[i, j] <- sample(c(0,1),1) } }

In contrast, my first idea was to generate a bunch of uniformly distributed [0, 1) random numbers and round them to the closest integer, which is a more ‘matricy’ way of thinking:

m1 <- round(matrix(runif(r*c), r, c)))

and it happens to be a lot faster. My second idea, and Edmund Hart beat me to that one, was to use something like:

m2 <- matrix(rbinom(r*c,1,0.5),r,c)

which is a nice option, generating r*c random numbers following a binomial distribution, which also has the advantage of allowing for different probabilities, rather than 0.5 as in m1. This is also faster than m1, although probably one would not notice the difference until working with fairly big matrices. When Scott pointed out the speed bump he got me thinking about order of the operations; would this be better?

m3 <- matrix(round(runif(r*c)), r, c)

In terms of speed, m3 fits between m1 and m2, so the order *can* make a difference.

David Smith and Rafael Maia came up with a different approach, using sample() which I had not considered at all. m4 has the advantage that one could use any number of values to randomize.

m4 <- matrix(sample(0:1,r*c, replace=TRUE),r,c)

Any of these options can be timed using the system.time() function as in `system.time(m3 = matrix(round(runif(r*c)), r, c))`

. It’s interesting how different people come up with alternative strategies (reflecting different ways of thinking about the problem) using exactly the same language. I’m not sure what should be the ‘obvious’ solution; but for almost any purpose any of them (with the exception of the explicit loop) will work just fine.

**P.S. 2012-08-29, 22:33.** Another way that would work (although 4 times slower than m1) would be `m5 = matrix(ifelse(runif(r*c) < 0.5, 0, 1))`

. Still, it takes less than 0.5 seconds for a 1,000,000 elements matrix.**P.S. 2012-09-04.** Scott Chamberlain wrote his own blog post discussing the different suggestions. Even cooler, Dirk Eddelbuettel implemented even faster creation of random 1/0 matrices using Rcpp, with inline C++ code. Much faster in a crazy overkill sort of way.

Interesting post.

m4 would be the usual approach I would use, but if efficiency is an absolute must this code is marginally quicker:

system.time((function(row, col){

temp <- sample(0:1, size = row*col, replace = TRUE)

dim(temp) <- c(row, col)

return(temp)

})(10000, 10000)

)

For the 10000^2 case above, creating a {0, 1} vector and then making it two-dimensional (converting it to a matrix) is around 15% faster than the m4 function.

Thanks for the code. I’m still thinking if there is any other very different implementation.

binmatfun = function(mn)

{

mat=sample(0:1, mn^2, replace=T)

dim(mat)=c(mn ,mn)

}

Yes, converting to a matrix takes some time as your function and the previous commenter’s try to reduce. One option would be to work only with vectors and totally sidestep this time; the feasibility of this approach will depend on what Scott was thinking of doing later. Thanks for the idea.

If you change the second example with the double for loop to this

m00 <- matrix(0, r, c)

for(j in 1:c){

m00[,j] <- sample(c(0,1),replace=TRUE,r)

}

where each column is filled in one go, you'll see that this is also quite quick.

So your conclusion about the loop only seems to hold if you fill the matrix a single element at a time.

But then you are moving into matrix operations, so then why not do m1, m2, m3, m4 or m5? At the extreme I could have something like:

for(i in 1){

m2 <- matrix(rbinom(r*c,1,0.5),r,c) } It is a single pass loop but, in my view, if you are going to go for a matrix approach just do it the whole way. It is faster, more compact and, many times, very readable.

Thank you! This helped me tremendously. I wanted to make random 0/1 matrices with a fixed number of 1s. This is what I ended up doing.

# Specify the number of 1s needed in the matrix.

y<-100

# Make a vector with the appropriate number of 0s and 1s.

x<-c(rep(1, y), rep(0, r*c-y))

m6 <- matrix(sample(x,r*c, replace=FALSE),r,c)

sum(m6)==y #Check that it worked- this should be true.

Is there a better way?

Hi Carrie, I think you nailed it and works very fast.

It turns out that I actually needed matrices that had at least one 1 in each row and column. Here’s what I came up with and put in all in a loop. So far it seems to work.

# Stand-alone example

y<-20 #Number of 1s needed in the matrix

r<-10 #Rows

c<-10 #Columns

# Make a matrix and fill it with NAs

mrand<-matrix(NA, r, c)

randcolc

for(j in 1:r){

mrand[j,randcol[j]]<-1

}

coltots<-.colSums(mrand, r, c, na.rm=TRUE) #sum each column, ignoring NAs. Save as a vector.

fix<-grep(0, coltots) #Find which columns have 0 sums and save as a vector.

randrow<-sample(1:r, length(fix), replace=TRUE) #Randomly selects as many row numbers as there are columns with sum 0.

for(k in 1:length(fix)){ #loop through the columns that need a 1.

mrand[randrow[k], fix[k]]<-1

}

msum<-sum(mrand, na.rm=TRUE)

vec<-c(rep(1, y-msum), rep(0, r*c-y)) #Make a vector with the remaining number of 1s to be filled in and the rest 0s

fill<-which(is.na(mrand)==TRUE) #Make a vector of the index numbers of each element of the matrix that needs to be fill (i.e. is currently NA)

fillvalues<-sample(vec, length(fill), replace=FALSE) #randomly sample values from vec without replacement.

for(l in 1:length(fill)){

mrand[fill[l]]<-fillvalues[l] #Put the randomly sampled 1s and 0s in the places in the matrix that were NA.

}

#mrand

sum(mrand)==y #Does the matrix have the right number of 1s?

Any suggestions for improvement?