Evolving notes, images and sounds by Luis Apiolaza

Category: r (Page 18 of 20)

Simulating data following a given covariance structure

Every year there are at least a couple of occasions when I have to simulate multivariate data that follow a given covariance matrix. For example, let’s say that we want to create an example of the effect of collinearity when fitting multiple linear regressions. We then need one variable (the response) that is correlated with a number of explanatory variables, where the explanatory variables also have different correlations with each other.

There is a matrix operation called Cholesky decomposition, sort of equivalent to taking a square root with scalars, that is useful to produce correlated data. If we have a covariance matrix M, the Cholesky decomposition is a lower triangular matrix L, such that M = L L'. How does this connect to our simulated data? Let’s assume that we generate a vector z of independent, normally distributed random numbers with mean zero and variance one (with length equal to the dimension of M). We can then create a realization of our multivariate distribution using the product L z, because the covariance of L z is L I L' = L L' = M.
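A minimal sketch of that idea follows. The covariance matrix, the seed and the variable names (y, x1, x2) are made up for illustration. Note that R's chol() returns the upper triangular factor, so it is transposed here to get the lower triangular L described above.

```r
set.seed(42)  # arbitrary seed, only for reproducibility

# Hypothetical 3 x 3 covariance matrix: a response and two predictors
M <- matrix(c(1.0, 0.7, 0.5,
              0.7, 1.0, 0.9,
              0.5, 0.9, 1.0), nrow = 3, byrow = TRUE)

# chol() gives the upper triangular R with M = R'R; transpose to get L
L <- t(chol(M))

n <- 10000                           # number of simulated observations
Z <- matrix(rnorm(3 * n), nrow = 3)  # each column is one vector z
X <- t(L %*% Z)                      # each row is one realization L z
colnames(X) <- c("y", "x1", "x2")

# The sample covariance should be close to M
round(cov(X), 2)
```

With a large n the sample covariance of X approaches M; with small samples it can differ noticeably.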

Setting plots side by side

This is simple example code to display side-by-side lattice plots or ggplot2 plots, using the mtcars dataset that comes with any R installation. We will display a scatterplot of miles per US gallon (mpg) against car weight (wt) next to another scatterplot of the same data, this time colored by number of engine cylinders (cyl, treated as a factor) and with a smooth line added (through the type option).
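As a rough sketch of the lattice version, one option is the split argument of the print method for trellis objects. This is not necessarily the approach used in the full post. The axis labels and the span value are my own choices; span = 1 is only there to keep the loess smoother stable with so few cars per cylinder group.

```r
library(lattice)  # ships with standard R installations

# Plain scatterplot of mpg against weight
p1 <- xyplot(mpg ~ wt, data = mtcars,
             xlab = "Weight (1000 lb)", ylab = "Miles per US gallon")

# Same data, grouped by cylinders, with points plus a smooth line
p2 <- xyplot(mpg ~ wt, groups = factor(cyl), data = mtcars,
             type = c("p", "smooth"), span = 1,
             auto.key = list(columns = 3),
             xlab = "Weight (1000 lb)", ylab = "Miles per US gallon")

# split = c(column, row, number of columns, number of rows)
print(p1, split = c(1, 1, 2, 1), more = TRUE)
print(p2, split = c(2, 1, 2, 1))
```

ggplot2 objects can be placed in a similar way by printing each one into a grid viewport through the vp argument of their print method.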

Reading HTML pages in R for text processing

I was talking with one of my colleagues about doing some text analysis (which, by the way, I have never done before), for which the first issue is getting text into R. Not just any text, but files that can be accessed over the internet. In summary, we need to access an HTML file, parse it so we can reach specific content and then remove the HTML tags. Finally, we may want to replace some text (the ends of lines, \n, for example) before continuing to process the files.

The package XML has the necessary functionality to deal with HTML, while the rest is done using a few standard R functions.
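A small sketch of those steps, assuming the XML package is installed. To keep it self-contained it parses a made-up HTML string instead of a real page; for a file or URL you would pass its location to htmlParse() without asText = TRUE. The XPath expression "//p" is just one possible way to pick content.

```r
library(XML)

# Hypothetical page content, standing in for a downloaded HTML file
html <- "<html><body><h1>Title</h1><p>First line\nsecond line.</p><p>Another <b>paragraph</b>.</p></body></html>"

# Parse the HTML into an internal document tree
doc <- htmlParse(html, asText = TRUE)

# Extract the text of every paragraph; xmlValue() drops the tags
paras <- xpathSApply(doc, "//p", xmlValue)

# Replace end-of-line characters with spaces
paras <- gsub("\n", " ", paras, fixed = TRUE)
paras
```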

Operating on datasets inside a function

There are times when we need to write a function that makes changes to a generic data frame that is passed as an argument. Let’s say, for example, that we want to write a function that converts to factor any variable whose name starts with a capital letter. There are a few issues involved in this problem, including:

  • Obtaining a text version of the name of the dataset (using the substitute() function).
  • Looping over the variable names and checking if they start with a capital letter (comparing with the LETTERS vector of constants).
  • Generating the plain text version of the factor conversion, gluing together the dataset and variable names (using paste()).
  • Parsing the plain text version of the code to R code (using parse()) and evaluating it (using eval()). This evaluation has to be done in the parent environment or we will lose any transformation when we leave the function, which is the reason for the envir argument of eval().
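The steps above could be put together roughly as follows. The function name capitals_to_factor and the toy data frame are hypothetical, and the sketch assumes syntactically valid variable names (anything with spaces or odd characters would need backticks in the generated code).

```r
capitals_to_factor <- function(dataset) {
  # Text version of the name of the data frame that was passed in
  data_name <- deparse(substitute(dataset))

  for (var_name in names(dataset)) {
    # Does the variable name start with a capital letter?
    if (substr(var_name, 1, 1) %in% LETTERS) {
      # Build the code as text, e.g. "cars$Cyl <- factor(cars$Cyl)"
      target <- paste(data_name, "$", var_name, sep = "")
      code <- paste(target, " <- factor(", target, ")", sep = "")
      # Evaluate in the caller's environment so the change persists
      eval(parse(text = code), envir = parent.frame())
    }
  }
  invisible(NULL)
}

# Toy data frame: two capitalized variables and one lowercase variable
cars <- data.frame(Cyl = c(4, 6, 8), Gear = c(3, 4, 5), mpg = c(21, 19, 15))
capitals_to_factor(cars)
str(cars)
```

After the call, Cyl and Gear are factors in the original cars data frame, while mpg stays numeric.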

