Luis Apiolaza — Palimpsest

happy pigs = happy bacon

ASReml — Spatial Analysis

The following code is adapted from code borrowed from Greg Dutkowski (thanks Greg!). The example presents a ‘typical’ complete block design and then adds spatial components to it.

 Examples for spatial analysis
  tree       !P
  family  50 !I
  row     19
  col     84
  rep      5
  iblock  10
  plot   400
  dbh
 radiata.ped !make !alpha
 radiata.csv !csv !dopart $A

 # Analysis as random incomplete blocks
 # includes additive (tree) and SCA (family) effects
 !part 1
 dbh ~ mu rep !r rep.iblock plot tree family !f mv
 0

 # Analysis as random incomplete blocks
 # but adding spatial row and col autoregressive
 # processes and independent error (units)
 !part 2
 dbh ~ mu rep !r rep.iblock plot tree family units !f mv
 1 2 0
 row row AR 0.8
 col col AR 0.8

 # Spatial analysis dropping non-significant design
 # features (see text)
 !part 3
 dbh ~ mu rep !r tree family units !f mv
 1 2 0
 row row AR 0.8
 col col AR 0.8

 # Analysis as random complete blocks
 # with spatial component. However, spatial is
 # represented in the G matrix rather than the 
 # R matrix
 !part 4
 dbh ~ mu !r row.col tree family
 1 1 1
 0 0 IDEN
 row.col 2
 row row `AR1V 0.9 01
 col col AR 0.9

Part 1 of the code is simply an incomplete block design with random plot, additive (tree) and SCA (Family) effects. You can see explanations for this code in the UnivariateAnalysis section of the cookbook. Part 2 introduces a few additional elements:

  • The 1 2 0 states there is ONE error structure that is the Kronecker product of TWO matrices (that are explained below).
  • Separate autoregressive (AR) errors for rows and columns. The first row states the factor that we are using to fit the AR residual. The second row specifies the ‘distance’ measure. Same thing for col. the 0.9 is the starting value for each of the autoregressive coefficients.
  • units refers to the uncorrelated (non spatial) residual. Thus, our model includes spatially associated residuals (explained in the previous two points) and spatially independent residuals. Do not forget to include units, because if it is not in the model you will probably get inflated estimates of heritability and think that spatial analysis is the best thing after sliced bread (and a long warm shower and …). Well, it is not. It can help you with patchy situations (for example disease scores) or to discover assessor effects, but in well designed experiments and standard traits (e.g. volume, basic density) will not make much of a difference.
  • You will also need to fit missing values as fixed effects (!f mv). Most likely you do not have a perfectly rectangular experiment, but you can create a rectangular matrix with missing values and then fitting the missing values in the model. This will speed up (or even make possible) the use of spatial analysis in ASReml. Remember: to use part 2 or part 3 (modelling spatial through the residual matrix R) you must have a rectangular distribution of trees.

Most of the time, when you fit spatial and independent residuals (Part 2), the experimental design features will become non significant and you will drop them from the model (the exception, if any, will probably be the plot effect). Part 3 shows the likely form of the model without incomplete block and plot effects. Part 4 is a bit more intriguing, although it is equivalent to Part 3. There, rather than modelling the spatial residuals in the R residual matrix, we are modelling them as additional random effects, that is in the G matrix. Thus,

  • 1 1 1 There is ONE residual structure, which is the product of ONE identity matrix multiplied by a scalar. There is also ONE additional covariance structure.
  • The additional covariance structure row.col is the Kronecker product of TWO matrices, which have the same structure as the autoregressive residual matrices of Parts 2 and 3.
  • Modelling spatial correlation through G does not require a rectangular distribution of trees. Part 4 is faster, but it may not run for large arrays.

You can revisit a longer explanation of [Covariance Structures](\covariance-structures).

Spatial analysis for irregularly spaced trees

The above code works well with equally spaced individuals; however, there are examples when the distances between trees are variable (for example when using GPS measurements of position). In this case, the code can be changed to something like (Thanks to Arthur Gilmour):

 ...
 # Scale the coordinates so a typical distance 
 # is around 1.
  y  !/10     
  x  !/10

 filename.dat !dopart $A
 !part 1
 dbh ~ mu
 0

 # With Nugget variance fitted as residual
 !part 2  
 dbh ~ mu !r fac(x,y)
 0 0 1
 fac(x,y) 1
 fac(x,y) fac(x,y) IEXPV 0.2 1

 # with Nugget variance in G structure, 
 # spatial in residual
 !part 3
 dbh ~ mu x y !r units 1
 1 1 0
 0 910 EUC 0.2

In these jobs we are not including any genetic components, just to highlight the spatial syntax. Part 1 obviously fits only the overall mean. Part 2 creates a factor based on the combinations of two continuous variables, and allows for the spatial association in the G structure (in the fac(x,y) 1 … part). Part 3 fits the distances as covariates and allow for the spatial association at the R structure level. In the code 0 910 EUC 0.2, the 910 refers to variables 9 and 10 in the listing of read variables (x and y). Parts 2 and 3 are memory hogs, so try to stay clear of them if you can. An option would be to create an equally spaced grid and then assign each tree to the closest point in that grid. This would convert the problem to something that can be treated with the code used at the beginning of this page.