ASReml: job anatomy
An ASReml job is a set of commands in a text file (usually with extension .as) that describes the data and the statistical models we want to fit. The command file has three parts:
- Title line: the first line of an ASReml job. It usually contains a succinct description or title for the job, which ASReml uses as a heading in the output file and as the title of the plots produced by the program.
- Data section: where the user specifies the structure of the dataset (both factors and variables), where the dataset and any other files required for the analysis are located, and options for reading the data.
- Model section: where the user specifies one or more equations and covariance structures that fully describe the models being fitted.
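To make the three parts concrete, here is a minimal Python sketch that writes a command file in that order. The function name `assemble_job`, the title and the numbers of levels for iblock and family are hypothetical, chosen only for illustration. It follows two layout rules covered in the data section below: each field definition starts with a space, while the data source line starts in the first column.

```python
def assemble_job(title, field_definitions, data_source, model_lines):
    """Return the text of an ASReml command file built from its three parts."""
    # Title line: the first line of the job, reused as a heading in the output.
    lines = [title.strip()]
    # Data section, part 1: one field definition per line, each with a
    # leading space so that ASReml reads it as a field definition.
    lines += [" " + field.strip() for field in field_definitions]
    # Data section, part 2: the data source line starts in the first column.
    lines.append(data_source.strip())
    # Model section: the model equation(s), also written flush left.
    lines += [line.strip() for line in model_lines]
    return "\n".join(lines) + "\n"


job = assemble_job(
    "Wood growth trial with incomplete blocks",
    ["rep 50", "iblock 10", "family 120", "growth", "density"],  # hypothetical levels
    r"d:\research\woodgrowth.csv !csv",  # raw string keeps the backslashes intact
    ["growth ~ mu rep !r rep.iblock family"],
)
print(job, end="")
```

Printing `job` gives the following command file:

```text
Wood growth trial with incomplete blocks
 rep 50
 iblock 10
 family 120
 growth
 density
d:\research\woodgrowth.csv !csv
growth ~ mu rep !r rep.iblock family
```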
Describe your dataset
- Factors are the classification variables in your dataset: things like site (Hobart, Burnie), fertilizer (0, 100, 500), rep (1, 2, …, 50), etc. For each factor you need to specify the number of levels (2 for site, 3 for fertilizer, 50 for rep) and the type of coding you are using. If the levels are consecutive integers starting from 1 (like rep), you do not need to specify the coding; for non-consecutive integers (like fertilizer) use !I, and for alphanumeric factors (like site) use !A. These factors would look like:
 site 2 !A
 fertilizer 3 !I
 rep 50
- Variables include the response variables as well as any other covariates you assessed. By default, ASReml treats ., * and NA as missing values. You can declare any other value as missing with the qualifier !M; for example, growth !M 0 treats 0 as a missing value. When setting up the file containing your dataset, it is a good idea to define missing values explicitly (for example, always use a .). Any data transformations follow the variable name.
- Data source: this is the line where you give the name of the text file containing your dataset. If the file is not in the same directory as the .as file, you will need to include the path to the file. There should not be any space between the beginning of the line and the file name. This is also the line where you can add information about the format of the file (for example, !csv for a comma-delimited file) and create subsets of the data. For example:
 growth
 density
d:\research\woodgrowth.csv !csv
Remember: field names are case sensitive, so tree and Tree are not the same. Always leave a space before a field name; ASReml needs that space to tell field definitions apart from the data source line and the rest of the command file.
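The coding rules for factors (plain for 1, 2, …, n; !I for other integer codes; !A for alphanumeric codes) are mechanical enough to sketch in a few lines of Python. The helper names `factor_definition` and `variate_definition` are hypothetical, not part of ASReml, and the sketch assumes the codes passed in are the observed, non-missing values read from the data file (as Python ints or strings).

```python
def _as_int(code):
    """Return an integer factor code as int, or None if the code is alphanumeric."""
    if isinstance(code, int):
        return code
    if isinstance(code, str) and code.strip().isdecimal():
        return int(code)
    return None


def factor_definition(name, codes):
    """Return an ASReml field definition (with its leading space) for a factor."""
    levels = set(codes)
    n_levels = len(levels)
    as_ints = [_as_int(level) for level in levels]
    if all(value is not None for value in as_ints):
        # Integer codes: no qualifier if they are exactly 1, 2, ..., n; otherwise !I.
        consecutive = set(as_ints) == set(range(1, n_levels + 1))
        coding = "" if consecutive else " !I"
    else:
        # Any alphanumeric code (e.g. site names) calls for !A.
        coding = " !A"
    return f" {name} {n_levels}{coding}"


def variate_definition(name, missing=None):
    """Return a variate definition, adding !M when an extra missing-value code is used."""
    return f" {name}" if missing is None else f" {name} !M {missing}"


print(factor_definition("site", ["Hobart", "Burnie", "Hobart"]))  # " site 2 !A"
print(factor_definition("fertilizer", [0, 100, 500, 100]))        # " fertilizer 3 !I"
print(factor_definition("rep", range(1, 51)))                     # " rep 50"
print(variate_definition("growth", missing=0))                    # " growth !M 0"
```

Note that the leading space is part of every returned string, so the lines can be written straight into the data section of a command file.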
Many more options are described in Chapters 3 and 4 of the ASReml User’s Manual. You can also find extra hints in the “data tips” section of these notes.
Know thy models
ASReml notation aims to collect information about the model being fitted to the dataset. Hence, it is fundamental to understand the model before trying to fit it! A model comprises two parts: i) the model equation and ii) the moments or distributional properties of the model. If the user provides only the model equation, ASReml will use the simplest covariance structures available: the effects of each random term are assumed independent with a single variance (a scaled identity matrix), which is pre- and post-multiplied by the term's design matrix. For example, with the following code for a trial with an incomplete block design:
growth ~ mu rep !r rep.iblock family
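ASReml will assume that the residuals have variance $\mathbf{I}\sigma^2_e$, while the incomplete blocks (rep.iblock) and the families contribute $\mathbf{Z}_b \mathbf{I}\sigma^2_b \mathbf{Z}_b'$ and $\mathbf{Z}_f \mathbf{I}\sigma^2_f \mathbf{Z}_f'$, respectively, to the variance of the observations, where $\mathbf{Z}_b$ and $\mathbf{Z}_f$ are their design matrices. These defaults are fine in many cases, but you may want to fit different covariance structures.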
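Putting the default terms together, the variance of the observations for this model is $\mathbf{V} = \mathbf{Z}_b\mathbf{Z}_b'\sigma^2_b + \mathbf{Z}_f\mathbf{Z}_f'\sigma^2_f + \mathbf{I}\sigma^2_e$ (the fixed rep effect does not enter $\mathbf{V}$). The plain-Python sketch below builds $\mathbf{V}$ for four hypothetical records; the helper names, block and family labels, and variance values are all made up for illustration. It only shows what the default structure implies, not how ASReml computes it.

```python
def incidence(labels):
    """Design matrix Z: one row per record, one column per level (levels sorted)."""
    levels = sorted(set(labels))
    return [[1.0 if label == level else 0.0 for level in levels] for label in labels]


def transpose(m):
    return [list(column) for column in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)] for row in a]


def scale(m, s):
    return [[s * x for x in row] for row in m]


def add(*matrices):
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*matrices)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def default_variance(blocks, families, var_block, var_family, var_resid):
    """V = Z_b (I s2_b) Z_b' + Z_f (I s2_f) Z_f' + I s2_e, assuming independent effects."""
    z_b, z_f = incidence(blocks), incidence(families)
    return add(
        scale(matmul(z_b, transpose(z_b)), var_block),   # shared block -> covariance s2_b
        scale(matmul(z_f, transpose(z_f)), var_family),  # shared family -> covariance s2_f
        scale(identity(len(blocks)), var_resid),         # residuals only on the diagonal
    )


# Four hypothetical records; block labels stand for rep.iblock combinations.
blocks = ["1.1", "1.1", "1.2", "1.2"]
families = ["A", "B", "A", "B"]
for row in default_variance(blocks, families, 2.0, 3.0, 5.0):
    print(row)
```

Each diagonal entry is $\sigma^2_b + \sigma^2_f + \sigma^2_e = 10$; two records share $\sigma^2_b$ when they are in the same incomplete block, $\sigma^2_f$ when they come from the same family, and nothing otherwise.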
Model equations are constructed as a linear combination of ASReml keywords (for example, mu), effects and covariates. You can specify simple factors, interactions, nesting, covariances, polynomial functions, splines, etc. By default all equations include an error term.
Some useful keywords:
- mu fits the overall intercept.
- Trait is the multivariate version of mu. It creates a vector holding the overall mean of each trait included in the analysis.
- pol generates polynomial functions of a covariate. Usage: pol(var,n) generates a polynomial of degree n in the covariate var.
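ASReml describes pol() as fitting orthogonal rather than raw polynomials. The Python sketch below shows one common way to build such columns: take the raw powers of the covariate and apply Gram-Schmidt, so each column is orthogonal to the previous ones. The function name is hypothetical, and the scaling and exact construction ASReml uses internally are not reproduced here; the point is only what "a polynomial of degree n in var" looks like as design-matrix columns.

```python
def orthogonal_polynomials(x, degree):
    """Orthogonal polynomial columns of degrees 0..degree for covariate x.

    Starts from the raw powers x**0, ..., x**degree and applies (modified)
    Gram-Schmidt, so each column is orthogonal to all previous columns.
    """
    if degree >= len(set(x)):
        raise ValueError("degree must be smaller than the number of distinct x values")
    columns = []
    for d in range(degree + 1):
        w = [float(xi) ** d for xi in x]
        for q in columns:
            # Remove the component of w that lies along the earlier column q.
            coef = sum(a * b for a, b in zip(w, q)) / sum(b * b for b in q)
            w = [a - coef * b for a, b in zip(w, q)]
        columns.append(w)
    return columns


for column in orthogonal_polynomials([1, 2, 3, 4, 5], 2):
    print(column)
```

For equally spaced values 1 to 5 the linear and quadratic columns come out as the familiar contrasts (-2, -1, 0, 1, 2) and (2, -1, -2, -1, 2). The degree-0 column is the constant that mu already fits.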
Some common models:
| Model equation | Description |
|---|---|
| var ~ mu Factor | Single fixed factor |
| var ~ mu Factor1 Factor2 | Two fixed factors |
| var ~ mu Factor1 Factor2 Factor1.Factor2 | Two fixed factors with interaction |
| var ~ mu !r Factor | Single random factor |
| var ~ mu Factor1 !r Factor2 | Mixed model with one fixed and one random effect |
| var ~ mu Factor1 !r Factor1.Factor2 | Mixed model where the second factor is nested within the first one |
| var ~ mu covariate Factor | Single fixed factor with one covariate |
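All the model equations in the table share one pattern: the response, ~, mu, the fixed terms, then !r followed by the random terms, with interactions and nested terms written with a dot. A small Python sketch that assembles model lines from that pattern follows; `model_line` and `interaction` are hypothetical helper names, not ASReml functions.

```python
def interaction(*factors):
    """Interaction or nested term written with dots, e.g. rep and iblock -> "rep.iblock"."""
    return ".".join(factors)


def model_line(response, fixed=(), random=()):
    """Return an ASReml model equation: response ~ mu <fixed terms> !r <random terms>."""
    terms = ["mu", *fixed]
    if random:
        # Terms listed after !r are fitted as random effects.
        terms += ["!r", *random]
    return f"{response} ~ " + " ".join(terms)


print(model_line("var", fixed=["Factor1", "Factor2", interaction("Factor1", "Factor2")]))
print(model_line("var", fixed=["Factor1"], random=[interaction("Factor1", "Factor2")]))
print(model_line("growth", fixed=["rep"], random=[interaction("rep", "iblock"), "family"]))
```

The last call reproduces the incomplete block model used earlier, growth ~ mu rep !r rep.iblock family.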