The following example is intended to show the usage of OneMap functions for linkage mapping in outcrossing (non-inbred) populations. With basic knowledge of R syntax, one should have no problems using it. If you are not familiar with R, we recommend reading the vignette Introduction to R.
Hopefully, these examples will be clear enough to help any user to understand its functionality and start using it. You do not need to be an expert in R to build your linkage map, but some concepts are necessary and will help you through the process.
A continuously improved version of OneMap is available on GitHub, and we strongly recommend that all users try it. On the cristianetaniguti/onemap GitHub page you can find instructions for installing the package from GitHub, as well as more elaborate tutorials.
This step may be quite difficult because the data file is not very simple, and some errors can occur while reading it. The input file format is similar to that used by MAPMAKER/EXP (Lander et al., 1987), so experienced users of genetic analysis software should already be familiar with it.
The input file is a text file, where the first line indicates the cross type, and the second line provides information about the number of individuals, the number of markers, the presence of physical marker locations, and the presence of phenotypic data. The third line contains the sample IDs. Then, the genotype information is included separately for each marker. The character * indicates the beginning of information input for a new marker, followed by the marker name. Next, there is a code indicating the marker type, according to the notation of Wu et al. (2002a). It is recommended to check the paper by Wu et al. (2002a) before using OneMap.
Marker types must be one of the following: A.1, A.2, A.3, A.4, B1.5, B2.6, B3.7, C.8, D1.9, D1.10, D1.11, D1.12, D1.13, D2.14, D2.15, D2.16, D2.17 or D2.18, each one corresponding to a row of the following table:
| Crosstype | Subtype | No. | Cross (parents) | Observed bands (parents) | Observed bands (offspring) | Segregation |
|---|---|---|---|---|---|---|
| A | | 1 | ab × cd | ab × cd | ac, ad, bc, bd | 1 : 1 : 1 : 1 |
| A | | 2 | ab × ac | ab × ac | a, ac, ba, bc | 1 : 1 : 1 : 1 |
| A | | 3 | ab × co | ab × c | ac, a, bc, b | 1 : 1 : 1 : 1 |
| A | | 4 | ao × bo | a × b | ab, a, b, o | 1 : 1 : 1 : 1 |
| B | B1 | 5 | ab × ao | ab × a | ab, 2a, b | 1 : 2 : 1 |
| B | B2 | 6 | ao × ab | a × ab | ab, 2a, b | 1 : 2 : 1 |
| B | B3 | 7 | ab × ab | ab × ab | a, 2ab, b | 1 : 2 : 1 |
| C | | 8 | ao × ao | a × a | 3a, o | 3 : 1 |
| D | D1 | 9 | ab × cc | ab × c | ac, bc | 1 : 1 |
| D | D1 | 10 | ab × aa | ab × a | a, ab | 1 : 1 |
| D | D1 | 11 | ab × oo | ab × o | a, b | 1 : 1 |
| D | D1 | 12 | bo × aa | b × a | ab, a | 1 : 1 |
| D | D1 | 13 | ao × oo | a × o | a, o | 1 : 1 |
| D | D2 | 14 | cc × ab | c × ab | ac, bc | 1 : 1 |
| D | D2 | 15 | aa × ab | a × ab | a, ab | 1 : 1 |
| D | D2 | 16 | oo × ab | o × ab | a, b | 1 : 1 |
| D | D2 | 17 | aa × bo | a × b | ab, a | 1 : 1 |
| D | D2 | 18 | oo × ao | o × a | a, o | 1 : 1 |
Letters A, B, C and D indicate the segregation type (i.e., 1:1:1:1, 1:2:1, 3:1 or 1:1, respectively), while the number after the dot (e.g., A.1) indicates the observed bands in the offspring. The paper cited above gives details with respect to marker types; we will not discuss them here, but it is easy to see that each marker is classified based on the band patterns of parents and progeny.
Finally, after each marker name comes the genotype data for the segregating population. The coding for marker genotypes used by OneMap is also the one proposed by Wu et al. (2002a), and the possible values vary according to the specific marker type. Missing data are indicated with the character - (minus sign), and an empty space separates the information for each individual. Phenotype information, if present, follows the genotypic data with a similar structure. Details can be found in the help of the function read_onemap.
Here is an example of such a file for 10 individuals and 5 markers (the three zeros in the second line indicate that there is no chromosome information, physical position information, or phenotypic data, respectively). It is very similar to a MAPMAKER/EXP file, but has additional information about the cross-type.
data type outcross
10 5 0 0 0
I1 I2 I3 I4 I5 I6 I7 I8 I9 I10
*M1 B3.7 ab ab - ab b ab ab - ab b
*M2 D2.18 o - a a - o a - o o
*M3 D1.13 o a a o o - a o a o
*M4 A.4 ab b - ab a b ab b - a
*M5 D2.18 a a o - o o a o o o
In case you have physical chromosome and position information:
data type outcross
10 5 1 1 0
I1 I2 I3 I4 I5 I6 I7 I8 I9 I10
*M1 B3.7 ab ab - ab b ab ab - ab b
*M2 D2.18 o - a a - o a - o o
*M3 D1.13 o a a o o - a o a o
*M4 A.4 ab b - ab a b ab b - a
*M5 D2.18 a a o - o o a o o o
*CHROM 1 1 1 2 2
*POS 2391 3812 5281 1823 3848
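As a rough illustration of the layout described above, the following base R sketch writes the first example file to a temporary directory and checks that the counts declared on the second line match the content. The file name example_out.raw and the helper check_raw_file are hypothetical names used only here; they are not part of OneMap, and the sketch does not handle the optional *CHROM and *POS lines.

```r
# Lines of the first example file shown above (10 individuals, 5 markers)
raw_lines <- c(
  "data type outcross",
  "10 5 0 0 0",
  "I1 I2 I3 I4 I5 I6 I7 I8 I9 I10",
  "*M1 B3.7 ab ab - ab b ab ab - ab b",
  "*M2 D2.18 o - a a - o a - o o",
  "*M3 D1.13 o a a o o - a o a o",
  "*M4 A.4 ab b - ab a b ab b - a",
  "*M5 D2.18 a a o - o o a o o o"
)
raw_path <- file.path(tempdir(), "example_out.raw")  # hypothetical file name
writeLines(raw_lines, raw_path)

# Hypothetical helper (not a OneMap function)
check_raw_file <- function(path) {
  lines <- readLines(path)
  counts <- as.integer(strsplit(lines[2], " +")[[1]])
  n_ind <- counts[1]
  n_mks <- counts[2]
  ids <- strsplit(lines[3], " +")[[1]]
  marker_lines <- grep("^\\*", lines, value = TRUE)
  # Fields per marker line: name, type, then one genotype per individual
  n_fields <- lengths(strsplit(marker_lines, " +"))
  list(
    ids_ok = length(ids) == n_ind,
    markers_ok = length(marker_lines) == n_mks,
    genotypes_ok = all(n_fields == n_ind + 2)
  )
}

checks <- check_raw_file(raw_path)
print(checks)
```

A check like this can catch a miscounted header or a marker line with a missing genotype before the file is passed to read_onemap, which performs its own validation.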
Notice that once the marker type is identified, no variations of the symbols presented in the table for the observed bands are allowed. For example, for A.1, only ac, ad, bc, and bd genotypes are expected (plus missing values). We have noticed from the FAQs that this is a common mistake made by users, so please be careful.
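The rule above can be sketched as a simple lookup in base R. The list below covers only a few marker types and is copied from the offspring "Observed bands" column of the table; the helper unexpected_codes is a hypothetical name for illustration, not a OneMap function.

```r
# Allowed offspring codes for a few marker types, copied from the
# offspring "Observed bands" column of the table above
allowed_codes <- list(
  "A.1"   = c("ac", "ad", "bc", "bd"),
  "A.4"   = c("ab", "a", "b", "o"),
  "B3.7"  = c("a", "ab", "b"),
  "D1.13" = c("a", "o"),
  "D2.18" = c("a", "o")
)

# Hypothetical helper: return the codes that are not allowed for a type
unexpected_codes <- function(type, genotypes, missing = "-") {
  setdiff(unique(genotypes), c(allowed_codes[[type]], missing))
}

ok_marker <- c("ab", "ab", "-", "ab", "b", "ab", "ab", "-", "ab", "b")
bad_marker <- c("ac", "ad", "ab", "bd", "-")  # "ab" is not valid for A.1

unexpected_codes("B3.7", ok_marker)   # empty: every code is allowed
unexpected_codes("A.1", bad_marker)   # flags "ab"
```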
The input file must be saved in text format, with an extension such as .raw. It is a good idea to open the text file called onemap_example_out.raw (available with OneMap and saved in the directory where you installed it) to see how this file should look. You can see where OneMap is installed using the command:
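One base R way to locate an installed package, which may or may not be the exact command the authors intended here, is system.file(). The variable name onemap_dir is only illustrative.

```r
# Directory where the onemap package is installed
# (system.file() returns an empty string if the package is not found)
onemap_dir <- system.file(package = "onemap")
onemap_dir
```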
In the section Importing data from VCF file below, you will see how to import VCF files as OneMap objects.
Once the input file is created, the data can be loaded and saved into an R onemap object. The function used to import data is named read_onemap. Its usage is quite simple:
onemap_example_out <- read_onemap(dir = "C:/workingdirectory", inputfile = "onemap_example_out.raw")
The first argument is the directory where the input file is located, so modify it accordingly. The second one is the data file name. In this example, an object named onemap_example_out was created. If you leave the argument dir blank, the file will be loaded from your working directory.
You can change the working directory in R using the function setwd() or by clicking File -> Change dir in the toolbar. If you set your working directory to the one containing the input file, you can just type:
If no error has occurred, a message will display some basic information about the data, such as number of individuals and number of markers:
#> Working...
#>
#> --Read the following data:
#> Type of cross: outcross
#> Number of individuals: 100
#> Number of markers: 30
#> Chromosome information: no
#> Position information: no
#> Number of traits: 3
#> Missing trait values:
#> Pheno1: 0
#> Pheno2: 3
#> Pheno3: 0
Because this particular dataset is distributed along with the package, as an alternative you can load it by typing:
Loading the data creates an object of class onemap, which will be used further in the analysis. The R command print recognizes objects of this class. Thus, if you type:
you will see some information about the object:
#> This is an object of class 'onemap'
#> Type of cross: outcross
#> No. individuals: 100
#> No. markers: 30
#> CHROM information: no
#> POS information: no
#> Percent genotyped: 100
#>
#> Segregation types:
#> A.1 --> 3
#> A.2 --> 1
#> A.4 --> 4
#> B1.5 --> 1
#> B2.6 --> 2
#> B3.7 --> 5
#> C.8 --> 2
#> D1.10 --> 2
#> D1.12 --> 1
#> D1.13 --> 2
#> D2.15 --> 1
#> D2.16 --> 2
#> D2.17 --> 2
#> D2.18 --> 2
#>
#> No. traits: 3
#> Missing trait values:
#> Pheno1: 0
#> Pheno2: 3
#> Pheno3: 0
You can also use the plot.onemap function to view the marker genotypes graphically:
If you change the argument all to FALSE, the markers will be separated by type. In this case, note that the graphic cell size adapts to the number of markers of the same type. In other words, the more markers there are of a given type, the smaller the cells for that type.
This function can take quite some time, depending on the number of markers involved. More information about this plot function can be found using ?plot.onemap.
You can also see the number of markers by segregation pattern with the plot_by_segreg_type function:
You can import information from a VCF file into OneMap using the onemap_read_vcfR function. With onemap_read_vcfR you can convert an object from the vcfR package directly to a onemap object. The onemap_read_vcfR function keeps the chromosome and position information for each marker in the generated onemap object.
To show how it works, we will use the example file vcf_example_out.vcf.gz, which contains markers from the same population as onemap_example_out.raw.
Here we use the vcfR package internally to help with this conversion. The vcfR authors mention in their tutorials that RAM use is an important consideration when using the package. Depending on your dataset, the object created can be huge and occupy a lot of memory.
You can use the onemap_read_vcfR function to convert the VCF file to a onemap object. The parameters used are vcf, with the VCF file path; the identification of each parent (here, you must define only one sample for each parent); and the cross type.
vcf_example_out <- onemap_read_vcfR(vcf = system.file("extdata/vcf_example_out.vcf.gz", package = "onemap"),
parent1 = "P1",
parent2 = "P2",
cross = "outcross")
Depending on your dataset, this function can take some time to run. Note that the conversion filters out markers that are not informative for the given cross type. For example, in outcrossing species, markers for which both parents are homozygous (aa x bb) do not provide information about recombination and are removed from the data set. Only the marker types listed in the table in Creating the data file are kept in the onemap object. The function onemap_read_vcfR prints to the screen the reason why markers were filtered.
You can also have more missing data in the returned object than in the VCF, because onemap_read_vcfR replaces genotypes that are not expected for a marker type with missing data. For example, for a marker of type D1.10 (ab x aa), we only expect aa and ab genotypes; if there are bb genotypes, they will be replaced by missing data. You can see the percentage of missing data in the resulting onemap object with:
vcf_example_out
#> This is an object of class 'onemap'
#> Type of cross: outcross
#> No. individuals: 92
#> No. markers: 24
#> CHROM information: yes
#> POS information: yes
#> Percent genotyped: 99
#>
#> Segregation types:
#> B3.7 --> 18
#> D1.10 --> 6
#>
#> No. traits: 0
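The replacement of unexpected genotypes described above can be sketched in base R as follows. The genotype calls are hypothetical, and the code only illustrates the idea for a single D1.10 marker; it is not the internal logic of onemap_read_vcfR.

```r
# Hypothetical genotype calls for one D1.10 marker (ab x aa)
calls <- c(IND1 = "aa", IND2 = "ab", IND3 = "bb", IND4 = "ab", IND5 = "aa")
expected <- c("aa", "ab")

# Genotypes outside the expected set become missing data
cleaned <- calls
cleaned[!cleaned %in% expected] <- NA

cleaned
mean(!is.na(cleaned)) * 100   # percent genotyped after the replacement
```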
NOTE: From version 2.0.6 to 2.1.1005, OneMap had the vcf2raw function to convert vcf to .raw. This function is now defunct, but it can be replaced by a combination of the onemap_read_vcfR and write_onemap_raw functions. See the section Exporting .raw file from onemap object for further information about write_onemap_raw.
If your onemap object has too many missing genotypes, you may face problems during the analysis. Check the percentage of missing genotypes in our data set by printing the onemap object:
vcf_example_out
#> This is an object of class 'onemap'
#> Type of cross: outcross
#> No. individuals: 92
#> No. markers: 24
#> CHROM information: yes
#> POS information: yes
#> Percent genotyped: 99
#>
#> Segregation types:
#> B3.7 --> 18
#> D1.10 --> 6
#>
#> No. traits: 0
Our example has 1% missing genotypes (99% are genotyped). If you want to filter markers according to their percentage of missing data, you can use the function filter_missing:
vcf_filtered <- filter_missing(vcf_example_out, threshold = 0.25)
#> Number of markers removed from the onemap object: 0
None of our markers were filtered because, in this example, we do not have much missing data.
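The idea behind this kind of filter can be sketched in base R on a small hypothetical genotype matrix. Whether filter_missing treats a marker exactly at the threshold as kept or removed is an assumption here (the sketch keeps it).

```r
# Hypothetical genotype matrix: rows are individuals, columns are markers
geno <- matrix(
  c("ab", "a",  "ab", "b",
    NA,   "a",  NA,   "b",
    "ab", NA,   NA,   "a",
    "b",  "ab", "a",  "a"),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("IND", 1:4), paste0("M", 1:4))
)

# Fraction of missing genotypes per marker
missing_frac <- colMeans(is.na(geno))

# Keep markers whose missing fraction does not exceed the threshold
threshold <- 0.25
geno_filtered <- geno[, missing_frac <= threshold, drop = FALSE]

missing_frac
colnames(geno_filtered)
```

Here M3 is missing in half of the individuals and is the only marker dropped.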
The function create_depths_profile generates dispersion graphics with the x and y axes representing, respectively, the reference and alternative allele depths. The function is only available for biallelic markers in VCF files with allele count information. Each dot represents a genotype for the markers selected with mks and the individuals selected with inds. If both arguments receive NULL, all markers and individuals are considered. Dots are colored according to the genotypes present in the onemap object (GTfrom = "onemap") or in the VCF file (GTfrom = "vcf"). An rds file is generated with the data in the graphic (rds.file). The alpha argument controls the transparency of the color of each dot; adjusting this parameter is a good idea when there is a large number of markers and individuals. The x_lim and y_lim arguments control the axis scale limits; by default, the maximum value of the counts is used.
Here is an example for the vcf_example_out dataset.
# For outcrossing population
create_depths_profile(onemap.obj = vcf_example_out,
vcf = system.file("extdata/vcf_example_out.vcf.gz", package = "onemap"),
parent1 = "P1",
parent2 = "P2",
vcf.par = "AD",
recovering = FALSE,
mks = NULL,
inds = NULL,
GTfrom = "vcf",
alpha = 0.1,
rds.file = "depths_out.rds")
Because the genotypes are taken from the VCF file, the legend uses the VCF coding: ./. represents missing data; 0/0, homozygotes for the reference allele; 0/1, heterozygotes; and 1/1, homozygotes for the alternative allele. Phased genotypes may also be represented; they have a pipe | instead of a slash /.
By default, OneMap sets an error probability of 10^-5 for every genotype:
head(vcf_example_out$error)
#> [,1] [,2] [,3] [,4]
#> SNP1_IND1 0.99999 3.333333e-06 3.333333e-06 3.333333e-06
#> SNP2_IND1 0.99999 3.333333e-06 3.333333e-06 3.333333e-06
#> SNP3_IND1 0.99999 3.333333e-06 3.333333e-06 3.333333e-06
#> SNP4_IND1 0.99999 3.333333e-06 3.333333e-06 3.333333e-06
#> SNP5_IND1 0.99999 3.333333e-06 3.333333e-06 3.333333e-06
#> SNP6_IND1 0.99999 3.333333e-06 3.333333e-06 3.333333e-06
See Taniguti et al. (2023), Supplementary File 1, for details about the $error object format.
For markers from sequencing technology, this value is unrealistic and generates inflated linkage maps. OneMap 3.0 can use three types of genotype probabilities to account for errors in the HMM chain: a single global value (global_error); a matrix with dimensions (number of individuals) x (number of markers) with genotype error values (genotypes_errors); and a matrix with dimensions (number of individuals)*(number of markers) x possible genotypes (genotypes_probs). See details in the function create_probs:
If you have a VCF file, you can use the function extract_depth to obtain genotypes_errors and genotypes_probs from the GQ, PL, or GL format field:
library(vcfR)
vcfR_object <- read.vcfR(system.file("extdata/vcf_example_out.vcf.gz", package = "onemap"))
#> Scanning file to determine attributes.
#> File attributes:
#> meta lines: 8
#> header_line: 9
#> variant count: 25
#> column count: 103
#> Meta line 8 read in.
#> All meta lines processed.
#> gt matrix initialized.
#> Character matrix gt created.
#> Character matrix gt rows: 25
#> Character matrix gt cols: 103
#> skip: 0
#> nrows: 25
#> row_num: 0
#> Processed variant: 25
#> All variants processed
genotypes_errors <- extract_depth(vcfR.object = vcfR_object,
onemap.object=vcf_example_out,
vcf.par= "GQ",
parent1="P1",
parent2="P2",
recovering=FALSE)
genotypes_errors[10:50, 1:5]
#> SNP1 SNP2 SNP3 SNP4 SNP5
#> IND12 1.258925e-10 1.258925e-08 1.258925e-10 1.258925e-10 1.584893e-10
#> IND13 1.584893e-10 1.584893e-10 2.511886e-10 2.511886e-10 3.981072e-10
#> IND14 1.584893e-09 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND15 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND16 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND17 1.258925e-10 1.258925e-10 2.511886e-10 2.511886e-10 1.258925e-10
#> IND18 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND19 1.258925e-10 1.584893e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND20 1.258925e-10 1.258925e-10 3.981072e-10 3.981072e-10 1.258925e-10
#> IND21 1.584893e-10 1.584893e-10 3.981072e-10 3.981072e-10 2.511886e-10
#> IND22 2.511886e-10 1.258925e-08 1.258925e-08 1.258925e-08 1.584893e-09
#> IND23 1.584893e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.584893e-10
#> IND25 2.511886e-10 3.981072e-10 1.584893e-09 1.584893e-09 2.511886e-10
#> IND26 1.258925e-10 1.584893e-10 1.584893e-10 1.584893e-10 1.584893e-10
#> IND27 1.258925e-10 1.258925e-10 2.511886e-10 2.511886e-10 2.511886e-10
#> IND28 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND30 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND31 2.511886e-07 2.511886e-10 1.584893e-10 1.584893e-10 3.981072e-10
#> IND32 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 2.511886e-10
#> IND33 1.584893e-10 2.511886e-10 1.258925e-08 1.258925e-08 3.981072e-10
#> IND34 1.258925e-10 3.981072e-10 1.258925e-10 1.258925e-10 3.981072e-10
#> IND35 1.584893e-10 3.981072e-10 1.258925e-10 1.258925e-10 2.511886e-10
#> IND36 1.258925e-10 1.584893e-10 1.258925e-10 1.258925e-10 3.981072e-10
#> IND37 1.258925e-08 1.584893e-09 2.511886e-10 2.511886e-10 1.584893e-09
#> IND38 1.258925e-10 1.258925e-10 1.584893e-10 1.584893e-10 1.258925e-10
#> IND39 1.584893e-10 1.584893e-10 2.511886e-10 2.511886e-10 1.258925e-10
#> IND40 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.584893e-10
#> IND42 2.511886e-10 3.981072e-10 1.258925e-10 1.258925e-10 2.511886e-10
#> IND43 3.981072e-10 1.258925e-10 1.584893e-10 1.584893e-10 2.511886e-10
#> IND44 2.511886e-10 1.258925e-10 1.584893e-10 1.584893e-10 1.258925e-10
#> IND45 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND46 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND47 1.584893e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND48 1.584893e-09 1.584893e-10 1.584893e-10 1.584893e-10 1.258925e-10
#> IND49 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.584893e-09
#> IND50 2.511886e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
#> IND51 1.258925e-10 2.511886e-10 1.258925e-10 1.258925e-10 3.981072e-10
#> IND52 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 2.511886e-10
#> IND53 1.258925e-10 1.584893e-09 1.584893e-09 1.584893e-09 1.258925e-10
#> IND54 1.258925e-10 1.584893e-10 1.584893e-10 1.584893e-10 1.258925e-08
#> IND55 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10 1.258925e-10
onemap_obj_errors <- create_probs(vcf_example_out, genotypes_errors = genotypes_errors)
head(onemap_obj_errors$error)
#> [,1] [,2] [,3] [,4]
#> SNP1_IND1 1 4.196418e-09 4.196418e-09 4.196418e-09
#> SNP2_IND1 1 8.372955e-11 8.372955e-11 8.372955e-11
#> SNP3_IND1 1 4.196418e-11 4.196418e-11 4.196418e-11
#> SNP4_IND1 1 4.196418e-11 4.196418e-11 4.196418e-11
#> SNP5_IND1 1 4.196418e-11 4.196418e-11 4.196418e-11
#> SNP6_IND1 1 5.282977e-11 5.282977e-11 5.282977e-11
genotypes_probs <- extract_depth(vcfR_object,
vcf_example_out,
vcf.par = "PL",
parent1 = "P1",
parent2 = "P2")
genotypes_probs[1:5, ]
#> [,1] [,2] [,3]
#> SNP1_IND1 0.7992400 0.200759999 5.042863e-08
#> SNP2_IND1 0.9693466 0.030653430 9.693466e-19
#> SNP3_IND1 0.9921193 0.007880684 6.259850e-26
#> SNP4_IND1 0.9921193 0.007880684 6.259850e-26
#> SNP5_IND1 0.9921193 0.007880684 6.259850e-26
onemap_obj_probs <- create_probs(vcf_example_out, genotypes_probs = genotypes_probs)
head(onemap_obj_probs$error)
#> [,1] [,2] [,3] [,4]
#> SNP1_IND1 0.7992400 0.200759999 0.200759999 5.042863e-08
#> SNP2_IND1 0.9693466 0.030653430 0.030653430 9.693466e-19
#> SNP3_IND1 0.9921193 0.007880684 0.007880684 6.259850e-26
#> SNP4_IND1 0.9921193 0.007880684 0.007880684 6.259850e-26
#> SNP5_IND1 0.9921193 0.007880684 0.007880684 6.259850e-26
#> SNP6_IND1 0.9843983 0.015601662 0.015601662 2.472697e-22
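The numbers in these outputs are consistent with the usual Phred scaling of the GQ and PL fields. The base R sketch below shows that arithmetic under this assumption; the helper spread_error and the GQ and PL values are hypothetical, and the code is not the implementation used by extract_depth or create_probs.

```r
# Phred-scaled genotype quality (GQ) to error probability
gq <- c(99, 98, 79)
gq_error <- 10^(-gq / 10)
gq_error   # 10^(-9.9) is about 1.258925e-10

# Spread an error probability e over four genotype classes:
# 1 - e for the called genotype, e / 3 for each of the other three.
# Placing the called genotype in the first column is an assumption
# made here for illustration only.
spread_error <- function(e) cbind(1 - e, e / 3, e / 3, e / 3)
spread_error(1e-5)

# Phred-scaled likelihoods (PL) to normalized genotype probabilities
pl <- c(0, 6, 72)   # hypothetical PL values for one genotype
lik <- 10^(-pl / 10)
pl_probs <- lik / sum(lik)
pl_probs
```

With the default error of 1e-5, spread_error gives 0.99999 for the called genotype and about 3.33e-06 for each of the others, matching the default $error rows shown earlier.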
According to results from Reads2Map, using a global error rate can be the best solution in most cases. However, the probabilities provided by the genotype calling software are useful to filter markers before starting the linkage map building:
hist(genotypes_errors) # Check distribution to define threshold - it will change according to the software used for genotype calling
summary(as.vector(genotypes_errors))
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 0 0 0 0 0 0 4
onemap_obj_prob_filt <- filter_prob(onemap_obj_probs, threshold = 0.9)
#> 150 genotypes were converted to missing data.
Now, we can set the genotype probabilities according to the selected global error:
Note: This section is included solely to demonstrate the possibility of adjusting the error probability. This adjustment is not applied in subsequent steps of this tutorial.
If you have multiallelic markers (MNPs) in your VCF file, set the only_biallelic argument to FALSE. Here, we have the multiallelic markers in a separate file:
onemap_obj_multi <- onemap_read_vcfR("roses_populations.haps.new.names.vcf", parent1 = "PH", parent2 = "J14-3", cross = "outcross", only_biallelic = FALSE) # Example data not available
onemap_obj_multi
6659 Markers were removed of the dataset because one or both of parents have no informed genotypes (are missing data)
8351 Markers were removed from the dataset because both of parents are homozygotes, these markers are considered non-informative in outcrossing populations.
This is an object of class 'onemap'
Type of cross: outcross
No. individuals: 138
No. markers: 7991
CHROM information: yes
POS information: yes
Percent genotyped: 81
Segregation types:
A.1 --> 830
A.2 --> 1891
B3.7 --> 548
D1.10 --> 1949
D1.9 --> 673
D2.14 --> 641
D2.15 --> 1459
No. traits: 0
This dataset does not have genotype probability information, and erroneous multiallelic markers can have a greater impact on map quality. Therefore, we will use a global error rate of 0.1 for them (a value also defined using Reads2Map workflows):
OneMap datasets

If you have more than one dataset of markers, all from the same mapping population, you can use the function combine_onemap to merge them into a single onemap object.
In our example, we have two datasets:

- onemap_example_out with 30 markers and 100 individuals
- vcf_example_out with 24 biallelic markers and 92 individuals.

The combine_onemap function recognizes corresponding individuals by their IDs; thus, it is important to give the same IDs to the respective individuals in both raw files. Compared with the first file, the second file does not have marker information for 8 individuals. The combine_onemap function will fill in this information with NA.
In our examples, we have only genotypic information, but the function can also merge the phenotypic information if it exists.
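The ID-based alignment can be sketched in base R with two small hypothetical genotype matrices. This only illustrates how individuals absent from one dataset end up with NA; it is not the implementation of combine_onemap.

```r
# Two hypothetical genotype matrices from the same population
# (rows are individuals, columns are markers)
set_a <- matrix(c("a", "ab", "b", "ab", "a", "a"), nrow = 3,
                dimnames = list(c("I1", "I2", "I3"), c("M1", "M2")))
set_b <- matrix(c("ab", "b"), nrow = 2,
                dimnames = list(c("I1", "I3"), "SNP1"))

# Align both sets on the union of individual IDs; individuals absent
# from a set receive NA for that set's markers
all_ids <- union(rownames(set_a), rownames(set_b))
combined <- cbind(
  set_a[match(all_ids, rownames(set_a)), , drop = FALSE],
  set_b[match(all_ids, rownames(set_b)), , drop = FALSE]
)
rownames(combined) <- all_ids
combined
```

Individual I2 has no data in the second set, so its SNP1 genotype is NA in the combined matrix.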
comb_example <- combine_onemap(onemap_example_out, vcf_example_out)
comb_example
#> This is an object of class 'onemap'
#> Type of cross: outcross
#> No. individuals: 100
#> No. markers: 54
#> CHROM information: yes
#> POS information: yes
#> Percent genotyped: 96
#>
#> Segregation types:
#> A.1 --> 3
#> A.2 --> 1
#> A.4 --> 4
#> B1.5 --> 1
#> B2.6 --> 2
#> B3.7 --> 23
#> C.8 --> 2
#> D1.10 --> 8
#> D1.12 --> 1
#> D1.13 --> 2
#> D2.15 --> 1
#> D2.16 --> 2
#> D2.17 --> 2
#> D2.18 --> 2
#>
#> No. traits: 3
#> Missing trait values:
#> Pheno1: 0
#> Pheno2: 3
#> Pheno3: 0
The function arguments are the names of the onemap objects you want to combine.
Plotting the marker genotypes from the resulting onemap object, we can see that there are more missing data - (black vertical lines) for some individuals because they were missing in the second file.
It is possible that there are redundant markers in your dataset, especially when dealing with many markers. Redundant markers carry the same genotypic information as other markers because no recombination events occurred between them. They do not add information to the map but increase the computational effort during map building. Therefore, it is good practice to remove them before building the map; once the map is built, they can be added back.
First, we use the function find_bins to group the markers into bins according to their genotypic information. In other words, markers with the same genotypic information will be in the same bin.
bins <- find_bins(comb_example, exact = FALSE)
bins
#> This is an object of class 'onemap_bin'
#> No. individuals: 100
#> No. markers in original dataset: 54
#> No. of bins found: 52
#> Average of markers per bin: 1.038462
#> Type of search performed: non exact
The first argument is the onemap object, and the exact argument specifies whether only markers with exactly the same information will be placed in the same bin. With FALSE for this second argument, missing data will not be considered, and the marker with the lowest amount of missing data will be the representative marker of the bin.
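For the exact case, the grouping idea can be sketched in base R by building one key per marker from its genotype vector. The matrix and object names below are hypothetical, and the sketch does not reproduce the non-exact search, which also has to handle missing data.

```r
# Hypothetical genotype matrix: M2 repeats M1 and M4 repeats M3
geno <- cbind(
  M1 = c("a", "ab", "b", "ab"),
  M2 = c("a", "ab", "b", "ab"),
  M3 = c("ab", "ab", "a", "b"),
  M4 = c("ab", "ab", "a", "b"),
  M5 = c("b", "a", "a", "ab")
)

# Build one key per marker from its genotype vector and group equal keys
keys <- apply(geno, 2, paste, collapse = " ")
exact_bins <- unname(split(colnames(geno), factor(keys, levels = unique(keys))))

exact_bins
length(exact_bins)   # number of bins

# Take the first marker of each bin as its representative
representatives <- vapply(exact_bins, `[`, character(1), 1)
representatives
```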
Our example dataset has only two redundant markers. We can create a new onemap object without them using the create_data_bins function. This function keeps only the most representative marker of each bin from the bins object.
bins_example <- create_data_bins(comb_example, bins)
bins_example
#> This is an object of class 'onemap'
#> Type of cross: outcross
#> No. individuals: 100
#> No. markers: 52
#> CHROM information: yes
#> POS information: yes
#> Percent genotyped: 96
#>
#> Segregation types:
#> A.1 --> 3
#> A.2 --> 1
#> A.4 --> 4
#> B1.5 --> 1
#> B2.6 --> 2
#> B3.7 --> 22
#> C.8 --> 2
#> D1.10 --> 7
#> D1.12 --> 1
#> D1.13 --> 2
#> D2.15 --> 1
#> D2.16 --> 2
#> D2.17 --> 2
#> D2.18 --> 2
#>
#> No. traits: 3
#> Missing trait values:
#> Pheno1: 0
#> Pheno2: 3
#> Pheno3: 0
The arguments for the create_data_bins function are the onemap object and the object created by the find_bins function.
The function onemap_read_vcfR generates new onemap objects without using an input .raw file. Also, the functions combine_onemap and create_data_bins manipulate the information of the original .raw file and create a new dataset. In both cases, you do not have an input .raw file that contains the same information as the analyzed data. If you want to create a new input file with the dataset you are working on after using these functions, you can use the function write_onemap_raw.
The file new_dataset.raw will be generated in your working directory. In our example, it contains only non-redundant markers from the onemap_example_out and vcf_example_out datasets.
For the map building process, it is also important to know which markers have deviations in the expected segregation pattern. It can be a good practice to remove them from the map building process, because they can adversely affect the map building, and, once the map is built, they can be inserted.
The function test_segregation_of_a_marker performs a chi-square test according to Mendelian segregation to check whether a specific marker follows the expected segregation pattern.
test_segregation_of_a_marker(bins_example, 4)
#> $Hypothesis
#> [1] "1:1:1:1"
#>
#> $qui.quad
#> X-squared
#> 2.64
#>
#> $p.val
#> [1] 0.4505201
#>
#> $perc.genot
#> [1] 100
The arguments are the onemap object and the number of the marker you want to test.
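The same kind of goodness-of-fit test can be run in base R with chisq.test. The counts below are hypothetical: they were chosen so that the statistic equals the value shown above for a 1:1:1:1 marker in 100 individuals, and they are not the actual genotype counts of marker M4.

```r
# Hypothetical counts for the four genotype classes of a 1:1:1:1 marker
# in 100 individuals (chosen to give the statistic shown above; they are
# not the actual counts of marker M4)
observed <- c(32, 22, 23, 23)

test <- chisq.test(observed, p = c(1, 1, 1, 1) / 4)
test$statistic   # X-squared = 2.64
test$p.value     # about 0.45, with 3 degrees of freedom
```

For a 1:2:1 or 3:1 marker, only the p vector changes, for example c(1, 2, 1) / 4 or c(3, 1) / 4.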
You can also test all the markers in your onemap object using the test_segregation function. The results can be viewed by printing the output object of class onemap_segreg_test.
segreg_test <- test_segregation(bins_example)
print(segreg_test)
#> Marker H0 Chi-square p-value % genot.
#> 1 M1 1:2:1 1.76000000 4.147829e-01 100
#> 2 M2 1:1 0.04000000 8.414806e-01 100
#> 3 M3 1:1 0.36000000 5.485062e-01 100
#> 4 M4 1:1:1:1 2.64000000 4.505201e-01 100
#> 5 M5 1:1 1.96000000 1.615133e-01 100
#> 6 M6 1:2:1 1.52000000 4.676664e-01 100
#> 7 M7 1:1 0.16000000 6.891565e-01 100
#> 8 M8 1:2:1 0.86000000 6.505091e-01 100
#> 9 M9 1:1 0.04000000 8.414806e-01 100
#> 10 M10 1:1 0.36000000 5.485062e-01 100
#> 11 M11 1:1 0.16000000 6.891565e-01 100
#> 12 M12 1:1:1:1 6.48000000 9.045460e-02 100
#> 13 M13 3:1 0.00000000 1.000000e+00 100
#> 14 M14 1:1:1:1 0.40000000 9.402425e-01 100
#> 15 M15 1:1:1:1 2.24000000 5.241127e-01 100
#> 16 M16 1:1 1.44000000 2.301393e-01 100
#> 17 M17 1:2:1 35.58000000 1.878889e-08 100
#> 18 M18 1:1:1:1 1.44000000 6.961859e-01 100
#> 19 M19 1:2:1 37.98000000 5.659105e-09 100
#> 20 M20 1:1:1:1 4.88000000 1.807980e-01 100
#> 21 M21 1:1 1.44000000 2.301393e-01 100
#> 22 M22 1:1 1.00000000 3.173105e-01 100
#> 23 M23 3:1 0.48000000 4.884223e-01 100
#> 24 M24 1:2:1 1.50000000 4.723666e-01 100
#> 25 M25 1:2:1 35.04000000 2.461278e-08 100
#> 26 M26 1:1:1:1 1.52000000 6.776621e-01 100
#> 27 M27 1:1 1.00000000 3.173105e-01 100
#> 28 M28 1:1:1:1 1.20000000 7.530043e-01 100
#> 29 M29 1:1 0.00000000 1.000000e+00 100
#> 30 M30 1:2:1 3.42000000 1.808658e-01 100
#> 31 SNP1 1:2:1 3.73913043 1.541907e-01 92
#> 32 SNP2 1:2:1 4.76086957 9.251035e-02 92
#> 33 SNP3 1:2:1 4.76086957 9.251035e-02 92
#> 34 SNP5 1:2:1 4.95652174 8.388899e-02 92
#> 35 SNP6 1:2:1 1.80434783 4.056868e-01 92
#> 36 SNP7 1:2:1 1.45652174 4.827478e-01 92
#> 37 SNP8 1:2:1 0.26086957 8.777137e-01 92
#> 38 SNP9 1:2:1 1.39130435 4.987491e-01 92
#> 39 SNP10 1:2:1 0.06521739 9.679172e-01 92
#> 40 SNP11 1:2:1 0.52173913 7.703814e-01 92
#> 41 SNP12 1:2:1 0.17391304 9.167170e-01 92
#> 42 SNP13 1:2:1 0.08695652 9.574534e-01 92
#> 43 SNP14 1:1 0.53846154 4.630710e-01 91
#> 44 SNP16 1:2:1 0.00000000 1.000000e+00 92
#> 45 SNP17 1:1 0.00000000 1.000000e+00 90
#> 46 SNP18 1:1 0.27472527 6.001795e-01 91
#> 47 SNP20 1:1 0.17777778 6.732900e-01 90
#> 48 SNP21 1:1 0.01123596 9.155825e-01 89
#> 49 SNP22 1:2:1 1.10869565 5.744468e-01 92
#> 50 SNP23 1:2:1 2.15217391 3.409270e-01 92
#> 51 SNP24 1:2:1 2.86956522 2.381671e-01 92
#> 52 SNP25 1:2:1 2.15217391 3.409270e-01 92
The only argument of the function is a onemap object.
Once we have the onemap_segreg_test object, the function select_segreg can be used to show only the markers considered to be with or without segregation distortion. By default, it uses a global α = 0.05 as the threshold for the test, with Bonferroni correction for multiple tests.
select_segreg(segreg_test, distorted = TRUE) #to show the markers names with segregation distortion
#> [1] "M17" "M19" "M25"
select_segreg(segreg_test, distorted = FALSE) #to show the markers names without segregation distortion
#> [1] "M1" "M2" "M3" "M4" "M5" "M6" "M7" "M8" "M9"
#> [10] "M10" "M11" "M12" "M13" "M14" "M15" "M16" "M18" "M20"
#> [19] "M21" "M22" "M23" "M24" "M26" "M27" "M28" "M29" "M30"
#> [28] "SNP1" "SNP2" "SNP3" "SNP5" "SNP6" "SNP7" "SNP8" "SNP9" "SNP10"
#> [37] "SNP11" "SNP12" "SNP13" "SNP14" "SNP16" "SNP17" "SNP18" "SNP20" "SNP21"
#> [46] "SNP22" "SNP23" "SNP24" "SNP25"
It is not recommended, but you can define a different threshold value by changing the threshold argument of the function select_segreg.
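As a worked example of the default criterion, the base R sketch below applies a plain Bonferroni correction (α divided by the number of tests) to a few p-values copied from the test_segregation output. That select_segreg computes its default threshold in exactly this way is an assumption; the sketch only shows that this rule reproduces the three markers reported as distorted.

```r
# A few p-values taken from the test_segregation output above
p_values <- c(M1 = 4.147829e-01, M12 = 9.045460e-02, M17 = 1.878889e-08,
              M19 = 5.659105e-09, M25 = 2.461278e-08, SNP5 = 8.388899e-02)

n_tests <- 52   # markers tested in bins_example
alpha <- 0.05
bonferroni_threshold <- alpha / n_tests

names(p_values)[p_values < bonferroni_threshold]    # flagged as distorted
names(p_values)[p_values >= bonferroni_threshold]   # not flagged
```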
For the next steps, it will be useful to know the numbers of each marker with segregation distortion, so then you can keep those out of your map building analysis. These numbers refer to the lines where markers are located on the data file.
To access the corresponding numbers for these markers, you can change the numbers argument:
dist <- select_segreg(segreg_test, distorted = TRUE, numbers = TRUE) #to show the markers numbers with segregation distortion
dist
#> [1] 17 19 25
no_dist <- select_segreg(segreg_test, distorted = FALSE, numbers = TRUE) #to show the markers numbers without segregation distortion
no_dist
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 18 20 21 22 23 24 26 27 28
#> [26] 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
You can also see the results graphically by:
Now, we start the map building analysis. In this example, we follow two different strategies:
1. Using only recombination information.
2. Using recombination information and also reference genome information, since our example has CHROM and POS information for some of the markers.
First, we will apply the strategy using only recombination information. In the second part of this tutorial, we show a way to also use reference genome information. At the end of our analysis, we will be able to compare these two strategies by drawing the resulting genetic maps.
The first step is estimating the recombination fraction between all pairs of markers, using two-point tests.
Although the two-point tests are implemented in C, which is usually much faster than R, this step can take quite some time, depending on the number of markers involved and their segregation type, because all combinations are estimated and tested. In addition, the results use a lot of memory, so a rather powerful computer is needed.
When the two-point analysis is finished, an object of class rf_2pts is created. Typing
twopts
will show a message with the criteria used in the analysis and some other information:
#> This is an object of class 'rf_2pts'
#>
#> Criteria: LOD = 3 , Maximum recombination fraction = 0.5
#>
#> This object is too complex to print
#> Type 'print(object, c(mrk1=marker, mrk2=marker))' to see
#> the analysis for two markers
#> mrk1 and mrk2 can be the names or numbers of both markers
If you want to see the results for given markers, say M1 and M3, the command is:
print(twopts, c("M1", "M3"))
#> Results of the 2-point analysis for markers: M1 and M3
#> Criteria: LOD = 3 , Maximum recombination fraction = 0.5
#>
#> rf LOD
#> CC 0.2954514 1.646878
#> CR 0.2954514 1.646878
#> RC 0.7045486 1.646878
#> RR 0.7045486 1.646878
Each line corresponds to a possible linkage phase. CC denotes the coupling phase in both parents, CR and RC denote coupling phase in parent 1 and 2, respectively, and repulsion in the other, and RR denotes the repulsion phase in both parents. The value rf is the maximum likelihood estimate of the recombination fraction, with its corresponding LOD Score.
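To make the meaning of rf and LOD more concrete, here is a minimal sketch in base R for the simplest possible case: a fully informative pair of markers with known linkage phase, where recombinant offspring can be counted directly. This is only an illustration. OneMap has to deal with mixed segregation types and unknown phases, so its estimates come from a more elaborate likelihood. The function name two_point_lod and the counts used are hypothetical.

```r
# Two-point LOD Score for a fully informative marker pair with known phase.
# Simplified illustration only; not the estimator used inside OneMap.
two_point_lod <- function(n_recomb, n_total) {
  stopifnot(n_total > 0, n_recomb >= 0, n_recomb <= n_total)
  rf <- n_recomb / n_total                  # maximum likelihood estimate
  n_non <- n_total - n_recomb
  # log10-likelihood of the counts for a given recombination fraction r
  loglik10 <- function(r) {
    part_rec <- if (n_recomb > 0) n_recomb * log10(r) else 0
    part_non <- if (n_non > 0) n_non * log10(1 - r) else 0
    part_rec + part_non
  }
  # LOD compares the estimated rf against independent segregation (rf = 0.5)
  c(rf = rf, LOD = loglik10(rf) - loglik10(0.5))
}

# Hypothetical counts: 10 recombinants among 100 offspring
res <- two_point_lod(n_recomb = 10, n_total = 100)
print(round(res, 3))
```

A LOD Score of about 16, as in this toy case, means that linkage with rf = 0.1 is roughly 10^16 times more likely than independent segregation.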
Once the recombination fractions and linkage phases for all pairs of markers have been estimated and tested, markers can be assigned to linkage groups. To do this, first use the function make_seq to create a sequence with the markers you want to assign:
mark_no_dist <- make_seq(twopts, no_dist)
The function make_seq is used to create sequences from objects of several kinds, as will be seen throughout this tutorial. Here, the object is of class rf_2pts and the second argument specifies which markers one wants to use. If one wants to use only a subset of markers, say M1 and M2, the option will be a vector with the corresponding numbers of the markers, as c(1,2). You can also use the string "all" to specify that you want to analyze all markers. In our example, we will use the vector with the numbers of the markers with no segregation distortion.
Because the identification of the markers can be cumbersome, one should use the function marker_type to see their numbers, names, and types:
marker_type(mark_no_dist)
#> Marker Marker.name Type
#> 1 1 M1 B3.7
#> 2 2 M2 D2.18
#> 3 3 M3 D1.13
#> 4 4 M4 A.4
#> 5 5 M5 D2.18
#> 6 6 M6 B3.7
#> 7 7 M7 D2.15
#> 8 8 M8 B3.7
#> 9 9 M9 D1.10
#> 10 10 M10 D2.17
#> 11 11 M11 D2.16
#> 12 12 M12 A.2
#> 13 13 M13 C.8
#> 14 14 M14 A.4
#> 15 15 M15 A.4
#> 16 16 M16 D2.17
#> 17 18 M18 A.1
#> 18 20 M20 A.1
#> 19 21 M21 D2.16
#> 20 22 M22 D1.10
#> 21 23 M23 C.8
#> 22 24 M24 B3.7
#> 23 26 M26 A.1
#> 24 27 M27 D1.12
#> 25 28 M28 A.4
#> 26 29 M29 D1.13
#> 27 30 M30 B3.7
#> 28 31 SNP1 B3.7
#> 29 32 SNP2 B3.7
#> 30 33 SNP3 B3.7
#> 31 34 SNP5 B3.7
#> 32 35 SNP6 B3.7
#> 33 36 SNP7 B3.7
#> 34 37 SNP8 B3.7
#> 35 38 SNP9 B3.7
#> 36 39 SNP10 B3.7
#> 37 40 SNP11 B3.7
#> 38 41 SNP12 B3.7
#> 39 42 SNP13 B3.7
#> 40 43 SNP14 D1.10
#> 41 44 SNP16 B3.7
#> 42 45 SNP17 D1.10
#> 43 46 SNP18 D1.10
#> 44 47 SNP20 D1.10
#> 45 48 SNP21 D1.10
#> 46 49 SNP22 B3.7
#> 47 50 SNP23 B3.7
#> 48 51 SNP24 B3.7
#> 49 52 SNP25 B3.7
OneMap has two different functions for grouping markers. The first is the group function:
LGs <- group(mark_no_dist)
#> Selecting markers:
#> group 1
#> ........................
#> group 2
#> ................
#> group 3
#> ......
For this function, the optional arguments are LOD and max.rf, which define the thresholds to be used when assigning markers to linkage groups. If none is provided, the defaults are a LOD Score of 3 and a maximum recombination fraction of 0.50.
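The role of the two thresholds can be illustrated with a small base R sketch. It assumes that two markers are considered linked when their two-point LOD Score reaches the LOD threshold and their recombination fraction does not exceed max.rf, and that groups are formed transitively (if A is linked to B and B to C, all three share a group). This is a simplified picture, not the OneMap implementation. The function group_by_linkage and the toy matrices are hypothetical, and isolated markers simply receive a group of their own here instead of being reported as unlinked.

```r
# Toy grouping by transitive linkage on symmetric LOD and rf matrices.
group_by_linkage <- function(lod, rf, lod_min = 3, rf_max = 0.5) {
  n <- nrow(lod)
  linked <- (lod >= lod_min) & (rf <= rf_max)   # pairwise linkage criterion
  diag(linked) <- FALSE
  group <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(group[start])) next              # marker already assigned
    current <- current + 1L
    group[start] <- current
    queue <- start
    while (length(queue) > 0) {                 # breadth-first expansion
      m <- queue[1]
      queue <- queue[-1]
      new_members <- which(linked[m, ] & is.na(group))
      group[new_members] <- current
      queue <- c(queue, new_members)
    }
  }
  group
}

# Hypothetical two-point results for five markers
lod <- matrix(0, 5, 5)
rf <- matrix(0.5, 5, 5)
pairs <- rbind(c(1, 2, 8.0, 0.10), c(2, 3, 4.5, 0.20), c(4, 5, 6.5, 0.15))
for (k in seq_len(nrow(pairs))) {
  i <- pairs[k, 1]
  j <- pairs[k, 2]
  lod[i, j] <- lod[j, i] <- pairs[k, 3]
  rf[i, j] <- rf[j, i] <- pairs[k, 4]
}

groups_lod3 <- group_by_linkage(lod, rf, lod_min = 3)
groups_lod6 <- group_by_linkage(lod, rf, lod_min = 6)
print(groups_lod3)
print(groups_lod6)
```

With the stricter threshold, the third marker loses its only link and ends up alone, which is the same kind of effect that a higher LOD threshold has on real data.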
Also, you can use the function suggest_lod to calculate a suggested LOD score considering that multiple tests are being performed:
LOD_sug <- suggest_lod(mark_no_dist)
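As a rough idea of what a multiple-testing correction of the LOD threshold can look like, the sketch below applies a Bonferroni correction at the 5% level over all marker pairs and converts the resulting chi-square cutoff (1 degree of freedom) to the LOD scale. Whether suggest_lod computes its value exactly this way is an assumption. The function name suggested_lod is hypothetical, and the sketch is shown only because, for 49 markers, it gives a value close to the 3.64 reported in this tutorial.

```r
# Bonferroni-style LOD threshold for all pairwise tests among n markers.
# Assumed formula for illustration; not taken from the OneMap source code.
suggested_lod <- function(n_markers, alpha = 0.05) {
  n_tests <- choose(n_markers, 2)             # number of marker pairs
  chisq_cut <- qchisq(alpha / n_tests, df = 1, lower.tail = FALSE)
  chisq_cut / (2 * log(10))                   # likelihood-ratio statistic to LOD
}

lod_49 <- suggested_lod(49)
print(round(lod_49, 2))
```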
Then apply this suggested value as the LOD threshold when grouping:
LGs <- group(mark_no_dist, LOD=LOD_sug)
#> Selecting markers:
#> group 1
#> ........................
#> group 2
#> ................
#> group 3
#> ......
The previous command generates an object of class group, and the command print for such an object has two options. If you type:
LGs
you will get detailed information about the groups, that is, all linkage groups will be printed, displaying the names of markers in each one of them.
#> This is an object of class 'group'
#> It was generated from the object "mark_no_dist"
#>
#> Criteria used to assign markers to groups:
#> LOD = 3.639312 , Maximum recombination fraction = 0.5
#>
#> No. markers: 49
#> No. groups: 3
#> No. linked markers: 49
#> No. unlinked markers: 0
#>
#> Printing groups:
#> Group 1 : 25 markers
#> M1 M2 M3 M5 M6 M10 M11 M12 M14 M15 M26 M28 M30 SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12 SNP13
#>
#> Group 2 : 17 markers
#> M4 M9 M16 M20 M21 M23 M24 M27 M29 SNP17 SNP18 SNP20 SNP21 SNP22 SNP23 SNP24 SNP25
#>
#> Group 3 : 7 markers
#> M7 M8 M13 M18 M22 SNP14 SNP16
However, in case you just want to see some basic information (such as the number of groups, number of linked markers, etc), use:
print(LGs, detailed = FALSE)
#> This is an object of class 'group'
#> It was generated from the object "mark_no_dist"
#>
#> Criteria used to assign markers to groups:
#> LOD = 3.639312 , Maximum recombination fraction = 0.5
#>
#> No. markers: 49
#> No. groups: 3
#> No. linked markers: 49
#> No. unlinked markers: 0
Notice that all markers are linked to some linkage group. If the LOD Score threshold is changed to a higher value, some markers are left unassigned:
LGs <- group(mark_no_dist, LOD = 6)
#> Selecting markers:
#> group 1
#> .....................
#> group 2
#> ..........
#> group 3
#> ..
#> group 4
#> .....
#> group 5
#> ....
LGs
#> This is an object of class 'group'
#> It was generated from the object "mark_no_dist"
#>
#> Criteria used to assign markers to groups:
#> LOD = 6 , Maximum recombination fraction = 0.5
#>
#> No. markers: 49
#> No. groups: 5
#> No. linked markers: 47
#> No. unlinked markers: 2
#>
#> Printing groups:
#> Group 1 : 22 markers
#> M1 M2 M3 M6 M10 M12 M14 M26 M28 M30 SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12 SNP13
#>
#> Group 2 : 11 markers
#> M4 M9 M16 M20 M21 M23 M27 SNP17 SNP18 SNP20 SNP21
#>
#> Group 3 : 3 markers
#> M5 M11 M15
#>
#> Group 4 : 6 markers
#> M8 M13 M18 M22 SNP14 SNP16
#>
#> Group 5 : 5 markers
#> M24 SNP22 SNP23 SNP24 SNP25
#>
#> Unlinked markers: 2 markers
#> M7 M29
Changing back to the previous criteria, now setting the maximum recombination fraction to 0.40:
LGs <- group(mark_no_dist, LOD = LOD_sug, max.rf = 0.4)
#> Selecting markers:
#> group 1
#> ........................
#> group 2
#> ................
#> group 3
#> ......
LGs
#> This is an object of class 'group'
#> It was generated from the object "mark_no_dist"
#>
#> Criteria used to assign markers to groups:
#> LOD = 3.639312 , Maximum recombination fraction = 0.4
#>
#> No. markers: 49
#> No. groups: 3
#> No. linked markers: 49
#> No. unlinked markers: 0
#>
#> Printing groups:
#> Group 1 : 25 markers
#> M1 M2 M3 M5 M6 M10 M11 M12 M14 M15 M26 M28 M30 SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12 SNP13
#>
#> Group 2 : 17 markers
#> M4 M9 M16 M20 M21 M23 M24 M27 M29 SNP17 SNP18 SNP20 SNP21 SNP22 SNP23 SNP24 SNP25
#>
#> Group 3 : 7 markers
#> M7 M8 M13 M18 M22 SNP14 SNP16
The other function for grouping is called group_upgma. It is an adapted version of the MAPpoly grouping function. You can define the expected number of groups in the expected.groups argument and check how the markers are split in the plotted dendrogram. Using the argument inter=TRUE, you can interactively change the number of groups defined by the red squares in the graphic.
Once marker assignment to linkage groups is finished, the mapping step can take place. First of all, you must set the mapping function that should be used to display the genetic map throughout the analysis. You can choose between the Kosambi and Haldane mapping functions. To use Haldane, type:
set_map_fun(type = "haldane")
To use Kosambi, type:
set_map_fun(type = "kosambi")
If you do not set one of these functions, Kosambi is used by default.
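For reference, the two mapping functions convert a recombination fraction into a map distance in different ways. The base R sketch below uses the standard textbook formulas, with distances in centiMorgans. The function names haldane_cM and kosambi_cM are hypothetical helpers written for this illustration and are not part of OneMap.

```r
# Standard mapping functions: recombination fraction (0 <= rf < 0.5) to cM.
haldane_cM <- function(rf) -50 * log(1 - 2 * rf)                  # no interference
kosambi_cM <- function(rf) 25 * log((1 + 2 * rf) / (1 - 2 * rf))  # some interference

rf <- c(0.01, 0.10, 0.20, 0.30)
dist_table <- data.frame(
  rf = rf,
  haldane = round(haldane_cM(rf), 2),
  kosambi = round(kosambi_cM(rf), 2)
)
print(dist_table)
```

For small recombination fractions both functions give nearly the same distance. As rf grows, Haldane distances become larger than Kosambi distances, so the total length of a map depends on the function chosen.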
Now, you must define which linkage group will be mapped. In other words, a linkage group must be extracted from the object of class group or group.upgma in order to be mapped. For simplicity, we will start here with the smallest one, which is linkage group 3 (considering the group function). This can be easily done using the following code:
LG3 <- make_seq(LGs, 3)
The first argument (LGs) is an object of class group or group.upgma and the second is a number indicating which linkage group will be extracted, according to the results stored in object LGs. The object LG3, generated by function make_seq, is of class sequence, showing that this function can be used with several types of objects.
If you type
LG3
you will see which markers are comprised in the sequence, and also that no parameters have been estimated so far.
#>
#> Number of markers: 7
#> Markers in the sequence:
#> M7 M8 M13 M18 M22 SNP14 SNP16
#>
#> Parameters not estimated.
To order these markers, one can use a two-point based algorithm such as Seriation (Buetow and Chakravarti, 1987), Rapid Chain Delineation (Doerge, 1996), Recombination Counting and Ordering (Van Os et al., 2005) or Unidirectional Growth (Tan and Fu, 2006):
(LG3_ser <- seriation(LG3, hmm = FALSE))
(LG3_rcd <- rcd(LG3, hmm = FALSE))
(LG3_rec <- record(LG3, hmm = FALSE))
(LG3_ug <- ug(LG3, hmm = FALSE))
The argument hmm defines whether the function should run the HMM multipoint approach to estimate the genetic distances given the marker order provided by the two-point ordering algorithm. We set hmm=FALSE here because we just want to obtain the marker order; we are not yet estimating the genetic distances. We suggest using hmm=TRUE only when you have already decided which order is the best, because the HMM is the most computationally intensive step in map building. You can use rf_graph_table to check the ordering quality (see details below) and edit the marker order using drop_marker. Afterwards, you can use the map or map_avoid_unlinked functions to estimate the genetic distances (see the section Map estimation for an arbitrary order).
Comparing the outputs, you can see some differences among the results of the ordering algorithms (results not shown).
Alternatively, you can also use the mds_onemap function to obtain a first draft of the marker order. The mds_onemap function is a wrapper that interfaces OneMap with the MDSMap package. The ordering approach presented in MDSMap provides a fast and efficient way of ordering markers using multi-dimensional scaling. The method also provides diagnostic graphics and parameters to find outliers, helping users to filter the dataset. You can find more information in the MDSMap vignette. Here we will show a simple example of how it can be used for ordering our example markers from an outcrossing population.
LG3_mds <- mds_onemap(LG3, hmm = FALSE)
#> Stress: 0.221432338237643
#> Mean Nearest Neighbour Fit: 22.1195799121636
If you only specify the input sequence, mds_onemap will use the default parameters. It will also generate an MDSMap input file in the out.file file. You can use out.file in the MDSMap package to try other parameters too. The default method is principal curves; to learn more, see ?mds_onemap and read the MDSMap vignette.
Although these algorithms use a two-point approach to order the markers, if you set hmm=TRUE a multipoint approach is applied to estimate the genetic distances after the order is estimated. Thus, it can happen that some markers are not considered linked when evaluated with multipoint information, and the function will return an error like this:
ERROR: The linkage between markers 1 and 2 did not reach the OneMap default criteria. They are probably segregating independently
You can automatically remove these markers by setting the argument rm_unlinked = TRUE. The marker will be removed, and the ordering algorithm will be restarted. Warning messages will inform which markers were removed. If you don’t get warning messages, it means that no marker needed to be removed. This is the case in this example, but if you obtain an error or warning when running your dataset, you already know what happened.
NOTE: If your sequence has many markers (more than 60), we suggest first using hmm=FALSE to check the ordering and afterwards speeding up mds, seriation, rcd, record and ug with the BatchMap parallelization approach. See the section Speed up analysis with parallelization for more information.
To order by comparing all possible orders (exhaustive search), the function compare can be used:
LG3_comp <- compare(LG3)
WARNING: This algorithm can take some time to run, depending on the marker types in the linkage group. If you are working on a personal computer without high capacity, we recommend using a maximum of ten markers.
If you have more markers in your group, we suggest using the order_seq approach explained below.
In the example, LG3 contains only seven markers. Two of them are of type D1, and one is segregating in 3:1 fashion (type C). Thus, although the number of possible orders is relatively small (7!/2 = 2520), for each order, there are various possible combinations of linkage phases. Also, the convergence of the EM algorithm takes considerably more time, because markers of type C and D are not very informative.
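The recommendation to keep exhaustive searches to about ten markers follows from how quickly the number of orders grows. For n markers there are n!/2 distinct orders, because an order and its reverse represent the same map. A small base R sketch (the helper name n_orders is hypothetical):

```r
# Number of distinct marker orders; an order and its reverse count as one.
n_orders <- function(n_markers) factorial(n_markers) / 2

n <- 5:10
order_counts <- data.frame(markers = n, orders = n_orders(n))
print(order_counts)
```

Each of these orders must additionally be evaluated for the plausible combinations of linkage phases, which is why the running time grows even faster than the table suggests.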
The first argument to the compare function is an object of class sequence (the extracted group LG3), and the object generated by this function is of class compare.
To see the results of the previous step, type:
LG3_comp
NOTE: Check the GitHub vignette version to visualize the output.
Remember that for outcrossing populations, one needs to estimate marker order and also linkage phases between markers for a given order. However, because two-point analysis provides information about linkage phases, this information is taken into consideration in the compare function, reducing the number of combinations to be evaluated. If a given linkage phase has LOD greater than 0.005 in the two-point analysis, we assume that this phase is very unlikely and so does not need to be evaluated in the multipoint procedure used by compare. We did extensive simulations, which showed that this is a good procedure.
By default, OneMap stores 50 orders, which may or may not be unique. The value of LOD refers to the overall LOD Score, considering all orders tested. Nested LOD refers to LOD Scores within a given order, that is, scores for different combinations of linkage phases for the same marker order.
For example, order 1 has the largest value of log-likelihood and, therefore, its LOD Score is zero for a given combination of linkage phases (CC, CC, RR, RR). For this same order and other linkage phases, LOD Score is -5.20. Analyzing the results for order 2, notice that its highest LOD Score is very close to zero, indicating that this order is also quite plausible. Notice also that Nested LOD will always contain at least one zero value, corresponding to the best combination of phases for markers in a given order. Due to the information provided by a two-point analysis, not all combinations are tested, and that is the reason why the number of Nested LOD values is different for each order.
Unless one has some biological information, it is a good idea to choose the order with the highest likelihood. The final map can then be obtained with the command:
LG3_final <- make_seq(LG3_comp, 1, 1)
The first argument is the object of class compare. The second argument indicates which order is chosen: 1 is for the order with the highest likelihood, 2 is for the second-best, and so on. The third argument indicates which combination of phases is chosen for a given order: 1 also means the combination with the highest likelihood among all combinations of phases (based on Nested LOD).
For simplicity, these values are defaults, so typing
LG3_final <- make_seq(LG3_comp)
has the same effect.
To see the final map, type:
LG3_final
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 43 SNP14 0.00 a | | b a | | a
#> 22 M22 13.51 a | | b a | | a
#> 7 M7 23.08 a | | a a | | b
#> 18 M18 64.51 a | | b c | | d
#> 8 M8 70.20 b | | a b | | a
#> 13 M13 72.86 a | | o a | | o
#> 44 SNP16 78.74 b | | a b | | a
#>
#> 7 markers log-likelihood: -398.6832
The two leftmost columns display marker numbers and names. Position shows the cumulative distance using the Kosambi mapping function. Finally, Parent 1 and Parent 2 show the diplotypes of both parents, that is, the combination in which alleles are arranged in the chromosomes, given the estimated linkage phase. The notation is the same as that used by Wu et al. (2002a). Details about how ordering algorithms can be chosen and used are presented by Mollinari et al. (2009).
A careful examination of the results can be done using the function rf_graph_table, which provides a graphical view:
rf_graph_table(LG3_final)
With the default arguments, this function plots the recombination fractions between the markers indicated on the axes. You can change the number of colors from the rainbow palette with the argument n.colors. Hot colors (closer to red) represent lower values of recombination fraction, as shown in the scale at the right side of the graphic. White cells indicate combinations of markers for which the recombination fractions cannot be estimated (D1 and D2). If you want to analyze the LOD values between the markers, use graph.LOD = TRUE.
If you change the inter argument to TRUE, you should also specify an output HTML file name in html.file. This HTML file contains an interactive plot. If you hover the mouse cursor over the cells, it shows some extra information about them, such as the percentage of missing data, marker names, and types. The output HTML file is generated in your working directory and opens automatically in your internet browser.
For example, hovering over the cell corresponding to markers 8 and 13, you can see their names (M8 and M13), types (B3.7 and C.8), recombination fraction (rf = 0.03) and LOD Scores for each possible linkage phase. This is quite useful in helping to interpret the results.
If you want to see the corresponding marker numbers (not the names) on the axes, just change the argument mrk.axis to numbers. It can make the next steps easier.
The rf_graph_table function can also be used to check the order of markers based on the monotonicity of the matrix: as we move away from the secondary diagonal, the recombination fraction values should increase.
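The monotonicity idea can also be checked numerically. The base R sketch below counts, for a symmetric matrix of recombination fractions arranged in map order, the cells whose value is smaller than that of the neighbouring cell in the same row one step closer to the diagonal. The function count_rf_violations and the marker positions are hypothetical. The toy recombination fractions are derived from made-up positions with the inverse Haldane function, so a correct order gives zero violations.

```r
# Count cells that break the "rf increases away from the diagonal" pattern.
count_rf_violations <- function(rf_mat) {
  n <- nrow(rf_mat)
  violations <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (abs(i - j) < 2) next                  # skip diagonal and direct neighbours
      closer <- if (j > i) rf_mat[i, j - 1] else rf_mat[i, j + 1]
      if (is.na(rf_mat[i, j]) || is.na(closer)) next   # non-estimable pairs
      if (rf_mat[i, j] < closer) violations <- violations + 1
    }
  }
  violations
}

# Hypothetical positions (cM) turned into rf with the inverse Haldane function
pos <- c(0, 10, 25, 40)
rf_good <- outer(pos, pos, function(a, b) 0.5 * (1 - exp(-abs(a - b) / 50)))

# The same markers with the second and third swapped (a wrong order)
swapped <- c(1, 3, 2, 4)
rf_bad <- rf_good[swapped, swapped]

violations_good <- count_rf_violations(rf_good)
violations_bad <- count_rf_violations(rf_bad)
print(c(good = violations_good, bad = violations_bad))
```

Real data are noisy, so a few violations are expected even for a good order; the count is more useful for comparing candidate orders than as an absolute criterion.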
It is possible to see a gap between markers M7 and M18 (numbers 7 and 18). In some cases, gaps could indicate that the group must be divided at this position, but here SNP14 (number 43) also shows linkage with M8, which suggests that it is probably only a gap. Adding more markers to this group could fill this gap.
Changing other arguments of the function, you can add/remove labels of the axes (‘lab.xy’) and add a title to the graph (‘main’).
Now, let us map the markers in linkage group number 2.
Again, extract that group from the object LGs:
LG2 <- make_seq(LGs, 2)
LG2
#>
#> Number of markers: 17
#> Markers in the sequence:
#> M4 M9 M16 M20 M21 M23 M24 M27 M29 SNP17 SNP18 SNP20 SNP21 SNP22 SNP23 SNP24
#> SNP25
#>
#> Parameters not estimated.
Note that there are more than 10 markers in this group, so it is infeasible to use the compare function with all of them because it would take a very long time to run.
First, use rcd to get a preliminary order estimate:
LG2_rcd <- rcd(LG2, hmm = FALSE)
LG2_rcd
#>
#> Number of markers: 17
#> Markers in the sequence:
#> SNP20 M16 M20 M4 M21 M23 SNP17 SNP18 M9 SNP21 SNP24 SNP23 M24 SNP22 SNP25 M29
#> M27
#>
#> Parameters not estimated.
rf_graph_table(LG2_rcd)
Use the marker_type function to check the segregation types of all markers in this group:
marker_type(LG2)
#> Marker Marker.name Type
#> 1 4 M4 A.4
#> 2 9 M9 D1.10
#> 3 16 M16 D2.17
#> 4 20 M20 A.1
#> 5 21 M21 D2.16
#> 6 23 M23 C.8
#> 7 24 M24 B3.7
#> 8 27 M27 D1.12
#> 9 29 M29 D1.13
#> 10 45 SNP17 D1.10
#> 11 46 SNP18 D1.10
#> 12 47 SNP20 D1.10
#> 13 48 SNP21 D1.10
#> 14 49 SNP22 B3.7
#> 15 50 SNP23 B3.7
#> 16 51 SNP24 B3.7
#> 17 52 SNP25 B3.7
Based on their segregation types and distribution on the preliminary map, markers M4, M20, M24, SNP22, SNP23, SNP24 and SNP25 are the most informative ones (type A is better, followed by type B). So, let us create a framework of ordered markers using compare for the most informative ones.
There is an automatic way of obtaining a new sequence with only the markers selected by type:
LG2_init <- seq_by_type(sequence = LG2, mk_type = c("A", "B"))
marker_type(LG2_init)
#> Marker Marker.name Type
#> 1 4 M4 A.4
#> 2 20 M20 A.1
#> 3 24 M24 B3.7
#> 4 49 SNP22 B3.7
#> 5 50 SNP23 B3.7
#> 6 51 SNP24 B3.7
#> 7 52 SNP25 B3.7
# To reduce the number of markers even further,
# use the drop_marker function
LG2_init <- drop_marker(LG2_init, 52)
marker_type(LG2_init)
#> Marker Marker.name Type
#> 1 4 M4 A.4
#> 2 20 M20 A.1
#> 3 24 M24 B3.7
#> 4 49 SNP22 B3.7
#> 5 50 SNP23 B3.7
#> 6 51 SNP24 B3.7
The same sequence could also be built with make_seq, in which case the first argument is an object of class rf_2pts and the second argument is a vector of integers specifying which molecular markers comprise the sequence.
LG2_comp <- compare(LG2_init)
#> Warning in compare_outcross(input.seq = input.seq, n.best = n.best, tol = tol, : This operation may take a VERY long time
Select the best order:
LG2_frame <- make_seq(LG2_comp)
Also, we can obtain a useful diagnostic graphic using the function rf_graph_table:
rf_graph_table(LG2_frame)
The graphic shows that there are two groups of markers, since M20 and M4 are far from the other markers. These markers could belong to another linkage group, or they could be distant markers in the same group. Adding more markers will give more information to resolve this issue.
Next, let us try to map the remaining markers, one at a time. First, we will try to add the remaining most informative markers, starting with SNP25:
LG2_extend <- try_seq(LG2_frame, 52)
LG2_extend
#>
#> LOD scores correspond to the best linkage phase combination
#> for each position
#>
#> The symbol "*" outside the box indicates that more than one
#> linkage phase is possible for the corresponding position
#>
#>
#> Marker tested: 52
#>
#> Markers LOD
#> =====================
#> | |
#> | -26.43 | 1
#> | 20 |
#> | -50.47 | 2
#> | 4 |
#> | -8.34 | 3
#> | 51 |
#> | -8.78 | 4
#> | 50 |
#> | -8.44 | 5
#> | 24 |
#> | -2.46 | 6
#> | 49 |
#> | 0.00 | 7
#> | |
#> =====================
Based on the LOD Scores, marker SNP25 is probably better located after SNP22 (number 49). Detailed results can be seen with:
print(LG2_extend, 7)
#>
#> LOD is the overall LOD score (among all orders)
#>
#> NEST.LOD is the LOD score within the order
#>
#> Marker tested: 52
#> --------------
#> | | |
#> | 20 | |
#> | | CR |
#> | 4 | |
#> | | CC |
#> | 51 | |
#> | | CC |
#> | 50 | |
#> | | CC |
#> | 24 | |
#> | | CC |
#> | 49 | |
#> | | CC |
#> | 52 | |
#> | | |
#> |------------|
#> | LOD | 0.0|
#> |------------|
#> |NEST.| |
#> | LOD | 0.0|
#> --------------
The second argument indicates the position where to place the marker. Note that the first allele arrangement is the most likely one.
It should be pointed out that the framework created by the function compare (with M20, M4, SNP24, SNP23, M24 and SNP22, or numbers 20, 4, 51, 50, 24 and 49) could be in reverse order (SNP22, M24, SNP23, SNP24, M4 and M20, or numbers 49, 24, 50, 51, 4 and 20) and still represent the same map. Thus, the positioning of markers with the try_seq command can be different on your computer. For example, here marker SNP25 (number 52) was better placed at position 7; however, if you obtain a reversed order, marker SNP25 would be better placed in position 1. In both cases, the best position for this marker is at the end of the framework next to SNP22.
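The equivalence between a framework and its reverse can be verified with a few lines of base R. A framework with n markers has n + 1 possible insertion positions, so position k in one orientation corresponds to position n + 2 - k in the other. The helper names insert_at and mirror_position are hypothetical. The marker numbers are the ones used in this example.

```r
# Insert a marker at a given position (1 = before the first marker).
insert_at <- function(framework, marker, pos) {
  append(framework, marker, after = pos - 1)
}

# Equivalent position when the framework is read in the opposite direction.
mirror_position <- function(pos, n_framework) n_framework + 2 - pos

framework <- c(20, 4, 51, 50, 24, 49)

# Marker 52 placed at position 7 of the framework as printed above
forward_map <- insert_at(framework, 52, 7)

# The same placement when the framework comes out reversed
reversed_pos <- mirror_position(7, length(framework))
reversed_map <- insert_at(rev(framework), 52, reversed_pos)

print(forward_map)
print(reversed_map)
print(identical(forward_map, rev(reversed_map)))
```

So if your try_seq output points to position 1 instead of 7, check first whether your framework is simply printed in the opposite direction.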
We can better evaluate the order with rf_graph_table. It requires an object of the sequence class with mapping information.
When using make_seq with an object of class try, the second argument is the position on the map (according to the scale on the right of the output) and the last argument indicates the linkage phases (defaults to 1, the highest nested LOD).
We can see that SNP25 (or marker 52) was positioned at the end of the sequence and the color pattern shows that it is strongly linked with its neighbors, indicating that it is well-positioned. We will maintain this marker at this position:
LG2_frame <- make_seq(LG2_extend, 7, 1)
Adding other markers, one by one (output not shown):
LG2_extend <- try_seq(LG2_frame, 9)
LG2_frame <- make_seq(LG2_extend, 3)
LG2_extend <- try_seq(LG2_frame, 16)
LG2_frame <- make_seq(LG2_extend, 1)
LG2_extend <- try_seq(LG2_frame, 21)
LG2_frame <- make_seq(LG2_extend, 4)
LG2_extend <- try_seq(LG2_frame, 23)
LG2_frame <- make_seq(LG2_extend, 5)
LG2_extend <- try_seq(LG2_frame, 27)
LG2_frame <- make_seq(LG2_extend, 1)
LG2_extend <- try_seq(LG2_frame, 29)
LG2_frame <- make_seq(LG2_extend, 12)
LG2_extend <- try_seq(LG2_frame, 45)
LG2_frame <- make_seq(LG2_extend, 8)
LG2_extend <- try_seq(LG2_frame, 46)
LG2_frame <- make_seq(LG2_extend, 7)
LG2_extend <- try_seq(LG2_frame, 47)
LG2_frame <- make_seq(LG2_extend, 7)
LG2_extend <- try_seq(LG2_frame, 48)
LG2_final <- make_seq(LG2_extend, 10)
Checking graphically:
rf_graph_table(LG2_final)
The process of adding markers can be automated with the function order_seq.
LG2_ord <- order_seq(LG2, n.init = 5, THRES = 3)
This function automates what the try_seq function does, using some predefined rules. In the function, n.init = 5 means that five markers (the most informative ones) will be used in the compare step; THRES = 3 indicates that the try_seq step will only add markers to the sequence which can be mapped with LOD Score greater than 3.
NOTE: Although very useful, this function can be misleading, especially if there are not many fully informative markers, so use it carefully. Results can vary between multiple runs on the same markers, of course.
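One way to read the THRES rule is the following: a marker is added only if its best position beats the second-best position by more than THRES LOD units. The base R sketch below applies this reading to the LOD Scores printed earlier by try_seq for marker 52. The function is_safely_placed is hypothetical and the rule is an interpretation of the description above, not code taken from OneMap.

```r
# TRUE when the best position is separated from the runner-up by more than thres.
is_safely_placed <- function(lod_by_position, thres = 3) {
  sorted <- sort(lod_by_position, decreasing = TRUE)
  (sorted[1] - sorted[2]) > thres
}

# LOD Scores for the seven candidate positions of marker 52 (try_seq output)
lod_52 <- c(-26.43, -50.47, -8.34, -8.78, -8.44, -2.46, 0.00)

safe_at_3 <- is_safely_placed(lod_52, thres = 3)
safe_at_2 <- is_safely_placed(lod_52, thres = 2)
print(c(THRES_3 = safe_at_3, THRES_2 = safe_at_2))
```

Under this reading, the second-best position (LOD = -2.46) is too close to the best one for a threshold of 3, although it would pass a threshold of 2.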
Check the final order:
LG2_ord
#>
#> Best sequence found.
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 27 M27 0.00 b | | o a | | a
#> 16 M16 11.76 a | | a b | | o
#> 20 M20 22.94 a | | b c | | d
#> 4 M4 35.71 a | | o o | | b
#> 23 M23 53.69 a | | o a | | o
#> 45 SNP17 72.86 a | | b a | | a
#> 50 SNP23 103.35 a | | b b | | a
#> 24 M24 108.37 a | | b b | | a
#> 49 SNP22 112.25 a | | b b | | a
#>
#> 9 markers log-likelihood: -551.4083
#>
#>
#>
#> The following markers could not be uniquely positioned.
#> Printing most likely positions for each unpositioned marker:
#>
#> ------------------------------------------------------
#> | | 9 | 21 | 29 | 46 | 47 | 48 | 51 | 52 |
#> |----|-----|-----|-----|-----|-----|-----|-----|-----|
#> | | | | | | | | | |
#> | 27 | | | | | | | | |
#> | | | | | | | | | |
#> | 16 | | | | | | | | |
#> | | | | | | | | | |
#> | 20 | | | | | | | | |
#> | | | | | | | | | |
#> | 4 | | | | | | | | |
#> | | | *** | | | | | | |
#> | 23 | | | | | | | | |
#> | | *** | * | | | | *** | | |
#> | 45 | | | | | | | | |
#> | | * | | ** | *** | *** | | *** | |
#> | 50 | | | | | | | | |
#> | | | | | | | | | |
#> | 24 | | | | | | | | |
#> | | | | | | | | | |
#> | 49 | | | | | | | | |
#> | | | | *** | | | | ** | *** |
#> ------------------------------------------------------
#>
#> '***' indicates the most likely position(s) (LOD = 0.0)
#>
#> '**' indicates very likely positions (LOD > -1.0)
#>
#> '*' indicates likely positions (LOD > -2.0)
Note that markers 9, 21, 29, 46, 47, 48, 51 and 52 could not be safely mapped to a single position (safe mapping requires LOD Score > THRES in absolute value). The output displays the safe order and the most likely positions for the markers not mapped, where *** indicates the most likely position and * corresponds to other plausible positions.
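The star notation follows directly from the legend printed at the end of the output. A short base R sketch of the same classification (the function lod_to_stars and the LOD values are hypothetical, used only to illustrate the legend):

```r
# Translate position LOD Scores into the star codes of the order_seq legend.
lod_to_stars <- function(lod) {
  ifelse(lod == 0, "***",            # most likely position(s)
    ifelse(lod > -1, "**",           # very likely positions
      ifelse(lod > -2, "*", "")))    # likely positions; blank otherwise
}

example_lods <- c(0, -0.4, -1.5, -3.2)
stars <- lod_to_stars(example_lods)
print(data.frame(LOD = example_lods, code = stars))
```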
To get the safe order (i.e., without markers 9, 21, 29, 46, 47, 48, 51 and 52), use
LG2_safe <- make_seq(LG2_ord, "safe")
and to get the order with all markers, use
LG2_all <- make_seq(LG2_ord, "force")
LG2_all
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 27 M27 0.00 b | | o a | | a
#> 16 M16 11.76 a | | a b | | o
#> 20 M20 22.94 a | | b c | | d
#> 4 M4 35.71 a | | o o | | b
#> 21 M21 49.40 o | | o a | | b
#> 23 M23 55.18 a | | o a | | o
#> 48 SNP21 67.73 a | | b a | | a
#> 9 M9 74.32 a | | b a | | a
#> 46 SNP18 80.85 a | | b a | | a
#> 45 SNP17 96.14 a | | b a | | a
#> 47 SNP20 121.33 a | | b a | | a
#> 29 M29 170.70 o | | a o | | o
#> 50 SNP23 192.38 a | | b b | | a
#> 24 M24 197.52 a | | b b | | a
#> 49 SNP22 200.80 a | | b b | | a
#> 52 SNP25 205.70 a | | b b | | a
#> 51 SNP24 216.64 a | | b b | | a
#>
#> 17 markers log-likelihood: -871.9124
Notice that, for this linkage group, the forced map obtained with order_seq is different from that obtained with compare plus try_seq. It depends on which markers we choose to add first when doing it manually.
The order_seq function can also perform two rounds of the try_seq algorithm, first using THRES and then THRES - 1 as a threshold. This generally results in safe orders with more markers mapped but may take longer to run. To do this, use the touchdown option:
LG2_ord <- order_seq(LG2, n.init = 5, THRES = 3, touchdown = TRUE)
LG2_ord
#>
#> Best sequence found.
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 27 M27 0.00 b | | o a | | a
#> 16 M16 11.76 a | | a b | | o
#> 20 M20 22.94 a | | b c | | d
#> 4 M4 35.71 a | | o o | | b
#> 23 M23 53.66 a | | o a | | o
#> 45 SNP17 72.85 a | | b a | | a
#> 46 SNP18 87.61 a | | b a | | a
#> 50 SNP23 116.87 a | | b b | | a
#> 24 M24 121.89 a | | b b | | a
#> 49 SNP22 125.19 a | | b b | | a
#> 52 SNP25 130.20 a | | b b | | a
#>
#> 11 markers log-likelihood: -626.9232
#>
#>
#>
#> The following markers could not be uniquely positioned.
#> Printing most likely positions for each unpositioned marker:
#>
#> ------------------------------------------
#> | | 9 | 21 | 29 | 47 | 48 | 51 |
#> |----|-----|-----|-----|-----|-----|-----|
#> | | | | | | | |
#> | 27 | | | | | | |
#> | | | | | | | |
#> | 16 | | | | | | |
#> | | | | | | | |
#> | 20 | | | | | | |
#> | | | | | | | |
#> | 4 | | | | | | |
#> | | | *** | | | | |
#> | 23 | | | | | | |
#> | | | * | | * | *** | |
#> | 45 | | | | | | |
#> | | *** | | | | * | |
#> | 46 | | | | | | |
#> | | | | *** | *** | * | ** |
#> | 50 | | | | | | |
#> | | | | | | | |
#> | 24 | | | | | | |
#> | | | | | | | |
#> | 49 | | | | | | |
#> | | | | | | | |
#> | 52 | | | | | | |
#> | | | | ** | | | *** |
#> ------------------------------------------
#>
#> '***' indicates the most likely position(s) (LOD = 0.0)
#>
#> '**' indicates very likely positions (LOD > -1.0)
#>
#> '*' indicates likely positions (LOD > -2.0)
For this particular sequence, the touchdown step could safely map markers 46 and 52, but this depends on the specific dataset.
Finally, to check for alternative orders (because we did not use exhaustive search), use the ripple_seq function:
ripple_seq(LG2_all, ws = 4, LOD = LOD_sug)
#> 27-16-20-4-|-21-... OK
#>
#> ...-27-|-16-20-4-21-|-23-... OK
#>
#> ...-16-|-20-4-21-23-|-48-...
#> Alternative orders:
#> ... 16 20 4 21 23 48 ... : 0.00 ( linkage phases: ... 1 2 2 1 1 ... )
#> ... 16 20 4 23 21 48 ... : -2.15 ( linkage phases: ... 1 2 2 1 1 ... )
#>
#> ...-20-|-4-21-23-48-|-9-...
#> Alternative orders:
#> ... 20 4 21 23 48 9 ... : 0.00 ( linkage phases: ... 2 2 1 1 1 ... )
#> ... 20 4 23 21 48 9 ... : -2.15 ( linkage phases: ... 2 2 1 1 1 ... )
#>
#> ...-4-|-21-23-48-9-|-46-...
#> Alternative orders:
#> ... 4 21 23 48 9 46 ... : 0.00 ( linkage phases: ... 2 1 1 1 1 ... )
#> ... 4 23 21 48 9 46 ... : -2.15 ( linkage phases: ... 2 1 1 1 1 ... )
#>
#> ...-21-|-23-48-9-46-|-45-...
#> Alternative orders:
#> ... 21 23 48 9 46 45 ... : 0.00 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 21 23 48 46 9 45 ... : -1.05 ( linkage phases: ... 1 1 1 1 1 ... )
#>
#> ...-23-|-48-9-46-45-|-47-...
#> Alternative orders:
#> ... 23 48 9 46 45 47 ... : 0.00 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 23 48 46 9 45 47 ... : -1.05 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 23 48 45 9 46 47 ... : -1.83 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 23 48 9 45 46 47 ... : -3.21 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 23 45 48 9 46 47 ... : -3.49 ( linkage phases: ... 1 1 1 1 1 ... )
#>
#> ...-48-|-9-46-45-47-|-29-...
#> Alternative orders:
#> ... 48 9 46 45 47 29 ... : 0.00 ( linkage phases: ... 1 1 1 1 3 ... )
#> ... 48 46 9 45 47 29 ... : -1.05 ( linkage phases: ... 1 1 1 1 3 ... )
#> ... 48 45 9 46 47 29 ... : -1.83 ( linkage phases: ... 1 1 1 1 3 ... )
#> ... 48 9 45 46 47 29 ... : -3.21 ( linkage phases: ... 1 1 1 1 3 ... )
#>
#> ...-9-|-46-45-47-29-|-50-...
#> Alternative orders:
#> ... 9 46 45 47 29 50 ... : 0.00 ( linkage phases: ... 1 1 1 3 4 ... )
#> ... 9 45 46 47 29 50 ... : -3.21 ( linkage phases: ... 1 1 1 3 4 ... )
#>
#> ...-46-|-45-47-29-50-|-24-... OK
#>
#> ...-45-|-47-29-50-24-|-49-... OK
#>
#> ...-47-|-29-50-24-49-|-52-... OK
#>
#> ...-29-|-50-24-49-52-|-51-...
#> Alternative orders:
#> ... 29 50 24 49 52 51 : 0.00 ( linkage phases: ... 4 1 1 1 1 )
#> ... 29 52 49 24 50 51 : -0.90 ( linkage phases: ... 4 1 1 1 1 )
#> ... 29 50 24 52 49 51 : -2.75 ( linkage phases: ... 4 1 1 1 1 )
#> ... 29 50 52 49 24 51 : -2.79 ( linkage phases: ... 4 1 1 1 1 )
#> ... 29 49 52 24 50 51 : -3.29 ( linkage phases: ... 4 1 1 1 1 )
#> ... 29 50 49 52 24 51 : -3.30 ( linkage phases: ... 4 1 1 1 1 )
#>
#> 50-|-24-49-52-51
#> Alternative orders:
#> ... 50 24 49 52 51 : 0.00 ( linkage phases: ... 1 1 1 1 )
#> ... 50 24 52 49 51 : -2.75 ( linkage phases: ... 1 1 1 1 )
#> ... 50 52 49 24 51 : -2.79 ( linkage phases: ... 1 1 1 1 )
#> ... 50 51 24 49 52 : -2.84 ( linkage phases: ... 1 1 1 1 )
#> ... 50 49 52 24 51 : -3.30 ( linkage phases: ... 1 1 1 1 )
We should do this for any of the orders we found, using either try_seq or order_seq. Here, we choose LG2_all for didactic purposes only. The second argument, ws = 4, means that subsets (windows) of four markers will be permuted sequentially (4! orders for each window), to search for other plausible orders. The LOD argument means that only orders whose LOD Score is smaller than 3.68 in absolute value will be printed.
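As a rough illustration of what ws = 4 implies, the following base R sketch lists the sliding windows for the marker order printed for LG2_all and counts the permutations per window. It only mimics the windowing described above; it is not how ripple_seq is implemented internally, and the variable names are chosen here for illustration.

```r
# Marker order of LG2_all, copied from the map printed above
ord <- c(27, 16, 20, 4, 21, 23, 48, 9, 46, 45, 47, 29, 50, 24, 49, 52, 51)
ws <- 4  # window size, as in ripple_seq(LG2_all, ws = 4, ...)

# One window starts at each position that leaves room for ws markers
n_windows <- length(ord) - ws + 1
windows <- t(sapply(seq_len(n_windows), function(i) ord[i:(i + ws - 1)]))

# Number of candidate orders inside each window
orders_per_window <- factorial(ws)

n_windows          # 14, matching the 14 blocks in the ripple_seq output
orders_per_window  # 24
windows[1, ]       # 27 16 20 4
```

Larger windows explore more orders per block, but the count grows factorially (5! = 120, 6! = 720), which is why small values of ws are the practical choice.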
The output shows sequences of four numbers, because ws = 4. They are followed by an OK if there is no alternative order with LOD Score smaller than LOD = LOD_sug in absolute value, or by a list of alternative orders. In the example, some sequences showed alternative orders with LOD smaller than LOD = LOD_sug. However, the best order was the original one (LOD = 0.00).
If there were an alternative order more likely than the original, one should check the difference between these orders (and linkage phases).
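To make the reading of one block concrete, the sketch below re-enters by hand the alternative orders printed for the window 48-9-46-45 and checks whether the original order is the most likely one. The data frame and the stricter threshold of 2 are illustrative assumptions, not part of the object returned by OneMap.

```r
# Alternative orders for the window 48-9-46-45, typed in from the output above
alt <- data.frame(
  order = c("48 9 46 45", "48 46 9 45", "48 45 9 46", "48 9 45 46", "45 48 9 46"),
  lod   = c(0.00, -1.05, -1.83, -3.21, -3.49),
  stringsAsFactors = FALSE
)

original <- "48 9 46 45"

# The original order is the most likely one when it sits at LOD = 0
original_is_best <- alt$order[which.max(alt$lod)] == original

# Orders that would still be listed under a stricter (hypothetical) threshold
close_orders <- alt$order[abs(alt$lod) < 2]

original_is_best      # TRUE
length(close_orders)  # 3
```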
In some cases, even if ripple_seq suggests no better alternative orders, the graphic shows a color pattern different from the expected one. Then, we can remove doubtful markers (for this group, markers 23 and 29) and try to position them again. First, we use the function drop_marker to remove the selected markers from our sequence, storing the result as LG2_test_seq.
The function will provide a sequence with the same order as the estimated map (LG2_all). Afterwards, we should estimate the map again using this predefined order (see section Map estimation for an arbitrary order for further information). For this, we use the map function:
Warning: If you find an error message like:
Error in as_mapper(.f, ...) : argument ".f" is missing, with no default
it is because map is a very common function name, and other functions with the same name may be present in your environment. In the case of this error, R is using the map function from the purrr package instead of the one from OneMap. To solve this, simply specify that you want the OneMap function by using the :: operator, which is part of base R and requires no additional package:
(LG2_test_map <- onemap::map(LG2_test_seq))
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 27 M27 0.00 b | | o a | | a
#> 16 M16 11.75 a | | a o | | b
#> 20 M20 22.93 a | | b d | | c
#> 4 M4 35.70 a | | o b | | o
#> 21 M21 52.67 o | | o b | | a
#> 48 SNP21 53.59 a | | b a | | a
#> 9 M9 60.40 a | | b a | | a
#> 46 SNP18 66.83 a | | b a | | a
#> 45 SNP17 81.83 a | | b a | | a
#> 47 SNP20 106.46 a | | b a | | a
#> 50 SNP23 145.68 a | | b a | | b
#> 24 M24 150.69 a | | b a | | b
#> 49 SNP22 153.97 a | | b a | | b
#> 52 SNP25 158.87 a | | b a | | b
#> 51 SNP24 169.81 a | | b a | | b
#>
#> 15 markers log-likelihood: -780.5746
NOTE: If your sequence has many markers (more than 60), we suggest speeding up map using the BatchMap parallelization approach. See section Speed up analysis with parallelization for more information.
Now, we have the map without markers 23 and 29. We use the try_seq function to position them again:
LG2_test_seq <- try_seq(LG2_test_map, 23)
LG2_test_23 <- make_seq(LG2_test_seq, 6)
LG2_test_seq <- try_seq(LG2_test_23, 29)
LG2_test_23_29 <- make_seq(LG2_test_seq, 12)
rf_graph_table(LG2_test_23_29, mrk.axis = "numbers")
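The second argument passed to make_seq after try_seq can be read as the position at which the tested marker is inserted. Assuming that reading, which is consistent with the maps printed in this section, the base R sketch below reproduces the two insertions on plain vectors of marker numbers (the vector names are illustrative only).

```r
# Order of LG2_test_map (markers 23 and 29 dropped), copied from the output above
base_order <- c(27, 16, 20, 4, 21, 48, 9, 46, 45, 47, 50, 24, 49, 52, 51)

# Position 6: marker 23 goes after the first five markers
with_23 <- append(base_order, 23, after = 6 - 1)

# Position 12: marker 29 goes after the first eleven markers
with_23_29 <- append(with_23, 29, after = 12 - 1)

with_23[5:7]       # 21 23 48
with_23_29[11:13]  # 47 29 50
```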
Marker 23 kept its previous position, but marker 29 was repositioned and now creates a gap between markers 47 and 50. We removed marker 29 from our map because its color pattern is too different from the expected one. Then, our final map is:
LG2_final <- LG2_test_23
LG2_final
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 27 M27 0.00 b | | o a | | a
#> 16 M16 11.75 a | | a o | | b
#> 20 M20 22.93 a | | b d | | c
#> 4 M4 35.70 a | | o b | | o
#> 21 M21 50.26 o | | o b | | a
#> 23 M23 54.91 a | | o o | | a
#> 48 SNP21 71.92 a | | b a | | a
#> 9 M9 78.57 a | | b a | | a
#> 46 SNP18 85.00 a | | b a | | a
#> 45 SNP17 100.01 a | | b a | | a
#> 47 SNP20 124.66 a | | b a | | a
#> 50 SNP23 163.76 a | | b a | | b
#> 24 M24 168.76 a | | b a | | b
#> 49 SNP22 172.05 a | | b a | | b
#> 52 SNP25 176.95 a | | b a | | b
#> 51 SNP24 187.89 a | | b a | | b
#>
#> 16 markers log-likelihood: -812.0006
rf_graph_table(LG2_final, mrk.axis = "numbers")
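Map size and interval lengths are among the criteria used when deciding whether to keep a marker. As a small check, the sketch below recomputes the distances between adjacent markers from the positions printed for LG2_final. The vectors are typed in by hand for illustration and are not extracted from the OneMap object.

```r
# Marker numbers and positions (cM) of LG2_final, copied from the printed map
mk  <- c(27, 16, 20, 4, 21, 23, 48, 9, 46, 45, 47, 50, 24, 49, 52, 51)
pos <- c(0.00, 11.75, 22.93, 35.70, 50.26, 54.91, 71.92, 78.57,
         85.00, 100.01, 124.66, 163.76, 168.76, 172.05, 176.95, 187.89)

gaps <- diff(pos)     # distance between each pair of adjacent markers
i <- which.max(gaps)  # index of the largest interval

# Largest interval: between markers 47 and 50, about 39.1 cM
c(from = mk[i], to = mk[i + 1], cM = round(gaps[i], 2))
```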
Finally, linkage group 1 (the largest one) will be analyzed. Extract markers:
Construct the linkage map by automatically using the try algorithm:
LG1_ord <- order_seq(LG1, n.init = 6, touchdown = TRUE)
#> Warning in compare_outcross(input.seq = input.seq, n.best = n.best, tol = tol, : This operation may take a VERY long time
Notice that the second round of try_seq added markers 10, 31, 32, 35, 36, and 40.
LG1_ord
#>
#> Best sequence found.
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 34 SNP5 0.00 a | | b a | | b
#> 32 SNP2 13.48 a | | b a | | b
#> 12 M12 17.26 b | | a c | | a
#> 31 SNP1 22.81 a | | b a | | b
#> 3 M3 46.09 o | | a o | | o
#> 14 M14 59.22 a | | o b | | o
#> 2 M2 66.93 o | | o o | | a
#> 36 SNP7 92.49 b | | a b | | a
#> 35 SNP6 105.45 b | | a b | | a
#> 1 M1 109.86 b | | a b | | a
#> 10 M10 115.37 a | | a o | | b
#> 28 M28 132.05 a | | o b | | o
#> 26 M26 164.42 a | | b d | | c
#> 42 SNP13 179.54 b | | a b | | a
#> 6 M6 185.89 b | | a b | | a
#> 39 SNP10 192.06 b | | a b | | a
#> 41 SNP12 208.53 b | | a b | | a
#> 40 SNP11 225.65 b | | a b | | a
#> 15 M15 267.89 o | | a o | | b
#>
#> 19 markers log-likelihood: -1313.595
#>
#>
#>
#> The following markers could not be uniquely positioned.
#> Printing most likely positions for each unpositioned marker:
#>
#> ------------------------------------------
#> | | 5 | 11 | 30 | 33 | 37 | 38 |
#> |----|-----|-----|-----|-----|-----|-----|
#> | | | | | | | |
#> | 34 | | | | | | |
#> | | | | | *** | | |
#> | 32 | | | | | | |
#> | | | | *** | * | | |
#> | 12 | | | | | | |
#> | | | | | | | |
#> | 31 | | | | | | |
#> | | | | | | | |
#> | 3 | | | | | | |
#> | | | | | | | |
#> | 14 | | | | | | |
#> | | | | | | | |
#> | 2 | | | | | | |
#> | | | | | | * | |
#> | 36 | | | | | | |
#> | | | | | | ** | * |
#> | 35 | | | | | | |
#> | | | | | | | |
#> | 1 | | | | | | |
#> | | | | | | *** | *** |
#> | 10 | | | | | | |
#> | | | | | | | |
#> | 28 | | | | | | |
#> | | | | | | | |
#> | 26 | | | | | | |
#> | | | | | | | |
#> | 42 | | | | | | |
#> | | | | | | | |
#> | 6 | | | | | | |
#> | | | | | | | |
#> | 39 | | | | | | |
#> | | | | | | | |
#> | 41 | | | | | | |
#> | | | | | | | |
#> | 40 | | | | | | |
#> | | ** | ** | | | | |
#> | 15 | | | | | | |
#> | | *** | *** | | | | |
#> ------------------------------------------
#>
#> '***' indicates the most likely position(s) (LOD = 0.0)
#>
#> '**' indicates very likely positions (LOD > -1.0)
#>
#> '*' indicates likely positions (LOD > -2.0)
Now, get the order with all markers:
(LG1_frame <- make_seq(LG1_ord, "force"))
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 34 SNP5 0.00 a | | b a | | b
#> 33 SNP3 12.27 a | | b a | | b
#> 32 SNP2 18.45 a | | b a | | b
#> 30 M30 21.21 a | | b a | | b
#> 12 M12 22.22 b | | a c | | a
#> 31 SNP1 27.77 a | | b a | | b
#> 3 M3 51.05 o | | a o | | o
#> 14 M14 64.18 a | | o b | | o
#> 2 M2 72.05 o | | o o | | a
#> 36 SNP7 95.86 b | | a b | | a
#> 37 SNP8 112.04 b | | a b | | a
#> 35 SNP6 126.75 b | | a b | | a
#> 1 M1 131.15 b | | a b | | a
#> 38 SNP9 137.28 b | | a b | | a
#> 10 M10 148.79 a | | a o | | b
#> 28 M28 165.35 a | | o b | | o
#> 26 M26 197.72 a | | b d | | c
#> 42 SNP13 212.84 b | | a b | | a
#> 6 M6 219.19 b | | a b | | a
#> 39 SNP10 225.36 b | | a b | | a
#> 41 SNP12 241.82 b | | a b | | a
#> 40 SNP11 258.95 b | | a b | | a
#> 15 M15 301.19 o | | a o | | b
#> 5 M5 306.20 o | | o o | | a
#> 11 M11 326.21 o | | o b | | a
#>
#> 25 markers log-likelihood: -1554.628
Check the map graphically:
Check for alternative orders:
ripple_seq(LG1_frame)
#> 34-33-32-30-|-12-...
#> Alternative orders:
#> 34 33 32 30 12 ... : 0.00 ( linkage phases: 1 1 1 4 ... )
#> 34 32 33 30 12 ... : -1.28 ( linkage phases: 1 1 1 4 ... )
#> 34 33 30 32 12 ... : -1.57 ( linkage phases: 1 1 1 4 ... )
#>
#> ...-34-|-33-32-30-12-|-31-...
#> Alternative orders:
#> 34 33 12 30 32 31 ... : 0.00 ( linkage phases: 1 4 4 1 1 ... )
#> 34 33 30 12 32 31 ... : -0.24 ( linkage phases: 1 1 4 4 1 ... )
#> 34 33 32 30 12 31 ... : -0.59 ( linkage phases: 1 1 1 4 4 ... )
#> 34 32 33 30 12 31 ... : -1.87 ( linkage phases: 1 1 1 4 4 ... )
#> 34 33 30 32 12 31 ... : -2.16 ( linkage phases: 1 1 1 4 4 ... )
#> 34 33 32 12 30 31 ... : -2.74 ( linkage phases: 1 1 4 4 1 ... )
#>
#> ...-33-|-32-30-12-31-|-3-...
#> Alternative orders:
#> ... 33 12 30 32 31 3 ... : 0.00 ( linkage phases: ... 4 4 1 1 4 ... )
#> ... 33 30 12 32 31 3 ... : -0.24 ( linkage phases: ... 1 4 4 1 4 ... )
#> ... 33 32 30 12 31 3 ... : -0.59 ( linkage phases: ... 1 1 4 4 4 ... )
#> ... 33 30 32 12 31 3 ... : -2.16 ( linkage phases: ... 1 1 4 4 4 ... )
#> ... 33 32 12 30 31 3 ... : -2.74 ( linkage phases: ... 1 4 4 1 4 ... )
#>
#> ...-32-|-30-12-31-3-|-14-...
#> Alternative orders:
#> ... 32 30 12 31 3 14 ... : 0.00 ( linkage phases: ... 1 4 4 4 4 ... )
#> ... 32 12 30 31 3 14 ... : -2.15 ( linkage phases: ... 4 4 1 4 4 ... )
#>
#> ...-30-|-12-31-3-14-|-2-... OK
#>
#> ...-12-|-31-3-14-2-|-36-... OK
#>
#> ...-31-|-3-14-2-36-|-37-... OK
#>
#> ...-3-|-14-2-36-37-|-35-...
#> Alternative orders:
#> ... 3 14 2 36 37 35 ... : 0.00 ( linkage phases: ... 4 2 3 1 1 ... )
#> ... 3 14 2 37 36 35 ... : -0.93 ( linkage phases: ... 4 2 3 1 1 ... )
#>
#> ...-14-|-2-36-37-35-|-1-...
#> Alternative orders:
#> ... 14 2 36 37 35 1 ... : 0.00 ( linkage phases: ... 2 3 1 1 1 ... )
#> ... 14 2 37 36 35 1 ... : -0.93 ( linkage phases: ... 2 3 1 1 1 ... )
#>
#> ...-2-|-36-37-35-1-|-38-...
#> Alternative orders:
#> ... 2 36 37 1 35 38 ... : 0.00 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 2 36 37 35 1 38 ... : -0.25 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 2 37 36 35 1 38 ... : -1.17 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 2 37 36 1 35 38 ... : -2.38 ( linkage phases: ... 3 1 1 1 1 ... )
#>
#> ...-36-|-37-35-1-38-|-10-...
#> Alternative orders:
#> ... 36 37 1 35 38 10 ... : 0.00 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 37 35 1 38 10 ... : -0.25 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 37 38 35 1 10 ... : -1.22 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 35 1 38 37 10 ... : -1.25 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 38 35 1 37 10 ... : -1.43 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 37 38 1 35 10 ... : -1.53 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 38 1 35 37 10 ... : -1.70 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 35 38 1 37 10 ... : -2.46 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 1 35 38 37 10 ... : -2.51 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 37 35 38 1 10 ... : -2.72 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 37 1 38 35 10 ... : -2.78 ( linkage phases: ... 1 1 1 1 1 ... )
#>
#> ...-37-|-35-1-38-10-|-28-...
#> Alternative orders:
#> ... 37 1 35 38 10 28 ... : 0.00 ( linkage phases: ... 1 1 1 1 4 ... )
#> ... 37 35 1 38 10 28 ... : -0.25 ( linkage phases: ... 1 1 1 1 4 ... )
#> ... 37 38 35 1 10 28 ... : -1.22 ( linkage phases: ... 1 1 1 1 4 ... )
#> ... 37 38 1 35 10 28 ... : -1.53 ( linkage phases: ... 1 1 1 1 4 ... )
#> ... 37 35 38 1 10 28 ... : -2.72 ( linkage phases: ... 1 1 1 1 4 ... )
#> ... 37 1 38 35 10 28 ... : -2.78 ( linkage phases: ... 1 1 1 1 4 ... )
#>
#> ...-35-|-1-38-10-28-|-26-...
#> Alternative orders:
#> ... 35 1 38 10 28 26 ... : 0.00 ( linkage phases: ... 1 1 1 4 2 ... )
#> ... 35 38 1 10 28 26 ... : -2.47 ( linkage phases: ... 1 1 1 4 2 ... )
#>
#> ...-1-|-38-10-28-26-|-42-... OK
#>
#> ...-38-|-10-28-26-42-|-6-... OK
#>
#> ...-10-|-28-26-42-6-|-39-...
#> Alternative orders:
#> ... 10 28 26 42 6 39 ... : 0.00 ( linkage phases: ... 4 2 3 1 1 ... )
#> ... 10 28 26 6 42 39 ... : -2.29 ( linkage phases: ... 4 2 3 1 1 ... )
#>
#> ...-28-|-26-42-6-39-|-41-...
#> Alternative orders:
#> ... 28 26 42 6 39 41 ... : 0.00 ( linkage phases: ... 2 3 1 1 1 ... )
#> ... 28 26 6 42 39 41 ... : -2.29 ( linkage phases: ... 2 3 1 1 1 ... )
#>
#> ...-26-|-42-6-39-41-|-40-...
#> Alternative orders:
#> ... 26 42 6 39 41 40 ... : 0.00 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 26 6 42 39 41 40 ... : -2.29 ( linkage phases: ... 3 1 1 1 1 ... )
#>
#> ...-42-|-6-39-41-40-|-15-... OK
#>
#> ...-6-|-39-41-40-15-|-5-... OK
#>
#> ...-39-|-41-40-15-5-|-11-...
#> Alternative orders:
#> ... 39 41 40 15 5 11 : 0.00 ( linkage phases: ... 1 1 1 1 1 )
#> ... 39 41 40 5 15 11 : -2.57 ( linkage phases: ... 1 1 1 1 1 )
#>
#> 41-|-40-15-5-11
#> Alternative orders:
#> ... 41 40 15 5 11 : 0.00 ( linkage phases: ... 1 1 1 1 )
#> ... 41 40 11 5 15 : -0.83 ( linkage phases: ... 1 1 1 1 )
#> ... 41 40 5 15 11 : -2.57 ( linkage phases: ... 1 1 1 1 )
#> ... 41 40 11 15 5 : -2.59 ( linkage phases: ... 1 1 1 1 )
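A few windows show an alternative order slightly more likely than the one in LG1_frame (for example, 12-30-32 instead of 32-30-12, a difference of 0.59 LOD units), but all such differences are below 1 LOD unit, and we keep the current order here.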
Let’s check how it behaves with the MDS approach:
LG1_mds <- mds_onemap(LG1, rm_unlinked = TRUE, hmm = FALSE)
#> Stress: 0.113012203185713
#> Mean Nearest Neighbour Fit: 22.2771218113624
Based on the drafts from order_seq and/or mds_onemap, we can remove some doubtful markers according to the graphic, try to position them again, and decide whether and where to keep them.
LG1_extend <- try_seq(LG1_test_map,10)
LG1_test_map <- make_seq(LG1_extend,15)
LG1_extend <- try_seq(LG1_test_map,11)
LG1_test <- make_seq(LG1_extend,23) # We choose to remove this marker
LG1_extend <- try_seq(LG1_test_map,28)
LG1_test_map <- make_seq(LG1_extend,16)
LG1_extend <- try_seq(LG1_test_map,42)
LG1_final <- make_seq(LG1_extend,17)
Print it:
LG1_final
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 34 SNP5 0.00 a | | b a | | b
#> 33 SNP3 12.27 a | | b a | | b
#> 32 SNP2 18.45 a | | b a | | b
#> 30 M30 21.21 a | | b a | | b
#> 12 M12 22.22 b | | a c | | a
#> 31 SNP1 27.77 a | | b a | | b
#> 3 M3 51.05 o | | a o | | o
#> 14 M14 64.18 a | | o b | | o
#> 2 M2 72.10 o | | o o | | a
#> 36 SNP7 95.93 b | | a b | | a
#> 37 SNP8 112.12 b | | a b | | a
#> 35 SNP6 126.84 b | | a b | | a
#> 1 M1 131.24 b | | a b | | a
#> 38 SNP9 137.34 b | | a b | | a
#> 10 M10 150.59 a | | a o | | b
#> 28 M28 166.82 a | | o b | | o
#> 42 SNP13 204.00 b | | a b | | a
#> 26 M26 220.38 a | | b d | | c
#> 6 M6 229.36 b | | a b | | a
#> 39 SNP10 235.52 b | | a b | | a
#> 41 SNP12 251.89 b | | a b | | a
#> 40 SNP11 268.92 b | | a b | | a
#> 15 M15 309.70 o | | a o | | b
#> 5 M5 314.72 o | | o o | | a
#>
#> 24 markers log-likelihood: -1518.795
As an option, different algorithms to order markers could be applied:
There are some differences between the results. Seriation did not provide good results in this case. See Mollinari et al. (2009) for an evaluation of these methods.
In our example, we have reference genome chromosome and position information for some of the markers; here, we will exemplify one method of using this information to help build the genetic map.
With the CHROM information in the input file, you can identify markers belonging to a given chromosome using the function make_seq with the rf_2pts object. For example, pass the string "1" as the second argument to get the chromosome 1 markers. The output sequence will be automatically ordered by POS information.
CHR1 <- make_seq(twopts, "1")
CHR1
#>
#> Number of markers: 12
#> Markers in the sequence:
#> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12 SNP13
#>
#> Parameters not estimated.
CHR2 <- make_seq(twopts, "2")
CHR3 <- make_seq(twopts, "3")
Here we use the string "1" because it is our chromosome ID. You may have a different string as ID; check this with:
We can see that we have markers without chromosome information (NA) and markers with chromosome ID "1", "2" and "3".
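The selection described above can be pictured with a toy table in base R. The data frame below is entirely hypothetical (names, chromosomes and positions are made up for illustration); it only shows the idea of listing the available IDs, including NA, and of picking one chromosome ordered by POS.

```r
# Hypothetical marker table; all values are invented for illustration
mks <- data.frame(
  name  = c("SNP2", "SNP1", "M1", "SNP14", "M7"),
  CHROM = c("1", "1", NA, "2", NA),
  POS   = c(2300, 1500, NA, 900, NA),
  stringsAsFactors = FALSE
)

# Which chromosome IDs are present (NA marks markers without information)
ids <- unique(mks$CHROM)

# Markers of chromosome "1", ordered by physical position
chr1 <- mks[which(mks$CHROM == "1"), ]
chr1 <- chr1[order(chr1$POS), ]

ids        # "1" NA "2"
chr1$name  # "SNP1" "SNP2"
```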
According to the CHROM information, we have three defined linkage groups. Now we can try to assign the markers without chromosome information to them using recombination information. For this, we can use the function group_seq:
CHR_mks <- group_seq(input.2pts = twopts, seqs = "CHROM", unlink.mks = mark_no_dist,
repeated = FALSE)
#> Selecting markers:
#> group 1
#> ........................
#> group 2
#> ........
#> group 3
#> ....
#> Selecting markers:
#> group 1
#> ......
#> group 2
#> ............
#> group 3
#> ........
#> Selecting markers:
#> group 1
#> ................
#> group 2
#> ............
#> group 3
#> ....
The function works like the function group, but it takes pre-existing sequences into account. When the seqs argument is set to the string "CHROM", it will consider the pre-existing sequences according to the CHROM information. You can also indicate other pre-existing sequences if they make sense for your study. For that, you should provide a list of objects of class sequence, as in the example:
CHR_mks <- group_seq(input.2pts = twopts, seqs = list(CHR1=CHR1, CHR2=CHR2, CHR3=CHR3),
unlink.mks = mark_no_dist, repeated = FALSE)
In this case, the command has the same effect as the previous one because we indicated the chromosome sequences, but other sequences can be used.
The unlink.mks argument receives an object of class sequence; this defines which markers will be tested for grouping with the sequences in seqs. In our example, we will indicate only the markers with no segregation distortion, using the sequence mark_no_dist. It is also possible to use the string "all" to test all the remaining markers in the rf_2pts object.
In some cases, the same marker can group into more than one sequence; those markers will be considered repeated. With the argument repeated, we can choose whether to remove them from the output sequences (FALSE) or keep them (TRUE). In either case, their numbers will be reported in the list repeated in the output object. In the example case, there are no repeated markers. However, if they exist, it could indicate that their groups actually constitute the same group. Genotyping errors can also generate repeated markers. In any case, they deserve further investigation.
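What counts as a repeated marker can be shown with plain vectors. The groups below are hypothetical and only illustrate the idea; group_seq performs this bookkeeping itself and reports the result in its output object.

```r
# Hypothetical assignment of marker numbers to three groups
groups <- list(
  CHR1 = c(1, 2, 3, 10),
  CHR2 = c(7, 8, 13),
  CHR3 = c(4, 9, 10)
)

# A marker is repeated when it shows up in more than one group
all_mks  <- unlist(groups, use.names = FALSE)
repeated <- unique(all_mks[duplicated(all_mks)])

# What dropping the repeated markers from every group would look like
groups_clean <- lapply(groups, setdiff, repeated)

repeated           # 10
groups_clean$CHR3  # 4 9
```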
We can access detailed information about the results by simply printing the object:
CHR_mks
#> This is an object of class 'group_seq'
#> Criteria used to assign markers to groups:
#> LOD = 3 , Maximum recombination fraction = 0.5
#>
#> No. markers in input sequences:
#> CHR1 : 12 markers
#> CHR2 : 2 markers
#> CHR3 : 8 markers
#>
#> No. unlinked input markers: 27 markers
#>
#> No. markers in output sequences:
#> CHR1 : 25 markers
#> CHR2 : 7 markers
#> CHR3 : 17 markers
#> No. unlinked: 0 markers
#> No. repeated: 0 markers
#>
#> Printing output sequences:
#> Group CHR1 : 25 markers
#> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12 SNP13 M1 M2 M3 M5 M6 M10 M11 M12 M14 M15 M26 M28 M30
#>
#> Group CHR2 : 7 markers
#> SNP14 SNP16 M7 M8 M13 M18 M22
#>
#> Group CHR3 : 17 markers
#> SNP17 SNP18 SNP20 SNP21 SNP22 SNP23 SNP24 SNP25 M4 M9 M16 M20 M21 M23 M24 M27 M29
#>
#> Unlinked markers: 0 markers
#>
#>
#> Repeated markers: 0 markers
#>
Also, we can access the numbers of repeated markers through the repeated element of the output object (CHR_mks$repeated).
We have no repeated markers.
In the same way, we can access the output sequences:
CHR_mks$sequences$CHR1
#>
#> Number of markers: 25
#> Markers in the sequence:
#> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12 SNP13 M1 M2 M3 M5 M6
#> M10 M11 M12 M14 M15 M26 M28 M30
#>
#> Parameters not estimated.
# or
CHR_mks$sequences[[1]]
#>
#> Number of markers: 25
#> Markers in the sequence:
#> SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12 SNP13 M1 M2 M3 M5 M6
#> M10 M11 M12 M14 M15 M26 M28 M30
#>
#> Parameters not estimated.
For this function, optional arguments are LOD and max.rf, which define thresholds to be used when assigning markers to linkage groups. If none is provided (default), criteria previously defined for the object rf_2pts are used.
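As an illustration of how the two thresholds act together, the sketch below applies them to a few made-up two-point estimates. It assumes that a pair supports linkage only when its LOD is above the LOD threshold and its recombination fraction is below max.rf; the exact comparison used inside group_seq (strict or not) is not shown in this tutorial, so boundary values are avoided.

```r
# Hypothetical two-point results between candidate markers and one group
pairs <- data.frame(
  marker = c("M1", "M7", "M22"),
  rf     = c(0.10, 0.35, 0.48),
  LOD    = c(8.2, 3.4, 0.6),
  stringsAsFactors = FALSE
)

LOD_thr <- 3    # value reported in the printed CHR_mks object
max_rf  <- 0.5  # value reported in the printed CHR_mks object

# A pair supports linkage only if it passes both criteria
pairs$linked <- pairs$LOD > LOD_thr & pairs$rf < max_rf

pairs$marker[pairs$linked]  # "M1" "M7"
```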
Now we can order the markers in each group as we did before (see Genetic mapping of linkage group 1, 2 and 3). As shown, we can choose different approaches to order the markers.
To order those groups, first, we will use the order_seq function to obtain a preliminary order. Afterwards, we will edit some markers' positions or remove some of them according to their color pattern in the rf_graph_table graphic, and other parameters such as likelihood and map size.
# CHR1_frame <- mds_onemap(CHR_mks$sequences$CHR1, hmm = F)
# or
CHR1_ord <- order_seq(CHR_mks$sequences$CHR1)
CHR1_frame <- make_seq(CHR1_ord, "force")
The group is similar to that built before with recombination information only. We will explore the differences further in a later step. Only marker 11 does not follow the expected color pattern, so we will try to reposition it.
CHR1_test_seq <- drop_marker(CHR1_frame, 11)
CHR1_test_map <- onemap::map(CHR1_test_seq)
CHR1_add11_seq <- try_seq(CHR1_test_map, 11)
CHR1_add11 <- make_seq(CHR1_add11_seq, 25) # marker 11 was placed at the same position as before
Based on those results, we decide not to include marker 11 in our map.
CHR1_test_map
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 34 SNP5 0.00 a | | b a | | b
#> 33 SNP3 12.27 a | | b a | | b
#> 30 M30 15.58 a | | b a | | b
#> 12 M12 16.59 b | | a c | | a
#> 32 SNP2 20.15 a | | b a | | b
#> 31 SNP1 28.31 a | | b a | | b
#> 3 M3 51.17 o | | a o | | o
#> 14 M14 64.54 a | | o b | | o
#> 2 M2 72.44 o | | o o | | a
#> 36 SNP7 96.32 b | | a b | | a
#> 35 SNP6 109.13 b | | a b | | a
#> 1 M1 113.53 b | | a b | | a
#> 38 SNP9 119.59 b | | a b | | a
#> 37 SNP8 136.22 b | | a b | | a
#> 10 M10 149.87 a | | a o | | b
#> 28 M28 167.09 a | | o b | | o
#> 26 M26 199.46 a | | b d | | c
#> 42 SNP13 214.58 b | | a b | | a
#> 6 M6 220.93 b | | a b | | a
#> 39 SNP10 227.10 b | | a b | | a
#> 41 SNP12 243.56 b | | a b | | a
#> 40 SNP11 260.69 b | | a b | | a
#> 15 M15 302.93 o | | a o | | b
#> 5 M5 307.95 o | | o o | | a
#>
#> 24 markers log-likelihood: -1507.316
Checking for better orders:
ripple_seq(CHR1_final)
#> 34-33-30-12-|-32-...
#> Alternative orders:
#> 34 33 12 30 32 ... : 0.00 ( linkage phases: 1 4 4 1 ... )
#> 34 33 30 12 32 ... : -0.24 ( linkage phases: 1 1 4 4 ... )
#>
#> ...-34-|-33-30-12-32-|-31-...
#> Alternative orders:
#> 34 33 12 30 32 31 ... : 0.00 ( linkage phases: 1 4 4 1 1 ... )
#> 34 33 30 12 32 31 ... : -0.24 ( linkage phases: 1 1 4 4 1 ... )
#> 34 33 32 30 12 31 ... : -0.59 ( linkage phases: 1 1 1 4 4 ... )
#> 34 32 33 30 12 31 ... : -1.87 ( linkage phases: 1 1 1 4 4 ... )
#> 34 33 30 32 12 31 ... : -2.16 ( linkage phases: 1 1 1 4 4 ... )
#> 34 33 32 12 30 31 ... : -2.74 ( linkage phases: 1 1 4 4 1 ... )
#>
#> ...-33-|-30-12-32-31-|-3-...
#> Alternative orders:
#> ... 33 12 30 32 31 3 ... : 0.00 ( linkage phases: ... 4 4 1 1 3 ... )
#> ... 33 30 12 32 31 3 ... : -0.24 ( linkage phases: ... 1 4 4 1 3 ... )
#> ... 33 32 30 12 31 3 ... : -0.59 ( linkage phases: ... 1 1 4 4 3 ... )
#> ... 33 30 32 12 31 3 ... : -2.16 ( linkage phases: ... 1 1 4 4 3 ... )
#> ... 33 32 12 30 31 3 ... : -2.74 ( linkage phases: ... 1 4 4 1 3 ... )
#>
#> ...-30-|-12-32-31-3-|-14-...
#> Alternative orders:
#> ... 30 12 32 31 3 14 ... : 0.00 ( linkage phases: ... 4 4 1 3 3 ... )
#> ... 30 32 12 31 3 14 ... : -1.92 ( linkage phases: ... 1 4 4 3 3 ... )
#>
#> ...-12-|-32-31-3-14-|-2-... OK
#>
#> ...-32-|-31-3-14-2-|-36-... OK
#>
#> ...-31-|-3-14-2-36-|-35-... OK
#>
#> ...-3-|-14-2-36-35-|-1-... OK
#>
#> ...-14-|-2-36-35-1-|-38-...
#> Alternative orders:
#> ... 14 2 36 35 1 38 ... : 0.00 ( linkage phases: ... 2 3 1 1 1 ... )
#> ... 14 2 36 1 35 38 ... : -1.26 ( linkage phases: ... 2 3 1 1 1 ... )
#>
#> ...-2-|-36-35-1-38-|-37-...
#> Alternative orders:
#> ... 2 38 1 35 36 37 ... : 0.00 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 2 36 35 1 38 37 ... : -0.97 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 2 35 1 38 36 37 ... : -1.13 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 2 38 35 1 36 37 ... : -1.15 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 2 36 38 35 1 37 ... : -1.16 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 2 36 38 1 35 37 ... : -1.43 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 2 36 35 38 1 37 ... : -2.18 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 2 36 1 35 38 37 ... : -2.23 ( linkage phases: ... 3 1 1 1 1 ... )
#>
#> ...-36-|-35-1-38-37-|-10-...
#> Alternative orders:
#> ... 36 37 1 35 38 10 ... : 0.00 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 37 35 1 38 10 ... : -0.25 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 37 38 35 1 10 ... : -1.22 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 35 1 38 37 10 ... : -1.25 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 38 35 1 37 10 ... : -1.43 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 37 38 1 35 10 ... : -1.53 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 38 1 35 37 10 ... : -1.70 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 35 38 1 37 10 ... : -2.46 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 1 35 38 37 10 ... : -2.51 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 37 35 38 1 10 ... : -2.72 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 36 37 1 38 35 10 ... : -2.78 ( linkage phases: ... 1 1 1 1 1 ... )
#>
#> ...-35-|-1-38-37-10-|-28-...
#> Alternative orders:
#> ... 35 1 38 37 10 28 ... : 0.00 ( linkage phases: ... 1 1 1 1 4 ... )
#> ... 35 38 1 37 10 28 ... : -1.21 ( linkage phases: ... 1 1 1 1 4 ... )
#>
#> ...-1-|-38-37-10-28-|-26-... OK
#>
#> ...-38-|-37-10-28-26-|-42-... OK
#>
#> ...-37-|-10-28-26-42-|-6-... OK
#>
#> ...-10-|-28-26-42-6-|-39-...
#> Alternative orders:
#> ... 10 28 26 42 6 39 ... : 0.00 ( linkage phases: ... 4 2 3 1 1 ... )
#> ... 10 28 26 6 42 39 ... : -2.29 ( linkage phases: ... 4 2 3 1 1 ... )
#>
#> ...-28-|-26-42-6-39-|-41-...
#> Alternative orders:
#> ... 28 26 42 6 39 41 ... : 0.00 ( linkage phases: ... 2 3 1 1 1 ... )
#> ... 28 26 6 42 39 41 ... : -2.29 ( linkage phases: ... 2 3 1 1 1 ... )
#>
#> ...-26-|-42-6-39-41-|-40-...
#> Alternative orders:
#> ... 26 42 6 39 41 40 ... : 0.00 ( linkage phases: ... 3 1 1 1 1 ... )
#> ... 26 6 42 39 41 40 ... : -2.29 ( linkage phases: ... 3 1 1 1 1 ... )
#>
#> ...-42-|-6-39-41-40-|-15-... OK
#>
#> ...-6-|-39-41-40-15-|-5-... OK
#>
#> 39-|-41-40-15-5
#> Alternative orders:
#> ... 39 41 40 15 5 : 0.00 ( linkage phases: ... 1 1 1 1 )
#> ... 39 41 40 5 15 : -0.81 ( linkage phases: ... 1 1 1 1 )
# CHR2_frame <- mds_onemap(CHR_mks$sequences$CHR2)
# or
CHR2_ord <- order_seq(CHR_mks$sequences$CHR2)
CHR2_frame <- make_seq(CHR2_ord, "force")
As before, we will not change the positions of the markers of this group.
# CHR3_frame <- mds_onemap(CHR_mks$sequences$CHR3)
# or
CHR3_ord <- order_seq(CHR_mks$sequences$CHR3)
CHR3_frame <- make_seq(CHR3_ord, "force")
Here, marker 29 has a color pattern too different from the expected one, and removing it could influence the ordering of the other markers. So we will remove it and search for a new order.
CHR3_test_seq <- drop_marker(CHR3_frame, c(29))
CHR3_test_ord <- order_seq(CHR3_test_seq)
CHR3_test_map <- make_seq(CHR3_test_ord, "force")
Trying to add marker 29 again.
CHR3_add29_seq <- try_seq(CHR3_test_map, 29)
CHR3_add29 <- make_seq(CHR3_add29_seq, 12) # Marker 29 increases the map size disproportionately, so it was removed from the map
Checking for better orders:
ripple_seq(CHR3_final)
#> 27-16-20-4-|-21-... OK
#>
#> ...-27-|-16-20-4-21-|-23-... OK
#>
#> ...-16-|-20-4-21-23-|-48-...
#> Alternative orders:
#> ... 16 20 4 21 23 48 ... : 0.00 ( linkage phases: ... 1 2 2 1 1 ... )
#> ... 16 20 4 23 21 48 ... : -2.20 ( linkage phases: ... 1 2 2 1 1 ... )
#>
#> ...-20-|-4-21-23-48-|-9-...
#> Alternative orders:
#> ... 20 4 21 23 48 9 ... : 0.00 ( linkage phases: ... 2 2 1 1 1 ... )
#> ... 20 4 23 21 48 9 ... : -2.20 ( linkage phases: ... 2 2 1 1 1 ... )
#>
#> ...-4-|-21-23-48-9-|-46-...
#> Alternative orders:
#> ... 4 21 23 48 9 46 ... : 0.00 ( linkage phases: ... 2 1 1 1 1 ... )
#> ... 4 23 21 48 9 46 ... : -2.20 ( linkage phases: ... 2 1 1 1 1 ... )
#>
#> ...-21-|-23-48-9-46-|-45-...
#> Alternative orders:
#> ... 21 23 48 9 46 45 ... : 0.00 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 21 23 48 46 9 45 ... : -1.00 ( linkage phases: ... 1 1 1 1 1 ... )
#>
#> ...-23-|-48-9-46-45-|-47-...
#> Alternative orders:
#> ... 23 48 9 46 45 47 ... : 0.00 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 23 48 46 9 45 47 ... : -1.00 ( linkage phases: ... 1 1 1 1 1 ... )
#> ... 23 48 45 9 46 47 ... : -1.70 ( linkage phases: ... 1 1 1 1 1 ... )
#>
#> ...-48-|-9-46-45-47-|-50-...
#> Alternative orders:
#> ... 48 9 46 45 47 50 ... : 0.00 ( linkage phases: ... 1 1 1 1 2 ... )
#> ... 48 46 9 45 47 50 ... : -1.00 ( linkage phases: ... 1 1 1 1 2 ... )
#> ... 48 45 9 46 47 50 ... : -1.70 ( linkage phases: ... 1 1 1 1 2 ... )
#>
#> ...-9-|-46-45-47-50-|-24-... OK
#>
#> ...-46-|-45-47-50-24-|-49-... OK
#>
#> ...-45-|-47-50-24-49-|-52-... OK
#>
#> ...-47-|-50-24-49-52-|-51-...
#> Alternative orders:
#> ... 47 52 49 24 50 51 : 0.00 ( linkage phases: ... 2 1 1 1 1 )
#> ... 47 50 24 49 52 51 : -0.34 ( linkage phases: ... 2 1 1 1 1 )
#> ... 47 49 52 24 50 51 : -2.56 ( linkage phases: ... 2 1 1 1 1 )
#>
#> 50-|-24-49-52-51
#> Alternative orders:
#> ... 50 24 49 52 51 : 0.00 ( linkage phases: ... 1 1 1 1 )
#> ... 50 24 52 49 51 : -2.75 ( linkage phases: ... 1 1 1 1 )
Once all linkage groups have been obtained using both strategies, we can draw a map for each approach using the function draw_map.
Since version 2.1.1007, OneMap has a new version of draw_map, called draw_map2. The new function draws elegant linkage groups and offers new arguments to customize your drawing.
The old function is still available if you prefer it. The examples below show how to use both of them.
Drawing the map that was built with recombination information only:
map1 <- list(LG1_final, LG2_final, LG3_final)
draw_map(map1, names = TRUE, grid = TRUE, cex.mrk = 0.7)
Drawing the map that was built with reference genome and recombination information:
map2 <- list(CHR1_final, CHR2_final, CHR3_final)
draw_map(map2, names = TRUE, grid = TRUE, cex.mrk = 0.7)
We can also draw maps comparing the corresponding linkage groups from each strategy:
CHR1_comp <- list(LG1_final, CHR1_final)
draw_map(CHR1_comp, names = TRUE, grid = TRUE, cex.mrk = 0.7)
CHR2_comp <- list(LG3_final, CHR2_final)
draw_map(CHR2_comp, names = TRUE, grid = TRUE, cex.mrk = 0.7)
Both strategies produced the same result for CHR2 (the map is only inverted).
CHR3_comp <- list(LG2_final, CHR3_final)
draw_map(CHR3_comp, names = TRUE, grid = TRUE, cex.mrk = 0.7)
Or groups alone:
The function draw_map draws a straightforward graphical representation of the genetic map. More recently, we developed a new version called draw_map2 that produces a more sophisticated figure. Furthermore, once the distances and the linkage phases are estimated, other map figures can be drawn by the user with any appropriate software. Several free programs can be used, such as MapChart (Voorrips, 2002).
The same figures made with draw_map can be made with the draw_map2 function, but it has different capabilities and arguments. Here are some examples; you can find more options on the help page ?draw_map2.
Drawing the map that was built with recombination information only:
draw_map2(LG1_final, LG2_final, LG3_final, main = "Only with linkage information",
group.names = c("LG1", "LG2", "LG3"), output = "map.png")
NOTE: Check the GitHub vignette version to visualize the graphic.
The figure will be saved in your working directory with the default name map.eps. You can change the file name and extension by specifying them in the argument output.
Drawing the map that was built with reference genome and recombination information:
draw_map2(CHR1_final, CHR2_final, CHR3_final, output= "map_ref.pdf",
col.group = "#58A4B0",
col.mark= "#335C81")
NOTE: Check the GitHub vignette version to visualize the graphic.
With the argument tag, we can highlight some markers with other colors. The arguments col.group, col.mark and col.tag can be changed to customize the color of the groups, the markers, and the highlighted markers, respectively.
We can also draw maps comparing the corresponding linkage groups from each strategy:
NOTE: Check the GitHub vignette version to visualize the graphic.
When marker names are defined in the tag argument, all markers with these names will be highlighted, regardless of which group they are in.
draw_map2(LG2_final, CHR3_final, tag= c("SNP17", "SNP18", "M29"), main = "Chromosome 3",
group.names = c("Only linkage", "With genome"), centered = TRUE, output = "map_comp2.pdf")
NOTE: Check the GitHub vignette version to visualize the graphic.
If, for any reason, one wants to estimate parameters for a given linkage map (e.g., for other orders in published papers), it is possible to define a sequence and use the map function. For example, for markers M30, M12, M3, M14 and M2, in this order, use:
any_seq <- make_seq(twopts, c(30, 12, 3, 14, 2))
(any_seq_map <- map(any_seq))
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 30 M30 0.00 a | | b a | | b
#> 12 M12 1.00 b | | a c | | a
#> 3 M3 20.58 o | | a o | | o
#> 14 M14 33.63 a | | o b | | o
#> 2 M2 41.70 o | | o o | | a
#>
#> 5 markers log-likelihood: -320.9012
NOTE: If your sequence has many markers (more than 60), we suggest speeding up map using the BatchMap parallelization approach. See the section Speed up analysis with parallelization for more information.
Warning: If you find an error message like:
Error in as_mapper(.f, ...) : argument ".f" is missing, with no default
This happens because map is a very common function name, and other functions with the same name may be present in your environment. In the case of this error, R is using the map function from the purrr package instead of the one from OneMap. To solve this, simply specify that you want the OneMap function with the :: operator, for example onemap::map(any_seq).
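The masking behavior behind this warning can be reproduced with base R alone. The sketch below is only an illustration: it defines a hypothetical user function named median that hides stats::median, in the same way that purrr::map can hide onemap::map, and then uses :: to reach the intended function.

```r
# Hypothetical user-defined function that masks stats::median
# (it plays the role purrr::map plays in the warning above).
median <- function(x) "my own median"

# The unqualified call now reaches the masking function.
masked <- median(c(1, 5, 9))

# The package::function form always selects the function from that package.
explicit <- stats::median(c(1, 5, 9))

print(masked)
print(explicit)

# Remove the masking definition so later calls behave normally again.
rm(median)
```

The same idea applies to any name clash: writing the package name explicitly makes the call independent of the order in which packages were attached.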
This is a subset of the first linkage group. When used this way, the map function searches for the best combination of phases between markers and prints the results.
Warning: This is not the case in this example, but sometimes some markers in your sequence don’t reach the OneMap linkage criteria when linkage is estimated by the HMM multipoint approach using map. In that case, map will produce an error like this:
ERROR: The linkage between markers 1 and 2 did not reach the OneMap default criteria. They are probably segregating independently
You can evaluate the markers manually, or you can remove them automatically using the map argument rm_unlinked = TRUE. The map function will then return a vector with the marker numbers, excluding the problematic marker; you can repeat the process without that marker, using make_seq to create a new sequence and running map again. You can also do it automatically using the map_avoid_unlinked function:
Using this, if map finds a problematic marker, it will print a warning indicating the number of the marker that was removed and will automatically repeat the analysis without it.
Furthermore, a sequence can also have user-defined linkage phases. The next example shows (incorrect) phases used for the same order of markers:
any_seq <- make_seq(twopts, c(30, 12, 3, 14, 2), phase = c(4, 1, 4, 3))
(any_seq_map <- map(any_seq))
If one needs to add or drop markers from a predefined sequence, the functions add_marker and drop_marker can be used. For example, to add markers 4 to 8 to any_seq:
(any_seq <- add_marker(any_seq, 4:8))
#>
#> Number of markers: 10
#> Markers in the sequence:
#> M30 M12 M3 M14 M2 M4 M5 M6 M7 M8
#>
#> Parameters not estimated.
Removing markers 3, 4, 5, 12, and 30 from any_seq:
(any_seq <- drop_marker(any_seq, c(3, 4, 5, 12, 30)))
#>
#> Number of markers: 5
#> Markers in the sequence:
#> M14 M2 M6 M7 M8
#>
#> Parameters not estimated.
After that, the map needs to be re-estimated.
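The bookkeeping done by add_marker and drop_marker in the example above can be followed with plain vectors of marker numbers. This is only a sketch of the set operations, not the OneMap implementation, and the variable names are illustrative.

```r
# Marker numbers of the original sequence, in map order.
markers <- c(30, 12, 3, 14, 2)

# Adding markers 4 to 8 appends them to the end of the sequence.
added <- c(markers, 4:8)

# Dropping markers 3, 4, 5, 12 and 30 keeps the remaining ones in their order.
kept <- added[!added %in% c(3, 4, 5, 12, 30)]

print(added)
print(kept)
```

The remaining numbers correspond to M14, M2, M6, M7 and M8, as in the printed sequence above. Because the set of markers changed, the distances and phases estimated earlier no longer apply.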
Warning: Only available for outcrossing and f2 intercross populations.
As already mentioned, OneMap uses an HMM multipoint approach to estimate genetic distances. It is a very robust method, but it can take a long time to run if you have many markers. In 2017, Schiffthaler et al. released BatchMap, a modified OneMap fork, on CRAN and GitHub. It parallelizes the HMM chain by dividing the markers into batches and using a different core for each phase. Their approach speeds up our HMM while preserving the quality of the genetic distance estimates. It allows us to divide the job among a maximum of four cores, according to the four possible phases in outcrossing mapping populations. We added this parallelized approach to the functions map, mds_onemap, seriation, rcd, record and ug. For better efficiency, batches must be composed of 50 markers or more; therefore, this approach is only recommended for linkage groups with many markers.
Parallelization is available on all operating systems; however, if you are not using Windows, we suggest setting the argument parallelization.type to FORK, which improves the speed of the procedure.
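One way to follow this suggestion in a script that must run on several machines is to pick the value from the operating system. The helper below is a hypothetical sketch: choose_cluster_type is not a OneMap function, and "PSOCK" is the socket cluster type of the base parallel package, assumed here to be the fallback that the OneMap functions accept (check their help pages before relying on it).

```r
# Hypothetical helper: forked workers are not available on Windows,
# so fall back to a socket cluster there.
choose_cluster_type <- function() {
  if (.Platform$OS.type == "windows") "PSOCK" else "FORK"
}

cluster_type <- choose_cluster_type()
print(cluster_type)

# The result could then be passed on, for example as
# parallelization.type = cluster_type in the parallelized calls shown later.
```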
Here we show an example of how to use the BatchMap approach in some functions that require HMM. For this, we simulated a group with 294 markers (we don’t want this vignette to take too long to run, but maps with markers from high-throughput technologies usually result in larger groups). Before starting, you can see the time spent by each approach in this example (see also Session Info):
 | Without parallelization (h) | With parallelization (h) |
---|---|---|
rcd | 0.6700558 | 0.1612458 |
record_map | 1.4368436 | 0.2907308 |
ug_map | 0.7145778 | 0.1884214 |
mds_onemap | 1.0643083 | 0.2827314 |
map | 2.0994486 | 0.6107456 |
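To read the table as speed-up factors, the times can be divided row by row. The sketch below copies the values from the table above into a data.frame; the column names serial_h, parallel_h, speedup and saved_min are chosen here for illustration.

```r
# Running times (hours) copied from the table above.
timings <- data.frame(
  step       = c("rcd", "record_map", "ug_map", "mds_onemap", "map"),
  serial_h   = c(0.6700558, 1.4368436, 0.7145778, 1.0643083, 2.0994486),
  parallel_h = c(0.1612458, 0.2907308, 0.1884214, 0.2827314, 0.6107456)
)

# Speed-up factor and time saved (minutes) for each step.
timings$speedup   <- timings$serial_h / timings$parallel_h
timings$saved_min <- (timings$serial_h - timings$parallel_h) * 60

print(timings, digits = 3)
```

For this simulated group, the parallelized runs are between roughly 3.4 and 4.9 times faster than the serial ones.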
Reading the simulated dataset:
simParallel <- read_onemap(system.file(package = "onemap", "extdata/simParall_out.raw")) # dataset available only in onemap github version
plot(simParallel, all=FALSE)
# Calculates two-points recombination fractions
twopts <- rf_2pts(simParallel)
seq_all <- make_seq(twopts, "all")
# There are no redundant markers
find_bins(simParallel)
# There are no distorted markers
p <- plot(test_segregation(simParallel))
To prepare the data with a defined batch size, we use the function pick_batch_sizes. It selects a batch size that splits the data into even groups. The argument size defines the batch size around which an optimum size will be searched. The argument overlap defines the number of markers shared between the present batch and the next one; the phases already estimated for these overlapping markers in the present batch are used to start the HMM in the next batch. The around argument defines how far the function may deviate from the value given in size when searching for the optimum batch size.
Some aspects should be considered when defining these arguments. If the batch size is set too high, there is less gain in execution time. If the overlap size is too small, phases may be incorrectly estimated and large gaps may appear in the map, inflating its size. In practice, these values depend on many factors, such as population size, marker quality, and species. The BatchMap authors recommend trying several configurations on a subset of the data and selecting the best-performing one.
batch_size <- pick_batch_sizes(input.seq = seq_all,
size = 80,
overlap = 30,
around = 10)
batch_size
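To make the roles of size, overlap and around more concrete, the sketch below splits 294 marker positions into overlapping batches and then searches for the most even split near the requested size. Both helpers (make_batches and pick_even_size) are hypothetical and written only for illustration; they are not the code used by pick_batch_sizes, whose selection rule may differ.

```r
# Illustrative helper: split positions 1..n into consecutive batches of `size`
# markers, each sharing `overlap` markers with the previous batch.
make_batches <- function(n, size, overlap) {
  stopifnot(size > overlap, n >= 1)
  step <- size - overlap
  starts <- seq(1, max(1, n - overlap), by = step)
  lapply(starts, function(s) s:min(s + size - 1, n))
}

# Illustrative helper: among the sizes within `around` of `size`, keep the one
# whose batches differ least in length.
pick_even_size <- function(n, size, overlap, around) {
  candidates <- (size - around):(size + around)
  spread <- sapply(candidates, function(s) {
    lens <- lengths(make_batches(n, s, overlap))
    max(lens) - min(lens)
  })
  candidates[which.min(spread)]
}

batches <- make_batches(n = 294, size = 80, overlap = 30)
print(lengths(batches))                               # markers per batch
print(length(intersect(batches[[1]], batches[[2]])))  # shared markers
print(pick_even_size(n = 294, size = 80, overlap = 30, around = 10))
```

With a fixed size, the last batch is usually shorter than the others, which is why searching a few sizes around the requested one can give a more even split.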
To use the parallelized approach, you just need to include these arguments when calling the functions:
# Without parallelization
rcd_map <- rcd(input.seq = seq_all)
# With parallelization
rcd_map_par <- rcd(input.seq = seq_all,
phase_cores = 4,
size = batch_size,
overlap = 30)
library(ggpubr)  # provides ggarrange()
library(ggplot2) # provides ggsave()
a <- rf_graph_table(rcd_map, mrk.axis = "none")
b <- rf_graph_table(rcd_map_par, mrk.axis = "none")
p <- ggarrange(a, b, common.legend = TRUE,
labels = c("rcd", "rcd + parallel"),
vjust = 0.2,
hjust= -1.4,
font.label = list(size = 10),
ncol=2, nrow=1)
ggsave(p, filename = "rcd.jpg")
# Without parallelization
record_map <- record(input.seq = seq_all)
# With parallelization
record_map_par <- record(input.seq = seq_all,
phase_cores = 4,
size = batch_size,
overlap = 30)
a <- rf_graph_table(record_map, mrk.axis = "none")
b <- rf_graph_table(record_map_par, mrk.axis = "none")
p <- ggarrange(a,b , common.legend = TRUE,
labels = c("record", "record + parallel"),
vjust = 0.2,
hjust= -0.8,
font.label = list(size = 10),
ncol=2, nrow=1)
ggsave(p, filename = "record.jpg")
Because we simulated this dataset, we know the correct order. We can use map_overlapping_batches to estimate genetic distances in this case. This is equivalent to map, but with a parallelized process.
As with map, using the argument rm_unlinked = TRUE makes the function return a vector with the marker numbers, excluding the problematic marker. To repeat the analysis while automatically removing all problematic markers, use map_avoid_unlinked:
# Without parallelization
batch_map <- map_avoid_unlinked(input.seq = seq_all)
# With parallelization
batch_map_par <- map_avoid_unlinked(input.seq = seq_all,
size = batch_size,
phase_cores = 4,
overlap = 30)
a <- rf_graph_table(batch_map, mrk.axis = "none")
b <- rf_graph_table(batch_map_par, mrk.axis = "none")
p <- ggarrange(a,b , common.legend = TRUE,
labels = c("map", "map + parallel"),
vjust = 0.2,
hjust= -1,
font.label = list(size = 10),
ncol=2, nrow=1)
ggsave(p, filename = "map.jpg")
As you can see in the maps above, heuristic ordering algorithms do not return an optimal order, especially if you don’t have many individuals in your population. Because of the erroneous order, the size of the generated maps is not close to the simulated size (100 cM), and their heatmaps don’t show the expected color pattern. Two methods, ug and MDS, come close to the expected color pattern: they recover the correct global order but not the local one. If you have a reference genome, you can use its position information to rearrange the local order.
OneMap is designed with a nested object structure, ensuring that every sequence object contains all the necessary information to reproduce your analysis. While this structure enhances recoverability and organization, it can lead to significantly large objects when saving your data. To address this, we have developed functions to streamline and optimize the saving process.
my_sequences <- list(CHR1_final, CHR2_final, CHR3_final)
save_onemap_sequences(my_sequences, filename = "out_final_sequences.RData")
my_sequences <- load_onemap_sequences("out_final_sequences.RData")
# access the sequences
my_sequences[[1]]
# access the onemap object
onemap.obj <- my_sequences[[1]]$data.name
# access the twopoints object
twopts.obj <- my_sequences[[1]]$twopt
In older versions, users could only access the estimated linkage phases by reading the output printed in the console:
CHR3_final
#>
#> Printing map:
#>
#> Markers Position Parent 1 Parent 2
#>
#> 27 M27 0.00 b | | o a | | a
#> 16 M16 11.76 a | | a b | | o
#> 20 M20 22.94 a | | b c | | d
#> 4 M4 35.71 a | | o o | | b
#> 21 M21 49.41 o | | o a | | b
#> 23 M23 55.24 a | | o a | | o
#> 48 SNP21 67.50 a | | b a | | a
#> 9 M9 74.02 a | | b a | | a
#> 46 SNP18 80.43 a | | b a | | a
#> 45 SNP17 95.40 a | | b a | | a
#> 47 SNP20 119.95 a | | b a | | a
#> 50 SNP23 158.43 a | | b b | | a
#> 24 M24 163.43 a | | b b | | a
#> 49 SNP22 166.72 a | | b b | | a
#> 52 SNP25 171.62 a | | b b | | a
#> 51 SNP24 182.56 a | | b b | | a
#>
#> 16 markers log-likelihood: -811.5279
Now, you can export this information into a data.frame using:
(parents_haplot <- parents_haplotypes(CHR3_final))
#> group mk.number mk.names dist P1_1 P1_2 P2_1 P2_2
#> 1 Group - 1 27 M27 0.00000 b o a a
#> 2 Group - 1 16 M16 11.75794 a a b o
#> 3 Group - 1 20 M20 22.93773 a b c d
#> 4 Group - 1 4 M4 35.70751 a o o b
#> 5 Group - 1 21 M21 49.40653 o o a b
#> 6 Group - 1 23 M23 55.24437 a o a o
#> 7 Group - 1 48 SNP21 67.49797 a b a a
#> 8 Group - 1 9 M9 74.02256 a b a a
#> 9 Group - 1 46 SNP18 80.43320 a b a a
#> 10 Group - 1 45 SNP17 95.40128 a b a a
#> 11 Group - 1 47 SNP20 119.95318 a b a a
#> 12 Group - 1 50 SNP23 158.42668 a b b a
#> 13 Group - 1 24 M24 163.43242 a b b a
#> 14 Group - 1 49 SNP22 166.72017 a b b a
#> 15 Group - 1 52 SNP25 171.61824 a b b a
#> 16 Group - 1 51 SNP24 182.55626 a b b a
The data.frame contains the group ID (group), the marker number (mk.number) and name (mk.names), the position in centimorgans (dist), and the parents’ haplotypes (P1_1, P1_2, P2_1, P2_2).
You can also obtain a data.frame with a list of sequences and personalize the group names:
parents_haplotypes(CHR2_final,CHR3_final, group_names=c("CHR2","CHR3"))
#> group mk.number mk.names dist P1_1 P1_2 P2_1 P2_2
#> 1 CHR2 43 SNP14 0.00000 a b a a
#> 2 CHR2 22 M22 13.50513 a b a a
#> 3 CHR2 7 M7 23.08421 a a a b
#> 4 CHR2 18 M18 64.51337 a b c d
#> 5 CHR2 8 M8 70.19889 b a b a
#> 6 CHR2 13 M13 72.86386 a o a o
#> 7 CHR2 44 SNP16 78.74245 b a b a
#> 8 CHR3 27 M27 0.00000 b o a a
#> 9 CHR3 16 M16 11.75794 a a b o
#> 10 CHR3 20 M20 22.93773 a b c d
#> 11 CHR3 4 M4 35.70751 a o o b
#> 12 CHR3 21 M21 49.40653 o o a b
#> 13 CHR3 23 M23 55.24437 a o a o
#> 14 CHR3 48 SNP21 67.49797 a b a a
#> 15 CHR3 9 M9 74.02256 a b a a
#> 16 CHR3 46 SNP18 80.43320 a b a a
#> 17 CHR3 45 SNP17 95.40128 a b a a
#> 18 CHR3 47 SNP20 119.95318 a b a a
#> 19 CHR3 50 SNP23 158.42668 a b b a
#> 20 CHR3 24 M24 163.43242 a b b a
#> 21 CHR3 49 SNP22 166.72017 a b b a
#> 22 CHR3 52 SNP25 171.61824 a b b a
#> 23 CHR3 51 SNP24 182.55626 a b b a
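Having the map in a data.frame makes simple summaries easy to compute. As a sketch, the block below copies the CHR2 positions from the output above into a small data.frame and looks for the largest interval between adjacent markers; with real data, the mk.names and dist columns returned by parents_haplotypes would be used instead of the hand-typed values.

```r
# CHR2 marker names and positions (cM), copied from the output above.
chr2 <- data.frame(
  mk.names = c("SNP14", "M22", "M7", "M18", "M8", "M13", "SNP16"),
  dist     = c(0.00000, 13.50513, 23.08421, 64.51337,
               70.19889, 72.86386, 78.74245)
)

# Distance between each pair of adjacent markers, labeled "left-right".
gaps <- diff(chr2$dist)
names(gaps) <- paste(head(chr2$mk.names, -1), tail(chr2$mk.names, -1), sep = "-")

# Largest interval in the group.
largest <- gaps[which.max(gaps)]

print(round(gaps, 2))
print(largest)
```

Here the largest interval lies between M7 and M18, which may be worth inspecting before the map is used in downstream analyses.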
The function progeny_haplotypes generates a data.frame with the phased progeny haplotypes estimated by the OneMap HMM. For the progeny, the HMM returns probabilities for each possible genotype, so the generated data.frame contains all possible genotypes. If most_likely = TRUE, the most likely genotype receives 1 and the rest 0 (if two genotypes are equally most likely, both receive 0.5); if most_likely = FALSE, the genotype probabilities are those returned by the HMM. You can choose which individuals to evaluate with ind. The data.frame contains the individual (ind), marker (marker) and group (grp) IDs, the position in centimorgans (pos), the probability (prob), the parent (parents), the parental homolog (parents.homologs), and the resulting allele label (allele).
(progeny_haplot <- progeny_haplotypes(CHR2_final, most_likely = TRUE, ind = c(1,2), group_names = "CHR2_final"))
#> ind marker grp pos prob parents parents.homologs allele
#> 1 IND1 SNP14 CHR2_final 0.00000 0 P1 H1 P1_H1
#> 2 IND1 M22 CHR2_final 13.50513 0 P1 H1 P1_H1
#> 3 IND1 M7 CHR2_final 23.08421 0 P1 H1 P1_H1
#> 4 IND1 M18 CHR2_final 64.51337 1 P1 H1 P1_H1
#> 5 IND1 M8 CHR2_final 70.19889 1 P1 H1 P1_H1
#> 6 IND1 M13 CHR2_final 72.86386 1 P1 H1 P1_H1
#> 7 IND1 SNP16 CHR2_final 78.74245 1 P1 H1 P1_H1
#> 8 IND2 SNP14 CHR2_final 0.00000 1 P1 H1 P1_H1
#> 9 IND2 M22 CHR2_final 13.50513 1 P1 H1 P1_H1
#> 10 IND2 M7 CHR2_final 23.08421 1 P1 H1 P1_H1
#> 11 IND2 M18 CHR2_final 64.51337 1 P1 H1 P1_H1
#> 12 IND2 M8 CHR2_final 70.19889 1 P1 H1 P1_H1
#> 13 IND2 M13 CHR2_final 72.86386 1 P1 H1 P1_H1
#> 14 IND2 SNP16 CHR2_final 78.74245 1 P1 H1 P1_H1
#> 15 IND1 SNP14 CHR2_final 0.00000 1 P1 H2 P1_H2
#> 16 IND1 M22 CHR2_final 13.50513 1 P1 H2 P1_H2
#> 17 IND1 M7 CHR2_final 23.08421 1 P1 H2 P1_H2
#> 18 IND1 M18 CHR2_final 64.51337 0 P1 H2 P1_H2
#> 19 IND1 M8 CHR2_final 70.19889 0 P1 H2 P1_H2
#> 20 IND1 M13 CHR2_final 72.86386 0 P1 H2 P1_H2
#> 21 IND1 SNP16 CHR2_final 78.74245 0 P1 H2 P1_H2
#> 22 IND2 SNP14 CHR2_final 0.00000 0 P1 H2 P1_H2
#> 23 IND2 M22 CHR2_final 13.50513 0 P1 H2 P1_H2
#> 24 IND2 M7 CHR2_final 23.08421 0 P1 H2 P1_H2
#> 25 IND2 M18 CHR2_final 64.51337 0 P1 H2 P1_H2
#> 26 IND2 M8 CHR2_final 70.19889 0 P1 H2 P1_H2
#> 27 IND2 M13 CHR2_final 72.86386 0 P1 H2 P1_H2
#> 28 IND2 SNP16 CHR2_final 78.74245 0 P1 H2 P1_H2
#> 29 IND1 SNP14 CHR2_final 0.00000 0 P2 H1 P2_H1
#> 30 IND1 M22 CHR2_final 13.50513 0 P2 H1 P2_H1
#> 31 IND1 M7 CHR2_final 23.08421 0 P2 H1 P2_H1
#> 32 IND1 M18 CHR2_final 64.51337 0 P2 H1 P2_H1
#> 33 IND1 M8 CHR2_final 70.19889 0 P2 H1 P2_H1
#> 34 IND1 M13 CHR2_final 72.86386 0 P2 H1 P2_H1
#> 35 IND1 SNP16 CHR2_final 78.74245 0 P2 H1 P2_H1
#> 36 IND2 SNP14 CHR2_final 0.00000 0 P2 H1 P2_H1
#> 37 IND2 M22 CHR2_final 13.50513 0 P2 H1 P2_H1
#> 38 IND2 M7 CHR2_final 23.08421 0 P2 H1 P2_H1
#> 39 IND2 M18 CHR2_final 64.51337 0 P2 H1 P2_H1
#> 40 IND2 M8 CHR2_final 70.19889 0 P2 H1 P2_H1
#> 41 IND2 M13 CHR2_final 72.86386 0 P2 H1 P2_H1
#> 42 IND2 SNP16 CHR2_final 78.74245 0 P2 H1 P2_H1
#> 43 IND1 SNP14 CHR2_final 0.00000 1 P2 H2 P2_H2
#> 44 IND1 M22 CHR2_final 13.50513 1 P2 H2 P2_H2
#> 45 IND1 M7 CHR2_final 23.08421 1 P2 H2 P2_H2
#> 46 IND1 M18 CHR2_final 64.51337 1 P2 H2 P2_H2
#> 47 IND1 M8 CHR2_final 70.19889 1 P2 H2 P2_H2
#> 48 IND1 M13 CHR2_final 72.86386 1 P2 H2 P2_H2
#> 49 IND1 SNP16 CHR2_final 78.74245 1 P2 H2 P2_H2
#> 50 IND2 SNP14 CHR2_final 0.00000 1 P2 H2 P2_H2
#> 51 IND2 M22 CHR2_final 13.50513 1 P2 H2 P2_H2
#> 52 IND2 M7 CHR2_final 23.08421 1 P2 H2 P2_H2
#> 53 IND2 M18 CHR2_final 64.51337 1 P2 H2 P2_H2
#> 54 IND2 M8 CHR2_final 70.19889 1 P2 H2 P2_H2
#> 55 IND2 M13 CHR2_final 72.86386 1 P2 H2 P2_H2
#> 56 IND2 SNP16 CHR2_final 78.74245 1 P2 H2 P2_H2
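The rule applied when most_likely = TRUE can be written in a few lines. This is a sketch of the rule as described above, not the code used inside progeny_haplotypes; most_likely_call and the probability vectors are illustrative.

```r
# Illustrative helper: give weight 1 to the most likely genotype, or split the
# weight equally when several genotypes tie for the highest probability.
most_likely_call <- function(p) {
  is_max <- p == max(p)              # exact comparison, fine for this sketch
  as.numeric(is_max) / sum(is_max)
}

clear_winner <- most_likely_call(c(0.7, 0.1, 0.1, 0.1))
two_way_tie  <- most_likely_call(c(0.4, 0.4, 0.1, 0.1))

print(clear_winner)
print(two_way_tie)
```

With most_likely = FALSE, the probabilities would be kept as they are, which preserves the uncertainty of each call.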
You can also view the estimated progeny haplotypes using plot. It shows which markers came from each parent’s homologs. The position argument defines whether haplotypes are plotted by homologs (stack) or by alleles (split). The split option is a good way to view the likelihoods of each allele.
VIEWpoly
OneMap output can now be visualized in VIEWpoly, an interactive app to display results from linkage analysis.
viewpoly.obj <- export_viewpoly(seqs.list = list(CHR1_final,CHR2_final, CHR3_final))
save(viewpoly.obj, file = "onemap_viewpoly_map.RData")
Check the VIEWpoly tutorial for further information on how to upload this data file to the app and how to visualize the generated graphics.
QTLpoly
You can use the map you built to map QTL using QTLpoly:
# Only one group
genoprob <- export_mappoly_genoprob(CHR1_final)
str(genoprob)
# All groups
groups_list <- list(CHR1_final,CHR2_final, CHR3_final)
genoprobAll <- lapply(groups_list, export_mappoly_genoprob)
# Read in QTLpoly (check its tutorial for further information about pheno4x object format)
library(qtlpoly)
data <- read_data(ploidy = 2, geno.prob = genoprobAll, pheno = pheno4x, step = 1)
sessionInfo()
#> R version 4.4.2 (2024-10-31)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.1 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] knitr_1.49 stringr_1.5.1 vcfR_1.15.0 onemap_3.2.0 rmarkdown_2.29
#>
#> loaded via a namespace (and not attached):
#> [1] polynom_1.4-1 gridExtra_2.3 permute_0.9-7
#> [4] rlang_1.1.4 magrittr_2.0.3 rebus.base_0.0-3
#> [7] e1071_1.7-16 compiler_4.4.2 mgcv_1.9-1
#> [10] gdata_3.0.1 reshape2_1.4.4 vctrs_0.6.5
#> [13] memuse_4.2-3 pkgconfig_2.0.3 shape_1.4.6.1
#> [16] fastmap_1.2.0 backports_1.5.0 labeling_0.4.3
#> [19] rebus_0.1-3 nloptr_2.1.1 purrr_1.0.2
#> [22] xfun_0.50 glmnet_4.1-8 jomo_2.7-6
#> [25] cachem_1.1.0 jsonlite_1.8.9 pan_1.9
#> [28] broom_1.0.7 parallel_4.4.2 cluster_2.1.8
#> [31] R6_2.5.1 bslib_0.8.0 stringi_1.8.4
#> [34] RColorBrewer_1.1-3 smacof_2.1-7 car_3.1-3
#> [37] boot_1.3-31 rpart_4.1.24 jquerylib_0.1.4
#> [40] Rcpp_1.0.13-1 iterators_1.0.14 base64enc_0.1-3
#> [43] weights_1.0.4 Matrix_1.7-1 nnls_1.6
#> [46] splines_4.4.2 nnet_7.3-20 tidyselect_1.2.1
#> [49] rstudioapi_0.17.1 abind_1.4-8 yaml_2.3.10
#> [52] viridis_0.6.5 vegan_2.6-8 doParallel_1.0.17
#> [55] codetools_0.2-20 plyr_1.8.9 rebus.datetimes_0.0-2
#> [58] lattice_0.22-6 tibble_3.2.1 withr_3.0.2
#> [61] evaluate_1.0.3 pinfsc50_1.3.0 foreign_0.8-87
#> [64] survival_3.8-3 proxy_0.4-27 rebus.numbers_0.0-1
#> [67] pillar_1.10.1 ggpubr_0.6.0 carData_3.0-5
#> [70] mice_3.17.0 checkmate_2.3.2 foreach_1.5.2
#> [73] ellipse_0.5.0 plotly_4.10.4 generics_0.1.3
#> [76] ggplot2_3.5.1 munsell_0.5.1 scales_1.3.0
#> [79] minqa_1.2.8 gtools_3.9.5 princurve_2.1.6
#> [82] class_7.3-23 glue_1.8.0 lazyeval_0.2.2
#> [85] Hmisc_5.2-2 maketools_1.3.1 tools_4.4.2
#> [88] dendextend_1.19.0 sys_3.4.3 data.table_1.16.4
#> [91] lme4_1.1-35.5 ggsignif_0.6.4 buildtools_1.0.0
#> [94] grid_4.4.2 plotrix_3.8-4 ape_5.8-1
#> [97] tidyr_1.3.1 colorspace_2.1-1 nlme_3.1-166
#> [100] rebus.unicode_0.0-2 htmlTable_2.4.3 Formula_1.2-5
#> [103] cli_3.6.3 viridisLite_0.4.2 dplyr_1.1.4
#> [106] gtable_0.3.6 rstatix_0.7.2 sass_0.4.9
#> [109] digest_0.6.37 wordcloud_2.6 farver_2.1.2
#> [112] htmlwidgets_1.6.4 htmltools_0.5.8.1 lifecycle_1.0.4
#> [115] httr_1.4.7 mitml_0.4-5 MASS_7.3-64
Buetow, K. H., Chakravarti, A. Multipoint gene mapping using seriation. I. General methods. American Journal of Human Genetics 41, 180-188, 1987.
Doerge, R.W. Constructing genetic maps by rapid chain delineation. Journal of Agricultural Genomics 2, 1996.
Mollinari, M., Margarido, G. R. A., Vencovsky, R. and Garcia, A. A. F. Evaluation of algorithms used to order markers on genetics maps. Heredity 103, 494-502, 2009.
Schiffthaler, B., Bernhardsson, C., Ingvarsson, P. K., & Street, N. R. BatchMap: A parallel implementation of the OneMap R package for fast computation of F1 linkage maps in outcrossing species. PLoS ONE, 12(12), 1–12, 2017.
Tan, Y., Fu, Y. A novel method for estimating linkage maps. Genetics 173, 2383-2390, 2006.
Taniguti, C. H., Taniguti, L. M., Amadeu, R. R., Lau, J., de Siqueira Gesteira, G., Oliveira, T. de P., Ferreira, G. C., Pereira, G. da S., Byrne, D., Mollinari, M., Riera-Lizarazu, O., Garcia, A. A. F. Developing best practices for genotyping-by-sequencing analysis in the construction of linkage maps. GigaScience 12, giad092, 2023. https://doi.org/10.1093/gigascience/giad092
Van Os H, Stam P, Visser R.G.F., Van Eck H.J. RECORD: a novel method for ordering loci on a genetic linkage map. Theor Appl Genet 112, 30-40, 2005.
Voorrips, R.E. MapChart: software for the graphical presentation of linkage maps and QTLs. Journal of Heredity 93, 77-78, 2002.
Voorrips, R. E., Maliepaard, C. A. The simulation of meiosis in diploid and tetraploid organisms using various genetic models. BMC Bioinformatics, 13(1), 248, 2012.
Wu, R., Ma, C.X., Painter, I. and Zeng, Z.-B. Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theoretical Population Biology 61, 349-363, 2002a.
Wu, R., Ma, C.-X., Wu, S. S. and Zeng, Z.-B. Linkage mapping of sex-specific differences. Genetical Research 79, 85-96, 2002b.