--- The authors selected markers from assembly 1 (Wm82.gnm1) and reported those coordinates. In preparing the markers for the Data Store, we incorrectly assumed that the markers came from assembly 2 (Wm82.gnm2); To fix the problem, find the marker sequences from assembly 1 and find their locations in assembly 2. Flanking sequence around markers was extracted from Wm82.gnm1 using bedtools getfasta; then that sequence was used to identify marker position in Wm82.gnm2 Also, the SLAF-Seq (aka SLAFSeq) markers have been used in at least two publications: Zhao_Teng_2017 and Zhao_Han_2015 (doi: 10.1111/tpj.12810). Rename marker filenames to refer to the respective papers: glyma.Wm82.gnm1.mrk.Zhao_Teng_2017.gff3 and glyma.Wm82.gnm1.mrk.Zhao_Han_2015.gff3 From the publication (Zhao et al. BMC Genomics (2017) 18:462 DOI 10.1186/s12864-017-3843-y ), generate a GFF representing the 13 SLAFSeq markers reported in the paper: Gm07 rs35898587 Gm07 rs36423980 Gm08 rs7631207 Gm08 rs7640250 Gm08 rs7671170 Gm08 rs7662003 Gm08 rs7664479 Gm08 rs7622492 Gm08 rs7661660 Gm08 rs16268025 Gm14 rs3853672 Gm15 rs38522986 Gm18 rs46625879 From the GFF, derive the BED file. The GFF is in one-based coords and BED is zero-based coords. Use bedops to do the right thing: conda create -n bedops conda install -n bedops -c conda-forge -c bioconda bedops conda activate bedops gff2bed < glyma.Wm82.gnm1.mrk.SLAFseq.gff3 > glyma.Wm82.gnm1.mrk.SLAFseq.bed cat glyma.Wm82.gnm1.mrk.SLAFseq.bed | awk -v OFS="\t" '{print $1, $2-1000, $3+1000, $10}' | perl -pe 's/ID=([^;]+);.+/$1/; s/glyma.Wm82.gnm1.rs/rs/' > glyma.Wm82.gnm1.mrk.SLAFseq_1kflank.bed Activate bedtools and extract the sequences from gnm1 conda activate bedtools bedtools getfasta -name -fi glyma.Wm82.gnm1.FCtY.genome_main.fna \ -bed glyma.Wm82.gnm1.mrk.SLAFseq_1kflank.bed \ -fo glyma.Wm82.gnm1.mrk.SLAFseq_1kflank.fna Simplify the marker names, stripping out the coordinate information added by bedtools perl -pi -e 's/::\S+//' glyma.Wm82.gnm1.mrk.SLAFseq_1kflank.fna Activate BLAST and search the genome sequence - first in Wm82.gnm1 to confirm sequence locations The locations should be same as BED. conda activate blast mkdir blastdb blastout makeblastdb -in glyma.Wm82.gnm1.FCtY.genome_main.fna \ -dbtype nucl -hash_index -out blastdb/glyma.Wm82.gnm1 blastn -db blastdb/glyma.Wm82.gnm1 \ -query glyma.Wm82.gnm1.mrk.SLAFseq_1kflank.fna \ -evalue 1e-10 -outfmt 6 -perc_identity 99 \ -out blastout/glyma.Wm82.gnm1.mrk.SLAFseq.x.glyma.Wm82.gnm1.bln & Confirm that the query sequences find the correct coordinates on Wm82.gnm1: grep -v "^#" blastout/glyma.Wm82.gnm1.mrk.SLAFseq.x.glyma.Wm82.gnm1.bln | top_line.awk Search glyma.Wm82.gnm2.DTC4 makeblastdb -in glyma.Wm82.gnm2.DTC4.genome_main.fna \ -dbtype nucl -hash_index -out blastdb/glyma.Wm82.gnm2 blastn -db blastdb/glyma.Wm82.gnm2 \ -query glyma.Wm82.gnm1.mrk.SLAFseq_1kflank.fna \ -evalue 1e-10 -outfmt 6 -perc_identity 99 \ -out blastout/glyma.Wm82.gnm1.mrk.SLAFseq.x.glyma.Wm82.gnm2.bln & Find locations in Wm82.gnm2 and print the marker coordinates (at hit start position + 1000) cat blastout/glyma.Wm82.gnm1.mrk.SLAFseq.x.glyma.Wm82.gnm2.bln | awk -v OFS="\t" '$3>=99 && $4>=1900 && $11==0 {print $2, $9+1001, $9+1001, $1}' | cat > glyma.Wm82.gnm2.mrk.SLAFseq.bed Reshape bed file into gff cat glyma.Wm82.gnm2.mrk.SLAFseq.bed | awk -v OFS="\t" '{print $1, "Zhao_Teng_2017", "genetic_marker", $2, $3, ".", ".", ".", "ID=" $1 "." $4 ";Name=" $4}' | cat > glyma.Wm82.gnm2.mrk.SLAFseq.gff3 RESULT: glyma.Wm82.gnm2.mrk.SLAFseq.gff3 contains the marker locations on Wm82.gnm2 . Check the work by looking at the sequences in gnm2 and comparing with those from gnm1. Activate bedtools and extract the sequences from gnm2 conda activate bedtools cat glyma.Wm82.gnm2.mrk.SLAFseq.bed | awk -v OFS="\t" '{print $1, $2-1000, $3+1000, $4}' > glyma.Wm82.gnm2.mrk.SLAFseq_1kflank.bed bedtools getfasta -name -fi glyma.Wm82.gnm2.DTC4.genome_main.fna \ -bed glyma.Wm82.gnm2.mrk.SLAFseq_1kflank.bed \ -fo glyma.Wm82.gnm2.mrk.SLAFseq_1kflank.fna