---
changes:
  - 2021-09-16: Added full variants.vcf file, as vigra.VC1973A.gnm7.div.Sandhu_Singh_2020.variants_main.vcf.gz. For both filtered and unfiltered vcf, added marker names, and changed contig/chrom names using hash file vigra.VC1973A.gnm7.3nL8.scaff_name_hash.tsv, derived from GenBank JJMO01 contigs file.
  - 2022-11-21: Create this gnm7 repository from gnm6 and comparisons to the gnm7 assembly
file_transformation: |
  # Name hashes between gnm6 and gnm7:
  # From the BLAST output, extract the information to be hashed into the VCF file: seqid, position, markername
  mkdir new_vcf_coords

  cat blastout_top/vigra.VC1973A.gnm6.variants_filt_1kseq.x.vigra.VC1973A.gnm7.SB53.bln_top |
    awk -v OFS="\t" '$4>=990 {printf("%s\t%d\t%s\n", $2, $9+(($10-$9)/2), $1)}' |
    cat > new_vcf_coords/vigra.VC1973A.gnm6.div.Sandhu_Singh_2020.var_filt_posn_map.tsv

  cat blastout_top/vigra.VC1973A.gnm6.variants_main_1kseq.x.vigra.VC1973A.gnm7.SB53.bln_top |
    awk -v OFS="\t" '$4>=990 {printf("%s\t%d\t%s\n", $2, $9+(($10-$9)/2), $1)}' |
    cat > new_vcf_coords/vigra.VC1973A.gnm6.div.Sandhu_Singh_2020.var_main_posn_map.tsv

  # looks like
  #   vigra.VC1973A.gnm7.chr4  33603402  1_16010
  #   vigra.VC1973A.gnm7.chr4  33603452  1_16060
  #   vigra.VC1973A.gnm7.chr4  33650083  1_62820

  # Replace the seqid and position fields (1 and 2), based on ID in field 3.
  mkdir vigra.VC1973A.gnm7.diversity

  zcat vigra.VC1973A.gnm6.diversity/vigra.VC1973A.gnm6.div.Sandhu_Singh_2020.variants_filt.vcf.gz |
    hash_into_vcf.pl -hash new_vcf_coords/vigra.VC1973A.gnm6.div.Sandhu_Singh_2020.var_filt_posn_map.tsv \
      -swap -out vigra.VC1973A.gnm7.diversity/vigra.VC1973A.gnm7.div.Sandhu_Singh_2020.variants_filt.vcf

  zcat vigra.VC1973A.gnm6.diversity/vigra.VC1973A.gnm6.div.Sandhu_Singh_2020.variants_main.vcf.gz |
    hash_into_vcf.pl -hash new_vcf_coords/vigra.VC1973A.gnm6.div.Sandhu_Singh_2020.var_main_posn_map.tsv \
      -swap -out vigra.VC1973A.gnm7.diversity/vigra.VC1973A.gnm7.div.Sandhu_Singh_2020.variants_main.vcf

  # Sort the remapped VCFs by the new gnm7 coordinates
  bcftools sort vigra.VC1973A.gnm7.div.Sandhu_Singh_2020.variants_filt.vcf_unsort > vigra.VC1973A.gnm7.div.Sandhu_Singh_2020.variants_filt.vcf &
  bcftools sort vigra.VC1973A.gnm7.div.Sandhu_Singh_2020.variants_main.vcf_unsort > vigra.VC1973A.gnm7.div.Sandhu_Singh_2020.variants_main.vcf &
  wait  # let both background sorts finish before compressing

  # Compress and index
  for file in vigra*vcf; do bgzip "$file" & done
  wait  # let compression finish before indexing
  for file in vigra*vcf.gz; do tabix "$file" & done