---
file_transformation:
  - # Simplify gene deflines - RNA
  - perl -pe 's/^>(\S+)\s+.+OriSeqID=(\S+)\s+OriID=(\S+)\s+OriGeneID=(\S+)/>$3 $4 $2 $1/' GWHAAEV00000000.RNA.fasta > glyma.Zh13.gnm1.ann1.8VV3.protein.cds.fna
  - # Simplify gene deflines - protein
  - perl -pe 's/^>(\S+)\s+.+OriTrascriptID=(\S+)\s+OriGeneID=(\S+)\s+OriSeqID=(\S+)/>$2 $3 $4 $1/' GWHAAEV00000000.Protein.faa > glyma.Zh13.gnm1.ann1.8VV3.protein.faa
  - # Get a hash of GWH IDs and DataStore names from assembly
  - grep '>' ../Zh13.gnm1.8VV3/glyma.Zh13.gnm1.8VV3.genome_main.fna | perl -pe 's/>glyma.Zh13.gnm1.(\S+)\s+(\S+)/$2\t$1/' > hsh.ref_GWH_DS
  - # Hash DataStore names into GFF
  - hash_into_gff_refID.pl -gff GWHAAEV00000000.gff -hash hsh.ref_GWH_DS | perl -pe 's/;Accession.+//; s/;Parent_Accession=.+//i' > glyma.Zh13.gnm1.ann1.8VV3.gene_models_main.gff3

changes:
  - 2019-05-13 Initial repository creation
  - 2019-06-10 Add AHRD functional descriptors and make repository public
  - 2019-07-03 Add "glyma." to IDs in CDS files
  - 2020-04-14 Re-sorted glyma.Zh13.gnm1.ann1.8VV3.gene_models_main.gff3 with gff3sort.pl --precise
  - 2020-09-22 added gene family assignments
  - 2021-05-25 updated gene family assignments to use score instead of e-value
  - 2022-01-21 SCannon - Rename primaryTranscript.fna.gz file to canonical mrna_primary.fna
      mv glyma.Zh13.gnm1.ann1.8VV3.primaryTranscript.fna.gz glyma.Zh13.gnm1.ann1.8VV3.mrna_primary.fna.gz