---
file_transformation:
  - Prefixed fasta files with glyma.Wm82.gnm4.ann1. and removed .p suffix:
    - perl -pi -e 's/>(\S+?)(\.p)? />glyma.Wm82.gnm4.ann1.$1 /' *.faa
  - Prefixed gene IDs in nucleotide fasta:
    - perl -pi -e 's/>(\S+) />glyma.Wm82.gnm4.ann1.$1 /' *fna
  - Added chromosome prefix to GFFs:
    - perl -pi -e 's/^([^#]\S+)/glyma.Wm82.gnm4.$1/' *gff3
  - Added gene prefixes to GFFs (also removing version suffixes):
    - perl -pi -e 's/\.Wm82\.a4\.v1//g; s/=Glyma\./=glyma.Wm82.gnm4.ann1.Glyma./g' *gff3
  - Derived gene function information from GFF3:
    - TAB=$'\t';
    - zcat glyma.Wm82.gnm4.ann1.T8TQ.gene_models_main.gff3.gz |
    - awk -v FS="$TAB" '$3=="gene" {print $9}' |
    - perl -pe 's/.+Name=//; s/;ancestorIdentifier=[^;]+;/\t/; s/;Dbxref=[^;]+;/\t/; s/Note=//; s/^(\w+\.\w+);/$1\t/; s/%2C/,/ig; s/%3B/;/ig; s/%3D/=/ig' \
    - '> glyma.Wm82.gnm4.ann1.T8TQ.info_gene_annot.txt'

changes:
  - 2018-07-20 Initial preparation for Legume Federation data store.
  - 2018-08-16 Applied prefixes to fasta and GFF files - see file_transformation above.
  - 2019-05-28 Added files .info_gene_annot.txt and .info_annot.txt
  - 2020-09-22 Added gene family assignments
  - 2020-12-20 Sorted, bgzipped, and tabix-indexed glyma.Wm82.gnm4.ann1.T8TQ.gene_models_main.gff3.gz
  - 2021-05-25 Updated gene family assignments to use score rather than e-value
  - 2021-10-15 adf: applied bgzip/faidx to glyma.Wm82.gnm4.ann1.T8TQ.protein_primaryTranscript.faa and glyma.Wm82.gnm4.ann1.T8TQ.protein.faa
  - 2022-03-05 adf: s/Gm[0-9][0-9]_scaff/scaff/ on gene_models_main files to make consistent with genome_main
  - 2024-04-11 sbc: update synopsis to include JGI numbering