---
file_transformation:

changes:
# Begin each change note with a date string, e.g. 2018-03-23
  - 2018-05-08 Changed gff column 1 IDs from CcLG to cajca.ICPL87119.gnm1.Cc
  - 2018-07-12 2018-07-11 Changed fasta header in all *.fna and *.faa files (perl -pi -e 's/>Aradu/>aradu.V14167.gnm1.ann1/')
  - 2018-09-15 Changed fasta header in all *.fna and *.faa files to include .1 suffixes and amend the above to match the gff3 (e.g. C.cajan had been removed)
  - 2019-08-28 Tweak GFF for compliance - changing "nan" in 6th field to "."
  - 2020-09-22 added gene family assignments file
  - 2021-07-26 adf: added ID attributes via add_IDs_to_gff_features.pl (fixes https://github.com/legumeinfo/mine-issues/issues/38)
  - 2021-07-27 adf: gffread creation of transcript.fna; fixed naming of protein.faa (was protein_main.faa)
  - 2021-07-28 adf: added cajca.ICPL87119.gnm1.ann1.Y27M.pathway.tsv derived from Plant Reactome
  - 2022-01-20 scannon: renamed sequence files to bring them in line with Data Store patterns:
      rm cajca.ICPL87119.gnm1.ann1.Y27M.mrna.fna.gz because it is the same as cajca.ICPL87119.gnm1.ann1.Y27M.cds.fna.gz and seems to be CDS rather than mrna sequence
      mv cajca.ICPL87119.gnm1.ann1.Y27M.protein.faa.gz cajca.ICPL87119.gnm1.ann1.Y27M.protein_with_TEs.faa.gz
      mv cajca.ICPL87119.gnm1.ann1.Y27M.gene_main.fna.gz cajca.ICPL87119.gnm1.ann1.Y27M.gene_main.fna.gz
      mv cajca.ICPL87119.gnm1.ann1.Y27M.gene_filterTE.fna.gz cajca.ICPL87119.gnm1.ann1.Y27M.mrna.fna.gz
      mv cajca.ICPL87119.gnm1.ann1.Y27M.gene_main.fna.gz cajca.ICPL87119.gnm1.ann1.Y27M.mrna_with_TEs.fna.gz
      mv cajca.ICPL87119.gnm1.ann1.Y27M.protein_filterTE.faa.gz cajca.ICPL87119.gnm1.ann1.Y27M.protein.faa.gz
    The resulting sequence counts:
      zgrep -c '>' *.f?a.gz
      cajca.ICPL87119.gnm1.ann1.Y27M.cds.fna.gz:40071
      cajca.ICPL87119.gnm1.ann1.Y27M.mrna_with_TEs.fna.gz:48680
      cajca.ICPL87119.gnm1.ann1.Y27M.mrna.fna.gz:40071
      cajca.ICPL87119.gnm1.ann1.Y27M.protein_with_TEs.faa.gz:48680
      cajca.ICPL87119.gnm1.ann1.Y27M.protein.faa.gz:40071
  - 2022-03-05 adf: reran gene family assignment since older one included content no longer found in gene_models_main