---
file_transformation:
  - # Prefix IDs - CDS
  - perl -pe 's/>/>glyso.W05.gnm1.ann1./' Gsoja.W05.gene.cds.fa > glyso.W05.gnm1.ann1.T47J.cds.fna
  - # Prefix IDs - transcript
  - perl -pe 's/>/>glyso.W05.gnm1.ann1./' Gsoja.W05.gene.transcript.fa > glyso.W05.gnm1.ann1.T47J.transcript.fna
  - # Prefix functional annotations
  - perl -pe 's/^Glyso/glyso.W05.gnm1.ann1.Glyso/' Gsoja.W05.gene.function.InterProScan.tsv.txt > glyso.W05.gnm1.ann1.T47J.info_annot_InterProScan.txt
  - # Prefix IDs - protein
  - perl -pe 's/>/>glyso.W05.gnm1.ann1./' Gsoja.W05.gene.protein.fa > glyso.W05.gnm1.ann1.T47J.protein.faa
  - # Prefix reference IDs and gene IDs in GFF
  - perl -pe 's/^(\S+)/glyso.W05.gnm1.$1/' Gsoja.W05.gene.gff > glyso.W05.gnm1.ann1.T47J.gene_models_main.gff3
  - perl -pi -e 's/=Glysoja/=glyso.W05.gnm1.ann1.Glysoja/g' glyso.W05.gnm1.ann1.T47J.gene_models_main.gff3
  - # Rename name-mapping files
  - rename 's/NCBI/glyso.W05.gnm1.ann1.T47J.info.NCBI/' NCBI*

changes:
  - 2019-05-14 Initial repository created
  - 2019-05-10 Added AHRD descriptors; moved collection to public
  - 2019-06-12 Tweak GFF for compliance; s/Range:/range=/
  - 2019-06-17 Tweak GFF for compliance; s/Name=ID=/Name=/
  - 2019-08-28 Tweak GFF for compliance (one gene ID has too few dot-separated parts); s/ID=cds.Glysoja.10G027808/glyso.W05.gnm1.ann1.Glysoja.10G027808/
  - 2020-04-16 Re-sorted glyso.W05.gnm1.ann1.T47J.gene_models_main.gff3 and removed problematic gene model 10G027808
  - 2020-09-22 Added gene family assignments
  - 2021-05-25 Updated gene family assignments to use score rather than e-value
  - 2023-02-13 adf: moved "bad gene" Glysoja.10G027808 out of main gfa and into its own separate file, as had been done with its gff records (unclear to me why it has been found wanting)
  - 2023-08-31 sc: added the following lines to the bed file
      glyso.W05.gnm1.Chr10 46840243 46855327 glyso.W05.gnm1.ann1.Glysoja.10G027808.1 0 -
      glyso.W05.gnm1.Chr10 46840243 46855175 glyso.W05.gnm1.ann1.Glysoja.10G027808.2 0 -