---
file_transformation:
  - cat QAOA01.1.fsa_nt | perl -pe 's/^>/>cerca.ISC453364.gnm1./' > cerca.ISC453364.gnm1.B05Z.fna
changes:
  - 2019-11-27: Initial repository creation
  - 2022-01-20: SCannon - rename files 's/cerca.ISC453364.gnm1.B05Z/cerca.ISC453364.gnm1.ann1.B05Z/'
  - 2022-01-22: SCannon - Duplicate cds.fna file as mrna.fna file, as the two would be identical when derived from this GFF.
  - 2022-11-08: SHokin - rename files s/B05Z/HZJM/ to match collection name
  - 2022-11-11: >-
      adf - add gene records, rename mRNAs/CDS/Proteins to have .1 isoform suffixes:
      mv cerca.ISC453364.gnm1.ann1.HZJM.gene_models_main.gff3.gz cerca.ISC453364.gnm1.ann1.HZJM.wo_gene_models_main.gff3.gz;
      zcat cerca.ISC453364.gnm1.ann1.HZJM.wo_gene_models_main.gff3.gz | awk 'BEGIN {FS=OFS="\t"} $3 == "mRNA" {$3 = "gene"; print $0; split($0,a,"ID="); gene=a[2]; $3 = "mRNA"; print $0".1;Parent="gene} $3 != "mRNA" {print $0".1"}' | ~/datastore-specifications/scripts/add_IDs_to_gff_features.pl | bgzip -l9 -c > cerca.ISC453364.gnm1.ann1.HZJM.gene_models_main.gff3.gz
      (also handled fasta and bed files); all leading up to: added cerca.ISC453364.gnm1.B05Z.legfed_v1_0.M65K.gfa.tsv.gz,
      which fixes part of https://github.com/legumeinfo/datastore-issues/issues/134
  - 2023-12-21: sbc - Remove cerca.ISC453364.gnm1.ann1.HZJM.wo_gene_models_main.gff3.gz, which was a precursor to cerca.ISC453364.gnm1.ann1.HZJM.gene_models_main.gff3.gz. The former (wo_gene_models_main) lacked gene records (it had only mRNA and CDS records).
  - 2024-01-09: sbc - Add Name attribute to gene records and recalculate bed file to add corrected name attribute in 7th column