---
file_transformation:
  - Prefixed CDS fasta file:
    - cat SETOT.complete.CDS.fa |perl -pe 's/>ISG\|(\S+)\|len.+/>sento.Myeongyun.gnm1.ann1.$1/' >sento.Myeongyun.gnm1.ann1.cds.fna
  - Prefixed protein fasta file:
    - cat SETOT.complete.protein.fa |perl -pe 's/>/>sento.Myeongyun.gnm1.ann1./' >sento.Myeongyun.gnm1.ann1.protein.faa
  - Added Name attributes to the gene and mRNA features; and corrected Parent attribute for mRNA, exon and cds; Added prefix to chromosome, ID and Parent:
    - |
      cat SETOT.complete.cds_gap.gff3 |
        perl -pe 's/^\s+$//' |
        perl -ne 'chomp; $line = $_; @rows = split("\t", $line);
          if (($rows[2] !~ /gene/) && ($rows[2] !~ /mRNA/) ){
            if ($rows[8] =~ /ID=(\S+);Parent=(\S+)/){
              $newline= "ID=".$1.";Parent=TR.".$2; $rows[8] = $newline;
              $new_rows1 = join ("\t", @rows); print "$new_rows1\n";}}
          elsif ($rows[2] =~ /mRNA/){
            if($rows[8] =~ /ID=(\S+);Parent=(\S+)/){
              $newline1= "ID=".$1.";Name=".$1.";Parent=".$2; $rows[8] = $newline1;
              $new_rows2 = join ("\t", @rows); print "$new_rows2\n";}}
          else {
            if($rows[8] =~ /ID=(\S+);note=(.+)/){
              $newline2= "ID=".$1.";Name=".$1.";Note=".$2; $rows[8] = $newline2;
              $new_rows3 = join ("\t", @rows); print "$new_rows3\n";}}' |
        perl -pe 's/^/sento.Myeongyun.gnm1./; s/ID=/ID=sento.Myeongyun.gnm1./; s/Parent=/Parent=sento.Myeongyun.gnm1./' |
        sed '1 s/^/##gff-version 3\n/' \
        >sento.Myeongyun.gnm1.ann1.gene_model_main.gff3

changes:
  - 2021-06-07 initial Legume Federation Data Store file preparation
  - 2021-06-12 Applied prefixes to fasta and GFF files - see file_transformations above.
  - 2021-06-14 Sorted GFF, i.e. gff3sort.pl --precise sento.Myeongyun.gnm1.ann1.gene_models_main.gff3
  - 2023-08-28 Shorten some scaffold names, in which runs of underscores caused problems
  - 2023-08-31 Remove "TR." from gene IDs: perl -pi -e 's/sento.Myeongyun.gnm1.TR./sento.Myeongyun.gnm1.ann1./'
  - 2024-01-27 sbc: Add key 5WXB to files