---
file_transformation:
  - #Add a gene feature to the gff3 with ID and Name attributes in the 9th column; for each mRNA feature, add Parent and Name attributes and append .1 to the ID attribute
  - cat IGA1003.gene.gff | perl -ne 'if ( /^#/ ) {print;} else{ s/_rc//; s/;$//; s/chr(\d\t)/chr0$1/; s/;Source=.+//; chomp;$line = $_; @rows = split ("\t" , $line); if ($rows[2] =~/mRNA/) {$new_line = $1 if ($rows[8] =~ /ID=(.+)/); $gene_line = $rows[0]."\t".$rows[1]."\t"."gene"."\t".$rows[3]."\t".$rows[4]."\t".$rows[5]."\t".$rows[6]."\t".$rows[7]."\t".$rows[8].";Name=".$new_line; $rows[8] = $rows[8].".1".";Parent=".$new_line.";Name=".$new_line.".1"; $mrna_line =join ("\t", @rows); print "$gene_line\n$mrna_line\n";}else { $other_line =join ("\t", @rows); $other_new = $other_line.".1"; print "$other_new\n";}}' >IGA1003.gene_with_gene_feature.gff3
  - #Use Connor's bionorm program to add the prefix and sort the gff3.
  - #Compress and index files
  - bgzip glyma.F_IGA1003.gnm1.ann1.G61B.cds.fna # writes glyma.F_IGA1003.gnm1.ann1.G61B.cds.fna.gz
  - bgzip glyma.F_IGA1003.gnm1.ann1.G61B.protein.faa # writes glyma.F_IGA1003.gnm1.ann1.G61B.protein.faa.gz
  - bgzip glyma.F_IGA1003.gnm1.ann1.G61B.gene_models_main.gff3 # writes glyma.F_IGA1003.gnm1.ann1.G61B.gene_models_main.gff3.gz
  - tabix -p gff glyma.F_IGA1003.gnm1.ann1.G61B.gene_models_main.gff3.gz # writes glyma.F_IGA1003.gnm1.ann1.G61B.gene_models_main.gff3.gz.tbi

changes:
  - 2020-11-04 Initial repository creation
  - 2021-05-13 Add README, MANIFEST and make repository public
  - 2021-09-13 Add AHRD annotation to the main gene model gff
  - 2021-11-16 adf: add IDs to various features not strictly needing them for the sake of intermine loader; fixes https://github.com/legumeinfo/datastore-issues/issues/58
  - 2021-11-24 sbc: Change the following "chr" molecules to "unanchor":
      glyso.F_IGA1003.gnm1.chr102 --> glyso.F_IGA1003.gnm1.unanchor102
      glyso.F_IGA1003.gnm1.chr115 --> glyso.F_IGA1003.gnm1.unanchor115
      glyso.F_IGA1003.gnm1.chr131 --> glyso.F_IGA1003.gnm1.unanchor131
      glyso.F_IGA1003.gnm1.chr189 --> glyso.F_IGA1003.gnm1.unanchor189
      glyso.F_IGA1003.gnm1.chr208 --> glyso.F_IGA1003.gnm1.unanchor208
      For gene SoyGsojaF_11G000200, change gene feature from unanchor105 to chr11
      For the following 35 genes, remove models and sequences, as these spanned molecules (chr-unanchor or unanchor-unanchor):
      SoyGsojaF_12R050226 SoyGsojaF_14R021151 SoyGsojaF_19R009129 SoyGsojaF_UR057537 SoyGsojaF_UR057771
      SoyGsojaF_UR057820 SoyGsojaF_UR057823 SoyGsojaF_UR057833 SoyGsojaF_UR057848 SoyGsojaF_UR057864
      SoyGsojaF_UR057877 SoyGsojaF_UR057881 SoyGsojaF_UR057885 SoyGsojaF_UR057889 SoyGsojaF_UR057890
      SoyGsojaF_UR057892 SoyGsojaF_UR057894 SoyGsojaF_UR057896 SoyGsojaF_UR057897 SoyGsojaF_UR057899
      SoyGsojaF_UR057900 SoyGsojaF_UR057911 SoyGsojaF_UR057913 SoyGsojaF_UR057915 SoyGsojaF_UR057927
      SoyGsojaF_UR057931 SoyGsojaF_UR057935 SoyGsojaF_UR057939 SoyGsojaF_UR057942 SoyGsojaF_UR057945
      SoyGsojaF_UR057946 SoyGsojaF_UR057949 SoyGsojaF_UR057951 SoyGsojaF_UR057960 SoyGsojaF_UR057961
  - 2023-02-27 adf: re-sort the gene_models_main.gff file using datastore-specifications/scripts/special_or_deprecated/sort_gff.pl as part of https://github.com/legumeinfo/datastore-issues/issues/146
  - 2023-06-08 adf: add AHRD with GO/IPR in descriptors
  - 2023-09-01 sc: remove glyso.F_IGA1003.gnm1.ann1.SoyGsojaF_UR057932.1 from sequence files, since it is not in the gff