---
file_transformation: |
  # Strip citation and chromosome location from deflines - since locations refer to assembly 1
  # and will generally not be useful relative to other assemblies or accessions
  cat SoyBase_TE_Fasta.txt | perl -pe 's/>name=(\S+) .+ (Class=.+)/>$1 $2/; s/ Chromosome=.+//; s/ Unanchored_scaffold.+//' | perl -pe 's/ Description=$/ Description=none/' > SOY_TEdb.fna

  # The file SOY_TE_LIB_id60.fna has centroid sequences at 60% identity, as identified by usearch:
  usearch -cluster_fast SOY_TE_LIB.fna -id 0.60 -centroids SOY_TE_LIB_id60.fna

changes:
  - 2024-02-21 sbc: Initial Data Store collection, ported from https://soybase.org/soytedb/