I’m building a databank for cyanobacteria-specific information, like GenBank but just for cyanobacteria. I’m not a biologist, so I don’t know what the impact will be, but my girlfriend is, and after seeing her struggle with some things I decided to jump on it.
Right now I’m just building the basic forms and such, but I plan on implementing FASTA file parsing and an algorithm to locate the conserved regions.
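As a rough sketch of what that plan could look like: the snippet below parses FASTA text and then reports runs of columns that are identical across all sequences. It assumes the sequences have already been aligned to the same length (with `-` for gaps), which a real pipeline would need a separate alignment step for. The names `parse_fasta` and `conserved_regions`, the `min_len` threshold, and the example strains are all made up for illustration, and "identical in every sequence" is only one simple definition of "conserved".

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}. A repeated header overwrites the earlier record."""
    records: Dict[str, List[str]] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def conserved_regions(aligned: List[str], min_len: int = 4) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges where all sequences share the same non-gap character."""
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    regions: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for col in range(width):
        column = {seq[col] for seq in aligned}
        identical = len(column) == 1 and "-" not in column
        if identical and start is None:
            start = col  # a conserved run begins here
        elif not identical and start is not None:
            if col - start >= min_len:
                regions.append((start, col))
            start = None
    # close a run that extends to the end of the alignment
    if start is not None and width - start >= min_len:
        regions.append((start, width))
    return regions


# Hypothetical, hand-made alignment purely for demonstration.
example = """>strain_a
ACGTTGCA-GGATCC
>strain_b
ACGTTGCATGGATCC
>strain_c
ACGTAGCA-GGATCC
"""

records = parse_fasta(example.splitlines())
regions = conserved_regions(list(records.values()), min_len=4)
print(regions)  # [(0, 4), (9, 15)]
```

Requiring exact identity is the strictest choice; a more forgiving version might accept a column when, say, 90% of sequences agree, but that threshold is a judgment call for the biologists using it.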
I mean, GenBank itself already has cyanobacteria in it[0]. Not to be discouraging, but you'd be duplicating a lot of work there, and I assume that's where you'd have to get your sequences from anyway. So are you actually just building the file parsing and analysis tools?
My project would focus on the ITS region motifs, which I think are not in GenBank. GenBank has the whole sequence, but the motifs are not identified.
The way the bio people I talked to explained it to me, when they want to compare ITS structures they have to find the motifs themselves and compare them that way. Is this incorrect?