12.9.11

Interrelating proteins, objects, and data sources.

So... I've been trying to refine the visualization of informatics repositories for biomolecular data, focused, of course, on proteins. I'm finding that communicating data models to people is always tricky, and I really want to find a maximally intuitive way of doing it, because it's so important. In bioinformatics, the importance of precise models cannot be overstated. Some people might say that Watson and Crick's discovery of the double helix, which revolutionized molecular biology, our understanding of heredity, and ultimately genomics, was simply an exercise in accurate spatial modeling and thinking.


But it's not just about bioinformatics. Over at PeerIndex, where we are trying to identify the world's influencers, we deal with large-scale data from Twitter, Facebook, and other sources, and we map those data types into objects. These data structures seem simple, but they are extremely complex: they are often self-referencing, deeply nested, and highly redundant. In social networks, just as in bioinformatics, the rapid rate of change makes this particularly problematic at times; nevertheless, data types do exist.
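To make "self-referencing, deeply nested, and highly redundant" concrete, here is a minimal sketch of one common way to map such payloads into objects: an identity map that turns every repeated copy of an account into a single shared object, so that "follows" links can point back at each other. The `Account` class, the `load_accounts` helper, and the payload field names (`id`, `name`, `follows`) are hypothetical; they are not PeerIndex's actual schema or the Twitter or Facebook APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)  # eq=False: compare accounts by identity, not by field values
class Account:
    """Hypothetical account object; `follows` refers to other Account objects."""
    account_id: str
    name: str
    # repr=False keeps printing safe when follow links form cycles.
    follows: List["Account"] = field(default_factory=list, repr=False)


def load_accounts(payload: Dict[str, Any], registry: Dict[str, Account]) -> Account:
    """Recursively map a nested payload to Account objects.

    `registry` is an identity map: an account that appears many times in the
    nested payload (redundancy) becomes one shared object, which is what lets
    the resulting object graph contain cycles (self-reference).
    """
    acct = registry.get(payload["id"])
    if acct is None:
        acct = Account(payload["id"], payload.get("name", ""))
        registry[payload["id"]] = acct
    for child in payload.get("follows", []):
        followed = load_accounts(child, registry)
        if followed not in acct.follows:  # identity check, because eq=False
            acct.follows.append(followed)
    return acct


# Hypothetical nested payload in which Alice and Bob each appear twice.
payload = {
    "id": "a", "name": "Alice",
    "follows": [
        {"id": "b", "name": "Bob", "follows": [{"id": "a", "name": "Alice"}]},
        {"id": "c", "name": "Carol", "follows": [{"id": "b", "name": "Bob"}]},
    ],
}
registry: Dict[str, Account] = {}
alice = load_accounts(payload, registry)
print(len(registry))                                     # 3
print(alice.follows[0].follows[0] is alice)              # True: Bob follows Alice
print(alice.follows[1].follows[0] is alice.follows[0])   # True: only one Bob object
```

The nested JSON itself is a tree, so the recursion always terminates; the cycles only appear once repeated ids are resolved to shared objects.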

I find that UML, in spite of its shortcomings as a software modelling tool, is extremely valuable for informally visualizing complex data models, because it can convey cardinality, inheritance, state, and dependencies.

By glomming a lot of UML features into one diagram, I've cobbled together what I think is an accurate representation of the important higher-level relationships between protein sequences, taxonomies, and structures. It is extremely informal, but it adopts UML's shortcuts for conveying separate physical collections, inheritance, and so on, to capture the nature of protein informatics repositories in what I believe is an intuitive manner.
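As a rough textual companion to such a diagram, the sketch below expresses the same kinds of UML relationships in Python dataclasses: a self-referencing taxonomy tree, a shared base class (inheritance), and separate sequence and structure collections with stated cardinalities. The class names, the cardinalities (one taxon per sequence; one or more sequences per structure), and all ids and residues in the example are assumptions for illustration, not a transcription of the original diagram or of any particular database's schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Taxon:
    """A taxonomy node; `parent` is a self-reference with cardinality 0..1."""
    taxon_id: int
    name: str
    parent: Optional["Taxon"] = None

    def lineage(self) -> List[str]:
        """Names from this taxon up to the root of the tree."""
        names: List[str] = []
        node: Optional[Taxon] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


@dataclass
class Record:
    """Common base (inheritance): every repository entry has an accession."""
    accession: str


@dataclass
class ProteinSequence(Record):
    """Sequence collection: each sequence comes from exactly one taxon (1)."""
    residues: str
    taxon: Taxon


@dataclass
class Structure(Record):
    """Structure collection: a structure holds chains from one or more sequences (1..*)."""
    chains: List[ProteinSequence] = field(default_factory=list)


def structures_for_taxon(structures: List[Structure], taxon_name: str) -> List[str]:
    """Accessions of structures with a chain from `taxon_name` or any of its descendants."""
    return [
        s.accession
        for s in structures
        if any(taxon_name in chain.taxon.lineage() for chain in s.chains)
    ]


# Hypothetical ids, accessions, and residues, for illustration only.
eukaryota = Taxon(1, "Eukaryota")
human = Taxon(2, "Homo sapiens", parent=eukaryota)
yeast = Taxon(3, "Saccharomyces cerevisiae", parent=eukaryota)

seq_a = ProteinSequence("SEQ1", "MKTAYIAK", human)
seq_b = ProteinSequence("SEQ2", "MSLEQKK", yeast)
complex_structure = Structure("STR1", chains=[seq_a, seq_b])
single_chain = Structure("STR2", chains=[seq_b])

print(human.lineage())  # ['Homo sapiens', 'Eukaryota']
print(structures_for_taxon([complex_structure, single_chain], "Homo sapiens"))  # ['STR1']
print(structures_for_taxon([complex_structure, single_chain], "Eukaryota"))  # ['STR1', 'STR2']
```

The query in `structures_for_taxon` crosses all three collections (structure to sequence to taxonomy), which is the kind of dependency a diagram makes visible at a glance.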




For function, it turns out that the relationships get a lot more complex. To deal with this, I'd refer you to the "Syntax for Minimotifs" which we produced a while back.
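One reason function is harder to model is that it usually attaches to a short region of a sequence rather than to the whole protein, and a single sequence can carry many such regions. The sketch below illustrates that idea with regular-expression motifs; it does not reproduce the published "Syntax for Minimotifs", and the `Motif`, `MotifHit`, and `find_hits` names are hypothetical. The N-glycosylation pattern is the well-known PROSITE-style consensus N-{P}-[ST]-{P} rewritten as a regex; the kinase pattern is a simplified, illustrative consensus.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Motif:
    """A short residue pattern tied to one or more functions (hypothetical shape)."""
    name: str
    pattern: str               # regular expression over one-letter residue codes
    functions: Tuple[str, ...]


@dataclass
class MotifHit:
    """One occurrence of a motif: the function is attached to a region, not a whole protein."""
    motif: Motif
    start: int                 # 0-based position in the sequence
    matched: str


def find_hits(residues: str, motifs: List[Motif]) -> List[MotifHit]:
    """Return every occurrence of every motif, including overlapping ones."""
    hits: List[MotifHit] = []
    for motif in motifs:
        # The zero-width lookahead lets re.finditer report overlapping matches.
        for m in re.finditer(f"(?=({motif.pattern}))", residues):
            hits.append(MotifHit(motif, m.start(), m.group(1)))
    return hits


motifs = [
    # PROSITE-style N-glycosylation consensus N-{P}-[ST]-{P}, written as a regex.
    Motif("n_glyco", "N[^P][ST][^P]", ("N-linked glycosylation",)),
    # Simplified, illustrative basophilic kinase site R-R-x-[ST].
    Motif("kinase_site", "RR.[ST]", ("phosphorylation",)),
]

hits = find_hits("MNKSANLTRRASG", motifs)
print([(h.motif.name, h.start, h.matched) for h in hits])
# [('n_glyco', 1, 'NKSA'), ('n_glyco', 5, 'NLTR'), ('kinase_site', 8, 'RRAS')]
```

Even this toy version already needs a many-to-many link (sequence to motif to function) plus a position, which is why function does not fit as neatly into the sequence/structure/taxonomy picture.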



Ultimately, my goal is, if possible, to merge these into a single, higher-level model of sequence, structure, and function.
