More than cells, more than bytes
October 23rd, 2009
In a recent New York Times article, Jimmy Lin of the University of Maryland is quoted as saying, “Science these days has basically turned into a data-management problem.” If this is true, then those responsible for data management have failed. The last thing scientists should be worrying about is managing data. Mining data, sure, but managing data? While the efforts documented in that story to begin teaching scientists how to grapple with large amounts of data are laudable, they all seem to focus on computer scientists, not biologists or chemists or physicists. Few people will ever understand the worlds of, for example, both biology and computer science deeply. What is needed are people who understand one of these disciplines deeply and can extend into other disciplines as needed. These individuals can act as connections, glue, between disciplines and accelerate research in these areas, which increasingly require many domains of expertise. For example, designing DNA sequencing instruments requires a deep understanding of fields as diverse as optics, quantum mechanics, chemistry, biology, mechanical engineering, computer science, and computer engineering. No one person can master all these fields, but people are desperately needed to bridge them. As Chad Fowler writes in the section of his book The Passionate Programmer entitled Coding Don’t Cut It Anymore,
If you want to stay relevant, you’re going to have to dive into the domain of the business you’re in.
In fact, a software person should understand a business domain not only well enough to develop software for it but also to become one of its authorities.
October 26th, 2009 at 8:51 am
My thoughts exactly when I read that article! If science is data management, get me out! In my limited experience, there are some computer scientists and biologists, especially in the younger generation, who are genuinely intellectually curious about the “other” field, which leads to fruitful collaboration. However, as long as the conventional science education still emphasizes narrow specialization, there need to be additional incentives for programmers and computer scientists to get more deeply involved in these kinds of data-intensive projects. Off the cuff, I would say that part of the problem is, as usual, the grant review process. Proposals from biologists that include large (or even reasonable) sums for collaborators in the computer science department often get accused of being overambitious and/or unfocused, and that Aim is the first to get cut. The end result is a biologist with a massive amount of data, and hopefully enough skill or resources to extract results for his own lab’s publications. But the data remains relatively inaccessible to collaborators or the scientific community at large.
October 26th, 2009 at 9:39 am
Matt,
I agree that funding is a much larger problem than education. In my experience, getting funds to write “data management” types of software/applications is much tougher than getting funds to develop novel algorithms. Also, when you write grants that generate massive amounts of data, reviewers balk at large sums of money for computational resources and analysis.
October 28th, 2009 at 9:57 am
A related challenge is keeping alive the existing groups that do tackle a lot of data management and data integration for scientists. A case in point is the Arabidopsis database, TAIR. It’s arguably one of the best resources of its kind, helping a large genomics community manage, access, mine, integrate and annotate its data. Under their recently funded grant, their budget will be cut by 25% each year, reaching 25% of its current level by year 4 (see http://arabidopsis.org/doc/about/tair_funding/410).
One of the suggestions to TAIR from NSF was to go out and seek external funding or charge subscriptions. It would be quite a task for TAIR to move from a free, open resource to a paid subscription model, fighting upstream against the torrent of journals and other resources moving in the other direction. If TAIR and organizations like it are not managing the fundamental data and information that scientists rely on, that is another level of data management that will be pushed back to the individual researcher level.
October 28th, 2009 at 10:22 am
Simon,
The suggestion of TAIR moving to subscriptions has to be one of the worst I have ever heard, but it’s to be expected from NSF. They have demonstrated time and time again a deep disdain for the more mundane aspects of scientific research. I have seen them cut funds for computing equipment, disk storage, and system administrators from a grant for a data collection center!?!