UNM Cyberinfrastructure Day 2010 presentation: Applications in Biocomputing, biomedical and cheminformatics research computing cyberinfrastructure issues.
Cyberinfrastructure Day 2010: Applications in Biocomputing
1. Jeremy Yang
Software Systems Manager
Division of Biocomputing
Dept. of Biochemistry & Molecular Biology
UNM School of Medicine
Cyberinfrastructure Day -- April 22, 2010
2. I. What is Biocomputing?
II. Cyber Revolution (~1980-2010+)
III. Cyberinfrastructure (To be or not to be?)
IV. Super Computing, Redefined
3. Division of Biocomputing
http://biocomp.health.unm.edu/
Department of Biochemistry & Molecular Biology
School of Medicine
Also affiliated with the NIH Roadmap-funded UNM Center for Molecular Discovery
4. Biomolecular screening informatics
Cheminformatics
Bioinformatics
Genomics
Virtual screening
Molecular modeling
SAR (Structure-Activity-Relationship)
Data mining, machine learning
3D visualization
Public data integration
Collaborations in chemistry, biology, medicine, comp sci
BIOMED 505 course
Software development, management, deployment & support
5. Larry Sklar, et al., UNMCMD (NIH Roadmap)
~$20M NIH awarded to date
6. 32 cpu Linux cluster
32GB RAM server
Linux: OpenSUSE, CentOS, RedHat, Fedora, Ubuntu
SGI/IRIX
Windows, Mac OS X
Automated integration with NIH databases
2+ Oracle instances
PostgreSQL, MySQL
Stereo graphics workstation
25+ scientific software packages
Supported in-house applications
We are cyberinfrastructure users and providers!
9. Nucleotide and protein sequence analysis
Genomics, proteomics
Merging with chemical biology, etc.
10. Computational search for likely biological actives
Database may be real or virtual compounds
2D and 3D methods
2D similarity search
3D similarity search (shape, pharmacophore)
docking (3D, protein binding site)
Example: 3D shape search; Prozac & Paxil
c/o OpenEye ROCS
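The 2D similarity search listed above can be illustrated with a minimal sketch. It assumes each compound is already reduced to a fingerprint, represented here as a set of integer feature bits, and it scores pairs with the Tanimoto coefficient, a common choice that the slide itself does not name. The compound names and fingerprints are hypothetical, made up for illustration only.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared 'on' bits divided by total distinct 'on' bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_search(query, database, threshold=0.5):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one structural feature bit.
database = {
    "cmpd_A": frozenset({1, 2, 3, 4}),
    "cmpd_B": frozenset({1, 2, 3, 8}),
    "cmpd_C": frozenset({7, 8, 9}),
}
query = frozenset({1, 2, 3, 4})
hits = similarity_search(query, database, threshold=0.5)
for name, score in hits:
    print(f"{name}: {score:.2f}")
```

A real system would derive the fingerprints from chemical structures with a cheminformatics toolkit; only the scoring and ranking step is sketched here.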
12. Computational models for protein-ligand binding
Abl kinase (1iep.pdb)
interaction potentials:
hydrophobic (green)
hbond acceptors (r
Gleevec in binding site
Gleevec is a leukemia drug known to bind with Abl kinase.
17. Rapid change, challenge and opportunity
Learning from history, trends (new not enough)
Winners and losers
Science, experts have led and followed.
~1980-2010 covers 3σ (99.7%)
And evolution...
19. 1977: Atari 2600
1978: Space Invaders
1981: IBM-PC (MS-DOS)
1983: cellphone
1983: GNU Project
1984: Neuromancer, William Gibson, “cyberspace”
1984: Apple Mac, mouse, windows & icons
20. 1985: Oracle 5 (client-server)
1989: Intel 486 (1M transistors, 50MHz)
1990: MS Windows 3.0
1990: WWW (Berners-Lee)
1991: High Perf Comp & Comm Act (Al Gore)
1991: Linux (Linus Torvalds)
1991: AOL
1991: ETrade
21. 1993: Jurassic Park (via SGI)
1993: NCSA Mosaic
1994: Netscape Navigator
1994: “Good Times” hoax
1994: Match.com
1995: “Concept” virus (Word)
1995: Internet Explorer
1995: Apache project
1995: Yahoo!
22. 1995: Amazon.com
1995: My mother gets email
1997: Google
1997: eBay
1999: Melissa virus (Outlook)
1999: Napster (p2p)
2000: MS convicted
2000: 3M USA broadband*
2000: dot-com bubble pops
*Fixed non dial-up internet connections >56k (FCC).
23. 2000: 802.11b wireless
2001: Apple iPod
2001: Apple iTunes
2001: Wikipedia
2003: Skype
2005: YouTube
2005: Rio power grid hacked
2005: NSA domestic surveillance
2006: Facebook
24. 2006: Amazon Cloud
2007: DOD hacked
2008: 70M USA broadband*
2009: Cyberdefense USA priority
2009: Twitter role in Iran election protests
2010: UAVs are SOPs
2011: Cyber terrorism?
*Fixed non dial-up internet connections >56k (FCC).
25. The dotted line keeps moving...
Case study: database cheminformatics in pharma research, 1990→2000.
26. In 1990, high speed chemical searching was beyond standard capabilities.
Research groups managed local servers in their labs & specialized DB engines (e.g. Daylight Inc.).
By 2000, this function had moved to IT-managed corporate informatics infrastructure (via Oracle cartridges, etc.).
Transition not smooth, but very beneficial.
27. Standard chemical searching functions: substructure, similarity, identity
[Figure: example structures, including cocaine and imidazoles]
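Of the three standard functions, substructure search is the least obvious, so here is a rough sketch of the idea. It assumes molecules are given as simple graphs (a list of element symbols plus a list of bonded atom-index pairs) and uses plain backtracking to map the query onto the target. Bond orders, aromaticity, hydrogens and stereochemistry are all ignored. The graph encodings and helper names below are hypothetical and are not how any particular database engine implements it.

```python
def adjacency(n_atoms, bonds):
    """Build a neighbor-set lookup from a list of (i, j) bonded atom pairs."""
    adj = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def has_substructure(query, target):
    """True if every query atom and bond can be mapped onto the target graph."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    q_adj = adjacency(len(q_atoms), q_bonds)
    t_adj = adjacency(len(t_atoms), t_bonds)

    def extend(mapping):
        q = len(mapping)  # query atoms are placed in index order
        if q == len(q_atoms):
            return True
        for t in range(len(t_atoms)):
            if t in mapping.values() or t_atoms[t] != q_atoms[q]:
                continue
            # Each already-placed neighbor of q must be bonded to t in the target.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]  # backtrack
        return False

    return extend({})


# Simplified heavy-atom graphs (illustrative encodings, no bond orders).
imidazole = (["N", "C", "N", "C", "C"],
             [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)])
database = {
    "1-methylimidazole": (["C", "N", "C", "N", "C", "C"],
                          [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]),
    "pyrrole": (["N", "C", "C", "C", "C"],
                [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]),
}
matches = [name for name, mol in database.items()
           if has_substructure(imidazole, mol)]
print(matches)
```

Brute-force matching like this scales poorly, which is one way to see why high speed chemical searching once needed specialized engines.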
28. (1) office equipment
(2) lab equipment
(3) experimental apparatus
(4) the experiment
(5) a commodity
(6) custom configured experimental
vehicle for exploration
(7) all of the above
31. Scientific research
Computational research
High performance computing as a research tool
High performance infrastructure as a productivity tool
Scientific software for experts
Enabling software for scientists
Commoditization (e.g. cloud computing)
Plumbing vs. experimental apparatus
Appropriate tiers and domains
32. IT: “Poorly managed computers and needy ill-trained users put the system at risk.”
Research: “We need power, flexibility and access and not another lame PC.”
34. In ~5 yrs, super → un-super
Super computing? Define computer.
Advances from unexpected places:
gaming, movies (graphics -- vs. AI)
social networking (crowdsourcing)
even business (web standards, UIs, security)
Super computing is pushing the current limits
But where are the key frontiers?