2. Big Science and Infrastructure
• Hurricanes affect humans
• Multi-physics: atmosphere, ocean, coast, vegetation, soil
– Sensors and data as inputs
• Humans: what have they built, where are they, what will they do
– Data and models as inputs
• Infrastructure:
– Urgent/scheduled processing, workflows
– Software applications, workflows
– Networks
– Decision-support systems, visualization
– Data storage, interoperability
3. Long-tail Science and Infrastructure
• Exploding data volumes &
powerful simulation methods
mean that more researchers
need advanced infrastructure
• Such “long-tail” researchers
cannot afford expensive
expertise and unique
infrastructure
• Challenge: Outsource and/or
automate time-consuming
common processes
– Tools, e.g., Globus Online
and data management
• Note: much LHC data is moved
by Globus GridFTP, e.g., in
May/June 2012, >20 PB, >20M files
– Gateways, e.g., nanoHUB,
CIPRES, access to scientific
simulation software
[Figure: NSF grant size, 2007. Source: “Dark data in the long tail of science”, B. Heidorn]
4. Cyberinfrastructure (e-Research)
• “Cyberinfrastructure consists of computing systems,
data storage systems, advanced instruments and
data repositories, visualization environments, and
people, all linked together by software and high
performance networks to improve research
productivity and enable breakthroughs not otherwise
possible.”
-- Craig Stewart
• Infrastructure elements:
– parts of an infrastructure,
– developed by individuals and groups,
– international,
– developed for a purpose,
– used by a community
5. Cyberinfrastructure Framework for 21st Century
Science and Engineering (CIF21)
• Cross-NSF portfolio of activities to provide integrated cyber resources
that will enable new multidisciplinary research opportunities in all
science and engineering fields by leveraging ongoing investments and
using common approaches and components (http://www.nsf.gov/cif21)
• ACCI task force reports (http://www.nsf.gov/od/oci/taskforces/index.jsp)
– Campus Bridging, Cyberlearning & Workforce Development, Data
& Visualization, Grand Challenges, HPC, Software for Science &
Engineering
– Included recommendation for NSF-wide CDS&E program
• Vision and Strategy Reports
– ACI - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12051
– Software - http://www.nsf.gov/publications/pub_summ.jsp?ods_key=nsf12113
– Data - http://www.nsf.gov/od/oci/cif21/DataVision2012.pdf
• Implementation
– Implementation of Software Vision
http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=504817
6. Infrastructure Role & Lifecycle
• Create and maintain a software ecosystem providing new capabilities that
advance and accelerate scientific inquiry at unprecedented complexity and scale
• Support the foundational research necessary to continue to efficiently
advance scientific software
• Enable transformative, interdisciplinary, collaborative science and
engineering research and education through the use of advanced software and
services
• Transform practice through new policies for software addressing challenges
of academic culture, open dissemination and use, reproducibility and trust,
curation, sustainability, governance, citation, stewardship, and attribution
of software authorship
• Develop a next-generation diverse workforce of scientists and engineers
equipped with essential skills to use and develop software, with software and
services used in both the research and education process
7. Learning and Workforce Development
Workforce as Cyberinfrastructure
[Diagram: three workforce roles arranged around CI]
• CI-focused Cyber Scientists to develop new capabilities
• CI-enabled Area Scientists to exploit new capabilities
• CI-skilled Professional Staff to support new capabilities
8. Example: Interactions in
Understanding the Universe (I2U2)
• An "educational virtual organization" to strengthen education
and outreach activities of scientific experiments at U.S.
universities and laboratories (CMS, Cosmic Rays, LIGO)
• Creates and maintains infrastructure and a common fabric to
develop hands-on laboratory course content and provide an
interactive learning experience, bringing tangible aspects of
experiments into an accessible "virtual laboratory"
– e-Labs: for classrooms, using web tools
– i-Labs: for museums, using physical interactive interfaces
• Collaboration of scientists, computer scientists and educators
to grow and sustain scientific workforce, and to promote public
appreciation of and support for the complex collaborations of
our national scientific programs
• https://www.i2u2.org
9. Example: Data Science for
Social Good (DSSG)
• The Eric & Wendy Schmidt Data Science for Social Good
fellowship is a University of Chicago summer program for
aspiring data scientists to work on data mining, machine
learning, big data, and data science projects with social
impact
• 48 undergrad and grad students come to Chicago and work in
small teams with governments and non-profits, led by full-time
mentors, to tackle real-world problems in education, health,
energy, transportation, etc.
• http://dssg.io
10. Common Elements of Examples
• Bring together students and
educators/teachers/mentors
• Use/build tools (preexisting for I2U2;
some preexisting, some novel for
DSSG)
• Have impact for students, and
ideally for science/society
12. Curricula Activities (CS)
• Parallel Computing
– TCPP model curriculum
• http://www.cs.gsu.edu/~tcpp/curriculum/
– Others also exploring this space
• E.g., new curriculum at CMU - http://hiperfit.dk/pdf/HIPERFIT-2-harper.pdf
– Intel Academic Community
• http://software.intel.com/en-us/academic
• Distributed Computing
– NSF Workshop: Designing Tools and Curricula for Undergraduate
Courses in Distributed Systems
• Parallel and Distributed Computing
– ACM/IEEE-CS Computer Science Curricula 2013
• Common elements that can apply outside CS
– Concerned faculty come together
– Work through needed changes (not just additions)
– Work with early adopters
– Update and expand
13. HPC University (HPCU)
• A virtual organization
• Goal: to provide a cohesive, persistent, and sustainable on-line
environment to share educational and training materials for the
continuum of HPC environments, from desktops to the highest-end
facilities
• Resources to guide researchers, educators and students to
– Choose successful paths for HPC learning and workforce development
– Contribute high-quality and pedagogically effective materials that allow
individuals at all levels and in all fields of study to advance scientific discovery
• Actively seeks participation from all parts of HPC community to:
– Assess the learning and workforce development needs and requirements of
the community
– Catalog, disseminate and promote peer-reviewed and persistent HPC
resources
– Develop new content to fill the gaps to address community needs
– Broaden access by a larger and more diverse community via a variety of
delivery methods
– Pursue other activities as needed to address community needs
• http://hpcuniversity.org/
15. The role of training
• There’s a lot of computer/computing training available
• Synchronous and asynchronous
• Often led by centers, and universities with computational
research programs (XSEDE, DOE, etc.)
• Also supported by organizations such as Shodor Foundation,
http://www.computationalscience.org/
• Some aimed at users, others at educators (both K-12 and
university)
• Some is fairly specific to one area of computational science (e.g.,
molecular dynamics), while other training is fairly general (parallel programming)
• OSG offers training for grid computing, but mostly for actually
running or using OSG or HTC systems (Condor) -
https://opensciencegrid.org/bin/view/Education/WebHome
– E.g. security for users, security for admins
– Also see http://www.campusgrids.org, and U Wisconsin and U Nebraska
classes and curricula based on their experience with and collaboration with
OSG
16. Software Carpentry
• Helps researchers be more productive by teaching them
basic computing skills
• Runs boot camps at dozens of sites around the world, and
also provides open access material online for self-paced
instruction
– Introduction to Unix shell; introduce pipes, loops, history, and the
idea of scripting
– Introduction to Python, to building components that can be used in
pipelines, and to when and why to break code into reusable
functions.
– Version control for file sharing, collaboration, and reproducibility.
– Testing (both the mechanics and the use of tests to define problems
more precisely).
– An introduction to either databases or NumPy, depending on the
audience
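As an illustration of the "components that can be used in pipelines" and "testing" topics above, here is a minimal sketch in Python. It is not taken from the Software Carpentry material; the names count_words and main, and the word-count task itself, are assumptions chosen for this example.

```python
def count_words(lines):
    """Return a dict mapping each lowercase word to how often it appears.

    Hypothetical helper: kept as a separate function so it can be
    reused and tested without touching any input/output code.
    """
    counts = {}
    for line in lines:
        for word in line.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def main(stream_in, stream_out):
    """Read text lines from stream_in, write 'word<TAB>count' lines to stream_out.

    Taking the streams as arguments lets a script pass sys.stdin and
    sys.stdout (to sit inside a shell pipeline) while a test passes
    in-memory streams instead.
    """
    counts = count_words(stream_in)
    for word in sorted(counts):
        stream_out.write(f"{word}\t{counts[word]}\n")
```

Saved as a script that ends by calling main(sys.stdin, sys.stdout), this could be placed between other shell commands in a pipe. Because the counting logic lives in its own function, a test can check it directly on a small list of lines.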
17. Software Carpentry Model
• Good expansion model: initial instructors are still active,
but new instructors come from those trained who want to
share what they have learned, using the material provided
• Material is all open for use and improvement
• In some sense, Software Carpentry has become the
coordinator between those who want to learn and those
who want to teach
• http://software-carpentry.org
18. Argonne Training Program on
Extreme-Scale Computing (ATPESC)
• Two-week program for computational scientists
• Provides intensive hands-on training on the key
skills, approaches, and tools to design,
implement, and execute computational science
and engineering applications on current high-end
computing systems and the leadership-class
computing systems of the future
• As a bridge to that future, this program fills the
gap that exists in the training computational
scientists typically receive through formal
education or other shorter courses.
19. How can NSF help
(if you are in the US)
• Research Experiences for
Undergraduates (REU)
• Division of Undergraduate Education
(EHR/DUE)
• Cyberlearning
• NSF Research Traineeship (NRT)
• Support of workshops
• Support undergraduate travel to
conferences
20. Research Experiences for
Undergraduates - Supplements
• Goals: expand student participation in all kinds of research;
attract a diversified pool of talented students into careers in
science and engineering; help ensure that they receive the best
education possible
• REU supplements typically provide support for 1-2 undergrads
to participate in research; large projects can support more
• Mentoring is important; project should develop students'
research skills, involve them in the culture of research in the
discipline, and connect their research experience with their
overall course of study
• Support for undergrads involved in carrying out research should
be included as part of the research proposal rather than as a
post-award supplement, unless it was not foreseeable at the
time of the original proposal
• REU supplement requests are handled by the NSF program
officer for the underlying research grant
21. Research Experiences for
Undergraduates - Sites
• REU Sites host a summer cohort of
undergraduates for a structured research-learning
experience
– Vision: Extend research participation to students who
would otherwise lack such opportunities
• At least 50% from institutions other than the host
• At least 50% from schools with limited STEM research
opportunities
• Outreach to underrepresented groups is a plus
– Implementation: Create empowering cohort
experience that promotes STEM engagement
• Coherent intellectual focus to research topics
• Research mentoring and support
• Professional development, grad school prep
• Cohort building, networking opportunities, social events
22. Education and Human Resources /
Undergraduate Education (DUE)
• DUE goals include:
– Support Curriculum Development
• Stimulate and support research on learning.
• Promote development of exemplary materials and strategies for
education.
• Support model assessment programs and practices.
• Effect broad dissemination of effective pedagogy and materials.
• Enable long-term sustainability of effective activities.
– Prepare the Workforce
• Promote technological, quantitative, and scientific literacy.
• Support an increase in diversity, size, and quality of the next
generation of STEM professionals who enter the workforce with
two- or four-year degrees or who continue their studies in
graduate and professional schools.
• Invest in the nation's future K-12 teacher workforce.
• Fund research to evaluate and improve workforce initiatives.
• http://www.nsf.gov/div/index.jsp?org=DUE
23. Cyberlearning and Future Learning
Technologies NSF 14-526
• Vision:
– New technologies will transform learning opportunities, interests,
and outcomes in all phases of life, making it possible for learning to
be tailored to individuals and groups
– Best technological genres and socio-technical systems designed for
these purposes will be informed by how people learn
– Can make progress in understanding learning, moving toward
predictive computational models of individual and group learning
• Aims:
– Learning how to design and effectively use the learning technologies
of the future (Future Learning Technologies)
– Understanding processes involved in learning when learners can
have experiences that only technology allows (Cyberlearning)
• Every project addresses and connects 3 thrusts:
– Innovation
– Advancing understanding of how people learn in technology-rich
learning environments
– Promoting generalizability and transferability of new genres
24. NSF Research Traineeship
(NRT) NSF 14-548
• To develop bold, new, potentially transformative, and
scalable models for STEM graduate training
• Ensure that graduate students develop the skills,
knowledge, and competencies needed to pursue a range
of STEM careers
• One initial priority research theme - Data-Enabled Science
and Engineering (DESE) - but other crosscutting,
interdisciplinary themes are also allowed, aligned with
national research priorities
• Emphasizes the development of competencies for both
research and research-related careers
• Creation of sustainable programmatic capacity at
institutions is an expected outcome.
• Replaces IGERT
25. Other events
• Grace Hopper Celebration of Women in
Computing
– World’s largest gathering of technical women in
computing
– Place where technical women gather to network, find
or be mentors, create collaborative proposals, and
increase the visibility of women’s contributions to
computing
• Tapia Celebration of Diversity in Computing
– Brings together undergraduate and graduate students,
faculty, researchers, and professionals in computing
from all backgrounds and ethnicities
• Both aim to promote their attendees' work and
increase networking & mentoring
• Both can be inspirational for undergraduates!!
26. Learning and Workforce Development
Workforce as Cyberinfrastructure
[Diagram: three workforce roles arranged around CI]
• CI-focused Cyber Scientists to develop new capabilities
• CI-enabled Area Scientists to exploit new capabilities
• CI-skilled Professional Staff to support new capabilities
27. Conclusions
• Lots of demand for trained staff and users
– Those who have the need are trying to provide training to fill that need
(pull)
• Seems to be lots of demand for educated developers, staff, users
– In general, the burden for filling this need seems to be on the
traditional academic system (push)
– Software Carpentry as an exception?
• In CS, lots of people teaching
– Starting to share experiences and lessons learned
– Moving towards some consensus on what to teach or how to teach it?
• Cyberinfrastructure is a common point where big science, long-tail
science, and education meet; it has a dual role
– Used to train/educate
– Needs to be refreshed by trained/educated developers
• Lots of opportunities exist!
– Form a community
– Decide what needs to be done
– Find the right opportunity