This document summarizes a podcast discussion about how the TEAM Network at Conservation International is using big data analytics to study biodiversity in tropical rainforests. The TEAM Network collects sensor and camera trap data from protected areas worldwide and analyzes the data using Bayesian models on HP Vertica to monitor species populations and detect trends. Their end-to-end system brings field data into a central repository for analysis and shares results through a dashboard. They are working to expand monitoring to more countries and species using cloud deployment and advanced analytics that leverage hardware processing power.
How Big Data Helps Study Tropical Ecosystems
How Big Data Generates New Insights into What’s
Happening in Tropical Ecosystems Worldwide
Transcript of a sponsored discussion on how large-scale monitoring of rainforest, biodiversity
and climate has been enabled and accelerated by cutting-edge, big-data capture, retrieval and
analysis.
Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android.
Sponsor: Hewlett Packard Enterprise
Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm
Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this
ongoing sponsored discussion on IT innovation and how it’s making an impact on
people’s lives.
Our next big-data case study discussion explores how large-scale monitoring of
rainforest biodiversity and climate has been enabled and accelerated by cutting-
edge big-data capture, retrieval, and analysis.
We'll learn how quantitative analysis and modeling are generating new insights
into what’s happening in tropical ecosystems worldwide and we'll hear how such
insights are leading to better ways to attain and verify sustainable development
and preservation methods and techniques.
To learn more about data science and how hosting that data science in the cloud is helping the
study of biodiversity, we're pleased to welcome our guests. We're here with Eric Fegraus. He is
Senior Director of Technology of the TEAM Network at Conservation International in Arlington,
Virginia. Welcome, Eric.
Eric Fegraus: Hi, Dana. It’s great to be here. Thank you.
Gardner: We're glad to have you. We're also here with Jorge Ahumada. He is Executive Director
of the TEAM Network also at Conservation International. Welcome, Jorge.
Jorge Ahumada: Great to be here.
Gardner: Let’s start with the driving trends here. Clearly, knowing what’s going on in
environments in the tropics helps us understand what to do and what not to do. How has that
changed? We spoke about a year ago, Eric. Are there any trends or driving influences that have
made this data gathering more important than ever?
Fegraus: Over this last year, we’ve been able to roll out our analytic systems across the TEAM
Network. We're having more-and-more uptake with our protected-area managers using the
system and we have some good examples where the results are being used.
For example, in Uganda, we noticed that a particular cat species was trending
downwards. The folks there were really curious why this was happening. At first,
they were excited that there was this cat species, which was previously not
known to be there.
This particular forest is a gorilla reserve, and one of the main economic drivers
around the reserve is ecotourism, people paying to go see the gorillas. Once they
saw that these cats were declining, they started asking what could be impacting this. Our system
told them that the way they were bringing in the eco-tourists to see the gorillas had shifted and
that was potentially having an impact on where the cats were. It allowed them to readjust and
think about their practices to bring in the tourists to the gorillas.
Information at work
Gardner: Information at work.
Fegraus: Information at work at the protected-area level.
Gardner: Just to be clear for our audience, the TEAM Network stands for the Tropical Ecology
Assessment and Monitoring. Jorge, tell us a little bit about how that came about, the TEAM
Network and what it encompasses worldwide?
Ahumada: The TEAM Network was a program that started about 12 years ago and it was started
to fill a void in the information we have from tropical forests. Tropical forests cover a little bit
less than 10 percent of the terrestrial area in the world, but they have more than
50 percent of the biodiversity.
So they're the critical places to be conserved from that point of view, despite the
fact we didn’t have any information about what's happening in these places.
That’s how the TEAM Network was born, and the model was to use data
collection methods that were standardized, that were replicated across a number
of sites, and have systems that would store and analyze that data and make it
useful. That was the main motivation.
Gardner: Of course, it’s super-important to be able to collect and retrieve and put that data into
a place where it can be analyzed. It’s also, of course, important then to be able to share that
analysis. Eric, tell us what's been happening lately that has led to the ability for all of those parts
of a data lifecycle to really come to fruition?
Fegraus: Earlier this year, we completed our end-to-end system. We're able to take the data from
the field, from the camera traps, from the climate stations, and bring it into our central repository.
We then push the data into Vertica, which is used for the analytics. Then, we developed a really
nice front-end dashboard that shows the results of species populations in all the protected areas
where we work.
The analytical process also starts to identify what could be impacting the trends that we're seeing
at a per-species level. This dashboard also lets the user look at
the data in a lot of different ways. They can aggregate it and they
can slice and dice it in different ways to look at different trends.
Gardner: Jorge, what sort of technologies are they using for that
slicing and dicing? Are you seeing certain tools like Distributed
R or visualization software and business-intelligence (BI)
packages? What's the common thread or is it varied greatly?
Ahumada: It depends on the analysis, but we're really at the forefront of analytics in terms of
big data. As Michael Stonebraker and other big data thinkers have said, the big-data analytics
infrastructure has concentrated on the storage of big data, but not so much on the analytics. We
break that mold because we're doing very, very sophisticated Bayesian analytics with this data.
One of the problems of working with camera-trap data is that you have to separate the detection
process from the actual trend that you're seeing because you do have a detection process that has
error.
Hierarchical models
We do that with hierarchical models, and it's a fairly complicated model. Just using that kind
of model, a normal computer will take days and months. With the power of Vertica and power of
processing, we’ve been able to shrink that to a few hours. We can run 500 or 600 species from 13
sites, all over the world in 5 hours. So it’s a really good way to use the power of processing.
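The separation Ahumada describes, between whether a species is present at a camera trap and whether it is actually detected, is commonly expressed as an occupancy model. What follows is a minimal sketch of that idea and not the TEAM Network's actual model: the function names, the flat priors, the grid approximation, and the simulated detection histories are all assumptions made for illustration. A real hierarchical model would add covariates, multiple years, and a proper sampler, which is where the long run times come from.

```python
import math
import random


def site_likelihood(history, psi, p):
    """Likelihood of one camera trap's detection history (list of 0/1).

    psi: probability that the species occupies the site (hypothetical name)
    p:   probability of detecting it in one survey, given it is present
    """
    detections = sum(history)
    surveys = len(history)
    # Species present: each survey is an independent detection attempt.
    present = psi * (p ** detections) * ((1 - p) ** (surveys - detections))
    # Species absent: only an all-zero history is possible.
    absent = (1 - psi) if detections == 0 else 0.0
    return present + absent


def log_likelihood(histories, psi, p):
    """Sum of per-site log-likelihoods; sites are treated as independent."""
    return sum(math.log(site_likelihood(h, psi, p)) for h in histories)


def simulate(n_sites, n_surveys, psi, p, rng):
    """Generate made-up detection histories with imperfect detection."""
    histories = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        histories.append(
            [int(occupied and rng.random() < p) for _ in range(n_surveys)]
        )
    return histories


def grid_posterior_mean(histories, steps=40):
    """Posterior means of psi and p under flat priors, on a coarse grid."""
    grid = [(i + 0.5) / steps for i in range(steps)]
    logs = {
        (psi, p): log_likelihood(histories, psi, p)
        for psi in grid
        for p in grid
    }
    top = max(logs.values())  # subtract the maximum to avoid underflow
    weights = {key: math.exp(value - top) for key, value in logs.items()}
    total = sum(weights.values())
    psi_mean = sum(key[0] * w for key, w in weights.items()) / total
    p_mean = sum(key[1] * w for key, w in weights.items()) / total
    return psi_mean, p_mean


if __name__ == "__main__":
    rng = random.Random(1)
    data = simulate(n_sites=200, n_surveys=5, psi=0.6, p=0.3, rng=rng)
    naive = sum(1 for h in data if any(h)) / len(data)
    psi_mean, p_mean = grid_posterior_mean(data)
    print("share of sites with at least one detection:", round(naive, 3))
    print("estimated occupancy after modeling detection:", round(psi_mean, 3))
    print("estimated per-survey detection probability:", round(p_mean, 3))
```

The naive figure counts only sites where a camera fired at least once, so it understates occupancy whenever detection is imperfect. Modeling the detection step explicitly is what lets the estimate recover sites that were occupied but never photographed.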
We’ve also been working more recently with Distributed R, a new package that was written by
HP folks at Vertica, to analyze satellite images, because we're also interested in what’s happening
at these sites in terms of forest loss. Satellite images are really complicated, because you have
millions of pixels and you don’t really know what each pixel is. Is it forest, agricultural land, or a
house? So running that on normal R, it's kind of a problem.
Distributed R is a package that actually takes some of those functions, like random forest and
regression trees, and takes full power of the vertical processing of Vertica. So we’ve seen a 10-fold
increase in performance with that, and it allows us to get much more information out of
those images.
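To make the pixel-labeling problem concrete, here is a rough single-machine sketch of the random forest idea applied to satellite pixels. It is not Distributed R and not the TEAM Network's code. Several things are assumptions for illustration: the two band values per pixel (near-infrared and red reflectance), the numeric ranges, the class names, and the use of one-split trees ("stumps") in place of full decision trees. It shows only the voting principle: many small trees, each trained on a bootstrap sample and a randomly chosen feature, decide a pixel's label by majority.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """One-split tree: pick the threshold on `feature` with fewest errors."""
    values = sorted({row[feature] for row in rows})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        left_label, right_label = majority(left), majority(right)
        errors = sum(lab != left_label for lab in left)
        errors += sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # all sampled pixels share one value: no split possible
        label = majority(labels)
        return (feature, 0.0, label, label)
    return (feature,) + best[1:]


def fit_forest(rows, labels, n_trees, rng):
    """Train stumps on bootstrap samples, one random feature per stump."""
    forest = []
    n_features = len(rows[0])
    for _ in range(n_trees):
        picks = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(n_features)
        forest.append(
            fit_stump([rows[i] for i in picks], [labels[i] for i in picks], feature)
        )
    return forest


def predict(forest, row):
    """Label one pixel by majority vote across all stumps."""
    votes = [
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    ]
    return majority(votes)


def make_training_pixels(rng, n_per_class=40):
    """Made-up labeled pixels as (near_infrared, red) reflectance pairs."""
    rows, labels = [], []
    for _ in range(n_per_class):
        rows.append((rng.uniform(0.40, 0.60), rng.uniform(0.02, 0.08)))
        labels.append("forest")
        rows.append((rng.uniform(0.15, 0.30), rng.uniform(0.10, 0.25)))
        labels.append("cleared")
    return rows, labels


if __name__ == "__main__":
    rng = random.Random(0)
    rows, labels = make_training_pixels(rng)
    forest = fit_forest(rows, labels, n_trees=25, rng=rng)
    for pixel in [(0.55, 0.04), (0.20, 0.20)]:
        print(pixel, "->", predict(forest, pixel))
```

Each stump is trained independently of the others, and each pixel is labeled independently of its neighbors. That independence is why this kind of workload can be spread across many cores or nodes when an image has millions of pixels.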
Gardner: Not only are you on the cutting-edge for the analytics, you've also moved to the
bleeding edge on infrastructure and distribution mechanisms. Eric, tell us a little bit about your
use of cloud and hybrid cloud?
Fegraus: To back up a little bit, we ended up building a system that uses Vertica. It’s an on-
premise solution and that's what we're using in the TEAM Network. We've since realized that
this solution we built for the TEAM Network can also be readily scalable to other organizations
and government agencies, etc., different people that want to manage camera trap data, they want
to do the analytics.
So now, we're at a process where we’ve been essentially doing software development and
producing software that’s scalable. If an organization wants to replicate what we’re doing, we
have a solution that we can spin up in the cloud that has all of the data management, the
analytics, the data transformations and processing, the collection, and all the data quality
controls, all built into a software instance that could be spun up in the cloud.
Gardner: And when you say “in the cloud,” are you talking about a specific public cloud, in a
specific country or all the above, some of the above?
Fegraus: All of the above. Certainly we'll work within HP cloud. We'll be using Vertica or we're
using Vertica OnDemand. We're actually going to transition our existing on-premise solution into
Vertica OnDemand. The solution we’re developing uses mostly open-source software and it can
be replicated in the Amazon cloud or other clouds that have the right environments where we can
get things up and running.
Gardner: Jorge, how important is that to have that global choice for cloud deployment and
attract users and also keep your cost limited?
Ahumada: It’s really key, because in many of these countries, it's very difficult for some of those
governments to expand out their old solutions on the ground. Cloud solutions offer a very good,
effective way to manage data. As Eric was saying, the big limitation here is which cloud
solutions are available in each country. Right now, we have something with cloud OnDemand
here, but in some of the countries, we might not have the same infrastructure. So we'll have to
contract different vendors or whatever.
But it's a way to keep cost down, deliver the information really quick, and store the data in a way
that is safe and secure.
What's next?
Gardner: Eric, now that we have this ability to retrieve, gather, analyze, and now distribute,
what comes next in terms of having these organizations work together? Do we have any
indicators of what the results might be in the field? How can we measure the effectiveness at the
endpoint -- that is to say, in these environments based on what you have been able to accomplish
technically?
Fegraus: One of the nice things about the software that we built that can run in the various cloud
environments, is that it can also be connected. For example, if we start putting these solutions in
a particular continent, and there are countries that are doing this next to each other, they are not
going to be silos. They will be able to share an aggregated level of data with each other so that
we can get a holistic picture of what's happening.
So that was very important when we started going down this process, because one of the big
inhibitors for growth within the environmental sciences is that there are these traditional silos of
data that people in organizations keep and sit on and essentially don't share. That was a very
important driver for us as we were going down this path of building software.
Gardner: Jorge, what comes next in terms of technology? Are the scale issues something you
need to hurdle to get across? Are there analytics issues? What's the next requirements phase that
you would like to work through technically to make this even more impactful?
Ahumada: As we scale up in size and start having more granularity in the countries where we
work, the challenge is going to be keeping these systems responsive and information coming.
Right now, one of the big limitations is the analytics. We do have analytics running at top speeds,
but once we start talking about countries, we're going to have on the order of many more
species and many more protected areas to monitor.
This is something that the industry is starting to move forward on in terms of incorporating more
of the power of the hardware into the analytics, rather than just the storage and the management
of data. We're looking forward to continuing to work with our technology partners, and in particular
HP, to help them guide this process. As a case study, we're very well-positioned for that, because
we already have that challenge.
Gardner: Also it appears to me that you are a harbinger, a bellwether, for the Internet of Things
(IoT). Much of your data is coming from monitoring, sensors, devices, and cameras. It's in the
form of images and raw data. Any thoughts about what others who are thinking about the impact
of the IoT should consider, now that you have been there?
Fegraus: When we talk about big data, we're talking about data collected from phones, cars, and
human devices. Humans are delivering the data. But here we have a different problem. We're
talking about nature delivering the data and we don't have that infrastructure in places like
Uganda, Zimbabwe, or Brazil.
So we have to start by building that infrastructure and we have the camera traps as an example of
that. We need to be able to deploy much more, much larger-scale infrastructure to collect data
and diversify the sensors that we currently have, so that we can gather sound data, image data,
temperature, and environmental data in a much larger scale.
Satellites can only take us some part of the way, because we're always going to have problems
with resolution. So it's really deployment on the ground which is going to be a big limitation, and
it's a big field that is developing now.
Gardner: Drones?
Using drones
Fegraus: Drones, for example, have that capacity, especially small drones that are proving to
be intelligent, to be able to collect a lot of information autonomously. This is at the cutting edge
right now of technological development, and we're excited about it.
Gardner: Well great. I'm afraid we will have to leave it there. We have been learning and
exploring how large-scale monitoring of rainforest, biodiversity and climate has been enabled
and accelerated by cutting-edge, big-data capture, retrieval, and analysis. And we've seen how
quantitative analysis and modeling are generating new insights into what's happening in tropical
ecosystems worldwide.
So a big thanks to our guests. We've been here with Eric Fegraus, Senior Director of Technology
of the TEAM Network at Conservation International. Thank you, Eric.
Fegraus: Thank you, Dana.
Gardner: And we've been joined also by Jorge Ahumada, the Executive Director of the TEAM
Network, also at Conservation International. Thanks so much.
Ahumada: Thank you.
Gardner: And a big thank you to our audience as well for joining us for this big data innovation
case study discussion.
I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of
HP-sponsored discussions. Thanks again for listening, and come back next time.
Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.