SAS is the world's largest privately held software company and has been doing machine learning for 39 years. It is serious about Hadoop, as demonstrated by its joint R&D with Hadoop vendors and its status as a certified workload engine on YARN. SAS accelerates the analytical life cycle with its tools for data preparation, exploration, modeling, and deployment in Hadoop. It is currently delivering big data analytics solutions for customers like Rogers Media.
- My name is Felix ………. I have been with SAS ANZ for 5 years.
- Responsible for the areas of data management and all things big data related.
At SAS, we are extremely excited about Hadoop and think it truly is a game changer in terms of the types of big data analytics it will enable organisations to do.
We have been investing and focusing on this area for a little while and I want to tell you some of the cool and interesting things we are doing in terms of SAS and Hadoop.
As I only have 25 minutes, which is not a lot of time, I want to focus on telling you all 5 things that you didn't know about SAS and what SAS is doing with Hadoop.
If you all walk away thinking "that was cool, I didn't know SAS did that with Hadoop", then I will be a very happy man.
Let’s see how well I go.
For the next 25 minutes, I want to tell you 5 things about SAS and Hadoop that perhaps you did not know.
Firstly, hands up: who here is familiar with who SAS is and what we do?
I had a funny suspicion you are not all that interested in just SAS, so…
Let's start with a simple one about SAS itself.
Great one to remember for pub quiz and trivia
We have been for quite some time. Something I didn’t know when I joined SAS.
We have been in Australia and New Zealand for a little while.
We have products that are rated by analysts as leaders in a number of product categories. How have we done that?
The combination of that R&D focus and a broad product portfolio has been instrumental as we build new, innovative solutions around Hadoop, which we will get to. We have focused a lot of that R&D and engineering effort on better integration with Hadoop.
What you also might not be aware of is that we are a very R&D- and engineering-focused organisation.
Compare that with the 12-15% industry average.
Continuous growth resulted in yearly revenue of $3B last year.
We are proud of the fact that we have been included in the top 50 best places to work list for the last 5 years.
I think we have some of the smartest people in the industry and continue to attract top talent.
We do have a number of openings so come and speak to me afterwards if you are interested.
Whilst SAS has evolved over the years, and our $3B revenue comes from areas such as data management, advanced analytics, reporting, and industry solutions, analytics and data mining (or machine learning, to use the more recent buzzword) has been the focus from day one and continues to be at the centre of everything we do.
Advanced analytics is really a multi-discipline area that has evolved over the years to include machine learning, and SAS has been providing solutions from day one. We have been a pioneer and continue to innovate in this space, whether in supervised learning (including regression) or unsupervised learning (clustering and segmentation). We have been, and continue to be, the 800-pound gorilla.
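To make that supervised/unsupervised distinction concrete, here is a minimal, purely illustrative sketch in plain Python (not SAS code; the two helper functions are toy examples I made up for this talk note): supervised learning fits a model to labelled (x, y) pairs, while unsupervised learning finds structure in unlabelled points.

```python
# Illustrative contrast between supervised and unsupervised learning,
# using only plain Python. Toy sketch, not production code.

def fit_slope(xs, ys):
    """Supervised: learn w minimising squared error of y ~ w * x.
    Closed-form least squares for a no-intercept line: w = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def two_means_1d(points, iters=10):
    """Unsupervised: split unlabelled 1-D points into two clusters (k-means, k=2)."""
    a, b = min(points), max(points)  # initial centroids at the extremes
    for _ in range(iters):
        # assign each point to its nearest centroid, then recompute centroids
        ca = [p for p in points if abs(p - a) <= abs(p - b)]
        cb = [p for p in points if abs(p - a) > abs(p - b)]
        a = sum(ca) / len(ca)
        b = sum(cb) / len(cb)
    return sorted([a, b])

# Supervised: labelled examples (x, y) where the hidden rule is y = 2x
w = fit_slope([1, 2, 3, 4], [2, 4, 6, 8])  # learns w = 2.0

# Unsupervised: no labels, just structure in the data (two obvious groups)
centres = two_means_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5])  # centres near 1 and 9
```

The point of the contrast: regression-style techniques need a labelled target, while clustering and segmentation discover the groups on their own.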
Machine learning is the most common big data application, with poster children such as Netflix.
Applying it to big data with modern architectures and technologies.
For more information on machine learning, see the SAS.com page on machine learning and the SAS Global Forum paper by Patrick Hall (SAS R&D):
http://www.sas.com/en_us/insights/analytics/machine-learning.html
http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf
What many SAS customers recognize as machine learning sits, I would claim, within the intersection of data mining and machine learning, which also draws tools from many fields. It's a very rich area for data analysis algorithms.
38% of Advanced Analytics Market Share in 2013
Just to show you what I mean, I have taken a snapshot of the algorithms and techniques we support through our various solutions.
What SAS brings is both breadth, in terms of the sheer number of algorithms (and we have a lot of those: decision trees, neural networks), and depth, in terms of supporting the end-to-end deployment process, which includes things like sampling techniques and analytical model management.
These are just the data mining algorithms.
I should add that what I am showing here is a subset covering only the data mining capabilities; it does not include things like econometrics, forecasting, and optimisation algorithms.
If I have not made my point clear yet: at SAS, WE DO ANALYTICS AND MACHINE LEARNING!! And we are proud of it.
What we have done and will continue to do is enable all of our analytical capabilities to run on Hadoop.
Breadth in terms of algorithms (regression, decision trees, neural networks).
Depth in terms of everything from sampling techniques to the deployment and management of analytical models.
From standard linear regression modelling to neural networks; from sampling techniques to model management.
The most comprehensive analytical capabilities on the market.
Whether it be supervised learning (regression) or unsupervised learning (clustering and segmentation).
We have extended a lot of our analytical models and procedures to run within a Hadoop environment: High-Performance Analytics, or the HP procedures.
Completely in-memory, completely parallel, and completely within a Hadoop cluster.
Speaking of Hadoop, point number 3 relates to our commitment to, and belief in, the Hadoop ecosystem.
We are deadly serious about Hadoop. Internally at SAS, it is one of the most important initiatives of recent years.
I guess I wouldn't be speaking to you all here today if we weren't.
3 points, around Hadoop as a powerful catalyst: we want to bring SAS to Hadoop, but we also want to work with the community. We don't merely think of Hadoop as another data source; we view it as a powerful platform to run all of SAS.
1, definitely not just another data source …..
We are modernising all of the analytical functions and applications, building new architectures, and bringing them across to the world of Hadoop.
Continued R&D focus in this space.
We are committed and doing all this because customers are asking us to, but we also see tremendous benefit in leveraging more of the Hadoop technology.
What customers want from us is loud and clear: organisations recognise our strength and heritage in analytics and want us to bring proven, mature applications to Hadoop. That's what we have been working on.
By committed, I don't mean we will be contributing open source to the Hadoop community (not as yet); I mean committed to building and enabling new applications that run natively on Hadoop.
Partnerships with all the leading vendors, and a very close co-development relationship with Hortonworks.
Recently we have taken the next step in terms of working with the Hadoop community.
One of our most recent initiatives is becoming a founding member of the Open Data Platform initiative, of which Hortonworks is another key founding member (it also includes Pivotal, IBM, Teradata,
For those of you not familiar with the Open Data Platform, it is a relatively new initiative led by Hortonworks and Pivotal to create a common core platform (HDFS, YARN, MapReduce) for Hadoop across multiple vendors and distributions. As well as Pivotal, you also have the likes of IBM and Teradata.
We are involved because, as an application vendor on Hadoop, we believe in what the ODP is trying to do and the benefits it will bring our customers in the long term. We want to drive deep, robust integration into the heart of Hadoop, and being part of the ODP will help us do that.
So what kinds of things are we looking at doing with the other members of the ODP alliance?
As a major application provider on Hadoop, we see the challenges organisations face as they deploy Hadoop applications into production.
Standardising the core components and working closely with Hortonworks will mean more robust products and accelerated product release cycles.
“Hadoop and the ecosystem around it have been built on new ways to attack big problems. SAS remains committed to innovation in big data analytics and to providing high-quality software that our customers can count on. SAS’ participation in the Open Data Platform Alliance aligns with these commitments, and will benefit the increasing number of organizations – and SAS customers – that are turning to Hadoop to store and process big data. With SAS software managing and analyzing data from Hadoop, our customers can solve their most pressing challenges – better interacting with their customers, fighting fraud, managing risk, improving product quality and more.”
Early days
Speaking of integration into the heart of the Hadoop ecosystem, I want to talk about fact number 4: SAS is a certified workload engine on top of Hadoop.
So what kinds of things are we looking at doing with the other members of the ODP alliance? Deep, robust integration into the heart of Hadoop is a top priority.
Which leads us to fact number 4 that you might not be aware of.
So what has been the output of our commitment in terms of product and technology, I hear you ask?
Modernised SAS …..
Machine learning is the ultimate holy grail of big data applications, with poster children such as Netflix.
Whilst there is a raft of emerging new technologies…
SAS has been doing this for 39 years.
Whether it be supervised learning, unsupervised learning or semi-supervised learning
Deep learning
Model factory
Applying it to big data with modern architectures and technologies.
One of the advantages of being a leader is that you can innovate.
Start with a simple one
This is a big deal for us and, more importantly, a big deal for organisations looking at using Hadoop as their big data platform.
Data Locality
For those of you who are just learning about YARN.
Think of it as the resource management layer of an operating system, where the operating system in this case is Hadoop.
It helps organisations manage workloads in Hadoop, and also helps them maximise the investment they have made in their Hadoop cluster.
We have built deep integration with Hadoop by taking advantage of data locality.
We have built applications on top of Hadoop to accelerate the analytical life cycle.
Making it easier and more cost effective for organizations to drive insights out of Hadoop. We are doing that by building Hadoop-powered applications that drive the end-to-end analytical life cycle.
The complete analytical life cycle is important to understand, as this is the reality most companies face:
- Data needs to be prepared specifically for analytics (a crucial step); then it needs to be explored in a highly efficient environment purpose-built for interactive visualization; then it needs to be modeled in a purpose-built advanced analytics environment. Finally, many times the final scoring can happen where the bulk of the data resides: in Hadoop.
Through it all, key metadata acts as the glue, ensuring proper governance of the processes and data and tracking lineage and impact analysis, so that users know what may result from a change at any point in the cycle.
An easy-to-use interface that allows users to do some of this work themselves.
Behind the scenes, we generate code that runs natively within Hadoop, taking advantage of the massively scalable framework.
From a data discovery and visualisation perspective, our Visual Analytics offering leverages an in-memory SAS server that runs within the Hadoop cluster on the YARN framework.
By taking an in-memory approach and bypassing MapReduce, we can make the data discovery process much more interactive, eliminating the batch latency of MapReduce-based workloads.
Taking advantage of the same in-memory architecture on Hadoop.
In-Memory Statistics is targeted more towards the programmer or hardcore data scientist who wants to do data manipulation and model development within a Hadoop environment.
By interacting with Hadoop data that has been loaded and persisted in memory, we again allow data scientists to be much more productive, through a low-latency programming environment against Hadoop data.
Point number 5.
We are making big data analytics on Hadoop a reality today, across different industries.
### IAG
High-performance analytics solutions, accelerating the model development process (17 hours to 1 minute).
Doing risk modelling, IAG saw the time it took to analyse 20 million records against 186 variables (a wide table) reduced from 17 hours to just one minute. This means that where actuaries and modellers were previously restricted to a cycle of one model a week, they can look toward cycling many models each day.
### Macy’s
The initial objective: stop the "one size fits all" email marketing approach, resulting in a 20% reduction in subscription churn. This led to generating more accurate, real-time decisions about customer preferences. The ability to gain customer insight across channels is a critical part of improving customer satisfaction and revenue, and Macys.com uses SAS to validate and guide the site's cross-sell and up-sell offer algorithms.
20% reduction in churn
$500,000 annual savings
Customer lifetime value analysis
More accurate response prediction
Optimized promotions
Finally, but definitely not least, I want to talk about Rogers.
The poster child for everything we have just talked about.
The ultimate goal was to present the most appropriate advertising to a given visiting customer on Rogers' web site.
Traits are characteristics or parameters of each visit: for example, the time of the visit, the number of clicks, the browser, and the device used (iPad, Samsung, etc.). The 600 traits used in the final model were actually derived from a list of 75,000 original traits.
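As a hypothetical illustration of how a wide trait list can be trimmed down to the strongest predictors (my own toy sketch in plain Python, not Rogers' or SAS's actual method), one simple approach ranks each trait by the absolute correlation of its values with the target and keeps the top k:

```python
# Toy trait-reduction sketch: rank traits by |Pearson correlation| with the
# target and keep the k strongest. Illustrative only; all names are made up.

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length value lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def top_traits(traits, target, k):
    """traits: {name: [value per visit]}; keep the k most correlated names."""
    ranked = sorted(traits,
                    key=lambda t: abs(pearson(traits[t], target)),
                    reverse=True)
    return ranked[:k]

visits_clicked_ad = [1, 0, 1, 1, 0]        # target: did the visitor click?
traits = {
    "clicks":        [9, 1, 8, 7, 2],      # strongly related to the target
    "hour_of_visit": [10, 11, 9, 10, 11],  # weakly related
}
best = top_traits(traits, visits_clicked_ad, k=1)  # -> ['clicks']
```

In practice a 75,000-to-600 reduction would involve far more sophisticated variable selection and derived traits, but the principle of scoring and ranking candidate inputs against the target is the same.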
Hortonworks YouTube channel.
We have the resources and the commitment, we have the technology, and we are making it real on Hadoop.
We are doing this by working closely with the Hadoop community and technology ecosystem, which we recognise as being extremely important.
Where do you find out more? A good whitepaper to get you started:
Nice and easy to remember.
There is a lot to what SAS is doing with Hadoop.
My contact details.
I will be sticking around, along with colleagues in SAS shirts.
Enjoy the rest of your day here at the conference.