'Microsoft Research Infrastructure, the British Library and the Alan Turing Institute' presentation by Dr Kenji Takeda.
Fourth annual BL Labs Symposium, 7 Nov 2016
1. Microsoft Research, the British Library and the Alan Turing Institute
Dr Kenji Takeda (kenji.takeda@microsoft.com)
Microsoft Research
@MSFTResearch
2.
3. British Library Labs: cloud analysis of digital catalogues, including 19th-century books scanned by Microsoft.
Unlocking Humanities Research
@MechCuratorBot
mechanicalcurator.tumblr.com
20. A-series
• 1-16 cores
• 0.75-112 GB RAM
• 20-605 GB HDD
• Up to 40 Gbit/s RDMA network
D-series
• 1-16 cores
• 3.5-112 GB RAM
• Up to 800 GB SSD
F-series
• 16 cores
• 32 GB RAM
• 256 GB SSD
G-series
• 32 cores
• 448 GB RAM
• 6.1 TB SSD
H-series
• 8-16 cores
• 224 GB RAM
• Up to 40 Gbit/s RDMA network
N-series
• 24 cores
• 224 GB RAM
• 4 x K80 GPUs
• Up to 40 Gbit/s RDMA network
21. [Diagram: the breadth of the Azure platform, spanning Platform Services and Infrastructure Services.]
Platform Services: Web Apps, Mobile Apps, API Management, API Apps, Logic Apps, Notification Hubs, Content Delivery Network (CDN), Media Services, BizTalk Services, Hybrid Connections, Service Bus, Storage Queues, SQL Database, DocumentDB, Redis Cache, Azure Search, Storage Tables, Data Warehouse, Azure AD Health Monitoring, AD Privileged Identity Management, Operational Analytics, Cloud Services, Batch, RemoteApp, Service Fabric, Visual Studio, App Insights, Azure SDK, VS Online, Domain Services, HDInsight, Machine Learning, Stream Analytics, Data Factory, Event Hubs, Mobile Engagement, Data Lake, IoT Hub, Data Catalog
Hybrid Operations: Backup, StorSimple, Azure Site Recovery, Import/Export
Security & Management: Azure Active Directory, Multi-Factor Authentication, Automation, Portal, Key Vault, Store/Marketplace, VM Image Gallery & VM Depot, Azure AD B2C, Scheduler
Facial recognition
Provided by Microsoft Cognitive Services’ Computer Vision and Emotion APIs.
Composition analysis
Developed by JoliBrain using DeepDetect. A set of deep neural networks reads the image pixels and extracts a large number of salient features. These features are then fed into a search engine that looks for the nearest per-feature matches from the Tate archive.
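A minimal sketch of that nearest-match step, assuming each image has already been reduced to a feature vector (the vectors and archive identifiers below are made-up toy data, not real Tate records):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "archive": image id -> feature vector produced by some network.
archive = {
    "tate-001": [0.9, 0.1, 0.0],
    "tate-002": [0.1, 0.8, 0.3],
    "tate-003": [0.0, 0.2, 0.9],
}

def nearest_match(query, archive):
    """Return the archive image whose feature vector is closest to the query."""
    return max(archive, key=lambda k: cosine_similarity(query, archive[k]))

print(nearest_match([0.8, 0.2, 0.1], archive))  # tate-001
```

A production system would use an approximate nearest-neighbour index rather than a linear scan, but the matching idea is the same.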
Context analysis
Developed by JoliBrain using DeepDetect and word2vec. A variety of deep neural networks process both the images and their captions and try to find inner relations, based either on location or on semantic matching among words and sentences.
Open Academic Society: https://www.openacademic.ai/
The elegance of the solution is in its simplicity, something that has been lacking in the machine learning space.
The first issue many enterprises face is data ingestion. With the cloud, you can bring in data sources with the ease of a drop-down, or drop your on-premises data set into the built-in storage space. Users can then model in our development environment, Machine Learning Studio, where we offer R, Python and SQLite as first-class citizens in addition to our world-class Microsoft algorithms.
The second issue, and often the primary one, is putting finished work into production in a way others can use: client devices, websites, mobile applications, Excel spreadsheets; anything that supports HTTP requests.
We've heard from many data scientists that they model in R on a Linux stack but then have to hand their work over to developers, who must translate it into another language to actually make it work.
Our system eliminates this time-consuming and unnecessary process: with a click, the model is transformed into a web-service endpoint that can run over any data, anywhere, and connect to any solution or client.
Next, not only can this model be put into production for your company, it can also be made available to the world on our Machine Learning Marketplace. Microsoft hosts your solution and markets it for you, while you have the freedom to brand and monetize it as you see fit.
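As a hedged sketch of what calling such a web-service endpoint looks like from client code: the endpoint URL, API key and payload shape below are all placeholders invented for illustration, but the mechanism is just an HTTP POST with a JSON body. The request is built but not sent:

```python
import json
import urllib.request

# Hypothetical values: substitute your own service URL and API key.
ENDPOINT = "https://example.azureml.net/workspaces/abc/services/xyz/execute"
API_KEY = "YOUR_API_KEY"

def build_scoring_request(rows):
    """Build (but do not send) an HTTP POST request for a scoring endpoint.

    `rows` is a list of feature lists, e.g. [[5.1, 3.5], [6.2, 2.9]].
    The {"Inputs": ...} envelope is an assumed example shape, not a spec.
    """
    body = json.dumps({"Inputs": {"input1": rows}}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + API_KEY,
    }
    return urllib.request.Request(ENDPOINT, data=body, headers=headers,
                                  method="POST")

req = build_scoring_request([[5.1, 3.5]])
print(req.get_method())  # POST
```

Sending the request (e.g. with `urllib.request.urlopen(req)`) would return the model's predictions as JSON, which is what lets websites, mobile apps and spreadsheets consume the same model.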
Azure provides a huge range of virtual machine types, so you can always find one to match your research needs.
A-series provides great value.
Azure Big Compute A8/A9 nodes provide high-performance networking (RDMA) and fast CPUs, for supercomputer-level performance.
Network latency of 2-3 microseconds matches that of most HPC clusters.
Application benchmarking shows performance as good as, or better than, most HPC clusters.
Also available are A10/A11 nodes, with fast CPUs but without high-speed networking, for more affordable compute-intensive tasks
D-series provides fast CPUs and local solid-state hard drives.
There are three types of D-series machines:
Standard D-series
DS-series, designed for fast storage/IO requirements, for use with Azure Premium Storage
DV-series, with the latest Azure-customized Intel E5 v3 (formerly codenamed “Haswell”) processors, for even better CPU performance (35% faster than standard D-series VMs)
G-series provides massive VM capabilities
Up to 32 cores, and a huge 448GB of RAM
Coupled to over 6TB of solid-state disk
Ideal for data-intensive applications, including large databases
N-series is coming in Spring 2016. These have NVIDIA GPUs, providing state-of-the-art high-performance computing, better than most university clusters
Supports up to 24 compute cores per node
With up to 4 K80 GPUs (two physical cards)
Including RDMA high-speed networking
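To make the selection concrete, the series characteristics above can be sketched as a simple lookup. The maximum-spec figures come straight from the slides; the helper function itself is purely illustrative, not an Azure API:

```python
# Illustrative only: per-series maximum specs as listed in the presentation.
VM_SERIES = {
    "A": {"cores": 16, "ram_gb": 112, "rdma": True,  "gpus": 0},
    "D": {"cores": 16, "ram_gb": 112, "rdma": False, "gpus": 0},
    "F": {"cores": 16, "ram_gb": 32,  "rdma": False, "gpus": 0},
    "G": {"cores": 32, "ram_gb": 448, "rdma": False, "gpus": 0},
    "H": {"cores": 16, "ram_gb": 224, "rdma": True,  "gpus": 0},
    "N": {"cores": 24, "ram_gb": 224, "rdma": True,  "gpus": 4},
}

def candidate_series(cores=1, ram_gb=1, need_rdma=False, need_gpu=False):
    """Return the series whose maximum specs cover the stated needs."""
    return sorted(
        name for name, s in VM_SERIES.items()
        if s["cores"] >= cores and s["ram_gb"] >= ram_gb
        and (not need_rdma or s["rdma"])
        and (not need_gpu or s["gpus"] > 0)
    )

print(candidate_series(need_gpu=True))  # ['N']
print(candidate_series(cores=32))       # ['G']
```

So a GPU workload points to N-series, a 32-core shared-memory job to G-series, and an MPI job needing RDMA to A8/A9, H- or N-series.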
Azure has a complete set of options to tackle any research problem
Why this Slide:
It shows we have a very broad platform. It's about BOTH IaaS and PaaS, and that these work together. It shows that we continue to lead in world-class IT capabilities and that there's really nothing missing.
Key Points:
We have already seen how the Azure platform is IaaS + PaaS, but I want you to understand that this is a huge number of capabilities: IT building blocks, if you will.
Every one of these blocks you can provision anytime, self-service, anywhere in the world, 24x7. You pay for what you use, you can get more or less anytime, and you can fully automate everything.
DON'T spend too much time on this slide; you are going to DEMO (aren't you!). DON'T go through each block.
Transition to NEXT Slide: Make the build go backwards to show JUST IaaS and then you will go to the demo to show it.
Please apply for one of our 12 month awards, which give you substantial Azure resources for your project.
CALL TO ACTION:
Apply for an award at azure4research.com (you can also email us at azurerfp@microsoft.com)
Next deadline is 15 December, then every two months thereafter (15 Feb, 15 April, etc.)
Join our LinkedIn group and join the conversation on Twitter.
PROPOSAL GUIDELINES for Q&A
We are particularly interested in researchers wanting to tackle data science, big compute, internet of things, and Azure Machine Learning
Proposals are very short: 3 pages, 1000 words
Format is:
Project summary (100 words). We are specifically looking for opportunities where researchers make use of higher-level Azure services (i.e. not just virtual machines and storage). You can see what is available at https://azure.microsoft.com/en-us/services/
Impact. Will this award be of significant benefit to a community of users, either within a discipline or an organisation? Will it push science forward? It cannot be just benchmarking or a single scientific paper; a scientific service that researchers are asking for is ideal. It should be a single project/theme, not a whole group's activities. List collaborators who you will work with and/or who will use your service. How will it be disseminated beyond academic publications, e.g. as an open-source solution?
Feasibility. Is the project appropriate for Microsoft Azure? Is it a sensible fit for the Cloud, and is it technically sound? Do you have people, with the right experience and time, in place to develop and deploy your application on Azure?
Resources. Does the applicant have enough resources (people/effort) to complete the project successfully? The award is only for Azure resources (compute, storage, bandwidth, etc.), not for people: around $20,000 USD in Azure resources over 12 months. Applicants need to describe proposed usage, e.g. ???? compute hours in 12 months, peak 32 cores; ??? TB per month of data. We can help with this.
Note these awards are only for Azure time, not other costs.
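As a purely illustrative aid to sizing such a request, the arithmetic for a proposed usage profile is simple. The hourly rate below is a made-up placeholder, not an Azure price; check current pricing before submitting:

```python
def azure_usage_estimate(core_hours_per_month, price_per_core_hour, months=12):
    """Rough cost of a proposed usage profile, in USD.

    `price_per_core_hour` is an assumed figure for illustration only.
    """
    return core_hours_per_month * months * price_per_core_hour

# e.g. 10,000 core-hours/month at an assumed $0.15 per core-hour:
total = azure_usage_estimate(10_000, 0.15)
print(f"${total:,.0f} over 12 months")  # $18,000 over 12 months
```

Working backwards like this from the ~$20,000 award envelope helps keep the proposed compute hours and peak core count realistic.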