2. • The HDF Group is providing a hosted JupyterLab environment at
https://hdflab.hdfgroup.org
• Open to anyone (you just need to register with The HDF Group)
• Provides access to HDF Kita Server (aka HSDS) – HDF data on S3
• Comes with sample notebooks, tutorials, datasets
• There is a small subscription fee ($10/month)
• ESIP attendees get a free 90-day trial
HDF Kita Lab
3. • HDF Kita Lab is based on JupyterLab
• JupyterLab is the next-generation web-based interface for running Python notebooks
• Extends the classic IPython environment with:
• Content browser for documents
• Uploading/downloading of files
• Terminal app
• HDF Kita Lab extends JupyterLab:
• Auto-configures Kita Server
• FAQ Page on launcher
• HDF branding
JupyterLab
4. • No messing with Python, package installs, AWS, etc.
• Data ready for you
• Simple means to harness a compute cluster
Simplify your life…
5. • HDF Kita Lab runs on AWS in a Kubernetes cluster
• Cluster can scale to handle different numbers of users
• Each user gets:
• 1 CPU Core (2.5GHz Xeon)
• 8 GB RAM
• 10 GB Disk
• 100 GB S3 Storage
• Access to HDF Kita Server
• Ability to read/write HDF data stored on S3
• User environment configured with Python packages commonly used by HDF users:
• h5py, h5pyd, pandas, h5netcdf, xarray, bokeh, dask
• HDF Kita command-line tools:
• hsinfo, hsls, hsget, hsload, etc.
Features
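As a rough illustration of what the preinstalled h5pyd package allows, the sketch below walks an HDF domain on Kita Server and lists its datasets. It assumes h5pyd mirrors the h5py File/Group interface and that the lab has already configured the server endpoint. The helper names `list_datasets` and `describe_domain`, and whatever domain path is passed in, are hypothetical and chosen only for this example.

```python
def list_datasets(group, prefix=""):
    """Recursively collect (path, shape, dtype) for each dataset under `group`.

    Works on any h5py-style group: objects with a `shape` attribute are
    treated as datasets, objects with `items()` are treated as sub-groups.
    """
    found = []
    for name, obj in group.items():
        path = f"{prefix}/{name}"
        if hasattr(obj, "shape"):
            # Dataset-like object: record its metadata without reading data.
            found.append((path, obj.shape, str(obj.dtype)))
        elif hasattr(obj, "items"):
            # Group-like object: descend into it.
            found.extend(list_datasets(obj, path))
    return found


def describe_domain(domain_path):
    """Open a Kita Server domain read-only and list its datasets.

    Assumption: h5pyd is installed and the server endpoint and credentials
    are already configured, as the lab environment is described as doing.
    """
    import h5pyd  # imported lazily so the helper above has no dependencies

    with h5pyd.File(domain_path, "r") as f:
        return list_datasets(f)
```

Calling `describe_domain` with the path of a domain you can read would return one tuple per dataset. Only metadata is collected here, so no array data is transferred.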
6. • JupyterLab and Kita Server both run as sets of Docker containers
• Kubernetes transparently manages running these containers across multiple machines
Kubernetes Platform
[Diagram: AWS hosting a Kubernetes cluster that runs JupyterHub and HDF Kita Server (HSDS) as containers]
8. • The S3 bucket used for storing HDF data provides unlimited capacity
• Cost-effective ($0.02/GB/month vs $0.10/GB/month for EBS)
• Built-in redundancy – so no danger of losing data via a disk crash
• Kita Server is a turbo-booster for accessing data on S3
• Requests are parallelized
• RAM cache
• Read/write consistency
• Multi-tenant access control
• ACLs for Folders & Files
HDF Data on S3
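To make the price gap concrete, here is a small worked calculation using the two per-GB rates quoted on this slide. The 100 GB size is an arbitrary figure picked for illustration, and the function names are hypothetical. Actual AWS pricing varies by region and storage class, so treat this as a sketch rather than a billing estimate.

```python
# Per-GB monthly prices quoted on the slide (USD).
S3_USD_PER_GB_MONTH = 0.02
EBS_USD_PER_GB_MONTH = 0.10


def monthly_cost(size_gb, usd_per_gb_month):
    """Return the monthly storage cost in USD, rounded to cents."""
    return round(size_gb * usd_per_gb_month, 2)


def compare(size_gb):
    """Compare S3 and EBS monthly costs for the same amount of data."""
    s3 = monthly_cost(size_gb, S3_USD_PER_GB_MONTH)
    ebs = monthly_cost(size_gb, EBS_USD_PER_GB_MONTH)
    return {"s3": s3, "ebs": ebs, "savings": round(ebs - s3, 2)}


if __name__ == "__main__":
    result = compare(100)  # 100 GB: illustrative size only
    print(f"S3:  ${result['s3']:.2f}/month")
    print(f"EBS: ${result['ebs']:.2f}/month")
    print(f"Difference: ${result['savings']:.2f}/month")
```

At the quoted rates EBS costs five times as much as S3 for the same volume of data, whatever the size.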
9. • Each EBS Volume is an island…
• You can’t directly share your EBS data with others in JupyterLab
• HDF content in S3 can be shared with any Kita Lab user
• For each folder or file you can:
• Make it private (no one else can read or write)
• Make it publicly readable (anyone can read)
• Share it with just the users you choose
• Use the hsacl tool to manage permissions
• We’ve seeded the /shared folder with some content to play with:
• NASA NCEP3 dataset (100GB)
• NASA Terra dataset (50GB)
• Daily Stock Market (150MB)
• More coming!
Data Sharing
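The three sharing modes listed above can be modeled in a few lines. This is a toy model for illustration only: it is not the hsacl tool or the Kita Server API, and the class, field, and user names are all hypothetical. It also assumes that only the owner may write, which the slide does not state for the shared case.

```python
from dataclasses import dataclass, field


@dataclass
class SharingPolicy:
    """Toy access policy for one folder or file (hypothetical model)."""

    owner: str
    public_read: bool = False                    # "publicly readable" mode
    shared_with: set = field(default_factory=set)  # "share with just who you want"

    def can_read(self, user):
        # Owner always reads; others need public access or an explicit share.
        return user == self.owner or self.public_read or user in self.shared_with

    def can_write(self, user):
        # Assumption for this sketch: only the owner may write.
        return user == self.owner


# Private: no one but the owner can read or write.
private = SharingPolicy(owner="alice")

# Publicly readable: anyone can read, only the owner can write.
public = SharingPolicy(owner="alice", public_read=True)

# Shared with selected users only.
shared = SharingPolicy(owner="alice", shared_with={"bob"})
```

With these objects, `private.can_read("bob")` is false, `public.can_read("bob")` is true, and `shared.can_read("carol")` is false because only `bob` was granted access.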
10. • Additional samples/data sets/tutorials
• Custom extensions
• File browser for Kita Server content
• HDF Viewer
• Bring in other JupyterLab extensions as they become stable
• Collaboration tools
• GitHub integration
Future Directions