Bacalhau: Stable Diffusion on a GPU

Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Publicité
Chargement dans…3
×

Consultez-les par la suite

1 sur 35 Publicité

Plus De Contenu Connexe

Similaire à Bacalhau: Stable Diffusion on a GPU (20)

Plus récents (20)

Publicité

Bacalhau: Stable Diffusion on a GPU

  1. AI-generated NFTs on FVM with Bacalhau: Stable Diffusion. Ally Haire, Developer Relations Engineer, @DeveloperAlly
  2. Building a Text-to-Image Model (Stable Diffusion on GPU) with Bacalhau. Ally Haire, Developer Relations Engineer, @DeveloperAlly
  3. Agenda aka the timestamps… ● Let’s see what we’re building in action! ● Bacal… what? The whys and hows of Bacalhau ● Brief intro to machine learning and Stable Diffusion (a text-to-image model) ● Show me the code! Coding up a text-to-image script ● Running on Bacalhau
  4. Stable Diffusion on GPU with Bacalhau: Making an Open Source Dall-E!
  5. The example: https://docs.bacalhau.org/examples/model-inference/stable-diffusion-gpu/
  6. The example in action, running our open source Dall-E: all anyone needs to do to run this example, at any time, is install Bacalhau (one line of code) and run this Docker image! The slide annotates the parts of the command: the Bacalhau CLI Docker command, the Docker image saved in the Bacalhau registry, the Python script Docker runs, the output folder to save to, and the text-input flag passed to the Python script.
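Concretely, the two steps look roughly like this; the image name, tag and script flags below follow the docs example linked on slide 5 and may have changed since, so treat them as illustrative rather than exact:

```shell
# 1. Install the Bacalhau CLI (the one line of code)
curl -sL https://get.bacalhau.org/install.sh | bash

# 2. Run the pre-built Docker image on the network with a GPU attached;
#    --o is the output folder, --p the text prompt
bacalhau docker run --gpu 1 \
  ghcr.io/bacalhau-project/examples/stable-diffusion-gpu:0.0.1 -- \
  python main.py --o ./outputs --p "cod swimming through data"
```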
  7. So many cool cod (pun intended)…
  8. Why Bacalhau? A Primer on Building Machine Learning Models
  9. How do we make this? That one time… when I asked ChatGPT what I needed to get started with ML models. FYI: ChatGPT is a large language model trained with reinforcement learning. Ah, the paradox of asking ML for ML help ;)
  10. Can I do this locally or in the cloud, though? You could run this example locally. In fact, I did manage to get it running on my Mac M1 with a few code changes; however, computing the image from text took a good 20 minutes or more. You could try to run it in the cloud with more computing power, though I wasn’t able to find anywhere that offered a free-tier GPU for it (and I wasn’t willing to pay to try something!).
  11. Bacalhau Architecture: the decentralised computation network
  12. Bacal… what?? Bacalhau is a network of open compute resources available to serve any data processing workload. - It’s simple to use (you don’t need an AI degree!) - It requires minimal operational overhead or setup - It’s built on decentralised-first (or edge-first) principles - It aims to provide efficient distributed computation with batched tasks. Learn more about Bacalhau! @BacalhauProject https://youtu.be/RZopDyTJ1pk
  13. Bacalhau & FVM? FVM: programmable data on small amounts of state. Bacalhau: computation over this or any data, including big data, with support for GPUs. Future: Bacalhau + FVM, calling Bacalhau from your smart contracts!
  14. Bacalhau Platform Architecture. Bacalhau provides a platform for public, transparent, and optionally verifiable computation. It enables users to run arbitrary Docker containers and WebAssembly (wasm) images as tasks against data stored in the InterPlanetary File System (IPFS). It operates as a peer-to-peer network of nodes where each node has both a requestor and a compute component.
  15. Bacalhau System Components ● Requester node (component) ● Compute node (component) ● Transport layer (interface) ● Executor (interface) ● Storage Provider (interface) ● Verifier (interface) ● Publisher (interface)
  16. Bacalhau Job Lifecycle: Job Submission → Job Acceptance → Job Execution → Job Verification → Job Publishing
  17. Err… Stable Diffusion? No, you don’t need to be a data scientist!
  18. AI and Machine Learning: a quick intro. Artificial intelligence is an umbrella term for a few different concepts: AI is any technique that allows computers to bring meaning to data in ways similar to a human. Machine learning is a subset of AI that learns by itself, and it has 3 main types: • Supervised learning (regression & classification algorithms) • Unsupervised learning (clustering & association algorithms) • Reinforcement learning (value-, policy-, or model-based reinforcement methods). Deep learning is a subset of machine learning that teaches itself to perform a specific task.
  19. Stable what now…? Generically, diffusion is what happens when you put a couple of drops of dye into a bucket of water: given time, the dye randomly disperses and eventually settles into a uniform distribution that colours all the water evenly. In computer science, you define the rules your (dye) particles follow and the medium this takes place in. In the example we’re doing, Stable Diffusion is a machine learning model used for text-to-image processing (like Dall-E), based on a diffusion probabilistic model that uses a transformer to generate images from text.
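The dye analogy maps directly onto the "forward" half of a diffusion model: keep blending a signal with small amounts of Gaussian noise and eventually only noise remains. A minimal numpy sketch (the 8x8 "image" and the beta value are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_step(x, beta):
    # One forward-diffusion step: shrink the signal slightly and blend
    # in Gaussian noise, so the overall variance stays close to 1.
    noise = rng.standard_normal(x.shape)
    return np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise

# A structured 8x8 "image": a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0

x = img
for _ in range(200):  # many small noising steps
    x = diffusion_step(x, beta=0.05)

# By now the square is gone: x is indistinguishable from pure noise.
# A diffusion model is trained to run this process in reverse,
# denoising step by step, guided (in Stable Diffusion) by the text prompt.
```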
  20. Building the Scripts: Stable Diffusion on GPU with Google Colab
  21. Tools & Environment. For this example we’ll need: - Google Colab (for testing our scripts): https://colab.research.google.com/ - Optional: Docker (if you want to deploy your own Docker image). You can run any of our docs examples in Google Colab!
  22. Get & Start a Colab. You’ll need to add Colab from the Google Marketplace if you want to create your own notebooks (you can run our docs examples without this, though!)
  23. Create a new notebook. Go to https://colab.research.google.com/
  24. Google Colab setup. We’re running on a GPU for this example, so we want to set our (VM) runtime environment to GPU.
  25. Show me the code! Install some of our Python dependencies: a fork of a keras/tensorflow implementation of Stable Diffusion (the text-to-image library), drivers for NVIDIA GPUs, a library for progress bars, the tensorflow library and add-ons, and a unicode fixer.
  26. text2image.py. This is the basic text-to-image script. It uses a keras/tensorflow implementation fork, generates the images from a given text string, and finally displays the generated image. The ML weights are pre-calculated in the library.
  27. We can do better… stable-diffusion.py. This script adds input parameters to our text2image script and saves the output images to a file.
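The parameter-handling part of a script like stable-diffusion.py can be sketched with argparse. The flag names --p and --o mirror the labels on slide 6, but the defaults and structure here are assumptions, and the actual model call is elided since it needs a GPU and the pre-trained weights:

```python
import argparse

def parse_args(argv=None):
    # --p: the text prompt; --o: where to save generated images.
    # (Flag names follow the slide; defaults are made up for illustration.)
    parser = argparse.ArgumentParser(description="Stable Diffusion text-to-image")
    parser.add_argument("--p", default="cod swimming through data",
                        help="text prompt to render")
    parser.add_argument("--o", default="outputs",
                        help="directory to save the generated image into")
    return parser.parse_args(argv)

args = parse_args([])  # empty argv: fall back to the defaults
# A real script would now build the keras/tensorflow Stable Diffusion
# generator from the forked library, call its generate method with args.p,
# and write the resulting image(s) under args.o. The exact class and
# method names vary between forks, so they are omitted here.
```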
  28. Docker build: the Dockerfile, then build & push the Docker image.
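Slide 28's Dockerfile isn't reproduced in this transcript; a minimal sketch of what it plausibly contains (the base image tag, file names and paths are assumptions, not the exact Dockerfile from the docs):

```dockerfile
# Illustrative sketch only - base image, paths and script name are assumptions.
FROM tensorflow/tensorflow:2.10.0-gpu

WORKDIR /app

# Same dependencies as the Colab install step
RUN pip install --no-cache-dir \
    git+https://github.com/fchollet/stable-diffusion-tensorflow \
    tensorflow_addons ftfy tqdm

COPY stable-diffusion.py /app/

ENTRYPOINT ["python", "stable-diffusion.py"]
```

Build and push would then be the usual `docker build -t <youruser>/stable-diffusion-gpu .` followed by `docker push <youruser>/stable-diffusion-gpu` (registry and tag are placeholders).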
  29. Build with Bacalhau: Stable Diffusion on GPU
  30. Run on Bacalhau
  31. Pizza for everyone!
  32. Join the discussion: - Twitter @BacalhauProject - YouTube @bacalhauproject - Slack #bacalhau @filecoinproject - GitHub @bacalhau.org - Forum github.com/filecoin-project/bacalhau/discussions. See more examples: docs.bacalhau.org. Get involved in the future of data!
  33. Alan Kay, computer scientist: “The best way to predict the future is to create it”
  34. The Future of Filecoin is Computable
  35. The Future of Filecoin is You.

Editor's notes

  • G’day Devs and Fil-ders!!
    I’m Ally and I’m a Developer Relations Engineer working with the Filecoin Foundation and Protocol Labs.
    Today I want to introduce you to a project we are very excited about and that will hopefully help democratise the future of data processing: Bacalhau.
    I’m going to show you a really cool example of how to build your own text-to-image code and then run it on Bacalhau, which, for those that haven’t heard of it, is not just a Portuguese fish, but a peer-to-peer open computation network!
  • Warning from https://docs.oakhost.net/tutorials/tensorflow-apple-silicon/
    Caveat on running the first example: it may not work on your machine, which is exactly one of the reasons we have Bacalhau for large data!! Options are… (Colab notebook, paid cloud environment for testing)
  • So here are the timestamps for those that want to go directly to what they’re interested in.

    First we’ll see this fully built model in action on Bacalhau, then I’ll chat a little bit about what Bacalhau is, how it works and what advantages it can offer you.

    I’ll then give a brief breakdown of what a Stable Diffusion model is and how it fits into the machine learning world.

    And then I’ll move on to walking through how you can create this example end-to-end and run it on the Bacalhau network.


  • You can find the example I’m going through today in the Bacalhau docs, along with a host of other awesome examples you can try out for yourself.

    In this video we’re going to be building, testing and running machine learning code with a machine learning model called Stable Diffusion (more on that later), which will take any text you provide it and transform that text into a funky and original image. Pretty cool, right! I was excited to do this video and see how it works.

  • So, before we get started, let’s take a sneak peek at what our final example looks like, for all the visual learners out there :)

    And to clarify before we start: you don’t need any prior knowledge of Bacalhau or data science, or any special developer environment or hardware, to join in with me here either!

    (install Bacalhau then run the Docker image on Bacalhau in Google Colab - explain it later though)

    We will be using Python, though I’ll walk you through everything the code does, so if you have any sort of coding background, you’ll be more than capable of doing this yourself!
    The main point here is: “If you can write Python, Go, JavaScript, R or code in any language, and want to use ANY type of data, then Bacalhau is for you.”

    And… even if you don’t… if you can open the terminal on your computer and copy-paste 2 lines of code into it - one to install Bacalhau and the other to run this example that’s already been ‘uploaded’ to the network (the image should be loaded by now) - well, then you can go ahead and use this on your machine without even worrying about the rest of this tutorial! Though I hope you stick around to learn how to make more fun images like this one, and perhaps get some inspiration for how you’d go about building your own data projects on Bacalhau (and show them off to us!!)


  • Building and testing machine learning models can be a tricky business, mostly because of the compute power you need to train and run them.
    Like most development, you need a few things to get started…

    If you’ve never built one before - this is the section for you!
  • - like knowing a programming language for writing and running your machine learning code, and setting up a good developer environment for the task you’re looking to do.

    In fact, when I asked fellow ML model ChatGPT what I needed to get started with machine learning, it told me I needed:
    A programming language for writing and running your machine learning code (OK, I know some Python - check!)
    A machine learning framework or library that provides pre-built algorithms, tools, and other resources for building and training machine learning models. (Technically it’s not ESSENTIAL - you could build your own model implementation, but hey, that’s hard and would be like building your own sort function, and luckily it’s not necessary: there are several open source libraries out there like Tensorflow, which we’ll use here, as well as PyTorch and scikit-learn, which you could also play around with - we’d love to see examples with these to include in a community cookbook if you do make one!)
    A data management and analysis tool for managing and working with the data that you will use to train and evaluate your machine learning model. This includes spreadsheet programs like Microsoft Excel and specialised data analysis tools like Pandas or NumPy. (In this case, as we are just using a tensorflow implementation, the model has been pre-trained for us - so we won’t need to do any management or analysis of data to create it)
    (FYI - check out the landscape section in our docs for a comparison of the compute landscape too!)
    A development environment or integrated development environment (IDE) for writing and managing your machine learning code. This could be a simple text editor like VS Code, or a more advanced IDE like PyCharm or Jupyter Notebook. (I’m going to use Google Colab here alongside the VS Code editor, as unfortunately Google Colab does not support Docker images - which we’ll need after testing our Python code)
    AND finally…
    A computing platform or cloud service for running your machine learning code and training your model. This could be your local machine, a dedicated server, or a cloud computing platform.

    And this last one is where things get complicated, even if you are familiar with all the other items on this list, because machine learning models chew up A LOT of computing power and can take a very long time to run - and if you thought compiling a large codebase or waiting for an ethereum transaction to be processed in a block was time consuming… well… machine learning model processing is what ping-pong tables in the office were really made for.

  • Using your local machine for small examples is possible - in fact, I did manage to get this particular example working on my (very unhappy about it) Mac M1. However, once you start doing bigger data processing, you are going to need more gas (eth analogy intended), and if you don’t have a dedicated server lying around the house, you’re going to need a virtual machine on a cloud computing platform. Not only is that inefficient - due to the data being an unknown distance from the computation machine - it can also get costly fast.

    Luckily, these problems are some of the issues Bacalhau is trying to solve. Making data processing and computation open and available to everyone, and speeding up processing times, is possible in Bacalhau: firstly by using batch processing across multiple nodes, and secondly by putting the processing nodes where the data lives.
  • As I mentioned a bit earlier, Bacalhau is a decentralised computation network which provides a platform for public, transparent and optionally verifiable computation.

    It was originally conceived to bring useful compute resources to data stored on the IPFS & Filecoin network - bringing the same benefits of open collaboration enjoyed by datasets stored in IPFS and Filecoin to generic compute tasks.

    I recommend this video by project lead David Aronchick if you want to hear more - check it out on the BacalhauProject YouTube.

  • And yes - for those of you following the Filecoin starmap - it will go hand-in-hand with the Filecoin Virtual Machine, Filecoin’s EVM-compatible layer one. While FVM can offer programmable data on small amounts of state, like most on-chain computation, Bacalhau provides you with compute over that data or any data - including big data, with support for GPUs. In the not-too-distant future you should even be able to leverage it by calling Bacalhau in your smart contracts, giving you the ability to interact directly with data stored on the Filecoin blockchain - a big win for developer experience and users! If you’re interested in this, keep an eye on Project Frog, a POC the team is working on now.


    https://pl-strflt.notion.site/Project-Frog-FVM-Stable-Diffusion-Demo-6cb6c2f5c5614394a5468a5253b6c812 (need QR)
  • So, how does Bacalhau work?

    As I mentioned, Bacalhau is a peer-to-peer network of nodes that enables users to run Docker containers or WebAssembly images as tasks against data stored in IPFS (the InterPlanetary File System), providing a platform for public, transparent, and optionally verifiable computation - known as Compute Over Data, or COD for short. Fun fact: that’s where Bacalhau’s name comes from, as bacalhau is Portuguese for cod.

  • Each node in the Bacalhau network has both a requestor and a compute component. To interact with the cluster, Bacalhau CLI requests are sent to a node in the cluster (as JSON over HTTP), which then broadcasts messages over the transport layer to the other nodes in the cluster. All other nodes in the network are connected to the transport layer and so have a shared view of the world.

    Architecture
    Transport layer (interface)
    Requester node (component)
    Compute node (component)
    Executor (interface)
    Storage Provider (interface)
    Verifier (interface)
    Publisher (interface)

  • This means that when a job is submitted to Bacalhau, it is forwarded to a Bacalhau cluster node which acts as the requestor node.
    This requestor node broadcasts the job to the other nodes in the peer-to-peer network, which can bid on the job - creating a job deal market.

    This job deal also has a concurrency flag - meaning you can set the number of nodes you want to perform the job concurrently. The job also includes a confidence property, which defines how many verification proposals must agree for the job to be deemed successful, and a min-bids property, which defines the number of bids that must have been made before choosing to accept any.

    Depending on the flags given to the requestor node (which can include concurrency, confidence, minimum bids before acceptance, reputation, locality, cost, hardware resources and even volumes such as IPFS CIDs), the requestor node accepts one or more matching job bids, and the accepted bids are then executed by the relevant compute nodes using the storage providers that each executor node has mapped in - for example the Docker executor and IPFS storage volumes.

    Once the job is complete, a verification is generated which, if accepted, leads to the raw results folder being published by the compute node (the default publisher is Estuary). There is a lot more flexibility to this process, but the main thing to understand is that Bacalhau gives you, the user, the ability to execute a job where the data is already hosted, across a decentralised network of servers that store data - saving you time, money and operational overhead, and providing referenceable, reproducible jobs that are easy to manage and maintain.
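A sketch of how those deal properties surface as CLI flags (flag names are taken from the description above; confirm against `bacalhau docker run --help`, since the CLI evolves):

```shell
# Run on 3 nodes at once, wait for at least 5 bids before accepting any,
# and require 2 agreeing verification proposals for success.
bacalhau docker run \
  --concurrency 3 --min-bids 5 --confidence 2 \
  ubuntu:latest -- echo "hello from the network"
```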

    Phew, now that we understand what’s going on under the hood, let’s take a quick look at what stable diffusion is before we dive into the code here!

  • Essentially, machine learning is a subset of AI focused on having computers provide insights into problems without explicitly programming them.

    There are three main types of machine learning: supervised learning, unsupervised learning, and reinforcement learning.

    Deep learning, which is the category Stable Diffusion falls under, is a subset of machine learning that teaches itself to perform a specific task - in this case converting a text input to an image output.


  • And Stable Diffusion is the particular model currently used for this text-to-image processing (similar to what Dall-E does).
    It is based on a diffusion probabilistic model that uses a transformer to generate images from text. In this example we’ll be using a pre-trained model in tensorflow - Google’s open source machine learning library.

    Now, you don’t really need to worry about the ins and outs of how Stable Diffusion works unless, like me, you’re curious - and if so, I encourage you to dig in further; there are lots of resources around to explain it!

    All you really need to know here is that you can create your own text-to-image processor to run on the Bacalhau network, and you don’t need a data science degree or any special skills to do it. In fact, I’m hoping you’ll be inspired to make your own models and projects with it!
    This example aims to show how easy it is to use Stable Diffusion on a GPU with the Bacalhau network.
    So let’s get on to it! Show me the code!!

  • Yay the coding part!
  • I’ll be using Google Colab to go through this example. For those that may not have come across it before, Google Colaboratory allows you to write, execute and share computation files - like our Python and bash scripts - and runs in your browser by executing the code on a private virtual machine which you can configure. It’s based on the open source Jupyter notebook, which is used extensively in data science fields, and stores any notebooks you make in your Google Drive (or you can load them from GitHub). And its free tier works great for us here!

    By the way, you can run any of the examples in the Bacalhau docs in Google Colab too!

    [demo on setup]
  • If you want to follow along with me here - go ahead and set up Google Colab for yourself.
    Alternatively, you can just open the shared Colab from the docs site without needing to install anything.
  • This is the first screen you’ll see if you open the Colab URL.
    You can create a new notebook for this example there.
  • Since this example uses a GPU-based environment, we’ll just switch our runtime environment to a GPU from the runtime menu.

    We don’t need a premium GPU for this one :)
  • Alrighty - so awesome! You have a fresh notebook. Let’s get started!

    curl -sL https://get.bacalhau.org/install.sh | bash
    bacalhau version


    pip install git+https://github.com/fchollet/stable-diffusion-tensorflow --upgrade --quiet
    pip install tensorflow tensorflow_addons ftfy --upgrade --quiet
    pip install tqdm
    apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2


  • Alrighty - so awesome! You have a fresh notebook. Let’s get started!
    This is the first script.




    pip install git+https://github.com/fchollet/stable-diffusion-tensorflow --upgrade --quiet
    pip install tensorflow tensorflow_addons ftfy --upgrade --quiet
    pip install tqdm
    apt install --allow-change-held-packages libcudnn8=8.1.0.77-1+cuda11.2


  • Yay the coding part!
  • And I think these look better than Patrick Collins’ pizza ;P
  • Insert an openlinks QR code here
  • Bacalhau. Each section should contain:
    What is this project
    Why / how does it improve filecoin access / storage / usability etc.
    WHEN / Latest projects
    Where are the building gaps
    CALL Any call to Action (why does this involve you - may be as above)
  • First we’ll download and take a look at just one of the IPFS files locally… (there is far more data than this in the overall collection though - this represents only about one chunk, or 100,000 blocks, of eth data)

    This code simply gets the IPFS tar file through an HTTP gateway and un-tars (decompresses) it:

    wget -q -O file.tar.gz https://w3s.link/ipfs/bafybeifgqjvmzbtz427bne7af5tbndmvniabaex77us6l637gqtb2iwlwq
    tar -xvf file.tar.gz
  • output_850000
  • We’ll use pandas to create some columns for this data and plot it with matplotlib.
    Pandas is an open source data analysis tool built on top of the Python programming language.
    We’re using it here to clean up the Ethereum data from the csv file found in our output directory.


    We can either run this directly from a python3 terminal instance, or by creating a script in our working directory and running that.

    import pandas as pd
    import glob
    import matplotlib.pyplot as plt

    file = glob.glob('output_*/transactions/start_block=*/end_block=*/transactions*.csv')[0]
    print("Loading file %s" % file)
    df = pd.read_csv(file)
    df['value'] = df['value'].astype('float')
    df['from_address'] = df['from_address'].astype('string')
    df['to_address'] = df['to_address'].astype('string')
    df['hash'] = df['hash'].astype('string')
    df['block_hash'] = df['block_hash'].astype('string')
    df['block_datetime'] = pd.to_datetime(df['block_timestamp'], unit='s')
    df.info()

    df[['block_datetime', 'value']].groupby(pd.Grouper(key='block_datetime', freq='1D')).sum().plot()
    plt.show()
  • This is cool - but! The code here only inspects the daily trading volume of Ethereum for a single chunk (100,000 blocks) of data.

    We can do better - we can use the Bacalhau client to download the data from IPFS and then run the analysis on the data in the cloud. This means we can analyse the entire Ethereum blockchain without having to download it locally.

  • To run jobs on the Bacalhau network you need to package your code. In this example, the code is packaged as a Docker image. Don’t worry though - you don’t need to go off and learn Docker; this code has already been dockerised and uploaded as an image for you to use as you see fit!
    So, let’s instead develop the code that will perform the analysis. The code here is a simple script to parse the incoming data and produce a CSV file with the daily trading volume of Ethereum.
  • Read slide then..

    Let’s try it out!

  • There’s no need to do this - as the image already exists on Docker - but in case you want to….!
  • Bacalhau is a distributed computing platform that allows you to run jobs on a network of computers. It is designed to be easy to use and to run on a variety of hardware.
    To submit a job, you can use the Bacalhau CLI.
    The following command will run the container above on the IPFS data - the long hash shown at the start of this notebook. Let’s confirm that the results are as expected.
  • Look at docs if you want to understand more on what this means

    If you’re familiar with docker, you’ll notice some of these commands have an overlap that perform the same function.

    Inspect:
    The docker run command used the outputs volume as a results folder so when we download them they will be stored in a folder within volumes/outputs.
    Let’s check this out in VS Code and see it happen in real time.
  • We can re-plot these results to see if they are the same as we got locally.

    And they are!

    But… we could do that locally!! Why bother using Bacalhau???

    Well… what about the rest of the Ethereum data?? We want a full picture, not just a snapshot!
  • We can run the same analysis on the entire Ethereum blockchain (up to the point where I have uploaded the Ethereum data). To do this, we need to run the analysis on each of the chunks of data that we have stored on IPFS, by running the same job on each chunk.

    Let’s see this in action in VS Code so we can see what’s going on with the files…

    We’ll need to wait for all our jobs to complete (bacalhau list -n 50).
    Then we’ll download all the results and merge them into a single directory.
  • We’ll need to wait for all our jobs to complete (bacalhau list -n 50)


    Then we’ll download all the results and merge them into a single directory.
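The download-and-merge step can be sketched in a few lines of pandas. The glob pattern below is an assumption: the exact layout of the downloaded results depends on how `bacalhau get` was run, so adjust it to match your results folders:

```python
import glob

import pandas as pd

def merge_results(pattern, out_path):
    """Concatenate per-chunk result CSVs into a single file.

    `pattern` is a glob matching each downloaded job's output CSV,
    e.g. "volumes/outputs/*/transactions*.csv" (treat the path as
    an assumption - it depends on where the results were saved).
    """
    files = sorted(glob.glob(pattern))
    merged = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
    merged.to_csv(out_path, index=False)
    return merged
```

Once merged, the single CSV can be fed to the same pandas/matplotlib plotting code used earlier for one chunk.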
  • So let’s get on to analysing this data! Show me the code!!

  • Skip ahead if you just want me to show you the code!
  • Let’s get to the example! https://bit.ly/bacalhaueth

    Here I’m going to run through this Ethereum data analysis with Bacalhau.

    You don’t need any prior knowledge of Bacalhau or data science to join in with me here either!
  • Installing Bacalhau
