TensorFrames: Google Tensorflow on Apache Spark

Presentation at Bay Area Spark Meetup by Databricks Software Engineer and Spark committer Tim Hunter.

This presentation covers how you can use TensorFrames with TensorFlow for distributed computing on GPUs.

  • Explain that TensorFlow is a library for deep learning

  • List a few algorithms: deep learning, clustering, classification, etc.
    Business logic and analysis are usually more concerned with complex structures: text, lists, associations like dictionaries.
    The bread and butter of data science can be told in 3 words: integers, floats and doubles.
    Slicing and dicing data: matrices, vectors, reals
  • Not everybody is a Fortran or C++ programmer.
    There is considerable friction in writing optimized algorithms.
    How can we lower the barrier?
  • Scale up or scale out
    The Holy Grail: a large number of specialized processors
    You have 2 options: better computers or more computers
  • For all these configurations of hardware, there are even more frameworks and libraries to access them, and each of them has strengths and weaknesses
    the classics for single machine use
    the distributed frameworks: Spark, Mahout, MapReduce
    the libraries to access specialized hardware: CUDA and OpenCL for parallel programming
    in the middle, MPI: it is hard to program and it is not very resilient to hardware failures

    Then frameworks built on top of these in recent years for deep learning and computer vision
    The trend is to have multiple graphics cards communicate
  • MLlib has KDE, but how about making it work for other data types like floats, or other kernels?
  • My PhD adviser used to tell me that you always have to include one equation to show that you mean serious business
  • Do not talk about UDFs; simply say you can wrap a Scala function inside the SQL engine
    UDF: it is a Scala function and you can run it inside a SQL query


  • Start from login, homepage
    Disable debug menu
    Go more slowly for demo
  • TensorFrames: Google Tensorflow on Apache Spark

    1. TensorFrames: Google Tensorflow on Apache Spark. Tim Hunter. Meetup 08/2016 - Salesforce
    2. How familiar are you with Spark?
        1. What is Apache Spark?
        2. I have used Spark
        3. I am using Spark in production or I contribute to its development
    3. How familiar are you with TensorFlow?
        1. What is TensorFlow?
        2. I have heard about it
        3. I am training my own neural networks
    4. About Databricks
        Founded by the team who created Apache Spark
        Offers a hosted service:
        - Apache Spark in the cloud
        - Notebooks
        - Cluster management
        - Production environment
    5. About me
        Software engineer at Databricks
        Apache Spark contributor
        Ph.D. UC Berkeley in Machine Learning (and Spark user since Spark 0.5)
    6. Outline
        • Numerical computing with Apache Spark
        • Using GPUs with Spark and TensorFlow
        • Performance details
        • The future
    7. Numerical computing for Data Science
        • Queries are data-heavy
        • However, algorithms are computation-heavy
        • They operate on simple data types: integers, floats, doubles, vectors, matrices
    8. The case for speed
        • Numerical bottlenecks are good targets for optimization
        • Let data scientists get faster results
        • Faster turnaround for experimentation
        • How can we run these numerical algorithms faster?
    9. Evolution of computing power
        [Diagram; labels: Scale up, Scale out, GPGPU, "When you can afford your dedicated chip", "Failure is not an option: it is a fact"]
    10. Evolution of computing power
        [Diagram; labels: NLTK, Theano]
        Today's talk: Spark + TensorFlow
    11. Evolution of computing power
        • Processor speed cannot keep up with memory and network improvements
        • Access to the processor is the new bottleneck
        • Project Tungsten in Spark: leverage the processor's heuristics for executing code and fetching memory
        • Does not account for the fact that the problem is numerical
    12. Asynchronous vs. synchronous
        • Asynchronous algorithms perform updates concurrently
        • Spark is a synchronous model; deep learning frameworks are usually asynchronous
        • A large number of ML computations are synchronous
        • Even deep learning may benefit from synchronous updates
    13. Outline
        • Numerical computing with Apache Spark
        • Using GPUs with Spark and TensorFlow
        • Performance details
        • The future
    14. GPGPUs
        • Graphics Processing Units for General Purpose computations
        [Charts: theoretical peak throughput, GPU vs. CPU; theoretical peak bandwidth, GPU vs. CPU]
    15. Google TensorFlow
        • Library for writing "machine intelligence" algorithms
        • Very popular for deep learning and neural networks
        • Can also be used for general purpose numerical computations
        • Interface in C++ and Python
    16. Numerical dataflow with Tensorflow
        import tensorflow as tf

        x = tf.placeholder(tf.int32, name="x")
        y = tf.placeholder(tf.int32, name="y")
        output = tf.add(x, 3 * y, name="z")
        session = tf.Session()
        output_value = session.run(output, {x: 3, y: 5})

        [Dataflow graph; labels: x: int32, y: int32, mul 3, z]
    17. Numerical dataflow with Spark
        import tensorflow as tf
        import tensorframes as tfs

        df = sqlContext.createDataFrame(…)
        x = tf.placeholder(tf.int32, name="x")
        y = tf.placeholder(tf.int32, name="y")
        output = tf.add(x, 3 * y, name="z")
        output_df = tfs.map_rows(output, df)
        output_df.collect()

        df: DataFrame[x: int, y: int]
        output_df: DataFrame[x: int, y: int, z: int]
        [Dataflow graph; labels: x: int32, y: int32, mul 3, z]
    18. Demo
    19. Outline
        • Numerical computing with Apache Spark
        • Using GPUs with Spark and TensorFlow
        • Performance details
        • The future
    20. It is a communication problem
        [Diagram; labels: Spark worker process, Worker Python process, C++ buffer, Python pickle (twice), Tungsten binary format, Java object]
    21. TensorFrames: native embedding of TensorFlow
        [Diagram; labels: Spark worker process, C++ buffer, Tungsten binary format, Java object]
    22. An example: kernel density scoring
        • Estimation of distribution from samples
        • Non-parametric
        • Unknown bandwidth parameter
        • Can be evaluated with goodness of fit
    23. An example: kernel density scoring
        • In practice, compute: [equation not captured in the transcript] with: [equation not captured in the transcript]
        • In a nutshell: a complex numerical function
    24. Speedup
        [Bar chart: runtime (sec), axis 0 to 180, for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]

        def score(x: Double): Double = {
          val dis = points.map { z_k => - (x - z_k) * (x - z_k) / (2 * b * b) }
          val minDis = dis.min
          val exps = dis.map(d => math.exp(d - minDis))
          minDis - math.log(b * N) + math.log(exps.sum)
        }
        val scoreUDF = sqlContext.udf.register("scoreUDF", score _)
        sql("select sum(scoreUDF(sample)) from samples").collect()
    25. Speedup
        [Bar chart: runtime (sec), axis 0 to 180, for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]

        def score(x: Double): Double = {
          val dis = new Array[Double](N)
          var idx = 0
          while (idx < N) {
            val z_k = points(idx)
            dis(idx) = - (x - z_k) * (x - z_k) / (2 * b * b)
            idx += 1
          }
          val minDis = dis.min
          var expSum = 0.0
          idx = 0
          while (idx < N) {
            expSum += math.exp(dis(idx) - minDis)
            idx += 1
          }
          minDis - math.log(b * N) + math.log(expSum)
        }
        val scoreUDF = sqlContext.udf.register("scoreUDF", score _)
        sql("select sum(scoreUDF(sample)) from samples").collect()
    26. Speedup
        [Bar chart: runtime (sec), axis 0 to 180, for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]

        # X (the kernel points) and N (their count) are defined outside the slide.
        def cost_fun(block, bandwidth):
            distances = - square(constant(X) - block) / (2 * bandwidth * bandwidth)
            m = reduce_max(distances, 0)
            x = log(reduce_sum(exp(distances - m), 0))
            return identity(x + m - log(bandwidth * N), name="score")

        sample = tfs.block(df, "sample")
        score = cost_fun(sample, bandwidth=0.5)
        df.agg(sum(tfs.map_blocks(score, df))).collect()
    27. Speedup
        [Bar chart: runtime (sec), axis 0 to 180, for Scala UDF, Scala UDF (optimized), TensorFrames, TensorFrames + GPU]

        # X (the kernel points) and N (their count) are defined outside the slide.
        def cost_fun(block, bandwidth):
            distances = - square(constant(X) - block) / (2 * bandwidth * bandwidth)
            m = reduce_max(distances, 0)
            x = log(reduce_sum(exp(distances - m), 0))
            return identity(x + m - log(bandwidth * N), name="score")

        with device("/gpu"):
            sample = tfs.block(df, "sample")
            score = cost_fun(sample, bandwidth=0.5)
        df.agg(sum(tfs.map_blocks(score, df))).collect()
    28. Demo: Deep dreams
    29. Demo: Deep dreams
    30. Outline
        • Numerical computing with Apache Spark
        • Using GPUs with Spark and TensorFlow
        • Performance details
        • The future
    31. Improving communication
        [Diagram; labels: Spark worker process, C++ buffer, Tungsten binary format, Java object, Direct memory copy, Columnar storage]
    32. The future
        • Integration with Tungsten:
            • Direct memory copy
            • Columnar storage
        • Better integration with MLlib data types
        • GPU instances in Databricks: official support coming this fall
    33. Recap
        • Spark: an efficient framework for running computations on thousands of computers
        • TensorFlow: high-performance numerical framework
        • Get the best of both with TensorFrames:
            • Simple API for distributed numerical computing
            • Can leverage the hardware of the cluster
    34. Try these demos yourself
        • TensorFrames source code and documentation: github.com/databricks/tensorframes spark-packages.org/package/databricks/tensorframes
        • Demo notebooks available on Databricks
        • The official TensorFlow website: www.tensorflow.org
    35. Spark Summit EU 2016. 15% Discount Code: DatabricksEU16
    36. Thank you.
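
Slide 17 describes a row-wise mapping: each input row supplies x and y, and the result z = x + 3 * y is appended as a new column. The sketch below is a pure-Python analogy of that behaviour, written only to make the input and output shapes concrete. It does not use Spark, TensorFlow or TensorFrames; the names `map_rows_local` and `graph` are hypothetical and are not part of any of those libraries.

```python
# Pure-Python analogy of a row-wise map that appends computed columns.
# Assumption: rows are plain dicts; map_rows_local and graph are
# illustrative names, not TensorFrames or TensorFlow APIs.

def map_rows_local(fn, rows):
    """Apply fn to each row and return new rows with fn's outputs appended."""
    result = []
    for row in rows:
        new_columns = fn(**row)                  # e.g. {"z": 18}
        result.append({**row, **new_columns})    # keep inputs, add outputs
    return result


def graph(x, y):
    # Same arithmetic as the slide: z = x + 3 * y
    return {"z": x + 3 * y}


rows = [{"x": 3, "y": 5}, {"x": 1, "y": 2}]
output_rows = map_rows_local(graph, rows)
```

Each output row keeps its original x and y columns and gains a z column, which mirrors the schema change shown on the slide (from `[x: int, y: int]` to `[x: int, y: int, z: int]`).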

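
The scoring function on slides 24 to 27 evaluates the log of a sum of exponentials, log((1 / (b * N)) * sum_k exp(-(x - z_k)^2 / (2 * b^2))), after first subtracting a reference value from every exponent. The sketch below restates that computation in plain Python so the trick can be checked by hand. It is an illustration under assumptions, not the speaker's code: it subtracts the maximum exponent, as the TensorFlow version on the slides does, and the function names and sample values are made up for the example.

```python
import math

# Plain-Python restatement of the kernel density score from the slides.
# Assumptions: points is a list of floats, b is the bandwidth, and the
# function names and inputs below are illustrative only.

def score(x, points, b):
    """Log kernel density score at x, using a shifted log-sum-exp."""
    n = len(points)
    dis = [-(x - z) ** 2 / (2.0 * b * b) for z in points]
    m = max(dis)                                   # largest exponent
    exp_sum = sum(math.exp(d - m) for d in dis)    # every term is <= 1
    return m - math.log(b * n) + math.log(exp_sum)


def naive_score(x, points, b):
    """Same quantity computed directly, without shifting the exponents."""
    n = len(points)
    total = sum(math.exp(-(x - z) ** 2 / (2.0 * b * b)) for z in points)
    return math.log(total / (b * n))


near = score(0.3, [0.0, 1.0, 2.0], 0.5)
far = score(1000.0, [0.0, 1.0], 0.5)
```

For inputs close to the sample points both functions agree. For an input far from every point, the direct version underflows to log(0) and fails, while the shifted version still returns a finite value, which is why the slides shift the exponents before summing.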