Genomic Analysis in Scala
Scala/Splash 2017
October 22, 2017
Ryan Williams
1 / 17
Overview
Intro
Genomic applications
General Scala libraries
Design-pattern deep-dive
"fun" with implicits
Slides: hammerlab.org/splash-2017
Everything discussed in this talk is open source (Apache 2.0)
2 / 17
Hammer Lab
Mt. Sinai School of Medicine, NYC
Research
Personal Genome Vaccine pipeline / clinical trial
Checkpoint blockade biomarkers, mutational signatures
http://www.hammerlab.org/
Tools
Genome biofx using Spark + Scala
Biofx workflows and tools in OCaml
The usual suspects: python, R, …
 
3 / 17
coverage-depth
4 / 17
  
coverage-depth
 
5 / 17
spark-bam
Splitting genomic BAM files
6 / 17
Related Libraries
Non-genomics-specific
Maybe you want to use them
7 / 17
magic-rdds
Collection-operations implemented for Spark RDDs
scans
{left,right}
{elements, values of tuples}
.runLengthEncode, group consecutive elements by predicate / Ordering
.reverse
reductions: .maxByKey, .minByKey
sliding/windowed traversals
.size - smart count
multiple counts in one job:
val (count1, count2) = (rdd1, rdd2).size
smart partition-tracking: reuse counts for UnionRDDs
zips
lazy partition-count, eager partition-number check
sameElements, equals
group/sample by key: first elems or reservoir-sampled
Hypergeometric distribution supporting Long-valued parameters: hammerlab/math-utils
8 / 17
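To make the `.runLengthEncode` bullet concrete, here is a minimal sketch of run-length encoding over a plain Scala `Iterator`, roughly the per-partition piece of such a job. It is not magic-rdds' implementation: `RunLength` is a hypothetical name, it groups by equality only (not by a predicate or `Ordering`), and a distributed version would additionally have to merge runs that straddle partition boundaries, which this sketch does not attempt.

```scala
// Hypothetical helper, not part of magic-rdds.
object RunLength {
  // Collapse each run of equal consecutive elements into (element, runLength).
  def runLengthEncode[T](it: Iterator[T]): Iterator[(T, Int)] = {
    val in = it.buffered
    new Iterator[(T, Int)] {
      def hasNext: Boolean = in.hasNext
      def next(): (T, Int) = {
        val elem = in.next()
        var count = 1
        // Keep consuming while the next element equals the current one.
        while (in.hasNext && in.head == elem) {
          in.next()
          count += 1
        }
        (elem, count)
      }
    }
  }
}
```

For example, `RunLength.runLengthEncode(Iterator(1, 1, 2, 3, 3, 3, 1)).toList` yields `List((1,2), (2,1), (3,3), (1,1))`: the trailing `1` forms its own run because runs are consecutive, not global counts.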
hammerlab/iterators
scans (in terms of cats.Monoid)
sliding/windowed traversals
eager drops/takes
by number
while
until
sorted/range zips
SimpleBufferedIterator
iterator in terms of _advance(): Option[T]
hasNext lazily buffers/caches head
etc.
9 / 17
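A rough sketch of the `SimpleBufferedIterator` idea: a subclass supplies only `_advance(): Option[T]`, and `hasNext`/`head` lazily pull and cache one element. The block also includes an inclusive scan written against a monoid. This is a hypothetical re-implementation, not the library's source; the local `Monoid` trait is an assumed stand-in for `cats.Monoid` so the block has no dependencies, and `Halvings` and `Scans.inclusiveScan` are illustrative names.

```scala
// Hypothetical re-implementation of the idea; not hammerlab/iterators' source.
abstract class SimpleBufferedIterator[T] extends BufferedIterator[T] {
  // Subclasses produce the next element, or None once exhausted.
  protected def _advance(): Option[T]

  // None = nothing cached yet; Some(None) = exhausted; Some(Some(x)) = x is buffered.
  private var lookahead: Option[Option[T]] = None

  private def peek: Option[T] = lookahead match {
    case Some(cached) => cached
    case None =>
      val fetched = _advance()
      lookahead = Some(fetched)
      fetched
  }

  override def hasNext: Boolean = peek.isDefined

  override def head: T =
    peek.getOrElse(throw new NoSuchElementException("head of empty iterator"))

  override def next(): T = {
    val elem = head
    lookahead = None // consume the buffered element; the next peek calls _advance()
    elem
  }
}

// Example subclass: emits start, start / 2, start / 4, ... while the value is positive.
class Halvings(start: Int) extends SimpleBufferedIterator[Int] {
  private var current = start
  protected def _advance(): Option[Int] =
    if (current <= 0) None
    else {
      val emitted = current
      current /= 2
      Some(emitted)
    }
}

// Minimal stand-in for cats.Monoid, so the sketch needs no dependencies.
trait Monoid[T] {
  def empty: T
  def combine(x: T, y: T): T
}

object Monoid {
  implicit val intAddition: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }
}

object Scans {
  // Inclusive scan: the i-th output combines input elements 0..i.
  def inclusiveScan[T](it: Iterator[T])(implicit m: Monoid[T]): Iterator[T] =
    it.scanLeft(m.empty)((acc, x) => m.combine(acc, x)).drop(1)
}
```

Calling `head` twice on a fresh `new Halvings(8)` returns `8` both times and invokes `_advance()` only once; the cached value is cleared only by `next()`.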
args4j vs. case-app
statically-checked/typed handlers
implicit resolution
inheritance vs. composition
mutable vs. immutable
case-app positional-arg support: #58
spark-commands: command-line interfaces
// args4j: mutable vars; each field names its handler class in an annotation
class Opts {
  @args4j.Option(
    name = "--in-path",
    aliases = Array("-i"),
    handler = classOf[PathOptionHandler],
    usage = "Input path to read from"
  )
  var inPath: Option[Path] = None
  @args4j.Option(
    name = "--out-path",
    aliases = Array("-o"),
    handler = classOf[PathOptionHandler],
    usage = "Output path to write to"
  )
  var outPath: Option[Path] = None
  @args4j.Option(
    name = "--overwrite",
    aliases = Array("-f"),
    usage = "Whether to overwrite an existing output…"
  )
  var overwrite: Boolean = false
}
// case-app: an immutable case class; parsers are found by implicit resolution on the field types
case class Opts(
  @Opt("-i")
  @Msg("Input path to read from")
  inPath: Option[Path] = None,
  @Opt("-o")
  @Msg("Output path to write to")
  outPath: Option[Path] = None,
  @Opt("-f")
  @Msg("Whether to overwrite an existing output…")
  overwrite: Boolean = false
)
10 / 17
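One way to picture "statically-checked/typed handlers" via "implicit resolution": instead of naming a handler class in an annotation, as the args4j version does, each field type's parser is a type-class instance found at compile time, and instances can be derived (here, a parser for `Option[T]` from one for `T`). The `ArgParser` below is a simplified, hypothetical type class written for illustration; it is not case-app's API.

```scala
// Simplified, hypothetical type class; not case-app's API.
trait ArgParser[T] {
  def parse(raw: String): Either[String, T]
}

object ArgParser {
  // Summoner: ArgParser[T] resolves the implicit instance for T.
  def apply[T](implicit p: ArgParser[T]): ArgParser[T] = p

  implicit val stringParser: ArgParser[String] = new ArgParser[String] {
    def parse(raw: String): Either[String, String] = Right(raw)
  }

  implicit val booleanParser: ArgParser[Boolean] = new ArgParser[Boolean] {
    def parse(raw: String): Either[String, Boolean] = raw match {
      case "true"  => Right(true)
      case "false" => Right(false)
      case other   => Left(s"not a boolean: $other")
    }
  }

  // Derived instance: a parser for T yields a parser for Option[T].
  implicit def optionParser[T](implicit p: ArgParser[T]): ArgParser[Option[T]] =
    new ArgParser[Option[T]] {
      def parse(raw: String): Either[String, Option[T]] = p.parse(raw).map(Some(_))
    }
}
```

Asking for a type with no instance in scope (say, `ArgParser[Double]` here) fails to compile instead of surfacing as an error at runtime, which is the static checking the slide contrasts with args4j.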
Design Patterns
Down the typelevel / implicit rabbit-hole
11 / 17
Deep case-class hierarchy:
case class A(n: Int)
case class B(s: String)
case class C(a: A, b: B)
case class D(b: Boolean)
case class E(c: C, d: D, a: A, a2: A)
case class F(e: E)
Instances:
val a = A(123)
val b = B("abc")
val c = C(a, b)
val d = D(true)
val e = E(c, d, A(456), A(789))
val f = F(e)
Pull out fields by type and/or name:
f.find('c) // f.e.c
f.findT[C] // f.e.c
f.field[C]('c) // f.e.c
f.field[A]('a2) // f.e.a2
f.field[B]('b) // f.e.c.b
As evidence parameters:
def findAandB[T](t: T)(
  implicit
  findA: Find[T, A],
  findB: Find[T, B]
): (A, B) =
  (findA(t), findB(t))
shapeless-utils
"recursive structural types"
12 / 17
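The evidence-parameter style above can be exercised without shapeless by writing the `Find` instances by hand. In this sketch `Find` is a hypothetical minimal trait (not shapeless-utils' definition), and the instances for `F` are spelled out manually in `F`'s companion object, where implicit search will find them; shapeless-utils derives such instances generically. Choosing `f.e.a` as "the" `A` inside `F` is this sketch's own decision; the library's search order may pick differently.

```scala
// Hypothetical minimal stand-in for shapeless-utils' Find type class.
trait Find[T, V] {
  def apply(t: T): V
}

object Find {
  // Helper to build an instance from a plain function.
  def instance[T, V](f: T => V): Find[T, V] = new Find[T, V] {
    def apply(t: T): V = f(t)
  }
}

final case class A(n: Int)
final case class B(s: String)
final case class C(a: A, b: B)
final case class D(b: Boolean)
final case class E(c: C, d: D, a: A, a2: A)
final case class F(e: E)

object F {
  // Hand-written paths; shapeless-utils derives instances like these generically.
  implicit val findA: Find[F, A] = Find.instance[F, A](_.e.a) // this sketch picks e.a
  implicit val findB: Find[F, B] = Find.instance[F, B](_.e.c.b)
  implicit val findC: Find[F, C] = Find.instance[F, C](_.e.c)
}

object FindDemo {
  // Same signature as on the slide: both lookups are required as evidence.
  def findAandB[T](t: T)(
    implicit
    findA: Find[T, A],
    findB: Find[T, B]
  ): (A, B) =
    (findA(t), findB(t))
}
```

Because the instances live in `F`'s companion, `FindDemo.findAandB(f)` compiles with no imports; calling it on a type with no `Find[T, A]` instance is a compile error.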
Nesting/Mixing implicit contexts
Minimal boilerplate Spark CLI apps:
input Path
output Path (or: just a PrintStream)
SparkContext
select Broadcast variables
other argument-input objects
How to make all of these things implicitly available with minimal boilerplate?
def app1() = {
  // call methods that want implicit
  // input Path, SparkContext
}
def app2() = {
  // call methods that want implicit
  // Path, SparkContext, PrintStream
}
Ideally:
13 / 17
def run(
  implicit
  inPath: Path,
  printStream: PrintStream,
  sc: SparkContext,
  ranges: Broadcast[Ranges],
  …
): Unit = {
  // do thing
}
case class Context(
  inPath: Path,
  printStream: PrintStream,
  sc: SparkContext,
  ranges: Broadcast[Ranges],
  …
)
def run(implicit ctx: Context): Unit = {
  implicit val Context(
    inPath, printStream, sc, ranges, …
  ) = ctx
  // do thing
}
14 / 17
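A dependency-free sketch of the `Context` approach: bundle the values into one case class, take it as a single implicit, and re-expose each field as its own implicit so helpers can keep asking for narrow parameters. `InPath`, `Ranges` and `ContextDemo` are hypothetical names standing in for the slide's `Path` and `Broadcast[Ranges]`, and `SparkContext` is left out so the code runs on the standard library alone. Fields are re-bound one `implicit val` at a time rather than with the slide's single pattern binding.

```scala
import java.io.PrintStream

// Hypothetical wrapper types standing in for the slide's Path and Broadcast[Ranges];
// SparkContext is omitted so the sketch runs without Spark.
final case class InPath(value: String)
final case class Ranges(intervals: Seq[(Long, Long)])

// One bundle instead of a long implicit parameter list.
final case class Context(inPath: InPath, printStream: PrintStream, ranges: Ranges)

object ContextDemo {
  // Helpers declare only the implicits they actually use.
  def describeInput(implicit inPath: InPath, out: PrintStream): Unit =
    out.println(s"reading ${inPath.value}")

  def describeRanges(implicit ranges: Ranges, out: PrintStream): Unit =
    out.println(s"${ranges.intervals.size} ranges")

  def run(implicit ctx: Context): Unit = {
    // Re-expose each field as its own implicit, one val per field.
    implicit val inPath: InPath = ctx.inPath
    implicit val printStream: PrintStream = ctx.printStream
    implicit val ranges: Ranges = ctx.ranges
    describeInput
    describeRanges
  }
}
```

Wrapping raw values in dedicated types keeps implicit lookup keyed on meaning: two implicit `String`s would be ambiguous, while `InPath` and other wrappers cannot collide.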
Nesting/Mixing implicit contexts
How to make many implicits available with minimal boilerplate? Roughly:
abstract class HasArgs(val args: Array[String])
trait HasInputPath { self: HasArgs =>
  implicit val inPath: Path = Path(args(0))
}
trait HasOutputPath { self: HasArgs =>
  val outPath: Path = Path(args(1))
}
trait HasPrintStream extends HasOutputPath { self: HasArgs =>
  implicit val printStream: PrintStream = new PrintStream(newOutputStream(outPath))
}
trait HasSparkContext {
  implicit val sc: SparkContext = new SparkContext(…)
}
class MinimalApp(args: Array[String])
  extends HasArgs(args)
  with HasInputPath
  with HasPrintStream
  with HasSparkContext
object Main {
  def main(args: Array[String]): Unit =
    new MinimalApp(args) {
      // all the implicits!
    }
}
15 / 17
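A runnable approximation of the trait-mixin pattern, with the Spark pieces removed. `InPath`, `OutPath` and `Steps.report` are hypothetical, and `HasPrintStream` writes to an in-memory buffer instead of a file at `outPath` (an assumption made so the sketch has no filesystem side effects). `HasArgs` declares `args` as a `val`, which is what lets the self-typed traits read it.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

// Hypothetical wrapper types; the slide's Path and SparkContext are left out so
// the sketch runs on the standard library alone.
final case class InPath(value: String)
final case class OutPath(value: String)

// `val` so that self-typed traits can read the arguments.
abstract class HasArgs(val args: Array[String])

trait HasInputPath { self: HasArgs =>
  implicit val inPath: InPath = InPath(args(0))
}

trait HasOutputPath { self: HasArgs =>
  val outPath: OutPath = OutPath(args(1))
}

// Assumption: writes to an in-memory buffer rather than to a file at outPath.
trait HasPrintStream extends HasOutputPath { self: HasArgs =>
  val buffer: ByteArrayOutputStream = new ByteArrayOutputStream()
  implicit val printStream: PrintStream = new PrintStream(buffer, true)
}

object Steps {
  // Any helper can ask for exactly the implicits it needs.
  def report(implicit in: InPath, out: PrintStream): Unit =
    out.println(s"input: ${in.value}")
}

class MinimalApp(argv: Array[String])
  extends HasArgs(argv)
  with HasInputPath
  with HasPrintStream
```

Writing `new MinimalApp(Array("in.bam", "out.txt")) { Steps.report }` runs `report` with `inPath` and `printStream` supplied by the inherited implicit members, mirroring the "all the implicits!" body on the slide. Trait linearization initializes `HasArgs` first and `HasOutputPath` before `HasPrintStream`, so each `val` sees the ones it depends on.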
{to,from}String: invertible syntax
Miscellaneous tools output "reports":
466202931615 uncompressed positions
156G compressed
Compression ratio: 2.78
1236499892 reads
22489 false positives, 0 false negatives
That comes from a data structure like:
case class Result(
  numPositions     : Long,
  compressedSize   : Bytes,
  compressionRatio : Double,
  numReads         : Long,
  numFalsePositives: Long,
  numFalseNegatives: Long
)
or, better yet:
case class Result(
  numPositions    : NumPositions,
  compressedSize  : CompressedSize,
  compressionRatio: CompressionRatio,
  numReads        : NumReads,
  falseCounts     : FalseCounts
)
Printing a report is basically toString, i.e. the Show type-class
Twist: downstream tools want to parse these reports
i.e. re-hydrate Result instances:
implicit val _iso: Iso[FalseCounts] =
  iso"${'numFPs} false positives, ${'numFNs} false negatives"
16 / 17
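To illustrate "invertible syntax", here is a hand-written pair of print and parse functions for `FalseCounts`, using the report-line format shown above. `Iso` here is a hypothetical two-method trait, not the library's type, and the field names `numFPs`/`numFNs` are borrowed from the interpolator on the slide. The slide's `iso"…"` interpolator appears to build both directions from one template; here they are written separately, so keeping them in sync is up to the author.

```scala
// Hypothetical minimal "invertible syntax" type class; not the library's Iso.
trait Iso[T] {
  def show(t: T): String
  def parse(s: String): Option[T]
}

final case class FalseCounts(numFPs: Long, numFNs: Long)

object FalseCounts {
  // Whole-string match: both counts are captured as digit runs.
  private val Pattern = """(\d+) false positives, (\d+) false negatives""".r

  implicit val falseCountsIso: Iso[FalseCounts] = new Iso[FalseCounts] {
    def show(fc: FalseCounts): String =
      s"${fc.numFPs} false positives, ${fc.numFNs} false negatives"

    def parse(s: String): Option[FalseCounts] = s match {
      case Pattern(fps, fns) => Some(FalseCounts(fps.toLong, fns.toLong))
      case _                 => None
    }
  }
}
```

Round-tripping `FalseCounts(22489, 0)` through `show` and then `parse` gives back the same value, which is the property downstream tools rely on when re-hydrating `Result` instances.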
Thanks!
17 / 17
