In this talk, Joel Grus, a research engineer at AI2, speaks about best practices in ML engineering.
Presented on 06/06/2019
**These slides are from a talk given at Rsqrd AI. Learn more at rsqrdai.org**
3. Caveat:
I work at AI2, and we are at AI2,
but I am speaking only on behalf
of me
4.
5.
6. ● research engineer @AI2, I build deep
learning tools for NLP researchers
● previously SWE @Google, data science
@VoloMetrix + @Farecast + others
● small-time (but value-add) angel investor
(talk to me about your ML startups!)
● wrote a book about data science (2nd
edition now available)
● co-host of Adversarial Learning podcast
● "I Don't Like Notebooks"
● "Fizz Buzz in Tensorflow"
about me
21. write unit tests
tiny known dataset
check that model runs
check that output has the right fields
check that output has the right shape
check that output has reasonable values
22. make it easy to make your models
do something different
28. programming to higher level abstractions
# models/crf_tagger.py
class CrfTagger(Model):
"""
The ``CrfTagger`` encodes a sequence of text with a ``Seq2SeqEncoder``,
then uses a Conditional Random Field model to predict a tag for each token in the sequence.
def __init__(self, vocab: Vocabulary,
text_field_embedder: TextFieldEmbedder,
encoder: Seq2SeqEncoder,
label_namespace: str = "labels",
feedforward: Optional[FeedForward] = None,
label_encoding: Optional[str] = None,
include_start_end_transitions: bool = True,
constrain_crf_decoding: bool = None,
calculate_span_f1: bool = None,
dropout: Optional[float] = None,
verbose_metrics: bool = False,
initializer: InitializerApplicator = InitializerApplicator(),
regularizer: Optional[RegularizerApplicator] = None) -> None:
super().__init__(vocab, regularizer)
...
31. "Oh, great, now
I'm going to make
lots of changes to
my code and
maintain all these
different versions
so that my results
are reproducible"
"I'll just make a
new config file for
the BERT version!"