Welcome to the Age of Data
1. Welcome to the age of data!
BIGDATA.BE
IIC » TECHNOLOGIEPARK 3 » B-9052 ZWIJNAARDE (GENT) » www.outerthought.org
2. who am i
» Steven Noels
» Founder & VP Product
» Makers of Lily: Interactive Big Data platform
» Open Source / Apache Software Foundation
» co-founder bigdata.be
3. Houston, we have a problem.
36. map/reduce
» Batch-oriented
» Data locality (code is shipped around)
» Heavy parallelization
» Process management
» Append-only files
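The map/reduce model listed above can be illustrated with a minimal single-process sketch. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration, and everything that makes a real cluster interesting (data locality, parallel workers, process management) is left out.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit one (word, 1) pair per word in an input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run the whole batch: map every record, shuffle, then reduce each key."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data big clusters", "data locality"]))
```

Because each call to `map_phase` looks at one record only and each call to `reduce_phase` looks at one key only, both steps could be spread over many machines, which is what makes the model suitable for heavy parallelization.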
37. Hadoop ecosystem
» Hadoop Common
» Subprojects
» MapReduce: A software framework for distributed processing of large data sets on compute clusters.
» HBase: A scalable, distributed database that supports structured data storage for large/wide tables.
» HDFS: A distributed file system that provides high throughput access to application data.
» ZooKeeper: A high-performance coordination service for distributed applications.
» Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
» Flume/SQOOP: Data collection systems for large distributed systems.
» Pig: A high-level data-flow language and execution framework for parallel computation.
» Mahout: machine learning libraries
38. [Diagram: CDH stack]
» UI Framework (HUE)
» SDK (HUE SDK)
» Workflow (OOZIE)
» Scheduling (OOZIE)
» Metadata (HIVE)
» Languages / Compilers (PIG, HIVE)
» Data Integration (FLUME, SQOOP)
» Fast Read/Write Access (HBASE)
» Coordination (ZOOKEEPER)
» Additional labels: High-level data model / easy API; indexes; Search; Dev2Dev tutoring, integrated deployment and enterprise support; usage metrics, analytics & recommendations (PIG, HIVE)
39. real-time big data architecture
» speed layer (storm)
1. compensate for high latency of updates to serving layer
2. fast, incremental algorithms
3. batch layer eventually overrides speed layer
» serving layer
1. random access to batch views
2. updated by batch layer
» batch layer
1. store master dataset (append-only)
2. compute arbitrary views
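The way the three layers cooperate can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not the architecture's reference implementation: the class names, the page-view counting task, and the `ingest`, `run_batch` and `query` helpers are all hypothetical, and the sketch assumes no events arrive while the batch view is being recomputed.

```python
from collections import Counter


class BatchLayer:
    """Stores the master dataset (append-only) and recomputes views from scratch."""

    def __init__(self):
        self.master = []  # events are only ever appended, never updated

    def append(self, event):
        self.master.append(event)

    def compute_view(self):
        # An arbitrary view over all data; here: page-view counts.
        return Counter(event["page"] for event in self.master)


class SpeedLayer:
    """Keeps a small incremental view of events the batch view has not seen yet."""

    def __init__(self):
        self.view = Counter()

    def update(self, event):
        self.view[event["page"]] += 1  # fast, incremental update

    def reset(self):
        self.view.clear()  # the batch view now covers these events


class ServingLayer:
    """Offers random access to the most recent batch view."""

    def __init__(self):
        self.batch_view = Counter()

    def load(self, view):
        self.batch_view = view


def ingest(batch, speed, event):
    """Every new event goes to both the batch layer and the speed layer."""
    batch.append(event)
    speed.update(event)


def run_batch(batch, speed, serving):
    """Recompute the batch view; it then overrides what the speed layer held."""
    serving.load(batch.compute_view())
    speed.reset()


def query(serving, speed, page):
    """Answer a query by merging the batch view with the speed-layer view."""
    return serving.batch_view[page] + speed.view[page]
```

A query sees the same total before and after `run_batch`: before, the count comes from the speed layer alone; afterwards, it comes from the batch view and the speed layer is empty again.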
42. The start of Lily.
43. Thank you!
for your attention
for your questions
» steven.noels@outerthought.com
» @stevenn