InterPlanetary Linked Data (IPLD) is the data layer for content-addressed systems and for Web3. It is a suite of technologies for representing and traversing hash-linked data. In this module you will learn:
- Why IPLD exists
- IPLD’s fundamental concepts, such as Merkle DAGs and Merkle Roots
- How IPLD relates to IPFS
- How to use IPLD to build distributed data structures
Agenda
➔ Motivation: Why do we need IPLD?
◆ What is it and why it exists
➔ IPFS & IPLD
◆ What’s the relationship?
➔ Fundamental Concepts
◆ Graphs & Linked Data
◆ Merkle DAGs
◆ Merkle Roots
◆ Links as the heart of IPLD
➔ Beyond File Data with IPLD
◆ The IPLD Data Model
◆ IPLD-native codecs
◆ Distributed Data Structures
Why IPLD?
The Data Layer for content-addressed systems
Can we extract a reusable data layer from IPFS that can be used to build other types of content-addressed data systems?
“Building the next Git should take hours, not days!”
IPLD as Leverage
➔ How can we scale the size and complexity of the data that we share peer-to-peer?
➔ Can we unify disparate content-addressed formats, such as Git, blockchains and IPFS, and link between them?
➔ Can we build distributed data structures that we interact with as we do with hosted databases, while keeping the benefits of content addressing?
IPFS & IPLD
IPLD is the data layer of IPFS
⬍
IPFS is a block store for IPLD
IPFS & IPLD: File Data
➔ At its most fundamental, IPFS is a collection of:
● binary blobs of data, called “blocks”;
● their associated content identifiers (CIDs).
➔ Only the smallest files in IPFS are stored as a single block. To keep block sizes practical, larger files are split into chunks, spread across multiple blocks, and linked together into a single graph.
➔ Directories are graphs of named links pointing to files (and to other directories), forming graphs that address other graphs.
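The chunk-and-link process described above can be sketched in a few lines. This is a minimal illustration, not IPFS's real importer: SHA-256 hex digests stand in for CIDs, parent nodes are JSON lists of child hashes rather than DAG-PB/UnixFS nodes, and `chunk` and `buildDag` are hypothetical helpers.

```javascript
// Sketch only: SHA-256 hex digests stand in for CIDs, and parent nodes are JSON
// lists of child hashes rather than real DAG-PB/UnixFS nodes.
// chunk and buildDag are hypothetical helpers, not an IPFS API.
const crypto = require("crypto");

const sha256 = (bytes) => crypto.createHash("sha256").update(bytes).digest("hex");

// Split a Buffer into chunks of at most `chunkSize` bytes.
function chunk(data, chunkSize) {
  const chunks = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    chunks.push(data.subarray(offset, offset + chunkSize));
  }
  if (chunks.length === 0) chunks.push(data); // an empty file is still one block
  return chunks;
}

// Hash every chunk, then repeatedly group up to `fanout` hashes into a parent
// block until a single root hash addresses the whole file.
function buildDag(data, chunkSize, fanout) {
  if (fanout < 2) throw new Error("fanout must be at least 2");
  const blocks = new Map(); // toy block store: hash -> block bytes
  const put = (bytes) => {
    const hash = sha256(bytes);
    blocks.set(hash, bytes);
    return hash;
  };
  let level = chunk(data, chunkSize).map((c) => put(c)); // leaf hashes
  while (level.length > 1) {
    const parents = [];
    for (let i = 0; i < level.length; i += fanout) {
      const links = level.slice(i, i + fanout);
      parents.push(put(Buffer.from(JSON.stringify({ links }))));
    }
    level = parents;
  }
  return { root: level[0], blocks };
}

// A 350-byte file, 100-byte chunks, at most two links per parent.
const file = Buffer.from(Array.from({ length: 350 }, (_, i) => i % 256));
const { root, blocks } = buildDag(file, 100, 2);
console.log(root);
console.log(blocks.size); // 7
```

With these parameters the file becomes four leaves (bytes 0-100, 100-200, 200-300, 300-350), two intermediate nodes (0-200 and 200-350) and one root: seven blocks, all reachable from a single root hash.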
Linking Chunks in a DAG: Content Addressing
Merkle DAGs are graph data structures where each node is content-addressed.
[Figure: a UnixFS file split into chunks covering byte ranges 0-100, 100-200, 200-300 and 300-350. The leaf CIDs are grouped under child nodes covering 0-200 and 200-350, and the child CIDs are linked from a single Root CID. Each merkle-link is a hash, and the whole structure is a Merkle tree DAG (directed acyclic graph).]
Root CID: bafybeigdyrzt5s… (CIDv1) / QmbWqxBEKC3P8qs… (CIDv0)
Visit: dag.ipfs.io
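A root CID can be written in either of the two forms shown. The sketch below is a minimal illustration under stated assumptions: `block` stands in for bytes that are already a DAG-PB block (the encoding step is not shown), and `base58btc`, `base32` and `cidStrings` are hypothetical helpers written from scratch rather than a library API. CIDv0 is the bare SHA2-256 multihash in base58btc, with dag-pb implied; CIDv1 adds an explicit version and codec and is usually printed in base32 with the multibase prefix `b`.

```javascript
// Sketch only: `block` stands in for an already-encoded dag-pb block.
// base58btc, base32 and cidStrings are hypothetical helpers, not a library API.
const crypto = require("crypto");

const BASE58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"; // base58btc
const BASE32 = "abcdefghijklmnopqrstuvwxyz234567"; // RFC 4648 lowercase, no padding

function base58btc(bytes) {
  let n = BigInt("0x" + (Buffer.from(bytes).toString("hex") || "0"));
  let out = "";
  while (n > 0n) {
    out = BASE58[Number(n % 58n)] + out;
    n /= 58n;
  }
  for (const b of bytes) { // each leading zero byte is written as "1"
    if (b !== 0) break;
    out = "1" + out;
  }
  return out;
}

function base32(bytes) {
  let out = "", value = 0, bits = 0;
  for (const b of bytes) {
    value = ((value << 8) | b) & 0xfff; // keep only bits not yet emitted
    bits += 8;
    while (bits >= 5) {
      out += BASE32[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
  }
  if (bits > 0) out += BASE32[(value << (5 - bits)) & 31];
  return out;
}

// Derive both string forms of a CID for the same sha2-256 digest.
function cidStrings(block) {
  const digest = crypto.createHash("sha256").update(block).digest();
  const multihash = Buffer.concat([Buffer.from([0x12, 0x20]), digest]); // sha2-256, 32 bytes
  return {
    // CIDv0: the bare multihash in base58btc; dag-pb and sha2-256 are implied.
    v0: base58btc(multihash),
    // CIDv1: version 0x01 + codec 0x70 (dag-pb) + multihash, multibase prefix "b" (base32).
    v1: "b" + base32(Buffer.concat([Buffer.from([0x01, 0x70]), multihash])),
  };
}

const { v0, v1 } = cidStrings(Buffer.from("example block bytes"));
console.log(v0); // starts with "Qm"
console.log(v1); // starts with "bafybei"
```

Because both strings wrap the same multihash, either one can be converted into the other without touching the underlying data.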
Fundamental Concepts
Links as the heart of IPLD
Merkle DAGs
● Recall that “DAG” stands for “Directed Acyclic Graph”.
● Ralph Merkle formalised the hash tree pattern in 1979.
Content being hashed may itself contain the hash digests of other content. Changing any linked content changes its digest, and therefore the digest of everything that contains it, so a content “address” authenticates all of the content “linked” below it in the tree.
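The authentication property can be checked directly: recompute the root from received content and compare it with a root you trust. This is a minimal sketch, assuming SHA-256 hex digests as stand-ins for CIDs and a tree made of leaf strings and arrays of children; `nodeHash` and `verify` are hypothetical helpers.

```javascript
// Sketch only: SHA-256 hex digests stand in for CIDs; a tree is either a leaf
// string or an array of child trees. nodeHash and verify are hypothetical helpers.
const crypto = require("crypto");

const sha256 = (text) => crypto.createHash("sha256").update(text).digest("hex");

// A leaf hashes its content; a parent hashes the list of its children's hashes,
// so every digest depends on all the content linked beneath it.
function nodeHash(tree) {
  if (typeof tree === "string") return sha256(JSON.stringify({ leaf: tree }));
  return sha256(JSON.stringify({ links: tree.map((child) => nodeHash(child)) }));
}

// Check received content against a root hash obtained from a trusted source.
const verify = (tree, expectedRoot) => nodeHash(tree) === expectedRoot;

const original = [["chunk-1", "chunk-2"], ["chunk-3"]];
const root = nodeHash(original);
const tampered = [["chunk-1", "chunk-X"], ["chunk-3"]];

console.log(verify(original, root)); // true
console.log(verify(tampered, root)); // false: one changed leaf changes the root
```

Note that the untouched subtree `["chunk-3"]` hashes identically in both trees; only the digests on the path from the changed leaf to the root differ.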
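Merkle Mutability: Dynamic Data with Static Structures
All variations of mutability are supported in the same way: the changed block and every block on the path from it up to the root are re-encoded, producing new CIDs and a new root. Unchanged subtrees keep their CIDs and are shared between the old and new versions.
• Additions
• Deletions
• Modifications (or: Deletion + Addition)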
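One common way to implement this is path copying, a technique borrowed from persistent data structures. The sketch below is an illustration under assumptions: an in-memory `Map` acts as the block store, SHA-256 hex digests stand in for CIDs, nodes are JSON, and `put`, `get` and `setLeaf` are hypothetical helpers.

```javascript
// Sketch only: an in-memory Map is the block store, SHA-256 hex digests stand
// in for CIDs, and nodes are JSON. put, get and setLeaf are hypothetical helpers.
const crypto = require("crypto");

const store = new Map();
function put(node) {
  const bytes = JSON.stringify(node);
  const cid = crypto.createHash("sha256").update(bytes).digest("hex");
  store.set(cid, bytes);
  return cid;
}
const get = (cid) => JSON.parse(store.get(cid));

// Return the root CID of a new tree in which the leaf reached by `path`
// (a list of child indexes) holds `value`. Only nodes on that path are
// re-encoded; the old tree is never modified.
function setLeaf(cid, path, value) {
  if (path.length === 0) return put({ value });
  const links = get(cid).links.slice(); // copy the parent's links
  links[path[0]] = setLeaf(links[path[0]], path.slice(1), value);
  return put({ links });
}

const a = put({ value: "a" });
const b = put({ value: "b" });
const c = put({ value: "c" });
const left = put({ links: [a, b] });
const right = put({ links: [c] });
const root1 = put({ links: [left, right] });

const root2 = setLeaf(root1, [0, 1], "B"); // modify "b" to "B"
console.log(store.size); // 9: six original blocks plus a new leaf, parent and root
console.log(get(root2).links[1] === right); // true: the unchanged subtree is shared
```

Both `root1` and `root2` remain valid, independently addressable versions. An addition works the same way (write to an index one past the end of a node's links), as does a deletion (copy the parent's links without the removed entry); in each case only the path to the root gets new blocks.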
Links as the heart of IPLD
● CIDs are hashes with descriptions: they tell you the hash function used, as well as the codec that can be used to interpret the binary data being linked to.
● The CID’s hash digest is represented as a “multihash”, which uses numeric codes to identify hash algorithms, such as SHA2-256 (0x12) or Blake2b-312 (0xb227).
● The CID’s “codec”, or IPLD format code, tells you how to decode the data once you have located it and loaded its bytes. This could be as simple as JSON (0x0200), CBOR (0x51) or even raw bytes (0x55).
● Multihash, Multicodec and CIDs are part of the Multiformats system for self-describing values.
Connecting Graphs: Anatomy of a CID
[Figure: the binary fields of a CIDv1. Every field except the hash digest is an unsigned varint.]
CID Version: 00000001 (CIDv1)
Multicodec (IPLD format): 01110000 (dag-pb, 0x70)
Multihash:
  Multicodec (Hash Fn): 00010010 (sha-256, 0x12)
  Length: 00100000 (32 bytes, 0x20)
  Hash Digest: 110010010…
Encoded as a string: bafybeigdyrzt5sfp7udm7hu7…
Visit: multiformats.io
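Reading a binary CIDv1 is a matter of decoding varints in order. This is a minimal sketch, not a multiformats library API: `encodeVarint`, `decodeVarint` and `parseCidV1` are hypothetical helpers, and `parseCidV1` only accepts binary CIDv1 bytes with no multibase prefix. It also shows why varints matter: small codes such as dag-pb (0x70) fit in one byte, while Blake2b-312 (0xb227) needs three.

```javascript
// Sketch only: encodeVarint, decodeVarint and parseCidV1 are hypothetical helpers.
// parseCidV1 accepts binary CIDv1 bytes without a multibase prefix.

// Unsigned varint: 7 bits per byte, least significant group first,
// high bit set on every byte except the last.
function encodeVarint(n) {
  const out = [];
  while (n >= 0x80) {
    out.push((n % 0x80) | 0x80);
    n = Math.floor(n / 0x80);
  }
  out.push(n);
  return out;
}

// Decode an unsigned varint starting at `offset`; returns [value, bytesRead].
function decodeVarint(bytes, offset = 0) {
  let value = 0;
  let shift = 0;
  for (let i = offset; i < bytes.length; i++) {
    value += (bytes[i] & 0x7f) * 2 ** shift;
    if ((bytes[i] & 0x80) === 0) return [value, i - offset + 1];
    shift += 7;
  }
  throw new Error("truncated varint");
}

function parseCidV1(bytes) {
  let offset = 0;
  const next = () => {
    const [value, read] = decodeVarint(bytes, offset);
    offset += read;
    return value;
  };
  const version = next(); // 1 for CIDv1
  const codec = next(); // IPLD format, e.g. 0x70 dag-pb, 0x71 dag-cbor
  const hashFn = next(); // e.g. 0x12 sha2-256
  const digestLength = next(); // digest size in bytes, e.g. 32 for sha2-256
  const digest = bytes.slice(offset, offset + digestLength);
  if (version !== 1 || digest.length !== digestLength) throw new Error("not a valid CIDv1");
  return { version, codec, hashFn, digestLength, digest };
}

// A dag-pb, sha2-256 CIDv1 with a placeholder (all-zero) digest.
const cid = Uint8Array.from([0x01, 0x70, 0x12, 0x20, ...new Array(32).fill(0)]);
const { version, codec, hashFn, digestLength } = parseCidV1(cid);
console.log(version, codec, hashFn, digestLength); // 1 112 18 32
console.log(encodeVarint(0xb227)); // [ 167, 228, 2 ]
```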
CIDs are the native link format for IPLD, and they are what distinguishes it from a simple data representation system.
● Most data serialization formats, such as JSON and CBOR, have no native way to represent links to content-addressed data, and so no built-in way to form graphs of linked data.
● IPLD brings its own formats that represent CIDs natively in the encoded bytes.
● IPLD can also be used as a lens through which to view other content-addressed formats, such as Git or Bitcoin. Because we know the hash function and data format those systems use, we can derive CIDs for their data by assumption.
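Git is a convenient example of deriving a CID by assumption: Git already names a blob by the SHA-1 of `"blob <size>\0"` followed by its content, so wrapping that digest with the multicodec codes for git-raw (0x78) and sha1 (0x11) yields a CIDv1 without re-hashing anything new. The helpers `gitBlobHash`, `gitBlobCid` and `base32` below are hypothetical, written for illustration only.

```javascript
// Sketch only: gitBlobHash, gitBlobCid and base32 are hypothetical helpers.
// Multicodec codes used: git-raw (0x78) and sha1 (0x11), with a 20-byte digest.
const crypto = require("crypto");

const BASE32 = "abcdefghijklmnopqrstuvwxyz234567"; // RFC 4648 lowercase, no padding
function base32(bytes) {
  let out = "", value = 0, bits = 0;
  for (const b of bytes) {
    value = ((value << 8) | b) & 0xfff; // keep only bits not yet emitted
    bits += 8;
    while (bits >= 5) {
      out += BASE32[(value >>> (bits - 5)) & 31];
      bits -= 5;
    }
  }
  if (bits > 0) out += BASE32[(value << (5 - bits)) & 31];
  return out;
}

// Git's object ID for a blob: SHA-1 over "blob <byte length>\0" + content.
function gitBlobHash(content) {
  const header = Buffer.from(`blob ${content.length}\0`);
  return crypto.createHash("sha1").update(Buffer.concat([header, content])).digest();
}

// Wrap Git's hash in a CIDv1: version 1, codec git-raw, multihash sha1 (length 0x14 = 20).
function gitBlobCid(content) {
  const multihash = Buffer.concat([Buffer.from([0x11, 0x14]), gitBlobHash(content)]);
  return "b" + base32(Buffer.concat([Buffer.from([0x01, 0x78]), multihash]));
}

const content = Buffer.from("hello world\n");
console.log(gitBlobHash(content).toString("hex")); // the ID `git hash-object` reports
console.log(gitBlobCid(content)); // starts with "baf4bcf"
```

The block bytes behind such a CID are the uncompressed Git object itself (header plus content), which is why the Git object ID and the CID's digest are the same value.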
Beyond File Data with IPLD
Scalable peer-to-peer data structures
IPLD and File Data in IPFS
● The primary IPLD codec IPFS uses for file and directory data is DAG-PB: a dedicated Protobuf format for representing a set of named links and a binary blob of data.
● IPFS additionally interprets that binary blob using a second Protobuf format, called UnixFS, to derive metadata about files.
Visit: docs.ipfs.io/concepts/file-systems/
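Because directories are just nodes of named links, resolving a path means following one named link per segment. This sketch assumes nodes with DAG-PB's logical shape (`Data` plus `Links` of `{ Name, Hash }`) but encodes them as JSON, hashes them with SHA-256 and keeps them in an in-memory `Map`; real DAG-PB links also carry a `Tsize`, and real `Data` holds UnixFS metadata. `put`, `file`, `dir` and `resolve` are hypothetical helpers.

```javascript
// Sketch only: DAG-PB-shaped nodes ({ Data, Links: [{ Name, Hash }] }) encoded as JSON,
// hashed with SHA-256 and stored in a Map. put, file, dir and resolve are hypothetical.
const crypto = require("crypto");

const blocks = new Map();
function put(node) {
  const bytes = JSON.stringify(node);
  const hash = crypto.createHash("sha256").update(bytes).digest("hex");
  blocks.set(hash, bytes);
  return hash;
}

const file = (text) => put({ Data: text, Links: [] });

// A directory is a node of named links, sorted by name so the same entries
// always produce the same block (and therefore the same hash).
function dir(entries) {
  const Links = Object.entries(entries)
    .map(([Name, Hash]) => ({ Name, Hash }))
    .sort((x, y) => (x.Name < y.Name ? -1 : x.Name > y.Name ? 1 : 0));
  return put({ Data: "directory", Links });
}

// Follow one named link per path segment, starting from a root hash.
function resolve(rootHash, path) {
  let hash = rootHash;
  for (const name of path.split("/").filter(Boolean)) {
    const link = JSON.parse(blocks.get(hash)).Links.find((l) => l.Name === name);
    if (!link) throw new Error(`no link named "${name}"`);
    hash = link.Hash;
  }
  return JSON.parse(blocks.get(hash));
}

const root = dir({
  "README.md": file("hello"),
  docs: dir({ "intro.txt": file("welcome") }),
});
console.log(resolve(root, "docs/intro.txt").Data); // welcome
```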
Beyond File Data
IPFS file data uses fixed layouts, with additional properties to represent directory structures and basic file metadata.
What else do we need? Can we address and organise complex and large data structures with IPLD blocks without having to make everything a file?
Beyond File Data: The IPLD Data Model
● IPLD defines a Data Model that describes the forms data can take in memory; a codec transforms data in this form to and from encoded bytes.
● The Data Model includes Booleans, Strings, Ints, Floats, Null, Arrays and Maps, but also Bytes and Links (CIDs).
● A decoded IPLD block is therefore similar to the in-memory form of a JSON data structure, with the addition of Bytes and Links.
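To see how ordinary JavaScript values line up with the Data Model, here is a small classifier. It is a sketch under assumptions: `Link` is a hypothetical stand-in for a real CID class, whole-number JS numbers are reported as Int because JavaScript cannot tell `13.0` from `13`, and the kind names follow the IPLD specifications, which call arrays Lists.

```javascript
// Sketch only: Link is a hypothetical stand-in for a CID class, and kindOf is
// an illustrative helper, not an IPLD library function.
class Link {
  constructor(cid) {
    this.cid = cid;
  }
}

function kindOf(value) {
  if (value === null) return "Null";
  if (typeof value === "boolean") return "Boolean";
  if (typeof value === "string") return "String";
  if (typeof value === "bigint") return "Int";
  if (typeof value === "number") {
    // Rejected here because DAG-CBOR and DAG-JSON cannot encode NaN or Infinity.
    if (!Number.isFinite(value)) throw new Error("NaN and Infinity are not encodable");
    return Number.isInteger(value) ? "Int" : "Float";
  }
  if (value instanceof Uint8Array) return "Bytes";
  if (value instanceof Link) return "Link";
  if (Array.isArray(value)) return "List";
  if (typeof value === "object") return "Map";
  throw new Error(`not representable in the Data Model: ${typeof value}`);
}

const sample = {
  string: "☺ we can do strings!",
  ints: 1337,
  floats: 13.37,
  booleans: true,
  arrays: [1, 2, 3],
  bytes: new Uint8Array([0x01, 0x03, 0x03, 0x07]),
  links: new Link("bafyreidykglsfhoixmivffc5uwhcgshx4j465xwqntbmu43nb2dzqwfvae"),
  nothing: null,
};
for (const [key, value] of Object.entries(sample)) console.log(key, kindOf(value));
```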
Beyond File Data: Flexible Encoding Formats
Two additional codecs designed for IPLD can represent the full IPLD Data Model in a flexible way:
● DAG-JSON, based on JSON but with special forms to encode Links (CIDs) and Bytes
● DAG-CBOR, based on CBOR but with a tag to represent CIDs and additional strictness requirements for deterministic encoding
The Data Model: A Familiar Data Interface
The Data Model includes the common fundamentals available in most programming languages.
import { CID } from "multiformats/cid"

const data = {
  string: "☺ we can do strings!",
  ints: 1337,
  floats: 13.37,
  booleans: true,
  arrays: [1, 2, 3],
  bytes: new Uint8Array([0x01, 0x03, 0x03, 0x07]),
  // A Link: CID.parse turns a CID string into a CID instance
  links: CID.parse("bafyreidykglsfhoixmivffc5uwhcgshx4j465xwqntbmu43nb2dzqwfvae")
}
The Data Model: DAG-JSON
DAG-JSON extends JSON, adding determinism, a format for Bytes and a format for Links.
{
  "arrays": [1, 2, 3],
  "booleans": true,
  "bytes": { "/": { "bytes": "AQMDBw" } },
  "floats": 13.37,
  "ints": 1337,
  "links": { "/": "bafyreidykglsfhoixmivffc5uwhcgshx4j465xwqntbmu43nb2dzqwfvae" },
  "string": "☺ we can do strings!"
}
The Data Model: DAG-CBOR
DAG-CBOR is a strict subset of CBOR, with determinism and a special tag to identify CIDs.
a764696e74731905396562797465734401030307656c696e6b73d82a58250001711220785197229dc8bb115294
5da58e2348f7e279eeded06cc2ca736d0e879858b501666172726179738301020366666c6f617473fb402abd70
a3d70a3d66737472696e67781ae298baefb88f202077652063616e20646f20737472696e67732168626f6f6c65
616e73f5
a7 # map(7)
64 # string(4)
696e7473 # "ints"
19 0539 # uint(1337)
65 # string(5)
6279746573 # "bytes"
44 # bytes(4)
01030307 # "\x01\x03\x03\x07"
65 # string(5)
6c696e6b73 # "links"
d8 2a # tag(42)
58 25 # bytes(37)
0001711220785197229dc8bb1152945da58e2348f7 # 0x00 prefix, then CIDv1: 01 (version), 71 (dag-cbor), 12 (sha2-256), 20 (32-byte digest), start of digest
e279eeded06cc2ca736d0e879858b501 # rest of the digest
66 # string(6)
617272617973 # "arrays"
83 # array(3)
01 # uint(1)
02 # uint(2)
03 # uint(3)
66 # string(6)
666c6f617473 # "floats"
fb 402abd70a3d70a3d # float(13.37)
66 # string(6)
737472696e67 # "string"
78 1a # string(26)
e298baefb88f202077652063616e20646f2073747269 # "☺ we can do stri"
6e677321 # "ngs!"
68 # string(8)
626f6f6c65616e73 # "booleans"
f5 # true
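The key order in the dump above is not the order the keys were written in: DAG-CBOR requires map keys sorted by encoded length first, then byte by byte. A minimal sketch of that comparison follows; `compareKeys` and `dagCborKeyOrder` are hypothetical helpers for illustration, not a DAG-CBOR library API.

```javascript
// Sketch only: compareKeys and dagCborKeyOrder are hypothetical helpers.
// Keys sort by UTF-8 byte length first, then byte by byte.
const encoder = new TextEncoder();

function compareKeys(a, b) {
  const x = encoder.encode(a);
  const y = encoder.encode(b);
  if (x.length !== y.length) return x.length - y.length; // shorter keys first
  for (let i = 0; i < x.length; i++) {
    if (x[i] !== y[i]) return x[i] - y[i]; // then bytewise
  }
  return 0;
}

const dagCborKeyOrder = (keys) => [...keys].sort(compareKeys);

// Keys of the example object, in the order they were written in JavaScript.
const keys = ["string", "ints", "floats", "booleans", "arrays", "bytes", "links"];
console.log(dagCborKeyOrder(keys).join(", "));
// ints, bytes, links, arrays, floats, string, booleans
```

Fixing one order for every map is part of what makes the encoding deterministic: the same data always yields the same bytes, and therefore the same CID.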
Round-trip Through the Data Model
IPLD’s Data Model is a stable system for addressing, constructing, encoding and decoding data for a content-addressed world: the same data can be encoded as DAG-JSON or DAG-CBOR and decoded back into the same Data Model form.
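A round trip can be demonstrated with a toy DAG-JSON-style encoder. This is a sketch under loudly stated assumptions, not a real codec: `Link` is a hypothetical stand-in for a CID class, CID strings are not validated, map keys are sorted with JavaScript's default string order (which matches byte order for the ASCII keys used here), and whole-number floats such as `1.0` would come back as Ints. `toJson`, `fromJson`, `encode` and `decode` are hypothetical helpers.

```javascript
// Sketch only: a toy DAG-JSON-style codec. Link is a hypothetical CID stand-in;
// toJson, fromJson, encode and decode are illustrative helpers, not a library API.
class Link {
  constructor(cid) {
    this.cid = cid;
  }
}

// Data Model value -> plain JSON value, with sorted map keys.
function toJson(value) {
  if (value instanceof Link) return { "/": value.cid };
  if (value instanceof Uint8Array) {
    // Bytes: unpadded standard base64 inside {"/": {"bytes": ...}}
    return { "/": { bytes: Buffer.from(value).toString("base64").replace(/=+$/, "") } };
  }
  if (Array.isArray(value)) return value.map((v) => toJson(v));
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const key of Object.keys(value).sort()) out[key] = toJson(value[key]);
    return out;
  }
  return value; // null, boolean, number, string
}

// Plain JSON value -> Data Model value, recognising the reserved "/" forms.
function fromJson(value) {
  if (Array.isArray(value)) return value.map((v) => fromJson(v));
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value);
    if (keys.length === 1 && keys[0] === "/") {
      const inner = value["/"];
      if (typeof inner === "string") return new Link(inner);
      if (inner !== null && typeof inner === "object" && typeof inner.bytes === "string") {
        return new Uint8Array(Buffer.from(inner.bytes, "base64"));
      }
    }
    const out = {};
    for (const key of keys) out[key] = fromJson(value[key]);
    return out;
  }
  return value;
}

const encode = (value) => JSON.stringify(toJson(value));
const decode = (text) => fromJson(JSON.parse(text));

const data = {
  string: "☺ we can do strings!",
  ints: 1337,
  floats: 13.37,
  booleans: true,
  arrays: [1, 2, 3],
  bytes: new Uint8Array([0x01, 0x03, 0x03, 0x07]),
  links: new Link("bafyreidykglsfhoixmivffc5uwhcgshx4j465xwqntbmu43nb2dzqwfvae"),
};
const text = encode(data);
const back = decode(text);
console.log(text);
console.log(encode(back) === text); // true
```

Encoding the decoded value reproduces the exact same text, which is the property content addressing relies on: if bytes round-trip deterministically, so does the CID computed over them.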
Recap: IPLD is ...
The Data Layer for content-addressed systems. It is a suite of technologies for representing and traversing hash-linked data, including:
● A Data Model
● Mechanisms for deterministic translation from binary data to the Data Model and back (codecs)
● Addressing and data-description primitives (CIDs / multiformats)
● Tools to address, traverse and mutate large graphs of linked blocks of data
Distributed Data Structures
● Persistent and immutable data structures are not new: functional programming leans heavily on the same concepts.
● The standard libraries of Scala, Clojure, Haskell and other languages are full of data structures that translate (almost) directly to the distributed, content-addressed world.
● Algorithms for building, traversing and mutating content-addressed data structures require careful consideration of their trade-offs.
● Directed acyclic graphs of immutable pieces of data can be challenging to wrangle, but they scale well to very large datasets.
Example: Super-large array
Scaling addressable datasets
[ e1, e2, e3, e4, e5 ]
● A single block with 5 elements, and one CID for that block
Example: Super-large array
Height 2 (root): [ L1.1, L1.2 ]
Height 1 (leaves): [ e1, e2, e3, e4, e5 ]  [ e6, e7, e8 ]
● Three distinct content-addressed blocks
● Three CIDs
● Two leaf nodes containing our data
● One root to address all content in the DAG
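The two-level layout can be sketched as follows, assuming leaves of width 5, JSON blocks hashed with SHA-256 as stand-ins for CIDs, and a single root level; `buildArray` and `getElement` are hypothetical helpers. Lookup needs no scan: the element index alone tells us which leaf link to follow and which position to read.

```javascript
// Sketch only: width-5 leaves under one root of links; SHA-256 hex digests of
// JSON blocks stand in for CIDs. buildArray and getElement are hypothetical helpers.
const crypto = require("crypto");

const blocks = new Map();
function put(node) {
  const bytes = JSON.stringify(node);
  const cid = crypto.createHash("sha256").update(bytes).digest("hex");
  blocks.set(cid, bytes);
  return cid;
}
const load = (cid) => JSON.parse(blocks.get(cid));

// Split the elements into leaf blocks of `width` and link them from one root.
function buildArray(elements, width) {
  const leaves = [];
  for (let i = 0; i < elements.length; i += width) {
    leaves.push(put({ elements: elements.slice(i, i + width) }));
  }
  return put({ width, length: elements.length, links: leaves });
}

function getElement(rootCid, index) {
  const root = load(rootCid);
  if (index < 0 || index >= root.length) throw new RangeError("index out of bounds");
  const leaf = load(root.links[Math.floor(index / root.width)]); // which leaf
  return leaf.elements[index % root.width]; // position within that leaf
}

const elements = Array.from({ length: 8 }, (_, i) => `e${i + 1}`);
const root = buildArray(elements, 5);
console.log(blocks.size); // 3: two leaves and one root
console.log(getElement(root, 6)); // e7
```

In this sketch the root holds every leaf link in one block. For a truly super-large array the root would itself grow too large, so a fuller design would split the links into intermediate blocks as well, adding one level of height each time.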