Blockchain Data Analytics Tutorial
Cüneyt Gürcan Akçora, Yulia R. Gel, Murat Kantarcioglu
Joint work with N. C. Abay, Y. Chen, M. Dixon, A. K. Dey, U. Islambekov,
Y. Li, E. Smirnova, B. Thuraisingham
Depts. of Computer Science and Math Sciences
University of Texas at Dallas
BlockchainTutorial.Github.io
IEEE ICDM 2018 Blockchain Day, Singapore
2
Outline
• A brief history of Blockchain
• Building blocks of Blockchain
• Blockchain data models and structures
• TXO and account based blockchains
• Privacy and security in blockchains
• Financial analytics on blockchains
1- Blockchain Data Analytics - Core Blockchain
How did blockchains appear?
How do they work?
What are the design considerations?
What is the data stored on a blockchain?
4
Core Blockchain
10/31/2008: Satoshi Nakamoto posts the
Bitcoin white paper to a forum.
1/3/2009: The first data block of the
Bitcoin blockchain is created.
Coin Timeline*
Bitcoin: A Peer-to-Peer Electronic Cash System
* By JEFF DESJARDINS. Image retrieved from VisualCapitalist.com and updated.
Smart contracts, lightning networks, added privacy
5
Blockchain Network
Every node runs the same software to verify data blocks.
Each node is connected to a few other nodes only.
New nodes appear and existing ones disappear all the time.
There is no trusted node.
Every node has the full copy of the data.
Goal: Having a single truth about data, that can be verified by everyone.
6
Bitcoin: A financial application of Blockchain
Blockchain: a distributed ledger (i.e., “a book laying or remaining regularly
in one place”).
Block
Blockchain: a chain of data blocks
False data
Which peer should node a believe
about block 4?
1 2 3 4
Bitcoin: chain data contains financial transactions.
1 2 3 4
True data
a
7
Bitcoin
2 bitcoins 1 bitcoins
2 bitcoins
From: Cuneyt
To: Joe (1BTC), Tim
(2BTC).
Use the 3 bitcoins I
received in Block 1
transaction 3.
Signed: Cuneyt
From: Jim
To: Chris (2BTC).
Use the 2 bitcoins I
received in Block 2
transaction 1.
Signed: Jim
1MB block size =
~ 2K transactions
Two inherent problems:
• Authenticity (You really have the funds)
• Double spending (You are not using the same funds twice)
• Authenticity is solved with
digital signatures, and showing
the proof of funds.
• Confirmation of payments requires
more effort: the double spending
problem.
8
Core Blockchain
• If everyone can create
blocks, the blockchain may
never stabilize.
[Figure: blocks 1 to 4 followed by two competing versions of block 5, labeled Fork 1 and Fork 2.]
From: Jim
To: Chris (2BTC).
Use the 2 bitcoins I
received in Block 2
transaction 1.
Signed: Jim
5
5
From: Jim
To: John (2BTC).
Use the 2 bitcoins I
received in Block 2
transaction 1.
Signed: Jim
Jim is malicious: He is trying to use the same
coins in two payments.
Jim is hoping that Chris and John will not notice
the other payment.
1- If fork 1 becomes the canonical fork, John will
be defrauded.
2- If fork 2 becomes the canonical fork, Chris will
be defrauded.
9
Core Blockchain
• We cannot have a stable chain if we cannot be certain about
blocks. There cannot be multiple long forks with alternative
truths.
• Solution: Make block creation difficult. Allow the network
some time between blocks, so that the current state will be
learned by all (or most) nodes.
• How can we stop people from creating blocks too easily?
Pose a cryptographic puzzle!
[Figure: blocks 1 to 4 followed by two competing forks, Fork 1 and Fork 2, each extended with blocks 5 and 6.]
10
Core Blockchain
From: Cuneyt
To: Alice
Date: 1/1/2027
…..mail content….
This mail has 35 words
Proof-of-Work was first proposed as a defense against email spam.
The email service provider runs this algorithm:
If the proof of work is not attached
    Discard the email, spam!
Else count the words
    If the word count in the proof of work is wrong
        Discard the email, spam!
    Else the email might still be spam, run the spam detector.
Proof of work: In this simple example, it is counting words.
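The filtering rule above can be sketched in Python. This is only an illustration: the function name `classify_email` and its return strings are hypothetical, and the word count stands in for a real proof of work.

```python
from typing import Optional


def classify_email(body: str, claimed_word_count: Optional[int]) -> str:
    """Toy proof-of-work check: the 'work' is counting the words in the mail."""
    if claimed_word_count is None:
        return "spam"                # no proof of work attached
    if claimed_word_count != len(body.split()):
        return "spam"                # the attached proof of work is wrong
    return "run spam detector"       # proof is valid; the mail might still be spam


print(classify_email("hello Alice how are you", 5))
print(classify_email("hello Alice how are you", 35))
print(classify_email("hello Alice how are you", None))
```

The sender pays a small cost per message (counting), while the provider can verify the claim cheaply and discard anything that skipped the work.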
11
Core Blockchain
Proof-of-Work: Spending time and effort to create (mine) a block.
The idea is to slow down attackers.
Bitcoin uses a hash puzzle for Proof of work.
Hash(University) = 7FDD903AF601C14E71D4938B2F7AB58A78C03C36D43485BB1937826B90DEFDD0
Hash(Univarsity) = 7E984B4F8807A0092C65AE3D897DD186943D95435C0A56F8350A0C7F82ACEF03
Proof of work: Find a hash value that satisfies a given difficulty.
12
Core Blockchain
Miner
From: Jim
To: Chris (2BTC).
Use the 2 bitcoins I
received in Block 2
transaction 1.
Signed: Jim
A node chooses to be a miner
Mining
• Mining is the process of gathering transactions that are in the
system waiting, creating a block out of them and advertising it to
the other nodes in the system.
• Creating a block is the computational review process performed on
transactions.
• Each block is limited to 1MB and can hold only ~2K transactions. A block
can also contain just 1 transaction (as in many early blocks). Do you see a
possible problem here?
• Everyone can create transactions, but only miners can create blocks.
(Nuclear scientists were caught running mining software on
supercomputers.)
14
Mining – Creating the block
• Several issues must be addressed in mining:
• Nothing is physical, the coins you spend may be fake (Verify
source).
• Even when the coins are not fake, you may have already spent it
(Verify history).
• Is the sender the real owner of these coins, is the receiver
address correct? (Verify users).
• Are the input and output amounts correct? (Verify the amount).
Miner checks and verifies all these.
There are many nodes, but few miners on most blockchains.
15
Core Blockchain
hashOfBlock = Hash(block content)
For (nonce = 1 to infinity)
    blockHash = Hash([hashOfBlock + hashOfPrevBlock + …] + nonce)
    If (blockHash satisfies difficulty)
        block mined successfully!
• The miner increases the nonce until a useful blockHash is found.
• If such a nonce does not exist, the miner can start over by re-arranging
the transactions in the block.
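A minimal runnable sketch of this mining loop follows. It is a simplification under stated assumptions: it uses a single SHA-256 over concatenated strings and expresses difficulty as a required hex prefix, whereas Bitcoin hashes a binary block header and compares against a numeric target. The function name `mine` and the sample inputs are hypothetical.

```python
import hashlib


def mine(block_content: str, prev_block_hash: str, difficulty_prefix: str,
         max_nonce: int = 10_000_000):
    """Increase the nonce until the block hash starts with difficulty_prefix."""
    content_hash = hashlib.sha256(block_content.encode()).hexdigest()
    for nonce in range(max_nonce):
        candidate = f"{content_hash}{prev_block_hash}{nonce}".encode()
        block_hash = hashlib.sha256(candidate).hexdigest()
        if block_hash.startswith(difficulty_prefix):
            return nonce, block_hash      # block mined successfully
    return None  # no nonce in range: change the block content and start over


result = mine("Jim pays Chris 2 BTC", "00" * 32, "000")
print(result)
```

Each extra hex zero in the prefix multiplies the expected number of tries by 16, which is why verification stays cheap while mining gets expensive.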
16
Core Blockchain
1 2 3 4 65
• Once the block is mined, the miner broadcasts it to all its peers. The
block propagates in the network.
• Mining a block does not guarantee that the block will be included in
the blockchain.
• Other miners need to build their blocks on top of the block.
• Colluding miners can ignore a mined block. Furthermore, they can
cooperate and build blocks on each other’s blocks only.
17
Core Blockchain
1 2 3 4 65
Miner 1
Miner 2
Miner 3
Miner 4 arrives at 𝑡1 to create a block, finds 3 competing last blocks.
𝑡1
Depending on which block to build on, Miner 4 has to exclude transactions
that have already been mined.
𝑡0
?
18
Core Blockchain
1 2 3 4 65
Miner 1
Miner 2
Miner 3
Let’s suppose that Miner 4 chooses to build on the block of Miner 1.
𝑡1 𝑡2
Miner 4
• Miner 5 arrives at 𝑡2 and sees 3 forks – The logical choice is to build on the
longest fork of Miner 1 and 4*.
• Miner 5 may still choose to build on other forks – may be a costly mistake.
𝑡0
*both length and difficulty are considered.
Proof of Work: An example
• How difficult is proof of work? Consider "Hello, world!" + nonce
• If the difficulty is three leading zeros (000…), we try 4251 nonce values (0 to 4250)
• Hash("Hello, world!0") =>
1312af178c253f84028d480a6adc1e25e81caa44c749ec81976192e2ec934c64
• Hash("Hello, world!4250") =>
0000c3af42fc31103f1fdc0151fa747ff87349a4714df7cc52ea464e12dcd4e9
Bitcoin uses an adaptive difficulty that changes with how much
computing power exists in the mining business.
20
Core Blockchain – adjusting difficulty
Hash of Bitcoin Block #547873 (October 2018) [20 zeros]
0000000000000000000064eb6ef4f94808938de0889695dd7bb8dca70b334cb2
Hash of Bitcoin Block #3 (January 2009) [8 zeros]
00000000b3322c8c3ef7d2cf6da009a776e6a99ee65ec5a32f3f345712238473
Hash of Bitcoin Block #350000 (March 2015) [17 zeros]
0000000000000000053cf64f0400bb38e0c4b3872c38795ddde27acb40a112bb
• The desired rate is one block every 10 minutes. This is
checked every 2016 blocks (about 2 weeks).
• If 2016 blocks took less than two weeks, the difficulty is increased; if they took longer, it is decreased.
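The adjustment rule can be sketched as a proportional update. This is an assumption-laden illustration: the function name `retarget` is hypothetical, difficulty is treated as a plain number, and the factor-of-4 clamp reflects how Bitcoin's rule is commonly described rather than anything stated on the slide.

```python
TARGET_SPACING_SECONDS = 10 * 60      # desired: one block every 10 minutes
RETARGET_INTERVAL_BLOCKS = 2016       # the rate is checked every 2016 blocks
EXPECTED_TIMESPAN = TARGET_SPACING_SECONDS * RETARGET_INTERVAL_BLOCKS  # two weeks


def retarget(old_difficulty: float, actual_timespan_seconds: float) -> float:
    """Scale difficulty by expected / actual time for the last 2016 blocks."""
    # Assumption: a single adjustment is limited to a factor of 4 either way.
    clamped = min(max(actual_timespan_seconds, EXPECTED_TIMESPAN / 4),
                  EXPECTED_TIMESPAN * 4)
    return old_difficulty * EXPECTED_TIMESPAN / clamped


print(retarget(100.0, EXPECTED_TIMESPAN / 2))  # blocks came twice as fast
print(retarget(100.0, EXPECTED_TIMESPAN * 2))  # blocks came twice as slowly
```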
[Figure: Proof of Work: Bitcoin difficulty over time. Difficulty on a log scale from 1 to 1E+13, time from 1/27/2009 to 1/27/2018. Decreases are possible. Data from BTC.com]
With the maximum possible difficulty we would need to try more than 10^77 nonce values.
Bitcoin: more than 10^21 tries to find a valid nonce!
22
• Block reward halves every 4 years. Starting with 50 bitcoins per block,
this will create 21M bitcoins in total.
• Transaction fee is the amount unspent from inputs to outputs.
• The fee may also be zero – but why would anyone mine your
transaction?
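The 21M figure can be checked with a short sum. The sketch assumes the halving happens every 210,000 blocks (roughly four years of 10-minute blocks) and that rewards are counted in integer units of 10^-8 bitcoin; both are assumptions not stated on the slide.

```python
HALVING_INTERVAL_BLOCKS = 210_000            # assumption: about four years of blocks
INITIAL_REWARD_UNITS = 50 * 100_000_000      # 50 bitcoins in units of 1e-8 bitcoin


def total_supply_btc() -> float:
    """Add up block rewards until the halved reward rounds down to zero."""
    total = 0
    reward = INITIAL_REWARD_UNITS
    while reward > 0:
        total += HALVING_INTERVAL_BLOCKS * reward
        reward //= 2                         # the block reward halves
    return total / 100_000_000


print(total_supply_btc())  # just under 21 million
```

The series 50 + 25 + 12.5 + … converges to 100 bitcoins per block position, and 100 × 210,000 gives the 21M cap.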
From: Cuneyt
To: Joe (0.8 BTC),
Tim (2 BTC).
Use the 3 bitcoins I
received in Block 1
transaction 3.
Signed: Cuneyt
0.8 bitcoin
2 bitcoins
transaction fee = 0.2 bitcoins
Block
reward
Sum of all transaction fees
Incentives for mining
23
• Around May 2020 the block reward will halve to 6.25 bitcoins.
• 2140 is the year when the reward will be practically zero.
• Transaction fees will carry the system after block rewards become
trivial.
• November 2018: the block reward is 12.5B; transaction fees are ~0.05B.
• Fees are trivial if the market volume is low, because users attach lower
fees.
• In December 2017, fees were around 5-7B.
24
Bitcoin mining
• One winning miner every 10 mins. Many others lose and waste electricity.
• Eric Jennings: “The cost for having no central authority is the cost of that
energy”.
• Tim Swanson: “Bitcoin is a peer-to-peer heat engine”.
• Narayanan: “Bitcoin mining has been an expensive way to bet that the
price of Bitcoin would rise”.
25
Proof-of-X
Proof-of-X is an umbrella term that covers Proof-of-Work alternatives in
block mining.
Each alternative scheme expects miners to show a proof that they have
done enough work or spent enough wealth before creating the block.
• Proof-of-Stake: Stake = Coin×Age. The miner with the highest stake
becomes the next miner in the chain. Once coins are used, their age
becomes zero. The rich get richer!
• Proof-of-Burn: The miner sacrifices wealth: creates a transaction and
sends some coins to a “verifiably unspendable” address. Reduces total
supply!
• Proof-of-Ownership, Proof-of-Publication, and others…
26
Blockchains – why stop at cryptocurrencies?
Every node runs the same software to verify data blocks.
Each node is connected to a few other nodes only.
New nodes appear and existing ones disappear all the time.
There is no trusted node.
Every node has the full copy of the data.
27
0.8 bitcoin
2 bitcoins
From: Cuneyt
To: Joe (0.8 BTC),
Tim (2 BTC).
Use the 3 bitcoins I
received in Block 1
transaction 3.
Signed: Cuneyt
data
Bitcoin: data are financial transactions.
Tschorsch, Florian, and Björn Scheuermann. Bitcoin and beyond: A technical survey on decentralized digital
currencies. IEEE Communications Surveys & Tutorials 18, no. 3 (2016): 2084-2123.
28
Data can be more:
- Notary documents
- Pictures
- Identity documents
- Shipping logs
- Manufacturing logs
- IoT data
1- On-chain storage
2- Off-chain storage:
• Store hashes of data (as proof)
• Store the address of data (e.g., "our data resides at IP 145.178.14.29")
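Storing a hash as proof can be sketched as follows. The helper names `fingerprint` and `verify` and the sample document are hypothetical, and SHA-256 is an assumed choice of hash function.

```python
import hashlib


def fingerprint(document: bytes) -> str:
    """Hash that would be stored on-chain as proof of the off-chain document."""
    return hashlib.sha256(document).hexdigest()


def verify(document: bytes, on_chain_hash: str) -> bool:
    """Anyone holding the document can check it against the on-chain hash."""
    return fingerprint(document) == on_chain_hash


shipping_log = b"container 42 left port on 2018-11-17"
stored = fingerprint(shipping_log)          # only this value goes on the blockchain
print(verify(shipping_log, stored))         # True
print(verify(shipping_log + b"!", stored))  # False
```

The blockchain stays small because it holds 32 bytes per document, while any later change to the off-chain data becomes detectable.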
30
Blockchain Network – Beyond Cryptocurrencies
• Ethereum has been created to store data and
software code on a blockchain.
• Similar to Bitcoin, Ethereum has a currency:
Ether.
• The code (a smart contract) is written in the
purpose-built language Solidity, which is
compiled to bytecode and executed on the
Ethereum Virtual Machine.
• An analogy is stored procedures kept in a MySQL
database.
Solidity
31
Blockchain Network – Smart Contract
• User creates a transaction to upload the Smart Contract code to an
address.
• The code at the address is replicated in all blockchain nodes.
• In other words, you force other users to store your code.
• The code is executed by passing transaction messages to its
functions. Execution occurs at all nodes – hence the World
Computer!
• Contract creation is expensive.
• All subsequent calls to the contract code are billed in terms of
what operations they require.
32
Blockchain Network – Contracts
• Each operation has a gas cost for executing it.
• For example, using the ‘addition’ operation costs you 3 gas.
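Billing by operation can be sketched as a lookup and a multiplication. Only the cost of 3 gas for addition comes from the slide; the function name `execution_fee_wei`, the opcode label `ADD`, and the 20 gwei price per unit of gas are illustrative assumptions.

```python
GAS_COST = {"ADD": 3}   # 3 gas for addition, as stated above; other operations omitted


def execution_fee_wei(operations, gas_price_wei: int) -> int:
    """Fee = total gas used by the operations x price paid per unit of gas."""
    gas_used = sum(GAS_COST[op] for op in operations)
    return gas_used * gas_price_wei


# Hypothetical call: two additions at an assumed price of 20 gwei per gas.
fee = execution_fee_wei(["ADD", "ADD"], gas_price_wei=20_000_000_000)
print(fee)  # 120000000000 wei
```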
Image: https://hackernoon.com/
33
Ethereum – the World Computer
Benefits of having code on a blockchain
• Public code: code can be analyzed by everyone.
• Unmodifiable code: code cannot be modified without leaving a trace.
• Unstoppable execution: code will run to completion.
• Verifiable results: results can be verified by all parties.
It is easy to see why platform creators called the code Smart Contract!
34
Ethereum – the World Computer
• Contracts gave rise to Smart Contract based tokens: exchanged data
units that are used to buy/sell services in the real world.
• For example, Storj token stores files on your hard disk, and pays you a
fee through Ethereum.
• Tokens can be bought or sold; they act as value stores. Token prices are
arbitrated in the real world.
• Companies create tokens, and sell them in Initial Coin Offerings to raise
capital.
35
Blockchain tokens
New Ethereum token contracts in time (>5K transactions in early 2018)
36
Blockchain tokens
Ethereum token transactions in time
37
Platforms– Standardization Continues
• Initially, tokens could implement a vital function (e.g., transfer) with any
name (e.g., sell, transferTo, sendTo).
• ERC20 standard enforces a list of functions that must be implemented
by a token:
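The function list on the slide did not survive extraction. As commonly documented, the ERC20 standard names `totalSupply`, `balanceOf`, `transfer`, `transferFrom`, `approve` and `allowance`. The sketch below mirrors those names in an in-memory Python class. It is an illustration, not Solidity: the class name `ToyToken` is hypothetical, and the caller is passed explicitly instead of being taken from the transaction sender.

```python
class ToyToken:
    """In-memory illustration of the ERC20 function names."""

    def __init__(self, owner: str, supply: int):
        self._supply = supply
        self._balances = {owner: supply}
        self._allowances = {}            # (owner, spender) -> remaining allowance

    def total_supply(self) -> int:
        return self._supply

    def balance_of(self, owner: str) -> int:
        return self._balances.get(owner, 0)

    def transfer(self, sender: str, to: str, value: int) -> bool:
        if value < 0 or self.balance_of(sender) < value:
            return False
        self._balances[sender] = self.balance_of(sender) - value
        self._balances[to] = self.balance_of(to) + value
        return True

    def approve(self, owner: str, spender: str, value: int) -> bool:
        self._allowances[(owner, spender)] = value
        return True

    def allowance(self, owner: str, spender: str) -> int:
        return self._allowances.get((owner, spender), 0)

    def transfer_from(self, spender: str, owner: str, to: str, value: int) -> bool:
        if self.allowance(owner, spender) < value:
            return False
        if not self.transfer(owner, to, value):
            return False
        self._allowances[(owner, spender)] = self.allowance(owner, spender) - value
        return True
```

With fixed names, a wallet or exchange can interact with any token without knowing how a particular contract chose to call its transfer function.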
2018 May. Data from our Chartalist project
38
Blockchain tokens and platforms
Left Ethereum
2- Blockchain Graph Analytics
Are transactions the same on all blockchains?
How can we model Blockchain data?
40
Blockchain Graph Analytics
• For data modelling, blockchains can be divided into two major
categories:
Account based blockchains (e.g., Ethereum)
Transaction output (TXO) based blockchains (e.g., Bitcoin, Litecoin)
41
2a - Transaction output based blockchains
42
Transaction output (TXO) based blockchains
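[Figure: Transaction 1 spends a 3B input and creates two outputs, 0.8B and 2B, leaving a 0.2B transaction fee. Addresses a and b are marked; b receives the 2B output.]
Next, if address b wants to spend the 2B it received, it needs to show proof
of funds:
"Use the 2B I received in Block 1, transaction 1, to pay 1.5B to c
and 0.3B to d."
[Figure: a second transaction spends b's 2B output and pays 1.5B to c and 0.3B to d.]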
43
Transaction output (TXO) based blockchains
• Genesis block 0: The first block, created by Nakamoto.
• Every block has one coinbase transaction that creates bitcoins
(sum of block reward + transaction fees).
• All other payments must show proof of funds (previous outputs).
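Proof of funds can be sketched with a table of unspent outputs. The amounts follow the running example (3B in, 0.8B and 2B out, then 1.5B and 0.3B); the transaction identifiers, the recipient `x`, and the function name `spend` are hypothetical, and signature checks are left out.

```python
from decimal import Decimal

# Unspent outputs, keyed by (transaction id, output index): (address, amount).
utxos = {("tx0", 0): ("a", Decimal("3"))}


def spend(tx_id, inputs, outputs):
    """Consume whole unspent outputs and return the implied transaction fee."""
    if len(set(inputs)) != len(inputs) or any(ref not in utxos for ref in inputs):
        raise ValueError("input is missing or already spent")   # double spending
    total_in = sum(utxos[ref][1] for ref in inputs)
    total_out = sum(amount for _, amount in outputs)
    if total_out > total_in:
        raise ValueError("outputs exceed inputs")
    for ref in inputs:
        del utxos[ref]                        # an output can be spent only once
    for index, (address, amount) in enumerate(outputs):
        utxos[(tx_id, index)] = (address, amount)
    return total_in - total_out               # the remainder goes to the miner


fee1 = spend("tx1", [("tx0", 0)], [("x", Decimal("0.8")), ("b", Decimal("2"))])
fee2 = spend("tx2", [("tx1", 1)], [("c", Decimal("1.5")), ("d", Decimal("0.3"))])
print(fee1, fee2)  # 0.2 0.2
```

Because a spent output is deleted from the table, a second transaction that references it fails, which is the ledger-side view of double-spending prevention.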
Coinbase transaction
Block n Block n+1
Time
44
Three Graph Rules for TXO
1- Source Rule: Coins can be gained from multiple transactions. These can
be spent at once or separately (dashed edges connect to unspecified
addresses).
b
Address b can spend bitcoins at 𝑡𝑥1(once), or at 𝑡𝑥1 and 𝑡𝑥2.
𝑡𝑥1
𝑡𝑥2
45
Three Graph Rules for TXO
2- Balance Rule: All coins gained from a transaction must be spent in a
single transaction. Addresses cannot keep change, must forward it.
[Figure: address c pays addresses d and e. Same user? Address reuse is rare.]
Two cases:
i - c sold all its coins: c, d and e all belong to different people, or
ii - c paid d, and forwarded the change to its new address e.
In many scenarios, we have to learn which addresses belong to the same
entity.
46
Three Graph Rules for TXO
3 – Mapping Rule: Multiple inputs can be signed separately and merged, but
the input-output address mappings are not recorded.
A transaction can be considered a lake with incoming rivers, and outgoing
emissaries. Coins mix in this lake.
1B
1B 1B
1B
Heuristics are developed to link inputs to outputs – we will cover them in
the privacy section.
47
Existing Graph Approaches
1- Transaction graph: Edges between transactions only.
• Cannot capture unspent coins.
• Cannot distinguish transactions with differing inputs/outputs.
[Figure: a blockchain graph with different inputs/outputs, and the corresponding transaction graph.]
Dorit Ron and Adi Shamir. 2013. Quantitative analysis of the full bitcoin transaction graph. In International
Conference on Financial Cryptography and Data Security. Springer, 6–24.
48
Existing Graph Approaches
2- Address graph: Edges between addresses only.
[Figure: a blockchain graph and the corresponding address graph.]
• Edges are multiplied between inputs and outputs: a transaction with
1000 inputs and 1000 outputs creates 1 million edges.
• This biases the average degree, and even the median degree.
Michele Spagnuolo, Federico Maggi, and Stefano Zanero. 2014. Bitiodine: Extracting intelligence from the
bitcoin network. In International Conference on Financial Cryptography and Data Security. Springer, 457–4
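The edge blow-up in the address graph is easy to reproduce. The helper name `address_graph_edges` and the synthetic address labels are hypothetical.

```python
from itertools import product


def address_graph_edges(inputs, outputs):
    """Address graph: one edge from every input address to every output address."""
    return list(product(inputs, outputs))


inputs = [f"in{i}" for i in range(1000)]
outputs = [f"out{j}" for j in range(1000)]
edges = address_graph_edges(inputs, outputs)
print(len(edges))  # 1000000
```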
49
Existing Graph Approaches
Graph Analysis with single node type:
Not always useful for the forever forward branching tree of Bitcoin.
2- Address graph: is it worth the trouble searching for graph motifs?
• Addresses are not supposed to re-appear in the future.
• Closed triangles are very rare.
• Output/input address sets do not have edges to each other. Our tools
do not consider this, and search for edges in vain (linked transactions
within a block are possible but rare).
50
Blockchain Graph – Substructure mining
Definition [k-chainlets]:
Let a k-chainlet $G_k = (V_k, E_k, B)$ be a subgraph of $G$ with $k$ nodes of type {Transaction}. If
there exists an isomorphism between $G_k$ and $G'$, where $G'$ is a subgraph of $G$, we say that there exists an
occurrence, or embedding, of $G_k$ in $G$.
If a $G_k$ occurs more or less frequently than expected by chance, it is called
a Blockchain k-chainlet. A k-chainlet signature $f_G(G_k)$ is the number of occurrences of $G_k$ in
$G$.
• Rather than individual edges or
nodes, we use a subgraph as the
building block in our Bitcoin analysis.
• We use the term chainlet to refer to
such subgraphs.
Cuneyt G. Akcora, Asim Kumer Dey, Yulia R. Gel, and Murat Kantarcioglu. Forecasting Bitcoin Price
with Graph Chainlets. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 765-
776. Springer, Cham, 2018.
51
Blockchain Chainlets
• Chainlets have distinct shapes that
reflect their role in the network.
• We aggregate these roles to analyze
network dynamics.
[Figure: four example transactions, Tx 1 to Tx 4, and their chainlets.]
Three distinct types of 1-chainlets!
52
Aggregate Chainlets
Cx→y : chainlet with x inputs and y outputs.
• Transition chainlets (x = y) imply coins changing address. Ex: chainlet C3→3.
• Split chainlets (y > x) may imply spending behavior. Ex: chainlet C1→2.
But the community practice against address reuse can also create split chainlets.
• Merge chainlets (x > y) imply gathering of funds. Ex: chainlet C3→1.
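The three aggregate types depend only on the input and output counts, as this small sketch shows (the function name `chainlet_type` is hypothetical):

```python
def chainlet_type(x: int, y: int) -> str:
    """Classify chainlet C(x -> y) by its number of inputs x and outputs y."""
    if x == y:
        return "transition"
    return "split" if y > x else "merge"


for x, y in [(3, 3), (1, 2), (3, 1)]:
    print(f"C{x}->{y}: {chainlet_type(x, y)}")
```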
53
Aggregate Chainlets
Percentage of aggregate chainlets in the Bitcoin Graph (weekly snapshots)
[Figure annotation: Around here, 2 pizzas are worth 10 thousand bitcoins. "This is not the beautiful country (Italy)!"]
[Figure axes: inputs (1 to 3) by outputs (1 to 3).]
54
Chainlet Matrix
• For a given time granularity, such as one day, we take snapshots of the
Bitcoin graph.
• Chainlet counts obtained from the graph are stored in an N×N matrix.
Representing the network in time
[Figure: example 3×3 chainlet count matrix for one snapshot.]
N: How big should the matrix be?
55
Extreme Chainlets
• N can reach thousands; the matrix can be 1000 × 1000.
• On Bitcoin, 90.50% of the chainlets fall within N = 5 (x < 5 and y < 5),
and 97.57% within N = 20.
[Figure: example chainlet matrix with inputs as rows and outputs as columns.]
Extreme chainlets are the last column/row of the chainlet matrix.
They imply big coin movements in the graph!
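Occurrence matrix:
$$
O[i,j] =
\begin{cases}
\#C_{i \to j} & \text{if } i < N \text{ and } j < N \\
\sum_{z=N}^{\infty} \#C_{i \to z} & \text{if } i < N \text{ and } j = N \\
\sum_{y=N}^{\infty} \#C_{y \to j} & \text{if } i = N \text{ and } j < N \\
\sum_{y=N}^{\infty} \sum_{z=N}^{\infty} \#C_{y \to z} & \text{if } i = N \text{ and } j = N
\end{cases}
$$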
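The occurrence matrix can be built by clamping input and output counts at N, so that everything with N or more inputs (or outputs) lands in the last row (or column). This is a sketch under the assumption that each transaction is given as a pair of counts, both at least 1; the function name `occurrence_matrix` and the sample snapshot are hypothetical.

```python
def occurrence_matrix(transactions, n):
    """transactions: iterable of (number_of_inputs, number_of_outputs) pairs."""
    matrix = [[0] * n for _ in range(n)]
    for x, y in transactions:
        i = min(x, n)    # x >= N collapses into the last row (extreme chainlets)
        j = min(y, n)    # y >= N collapses into the last column
        matrix[i - 1][j - 1] += 1
    return matrix


snapshot = [(1, 1), (1, 1), (1, 2), (2, 1), (7, 1), (3, 9)]
for row in occurrence_matrix(snapshot, n=3):
    print(row)
```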
56
Extreme Chainlets
Bitcoin companies stopped all business in New
York State because of new regulations.
The New York Business Journal called this the
"Great Bitcoin Exodus".
Percentage of extreme chainlets in the Bitcoin Graph (N = 20, daily snapshots)
57
Clustering the Chainlets
• A hierarchical clustering of chainlets by using Cosine Similarity over
chainlet signatures in time.
• We used a similarity cut threshold of 0.7 to create clusters from the
hierarchical dendrogram.
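Cosine similarity between two chainlet signatures over time can be computed as below. The sketch covers only the similarity measure, not the hierarchical clustering step; the function name and the sample count series are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long count series."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


daily_c1_2 = [120, 150, 90, 200]      # hypothetical daily counts of chainlet C1->2
daily_c2_2 = [60, 80, 40, 110]        # hypothetical daily counts of chainlet C2->2
print(cosine_similarity(daily_c1_2, daily_c2_2) >= 0.7)  # compare with the 0.7 cut
```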
Chainlet clusters for daily snapshots Chainlet clusters for weekly snapshots
Most common chainlets Extreme and correlated chainlets
58
2b - Account based blockchains
59
Account based blockchains
• On account based blockchains, transactions involve one input address
and one output address.
• An address spends coins from a balance, keeps the change.
• Each transaction of an address has an order (called nonce). The nonce is
the number of transactions sent to the network by the address.
• A later transaction needs to wait for earlier transactions to be mined.
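The balance and nonce rules can be sketched as follows. The class `Account` and the function `apply_transfer` are hypothetical names, and fees are ignored.

```python
class Account:
    def __init__(self, balance: int):
        self.balance = balance
        self.nonce = 0          # number of transactions this address has sent


def apply_transfer(accounts, sender, receiver, amount, nonce) -> bool:
    """Apply one transfer if it is next in the sender's order and is funded."""
    source = accounts[sender]
    if nonce != source.nonce:
        return False            # an earlier transaction is missing, or this is a replay
    if amount > source.balance:
        return False
    source.balance -= amount    # the sender keeps the change in its balance
    accounts[receiver].balance += amount
    source.nonce += 1
    return True


accounts = {"a": Account(10), "b": Account(0)}
print(apply_transfer(accounts, "a", "b", 4, nonce=0))  # True
print(apply_transfer(accounts, "a", "b", 3, nonce=2))  # False: nonce 1 not yet mined
```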
4E
60
Internal transactions
• Account based blockchains have two types of “transactions”.
• The first transaction type involves a transfer of the used cryptocurrency,
such as Ether on Ethereum.
• The second type are internal transactions, which involve a transfer of
smart contract based tokens.
• Internal transactions are created when smart contracts change states of
addresses.
• Internal transactions can be discovered in two ways: by parsing ordinary
transactions’ messages, or by running the transaction message through
the smart contract code.
• The parsing method cannot discover failed transactions.
61
Internal transactions
• A transaction can transfer both currencies and tokens.
[Figure: a transaction message on Ethereum involving an ordinary address and a contract address, with edges labeled 4E and 2.]
Contracts can start events as well; these are explicitly recorded.
An internal transaction can create multiple edges, although this is
rare on Ethereum.
62
Trading tokens – a timeline
• b to the Storj token contract, 0.3 Ether: "I want to buy 2 Storj tokens." Balances: b: 2 Storj.
• a to b, 0.2 Ether: a pays 0.2E to b to buy its tokens.
• b to the Storj token contract, 0 Ether: "Send my 2 tokens to address a." Balances: b: 0 Storj, a: 2 Storj.
From 𝑡1 to 𝑡2, the Storj price decreased in the market from 0.15E to 0.1E.
[Figure: timeline with 𝑡0, 𝑡1 and 𝑡2, and all resulting edges on the Ethereum graph between a and b: 0.3E, 0.2E, and 2 tokens.]
63
Account based blockchains
The largest connected
component in Storj network
on 13-1-2018.
• We model account based blockchains as directed, weighted, multi-
graphs.
• The network of a single token is usually sparse, and devoid of
community structure.
• Daily networks may contain many disconnected components.
64
Inter-token networks
• Account based blockchains are global market places where goods are
exchanged in terms of tokens. Ethereum is a successful example.
• Blockchain platforms will allow us to view global market activity in real
time.
65
Research questions
• Account based blockchains lend themselves to traditional network
analysis tools and algorithms.
• Motif analysis, core decomposition, centrality and clustering algorithms
can easily be adapted to work on account based blockchains.
• High granularity temporal data allows time series analyses.
• The rich variety of cryptotokens being traded on the network brings many
interesting research problems:
Token price prediction, price manipulation detection, token network
health and robustness analysis, inter-token impact analysis, investor
behavior analysis.
66
3- Blockchain Privacy and Security
Permissionless (public) blockchains: Bitcoin, Litecoin, Ethereum
Permissioned (private) blockchains: Hyperledger, R3
• By definition, any user can join a public blockchain (e.g., Bitcoin).
• For corporate settings, this transparency means that rivals can learn
company finances and buy/sell relationships.
• Permissioned blockchains were created for industrial settings.
• Permissioned: less power consumption, more secure, privacy
aware, but for all purposes a gated community.
3- Blockchain Privacy and Security
• In public blockchains, data is considered public.
• Tapscott: There are no honeypots of personal data on the
blockchain.
• Public blockchains are pseudonymous: there is no registration
to join the network, but all your transactions are public.
• For privacy, Tor can be used to send transactions to the P2P
network.
• A threat to privacy: most online exchanges are governed by know-your-customer
rules that require customer registration.
Blockchain communication graph
• At its core, Bitcoin maintains a peer-to-peer architecture. Bitcoin
peers create persistent TCP channels with each other and relay
transactions.
• Each peer seeks a minimum of 8, a maximum of 125 peers.
• Each node forwards transactions arriving from a neighbor to other
neighbors.
• Transactions that await mining in the P2P network are contained
in the mempool.
• The first sender of a transaction is most likely to be the transaction
owner.
Blockchain communication graph
• Nodes forward incoming transactions selectively to hinder time based
address inference. This is called trickling.
• In this network, b is connected to all neighbors of a – by observing
relayed transactions, b can deduce that transaction t3 originated from a.
Andrew Miller, James Litton, Andrew Pachulski, Neal Gupta, Dave Levin, Neil Spring, and Bobby Bhattacharjee.
2015. Discovering bitcoin’s public topology and influential nodes. (2015).
[Figure: peer-to-peer network in which each node is labeled with the transactions it relayed (for example t1, t3, t5); nodes a and b are marked.]
Blockchain TXO content graph
• Can we tell which addresses are controlled by the same
user/entity/organization?
• In order to answer this question, we first need to map inputs to
outputs.
Where do the bitcoins at
address a come from?
a
From nine addresses!
Fungibility: Is a specific bitcoin
worth a bitcoin everywhere?
Taint analysis studies a
bitcoin’s history.
Blockchain TXO content graph
Heuristics are used to detect which input and output addresses are
controlled by the same user.
Meiklejohn, Sarah, Marjori Pomarole, Grant Jordan, Kirill Levchenko, Damon McCoy, Geoffrey M. Voelker, and
Stefan Savage. A fistful of bitcoins: characterizing payments among men with no names. In Proceedings
of the 2013 conference on Internet measurement conference, pp. 127-140. ACM, 2013.
1B
4B 3B
2B 1B
1B 1B
1B
Considering amounts may help in
basic cases.
Schemes exist to use multiple
rounds of flows with equal amounts
to hide tracks.
Heuristics to link addresses
1- Idioms of Use: posits that all input addresses in a transaction should
belong to the same entity, because only the owner could have signed
the inputs with the associated private keys.
[Figure: a transaction with input addresses a, b and c.] Addresses a, b and c belong to the same user.
Heuristics to link addresses
2- Transitive Closure: extends Idioms of Use. If one transaction has inputs
from a and b, and another transaction has inputs from a and c, then b and c
belong to the same user.
[Figure: transactions with inputs from addresses a, b, c, d and e.] Addresses a, b, c, d and e belong to the same
user.
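Idioms of Use and Transitive Closure together amount to computing connected components over co-spent input addresses. A minimal union-find sketch follows; the function name `cluster_addresses` and the sample transactions are hypothetical, and each transaction is reduced to its list of input addresses.

```python
def cluster_addresses(transactions):
    """transactions: list of input-address lists. Returns a list of address sets."""
    parent = {}

    def find(address):
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]    # path halving
            address = parent[address]
        return address

    def union(a, b):
        parent[find(a)] = find(b)

    for inputs in transactions:
        for address in inputs:
            union(inputs[0], address)    # idioms of use: co-spent inputs share an owner

    clusters = {}
    for address in parent:
        clusters.setdefault(find(address), set()).add(address)
    return list(clusters.values())


print(cluster_addresses([["a", "b"], ["a", "c"], ["d", "e"], ["c", "d"], ["f"]]))
```

Here a, b, c, d and e end up in one cluster because the transactions chain them together, while f stays alone.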
Heuristics to link addresses
3- Change address: the following four conditions must be met:
(1) the output address has not appeared in any previous transaction;
(2) the transaction is not a coin generation;
(3) there is no self-change address in the outputs;
(4) all the other output addresses in the transaction have appeared in
previous transactions.
The heuristic then posits that the one-time change (output) address, if
one exists, is controlled by the same user as the input addresses.
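The four conditions translate into a short check. The transaction layout (a dict with `inputs`, `outputs` and `is_coinbase`) and the function name `find_change_address` are assumptions made for this sketch.

```python
def find_change_address(tx, seen_addresses):
    """Return the one-time change address of tx, or None if the heuristic does not apply."""
    if tx["is_coinbase"]:                                  # (2) not a coin generation
        return None
    if set(tx["inputs"]) & set(tx["outputs"]):             # (3) no self-change address
        return None
    fresh = [a for a in tx["outputs"] if a not in seen_addresses]   # (1) never seen before
    if len(fresh) != 1:                                    # (4) all other outputs were seen
        return None
    return fresh[0]


seen = {"a", "b", "shop"}
tx = {"inputs": ["a", "b"], "outputs": ["shop", "new1"], "is_coinbase": False}
print(find_change_address(tx, seen))  # new1
```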
Obfuscation efforts
• A measure to prevent matching addresses to users is known as Coin
Mixing, or its improved version, CoinJoin.
• The initial idea in mixing was to use a central server to mix inputs
from multiple users.
[Figure: example mixing transactions with input and output amounts between 1B and 5B.]
Ruffing, Tim, Pedro Moreno-Sanchez, and Aniket Kate. CoinShuffle: Practical decentralized coin mixing for
Bitcoin. In European Symposium on Research in Computer Security, pp. 345-364. Springer, Cham, 2014.
Obfuscation efforts – peeling chains
• In a peeling chain, a single address begins with a relatively large
amount of bitcoins.
• A smaller amount is then “peeled” off this larger amount, creating a
transaction in which a small amount is sent to one address and the
remainder is sent to a one-time change address.
• This process is repeated, potentially for hundreds or thousands of
hops, until the larger amount is pared down.
Di Battista, Giuseppe, Valentino Di Donato, Maurizio Patrignani, Maurizio Pizzonia, Vincenzo Roselli, and Roberto
Tamassia. Bitconeview: visualization of flows in the bitcoin transaction graph. In Visualization for Cyber
Security (VizSec), 2015 IEEE Symposium on, pp. 1-8. IEEE, 2015.
Narayanan, Arvind, and Malte Möser. Obfuscation in bitcoin: Techniques and politics. arXiv preprint
arXiv:1706.05432 (2017).
Obfuscation efforts – peeling chains
[Figure: a peeling chain that starts from 25B, with 0.5B peeled off at each hop, ending in an exit to fiat currency.]
Repeated patterns are frequently
found on the Bitcoin blockchain.
McGinn, Dan, David Birch, David Akroyd, Miguel Molina-
Solana, Yike Guo, and William J. Knottenbelt. Visualizing
dynamic bitcoin transaction patterns. Big data 4, no. 2
(2016): 109-119.
Network clustering of addresses
• By nature all user clustering heuristics are error prone.
• Some community practices further complicate the issue.
• For example, online wallets, such as coinbase.com, buy/sell coins
among their customers without using transactions; ownership of an
address is changed by transferring the associated private keys to
another user.
• Although the user associated with the address changes, nothing gets
recorded in the blockchain.
• Clustering can be further improved by considering IP locations and
temporal patterns.
Research directions in taint analysis
• Money laundering: M. Moser, R. Bohme, and D. Breuker. An inquiry into money
laundering tools in the Bitcoin ecosystem. In: eCRS. IEEE. 2013, pp.
1-14.
• Ransomware payments: Huang, Danny Yuxing, Maxwell Matthaios Aliapoulios, Vector Guo
Li, Luca Invernizzi, Elie Bursztein, Kylie McRoberts, Jonathan Levin,
Kirill Levchenko, Alex C. Snoeren, and Damon McCoy. Tracking
ransomware end-to-end. In 2018 IEEE Symposium on Security and
Privacy (SP), pp. 618-631. IEEE, 2018.
• Illicit trade/use: Portnoff, Rebecca S., Danny Yuxing Huang, Periwinkle Doerfler,
Sadia Afroz, and Damon McCoy. Backpage and bitcoin:
Uncovering human traffickers. In Proceedings of the 23rd ACM
SIGKDD International Conference on Knowledge Discovery and
Data Mining, pp. 1595-1604. ACM, 2017.
• Personal blackmail: S. Phetsouvanh, F. Oggier and A. Datta. EGRET: Extortion Graph
Exploration Techniques in the Bitcoin Network. IEEE ICDM
Workshop on Data Mining in Networks (DaMNet). IEEE, 2018.
Thanks for attending!
Cuneyt.Akcora@utdallas.edu
Further reading -> Blockchain: A graph primer.
C. G. Akcora, Y. R. Gel, M. Kantarcioglu.
[Updated regularly, online] ArXiv:1708.08749, pp 1-16, 2017.
BlockchainTutorial.Github.io
81
4- Statistical Analysis of the Cryptocurrency Price Formation
Outline
1. Descriptive summaries
2. Price models
3. Risk models
• Value at Risk estimates
• GARCH family
4. Models using local blockchain network features
Price and returns of cryptocurrency
Let $y_t$ be the price of a cryptocurrency. Returns measure the
relative change in prices.
• Simple returns:
$$R_t = \frac{y_t - y_{t-1}}{y_{t-1}} = \frac{y_t}{y_{t-1}} - 1$$
The benefit of using returns instead of prices is normalization: all variables are measured in
a comparable metric.
• Log returns:
$$r_t = \log y_t - \log y_{t-1} = \log \frac{y_t}{y_{t-1}}$$
Log returns are additive. Also, if we assume that prices are log-normally
distributed (which, in practice, may or may not be true for any given price series),
then log returns are normally distributed, which makes them
easier to work with.
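Both return definitions, and the additivity of log returns, can be checked on a toy price series (the prices and function names below are hypothetical):

```python
import math


def simple_returns(prices):
    """R_t = y_t / y_{t-1} - 1"""
    return [prices[t] / prices[t - 1] - 1 for t in range(1, len(prices))]


def log_returns(prices):
    """r_t = log(y_t / y_{t-1})"""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]


prices = [100.0, 110.0, 99.0]
print(simple_returns(prices))     # about +10% and then -10%
# Additivity: the log returns sum to the log return over the whole period.
print(sum(log_returns(prices)), math.log(prices[-1] / prices[0]))
```

Note that simple returns of +10% and -10% do not cancel (the price ends at 99, not 100), while the summed log returns give the total change directly.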
82
83
Bitcoin price and log returns
The price has an upward trend and is volatile, which is also clear from the log returns.
Summary statistics – log returns
The summary statistics are the largest for Bitcoin, followed by Dash, Litecoin,
Monero, Ripple, Maidsafecoin and Dogecoin (Chu et al., 2017).
The log returns for each cryptocurrency are positively skewed.
84
Chu, Jeffrey, Stephen Chan, Saralees Nadarajah, and Joerg Osterrieder. GARCH modelling of
cryptocurrencies. Journal of Risk and Financial Management 10, no. 4 (2017): 17.
85
Summary statistics – log returns cont.…
Log returns are more or less symmetrically distributed. Some histograms appear more peaked than others.
[Figure: the histogram of the log returns of the exchange rates from June 2014 to May 2017.]
Chu, Jeffrey, Saralees Nadarajah, and Stephen Chan. Statistical analysis of the exchange rate of
bitcoin. PloS one 10, no. 7 (2015).
86
Models for cryptocurrency price and volatility
• Price models
• Risk models:
a) Value at Risk estimates via fitting parametric distributions.
b) GARCH family
• Models using local blockchain network features
87
Predictive models
In order to avoid spurious regression, we need to test stationarity and
interdependency properties (Jang and Lee, 2017).
Jang, Huisu, and Jaewook Lee. An empirical study on modeling and prediction of bitcoin prices
with bayesian neural networks based on blockchain information. IEEE Access 6 (2018): 5427-5437.
According to Engle and Granger (1987), regressions of interdependent and
non-stationary time series may lead to spurious results.
Engle, Robert F., and Clive W. J. Granger. Co-integration and error correction:
representation, estimation, and testing. Econometrica: Journal of the
Econometric Society (1987): 251-276.
88
Time plots
The Bitcoin price has a positive time trend and is clearly non-stationary.
Every log returns graph shows periods of very high volatility and periods of
relative tranquility, which is a common feature among financial assets.
Dyhrberg, Anne Haubo. Bitcoin, gold and the dollar–A GARCH volatility analysis. Finance Research
Letters 16 (2016): 85-92.
89
Transaction activity and price
Price, number of transactions and number of unique addresses exhibit a similar
upward pattern. The log returns show that they are volatile.
Koutmos, Dimitrios. Bitcoin returns and transaction activity. Economics Letters 167 (2018): 81-85.
90
Stationarity and cointegration tests
• Test for stationarity: the augmented Dickey-Fuller (ADF) test:
$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 \Delta y_{t-1} + \dots + \delta_{p-1} \Delta y_{t-p+1} + \epsilon_t$$
where $\alpha$ is a constant, $\beta$ the coefficient on a time trend and $p$ the lag
order of the autoregressive process. $H_0: \gamma = 0$ against $H_A: \gamma < 0$.
Test statistic: $DF = \hat{\gamma} / se(\hat{\gamma})$
• Cointegration test: Two time series are considered to be
cointegrated if there exists a long-run equilibrium relationship
between them.
Engle-Granger cointegration test: If $x_t$ and $y_t$ are non-stationary
and cointegrated, then a linear combination of them must be
stationary. In other words:
$$y_t - \beta x_t = u_t$$
where $u_t$ is stationary.
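To make the test statistic concrete, here is a minimal sketch of the simplest special case of the regression above: no time trend ($\beta = 0$) and no lagged differences, i.e. $\Delta y_t = \alpha + \gamma y_{t-1} + \epsilon_t$. The function name, the simulated series and the seed are assumptions made for illustration; a full ADF test would add the lagged differences and compare against tabulated Dickey-Fuller critical values rather than the usual t table.

```python
import math
import random

def dickey_fuller_stat(y):
    """DF = gamma_hat / se(gamma_hat) from OLS of dy_t on a constant and y_{t-1}."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y, y[1:])]
    n = len(dy)
    mean_x = sum(lagged) / n
    mean_d = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_d) for x, d in zip(lagged, dy))
    gamma = sxy / sxx                      # slope on y_{t-1}
    alpha = mean_d - gamma * mean_x        # intercept
    rss = sum((d - alpha - gamma * x) ** 2 for x, d in zip(lagged, dy))
    se_gamma = math.sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma

# Simulated data (an assumption for illustration): one stationary AR(1)
# series and one random walk driven by the same shocks.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
ar1, walk = [0.0], [0.0]
for e in noise:
    ar1.append(0.5 * ar1[-1] + e)
    walk.append(walk[-1] + e)

df_stationary = dickey_fuller_stat(ar1)     # strongly negative: reject a unit root
df_random_walk = dickey_fuller_stat(walk)   # much closer to zero
```

The same statistic can be applied to the residuals $u_t$ of a regression of $y_t$ on $x_t$ as the second step of the Engle-Granger procedure, although the critical values differ in that case.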
91
Granger Causality Test
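The causality test assesses whether one time series is useful in
predicting another. Let $F_{t+h}(\cdot \mid \mathcal{F})$ denote the conditional
distribution of $Y_{t+h}$ given an information set $\mathcal{F}$, and suppose that
$$F_{t+h}\left(\cdot \mid \mathcal{F}^{t-1}_{(Y, X, Z_1, \dots, Z_k)}\right) = F_{t+h}\left(\cdot \mid \mathcal{F}^{t-1}_{(Y, Z_1, \dots, Z_k)}\right)$$
• Then, $X_{t-1}$ is said not to Granger-cause (G-cause) $Y_{t+h}$ with
respect to $\mathcal{F}^{t-1}_{(Y, Z_1, \dots, Z_k)}$.
• Otherwise, $X$ is said to G-cause $Y$, which can be denoted by $G_{X \to Y}$.
• $\to$ represents the direction of causality.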
92
Granger Causality Test cont.…
For the univariate case, consider time series $y_t$, $x_t$ and $z_t$. To test
G-causality of $x_t$, we compare the fit of the full model
$$y_t = \alpha_0 + \sum_{k=1}^{d} \alpha_k y_{t-k} + \sum_{k=1}^{d} \beta_k x_{t-k} + \sum_{k=1}^{d} \gamma_k z_{t-k} + e_t$$
versus the fit of the reduced model, which omits the lags of $x_t$:
$$y_t = \alpha_0 + \sum_{k=1}^{d} \alpha_k y_{t-k} + \sum_{k=1}^{d} \gamma_k z_{t-k} + \tilde{e}_t$$
• Under the null hypothesis of no predictive effect of $x$ on $y$ (i.e., $x$
does not G-cause $y$), $Var(e_t) = Var(\tilde{e}_t)$.
• If $Var(e_t)$ is (statistically) significantly lower than $Var(\tilde{e}_t)$, then $x$
contains additional information that can improve forecasting of $y$,
i.e., $G_{X \to Y}$.
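The comparison of a full and a reduced model can be sketched as follows. This is an illustrative implementation, not the one used in the cited studies: it fits both regressions by ordinary least squares and compares residual sums of squares with the usual F statistic for nested models. The helper names, the lag order `d=1` and the simulated series are assumptions for the example.

```python
import random

def ols_rss(X, y):
    """Residual sum of squares of the least-squares fit of y on the columns of X."""
    k = len(X[0])
    # Normal equations (X'X) b = X'y as an augmented matrix.
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def granger_f_stat(y, x, z, d=1):
    """F statistic for H0: the d lags of x do not improve the prediction of y."""
    full, reduced, target = [], [], []
    for t in range(d, len(y)):
        y_lags = [y[t - k] for k in range(1, d + 1)]
        x_lags = [x[t - k] for k in range(1, d + 1)]
        z_lags = [z[t - k] for k in range(1, d + 1)]
        full.append([1.0] + y_lags + x_lags + z_lags)
        reduced.append([1.0] + y_lags + z_lags)   # same model without x
        target.append(y[t])
    rss_full, rss_reduced = ols_rss(full, target), ols_rss(reduced, target)
    n, k_full = len(target), 1 + 3 * d
    return ((rss_reduced - rss_full) / d) / (rss_full / (n - k_full))

# Simulated example (an assumption): x drives y with a one-step delay, z is noise.
random.seed(1)
n = 300
x = [random.gauss(0, 1) for _ in range(n)]
z = [random.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.3 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0, 0.5))

f_x_causes_y = granger_f_stat(y, x, z)   # large: lags of x help predict y
f_y_causes_x = granger_f_stat(x, y, z)   # small: lags of y do not help predict x
```

A large F value relative to the F(d, n - k) distribution indicates that the residual variance of the full model is significantly lower, i.e. evidence for $G_{X \to Y}$.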
93
Granger Causality Test - example
In both cases we can conclude that Bitcoin price realized volatility Granger-causes
the VIX and the VIX Granger-causes Bitcoin price realized volatility.
Estrada, Julio Cesar Soldevilla. Analyzing Bitcoin Price Volatility. University of California, Berkeley (2017).
94
Models
• For stationary series, the standard OLS estimator can be used to
estimate the model.
• For non-stationary and non-cointegrated series, we estimate a
multivariate vector autoregressive (VAR) model.
• When the time series are cointegrated, the
Vector Error Correction (VEC) model is suitable for estimation.
• A combination of the above models is also used.
Machine learning methods
• Neural network (NN) and its different versions, e.g., RNN,
BNN, CNN, etc.
• Random Forest (RF)
• Support Vector Regression (SVR)
95
96
Model evaluation
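Root mean squared error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}$$
Mean absolute percentage error (MAPE):
$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{y_t}$$
where $y_t$ is the Bitcoin price and $\hat{y}_t$ is the corresponding predicted value.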
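Both error measures are straightforward to compute. The sketch below is a minimal illustration with made-up prices and predictions; the MAPE is returned as a fraction (multiply by 100 for a percentage) and assumes strictly positive actual values, as is the case for prices.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be positive."""
    n = len(actual)
    return sum(abs(y - yhat) / y for y, yhat in zip(actual, predicted)) / n

# Hypothetical prices and forecasts (illustrative numbers only).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

rmse_value = rmse(actual, predicted)   # sqrt((100 + 100 + 0) / 3)
mape_value = mape(actual, predicted)   # (0.10 + 0.05 + 0) / 3 = 0.05
```

RMSE is expressed in the units of the price, while MAPE is scale-free, which makes it easier to compare across assets with very different price levels.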
Bayesian neural networks (BNN)
97
• BNN models outperform other models in terms of RMSE and MAPE
for predicting the log price of Bitcoin in 1-day-ahead forecasts.
• BNN is more reliable for describing the process of log volatility than
other benchmark models (Jang and Lee, 2017).
Bayesian neural networks (BNN)
98
We observe that Bayesian neural networks capture the patterns of the Bitcoin
prices better than other models (Jang and Lee, 2017).
99
Risk Models: Tests for randomness and no serial correlation
The p-values for the randomness tests and serial correlation tests are all greater
than 0.05, so the hypotheses that Bitcoin log returns and squared log returns are
random and serially uncorrelated cannot be rejected at the 5% level (Chu et al., 2015).
Model selection techniques
100
Akaike information criterion (AIC) and Bayesian information
criterion (BIC):
$$AIC = -2\ell + 2k$$
$$BIC = -2\ell + k \ln n$$
where $\ell$ is the maximized log-likelihood of the model, $k$
is the number of parameters in the model and $n$ is the number of observations.
• The model with the smallest AIC or BIC is considered the "best"
model.
Graphical approach:
• Quantile-Quantile (Q-Q) plots and plots of observed versus fitted density
are popular techniques for model diagnostics.
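The information criteria above can be illustrated with a small model-selection sketch. The two candidate distributions (normal and Laplace) are stand-ins chosen because their maximum-likelihood fits have closed forms; they are not the candidate set used by Chu et al., and the simulated heavy-tailed sample is an assumption for the example.

```python
import math
import random
import statistics

def normal_loglik(x):
    """Maximized log-likelihood of a normal fit (MLE mean and variance)."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def laplace_loglik(x):
    """Maximized log-likelihood of a Laplace fit (MLE median and mean abs. deviation)."""
    n = len(x)
    mu = statistics.median(x)
    b = sum(abs(v - mu) for v in x) / n
    return -n * (math.log(2 * b) + 1.0)

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)

# Simulated heavy-tailed "returns" (an assumption): Laplace-distributed draws.
random.seed(42)
sample = [random.choice((-1.0, 1.0)) * random.expovariate(50.0) for _ in range(2000)]
n = len(sample)

# Each candidate has k = 2 parameters (location and scale).
fits = {"normal": (normal_loglik(sample), 2), "laplace": (laplace_loglik(sample), 2)}
scores = {name: (aic(ll, k), bic(ll, k, n)) for name, (ll, k) in fits.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
```

Because both candidates have the same number of parameters here, AIC and BIC rank them identically; the penalty terms only matter when the candidates differ in complexity.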
101
Model selection for log returns
Overall, the generalized hyperbolic distribution gives the best fit, having the
smallest values of −ln L, AIC, AICc and BIC (Chu et al., 2015).
102
Model selection for Bitcoin log returns cont.…
The QQ plot, probability plot and the density plot of the fitted generalized
hyperbolic distribution suggest that the fit is good. The fit appears reasonable
also in the tails (Chu et al., 2015).
103
Bitcoin Value at Risk (VaR)
Value at Risk is the maximum loss that should not be exceeded
during a specified period of time with a given probability level.
Let $X$ denote the log return and $f(x)$ the probability density function of its
fitted distribution. Then $VaR_{1-\alpha}$ satisfies
$$P(X \le -VaR_{1-\alpha}) = \alpha, \quad \text{i.e.,} \quad \int_{-\infty}^{-VaR_{1-\alpha}} f(x)\,dx = \alpha$$
The fitted values of the VaR appear very close to the
historical estimates (Chu et al., 2015).
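The definition above translates into a quantile computation. The sketch below shows a historical (empirical quantile) estimate next to a parametric estimate; the normal distribution is used only because it is available in the Python standard library, whereas the slides fit other distributions such as the generalized hyperbolic. The return series is an artificial grid of values, not real data.

```python
import math
from statistics import NormalDist, mean, pstdev

def historical_var(returns, alpha):
    """Empirical VaR_{1-alpha}: minus the alpha-quantile of the observed returns."""
    ordered = sorted(returns)
    index = max(0, math.ceil(alpha * len(ordered)) - 1)
    return -ordered[index]

def normal_var(returns, alpha):
    """Parametric VaR_{1-alpha} under a normal distribution fitted by moments."""
    fitted = NormalDist(mean(returns), pstdev(returns))
    return -fitted.inv_cdf(alpha)

# Artificial returns from -5.0% to +4.9% in steps of 0.1% (illustrative only).
returns = [(i - 50) / 1000 for i in range(100)]
alpha = 0.05

var_hist = historical_var(returns, alpha)   # 0.046: the 5th smallest return is -4.6%
var_norm = normal_var(returns, alpha)
```

Comparing `var_hist` with the parametric value is the same kind of check as comparing fitted and historical VaR estimates.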
104
GARCH Modelling
The GARCH($p$, $q$) model for Bitcoin price returns, $r_t$, is defined as
$$r_t = \sigma_t \epsilon_t$$
$$\sigma_t^2 = w_0 + \sum_{i=1}^{q} w_i r_{t-i}^2 + \sum_{j=1}^{p} \tau_j \sigma_{t-j}^2$$
where $w_0 > 0$, $w_i > 0$, $\tau_j > 0$, $\epsilon_t \sim IID(0, 1)$, $i = 1, 2, \dots, q$, $j = 1, 2, \dots, p$.
To assess how explanatory variables $X_t$ influence the volatility of the
Bitcoin price we can employ a GARCH-X model:
$$\sigma_t^2 = w_0 + \sum_{i=1}^{q} w_i r_{t-i}^2 + \sum_{j=1}^{p} \tau_j \sigma_{t-j}^2 + \Lambda X_t$$
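The variance recursion can be sketched directly from the two equations above. This only evaluates the conditional variances for given parameters; estimating the parameters by maximum likelihood is not shown. The function name, the choice of initial variance and the toy inputs are assumptions for illustration, and the numbers are chosen for easy hand-checking rather than realism.

```python
def garch_variance(returns, w0, w, tau, lam=None, X=None):
    """Conditional variances sigma_t^2 for GARCH (lam=None) or GARCH-X.

    w   : ARCH coefficients w_1..w_q on past squared returns
    tau : GARCH coefficients tau_1..tau_p on past variances
    lam : optional coefficients on the explanatory variables X[t]
    """
    q, p = len(w), len(tau)
    start = max(q, p)
    # Assumption: initialise the first `start` variances with the mean squared return.
    init = sum(r * r for r in returns) / len(returns)
    sigma2 = [init] * start
    for t in range(start, len(returns)):
        value = w0
        value += sum(w[i] * returns[t - 1 - i] ** 2 for i in range(q))
        value += sum(tau[j] * sigma2[t - 1 - j] for j in range(p))
        if lam is not None:
            value += sum(l * x for l, x in zip(lam, X[t]))   # the Lambda X_t term
        sigma2.append(value)
    return sigma2

# Toy inputs (not realistic returns).
returns = [1.0, 2.0, 0.0, 1.0]
sigma2 = garch_variance(returns, w0=0.1, w=[0.2], tau=[0.5])

# GARCH-X with a single hypothetical explanatory variable.
X = [[0.0], [1.0], [0.0], [2.0]]
sigma2_x = garch_variance(returns, w0=0.1, w=[0.2], tau=[0.5], lam=[0.1], X=X)
```

In the GARCH-X case a positive coefficient in `lam` raises the conditional variance whenever the corresponding explanatory variable is positive, which is how the sign of an estimated coefficient is read as a volatility effect.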
105
GARCH family
• A variety of GARCH models: SGARCH, EGARCH, GJRGARCH,
APARCH, IGARCH, CSGARCH, GARCH, TGARCH, etc.
• We select the best model based on model selection
criteria, e.g., AIC and BIC.
106
GARCH model for Bitcoin
The second model is the exponential GARCH model which investigates if the
return on Bitcoin is asymmetrically affected by good and bad news (known as
the leverage effect) (Dyhrberg, 2016).
107
GARCH model for Bitcoin cont. …
Mean equation
• Exchange rates suggest that Bitcoin returns are more sensitive to the value of the
dollar relative to the pound (£) than to the value of the dollar relative to the euro (€).
• Therefore, regional or country-specific effects are present.
108
GARCH model for Bitcoin cont. …
Variance equation
• Bitcoin returns will have a lower volatility than the dollar when there is a positive
volatility shock to the federal funds rate.
• A positive volatility shock to the dollar-sterling exchange rate decreases the
variance of the Bitcoin returns.
• This may indicate that Bitcoin is a relatively safe asset in such a
situation (Dyhrberg, 2016).
Bitcoin volatility - machine learning methods
Features:
Methods considered: EWMA, ARIMA, ARIMAX, RF, GBT, XGT etc.
Guo, Tian, and Nino Antulov-Fantulin. Predicting short-term bitcoin price fluctuations from
buy and sell orders. arXiv preprint arXiv:1802.04065 (2018).
Modeling Realized volatility
Bitcoin vol.- machine learning methods cont. …
• The simple EWMA can beat all other models in some intervals.
• Simply adding features from the order book does not necessarily improve
performance.
• Models like ARIMAX and STRX are prone to overfitting on redundant data from long horizons,
while the ensemble method XGT and ENET are relatively robust to the horizon.
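For reference, an exponentially weighted moving average (EWMA) baseline of the kind mentioned above can be sketched as follows. The exact EWMA specification used by Guo and Antulov-Fantulin is not given on the slide, so the recursion, the decay value and the function name below are assumptions; the input is a made-up series of realized volatilities.

```python
def ewma_forecast(series, decay=0.94):
    """One-step-ahead EWMA forecasts.

    forecasts[0] is seeded with the first observation; for t >= 1,
    forecasts[t] depends only on observations before time t.
    """
    forecasts = [series[0]]
    for value in series[:-1]:
        forecasts.append(decay * forecasts[-1] + (1.0 - decay) * value)
    return forecasts

# Hypothetical realized volatilities; decay=0.5 keeps the arithmetic easy to follow.
realized_vol = [1.0, 2.0, 4.0]
forecasts = ewma_forecast(realized_vol, decay=0.5)   # [1.0, 1.0, 1.5]
```

The model has a single parameter and no exogenous features, which is why it serves as a natural benchmark for the more complex methods.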
111
Models with Local blockchain network features
Local higher-order structures of complex networks, or multiple-node
subgraphs, are found to be an indispensable tool for the analysis of
1. robustness of biological networks (Milo et al., 2002)
2. robustness of power grids (Dey et al., 2017)
3. functionality and early warning stability indicators in financial networks (Jiang
et al., 2014)
Milo, Ron, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. Network motifs:
simple building blocks of complex networks. Science 298, no. 5594 (2002): 824-827.
Dey, Asim Kumer, Yulia R. Gel, and H. Vincent Poor. Motif-based analysis of power grid robustness
under attacks. In Signal and Information Processing (GlobalSIP), 2017 IEEE Global Conference on, pp.
1015-1019. IEEE, 2017.
Jiang, X. F., T. T. Chen, and B. Zheng. Structure of local interactions in complex financial
dynamics. Scientific Reports 4 (2014): 5321.
112
Models with Local blockchain network features
Local higher-order structures of complex networks, or multiple-node subgraphs:
Three distinct types of 1-chainlets!
113
Blockchain network features
In contrast to fiat currencies, transactions of cryptocurrencies are permanently
recorded on distributed ledgers to be seen by the public. A natural analytics approach
is then to ask the following three interlinked questions (Akcora et al., 2018):
1. Do changes in chainlet characteristics exhibit any causal effect on future
cryptocurrency prices and returns?
2. Do chainlets convey some unique information about future cryptocurrency
prices, given more conventional economic variables and non-network blockchain
characteristics?
3. Do the chainlet dynamics of one cryptocurrency influence the price and
volatility of other cryptocurrencies?
Cuneyt G. Akcora, Asim Kumer Dey, Yulia R. Gel, and Murat Kantarcioglu. Forecasting Bitcoin Price
with Graph Chainlets. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 765-
776. Springer, Cham, 2018.
Asim Kumer Dey, Cuneyt G. Akcora, Yulia R. Gel, and Murat Kantarcioglu. On the Role of Local
Blockchain Network Features in Cryptocurrency Price Formation. 2019.
114
Chainlet Predictive Utility in Price
(Akcora et al., 2018)
115
Cryptocurrency Price Prediction with Chainlets
(Dey et al., 2019)
The comparison study is based only on Random Forest (RF) models.
116
Model comparison
The predictive utility of a model $B_i$ over the baseline model $B_0$ can be
measured as
$$\Psi_{X \to Y} = \frac{\psi(B_i)}{\psi(B_0)}$$
• $\psi$ is a measure of prediction error, e.g., root mean squared error
(RMSE).
• If $\Psi_{X \to Y} < 1$, the covariate $X$ is said to improve prediction of $Y$.
The percentage change in $\psi$ for a specific model w.r.t. $B_0$ is
$$\Delta = (1 - \Psi_{X \to Y}) \times 100\%$$
(Akcora et al., 2018)
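The two quantities above reduce to a ratio of error measures. The sketch below uses RMSE as the error measure, as the slide suggests; the function names and the toy predictions are hypothetical and only illustrate the arithmetic.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, used here as the prediction error measure psi."""
    n = len(actual)
    return math.sqrt(sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted)) / n)

def predictive_utility(actual, pred_model, pred_baseline, error=rmse):
    """Return (Psi, Delta): the error ratio psi(B_i) / psi(B_0) and the % change."""
    psi_ratio = error(actual, pred_model) / error(actual, pred_baseline)
    delta = (1.0 - psi_ratio) * 100.0
    return psi_ratio, delta

# Toy example: the baseline is off by 2 everywhere, the candidate model by 1.
actual = [10.0, 12.0, 14.0, 16.0]
pred_baseline = [12.0, 14.0, 16.0, 18.0]
pred_model = [11.0, 13.0, 15.0, 17.0]

psi_ratio, delta = predictive_utility(actual, pred_model, pred_baseline)
# psi_ratio = 0.5 (< 1: the covariates help), delta = 50.0 (% reduction in RMSE)
```

A negative `delta` would mean that the model with extra covariates predicts worse than the baseline.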
117
Cryptocurrency Price Prediction with Chainlets
• For short to moderate term (up to 15 days ahead) forecasting horizons, model B2,
solely based on Bitcoin occurrences, yields more accurate performance, although
closely followed by models B3 and B4 (Akcora et al., 2018).
• For longer term forecasting horizons, i.e., more than 15 days ahead, model B4,
containing information from both Bitcoin and Litecoin, delivers the most
competitive results, followed by model B2.
118
Analyzing Price Volatility with Chainlets
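To assess how chainlet variables influence the volatility of the Bitcoin
price we employ a GARCH-X model with the explanatory variables
$$\sigma_t^2 = w_0 + \sum_{i=1}^{q} w_i r_{t-i}^2 + \sum_{j=1}^{p} \tau_j \sigma_{t-j}^2 + \Lambda X_t$$
where
$$X = \left[\mathbb{O}(\mathbb{C}^{\text{₿}}_{1 \to 7}),\ \mathbb{O}(\mathbb{C}^{\text{₿}}_{20 \to 3}),\ \mathbb{O}(\mathbb{C}^{\text{₿}}_{3 \to 3}),\ \mathbb{O}(\text{Bitcoin cluster 7}),\ \mathbb{A}(\mathbb{C}^{\text{₿}}_{3 \to 4}),\ \mathbb{A}(\mathbb{C}^{\text{₿}}_{20 \to 20})\right], \quad \Lambda = (\lambda_1, \lambda_2, \dots, \lambda_6)'$$
• All the explanatory variables are in the form of log returns.
• GARCH(1,1) model.
• $\epsilon_t \sim N(0, 1)$.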
(Dey et al., 2019)
119
Analyzing Price Volatility with Chainlets cont...
The model with chainlet covariates, Model X, tends to describe the Bitcoin
price volatility more accurately than the volatility model without chainlet
covariates i.e., Model 0 (Dey et al., 2019).
120
Future Research Directions
• Relationship between the transaction networks of multiple
cryptocurrencies and the health of the crypto ecosystem.
• Network features of cryptocurrency transactions as a proxy for
market sensing.
• Ensemble forecasting of fiat currencies with cryptocurrency
features.
Thanks for attending!
Yulia R. Gel
ygl@utdallas.edu
BlockchainTutorial.Github.io

IEEE ICDM 2018 Tutorial on Blockchain Data Analytics

  • 1. Blockchain Data Analytics Tutorial Cüneyt Gürcan Akçora, Yulia R. Gel, Murat Kantarcioglu Joint work with N. C. Abay, Y. Chen, M. Dixon, A. K. Dey, U. Islambekov, Y. Li, E. Smirnova, B. Thuraisingham Depts. of Computer Science and Math Sciences University of Texas at Dallas BlockchainTutorial.Github.ioIEEE ICDM 2018 Blockchain Day, Singapore
  • 2. 2 Outline • A brief history of Blockchain • Building blocks of Blockchain • Blockchain data models and structures • TXO and account based blockchains • Privacy and security in blockchains • Financial analytics on blockchains
  • 3. 1- Blockchain Data Analytics - Core Blockchain How Blockchains appeared? How do they work? What are the design considerations? What is the data stored on a blockchain?
  • 4. 4 Core Blockchain 10/31/2008: Satoshi Nakamoto posts the Bitcoin white paper to a forum. 1/3/2009: The first data block in the Bitcoin. Coin Timeline* Bitcoin: A peer to peer Electronic Cash System * By JEFF DESJARDINS. Image retrieved from VisualCapitalist.com and updated. Smart contracts, lightning networks, added privacy
  • 5. 5 Blockchain Network Every node runs the same software to verify data blocks. Each node is connected to a few other nodes only. New nodes appear and existing ones disappear all the time. There is no trusted node. Every node has the full copy of the data. Goal: Having a single truth about data, that can be verified by everyone.
  • 6. 6 Bitcoin: A financial application of Blockchain Blockchain: a distributed ledger (i.e., “a book laying or remaining regularly in one place”). Block Blockchain: a chain of data blocks False data Which peer should the node a believe about block 4? 1 2 3 4 Bitcoin: chain data contains financial transactions. 1 2 3 4 True data a
  • 7. 7 Bitcoin 2 bitcoins 1 bitcoins 2 bitcoins From: Cuneyt To: Joe (1BTC), Tim (2BTC). Use the 3 bitcoins I received in Block 1 transaction 3. Signed: Cuneyt From: Jim To: Chris (2BTC). Use the 2 bitcoins I received in Block 2 transaction 1. Signed: Jim 1MB block size = ~ 2K transactions Two inherent problems: • Authenticity (You really have the funds) • Double spending (You are not using the same funds twice) • Authenticity is solved with encrypted signatures, and showing the proof of funds. • Confirmation of payments requires more effort: the double spending problem. From: Cuneyt To: Joe (1BTC), Tim (2BTC). Use the 3 bitcoins I received in Block 1 transaction 3. Signed: Cuneyt From: Jim To: Chris (2BTC). Use the 2 bitcoins I received in Block 2 transaction 1. Signed: Jim
  • 8. 8 Core Blockchain 5 5 • If everyone can create blocks, the blockchain may never stabilize. Fork 1 Fork 2 4321 From: Jim To: Chris (2BTC). Use the 2 bitcoins I received in Block 2 transaction 1. Signed: Jim 5 5 From: Jim To: John (2BTC). Use the 2 bitcoins I received in Block 2 transaction 1. Signed: Jim Jim is malicious: He is trying to use the same coins in two payments. Jim is hoping that Chris and John will not notice the other payment. 1- If fork 1 becomes the canonical fork, John will be defrauded. 2- If fork 2 becomes the canonical fork. Chris will be defrauded.
  • 9. 9 Core Blockchain • We cannot have a stable chain if we cannot be certain about blocks. There cannot be multiple long forks with alternative truths. • Solution: Make block creation difficult. Allow the network sometime between blocks, so that the current state will be learned by all (or most) nodes. • How can we stop people from creating blocks? Ask a cryptographic puzzle! 65 65 Fork 1 Fork 2 4321
  • 10. 10 Core Blockchain From: Cuneyt To: Alice Date: 1/1/2027 …..mail content…. This mail has 35 words Proof-of-Work was first used in email spam detection If the proof of work is not attached, it is spam! Else count the words If the word count in proof of work is wrong Discard the email, spam! Else email might be spam, run spam detector. Proof of work: In this simple example, it is counting words. This algorithm is used by the email service provider
  • 11. 11 Core Blockchain Proof-of-Work: Spending time and effort to create (mine) a block. The idea is to slow down attackers. Bitcoin uses a hash puzzle for Proof of work. Hash(University) = 7FDD903AF601C14E71D4938B2F7AB58A78C03C36D43485BB1937826B90DEFDD0 Hash(Univarsity) = 7E984B4F8807A0092C65AE3D897DD186943D95435C0A56F8350A0C7F82ACEF03 Proof of work: Find a hash value that satisfies a given difficulty.
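The two hashes on this slide illustrate that a one-letter change in the input produces an unrelated digest. A minimal sketch of that effect follows; the slide does not name the hash function, so SHA-256 over the UTF-8 string is an assumption here and the printed digests need not match the slide's values.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of a string as uppercase hex (assumed hash function)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest().upper()


a = sha256_hex("University")
b = sha256_hex("Univarsity")

# Count the hex positions where the two digests disagree.
differing = sum(1 for x, y in zip(a, b) if x != y)

print(a)
print(b)
print(f"{differing} of {len(a)} hex digits differ")
```

Because the digest cannot be steered by small input changes, the only practical way to reach a digest with a required form is to try many inputs, which is what makes the hash puzzle costly.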
  • 12. Core Blockchain
    A node chooses to be a miner.
    [Figure: a miner node receiving the transaction “From: Jim. To: Chris (2 BTC). Use the 2 bitcoins I received in Block 2, transaction 1. Signed: Jim”]
  • 13. Mining
    • Mining is the process of gathering transactions that are waiting in the system, creating a block out of them and advertising it to the other nodes in the system.
    • Creating a block is the computational review process performed on transactions.
    • Each block is limited to 1 MB and can hold only ~2K transactions. A block can also contain a single transaction (as in many early blocks). Do you see a possible problem here?
    • Everyone can create transactions, but only miners can create blocks. (Nuclear scientists were caught running mining software on supercomputers.)
  • 14. Mining – Creating the block
    • Several issues must be addressed in mining:
    • Nothing is physical; the coins you spend may be fake (verify source).
    • Even when the coins are not fake, you may have already spent them (verify history).
    • Is the sender the real owner of these coins, and is the receiver address correct? (Verify users.)
    • Are the input and output amounts correct? (Verify the amount.)
    The miner checks and verifies all of these. There are many nodes, but few miners on most blockchains.
  • 15. Core Blockchain
    hashOfBlock = Hash(block content)
    For (nonce = 1 to infinity)
      blockHash = Hash([hashOfBlock + hashOfPrevBlock + …] + nonce)
      If (blockHash satisfies difficulty)
        block mined successfully!
    • The miner increases the nonce until a useful blockHash is found.
    • If such a nonce does not exist, the miner can start over by re-arranging the transactions in the block.
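The nonce loop on this slide can be sketched in runnable form. This is a simplified illustration, not Bitcoin's actual block header format: SHA-256 over concatenated hex strings, a difficulty expressed as a number of leading hex zeros, and the names `mine`, `zeros` and `max_nonce` are all assumptions made for the example.

```python
import hashlib


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def mine(block_content: str, prev_block_hash: str, zeros: int, max_nonce: int = 10_000_000):
    """Try nonces in order; return (nonce, block_hash) for the first hash
    that starts with `zeros` hex zeros, or None if max_nonce is exhausted."""
    hash_of_block = sha256_hex(block_content)
    prefix = "0" * zeros
    for nonce in range(max_nonce):
        block_hash = sha256_hex(hash_of_block + prev_block_hash + str(nonce))
        if block_hash.startswith(prefix):
            return nonce, block_hash
    return None  # the caller could re-arrange the block content and retry


# Hypothetical block content and previous hash, for illustration only.
result = mine("From: Jim. To: Chris (2BTC).", "00" * 32, zeros=3)
print(result)
```

Each extra required zero multiplies the expected number of attempts by 16, which is how a difficulty setting translates into time and effort.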
  • 16. Core Blockchain
    [Figure: chain of blocks 1 to 6]
    • Once the block is mined, the miner broadcasts it to all its peers. The block propagates in the network.
    • Mining a block does not guarantee that the block will be included in the blockchain.
    • Other miners need to build their blocks on top of the block.
    • Colluding miners can ignore a mined block. Furthermore, they can cooperate and build blocks on each other’s blocks only.
  • 17. Core Blockchain
    [Figure: chain of blocks 1 to 6, followed at time t0 by three competing last blocks from Miner 1, Miner 2 and Miner 3]
    Miner 4 arrives at t1 to create a block and finds 3 competing last blocks. Depending on which block it builds on, Miner 4 has to exclude transactions that have already been mined.
  • 18. Core Blockchain
    [Figure: the same three competing blocks at t0; at t1 Miner 4 extends the block of Miner 1; Miner 5 arrives at t2]
    Let’s suppose that Miner 4 chooses to build on the block of Miner 1.
    • Miner 5 arrives at t2 and sees 3 forks. The logical choice is to build on the longest fork, that of Miner 1 and Miner 4*.
    • Miner 5 may still choose to build on other forks, which may be a costly mistake.
    *Both length and difficulty are considered.
  • 19. Proof of Work: An example
    • How difficult is proof of work? Consider "Hello, world!" + nonce.
    • If the difficulty is three leading zeros (000…), we try 4251 nonce values (0 to 4250):
    • Hash("Hello, world!0") => 1312af178c253f84028d480a6adc1e25e81caa44c749ec81976192e2ec934c64
    • Hash("Hello, world!4250") => 0000c3af42fc31103f1fdc0151fa747ff87349a4714df7cc52ea464e12dcd4e9
    Bitcoin uses an adaptive difficulty that changes with how much computing power exists in the mining business.
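The search in this example can be replayed with a few lines of code. The sketch below assumes that `Hash` is SHA-256 applied to the ASCII string with the decimal nonce appended, which the slide does not state; if a different hash is meant, the number of tries will differ from the slide's.

```python
import hashlib
from itertools import count


def find_nonce(base: str, prefix: str):
    """Append 0, 1, 2, ... to `base` until the SHA-256 hex digest starts with `prefix`."""
    for nonce in count():
        digest = hashlib.sha256(f"{base}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest


nonce, digest = find_nonce("Hello, world!", "000")
print(f"tries: {nonce + 1}, winning nonce: {nonce}")
print(digest)
```

With three hex zeros the chance of success per try is 1 in 16^3 = 4096, so a few thousand tries is the expected order of magnitude.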
  • 20. Core Blockchain – adjusting difficulty
    Hash of Bitcoin Block #3 (January 2009) [8 zeros]: 00000000b3322c8c3ef7d2cf6da009a776e6a99ee65ec5a32f3f345712238473
    Hash of Bitcoin Block #350000 (March 2015) [17 zeros]: 0000000000000000053cf64f0400bb38e0c4b3872c38795ddde27acb40a112bb
    Hash of Bitcoin Block #547873 (October 2018) [20 zeros]: 0000000000000000000064eb6ef4f94808938de0889695dd7bb8dca70b334cb2
    • The desired rate is one block every 10 minutes. The rate is checked every 2016 blocks (about 2 weeks).
    • If 2016 blocks took less than two weeks, the difficulty is increased.
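The adjustment rule can be sketched as a proportional rescaling. The function below is a simplified illustration working on a floating-point difficulty rather than on Bitcoin's compact target encoding; the function and constant names are chosen for this example. The factor-of-4 limit on a single adjustment is part of Bitcoin's rule.

```python
TARGET_BLOCK_SECONDS = 10 * 60
RETARGET_INTERVAL = 2016
EXPECTED_TIMESPAN = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL  # two weeks in seconds


def retarget(old_difficulty: float, actual_timespan_seconds: float) -> float:
    """Scale the difficulty so that the next 2016 blocks take about two weeks."""
    # A single adjustment is limited to a factor of 4 in either direction.
    clamped = min(max(actual_timespan_seconds, EXPECTED_TIMESPAN / 4), EXPECTED_TIMESPAN * 4)
    return old_difficulty * EXPECTED_TIMESPAN / clamped


print(retarget(100.0, EXPECTED_TIMESPAN / 2))  # blocks came twice as fast
print(retarget(100.0, EXPECTED_TIMESPAN * 2))  # blocks came twice as slowly
```

The same rule lowers the difficulty when the last 2016 blocks took longer than two weeks, which is why decreases appear in the difficulty history.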
  • 21. Proof of Work: Bitcoin difficulty in time
    [Figure: Bitcoin difficulty over time, 1/27/2009 to 1/27/2018, on a logarithmic difficulty axis from 1 to 10^13. Data from BTC.com]
    • Decreases are possible.
    • With the maximum possible difficulty we would need to try > 10^77 nonce values.
    • Bitcoin: more than 10^21 tries to find a valid nonce!
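The orders of magnitude on this slide can be checked with a short calculation. By Bitcoin's convention, a difficulty of 1 corresponds to roughly 2^32 expected hash attempts, so the expected number of tries is approximately the difficulty times 2^32. The sample difficulty below is a hypothetical value near the top of the chart, not a figure taken from the slide.

```python
def expected_hashes(difficulty: float) -> float:
    """Approximate expected number of hash attempts per block at a given difficulty."""
    return difficulty * 2**32


sample_difficulty = 7e12  # hypothetical value, roughly the scale of the chart's upper end
print(f"expected tries: {expected_hashes(sample_difficulty):.2e}")

# The search space of a 256-bit hash bounds the hardest possible puzzle.
max_tries = 2**256
print(f"upper bound: {max_tries:.2e}")
```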
  • 22. Incentives for mining
    Miner income = block reward + sum of all transaction fees.
    • Block reward halves every 4 years. Starting with 50 bitcoins per block, this will create 21M bitcoins in total.
    • Transaction fee is the amount left unspent between inputs and outputs.
    • The fee may also be zero, but why would anyone mine your transaction?
    Example: From: Cuneyt. To: Joe (0.8 BTC), Tim (2 BTC). Use the 3 bitcoins I received in Block 1, transaction 3. Signed: Cuneyt. Transaction fee = 3 − 0.8 − 2 = 0.2 bitcoins.
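Both numbers on this slide follow from simple arithmetic. The sketch below sums the halving schedule, assuming 210,000 blocks per halving period (about four years at one block per 10 minutes) and integer amounts in satoshi (1 bitcoin = 100,000,000 satoshi), and then computes the fee of the example transaction.

```python
SATOSHI_PER_BTC = 100_000_000
BLOCKS_PER_HALVING = 210_000  # about 4 years at one block per 10 minutes


def total_supply_satoshi() -> int:
    """Sum block rewards over all halving periods until the reward rounds down to zero."""
    reward = 50 * SATOSHI_PER_BTC
    total = 0
    while reward > 0:
        total += reward * BLOCKS_PER_HALVING
        reward //= 2  # integer halving, so the reward eventually reaches zero
    return total


def transaction_fee(inputs, outputs) -> int:
    """Fee is whatever the inputs provide beyond what the outputs claim."""
    return sum(inputs) - sum(outputs)


supply = total_supply_satoshi()
print(supply / SATOSHI_PER_BTC)

# The slide's example: a 3 BTC input paying 0.8 BTC and 2 BTC.
fee = transaction_fee([300_000_000], [80_000_000, 200_000_000])
print(fee / SATOSHI_PER_BTC)
```

The total stays just under 21 million because the geometric series 50 + 25 + 12.5 + … converges to 100 bitcoins per block position, multiplied by 210,000 blocks per period.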
  • 23. Incentives for mining (B = bitcoins)
    • Around May 2020 the block reward will halve to 6.25 bitcoins.
    • 2140 is the year when the reward will be practically zero.
    • Transaction fees will carry the system after block rewards become trivial.
    • November 2018: block reward is 12.5B, transaction fees are ~0.05B per block.
    • Fees are trivial when market volume is low: users attach lower fees.
    • In December 2017, fees were around 5-7B.
  • 24. 24 Bitcoin mining • One winning miner every 10 mins. Many others lose and waste electricity. • Eric Jennings: “The cost for having no central authority is the cost of that energy”. • Tim Swanson: “Bitcoin is a peer-to-peer heat engine”. • Narayanan: “Bitcoin mining has been an expensive way to bet that the price of Bitcoin would rise”.
  • 25. 25 Proof-of-X Proof-of-X is an umbrella term that covers Proof-of-Work alternatives in block mining. Each alternative scheme expects miners to show a proof that they have done enough work or spent enough wealth before creating the block. • Proof-of-Stake: Stake = Coin×Age. The miner with the highest stake becomes the next miner in the chain. Once coins are used, their age becomes zero. Rich gets richer! • Proof-of-Burn: The miner sacrifices wealth: creates a transaction and sends some coins to a “verifiably unspendable” address. Reduces total supply! • Proof-of-Ownership, Proof-of-Publication, and others…
  • 26. 26 Blockchains – why stop at cryptocurrencies? Every node runs the same software to verify data blocks. Each node is connected to a few other nodes only. New nodes appear and existing ones disappear all the time. There is no trusted node. Every node has the full copy of the data.
  • 27. 27 0.8 bitcoin 2 bitcoins From: Cuneyt To: Joe (0.8 BTC), Tim (2 BTC). Use the 3 bitcoins I received in Block 1 transaction 3. Signed: Cuneyt data Bitcoin: data are financial transactions. Tschorsch, Florian, and Björn Scheuermann. Bitcoin and beyond: A technical survey on decentralized digital currencies. IEEE Communications Surveys & Tutorials 18, no. 3 (2016): 2084-2123.
  • 28. Data can be more:
    - Notary Documents
    - Pictures
    - Identity Documents
    - Shipping logs
    - Manufacturing logs
    - IOT data
    1- On-chain storage
    2- Off-chain storage:
    • Store hashes of data (as proof)
    • Store the address of data (Our data resides at IP: 145.178.14.29)
  • 29. Blockchain Network – Beyond Cryptocurrencies
    • Ethereum has been created to store data and software code on a blockchain.
    • Similar to Bitcoin, Ethereum has a currency: Ether.
    • The code (a smart contract) is written in the purpose-built language Solidity, which is compiled to bytecode and executed on the Ethereum Virtual Machine.
    • An analogy is stored procedures kept in a MySQL database.
  • 30. 31 Blockchain Network – Smart Contract • User creates a transaction to upload the Smart Contract code to an address. • The code at the address is replicated in all blockchain nodes. • In other words, you force other users to store your code. • The code is executed by passing transaction messages to its functions. Execution occurs at all nodes – hence the World Computer! • Contract creation is expensive. • All subsequent calls to the contract code are billed in terms of what operations they require.
  • 31. Blockchain Network – Contracts
    • Each operation has a gas cost for executing it.
    • For example, using the ‘addition’ operation costs you 3 gas.
    Image: https://hackernoon.com/
  • 32. Ethereum – the World Computer
    Benefits of having code on a blockchain:
    • Public code: Code can be analyzed by everyone.
    • Unmodifiable code: Code cannot be modified without leaving a trace.
    • Unstoppable execution: Code will run to completion.
    • Verifiable results: Results can be verified by all parties.
    It is easy to see why platform creators called the code Smart Contract!
  • 33. Ethereum – the World Computer
    • Contracts gave rise to Smart Contract based tokens: data units that are exchanged to buy/sell services in the real world.
    • For example, with the Storj token, files are stored on your hard disk and you are paid a fee through Ethereum.
    • Tokens can be bought or sold; they act as value stores. Token prices are determined by real-world markets.
    • Companies create tokens and sell them in Initial Coin Offerings to raise capital.
  • 34. 35 Blockchain tokens New Ethereum token contracts in time (>5K transactions in early 2018)
  • 35. 36 Blockchain tokens Ethereum token transactions in time
  • 36. Platforms – Standardization Continues
    • Initially, tokens could implement a vital function (e.g., transfer) with any name (e.g., sell, transferTo, sendTo).
    • The ERC20 standard specifies a list of functions that must be implemented by a token.
    [Figure: data as of May 2018, from our Chartalist project]
  • 37. 38 Blockchain tokens and platforms Left Ethereum
  • 38. 2- Blockchain Graph Analytics Are transactions the same on all blockchains? How can we model Blockchain data?
  • 39. 40 Blockchain Graph Analytics • For data modelling, blockchains can be divided into two major categories: Account based blockchains (e.g., Ethereum) Transaction output (TXO) based blockchains (e.g., Bitcoin, Litecoin)
  • 40. 41 2a - Transaction output based blockchains
  • 41. Transaction output (TXO) based blockchains
    [Figure: Transaction 1 spends 3B from address a and creates two outputs of 0.8B and 2B; address b receives the 2B output. The remaining 0.2B is the transaction fee.]
    Next, if address b wants to spend the 2B it received, it needs to show proof of funds: “Use the 2B I received from Block 1, transaction 1 to pay 1.5B to c and 0.3B to d”.
    [Figure: a second transaction spends the 2B output of b and pays 1.5B to c and 0.3B to d.]
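The proof-of-funds idea can be sketched as a ledger of unspent outputs. This is a toy model under stated assumptions: there are no signatures or blocks, amounts are integers in tenths of a bitcoin so that 3B is 30, and the class names, transaction ids and the address `x` (recipient of the 0.8B output) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    address: str
    amount: int  # tenths of a bitcoin, chosen so the arithmetic stays exact


class TxoLedger:
    def __init__(self):
        self.unspent = {}  # (tx_id, output_index) -> Output

    def add_outputs(self, tx_id, outputs):
        for index, output in enumerate(outputs):
            self.unspent[(tx_id, index)] = output

    def spend(self, tx_id, input_refs, outputs):
        """Consume whole previous outputs, create new ones, return the implied fee."""
        if len(set(input_refs)) != len(input_refs):
            raise ValueError("the same output is referenced twice")
        for ref in input_refs:
            if ref not in self.unspent:
                raise ValueError(f"output {ref} is unknown or already spent")
        total_in = sum(self.unspent[ref].amount for ref in input_refs)
        total_out = sum(output.amount for output in outputs)
        if total_out > total_in:
            raise ValueError("outputs exceed inputs")
        for ref in input_refs:
            del self.unspent[ref]
        self.add_outputs(tx_id, outputs)
        return total_in - total_out


ledger = TxoLedger()
ledger.add_outputs("funding", [Output("a", 30)])  # a holds 3B
fee1 = ledger.spend("tx1", [("funding", 0)], [Output("x", 8), Output("b", 20)])
fee2 = ledger.spend("tx2", [("tx1", 1)], [Output("c", 15), Output("d", 3)])

try:
    ledger.spend("tx3", [("tx1", 1)], [Output("e", 20)])  # b tries to reuse its 2B
    double_spend_rejected = False
except ValueError:
    double_spend_rejected = True

print(fee1, fee2, double_spend_rejected)
```

Because an output is deleted the moment it is spent, a second payment that points at the same output fails, which is the ledger-level view of double-spending prevention.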
  • 42. 43 Transaction output (TXO) based blockchains • Genesis block 0: The first block, created by Nakamoto. • Every block has one coinbase transaction that creates bitcoins (sum of block reward + transaction fees). • All other payments must show proof of funds (previous outputs). Coinbase transaction Block n Block n+1 Time
  • 43. Three Graph Rules for TXO
    1- Source Rule: Coins can be gained from multiple transactions. These can be spent at once or separately (dashed edges connect to unspecified addresses).
    [Figure: address b with outgoing edges to transactions tx1 and tx2]
    Address b can spend its bitcoins at tx1 (at once), or at tx1 and tx2.
  • 44. Three Graph Rules for TXO
    2- Balance Rule: All coins gained from a transaction must be spent in a single transaction. Addresses cannot keep change; they must forward it.
    [Figure: address c pays addresses d and e. Same user? Address reuse is rare.]
    Two cases:
    i - c sold all its coins: c, d and e all belong to different people, or
    ii - c paid d and forwarded the change to its new address e.
    In many scenarios, we have to learn which addresses belong to the same entity.
  • 45. Three Graph Rules for TXO
    3- Mapping Rule: Multiple inputs can be signed separately and merged, but the input-output address mappings are not recorded.
    A transaction can be considered a lake with incoming rivers and outgoing streams. Coins mix in this lake.
    [Figure: a transaction whose inputs and outputs are 1B each]
    Heuristics are developed to link inputs to outputs; we will cover them in the privacy section.
  • 46. Existing Graph Approaches
    1- Transaction graph: Edges between transactions only.
    • Cannot capture unspent coins.
    • Cannot distinguish transactions with differing inputs/outputs.
    [Figure: a blockchain graph with different inputs/outputs next to its transaction graph]
    Dorit Ron and Adi Shamir. 2013. Quantitative analysis of the full bitcoin transaction graph. In International Conference on Financial Cryptography and Data Security. Springer, 6–24.
  • 47. Existing Graph Approaches
    2- Address graph: Edges between addresses only.
    [Figure: a blockchain graph next to its address graph]
    • Edges are multiplied between inputs and outputs: a transaction with 1000 inputs and 1000 outputs creates 1 million edges.
    • Creates bias for the average degree, and even for the median degree.
    Michele Spagnuolo, Federico Maggi, and Stefano Zanero. 2014. Bitiodine: Extracting intelligence from the bitcoin network. In International Conference on Financial Cryptography and Data Security. Springer, 457–4
  • 48. 49 Existing Graph Approaches Graph Analysis with single node type: Not always useful for the forever forward branching tree of Bitcoin. 2- Address graph: is it worth the trouble searching for graph motifs?  Addresses are not supposed to re-appear in future.  Closed triangles are very rare.  Output/input address sets do not have edges to each other – our tools do not consider this, and search for edges in vain (linked transactions within a block are possible but rare).
  • 49. Blockchain Graph – Substructure mining
    • Rather than individual edges or nodes, we use a subgraph as the building block in our Bitcoin analysis.
    • We use the term chainlet to refer to such subgraphs.
    Definition [K-Chainlets]: Let k-chainlet G_k = (V_k, E_k, B) be a subgraph of G with k nodes of type {Transaction}. If there exists an isomorphism between G_k and G', G' ∈ G, we say that there exists an occurrence, or embedding, of G_k in G. If a G_k occurs more/less frequently than expected by chance, it is called a Blockchain k-chainlet. A k-chainlet signature f_G(G_k) is the number of occurrences of G_k in G.
    Cuneyt G. Akcora, Asim Kumer Dey, Yulia R. Gel, and Murat Kantarcioglu. Forecasting Bitcoin Price with Graph Chainlets. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 765-776. Springer, Cham, 2018.
  • 50. Blockchain Chainlets
    • Chainlets have distinct shapes that reflect their role in the network.
    • We aggregate these roles to analyze network dynamics.
    [Figure: transactions Tx 1 to Tx 4 drawn as chainlets with their input and output addresses]
    Three distinct types of 1-chainlets!
  • 51. Aggregate Chainlets
    C_{x→y}: chainlet with x inputs and y outputs.
    • Transition chainlets imply coins changing address: x = y. Ex: chainlet C_{3→3}.
    • Split chainlets may imply spending behavior: y > x. Ex: chainlet C_{1→2}. But community practice against address reuse can also create split chainlets.
    • Merge chainlets imply gathering of funds: x > y. Ex: chainlet C_{3→1}.
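The three aggregate types depend only on the input and output counts of a transaction, so the classification can be sketched in a few lines. The function name and the sample transactions are illustrative.

```python
from collections import Counter


def chainlet_type(num_inputs: int, num_outputs: int) -> str:
    """Classify chainlet C_{x→y} by comparing its input count x and output count y."""
    if num_inputs == num_outputs:
        return "transition"
    if num_outputs > num_inputs:
        return "split"
    return "merge"


# Hypothetical (inputs, outputs) pairs for the transactions in one snapshot.
snapshot = [(3, 3), (1, 2), (3, 1), (1, 2), (1, 1)]
counts = Counter(chainlet_type(x, y) for x, y in snapshot)
print(counts)
```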
  • 52. Aggregate Chainlets
    [Figure: percentage of aggregate chainlets in the Bitcoin Graph (weekly snapshots)]
    Figure annotation: Around here 2 pizzas are worth 10 thousand bitcoins. This is not Italy (“il bel paese”)!
  • 53. Chainlet Matrix: representing the network in time
    • For a given time granularity, such as one day, we take snapshots of the Bitcoin graph.
    • Chainlet counts obtained from the graph are stored in an N×N matrix.
    [Figure: example 3×3 chainlet matrix with inputs 1 to 3 on one axis and outputs 1 to 3 on the other; cell values 2 1 1 / 0 0 0 / 0 0 0]
    N: How big should the matrix be?
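  • 54. Extreme Chainlets
    • N can reach thousands; the matrix can be 1000 × 1000.
    • On Bitcoin, 90.50% of the chainlets are covered with N of 5 (x < 5 and y < 5), and 97.57% with N of 20.
    [Figure: the example chainlet matrix with inputs/outputs 1 to 3, extended by an index 4]
    Extreme chainlets are the last column/row of the chainlet matrix. They imply big coin movements in the graph!
    Occurrence matrix:
    $$O[i,j] = \begin{cases} \#C_{i \to j} & \text{if } i < N \text{ and } j < N \\ \sum_{z=N}^{\infty} \#C_{i \to z} & \text{if } i < N \text{ and } j = N \\ \sum_{y=N}^{\infty} \#C_{y \to j} & \text{if } i = N \text{ and } j < N \\ \sum_{y=N}^{\infty} \sum_{z=N}^{\infty} \#C_{y \to z} & \text{if } i = N \text{ and } j = N \end{cases}$$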
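The occurrence matrix can be built by clamping each input and output count at N, so that all larger chainlets accumulate in the last row or column. A minimal sketch, assuming each transaction is given as an (inputs, outputs) pair with both counts at least 1; the function name and the sample data are illustrative.

```python
def occurrence_matrix(chainlets, n):
    """Count chainlets C_{x→y} into an n×n matrix; counts of n or more share the last row/column."""
    matrix = [[0] * n for _ in range(n)]
    for x, y in chainlets:
        i = min(x, n)  # row index (inputs), clamped at n
        j = min(y, n)  # column index (outputs), clamped at n
        matrix[i - 1][j - 1] += 1
    return matrix


# Hypothetical snapshot: four small chainlets and two extreme ones.
snapshot = [(1, 1), (1, 1), (1, 2), (1, 3), (5, 1), (7, 9)]
for row in occurrence_matrix(snapshot, 3):
    print(row)
```

Keeping N small bounds the feature vector per snapshot while still recording how much activity falls into the extreme row and column.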
  • 55. 56 Extreme Chainlets Bitcoin companies stopped all business in New York State because of new regulations. The New York Business Journal called this the "Great Bitcoin Exodus". Percentage of extreme chainlets in the Bitcoin Graph (N = 20, daily snapshots)
  • 56. Clustering the Chainlets
    • A hierarchical clustering of chainlets by using Cosine Similarity over chainlet signatures in time.
    • We used a similarity cut threshold of 0.7 to create clusters from the hierarchical dendrogram.
    [Figures: chainlet clusters for daily snapshots and for weekly snapshots; highlighted groups: most common chainlets, extreme and correlated chainlets]
  • 57. 58 2b - Account based blockchains
  • 58. Account based blockchains
    • On account based blockchains, transactions involve one input address and one output address.
    • An address spends coins from a balance and keeps the change.
    • Each transaction of an address has an order (called nonce). The nonce is the number of transactions sent to the network by the address.
    • A later transaction needs to wait for earlier transactions to be mined.
    [Figure: one address sending 4E to another address]
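The balance and nonce rules can be sketched with a small ledger. This toy model ignores gas fees, signatures and contracts, and it rejects an out-of-order transaction outright, whereas a real node would keep it waiting until the earlier ones are mined; class and variable names are illustrative.

```python
class AccountLedger:
    def __init__(self, balances):
        self.balances = dict(balances)
        self.next_nonce = {address: 0 for address in balances}

    def apply(self, sender, receiver, amount, nonce):
        """Move `amount` from sender to receiver if the nonce is the next one in order."""
        if amount < 0:
            raise ValueError("amount must be non-negative")
        expected = self.next_nonce.get(sender, 0)
        if nonce != expected:
            raise ValueError(f"expected nonce {expected}, got {nonce}")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        # The sender keeps the change: only the transferred amount leaves the balance.
        self.balances[sender] = self.balances.get(sender, 0) - amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount
        self.next_nonce[sender] = expected + 1


ledger = AccountLedger({"a": 10, "b": 0})
ledger.apply("a", "b", 4, nonce=0)  # a sends 4E to b

try:
    ledger.apply("a", "b", 1, nonce=2)  # skips nonce 1
    out_of_order_accepted = True
except ValueError:
    out_of_order_accepted = False

ledger.apply("a", "b", 1, nonce=1)
print(ledger.balances, ledger.next_nonce["a"], out_of_order_accepted)
```

Compared with the TXO model, no previous output has to be referenced; the ordering that outputs provide implicitly is supplied by the per-address nonce.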
  • 59. 60 Internal transactions • Account based blockchains have two types of “transactions”. • The first transaction type involves a transfer of the used cryptocurrency, such as Ether on Ethereum. • The second type are internal transactions, which involve a transfer of smart contract based tokens. • Internal transactions are created when smart contracts change states of addresses. • Internal transactions can be discovered in two ways: by parsing ordinary transactions’ messages, or by running the transaction message through the smart contract code. • The parsing method cannot discover failed transactions.
  • 60. Internal transactions
    • A transaction can transfer both currencies and tokens.
    • Contracts can start events as well; these are explicitly recorded.
    • An internal transaction can create multiple edges, although this is rare on Ethereum.
    [Figure: a transaction message on Ethereum carrying 4E between an ordinary address and a contract address, with a token transfer of 2]
  • 61. Trading tokens – a timeline
    [Figure: timeline with time points t0, t1 and t2, showing addresses a and b and the Storj token contract]
    • A transaction carrying 0.3 Ether to the Storj token contract: “I want to buy 2 Storj tokens”. Balances: b: 2 Storj.
    • a pays 0.2E to b to buy its tokens.
    • A transaction carrying 0 Ether to the Storj token contract: “Send my 2 tokens to address a”. Balances: b: 0 Storj, a: 2 Storj.
    • From t1 to t2, the Storj price decreased in the market from 0.15E to 0.1E.
    • All edges on the Ethereum graph: a token edge of 2 between a and b, and Ether edges of 0.3E and 0.2E.
  • 62. Account based blockchains
    [Figure: the largest connected component in the Storj network on 13-1-2018]
    • We model account based blockchains as directed, weighted multi-graphs.
    • The network of a single token is usually sparse and devoid of community structure.
    • Daily networks may contain many disconnected components.
  • 63. 64 Inter-token networks • Account based blockchains are global market places where goods are exchanged in terms of tokens. Ethereum is a successful example. • Blockchain platforms will allow us to view global market activity in real time.
  • 64. 65 Research questions • Account based blockchains lend themselves to traditional network analysis tools and algorithms. • Motif analysis, core decomposition, centrality and clustering algorithms can easily be adapted to work on account based blockchains. • High granularity temporal data allows time series analyses. • The rich variety of cryptotokens being traded on the network brings many interesting research problems: Token price prediction, price manipulation detection, token network health and robustness analysis, inter-token impact analysis, investor behavior analysis.
  • 65. 3- Blockchain Privacy and Security
    Permissionless (public) blockchains: Bitcoin, Litecoin, Ethereum
    Permissioned (private) blockchains: Hyperledger, R3
    • By definition any user can join a public blockchain (e.g., Bitcoin).
    • For corporate settings, the transparency means that rivals can learn company finances and buy/sell relationships.
    • Permissioned blockchains were created for industrial settings.
    • Permissioned: Less power consumption, more secure, privacy aware, but for all purposes a gated community.
  • 66. 3- Blockchain Privacy and Security
    • In public blockchains, data is considered public.
    • Tapscott: There are no honeypots of personal data on the blockchain.
    • Public blockchains are pseudonymous: There is no registration to join the network, but all your transactions are public.
    • For security, Tor can be used to send transactions to the P2P network.
    • As a threat to privacy, most online exchanges are governed by know-your-customer rules that require customer registration.
  • 67. Blockchain communication graph • At its core, Bitcoin maintains a peer-to-peer architecture. Bitcoin peers create persistent TCP channels with each other and relay transactions. • Each peer seeks a minimum of 8, a maximum of 125 peers. • Each node forwards transactions arriving from a neighbor to other neighbors. • Transactions that await mining in the P2P network are contained in the mempool. • The first sender of a transaction is most likely to be the transaction owner.
  • 68. Blockchain communication graph
    • Nodes forward incoming transactions selectively to hinder time based address inference. This is called trickling.
    • In this network, b is connected to all neighbors of a. By observing relayed transactions, b can deduce that transaction t3 originated from a.
    [Figure: peers a and b with their shared neighbors, each neighbor labeled with the transactions it relayed (t1 to t8); t3 is relayed by only one neighbor]
    Andrew Miller, James Litton, Andrew Pachulski, Neal Gupta, Dave Levin, Neil Spring, and Bobby Bhattacharjee. 2015. Discovering bitcoin’s public topology and influential nodes. (2015).
  • 69. Blockchain TXO content graph • Can we tell which addresses are controlled by the same user/entity/organization? • In order to answer this question, we first need to map inputs to outputs. Where do the bitcoins at address a come from? a From nine addresses! Fungibility: Is a specific bitcoin worth a bitcoin everywhere? Taint analysis studies a bitcoin’s history.
  • 70. Blockchain TXO content graph
    Heuristics are used to detect which input and output addresses are controlled by the same user.
    [Figure: a transaction with inputs and outputs of differing amounts (1B to 4B) next to a transaction whose inputs and outputs are all 1B]
    Considering amounts may help in basic cases. Schemes exist to use multiple rounds of flows with equal amounts to hide tracks.
    Meiklejohn, Sarah, Marjori Pomarole, Grant Jordan, Kirill Levchenko, Damon McCoy, Geoffrey M. Voelker, and Stefan Savage. A fistful of bitcoins: characterizing payments among men with no names. In Proceedings of the 2013 conference on Internet measurement conference, pp. 127-140. ACM, 2013.
  • 71. Heuristics to link addresses
    1- Idioms of Use: posits that all input addresses in a transaction should belong to the same entity, because only the owner could have signed the inputs with the associated private keys.
    [Figure: one transaction with input addresses a, b and c] Addresses a, b and c belong to the same user.
  • 72. Heuristics to link addresses
    2- Transitive Closure: extends Idioms of Use: if one transaction has inputs from a and b, and another transaction has inputs from a and c, then b and c belong to the same user.
    [Figure: transactions with overlapping input addresses a, b, c, d and e] Addresses a, b, c, d and e belong to the same user.
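The first two heuristics amount to computing connected components over co-spent input addresses, which a union-find structure handles directly. A minimal sketch; the function name and the sample transactions are illustrative, and the result inherits every error of the underlying heuristics.

```python
def cluster_addresses(transactions):
    """Group addresses that appear together as inputs, closing the relation transitively.

    `transactions` is a list of input-address lists, one list per transaction.
    """
    parent = {}

    def find(address):
        parent.setdefault(address, address)
        while parent[address] != address:
            parent[address] = parent[parent[address]]  # path halving
            address = parent[address]
        return address

    def union(first, second):
        root_first, root_second = find(first), find(second)
        if root_first != root_second:
            parent[root_second] = root_first

    for inputs in transactions:
        if not inputs:
            continue
        find(inputs[0])  # register single-input transactions too
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = {}
    for address in list(parent):
        clusters.setdefault(find(address), set()).add(address)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical transactions, each given by its input addresses.
print(cluster_addresses([["a", "b"], ["a", "c"], ["d", "e"], ["f"]]))
```

Addresses a, b and c end up in one cluster even though b and c never appear in the same transaction, which is exactly the transitive step.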
  • 73. Heuristics to link addresses
    3- Change address: the following four conditions must be met:
    (1) the output address has not appeared in any previous transaction;
    (2) the transaction is not a coin generation;
    (3) there is no self-change address in the outputs;
    (4) all the other output addresses in the transaction have appeared in previous transactions.
    The heuristic then posits that the one-time change (output) address, if one exists, is controlled by the same user as the input addresses.
  • 74. Obfuscation efforts
    • A measure to prevent matching addresses to users is known as Coin Mixing, or its improved version, CoinJoin.
    • The initial idea in mixing was to use a central server to mix inputs from multiple users.
    [Figure: example mixing transactions with inputs and outputs of 1B to 5B; many outputs carry the same 2B amount]
    Ruffing, Tim, Pedro Moreno-Sanchez, and Aniket Kate. CoinShuffle: Practical decentralized coin mixing for Bitcoin. In European Symposium on Research in Computer Security, pp. 345-364. Springer, Cham, 2014.
  • 75. Obfuscation efforts – peeling chains
    • In a peeling chain, a single address begins with a relatively large amount of bitcoins.
    • A smaller amount is then “peeled” off this larger amount, creating a transaction in which a small amount is sent to one address and the remainder is sent to a one-time change address.
    • This process is repeated, potentially for hundreds or thousands of hops, until the larger amount is pared down.
    Di Battista, Giuseppe, Valentino Di Donato, Maurizio Patrignani, Maurizio Pizzonia, Vincenzo Roselli, and Roberto Tamassia. Bitconeview: visualization of flows in the bitcoin transaction graph. In Visualization for Cyber Security (VizSec), 2015 IEEE Symposium on, pp. 1-8. IEEE, 2015.
    Narayanan, Arvind, and Malte Möser. Obfuscation in bitcoin: Techniques and politics. arXiv preprint arXiv:1706.05432 (2017).
  • 76. Obfuscation efforts – peeling chains
    [Figure: a peeling chain that starts with 25B and peels off 0.5B at each hop (0.5B, 0.5B, 0.5B, 0.5B, …), ending with an exit to fiat currency]
    Repeated patterns are frequently found on the Bitcoin blockchain.
    McGinn, Dan, David Birch, David Akroyd, Miguel Molina-Solana, Yike Guo, and William J. Knottenbelt. Visualizing dynamic bitcoin transaction patterns. Big data 4, no. 2 (2016): 109-119.
  • 77. Network clustering of addresses
    • By nature, all user clustering heuristics are error prone.
    • Some community practices further complicate the issue.
    • For example, online wallets such as coinbase.com buy/sell coins among their customers without using transactions; ownership of an address is changed by transferring the associated private keys to another user.
    • Although the user associated with the address changes, nothing gets recorded in the blockchain.
    • Clustering can be further improved by considering IP locations and temporal patterns.
  • 78. Research directions in taint analysis
    • Money laundering: M. Moser, R. Bohme, and D. Breuker. An inquiry into money laundering tools in the Bitcoin ecosystem. In: eCRS. IEEE. 2013, pp. 1-14.
    • Ransomware payments: Huang, Danny Yuxing, Maxwell Matthaios Aliapoulios, Vector Guo Li, Luca Invernizzi, Elie Bursztein, Kylie McRoberts, Jonathan Levin, Kirill Levchenko, Alex C. Snoeren, and Damon McCoy. Tracking ransomware end-to-end. In 2018 IEEE Symposium on Security and Privacy (SP), pp. 618-631. IEEE, 2018.
    • Illicit trade/use: Portnoff, Rebecca S., Danny Yuxing Huang, Periwinkle Doerfler, Sadia Afroz, and Damon McCoy. Backpage and bitcoin: Uncovering human traffickers. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1595-1604. ACM, 2017.
    • Personal blackmail: S. Phetsouvanh, F. Oggier and A. Datta. EGRET: Extortion Graph Exploration Techniques in the Bitcoin Network. IEEE ICDM Workshop on Data Mining in Networks (DaMNet). IEEE, 2018.
  • 79. Thanks for attending! Cuneyt.Akcora@utdallas.edu Further reading -> Blockchain: A graph primer. C. G. Akcora, Y. R. Gel, M. Kantarcioglu. [Updated regularly, online] ArXiv:1708.08749, pp 1-16, 2017. BlockchainTutorial.Github.io
  • 80. 4- Statistical Analysis of the Cryptocurrency Price Formation
    Outline
    1. Descriptive summaries
    2. Price models
    3. Risk models
       • Value at Risk estimates
       • GARCH family
    4. Models using local blockchain network features
  • 81. Price and returns of cryptocurrency
    Let $y_t$ be the price of a cryptocurrency. Returns measure the relative change in prices.
    • Simple returns: $$R_t = \frac{y_t - y_{t-1}}{y_{t-1}} = \frac{y_t}{y_{t-1}} - 1$$
      The benefit of using returns instead of prices is normalization: all variables are measured in a comparable metric.
    • Log returns: $$r_t = \log y_t - \log y_{t-1} = \log\frac{y_t}{y_{t-1}}$$
      Log returns are additive. If we assume that prices are log-normally distributed (which, in practice, may or may not be true for any given price series), then the log transformation results in approximately normal returns, which are easier to work with.
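Both return definitions translate directly into code. The sketch below uses a hypothetical three-point price series and also checks the additivity of log returns: their sum over a period equals the log of the ratio between the last and the first price.

```python
import math


def simple_returns(prices):
    """R_t = (y_t - y_{t-1}) / y_{t-1} for consecutive prices."""
    return [(current - previous) / previous for previous, current in zip(prices, prices[1:])]


def log_returns(prices):
    """r_t = log(y_t) - log(y_{t-1}) for consecutive prices."""
    return [math.log(current) - math.log(previous) for previous, current in zip(prices, prices[1:])]


prices = [100.0, 110.0, 99.0]  # hypothetical prices
R = simple_returns(prices)
r = log_returns(prices)

print(R)
print(r)
# Additivity: the log returns sum to the log return over the whole period.
print(sum(r), math.log(prices[-1] / prices[0]))
```

Simple returns do not add up this way: +10% followed by −10% leaves the price at 99, not 100.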
  • 82. 83 Bitcoin price and log returns The price has an upward trend and it is volatile which is also clear from log returns.
  • 83. Summary statistics – log returns
    The summary statistics are the largest for Bitcoin, followed by Dash, Litecoin, Monero, Ripple, Maidsafecoin and Dogecoin (Chu et al., 2017). The log returns for each cryptocurrency are positively skewed.
    Chu, Jeffrey, Stephen Chan, Saralees Nadarajah, and Joerg Osterrieder. GARCH modelling of cryptocurrencies. Journal of Risk and Financial Management 10, no. 4 (2017): 17.
  • 84. Summary statistics – log returns cont.
    [Figure: histograms of the log returns of the exchange rates from June 2014 to May 2017]
    Log returns are more or less symmetrically distributed. Some histograms appear more peaked than others.
    Chu, Jeffrey, Saralees Nadarajah, and Stephen Chan. Statistical analysis of the exchange rate of bitcoin. PloS one 10, no. 7 (2015).
  • 85. 86 Models for cryptocurrency price and volatility • Price models • Risk models: a) Value at Risk estimates via fitting parametric distributions. b) GARCH family • Models using local block chain network features
  • 86. Predictive models
    In order to avoid spurious regression, we need to test stationarity and interdependency properties (Jang and Lee, 2017).
    According to Engle and Granger (1987), regressions of interdependent and non-stationary time series may lead to spurious results.
    Jang, Huisu, and Jaewook Lee. An empirical study on modeling and prediction of bitcoin prices with bayesian neural networks based on blockchain information. IEEE Access 6 (2018): 5427-5437.
    Engle, Robert F., and Clive WJ Granger. Co-integration and error correction: representation, estimation, and testing. Econometrica: journal of the Econometric Society (1987): 251-276.
  • 87. 88 Time plots Bitcoin price has a positive time trend and shows a clear non-stationarity. Every log returns graph shows periods of very high volatility and periods of relative tranquility which is a common feature among financial assets. Dyhrberg, Anne Haubo. Bitcoin, gold and the dollar–A GARCH volatility analysis. Finance Research Letters 16 (2016): 85-92.
  • 88. Transaction activity and price
    Price, number of transactions and number of unique addresses exhibit a similar upward pattern. The log returns show that they are volatile.
    Koutmos, Dimitrios. Bitcoin returns and transaction activity. Economics Letters 167 (2018): 81-85.
  • 89. Stationarity and cointegration tests
    • Test for stationarity: the augmented Dickey-Fuller (ADF) test:
      $$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 \Delta y_{t-1} + \dots + \delta_{p-1} \Delta y_{t-p+1} + \epsilon_t$$
      where $\alpha$ is a constant, $\beta$ the coefficient on a time trend and $p$ the lag order of the autoregressive process.
      $H_0: \gamma = 0$ against $H_A: \gamma < 0$. Test statistic: $DF = \hat{\gamma} / se(\hat{\gamma})$.
    • Cointegration test: Two time series are considered to be cointegrated if there exists a long-run equilibrium relationship between them.
      Engle-Granger cointegration test: If $x_t$ and $y_t$ are non-stationary and cointegrated, then a linear combination of them must be stationary. In other words: $y_t - \beta x_t = u_t$, where $u_t$ is stationary.
  • 90. Granger Causality Test
    The causality test assesses whether one time series is useful in predicting another.
    • If
      $$F_{t+h}\left(\cdot \mid \mathcal{F}^{(Y, X, Z_1, \dots, Z_k)}_{t-1}\right) = F_{t+h}\left(\cdot \mid \mathcal{F}^{(Y, Z_1, \dots, Z_k)}_{t-1}\right),$$
      then $X_{t-1}$ is said not to Granger cause (G-cause) $Y_{t+h}$ with respect to $\mathcal{F}^{(Y, Z_1, \dots, Z_k)}_{t-1}$.
    • Otherwise, $X$ is said to G-cause $Y$, which can be denoted by $G_{X \to Y}$.
    • → represents the direction of causality.
  • 91. Granger Causality Test cont.
    For the univariate case, consider time series $y_t$, $x_t$ and $z_t$. To test G-causality of $x_t$, we compare the fit of the full model
    $$y_t = \alpha_0 + \sum_{k=1}^{d} \alpha_k y_{t-k} + \sum_{k=1}^{d} \beta_k x_{t-k} + \sum_{k=1}^{d} \gamma_k z_{t-k} + e_t$$
    versus the fit of the reduced model, which drops the lags of $x$:
    $$y_t = \alpha_0 + \sum_{k=1}^{d} \alpha_k y_{t-k} + \sum_{k=1}^{d} \gamma_k z_{t-k} + \tilde{e}_t$$
    • Under the null hypothesis of no predictive effect of $x$ on $y$ (i.e., $x$ does not G-cause $y$), $Var(e_t) = Var(\tilde{e}_t)$.
    • If $Var(e_t)$ is (statistically) significantly lower than $Var(\tilde{e}_t)$, then $x$ contains additional information that can improve forecasting of $y$, i.e., $G_{X \to Y}$.
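One common way to compare the two fits is a nested-model F-test, stated here as a worked formula rather than as the procedure used in the tutorial. The symbols $RSS_{full}$ and $RSS_{red}$ (residual sums of squares of the full and reduced models) and $n$ (number of observations used in the fit) are notation introduced for this sketch. The full model has $3d + 1$ coefficients and the null hypothesis sets the $d$ coefficients on the lags of $x$ to zero:

```latex
F \;=\; \frac{\left(RSS_{red} - RSS_{full}\right) / d}{RSS_{full} / \left(n - 3d - 1\right)}
\;\sim\; F_{d,\; n - 3d - 1} \quad \text{(approximately, under } H_0\text{)}
```

A large value of $F$ means that dropping the lags of $x$ worsens the fit by more than chance would explain, so the null hypothesis of no G-causality is rejected.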
  • 92. Granger Causality Test - example
    In both cases we can conclude that Bitcoin price realized volatility Granger-causes the VIX and the VIX Granger-causes Bitcoin price realized volatility.
    Estrada, Julio Cesar Soldevilla. Analyzing Bitcoin Price Volatility. University of California, Berkeley (2017).
  • 93. Models
    • For the stationary case, the standard OLS estimator can be used to estimate the model.
    • For non-stationary and non-cointegrated series, we estimate a multivariate vector autoregressive (VAR) model.
    • When the time series are considered to be cointegrated, the Vector Error Correction (VEC) model is suitable for estimation.
    • A combination of the above models is also used.
  • 94. Machine learning methods
    • Neural networks (NN) and their variants, e.g., RNN, BNN, CNN, etc.
    • Random Forest (RF)
    • Support Vector Regression (SVR)
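  • 95. Model evaluation
    • Root mean squared error (RMSE):
      $$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (y_t - \hat{y}_t)^2}$$
    • Mean absolute percentage error (MAPE):
      $$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \frac{|y_t - \hat{y}_t|}{y_t}$$
    where $y_t$ is the Bitcoin price and $\hat{y}_t$ is the corresponding predicted value.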
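As a minimal sketch, the two error measures can be computed as follows. The price and forecast values are made-up illustrations, and the MAPE denominator uses `abs(a)`, which coincides with the slide's formula for positive prices.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.05 means 5%)."""
    n = len(actual)
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / n

# Hypothetical prices and forecasts, for illustration only
prices = [100.0, 200.0, 400.0]
forecasts = [110.0, 180.0, 400.0]
print(rmse(prices, forecasts), mape(prices, forecasts))
```

RMSE is expressed in the units of the price itself, while MAPE is scale-free, which is why the two are often reported together.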
  • 96. Bayesian neural networks (BNN)
    • BNN models outperform other models in terms of RMSE and MAPE in 1-day-ahead forecasts of the log price of Bitcoin.
    • BNN is more reliable for describing the process of log volatility than other benchmark models (Jang and Lee, 2017).
  • 97. Bayesian neural networks (BNN)
    We observe that Bayesian neural networks capture the patterns of the Bitcoin prices better than other models (Jang and Lee, 2017).
  • 98. Risk Models: Tests for randomness and no serial correlation
    The p-values for the randomness tests and serial correlation tests are all greater than 0.05, so the null hypotheses of randomness and of no serial correlation are not rejected at the 5% level. Bitcoin log returns and squares of log returns can therefore be treated as random and serially uncorrelated (Chu et al., 2015).
  • 99. Model selection techniques
    Akaike's information criterion (AIC) and the Bayesian information criterion (BIC):
      $$AIC = -2\ell + 2k$$
      $$BIC = -2\ell + k \ln n$$
    where $\ell$ is the maximized log-likelihood of the model, $k$ is the number of parameters in the model, and $n$ is the number of observations.
    • The model with the smallest AIC and BIC is considered the "best" model.
    Graphical approaches popular for model diagnostics:
    • Quantile-Quantile (Q-Q) plot.
    • Observed versus fitted density.
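The criteria above can be sketched by fitting two candidate distributions by maximum likelihood and comparing their AIC and BIC. The normal and Laplace candidates are stand-ins chosen for brevity (they are not the families compared in the tutorial), and the return values are made up for illustration.

```python
import math

def aic(log_lik, k):
    """Akaike's information criterion for log-likelihood `log_lik` and k parameters."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Bayesian information criterion for n observations."""
    return -2.0 * log_lik + k * math.log(n)

def normal_log_lik(data):
    """Maximized log-likelihood of a normal fit (MLE mean and variance)."""
    n = len(data)
    mu = sum(data) / n
    var = sum((v - mu) ** 2 for v in data) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def laplace_log_lik(data):
    """Maximized log-likelihood of a Laplace fit (MLE median and scale)."""
    n = len(data)
    ordered = sorted(data)
    median = (ordered[(n - 1) // 2] + ordered[n // 2]) / 2.0
    scale = sum(abs(v - median) for v in data) / n
    return -n * (math.log(2.0 * scale) + 1.0)

# Hypothetical log returns, for illustration only
returns = [-0.08, -0.012, 0.001, 0.004, 0.011, -0.003, 0.09, -0.006, 0.005, 0.002]
n = len(returns)
for name, log_lik in [("normal", normal_log_lik(returns)),
                      ("laplace", laplace_log_lik(returns))]:
    # Both candidates have k = 2 parameters (location and scale)
    print(name, round(aic(log_lik, 2), 3), round(bic(log_lik, 2, n), 3))
```

Because both candidates here have the same number of parameters, AIC and BIC rank them identically; the penalty terms matter when comparing families of different sizes.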
  • 100. Model selection for log returns
    Overall, the generalized hyperbolic distribution gives the best fit, having the smallest values of −ln L, AIC, AICc, and BIC (Chu et al., 2015).
  • 101. Model selection for Bitcoin log returns cont.
    The Q-Q plot, probability plot, and density plot of the fitted generalized hyperbolic distribution suggest that the fit is good. The fit also appears reasonable in the tails (Chu et al., 2015).
  • 102. Bitcoin Value at Risk (VaR)
    Value at Risk is the maximum loss that should not be exceeded during a specified period of time with a given probability level. Let $f(x)$ be the probability density function of the fitted distribution of the log returns $X$. Then $VaR_{1-\alpha}$ satisfies
      $$P(X \le -VaR_{1-\alpha}) = \alpha, \qquad \text{i.e.,} \qquad \int_{-\infty}^{-VaR_{1-\alpha}} f(x)\,dx = \alpha.$$
    The fitted values for the VaR appear very close to the historical estimates (Chu et al., 2015).
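A minimal sketch of the two kinds of VaR estimate mentioned above: a historical (empirical quantile) estimate and a fitted-distribution estimate. A normal distribution is swapped in for the fitted model purely to keep the example in the standard library; the sample returns and function names are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean, stdev

def historical_var(returns, alpha):
    """Empirical VaR_{1-alpha}: minus the alpha-quantile of the observed returns."""
    ordered = sorted(returns)
    # Smallest order statistic whose empirical CDF reaches alpha
    k = max(0, math.ceil(alpha * len(ordered) - 1e-9) - 1)
    return -ordered[k]

def normal_var(returns, alpha):
    """VaR_{1-alpha} from a fitted normal: solves P(X <= -VaR) = alpha."""
    fitted = NormalDist(mean(returns), stdev(returns))
    return -fitted.inv_cdf(alpha)

# Hypothetical log returns, for illustration only
returns = [-0.10, -0.05, -0.02, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
print(historical_var(returns, 0.1), normal_var(returns, 0.1))
```

Comparing the two numbers mirrors the check on the slide: a fitted model is plausible for risk purposes when its VaR stays close to the historical estimate across several values of $\alpha$.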
  • 103. GARCH Modelling
    The GARCH($p$, $q$) model for Bitcoin price returns, $r_t$, is defined as
      $$r_t = \sigma_t \epsilon_t$$
      $$\sigma_t^2 = w_0 + \sum_{i=1}^{q} w_i r_{t-i}^2 + \sum_{j=1}^{p} \tau_j \sigma_{t-j}^2$$
    where $w_0 > 0$, $w_i > 0$, $\tau_j > 0$, $\epsilon_t \sim \mathrm{IID}(0, 1)$, $i = 1, 2, \dots, q$, $j = 1, 2, \dots, p$.
    To assess how explanatory variables influence the volatility of the Bitcoin price, we can employ a GARCH-X model:
      $$\sigma_t^2 = w_0 + \sum_{i=1}^{q} w_i r_{t-i}^2 + \sum_{j=1}^{p} \tau_j \sigma_{t-j}^2 + \Lambda X_t$$
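The variance recursion can be sketched for the GARCH(1,1) case, with an optional exogenous term for GARCH-X. This only evaluates the recursion for given parameters; it does not estimate them. The parameter values, the starting variance, and the function name are assumptions made for illustration.

```python
def garch_x_variance(returns, w0, w1, tau1, sigma2_0, lam=None, x=None):
    """Conditional variances for GARCH(1,1), optionally with exogenous covariates.

    sigma_t^2 = w0 + w1 * r_{t-1}^2 + tau1 * sigma_{t-1}^2 (+ sum_m lam[m] * x[t][m])
    `sigma2_0` is the starting variance; `x[t]` holds the covariates at time t.
    """
    sigma2 = [sigma2_0]
    for t in range(1, len(returns)):
        value = w0 + w1 * returns[t - 1] ** 2 + tau1 * sigma2[t - 1]
        if lam is not None and x is not None:
            value += sum(l * v for l, v in zip(lam, x[t]))
        sigma2.append(value)
    return sigma2

# Hypothetical returns and parameters, for illustration only
returns = [0.02, -0.05, 0.01, 0.03]
w0, w1, tau1 = 0.0001, 0.1, 0.8
start = 0.001  # equals w0 / (1 - w1 - tau1), the unconditional variance

plain = garch_x_variance(returns, w0, w1, tau1, start)
with_x = garch_x_variance(returns, w0, w1, tau1, start,
                          lam=[0.5], x=[[0.0], [0.0002], [0.0], [0.0]])
print(plain)
print(with_x)
```

In the example the large return at $t = 1$ raises the next period's variance, and the covariate shock in `with_x` shifts the variance path upward from the period in which it occurs.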
  • 104. GARCH family
    • A variety of GARCH models: SGARCH, EGARCH, GJRGARCH, APARCH, IGARCH, CSGARCH, GARCH, TGARCH, etc.
    • We select the best model based on different model selection criteria, e.g., AIC, BIC, etc.
  • 105. GARCH model for Bitcoin
    The second model is the exponential GARCH model, which investigates whether the return on Bitcoin is asymmetrically affected by good and bad news (known as the leverage effect) (Dyhrberg, 2016).
  • 106. GARCH model for Bitcoin cont.
    Mean equation:
    • Exchange rates suggest that Bitcoin returns are more sensitive to the value of the dollar relative to the pound (£) than to the value of the dollar relative to the euro (€).
    • Therefore, regional or country-specific effects are present.
  • 107. GARCH model for Bitcoin cont.
    Variance equation:
    • Bitcoin returns will have a lower volatility than the dollar when there is a positive volatility shock to the federal funds rate.
    • A positive volatility shock to the dollar-sterling exchange rate decreases the variance of the Bitcoin returns.
    • This may indicate that Bitcoin is a relatively safe asset in such a situation (Dyhrberg, 2016).
  • 108. Bitcoin volatility - machine learning methods
    Modeling realized volatility.
    Methods considered: EWMA, ARIMA, ARIMAX, RF, GBT, XGT, etc.
    Guo, Tian, and Nino Antulov-Fantulin. Predicting short-term bitcoin price fluctuations from buy and sell orders. arXiv preprint arXiv:1802.04065 (2018).
  • 109. Bitcoin volatility - machine learning methods cont.
    • The simple EWMA can beat all others in some intervals.
    • Simply adding features from the order book does not necessarily improve the performance.
    • Models like ARIMAX and STRX are prone to overfitting on redundant long-horizon data, while the ensemble method XGT and ENET are relatively robust to the horizon.
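The slides name EWMA without stating its form. A minimal sketch, assuming the common exponentially weighted recursion in which each one-step-ahead forecast is a weighted average of the previous forecast and the latest observation; the decay value and the sample series are illustrative assumptions, not taken from the cited study.

```python
def ewma_forecasts(values, decay):
    """One-step-ahead EWMA forecasts.

    out[i] is the forecast of values[i + 1] made after observing values[i]:
    out[i] = decay * out[i - 1] + (1 - decay) * values[i], with out[0] = values[0].
    """
    out = [values[0]]
    for v in values[1:]:
        out.append(decay * out[-1] + (1.0 - decay) * v)
    return out

# Hypothetical realized-volatility observations, for illustration only
realized_vol = [1.0, 2.0, 3.0]
print(ewma_forecasts(realized_vol, decay=0.5))
```

A decay close to 1 gives a smooth, slowly adapting forecast, while a decay close to 0 tracks the most recent observation; the model has a single tuning parameter, which is one reason it is hard to overfit.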
  • 110. Models with Local blockchain network features
    Local higher-order structures of complex networks, or multiple-node subgraphs, are found to be an indispensable tool for analysis of
    1. robustness of biological networks (Milo et al., 2002)
    2. robustness of power grids (Dey et al., 2017)
    3. functionality and early warning stability indicators in financial networks (Jiang et al., 2014)
    Milo, Ron, Shai Shen-Orr, Shalev Itzkovitz, Nadav Kashtan, Dmitri Chklovskii, and Uri Alon. Network motifs: simple building blocks of complex networks. Science 298, no. 5594 (2002): 824-827.
    Dey, Asim Kumer, Yulia R. Gel, and H. Vincent Poor. Motif-based analysis of power grid robustness under attacks. In Signal and Information Processing (GlobalSIP), 2017 IEEE Global Conference on, pp. 1015-1019. IEEE, 2017.
    Jiang, X. F., T. T. Chen, and B. Zheng. Structure of local interactions in complex financial dynamics. Scientific Reports 4 (2014): 5321.
  • 111. Models with Local blockchain network features
    Local higher-order structures of complex networks, or multiple-node subgraphs: three distinct types of 1-chainlets!
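A minimal sketch of how the three types can be told apart, assuming the split/merge/transition naming of Akcora et al. (2018), where a chainlet with x inputs and y outputs is written as x→y. The transaction list and function name are made up for illustration.

```python
from collections import Counter

def chainlet_type(n_inputs, n_outputs):
    """Classify a 1-chainlet with n_inputs inputs and n_outputs outputs."""
    if n_inputs == n_outputs:
        return "transition"
    return "split" if n_outputs > n_inputs else "merge"

# Hypothetical transactions as (number of inputs, number of outputs)
transactions = [(1, 7), (20, 3), (3, 3), (1, 7), (3, 4)]

# Occurrence counts per chainlet shape and per type
occurrences = Counter(transactions)
type_counts = Counter(chainlet_type(i, o) for i, o in transactions)
print(occurrences[(1, 7)], dict(type_counts))
```

Counting how often each shape occurs in a time window gives the occurrence features that the later slides use as model covariates.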
  • 112. Blockchain network features
    In contrast to fiat currencies, transactions of cryptocurrencies are permanently recorded on distributed ledgers to be seen by the public. A natural analytics approach is then to ask the following three interlinked questions (Akcora et al., 2018):
    1. Do changes in chainlet characteristics exhibit any causal effect on future cryptocurrency prices and returns?
    2. Do chainlets convey some unique information about future cryptocurrency prices, given more conventional economic variables and non-network blockchain characteristics?
    3. Do the chainlet dynamics of one cryptocurrency influence the price and volatility of other cryptocurrencies?
    Akcora, Cuneyt G., Asim Kumer Dey, Yulia R. Gel, and Murat Kantarcioglu. Forecasting Bitcoin Price with Graph Chainlets. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 765-776. Springer, Cham, 2018.
    Dey, Asim Kumer, Cuneyt G. Akcora, Yulia R. Gel, and Murat Kantarcioglu. On the Role of Local Blockchain Network Features in Cryptocurrency Price Formation. 2019.
  • 113. Chainlet Predictive Utility in Price (Akcora et al., 2018)
  • 114. Cryptocurrency Price Prediction with Chainlets (Dey et al., 2019)
    The comparison study is based only on Random Forest (RF) models.
  • 115. Model comparison
    The predictive utility of a model $B_i$ over the baseline model $B_0$ can be measured as
      $$\Psi_{X \to Y} = \frac{\psi(B_i)}{\psi(B_0)}$$
    • $\psi$ is a measure of prediction error, e.g., root mean squared error (RMSE).
    • If $\Psi_{X \to Y} < 1$, the covariate $\boldsymbol{X}$ is said to improve prediction of $Y$.
    The percentage change in $\psi$ for a specific model with respect to $B_0$ is
      $$\Delta = (1 - \Psi_{X \to Y}) \times 100\%$$
    (Akcora et al., 2018)
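The ratio and the percentage gain can be sketched as follows, taking RMSE as the error measure $\psi$. The observed values and the two sets of predictions are made up so that the arithmetic is easy to follow.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, used here as the error measure psi."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def predictive_utility(actual, baseline_pred, model_pred):
    """Return (Psi, Delta): the error ratio and the percentage gain over the baseline."""
    psi_ratio = rmse(actual, model_pred) / rmse(actual, baseline_pred)
    delta = (1.0 - psi_ratio) * 100.0
    return psi_ratio, delta

# Hypothetical values: the baseline misses by 2 everywhere, the model by 1
actual = [10.0, 12.0, 14.0, 16.0]
baseline_pred = [12.0, 14.0, 16.0, 18.0]
model_pred = [11.0, 13.0, 15.0, 17.0]

psi_ratio, delta = predictive_utility(actual, baseline_pred, model_pred)
print(psi_ratio, delta)
```

Here the ratio is below 1, so the extra covariates halve the error, which corresponds to a 50% gain; a negative gain would mean the covariates made the forecasts worse than the baseline.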
  • 116. Cryptocurrency Price Prediction with Chainlets
    • For short- to moderate-term forecasting horizons (up to 15 days ahead), model B2, based solely on Bitcoin occurrences, yields the most accurate forecasts, although it is closely followed by models B3 and B4 (Akcora et al., 2018).
    • For longer-term forecasting horizons, i.e., more than 15 days ahead, model B4, containing information from both Bitcoin and Litecoin, delivers the most competitive results, followed by model B2.
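  • 117. Analyzing Price Volatility with Chainlets
    To assess how chainlet variables influence the volatility of the Bitcoin price, we employ a GARCH-X model with the explanatory variables
      $$\sigma_t^2 = w_0 + \sum_{i=1}^{q} w_i r_{t-i}^2 + \sum_{j=1}^{p} \tau_j \sigma_{t-j}^2 + \Lambda X_t$$
    where
      $$X = \left[\mathbb{O}^{\text{₿}}_{\mathbb{C}_{1 \to 7}},\ \mathbb{O}^{\text{₿}}_{\mathbb{C}_{20 \to 3}},\ \mathbb{O}^{\text{₿}}_{\mathbb{C}_{3 \to 3}},\ \mathbb{O}_{\text{Bitcoin cluster 7}},\ \mathbb{A}^{\text{₿}}_{\mathbb{C}_{3 \to 4}},\ \mathbb{A}^{\text{₿}}_{\mathbb{C}_{20 \to 20}}\right], \qquad \Lambda = (\lambda_1\ \lambda_2\ \dots\ \lambda_6)'.$$
    • All the explanatory variables are in the form of log returns.
    • GARCH(1,1) model.
    • $\epsilon_t \sim N(0, 1)$.
    (Dey et al., 2019)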
  • 118. Analyzing Price Volatility with Chainlets cont.
    The model with chainlet covariates, Model X, tends to describe the Bitcoin price volatility more accurately than the volatility model without chainlet covariates, i.e., Model 0 (Dey et al., 2019).
  • 119. Future Research Directions
    • Relationship between transaction networks of multiple cryptocurrencies and the health of the crypto ecosystem.
    • Network features of cryptocurrency transactions as a proxy for market sensing.
    • Ensemble forecasting of fiat currencies with cryptocurrency features.
  • 120. Thanks for attending! Yulia R. Gel ygl@utdallas.edu BlockchainTutorial.Github.io