How to Use Innovative Data Handling and Processing Techniques to Drive Alpha in the Financial Markets

2017 © Parametric Portfolio Associates® LLC

PARAMETRIC’S PROFILE:
We provide systematic, disciplined portfolio management solutions.
> Parametric Portfolio Associates® LLC (“Parametric”) is a majority-owned subsidiary of Eaton Vance Corp.
> Approximately $197.6 billion (3/31/2017) in assets under management*.

We offer investment solutions through our three investment centers:

Seattle, WA
• Leaders in rules-based, engineered portfolio solutions
• Strategies ranging from index tracking portfolios to managed smart beta
• Ability to incorporate responsible investing themes
• Founded 1987
• A subsidiary of Eaton Vance Corp. since 2003

Minneapolis, MN
• Pioneers in overlay strategies and custom risk management solutions (formerly The Clifton Group)
• Innovative product solutions in real asset and liquid alternatives
• Founded 1972
• Acquired by Parametric in 2012

Westport, CT
• Specialists in option portfolio management
• Provide product-based and custom option overlay solutions
• Founded 2002
• A part of Parametric since 2007

*As of 3/31/2017. Includes AUM of Parametric Investment & Overlay Strategies and Parametric Custom Tax-Managed & Centralized Portfolio Management.

PARAMETRIC INVESTMENT PLATFORM*
*For illustrative purposes only

PARAMETRIC’S BIG DATA JOURNEY
2016
• MDM launch
• Decision: data centralization
2017
• Data lake implementation
• Focus: mastering data sources and data discovery
2018
• Modernizing data usage
• Focus: transitioning data silos to the data lake

OVERVIEW - IT ENVIRONMENT
Our Hadoop environment: two separate clusters
‒ Production 10-node cluster
• Common + Hive
• Clustered NiFi
‒ Development 10-node cluster
• Common + Hive + Spark
• Non-clustered NiFi
NiFi in both dev and production
• We build NiFi workflows in dev and promote them to production
Our environment:
‒ Before Hadoop
• Primarily Windows (C#, MS SQL, PowerShell, etc.)
• All automation done using CA
‒ With Hadoop
• Still primarily Windows… + Hadoop
• Transitioning ETL automation to NiFi

THE DATA MANAGEMENT OFFICE CHALLENGE
Stop processing on time boundaries – Process as soon as data is available!
• Previously, processes were triggered at specific times of day
• Vendor data availability is generally good, but not perfect
‒ Delayed data has cascading negative effects
‒ Manual intervention is typically required for delayed or missing data files
‒ Requires after-hours support
• Data consumption was pushed to nightly jobs to ensure the most complete data sets
• The insurance premium for waiting on data:
‒ Loss of potential processing hours
‒ Loss of processing during business hours
‒ Potential loss of pre-processing work
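A minimal sketch of that insurance premium, assuming hypothetical vendor arrival times and a hypothetical 22:00 nightly trigger (neither comes from Parametric's actual schedule). It totals the hours data sits idle before a fixed-time job picks it up and lists the files that arrive too late for the run, which is where manual intervention and after-hours support come in.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Arrival:
    name: str
    hour: float  # hour of day (0-24) when the vendor file landed

def scheduled_start(arrival: Arrival, trigger_hour: float) -> Optional[float]:
    """Time-based job: data is only picked up at trigger_hour.
    A file that lands after the trigger misses the run entirely (None)."""
    return trigger_hour if arrival.hour <= trigger_hour else None

def event_start(arrival: Arrival) -> float:
    """Event-based flow: processing starts as soon as the file arrives."""
    return arrival.hour

def cost_of_waiting(arrivals: List[Arrival], trigger_hour: float) -> Tuple[float, List[str]]:
    """Return (idle hours lost to waiting, names of files that missed the run)."""
    idle_hours = 0.0
    missed: List[str] = []
    for a in arrivals:
        start = scheduled_start(a, trigger_hour)
        if start is None:
            missed.append(a.name)  # needs manual intervention / after-hours support
        else:
            idle_hours += start - event_start(a)
    return idle_hours, missed

if __name__ == "__main__":
    # Hypothetical arrival times, for illustration only.
    files = [Arrival("prices", 17.5), Arrival("ratings", 19.0), Arrival("corporate_actions", 23.0)]
    print(cost_of_waiting(files, trigger_hour=22.0))  # (7.5, ['corporate_actions'])
```

Under an event-based model both early files would have started processing on arrival, and the late file would simply be processed at 23:00 instead of waiting for someone to rerun the job.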

RESOLUTION: NIFI
NiFi’s Immediate Benefits:
• Event Based Processing
• Data Provenance
• Queueing
• Back Pressure
• Rapid Development
• Large Selection of Processors
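Queueing and back pressure can be illustrated with a toy connection. This is a simplified model, not NiFi's implementation: the hypothetical `Connection` class only enforces an object-count threshold, whereas NiFi connections can also be configured with a data-size threshold.

```python
from collections import deque
from typing import Any, Deque, Optional

class Connection:
    """Toy queue between two processors with an object-count back pressure threshold."""

    def __init__(self, object_threshold: int):
        self.object_threshold = object_threshold
        self._items: Deque[Any] = deque()

    def back_pressure_applied(self) -> bool:
        return len(self._items) >= self.object_threshold

    def offer(self, item: Any) -> bool:
        # While back pressure is applied the upstream side is held back, so the
        # item stays where it is instead of flooding the queue.
        if self.back_pressure_applied():
            return False
        self._items.append(item)
        return True

    def poll(self) -> Optional[Any]:
        # The downstream consumer takes the oldest item, relieving back pressure.
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)

if __name__ == "__main__":
    conn = Connection(object_threshold=3)
    print([conn.offer(f"file-{i}") for i in range(5)])  # [True, True, True, False, False]
    conn.poll()
    print(conn.offer("file-5"))  # True
```

With a threshold of 3, the fourth and fifth offers are refused until a downstream consumer drains an item, which is the effect back pressure has on an upstream processor.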

QUICK WIN – FINDING RESTRICTED ISSUERS
• Background
• An account defines an ownership restriction for one or more issuers (companies). The client provides one or more security identifiers along with an issuer name. However, the list may be incomplete, and the mandate is that ANY security from that issuer cannot be held in the account.
• Problem Statement
• How do we identify the issuer so that none of its security types can be held?
• Solution:
• Use the client-provided security identifiers to search our Bloomberg data and map each one to an id_bb_global so that it is restricted by our compliance system.
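A sketch of the issuer lookup, under stated assumptions: the reference rows stand in for Bloomberg security data as hypothetical `(identifier, issuer id, id_bb_global)` triples with placeholder values, and the issuer-expansion step reflects the mandate that every security of a restricted issuer be blocked. It is an illustration of the idea, not Parametric's actual query.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for Bloomberg security data:
# (any security identifier, issuer/company id, id_bb_global).
SecurityRow = Tuple[str, str, str]

def _norm(value: str) -> str:
    return value.strip().upper()

def restricted_globals(client_ids: Iterable[str],
                       rows: Iterable[SecurityRow]) -> Tuple[Set[str], List[str]]:
    """Map each client identifier to its issuer, then restrict EVERY id_bb_global
    belonging to that issuer, not just the securities the client listed."""
    issuer_by_id: Dict[str, str] = {}
    globals_by_issuer: Dict[str, Set[str]] = {}
    for ident, issuer, global_id in rows:
        issuer_by_id[_norm(ident)] = issuer
        globals_by_issuer.setdefault(issuer, set()).add(global_id)

    restricted: Set[str] = set()
    unmatched: List[str] = []
    for cid in client_ids:
        issuer = issuer_by_id.get(_norm(cid))
        if issuer is None:
            unmatched.append(cid)  # report back to the requestor for review
        else:
            restricted |= globals_by_issuer[issuer]
    return restricted, unmatched

if __name__ == "__main__":
    rows = [
        ("CUSIP-AAA1", "ISSUER-1", "GLOBAL-1"),
        ("ISIN-AAA2", "ISSUER-1", "GLOBAL-2"),  # same issuer, not on the client's list
        ("CUSIP-BBB1", "ISSUER-2", "GLOBAL-3"),
    ]
    print(restricted_globals(["cusip-aaa1 ", "UNKNOWN-9"], rows))
```

Even though the client listed only one identifier for ISSUER-1, both of that issuer's securities end up restricted, which is the point of resolving to the issuer first.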

OLD PROCESS SOLUTION
Old Process Steps:
• Client provides a spreadsheet of market identifiers
• Spreadsheet is reviewed and then run through several independent processes – mostly manual
• Spreadsheet is returned to the requestor with the company identifier, if found, appended to the original spreadsheet
• Requestor then formats the results so that they can be digested by the target system
Old Process Requirements:
• 3 people, 3 to 5 hours on average
‒ Requestor – 1 to 1 ½ hours
• Review the client submission, write an email to kick-start the process, follow up on expected completion, and reformat the results
‒ HelpDesk
• Process the ticket and assign it to app support
‒ App Support
• Research and query generation

NIFI SOLUTION
NiFi Steps:
• Client provides a spreadsheet of market identifiers
• Spreadsheet is reviewed, identifiers are cut and pasted into a standard formatted spreadsheet
• Spreadsheet is saved as a CSV file to the In directory on a public drive
• Requestor receives an email when the NiFi process is done
• Requestor picks up a ready-to-load CSV file from the Out directory
Achieved Targets:
• Minimal manual processing and IT intervention
• All self-service, and it's easy to do
• Bonus: better results by searching all available Bloomberg data instead of only the investable universe
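The In/Out directory pattern can be sketched in plain Python. The `identifier` column name, the `lookup` callable, and the directory layout are assumptions for illustration; in NiFi a similar shape could be built from processors such as GetFile, PutFile and PutEmail. The email notification step is omitted here.

```python
import csv
import tempfile
from pathlib import Path
from typing import Callable, List, Optional

def process_inbox(in_dir: Path, out_dir: Path,
                  lookup: Callable[[str], Optional[str]]) -> List[Path]:
    """Enrich every CSV waiting in in_dir and write a ready-to-load copy to out_dir.

    Assumes each input file has a hypothetical 'identifier' column; lookup
    returns an id_bb_global, or None when nothing matches.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob("*.csv")):
        with src.open(newline="") as f:
            reader = csv.DictReader(f)
            fieldnames = list(reader.fieldnames or []) + ["id_bb_global"]
            rows = []
            for row in reader:
                # Unmatched identifiers get an empty cell so the requestor can review them.
                row["id_bb_global"] = lookup(row["identifier"].strip()) or ""
                rows.append(row)
        dest = out_dir / src.name
        with dest.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        src.unlink()  # remove the request from In so it is not processed twice
        written.append(dest)
    return written

if __name__ == "__main__":
    # Demo with a throwaway directory tree and a toy lookup table.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "In").mkdir()
        (root / "In" / "request.csv").write_text("identifier\nABC\nXYZ\n")
        for path in process_inbox(root / "In", root / "Out", {"ABC": "GLOBAL-1"}.get):
            print(path.read_text())
```

Deleting the input after a successful write keeps the In directory acting like a queue: anything still there has not been processed yet.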

WHAT IT LOOKS LIKE TO THE REQUESTOR

WHAT IT LOOKS LIKE IN NIFI TODAY (REFINED)

HOW IT STARTED

OUR TOP 10 NIFI BEST PRACTICES

#10
Adjust Processor Run Schedule
- When developing flows, the first priority should be to adjust the Run Schedule
- We have had cases where this wasn't done and massive amounts of data were generated
- Back Pressure will keep things from running completely out of control
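A back-of-the-envelope sketch shows why the Run Schedule deserves attention. It assumes a hypothetical source processor (for example, a GenerateFlowFile) that emits one 1 KiB FlowFile per run; the numbers are arithmetic, not measurements.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000

def daily_output(run_schedule_ms: int, batch_size: int, file_size_bytes: int):
    """Estimate (FlowFiles, bytes) per day for a source processor that emits
    batch_size FlowFiles of file_size_bytes each time it is scheduled."""
    if run_schedule_ms <= 0:
        # A 0 sec run schedule means "as often as possible": the schedule imposes no limit.
        raise ValueError("run schedule must be positive to estimate output")
    runs_per_day = MS_PER_DAY // run_schedule_ms
    flowfiles = runs_per_day * batch_size
    return flowfiles, flowfiles * file_size_bytes

print(daily_output(60_000, 1, 1024))  # every minute: (1440, 1474560)
print(daily_output(10, 1, 1024))      # every 10 ms:  (8640000, 8847360000)
```

Going from a one-minute schedule to a 10 ms schedule multiplies the output by 6,000, which is how a forgotten Run Schedule turns a test flow into gigabytes of data per day.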

#9
Make sure you have plenty of storage space for the NiFi repositories
•NiFi's Data Provenance and Queuing require sufficient storage space
•Place these repositories on separate mount points
•Configure Provenance expiration to meet your business requirements

#8
Use Process Groups
•Process groups allow you to create modular flows
•Keeps flows organized and readable
•Authorization can be set for process groups
•The root canvas is itself a process group

#8 USE PROCESS GROUPS
1. Each developer has their own process group in our development
environment
2. We work in our own process groups when doing initial development / POC-type work
3. We have process groups that encapsulate “releasable code”


#7
Create templates of a single Processor for easy reuse
•Often you add a Processor to a flow and then need to update a number of its properties
•Streamline this by creating a template that contains just the Processor, with the properties prepopulated
•Use the template instead of the Processor

#7 PROCESSOR REUSE
1. Configure a Processor; in this example, PutHDFS configured with a Kerberos principal
2. Create a template that contains only the processor:
 a. Select the processor
 b. Click the Create Template icon
 c. Give it a good name and description

#7 PROCESSOR REUSE
Once a template has been created, you can add that template to newly developed workflows

#6
The Data Provenance Search Facility

#5
If you Cluster in Prod then Cluster in Dev
•NiFi supports clustering
•If you are going to cluster NiFi in production, then cluster your dev NiFi as well
•Certain processors should only run on a single node in the cluster
•It is possible to create a single-node cluster in dev, but it is still best to have your dev setup match your production setup

#4
Set expirations on Success queues
•We often want to capture a flow's successful completion
•We route the final Success output of a process to a funnel
•Make sure you set the FlowFile expiration on that queue; otherwise back pressure will cause your flows to stop until the queue is drained
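A toy model of why an unexpired Success queue stalls a flow. The `SuccessQueue` class, its object threshold, and the 300-second expiration are hypothetical simplifications; as the NiFi User Guide describes FlowFile expiration, the sketch measures age from when the data entered the system rather than from when it reached this queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque

@dataclass
class FlowFile:
    name: str
    entered_at: float  # seconds; when the data entered the system

class SuccessQueue:
    """Toy terminal queue feeding a funnel that nothing downstream consumes."""

    def __init__(self, object_threshold: int, expiration_seconds: float = 0):
        self.object_threshold = object_threshold
        self.expiration_seconds = expiration_seconds  # 0 means "never expire"
        self.items: Deque[FlowFile] = deque()

    def _expire(self, now: float) -> None:
        if self.expiration_seconds > 0:
            self.items = deque(f for f in self.items
                               if now - f.entered_at < self.expiration_seconds)

    def offer(self, ff: FlowFile, now: float) -> bool:
        self._expire(now)
        if len(self.items) >= self.object_threshold:
            return False  # back pressure: the upstream flow stalls
        self.items.append(ff)
        return True

def simulate(q: SuccessQueue, n_files: int, interval_seconds: int) -> int:
    """Push one FlowFile every interval_seconds; return how many were accepted."""
    accepted = 0
    for i in range(n_files):
        now = i * interval_seconds
        if q.offer(FlowFile(f"ff-{i}", entered_at=now), now):
            accepted += 1
    return accepted

if __name__ == "__main__":
    print(simulate(SuccessQueue(object_threshold=5), 20, 60))                          # 5
    print(simulate(SuccessQueue(object_threshold=5, expiration_seconds=300), 20, 60))  # 20
```

Without expiration only five FlowFiles are accepted before the queue stays full for good; with a 300-second expiration old entries age out and all twenty are accepted.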

#4 EXPIRE SUCCESS QUEUES

#3
Use Custom Properties per environment
•NiFi flows have access to Custom properties and environment variables
•We use these to make our flows environment agnostic
•Not perfect
• Not all properties support expression language
• Custom properties are read at startup
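A sketch of the environment-agnostic idea: the same flow setting, written with NiFi Expression Language `${...}` syntax, resolves differently per environment. The property names, paths and addresses are hypothetical, and this resolver leaves unknown names untouched so that a missing property is easy to spot.

```python
import re
from typing import Dict

PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")

# Hypothetical per-environment custom properties; keys and values are illustrative.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "dev": {"hdfs.landing.dir": "/data/dev/landing", "notify.email": "dev-alerts@example.com"},
    "prod": {"hdfs.landing.dir": "/data/prod/landing", "notify.email": "data-office@example.com"},
}

def resolve(value: str, env: str) -> str:
    """Substitute ${name} placeholders using the chosen environment's properties.
    Names with no matching property are left as-is."""
    props = ENVIRONMENTS[env]
    return PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)), value)

if __name__ == "__main__":
    setting = "${hdfs.landing.dir}/bloomberg"  # identical text in dev and prod flows
    print(resolve(setting, "dev"))   # /data/dev/landing/bloomberg
    print(resolve(setting, "prod"))  # /data/prod/landing/bloomberg
```

The flow definition stays identical when promoted from dev to production; only the properties file behind it differs.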

#2
Create small, modular, disconnected workflows
•Large, complex, interconnected flows are
• Difficult to debug
• Difficult to deploy
•Decouple flows
• Create each flow with a specific functional purpose
• If other flows are dependent, use queues as the coupling mechanism
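The decoupling advice can be sketched with two functions that share nothing but a queue, mirroring a "download and put to HDFS" flow feeding a separate "process downloaded files" flow. The `/landing/` path and both function names are hypothetical; each side can be run, tested, or redeployed on its own.

```python
import queue
from typing import Callable, Iterable, List

def download_flow(sources: Iterable[str], handoff: queue.Queue) -> int:
    """Flow 1 ("download and put to HDFS"): record where each file landed."""
    count = 0
    for src in sources:
        handoff.put(f"/landing/{src}")  # hypothetical landing path
        count += 1
    return count

def process_flow(handoff: queue.Queue, transform: Callable[[str], str]) -> List[str]:
    """Flow 2 ("process downloaded files"): drain whatever is waiting.
    It knows nothing about flow 1 beyond the queue it reads from."""
    results: List[str] = []
    while True:
        try:
            path = handoff.get_nowait()
        except queue.Empty:
            break
        results.append(transform(path))
        handoff.task_done()
    return results

if __name__ == "__main__":
    q: queue.Queue = queue.Queue()
    print(process_flow(q, str.upper))  # []  (nothing has landed yet, and no error)
    download_flow(["a.csv", "b.csv"], q)
    print(process_flow(q, str.upper))  # ['/LANDING/A.CSV', '/LANDING/B.CSV']
```

Because the consumer only sees the queue, a failure in processing leaves downloaded files waiting rather than breaking the download flow.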

#2 SMALL, MODULAR AND DISCONNECTED
Download and put to HDFS
Process downloaded files

#2 SMALL, MODULAR AND DISCONNECTED
Decouple

#1
Update the Names of the Processors
•Do not use the default Processor name
•Give each processor in a flow a friendly, well understood name
•The NiFi Summary page and other features that show processor names will be more usable if processors are named

#1 UPDATE PROCESSOR NAMES

ORGANIZATIONAL CHALLENGES - NIFI

EVENT BASED MINDSET – AS A COMPANY
• To fully leverage a modern data architecture, companies need to think in terms of events, not a schedule
• An event could be anything from a file arriving on an FTP server, to a RESTful API call, to a database being updated
• The concept of jobs running at specific times causes unnecessary strain on systems and reduces overall throughput
‒ Like waiting until the last minute to do all your homework
‒ 50 MB/s sustained = more than 4 TB/day
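The 50 MB/s figure checks out with simple arithmetic (decimal units, 1 TB = 1,000,000 MB). The second function is an illustrative addition: it shows the rate needed to push the same daily volume through a hypothetical 4-hour nightly window, which is the strain that schedule-driven jobs concentrate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Sustained throughput in MB/s converted to TB/day (1 TB = 1,000,000 MB)."""
    return rate_mb_per_s * SECONDS_PER_DAY / 1_000_000

def rate_for_window(rate_mb_per_s: float, window_hours: float) -> float:
    """MB/s needed to move one day's volume at rate_mb_per_s inside a shorter window."""
    return rate_mb_per_s * 24 / window_hours

print(daily_volume_tb(50))     # 4.32
print(rate_for_window(50, 4))  # 300.0
```

Spreading work across the day as data arrives keeps the required rate at 50 MB/s; compressing it into a 4-hour batch window demands six times that.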

EASY TO GET STARTED – DON’T BE LAZY
• It is a lot easier to go fast and just get an application working
‒ This leads to bad programs
• Just because it works doesn't mean it's efficient
‒ An inefficient flow can need double the hardware for the same amount of work
• Repeated for importance: rename your processors like you would comment your code
• Check your back pressure settings; it is very easy to build a flow that overwhelms a specific step
‒ Particularly with “GenerateFlowFile”
‒ Debugging steps need to be evaluated/removed before production

INCLUDE “BUSINESS” PEOPLE AND SPECIALISTS
• With a more interactive graphical interface, business people and specialists can understand the program's flow much better
• Have you ever tried to go line by line through Java code with someone? Eyes glaze over.
• Collaborate earlier and more often; it will decrease the number of cycles needed to get on target

DON’T BE AFRAID TO WRITE CUSTOM CODE
• The majority of use cases don't need custom code, but when one does, don't be afraid to write new processors

QUESTIONS?

DISCLOSURE
Parametric Portfolio Associates LLC (“Parametric”), headquartered in Seattle, Washington, is registered as an investment adviser with the U.S. Securities
and Exchange Commission under the Investment Advisers Act of 1940. Parametric is a leading global asset management firm, providing investment
strategies and customized exposure management directly to institutional investors and indirectly to individual investors through financial intermediaries.
Parametric offers a variety of rules-based investment strategies, including alpha-seeking equity, alternative and options strategies, as well as
implementation services, including customized equity, traditional overlay and centralized portfolio management. Parametric is a majority-owned
subsidiary of Eaton Vance Corp. and offers these capabilities through investment centers in Seattle, WA, Minneapolis, MN and Westport, CT. This
material may not be forwarded or reproduced, in whole or in part, without the written consent of Parametric Compliance. Parametric and its affiliates
are not responsible for its use by other parties.
All contents copyright 2017 Parametric Portfolio Associates LLC. All rights reserved. Parametric Portfolio Associates, PIOS, and Parametric with the iris
flower logo are all trademarks registered in the US Patent and Trademark Office.
Parametric is located at 1918 8th Avenue, Suite 3100, Seattle, WA 98101. For more information regarding Parametric and its investment strategies, or to
request a copy of Parametric’s Form ADV Brochure, please contact us at 206.694.5575 or visit our website, www.parametricportfolio.com.
