3. About me
● Modak Analytics
● GeneNetwork project
● SciRuby Contributor
● Google Summer of Code 2016, 2017
● Ruby Grant 2017
● Fukuoka Ruby Award 2018
● Projects:
○ JRuby port of NMatrix
○ ArrayFire gem
○ RbCUDA
5. Highlights
Modak Analytics is helping implement one of the largest Life Sciences
Platforms in the world.
Platform Details
• 2,100 structured data sources
• 500k tables
• 1,350 unstructured sources
• 1.3 billion files
• 1,200 data nodes
• 6 petabytes of usable information
• 1,000+ clinical trials being standardized to the
CDISC (SDTM) model for cross-study analysis,
placebo baseline, etc.
• Single integrated data platform comprising
compound, activity results, assay protocol and
project information
• "Like-minded" data has been grouped into
Data Domains by business area, e.g. Clinical,
Assay, Gene, Regulatory
• More than 17 solutions have been developed
and deployed for the business
Awarded at the prestigious ‘Strata
Data Conference 2017’ for building
this platform in record time
6. Governed Data Lake approach
• Modak provides an end-to-end service for the platform, including
automated ingestion, curation, and innovative solutions
• Modak also provides 24x7 support for the platform
[Diagram: Governed Data Lake architecture. Labels shown on the slide:]
• Source data (originators of data that serve as "authoring" systems to
support business processes)
○ Structured data: Postgres, SQL Server, Oracle, MySQL
○ SAS data sets
○ Unstructured data: file shares, SharePoint, Documentum
• Automated data discovery, automated data ingestion: Data Spider, BOTS
• Foundation layer: ingested raw data, data tagging, data masking,
data cleansing, data lineage, data profiling, augmented data,
mapping/standardization, data fingerprinting
• Integration layer: a replica of the data is ingested into the
integration layer
• Solutions layer: data analytics
• Semantic layer: visualisation, dashboards and reports
• Metadata catalog (KOSH), flow controller; StreamSets pipelines are
generated automatically
• Data governance, data security, system / application management
• Layer captions: optimized for computing and distribution of data;
optimized for strategic BI product development; optimized for business
users; optimized for analysts and data scientists
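The slide names data masking, data profiling and data fingerprinting as foundation-layer steps but does not say how they are implemented. The following is only a minimal Ruby sketch of what such steps could look like, under assumptions: the sample rows, the salt, and the helper names `mask`, `profile` and `fingerprint` are all hypothetical and are not taken from the platform.

```ruby
require "digest"

# Hypothetical sample rows; not data from the platform.
ROWS = [
  { "subject_id" => "S-001", "age" => 54,  "arm" => "placebo" },
  { "subject_id" => "S-002", "age" => nil, "arm" => "treatment" },
  { "subject_id" => "S-003", "age" => 61,  "arm" => "placebo" }
].freeze

# Masking (assumed approach): replace an identifier with a deterministic
# token, so equal inputs still map to equal masked values.
def mask(value, salt)
  Digest::SHA256.hexdigest("#{salt}:#{value}")[0, 12]
end

# Profiling: count rows, nulls and distinct non-null values for one column.
def profile(rows, column)
  values = rows.map { |row| row[column] }
  { rows: values.size, nulls: values.count(&:nil?), distinct: values.compact.uniq.size }
end

# Fingerprinting (assumed approach): hash the sorted column values, so two
# columns with the same content match regardless of row order.
def fingerprint(rows, column)
  Digest::SHA256.hexdigest(rows.map { |row| row[column].to_s }.sort.join("\u0000"))
end

masked_rows = ROWS.map { |row| row.merge("subject_id" => mask(row["subject_id"], "demo-salt")) }
age_profile = profile(ROWS, "age")
arm_fingerprint = fingerprint(ROWS, "arm")
```

Hashing the sorted values is one simple way to make a fingerprint independent of row order; a production system would likely use a more scalable scheme.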
GWAS
39. GPU Array
● Generic pointer used to handle an array of elements on the GPU.
● Memory copying from CPU to GPU and vice-versa.
● Interfaced with NMatrix and NArray
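To make the copy semantics concrete, here is a CPU-only Ruby sketch that models device memory as a raw byte buffer. It runs without a GPU. `FakeDeviceArray` and its methods are hypothetical names invented for this illustration; they are not the RbCUDA API, and a real GPU array would hold a device pointer rather than a Ruby string.

```ruby
# Hypothetical stand-in for a GPU array; NOT the RbCUDA API.
class FakeDeviceArray
  INT32 = "l*" # pack directive: native-endian signed 32-bit integers
  BYTES_PER_ELEMENT = 4

  attr_reader :size

  def initialize(size)
    @size = size
    # Zero-filled byte buffer that plays the role of device memory.
    @buffer = ("\0" * (size * BYTES_PER_ELEMENT)).b
  end

  # "Host to device": serialise Ruby integers into the raw buffer.
  def copy_from_host(values)
    raise ArgumentError, "expected #{@size} elements" unless values.size == @size

    @buffer = values.pack(INT32)
    self
  end

  # "Device to host": decode the raw buffer back into Ruby integers.
  def copy_to_host
    @buffer.unpack(INT32)
  end

  def bytesize
    @buffer.bytesize
  end
end

host = (0...100).to_a
device = FakeDeviceArray.new(host.size).copy_from_host(host)
round_trip = device.copy_to_host
```

The buffer only knows its byte length; the element type is supplied at copy time, which is the sense in which the pointer is "generic".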
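40.
# CUDA C source for an element-wise vector addition, kept in a Ruby heredoc.
vadd_kernel_src = <<-EOS
  extern "C" {
    __global__ void matSum(int *a, int *b, int *c)
    {
      // One element per block: blockIdx.x selects the index to add.
      int tid = blockIdx.x;
      // Guard so that blocks past index 99 do nothing.
      if (tid < 100)
        c[tid] = a[tid] + b[tid];
    }
  }
EOS
# Compile the kernel source, then run the compiled kernel.
f = compile(vadd_kernel_src)
RbCUDA::Driver.run_kernel(f.path)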
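The kernel's indexing can be checked on the CPU with a plain Ruby reference. This sketch emulates a launch of one thread per block and mirrors the `tid < 100` guard. The function name `mat_sum_reference` and the grid size of 128 are assumptions for illustration; the slide does not show the launch configuration.

```ruby
N = 100 # bound hard-coded in the kernel's `if (tid < 100)` guard

# Emulate a launch of `grid_size` blocks with one thread each.
def mat_sum_reference(a, b, grid_size)
  c = Array.new(a.size, 0)
  grid_size.times do |block_idx|
    tid = block_idx                      # int tid = blockIdx.x;
    c[tid] = a[tid] + b[tid] if tid < N  # guarded element-wise add
  end
  c
end

a = (0...N).to_a
b = a.map { |x| 2 * x }
c = mat_sum_reference(a, b, 128) # assumed grid: the 28 extra blocks do nothing
```

A reference like this is handy for comparing against the values copied back from the GPU after the kernel runs.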