Measuring the digital economy using big data
Prash Majmudar – Growth Intelligence
@growthintel
@prashmaj
Overview
• Background
• Approach (Data + Python)
• Sizing the economy - Results
• Examples
Background
Project background
• Research project supported by NESTA, Google
• Worked with independent economists at the
National Institute of Economic and Social
Research (NIESR) – Max Nathan, Anna Rosso
• Published report in 2013
• Further phases of work underway
Research questions
• What’s the most appropriate definition of UK ‘digital
companies’? Cleaner definitions, company counts
• What do the UK’s ‘digital companies’ (really) look like? Key
characteristics, focus on start-ups, innovating and ‘high-
growth’ companies, spatial footprint
• What drives innovation and/or high-growth status in digital
companies? Performance analysis and characteristics. Sample
historic data to investigate causality
Why?
• The digital economy is poorly served by
conventional definitions and datasets.
• Reliance on Companies House (historic data)
• Standard definitions used for:
– Credit / risk
– Government policy (e.g. focus on Tech City)
– Economic productivity measures
– Companies that sell / market to other companies
SIC - Standard Industrial Classification
• Brought into being in 1948
– Since 1948 the classification has been revised in
1958, 1968, 1980, 1992, 1997, and 2003
• Latest version is “SIC 2007”
– adopted by UK in 2008.
– adopted by Companies House in October 2011.
• 731 SIC codes, but not without issues
– Self-classification
– Emerging sectors e.g. no codes for Nanotechnology
SIC
• 77220 Renting of video tapes and disks
• 81223 Furnace and chimney cleaning services
• 01440 Raising of camels and camelids
• 32110 Striking of coins – Royal Mint
• 38310 Dismantling of wrecks
• 01260 Growing of oleaginous fruits
• 82990 Other business support service activities n.e.c.
– 10% of Businesses
• 20% not classified
Challenge
• The ‘digital economy’ is not straightforward to define
• Refers to:
– a set of sectors,
– a set of outputs (products and services),
– and a set of inputs (production and distribution tools, underpinned by
information and communication technologies).
• Mapping the digital economy onto industries is
necessarily imprecise.
• Government defines it as ‘information’ and ‘digital
content’ industries (BIS 2012, 2013)
• Data driven methods can provide richer, more informative and
more up to date analysis.
Data driven approach
[Diagram: linked datasets and algorithms]
• Coverage: all companies in the economy (~3M companies)
• Data linked to each company: online activity, news / events, technologies, classifications, financials, TMs / patents, trade activity
• Data categories: unusual data, unique data, companies, user data
• Users: enterprise users, tech users, medium company users
Approach
• Classification system is multi-dimensional:
– Sector: vertical they operate in
– Product type: principal output (services / physical
goods)
– Client type: business or consumer focussed
– Sales process: how they sell / route to market
[Diagram: example categories]
• IT, Film, Telco, Publishing, Oil & Gas, Architecture
• Software – web, Consultancy, Hardware / tools, Electronics, Media distribution
Approach
[Diagram: processing pipeline]
• Training set: crowd sourced labelled data, pre-labelled data
• Crawl / APIs (Scrapy)
• Feature extraction / pre-processing
• Feature generation / selection
• Model training
• Processing: Python scikit-learn / pandas
Building training sets
• Sources of labels: crowd sourcing (create classification tasks), expert panels, pre-labelled data
• Using crowd sourcing
– Users follow pre-defined instructions – are rewarded
for successfully completing tasks
– Can put in place qualification tests etc.
– Vote to produce labels – majority of 5
• Used expert panel when large number of classes
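The "majority of 5" vote can be sketched as below. This is a minimal illustration rather than the pipeline used in the project: the function name `majority_label`, the company names and the votes are all invented, and a strict majority (more than half of the votes) is assumed.

```python
from collections import Counter

def majority_label(votes, min_share=0.5):
    """Return the label with a strict majority of the votes, else None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > min_share else None

# Hypothetical votes from five crowd workers for two companies.
votes_by_company = {
    "company_a": ["software", "software", "consultancy", "software", "software"],
    "company_b": ["fashion", "printing", "fashion", "publishing", "media"],
}

# company_b has no majority, so it gets no label (it could be re-queued
# or passed to an expert panel instead).
labels = {name: majority_label(v) for name, v in votes_by_company.items()}
```

Leaving a company unlabelled when no majority emerges keeps noisy labels out of the training set.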
Feature engineering
– Multiple sources of features
• Free text (News / Web)
• Structured datasets (e.g. patent filings etc.)
– Cleaning data
• Malformed HTML
• Stripping out HTML, Javascript
– Tokenising and calculating TF-IDF weights
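A rough sketch of the cleaning and weighting steps, using only the Python standard library. The sample pages are invented, and the weighting is the textbook tf × log(N / df) form; scikit-learn's `TfidfVectorizer` applies a smoothed and normalised variant, so its numbers would differ.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def html_to_tokens(html):
    """Strip markup and JavaScript, then split into lowercase word tokens."""
    parser = TextExtractor()
    parser.feed(html)  # HTMLParser tolerates unclosed tags
    parser.close()
    return re.findall(r"[a-z]+", " ".join(parser.parts).lower())

def tfidf(docs_tokens):
    """Per-document {term: tf * log(N / df)} weights."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(t for tokens in docs_tokens for t in set(tokens))
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Invented, deliberately malformed pages (unclosed tags, inline script).
pages = [
    "<html><script>var x = 'cart';</script><p>Fibre cabling and cisco switches</p>",
    "<div>Bridal lace and satin <b>fashion</div>",
]
tokens = [html_to_tokens(p) for p in pages]
weights = tfidf(tokens)
```

A term that appears in every document (here "and") gets weight zero, while terms specific to one page keep a positive weight.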
Modelling
• Supervised learning classification problem
• scikit-learn (fast iteration on different models).
Use of linear SVMs and processing pipelines
– One-vs-rest (one vs many) classifier
• Pandas plays well here – can quickly build up
feature sets
• Large number of features (thousands) – linear
models are fast.
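A minimal sketch of such a pipeline for one dimension (sector), assuming scikit-learn is installed. The training texts and labels are invented toy data; the real feature sets and model settings are not given in the slides.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Invented toy data: website text and a sector label per company.
train_texts = [
    "fibre cabling ethernet switches cisco",
    "wireless networking server hosting infrastructure",
    "bridal lace satin fashion clothing",
    "footwear socks hats apparel wholesale",
    "film production studio camera editing",
    "cinema screening director documentary film",
]
train_labels = [
    "computer networking", "computer networking",
    "fashion", "fashion",
    "film", "film",
]

# TF-IDF features feeding a linear SVM; LinearSVC trains one
# one-vs-rest classifier per class.
sector_clf = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", LinearSVC(random_state=0)),
])
sector_clf.fit(train_texts, train_labels)

predicted = sector_clf.predict(["cisco switches and ethernet cabling"])
```

Under this setup, one pipeline of the same shape would be trained for each dimension (sector, product type, client type, sales process).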
[Figure: two bar charts of the highest-weighted terms per class, from clf.coef_ (horizontal axis 0 to 1.4)]
Computer networking: cables, smes, termination, ip, networking, server, sap, consultant, ethernet, installer, fault, cloud, remote, setup, ict, servers, copper, telecom, wireless, hardware, conferencing, desk, disruption, crm, infrastructure, hosting, fibre, cisco, switches, cabling
Fashion: luxurious, quantity, footwear, collection, cotton, courier, shirts, stockists, cart, logo, satin, wholesale, hats, nylon, wear, workwear, bridal, womens, designs, socks, accessories, lace, mens, clothing, fashion, apparel
Summary
• Use multiple datasets as an input
• Build multi-class classifiers for
sector, product, client, sales process
• Apply classifiers to 3M companies in the UK
Sizing the digital economy
Challenges
• Sole traders are not observed
• Registered company addresses are not always trading
addresses
• Understanding company structure
• Employee coverage is limited: gaps in the data arise from the traditional
reliance on historic filing data
Cleaning the company data
• Aim = build a benchmarking sample
• Include only observations with SIC and GI info => smaller than ‘true’
- Step 1: drop non-trading, dormant, dissolved companies or those in
administration
- Step 2: drop holding companies
- Step 3: identify groups of linked companies (via
name, postcode), keep the unit that reports highest revenue
• Benchmarking sample = 1.868m companies
• Validate ‘true’ sample (2.254m) vs. BPS enterprise counts
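The three cleaning steps can be sketched as below. The records, field names and the linking rule in `group_key` (first word of the name plus postcode) are invented for illustration; the slides only say that groups are linked via name and postcode.

```python
EXCLUDED_STATUSES = {"non-trading", "dormant", "dissolved", "in administration"}

def group_key(company):
    """Crude, hypothetical linking key: first word of the name + postcode."""
    first_word = company["name"].lower().split()[0]
    return (first_word, company["postcode"].replace(" ", "").upper())

def build_benchmark_sample(companies):
    # Step 1: drop non-trading, dormant, dissolved companies or those in administration.
    trading = [c for c in companies if c["status"] not in EXCLUDED_STATUSES]
    # Step 2: drop holding companies.
    operating = [c for c in trading if not c["is_holding"]]
    # Step 3: within each linked group keep the unit reporting the highest revenue.
    best = {}
    for c in operating:
        key = group_key(c)
        if key not in best or c["revenue"] > best[key]["revenue"]:
            best[key] = c
    return list(best.values())

# Invented example records.
companies = [
    {"name": "Acme Software Ltd", "postcode": "AB1 2CD", "status": "active",
     "is_holding": False, "revenue": 500_000},
    {"name": "Acme Services Ltd", "postcode": "AB1 2CD", "status": "active",
     "is_holding": False, "revenue": 900_000},
    {"name": "Acme Holdings Ltd", "postcode": "AB1 2CD", "status": "active",
     "is_holding": True, "revenue": 0},
    {"name": "Bolt Media Ltd", "postcode": "M1 1AA", "status": "dormant",
     "is_holding": False, "revenue": 0},
    {"name": "Crest Design Ltd", "postcode": "BN1 1AA", "status": "active",
     "is_holding": False, "revenue": 120_000},
]

sample = build_benchmark_sample(companies)
sample_names = [c["name"] for c in sample]
```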
Identifying ‘digital companies’
• Aim = more robust definition, compare against SIC-based
• Use ‘sector’ and ‘product’ categories
• Intuition = we want companies in ‘digital’ sectors that also do
‘digital’ things (e.g. digital publishing, media, design …)
- Step 1: Identify GI sector and product categories
- Steps 2-5: clean out ‘non-digital’ GI sector / product combinations
- Step 6: Count companies
- E.g. Process designed to exclude large proportion of architecture
firms, except those whose principal product type is software for CAD /
technical drawing
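One way to picture the sector-by-product "cells" is as a set of allowed (sector, product) pairs. The cell list and companies below are hypothetical examples chosen to mirror the architecture case; the actual set of digital cells is not listed in the slides.

```python
# Hypothetical set of (sector, product) cells treated as digital.
DIGITAL_CELLS = {
    ("computer software", "custom software development"),
    ("computer software", "software web application"),
    ("architecture", "software desktop or server"),  # e.g. CAD tools
    ("publishing", "digital media"),
}

def is_digital(company):
    """A company counts as digital only if its sector/product cell is listed."""
    return (company["sector"], company["product"]) in DIGITAL_CELLS

# Invented companies.
companies = [
    {"name": "A", "sector": "architecture", "product": "consultancy"},
    {"name": "B", "sector": "architecture", "product": "software desktop or server"},
    {"name": "C", "sector": "computer software", "product": "custom software development"},
]

digital_names = [c["name"] for c in companies if is_digital(c)]
digital_count = len(digital_names)
```

Here the architecture practice selling consultancy is excluded, while the architecture firm whose principal product is software is kept.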
Company counts
                              Observations        %
A. SIC 07
   Other                         1,681,151    89.96
   Digital Economy                 187,616    10.04
B. GI sector and product
   Other                         1,599,072    85.57
   Digital Economy                 269,695    14.43
Note: Panel A follows the BIS (2009) definition. Panel B defines the digital economy using GI digital sector by digital product "cells".
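The percentages follow directly from the observation counts, and the two panels can be compared with a little arithmetic. The snippet below only re-derives figures from the table; the variable names are illustrative.

```python
counts = {
    "A. SIC 07": {"Other": 1_681_151, "Digital Economy": 187_616},
    "B. GI sector and product": {"Other": 1_599_072, "Digital Economy": 269_695},
}

def shares(panel):
    """Percentage share of each row within a panel, to two decimals."""
    total = sum(panel.values())
    return {name: round(100 * n / total, 2) for name, n in panel.items()}

sic_shares = shares(counts["A. SIC 07"])
gi_shares = shares(counts["B. GI sector and product"])

# Both panels cover the same benchmarking sample of 1,868,767 companies.
total_companies = sum(counts["A. SIC 07"].values())

# Additional digital companies found by the GI definition.
sic_digital = counts["A. SIC 07"]["Digital Economy"]
gi_digital = counts["B. GI sector and product"]["Digital Economy"]
extra_digital = gi_digital - sic_digital
uplift_pct = round(100 * extra_digital / sic_digital, 1)
```

On these figures the GI definition identifies 82,079 more digital companies than the SIC-based one, roughly 43.7% more.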
Classifications:
Sector – Oil and Energy
Product – Computer Software
Client – Businesses
Sales process – Project
Based in Aberdeen
SIC Code: 82990 - Other business support
service activities
Company counts are highest in London.
But we also find large counts in Manchester, Birmingham, Bristol and Brighton ...
... as well as the wider Greater South East.
[Figure: bar chart by area, horizontal axis 0.000 to 1.800]
Areas shown: Livingston & Bathgate, Crawley, Oxford, Southampton, Coventry, Middlesbrough & Stockton, Cheltenham & Evesham, Swindon, Cambridge, Andover, Brighton, Bournemouth, Wycombe & Slough, Luton & Watford, Stevenage, Guildford & Aldershot, Poole, Milton Keynes & Aylesbury, Newbury, Reading & Bracknell, Basingstoke, Guildford
[Table: counts by sector (rows) and product type (columns); the column position of each count is not recoverable from the extracted text]
Product columns: consultancy, custom software development, digital media, media distribution, peer to peer communications, photography, printing services, software desktop or server, software web application, web hosting
animation: 1
architecture: 178
computer games: 2, 80
computer hardware: 12, 7, 1
computer network security: 7, 1
computer networking: 23, 5
computer software: 88, 459, 70
defense space: 37
electrical electronic manufacturing: 13, 72, 1
entertainment film production: 6, 33
financial services: 820
information services: 8, 3
information technology: 2756, 6, 94
internet: 14, 15, 1, 16
marketing advertising: 192
photography: 74, 7, 1
printing: 12, 2, 63
publishing: 29
semiconductors: 3
telecommunications: 58, 9, 31, 1, 1
Additional findings
Digital companies’ revenue growth in 2010-2012 is faster than non-digital ...
                     A. Annual Revenues          B. Annual Revenue Growth
                     mean          median        mean       median
Other                18,380,097    110,048       15.68      1.70
Digital Economy      10,547,218    123,388       20.21      4.17
Note: Sub-sample of those companies who report revenue. Companies House average revenues are averaged over the period 2010 to 2012. If for each company there is more than one observation, only the most recent is kept. Average annual revenue growth is computed on a smaller sample, as information for at least two consecutive years is needed.
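The sample restriction in the note can be illustrated as follows. This is one possible reading, not the report's exact method: growth is assumed to be the percentage change between the most recent pair of consecutive years, and the revenue figures are invented.

```python
from statistics import mean, median

def latest_growth(revenue_by_year):
    """Percentage growth for the most recent pair of consecutive years, else None."""
    for year in sorted(revenue_by_year, reverse=True):
        previous = revenue_by_year.get(year - 1)
        if previous:  # skip if the prior year is missing or zero
            return 100.0 * (revenue_by_year[year] - previous) / previous
    return None

# Invented revenue filings per company.
filings = {
    "company_a": {2010: 100_000, 2011: 110_000, 2012: 121_000},
    "company_b": {2010: 50_000, 2012: 80_000},   # no consecutive years
    "company_c": {2011: 200_000, 2012: 150_000},
}

growth = {name: latest_growth(rev) for name, rev in filings.items()}

# Only companies with two consecutive years enter the growth sample.
usable = [g for g in growth.values() if g is not None]
mean_growth = mean(usable)
median_growth = median(usable)
```

Company B reports revenue but drops out of the growth sample, which is why the growth figures rest on fewer companies than the revenue figures.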
... and digital employers have higher average staff levels.
Employees per company
                              Mean     Median    % of all employment
A. Official / SIC07
   Other                      20.94    4         94.92
   Digital Economy            17.23    3         5.08
B. GI sector and product
   Other                      20.40    4         88.67
   Digital Economy            23.37    4         11.33
Note: sub-sample of firms reporting employment to Companies House. Data is averaged over 2010-2012.
Further work
• Drivers of innovation / growth
• Use of ‘tags’ to provide further descriptive
analysis of digital companies
• Unsupervised approach to identify clusters
• Extension to sole traders
• Extending this approach to Europe – e.g.
Belgium, France, Germany, Italy
Questions?
@growthintel
@prashmaj
SIC – ICT Sector
28230 MANUFACTURE OF OFFICE MACHINERY AND COMPUTERS
26200 MANUFACTURE OF COMPUTERS AND OTHER INFORMATION PROCESSING EQUIPMENT
27320 INSULATED WIRE AND CABLE
26110 ELECTRONIC VALVES AND TUBES AND OTHER ELECTRONIC COMPONENTS
33200 TELEVISION, RADIO TRANSMITTERS AND APPARATUS FOR TELEPHONY AND TELEGRAPHY
26400 TELEVISION AND RADIO RECEIVERS, SOUND OR VIDEO RECORDING OR PRODUCING APPARATUS AND ASSOCIATED GOODS
26511 INSTRUMENTS AND APPLIANCES FOR MEASURING, CHECKING, TESTING AND NAVIGATING AND OTHER PURPOSES
26512 INDUSTRIAL PROCESS EQUIPMENT
46439 WHOLESALE OF ELECTRICAL HOUSEHOLD APPLIANCES
46510 WHOLESALE OF COMPUTERS, COMPUTER PERIPHERAL EQUIPMENT AND SOFTWARE
46660 WHOLESALE OF OTHER OFFICE MACHINERY AND EQUIPMENT
46520 WHOLESALE OF OTHER ELECTRONIC PARTS AND EQUIPMENT
46690 WHOLESALE OF OTHER MACHINERY FOR USE IN INDUSTRY, TRADE AND NAVIGATION
61900 TELECOMMUNICATIONS SERVICES
77330 RENTING OF OFFICE MACHINERY AND EQUIPMENT INCLUDING COMPUTERS
62020 COMPUTER HARDWARE CONSULTANCY
95110 MAINTENANCE AND REPAIR OF OFFICE, ACCOUNTING AND COMPUTING MACHINERY
62090 OTHER COMPUTER RELATED ACTIVITIES
SIC – Digital content industries
58110 PUBLISHING OF BOOKS
58130 PUBLISHING OF NEWSPAPERS
58142 PUBLISHING OF JOURNALS AND PERIODICALS
59200 PUBLISHING OF SOUND RECORDINGS
58190 OTHER PUBLISHING
18110 PRINTING OF NEWSPAPERS
18129 PRINTING N.E.C
18130 PRE-PRESS ACTIVITIES
18130 ANCILLARY ACTIVITIES RELATING TO PRINTING
18201 REPRODUCTION OF SOUND RECORDING
18202 REPRODUCTION OF VIDEO RECORDING
18203 REPRODUCTION OF COMPUTER MEDIA
58290 PUBLISHING OF SOFTWARE
62020 OTHER SOFTWARE CONSULTANCY AND SUPPLY
63110 DATA PROCESSING
63110 DATABASE ACTIVITIES
73110 ADVERTISING
74209 PHOTOGRAPHIC ACTIVITIES
59111 MOTION PICTURE AND VIDEO PRODUCTION
59131 MOTION PICTURE AND VIDEO DISTRIBUTION
59140 MOTION PICTURE PROJECTION
59113 RADIO & TV (DCMS ESTIMATES)
63910 NEWS AGENCY ACTIVITIES

Contenu connexe

Tendances

Electronic Components Manufacturing Unit in Delhi, available for SALE
Electronic Components Manufacturing Unit in Delhi, available for SALEElectronic Components Manufacturing Unit in Delhi, available for SALE
Electronic Components Manufacturing Unit in Delhi, available for SALEBusinessDeals
 
IRJET - Industry 4.0 for Manufacturing Organization in India
IRJET - Industry 4.0 for Manufacturing Organization in IndiaIRJET - Industry 4.0 for Manufacturing Organization in India
IRJET - Industry 4.0 for Manufacturing Organization in IndiaIRJET Journal
 
Industry 4.0 and the road towards it
Industry 4.0 and the road towards itIndustry 4.0 and the road towards it
Industry 4.0 and the road towards itRick Bouter
 
1 Ecommerce for retail 2018 session 1 and 2
1 Ecommerce for retail  2018 session 1 and 21 Ecommerce for retail  2018 session 1 and 2
1 Ecommerce for retail 2018 session 1 and 2sanjivadubey
 
Overview of Electronic Commerce
Overview of  Electronic CommerceOverview of  Electronic Commerce
Overview of Electronic CommerceUjjwal 'Shanu'
 
Intelligent Business Services Operation
Intelligent Business Services OperationIntelligent Business Services Operation
Intelligent Business Services Operationpetermoricz
 
International Technology Adoption & Workforce Issues Study - German Summary
International Technology Adoption & Workforce Issues Study - German SummaryInternational Technology Adoption & Workforce Issues Study - German Summary
International Technology Adoption & Workforce Issues Study - German SummaryCompTIA
 
New business models using Artificial Intelligence
New business models using Artificial Intelligence New business models using Artificial Intelligence
New business models using Artificial Intelligence BirgitObermeier
 
Activity 1.4 e-commerce
Activity 1.4 e-commerceActivity 1.4 e-commerce
Activity 1.4 e-commerceaudelon
 
E-Business: Chapter 1: Intro to E-B
E-Business: Chapter 1: Intro to E-BE-Business: Chapter 1: Intro to E-B
E-Business: Chapter 1: Intro to E-BArry Arman
 
Industry 4.0 | Daxue Consulting
Industry 4.0 | Daxue ConsultingIndustry 4.0 | Daxue Consulting
Industry 4.0 | Daxue ConsultingDaxue Consulting
 
Digitalisation in Asia
Digitalisation in AsiaDigitalisation in Asia
Digitalisation in AsiaAlbina Gaisina
 
Informa Overview & Marketing Services Solutions - 2020 Marketing Planning
Informa Overview & Marketing Services Solutions - 2020 Marketing PlanningInforma Overview & Marketing Services Solutions - 2020 Marketing Planning
Informa Overview & Marketing Services Solutions - 2020 Marketing PlanningLiz LaPorte Stott
 
Canada - What is next for Manufacturing | February 2022 and January 2022
Canada - What is next for Manufacturing | February 2022 and January 2022 Canada - What is next for Manufacturing | February 2022 and January 2022
Canada - What is next for Manufacturing | February 2022 and January 2022 paul young cpa, cga
 
Fmce africa mko's presentation - 13102020
Fmce africa   mko's presentation - 13102020Fmce africa   mko's presentation - 13102020
Fmce africa mko's presentation - 13102020Dr. MKO Balogun
 
Industry 4.0 Changes Everything
Industry 4.0 Changes Everything Industry 4.0 Changes Everything
Industry 4.0 Changes Everything Imaginet
 

Tendances (19)

Electronic Components Manufacturing Unit in Delhi, available for SALE
Electronic Components Manufacturing Unit in Delhi, available for SALEElectronic Components Manufacturing Unit in Delhi, available for SALE
Electronic Components Manufacturing Unit in Delhi, available for SALE
 
IRJET - Industry 4.0 for Manufacturing Organization in India
IRJET - Industry 4.0 for Manufacturing Organization in IndiaIRJET - Industry 4.0 for Manufacturing Organization in India
IRJET - Industry 4.0 for Manufacturing Organization in India
 
Industry 4.0 and the road towards it
Industry 4.0 and the road towards itIndustry 4.0 and the road towards it
Industry 4.0 and the road towards it
 
1 Ecommerce for retail 2018 session 1 and 2
1 Ecommerce for retail  2018 session 1 and 21 Ecommerce for retail  2018 session 1 and 2
1 Ecommerce for retail 2018 session 1 and 2
 
Overview of Electronic Commerce
Overview of  Electronic CommerceOverview of  Electronic Commerce
Overview of Electronic Commerce
 
Intelligent Business Services Operation
Intelligent Business Services OperationIntelligent Business Services Operation
Intelligent Business Services Operation
 
International Technology Adoption & Workforce Issues Study - German Summary
International Technology Adoption & Workforce Issues Study - German SummaryInternational Technology Adoption & Workforce Issues Study - German Summary
International Technology Adoption & Workforce Issues Study - German Summary
 
New business models using Artificial Intelligence
New business models using Artificial Intelligence New business models using Artificial Intelligence
New business models using Artificial Intelligence
 
Activity 1.4 e-commerce
Activity 1.4 e-commerceActivity 1.4 e-commerce
Activity 1.4 e-commerce
 
E-Business: Chapter 1: Intro to E-B
E-Business: Chapter 1: Intro to E-BE-Business: Chapter 1: Intro to E-B
E-Business: Chapter 1: Intro to E-B
 
Industry 4.0 | Daxue Consulting
Industry 4.0 | Daxue ConsultingIndustry 4.0 | Daxue Consulting
Industry 4.0 | Daxue Consulting
 
Ch01
Ch01Ch01
Ch01
 
Overview Eirma
Overview EirmaOverview Eirma
Overview Eirma
 
Unit 1 overview
Unit 1 overviewUnit 1 overview
Unit 1 overview
 
Digitalisation in Asia
Digitalisation in AsiaDigitalisation in Asia
Digitalisation in Asia
 
Informa Overview & Marketing Services Solutions - 2020 Marketing Planning
Informa Overview & Marketing Services Solutions - 2020 Marketing PlanningInforma Overview & Marketing Services Solutions - 2020 Marketing Planning
Informa Overview & Marketing Services Solutions - 2020 Marketing Planning
 
Canada - What is next for Manufacturing | February 2022 and January 2022
Canada - What is next for Manufacturing | February 2022 and January 2022 Canada - What is next for Manufacturing | February 2022 and January 2022
Canada - What is next for Manufacturing | February 2022 and January 2022
 
Fmce africa mko's presentation - 13102020
Fmce africa   mko's presentation - 13102020Fmce africa   mko's presentation - 13102020
Fmce africa mko's presentation - 13102020
 
Industry 4.0 Changes Everything
Industry 4.0 Changes Everything Industry 4.0 Changes Everything
Industry 4.0 Changes Everything
 

En vedette

Real-time Streams & Logs with Storm and Kafka by Andrew Montalenti and Keith ...
Real-time Streams & Logs with Storm and Kafka by Andrew Montalenti and Keith ...Real-time Streams & Logs with Storm and Kafka by Andrew Montalenti and Keith ...
Real-time Streams & Logs with Storm and Kafka by Andrew Montalenti and Keith ...PyData
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create PyData
 
Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014PyData
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...PyData
 
You Give Me Data, I Give You Art by Eric Drass - PyData London 2014
You Give Me Data, I Give You Art by Eric Drass - PyData London 2014You Give Me Data, I Give You Art by Eric Drass - PyData London 2014
You Give Me Data, I Give You Art by Eric Drass - PyData London 2014PyData
 
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...PyData
 
Joshua Bloom Data Science at Berkeley
Joshua Bloom Data Science at BerkeleyJoshua Bloom Data Science at Berkeley
Joshua Bloom Data Science at BerkeleyPyData
 
ggplot for python SV 2014
ggplot for python SV 2014ggplot for python SV 2014
ggplot for python SV 2014PyData
 
Xray: extended arrays for scientific datasets by Stephan Hoyer PyData SV 2014
Xray: extended arrays for scientific datasets by Stephan Hoyer PyData SV 2014Xray: extended arrays for scientific datasets by Stephan Hoyer PyData SV 2014
Xray: extended arrays for scientific datasets by Stephan Hoyer PyData SV 2014PyData
 
Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014PyData
 
How Soon is Now: automatically extracting publication dates of news articles ...
How Soon is Now: automatically extracting publication dates of news articles ...How Soon is Now: automatically extracting publication dates of news articles ...
How Soon is Now: automatically extracting publication dates of news articles ...PyData
 
Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis PyData
 
Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"PyData
 
Idai Guertel- Self Exploring Robots
Idai Guertel- Self Exploring RobotsIdai Guertel- Self Exploring Robots
Idai Guertel- Self Exploring RobotsPyData
 
Philippe Bracke- Estimating Residential Land Prices in the UK
Philippe Bracke- Estimating Residential Land Prices in the UKPhilippe Bracke- Estimating Residential Land Prices in the UK
Philippe Bracke- Estimating Residential Land Prices in the UKPyData
 
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014PyData
 
Social Media Brand Positioning Workflow- David Gerson
Social Media Brand Positioning Workflow- David GersonSocial Media Brand Positioning Workflow- David Gerson
Social Media Brand Positioning Workflow- David GersonPyData
 
Python resampling
Python resamplingPython resampling
Python resamplingPyData
 
James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with...
James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with...James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with...
James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with...PyData
 
Straight, white & male-Being an ally in diversity: Tony Wieczorek
Straight, white & male-Being an ally in diversity: Tony WieczorekStraight, white & male-Being an ally in diversity: Tony Wieczorek
Straight, white & male-Being an ally in diversity: Tony WieczorekPyData
 

En vedette (20)

Real-time Streams & Logs with Storm and Kafka by Andrew Montalenti and Keith ...
Real-time Streams & Logs with Storm and Kafka by Andrew Montalenti and Keith ...Real-time Streams & Logs with Storm and Kafka by Andrew Montalenti and Keith ...
Real-time Streams & Logs with Storm and Kafka by Andrew Montalenti and Keith ...
 
Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create Danny Bickson - Python based predictive analytics with GraphLab Create
Danny Bickson - Python based predictive analytics with GraphLab Create
 
Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014Crushing the Head of the Snake by Robert Brewer PyData SV 2014
Crushing the Head of the Snake by Robert Brewer PyData SV 2014
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
 
You Give Me Data, I Give You Art by Eric Drass - PyData London 2014
You Give Me Data, I Give You Art by Eric Drass - PyData London 2014You Give Me Data, I Give You Art by Eric Drass - PyData London 2014
You Give Me Data, I Give You Art by Eric Drass - PyData London 2014
 
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
Brains & Brawn: the Logic and Implementation of a Redesigned Advertising Mark...
 
Joshua Bloom Data Science at Berkeley
Joshua Bloom Data Science at BerkeleyJoshua Bloom Data Science at Berkeley
Joshua Bloom Data Science at Berkeley
 
ggplot for python SV 2014
ggplot for python SV 2014ggplot for python SV 2014
ggplot for python SV 2014
 
Xray: extended arrays for scientific datasets by Stephan Hoyer PyData SV 2014
Xray: extended arrays for scientific datasets by Stephan Hoyer PyData SV 2014Xray: extended arrays for scientific datasets by Stephan Hoyer PyData SV 2014
Xray: extended arrays for scientific datasets by Stephan Hoyer PyData SV 2014
 
Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014Speed Without Drag by Saul Diez-Guerra PyData SV 2014
Speed Without Drag by Saul Diez-Guerra PyData SV 2014
 
How Soon is Now: automatically extracting publication dates of news articles ...
How Soon is Now: automatically extracting publication dates of news articles ...How Soon is Now: automatically extracting publication dates of news articles ...
How Soon is Now: automatically extracting publication dates of news articles ...
 
Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis Mike Pittaro - High Performance Hardware for Data Analysis
Mike Pittaro - High Performance Hardware for Data Analysis
 
Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"Evolutionary Algorithms: Perfecting the Art of "Good Enough"
Evolutionary Algorithms: Perfecting the Art of "Good Enough"
 
Idai Guertel- Self Exploring Robots
Idai Guertel- Self Exploring RobotsIdai Guertel- Self Exploring Robots
Idai Guertel- Self Exploring Robots
 
Philippe Bracke- Estimating Residential Land Prices in the UK
Philippe Bracke- Estimating Residential Land Prices in the UKPhilippe Bracke- Estimating Residential Land Prices in the UK
Philippe Bracke- Estimating Residential Land Prices in the UK
 
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
Pythran: Static compiler for high performance by Mehdi Amini PyData SV 2014
 
Social Media Brand Positioning Workflow- David Gerson
Social Media Brand Positioning Workflow- David GersonSocial Media Brand Positioning Workflow- David Gerson
Social Media Brand Positioning Workflow- David Gerson
 
Python resampling
Python resamplingPython resampling
Python resampling
 
James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with...
James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with...James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with...
James Horey (OpenCore.io) Ferry - Share and Deploy Big Data Applications with...
 
Straight, white & male-Being an ally in diversity: Tony Wieczorek
Straight, white & male-Being an ally in diversity: Tony WieczorekStraight, white & male-Being an ally in diversity: Tony Wieczorek
Straight, white & male-Being an ally in diversity: Tony Wieczorek
 

Similaire à Measuring the Digital Economy using Big Data by Prash Majmudar

Industrial internet big data german market study
Industrial internet big data german market studyIndustrial internet big data german market study
Industrial internet big data german market studyBusiness Finland
 
Industrial internet big data german market study
Industrial internet big data german market studyIndustrial internet big data german market study
Industrial internet big data german market studySari Ojala
 
Germany- ICT Opportunities & Business Analysis
Germany- ICT Opportunities & Business AnalysisGermany- ICT Opportunities & Business Analysis
Germany- ICT Opportunities & Business AnalysisRahil Pathan
 
Data-Driven Value Generation. Is it Possible?
Data-Driven Value Generation. Is it Possible?Data-Driven Value Generation. Is it Possible?
Data-Driven Value Generation. Is it Possible?M2M Alliance e.V.
 
Management Information Systems - MIS Lectures - Day 1 cio and mis - part 1
Management Information Systems - MIS Lectures - Day 1   cio and mis - part 1Management Information Systems - MIS Lectures - Day 1   cio and mis - part 1
Management Information Systems - MIS Lectures - Day 1 cio and mis - part 1Foreign Trade University - Hanoi
 
Data sharing between private companies and research facilities
Data sharing between private companies and research facilitiesData sharing between private companies and research facilities
Data sharing between private companies and research facilitiesInstitute of Contemporary Sciences
 
Artificial Intelligence (AI) Startup Business Plan Purple variant by Slidesgo...
Artificial Intelligence (AI) Startup Business Plan Purple variant by Slidesgo...Artificial Intelligence (AI) Startup Business Plan Purple variant by Slidesgo...
Artificial Intelligence (AI) Startup Business Plan Purple variant by Slidesgo...huyminh802
 
TM Forum AI Program Overview
TM Forum AI Program OverviewTM Forum AI Program Overview
TM Forum AI Program OverviewTMForum
 
Are manufacturing companies ready to go digital capgemini consulting - digi...
Are manufacturing companies ready to go digital   capgemini consulting - digi...Are manufacturing companies ready to go digital   capgemini consulting - digi...
Are manufacturing companies ready to go digital capgemini consulting - digi...Rick Bouter
 
Business models for business processes on IoT
Business models for business processes on IoTBusiness models for business processes on IoT
Business models for business processes on IoTFabMinds
 
Devoteam itsmf 2021 - from business automation to continuous value-driven i...
Devoteam   itsmf 2021 - from business automation to continuous value-driven i...Devoteam   itsmf 2021 - from business automation to continuous value-driven i...
Devoteam itsmf 2021 - from business automation to continuous value-driven i...itSMF Belgium
 
Thinking out of the toolbox exec report - IBM
Thinking out of the toolbox exec report - IBMThinking out of the toolbox exec report - IBM
Thinking out of the toolbox exec report - IBMSusanna Harper
 
Accountant302018presentatie hs march122018
Accountant302018presentatie hs march122018Accountant302018presentatie hs march122018
Accountant302018presentatie hs march122018drs Pieter de Kok RA
 
Industrial Internet of Things (IIoT) for Automotive Paint Shop Operations
Industrial Internet of Things (IIoT) for Automotive Paint Shop OperationsIndustrial Internet of Things (IIoT) for Automotive Paint Shop Operations
Industrial Internet of Things (IIoT) for Automotive Paint Shop OperationsRam Shetty
 

Measuring the Digital Economy using Big Data by Prash Majmudar

  • 1. Measuring the digital economy using big data Prash Majmudar – Growth Intelligence @growthintel @prashmaj
  • 2. Overview • Background • Approach (Data + Python) • Sizing the economy - Results • Examples
  • 4. Project background • Research project supported by NESTA, Google • Worked with independent economists at the National Institute of Economic and Social Research (NIESR) – Max Nathan, Anna Rosso • Published report in 2013 • Further phases of work underway
  • 5. Research questions • What’s the most appropriate definition of UK ‘digital companies’? Cleaner definitions, company counts • What do the UK’s ‘digital companies’ (really) look like? Key characteristics, focus on start-ups, innovating and ‘high-growth’ companies, spatial footprint • What drives innovation and/or high-growth status in digital companies? Performance analysis and characteristics. Sample historic data to investigate causality
  • 6. Why? • The digital economy is poorly served by conventional definitions and datasets. • Reliance on Companies House (historic data) • Standard definitions used for: – Credit / risk – Government policy (e.g. focus on Tech City) – Economic productivity measures – Companies that sell / market to other companies
  • 7. SIC - Standard Industrial Classification • Brought into being in 1948 – Since 1948 the classification has been revised in 1958, 1968, 1980, 1992, 1997, and 2003 • Latest version is “SIC 2007” – adopted by UK in 2008. – adopted by Companies House in October 2011. • 731 SIC codes, but not without issues – Self-classification – Emerging sectors e.g. no codes for Nanotechnology
  • 8. SIC • 77220 Renting of video tapes and disks • 81223 Furnace and chimney cleaning services • 01440 Raising of camels and camelids • 32110 Striking of coins – Royal Mint • 38310 Dismantling of wrecks • 01260 Growing of oleaginous fruits • 82990 Other business support service activities n.e.c. – 10% of Businesses • 20% not classified
  • 10. Challenge • The ‘digital economy’ is not straightforward to define • Refers to: – a set of sectors, – a set of outputs (products and services), – and a set of inputs (production and distribution tools, underpinned by information and communication technologies). • Mapping the digital economy onto industries is necessarily imprecise. • Government defines it as ‘information’ and ‘digital content’ industries (BIS 2012, 2013) • Data driven methods can provide richer, more informative and more up to date analysis.
  • 12. All Companies in the Economy (~3M companies) • Online activity • News / Events • Technologies • Classifications • Financials • TMs / Patents • UNUSUAL DATA • Trade activity • UNIQUE DATA • COMPANIES • USER DATA • Linked datasets and algorithms • Enterprise users • Tech Users • Medium company users
  • 13. Approach • Classification system is multi-dimensional: – Sector: vertical they operate in – Product type: principal output (services / physical goods) – Client type: business or consumer focussed – Sales process: how they sell / route to market
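The four dimensions listed on this slide can be pictured as one record per company. The sketch below is only an illustration: the class name, field names and example values are assumptions chosen for readability, not the schema used in the project.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CompanyClassification:
    """One label per dimension of the multi-dimensional classification."""
    sector: str         # vertical the company operates in
    product_type: str   # principal output (services / physical goods)
    client_type: str    # business or consumer focussed
    sales_process: str  # how the company sells / route to market


# Illustrative values only (hypothetical company).
example = CompanyClassification(
    sector="Oil and Energy",
    product_type="Computer Software",
    client_type="Businesses",
    sales_process="Project based",
)

print([f.name for f in fields(example)])
```

Keeping the dimensions as separate fields means a company can sit in a non-digital sector and still have a digital product type, which a single SIC code cannot express.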
  • 14. IT • Film • Telco • Publishing • Oil & Gas • Architecture • Software – web • Consultancy • Hardware / tools • Electronics • Media distribution
  • 15. Approach • Crowd sourced labelled data • Crawl / APIs • Pre-labelled data • Feature generation / selection • Model training • Feature Extraction / pre-processing • Scrapy • Processing • Python • scikit-learn / pandas • Training set
  • 16. Building training sets • Crowd sourcing – create classification tasks • Expert panels • Pre-labelled data • Using crowd sourcing – Users follow pre-defined instructions – Are rewarded for successfully completing tasks – Can put in place qualification tests etc. – Vote to produce labels – majority of 5 • Used expert panel when large number of classes
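The "majority of 5" voting step could look roughly like the sketch below. It assumes five crowd workers each submit one label per company and that a label is accepted only with a strict majority; the function name and the tie handling are assumptions, not details taken from the talk.

```python
from collections import Counter


def majority_label(votes, min_votes=5):
    """Return the label backed by a strict majority of votes, else None."""
    if len(votes) < min_votes:
        return None  # not enough workers have answered yet
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None


votes = ["software", "software", "consultancy", "software", "hardware"]
print(majority_label(votes))
```

Tasks that return None could be sent back for more votes or passed to an expert panel.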
  • 17. Feature engineering – Multiple sources of features • Free text (News / Web) • Structured datasets (e.g. patent filings etc.) – Cleaning data • Malformed HTML • Stripping out HTML, Javascript – Tokenising and calculating TF-IDF weights
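A minimal, standard-library sketch of the cleaning and weighting steps on this slide: strip HTML and JavaScript, tokenise, then compute TF-IDF weights. The sample pages and helper names are hypothetical, and the textbook formula tf × log(N / df) is used here; scikit-learn's TfidfVectorizer applies a smoothed, normalised variant, so its numbers would differ.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)   # html.parser tolerates unclosed tags
    parser.close()
    return " ".join(parser.parts)


def tokenise(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per doc."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        term_freq = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


# Hypothetical, deliberately malformed pages.
pages = [
    "<html><body><p>Cisco switches and <b>fibre cabling<script>var x = 1;</script></body>",
    "<div>Bridal lace and cotton shirts<style>p {color: red}</style></div>",
]
tokens = [tokenise(html_to_text(page)) for page in pages]
weights = tfidf(tokens)
print(tokens[0])
print(weights[0])
```

A term that appears in every document, such as "and" here, gets weight zero, while terms specific to one page keep a positive weight.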
  • 18. Modelling • Supervised learning classification problem • Scikit learn (fast iteration on different models). Use of Linear SVMs and processing pipelines – One vs many classifier • Pandas plays well here – can quickly build up feature sets • Large number of features (thousands) – linear models are fast.
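As a rough sketch of the kind of pipeline this slide describes, the example below chains TF-IDF features into a linear SVM with scikit-learn. The toy texts and labels are invented for illustration, and "one vs many" is read here as one-vs-rest, which is an assumption. The real system would train on far more data and features.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny, made-up training set: one short text per company.
texts = [
    "cisco switches ethernet cabling fibre networking",
    "wireless networking servers hosting infrastructure",
    "telecom cabling fibre ethernet installer",
    "bridal lace satin fashion clothing",
    "mens clothing shirts cotton footwear apparel",
    "womens fashion accessories hats socks apparel",
    "publishing books journals periodicals",
    "newspapers magazines publishing print",
    "books authors editors publishing house",
]
labels = ["computer networking"] * 3 + ["fashion"] * 3 + ["publishing"] * 3

# TF-IDF features feeding one linear SVM per class (one-vs-rest).
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LinearSVC(random_state=0)),
)
model.fit(texts, labels)

predictions = model.predict([
    "ethernet switches and fibre cabling",
    "cotton shirts and lace hats",
])
print(list(predictions))
```

Wrapping the steps in a pipeline keeps vectorisation and classification together, so swapping in a different model for quick iteration is a one-line change.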
  • 19. Figure: two bar charts of term weights taken from clf.coef_ (axis from 0 to 1.4). Computer networking: cables, smes, termination, ip, networking, server, sap, consultant, ethernet, installer, fault, cloud, remote, setup, ict, servers, copper, telecom, wireless, hardware, conferencing, desk, disruption, crm, infrastructure, hosting, fibre, cisco, switches, cabling. Fashion: luxurious, quantity, footwear, collection, cotton, courier, shirts, stockists, cart, logo, satin, wholesale, hats, nylon, wear, workwear, bridal, womens, designs, socks, accessories, lace, mens, clothing, fashion, apparel.
  • 20. Summary • Use multiple datasets as an input • Build multi-class classifiers for sector, product, client, sales process • Apply classifiers to 3M companies in the UK
  • 22. Challenges • Sole traders are not observed • Registered company addresses are not always trading addresses • Understanding company structure • Employee coverage is limited – gaps in data due to the traditional reliance on historic filing data
  • 23. Cleaning the company data • Aim = build a benchmarking sample • Include only observations with SIC and GI info => smaller than ‘true’ – Step 1: drop non-trading, dormant, dissolved companies or those in administration – Step 2: drop holding companies – Step 3: identify groups of linked companies (via name, postcode), keep the unit that reports highest revenue • Benchmarking sample = 1.868m companies • Validate ‘true’ sample (2.254m) vs. BPS enterprise counts
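The three cleaning steps could be expressed in pandas roughly as follows. Column names, status values and the toy rows are assumptions made for illustration. Linking companies by exact name and postcode match is a simplification of whatever matching the project actually used.

```python
import pandas as pd

# Hypothetical company records.
companies = pd.DataFrame({
    "name": ["Acme Ltd", "Acme Ltd", "Beta Holdings", "Gamma Ltd", "Delta Ltd"],
    "postcode": ["GU1 1AA", "GU1 1AA", "EC1 2BB", "M1 3CC", "BS1 4DD"],
    "status": ["active", "active", "active", "dormant", "active"],
    "is_holding": [False, False, True, False, False],
    "revenue": [100_000, 250_000, 900_000, 0, 50_000],
})

# Step 1: keep only trading companies (drop dormant, dissolved, etc.).
trading = companies[companies["status"] == "active"]

# Step 2: drop holding companies.
operating = trading[~trading["is_holding"]]

# Step 3: within each (name, postcode) group keep the highest-revenue unit.
benchmark = (
    operating.sort_values("revenue", ascending=False)
    .drop_duplicates(subset=["name", "postcode"], keep="first")
    .sort_index()
)

print(benchmark[["name", "postcode", "revenue"]])
```

Sorting by revenue before dropping duplicates is what makes `keep="first"` select the unit with the highest reported revenue in each group.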
  • 24. Identifying ‘digital companies’ • Aim = more robust definition, compare against SIC-based • Use ‘sector’ and ‘product’ categories • Intuition = we want companies in ‘digital’ sectors that also do ‘digital’ things (e.g. digital publishing, media, design …) – Step 1: Identify GI sector and product categories – Steps 2-5: clean out ‘non-digital’ GI sector / product combinations – Step 6: Count companies – E.g. process designed to exclude a large proportion of architecture firms, except those whose principal product type is software for CAD / technical drawing
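One way to picture the sector-by-product filter is as a whitelist of (sector, product type) "cells". The cells and companies below are hypothetical and only mirror the architecture example on the slide. The study's actual list of digital cells is not reproduced here.

```python
# Hypothetical whitelist of (sector, product type) cells treated as digital.
DIGITAL_CELLS = {
    ("computer software", "software web application"),
    ("publishing", "digital media"),
    ("architecture", "software desktop or server"),
}

# Made-up companies with their sector and principal product type.
companies = [
    {"name": "A", "sector": "architecture", "product": "consultancy"},
    {"name": "B", "sector": "architecture", "product": "software desktop or server"},
    {"name": "C", "sector": "publishing", "product": "digital media"},
    {"name": "D", "sector": "publishing", "product": "printing services"},
]


def is_digital(company):
    """A company counts as digital only if its sector/product cell is listed."""
    return (company["sector"], company["product"]) in DIGITAL_CELLS


digital = [c["name"] for c in companies if is_digital(c)]
print(digital, len(digital))
```

Here the architecture consultancy is excluded while the architecture firm whose principal product is software stays in, which is the behaviour the slide describes.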
  • 25. Company counts
        Definition                     Observations        %
        A. SIC 07
           Other                          1,681,151    89.96
           Digital Economy                  187,616    10.04
        B. GI sector and product
           Other                          1,599,072    85.57
           Digital Economy                  269,695    14.43
        Note: Panel A follows the BIS (2009) definition. Panel B defines the digital economy using GI digital sector by digital product "cells".
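The percentages on this slide follow directly from the observation counts. The short check below recomputes each panel's digital share and the difference between the two definitions. The dictionary layout and function name are just for illustration.

```python
# Observation counts as reported on the slide.
counts = {
    "SIC 07": {"Other": 1_681_151, "Digital Economy": 187_616},
    "GI sector and product": {"Other": 1_599_072, "Digital Economy": 269_695},
}


def digital_share(panel):
    """Digital companies as a percentage of all companies in the panel."""
    return 100 * panel["Digital Economy"] / sum(panel.values())


for name, panel in counts.items():
    total = sum(panel.values())
    print(f"{name}: {digital_share(panel):.2f}% of {total:,} companies")

# Additional companies counted as digital under the GI definition.
extra = (
    counts["GI sector and product"]["Digital Economy"]
    - counts["SIC 07"]["Digital Economy"]
)
print(f"extra digital companies under GI definition: {extra:,}")
```

Both panels sum to the same benchmarking sample, so the two shares are directly comparable.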
  • 26. Classifications: Sector – Oil and Energy Product – Computer Software Client – Businesses Sales process – Project Based in Aberdeen SIC Code: 82990 - Other business support service activities
  • 27. Company counts are highest in London. But we also find large counts in Manchester, Birmingham, Bristol and Brighton ... as well as the wider Greater South East.
  • 28. Figure: chart by area (axis from 0.000 to 1.800): Livingston & Bathgate, Crawley, Oxford, Southampton, Coventry, Middlesbrough & Stockton, Cheltenham & Evesham, Swindon, Cambridge, Andover, Brighton, Bournemouth, Wycombe & Slough, Luton & Watford, Stevenage, Guildford & Aldershot, Poole, Milton Keynes & Aylesbury, Newbury, Reading & Bracknell, Basingstoke
  • 29. Guildford • Product types (columns): consultancy; custom software development; digital media; media distribution; peer to peer communications; photography; printing services; software desktop or server; software web application; web hosting • Sectors (rows): animation 1; architecture 178; computer games 2 80; computer hardware 12 7 1; computer network security 7 1; computer networking 23 5; computer software 88 459 70; defense space 37; electrical electronic manufacturing 13 72 1; entertainment film production 6 33; financial services 820; information services 8 3; information technology 2756 6 94; internet 14 15 1 16; marketing advertising 192; photography 74 7 1; printing 12 2 63; publishing 29; semiconductors 3; telecommunications 58 9 31 1 1
  • 31. Digital companies’ revenue growth in 2010-2012 is faster than non-digital ...
                             A. Annual Revenues          B. Annual Revenue Growth
                             mean          median        mean       median
        Other                18,380,097    110,048       15.68      1.70
        Digital Economy      10,547,218    123,388       20.21      4.17
        Note: Sub-sample of those companies who report revenue. Companies House average revenues are averaged over the period 2010 to 2012. If for each company there is more than one observation, only the most recent is kept. Average annual revenue growth is computed on a smaller sample, as information for at least two consecutive years is needed.
  • 32. ... and digital employers have higher average staff levels.
      Employees per company       Mean     Median   % of all employment
      A. Official / SIC07
        Other                     20.94    4        94.92
        Digital Economy           17.23    3        5.08
      B. GI sector and product
        Other                     20.40    4        88.67
        Digital Economy           23.37    4        11.33
      Note: sub-sample of firms reporting employment to Companies House. Data is averaged over 2010-2012.
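The three columns of the employment table (mean, median, and share of all employment) can be computed per group from firm-level staff counts. The sketch below assumes a hypothetical list of `(is_digital, employees)` pairs with made-up values; `employment_summary` is an illustrative helper, not part of the original analysis.

```python
from statistics import mean, median

# Hypothetical (is_digital, employees) pairs. Values are made up.
firms = [(True, 10), (True, 2), (False, 4), (False, 30), (False, 4)]


def employment_summary(rows):
    """Mean, median and share of total employment for each group."""
    total = sum(employees for _, employees in rows)
    summary = {}
    for label, flag in (("Digital Economy", True), ("Other", False)):
        staff = [employees for is_digital, employees in rows if is_digital == flag]
        summary[label] = {
            "mean": mean(staff),
            "median": median(staff),
            "share_pct": 100.0 * sum(staff) / total,
        }
    return summary


summary = employment_summary(firms)
for label, stats in summary.items():
    print(label, stats)
```

Running the same function once with a SIC-based flag and once with a GI-based flag would give the two panels, A and B, of such a table.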
  • 33. Further work
      – Drivers of innovation / growth
      – Use of ‘tags’ to provide further descriptive analysis of digital companies
      – Unsupervised approach to identify clusters
      – Extension to sole traders
      – Extending this approach to Europe – e.g. Belgium, France, Germany, Italy
  • 35. SIC – ICT Sector
      28230 MANUFACTURE OF OFFICE MACHINERY AND COMPUTERS
      26200 MANUFACTURE OF COMPUTERS AND OTHER INFORMATION PROCESSING EQUIPMENT
      27320 INSULATED WIRE AND CABLE
      26110 ELECTRONIC VALVES AND TUBES AND OTHER ELECTRONIC COMPONENTS
      33200 TELEVISION, RADIO TRANSMITTERS AND APPARATUS FOR TELEPHONY AND TELEGRAPHY
      26400 TELEVISION AND RADIO RECEIVERS, SOUND OR VIDEO RECORDING OR PRODUCING APPARATUS AND ASSOCIATED GOODS
      26511 INSTRUMENTS AND APPLIANCES FOR MEASURING, CHECKING, TESTING AND NAVIGATING AND OTHER PURPOSES
      26512 INDUSTRIAL PROCESS EQUIPMENT
      46439 WHOLESALE OF ELECTRICAL HOUSEHOLD APPLIANCES
      46510 WHOLESALE OF COMPUTERS, COMPUTER PERIPHERAL EQUIPMENT AND SOFTWARE
      46660 WHOLESALE OF OTHER OFFICE MACHINERY AND EQUIPMENT
      46520 WHOLESALE OF OTHER ELECTRONIC PARTS AND EQUIPMENT
      46690 WHOLESALE OF OTHER MACHINERY FOR USE IN INDUSTRY, TRADE AND NAVIGATION
      61900 TELECOMMUNICATIONS SERVICES
      77330 RENTING OF OFFICE MACHINERY AND EQUIPMENT INCLUDING COMPUTERS
      62020 COMPUTER HARDWARE CONSULTANCY
      95110 MAINTENANCE AND REPAIR OF OFFICE, ACCOUNTING AND COMPUTING MACHINERY
      62090 OTHER COMPUTER RELATED ACTIVITIES
  • 36. SIC – Digital content industries
      58110 PUBLISHING OF BOOKS
      58130 PUBLISHING OF NEWSPAPERS
      58142 PUBLISHING OF JOURNALS AND PERIODICALS
      59200 PUBLISHING OF SOUND RECORDINGS
      58190 OTHER PUBLISHING
      18110 PRINTING OF NEWSPAPERS
      18129 PRINTING N.E.C
      18130 PRE-PRESS ACTIVITIES
      18130 ANCILLARY ACTIVITIES RELATING TO PRINTING
      18201 REPRODUCTION OF SOUND RECORDING
      18202 REPRODUCTION OF VIDEO RECORDING
      18203 REPRODUCTION OF COMPUTER MEDIA
      58290 PUBLISHING OF SOFTWARE
      62020 OTHER SOFTWARE CONSULTANCY AND SUPPLY
      63110 DATA PROCESSING
      63110 DATABASE ACTIVITIES
      73110 ADVERTISING
      74209 PHOTOGRAPHIC ACTIVITIES
      59111 MOTION PICTURE AND VIDEO PRODUCTION
      59131 MOTION PICTURE AND VIDEO DISTRIBUTION
      59140 MOTION PICTURE PROJECTION
      59113 RADIO & TV (DCMS ESTIMATES)
      63910 NEWS AGENCY ACTIVITIES
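A SIC-based definition of the digital economy amounts to a membership test against code lists like the two above. The sketch below copies the codes from slides 35 and 36; the helper names `sic_groups` and `is_digital_by_sic` are hypothetical, and treating the two lists together as the "Official / SIC07" definition is an assumption made for illustration.

```python
# Codes copied from the "SIC – ICT Sector" slide.
ICT_SECTOR = frozenset({
    "28230", "26200", "27320", "26110", "33200", "26400", "26511", "26512",
    "46439", "46510", "46660", "46520", "46690", "61900", "77330", "62020",
    "95110", "62090",
})

# Codes copied from the "SIC – Digital content industries" slide.
DIGITAL_CONTENT = frozenset({
    "58110", "58130", "58142", "59200", "58190", "18110", "18129", "18130",
    "18201", "18202", "18203", "58290", "62020", "63110", "73110", "74209",
    "59111", "59131", "59140", "59113", "63910",
})


def sic_groups(sic_code):
    """Return the names of the code lists that contain this SIC code."""
    code = sic_code.strip()
    groups = set()
    if code in ICT_SECTOR:
        groups.add("ICT sector")
    if code in DIGITAL_CONTENT:
        groups.add("Digital content")
    return groups


def is_digital_by_sic(sic_code):
    """True if the code appears in either list (assumed SIC-based definition)."""
    return bool(sic_groups(sic_code))


print(sic_groups("62020"))         # listed on both slides
print(is_digital_by_sic("82990"))  # the Aberdeen software company's code
```

Under this test the Aberdeen company from slide 26, registered under 82990, is not counted as digital even though its product is computer software, which is the kind of gap that motivates a classification based on sector and product instead.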