SlideShare une entreprise Scribd logo
1  sur  55
Spreadsheet Engineering
Jácome Cunha
jacome@di.uminho.pt
www.di.uminho.pt/~jacome
Researcher @

HASLab / INESC TEC & Universidade do Minho, Portugal
Oregon State University (office 2077)
OSU - EECS Colloquium - 02/24/14
Agenda
I.

Motivation

II.

Spreadsheets Meet Models

III.

Models for Spreadsheets – ClassSheets

IV.

Inferring ClassSheets

V.

Embedding ClassSheets

VI.

Evolution!

VII.

Model-Driven Spreadsheets

VIII.

Summary
2
I. Motivation

3
4
Why do Spreadsheets matter?

Financial intelligence firm CODA reports that
95% of all U.S. firms use spreadsheets for
financial reporting.

Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R.
Panko and Nicholas Ordway, 2008

5
Why do Spreadsheets matter?

They are the programming language of choice by nonprofessional programmers, a.k.a. end users.
In the U.S. alone, the number of end-user
programmers is conservatively estimated at 11 million,
compared to only 2.75 million other, professional
programmers.
Estimating the numbers of end users and end-user programmers,
Christopher Scaffidi, Mary Shaw, and Brad Myers, VL/HCC 2005
6
7
Why do Spreadsheets matter?

In 2004, RevenueRecognition.com (now
Softtrax) had the International Data Corporation
interview 118 business leaders.
IDC found that 85% were using spreadsheets in
financial reporting and forecasting.

Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R.
Panko and Nicholas Ordway, 2008

8
In fact, spreadsheets lack:
●

Abstraction
●

●

Encapsulation

Type system
●

Testing
●

●

IDE

...
9
And the consequences may be...
Around 200 people who thought their only
experience of the London 2012
Olympic Games would be minor heats of
synchronised swimming have
received an unexpected upgrade to the men’s
100m final following an
embarrassing ticketing mistake.
...
Locog said the error occurred in the summer,
between the first and
second round of ticket sales, when a member of
staff made a single
keystroke mistake and entered ‘20,000’ into a
spreadsheet rather than
the correct figure of 10,000 remaining tickets.
The Telegraph, 04 January 2012

10
And the consequences may be...
In a 2010 paper* Carmen Reinhart, now a professor at
Harvard Kennedy School, and Kenneth Rogoff, an
economist at Harvard University...argued that GDP
growth slows to a snail’s pace once government-debt
levels exceed 90% of GDP. The 90% figure quickly
became ammunition in political arguments over
austerity...This week a new piece of research poured fuel
on the fire by calling the 90% finding into question..

Economy losses of $10 billion/year!

The Economist, 17 April 2013

Harvard University economists Carmen Reinhart
and Kenneth Rogoff have acknowledged making
a spreadsheet calculation mistake in a 2010
research paper, “Growth in a Time of Debt”,
which has been widely cited to justify budgetcutting.
Business Week, 18 April 2013

11
II. Spreadsheets Meet Models
[PEPM'09, VL/HCC'10]

12
Why Models?

13
Spreadsheet Example

14
Functional Dependency?

→

↛
15
Functional Dependencies

●

●

●

We compute the business logic from the data, by
inferring FDs
They are the building blocks inferring models for
(legacy) spreadsheets
The better the FDs we infer, the better the model we
compute!
16
Too Many??
["A"] -> ["B","C","D","E","F"]
["C"] -> ["A","B","D","E","F"]
["D"] -> ["A","B","C","E","F"]
["E"] -> ["A","B","C","D","F"]
["F"] -> ["A","B","C","D","E"]
["G"] -> ["H","I","J"]
["H"] -> ["G","I","J"]
["I"] -> ["G","H","J"]
["J"] -> ["G","H","I"]
["K"] -> ["L","M"]
["L"] -> ["K","M"]
["M"] -> ["K","L"]
["B","K"] -> ["A","C","D","E","F"]
["B","L"] -> ["A","C","D","E","F"]
["B","M"] -> ["A","C","D","E","F"]

17
Accidents happen

●

●

We use a data mining algorithm which produces to
many accidental FDs!

We introduce some spreadsheet specific heuristics to
filter out “accidental” FDs

18
Organize them
●

●

Label semantics: often keys are labeled “code” or “id”
Label arrangement: we prefer FDs respecting the
order of columns

●

Antecedent size: small keys are preferable

●

Ratio: small ratio between keys and non-keys

●

Single value columns: columns always with the
same value appear in too many FDs

19
Final set

Pilot-Id → Pilot-Name, Phone
N-Number → Model, Plane-Name
Pilot-Id, N-Number, Depart, Destination, Date, Hours → {}
20
The first model: a relational model
Having computed the FDs, we can now use the FUN
algorithm to produce a relational model for the
spreadsheet:

Pilots (Pilot-Id, Pilot-Name, Phone)
Planes (N-Number, Model, Plane-Name)
<Flights> (#Pilot-Id,# N-Number, Depart, Destination, Date Hours)
21
III. Models for Spreadsheet – ClassSheets
Engels and Erwig ASE'05

22
ClassSheets - Models for Spreadsheets
ClassSheets are a high-level, object-oriented formalism to
specify spreadsheets

23
ClassSheets - Models for Spreadsheets

24
ClassSheets - Models for Spreadsheets

25
IV. Inferring ClassSheets
[VL/HCC'10]

26
ClassSheet Inference

27
V. Embedding ClassSheets
[VL/HCC'11]

28
Why the Embedding?

29
Embedding...
●

Embedding a language into another language is
a recurring strategy (e.g. for DSLs)
–

–

Users are used to the host language and do not
need to learn a (complete) new language :-)

–

Implementation effort is much reduced :-)

–

●

Embedded language inherit all the power of the
host language :-)

It may have some restrictions :-(

We embedded ClassSheets in traditional
spreadsheet systems

30
Vertically Expandable Tables

31
Horizontally Expandable Tables

32
Relationship Tables

33
34
VI. Evolution!
[FASE'11, ICMT'12]

35
Why do Spreadsheet Models Need Evolution?
●

●

Suppose now you need to add new information
to the spreadsheet
For instance, the number of passengers of
each flight

●

It would require to do several error-prone tasks

●

Add columns, labels, update formulas, etc.

●

We can do it automatically!

36
Why do Spreadsheet Instances Need Evolution?

●

●

●

Some evolution steps are easier to perform on
the instance
For instance, to add a column to one of the
repetition blocks
People felt the need to evolve the data
37
38
Bidirectional Transformation System

39
(Data) Operations on Instances

40
(Model) Operations on ClassSheets

41
Bidirectional Transformation Functions

42
Compositional
Example: Add a Column and a Class

43
VII. MDSheet – Model-Driven Spreadsheets
[ICSE'12]

44
MDSheet Tool

http://youtu.be/6LNdTdCpV2U

45
●

Available at http://ssaapp.di.uminho.pt

●

Built out of 7886 LOC:
–

3181 in Haskell, for the inference and evolution

–

980 in Basic, for the embedding

–

2884 in C++, for gluing all components

–

340 in Perl, for compilation and setup

–

722, for makefiles
46
VIII. Summary

47
Acknowledgments

This work has been done in collaboration with
many people:
Martin Erwig, João Paulo Fernandes, Jorge
Mendes, Hugo Pacheco, Rui Pereira, João
Saraiva, Joost Visser

48
Thanks!
Questions?

49
More?
●

More at http://ssaapp.di.uminho.pt

●

Querying model-driven spreadsheet

●

Visually querying model-driven spreadsheets

●

Detections of bad smells

●

Edit assistance

●

Empirical validations

●

Variational spreadsheets (@ OSU)

●

...

50
Does It Work?

51
Empirical Study Settings

●

17 student from a MSc course

●

2 different spreadsheets
–

Microsoft budget

–

Local company responsible for water supply of
Braga, Portugal - agere
52
Study Setting

●

Hypotheses:
(1) In order to perform a given set of tasks,
users spend less time when using modeldriven spreadsheets instead of plain ones.
(2) Spreadsheets developed in the modeldriven environment hold less errors than plain
ones.
53
Main Results
Number of tasks performed on the MS spreadsheet

54
Main Results
Error rate in the budget spreadsheet

55

Contenu connexe

Similaire à Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14

Spreadsheet Engineering
Spreadsheet EngineeringSpreadsheet Engineering
Spreadsheet EngineeringJácome Cunha
 
Summer School DSL 2013 - SpreadSheet Engineering
Summer School DSL 2013 - SpreadSheet EngineeringSummer School DSL 2013 - SpreadSheet Engineering
Summer School DSL 2013 - SpreadSheet EngineeringJácome Cunha
 
Varied encounters with data science (slide share)
Varied encounters with data science (slide share)Varied encounters with data science (slide share)
Varied encounters with data science (slide share)gilbert.peffer
 
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank Vogelezang
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank VogelezangBest Practices in Software Cost Estimation - Metrikon 2015 - Frank Vogelezang
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank VogelezangFrank Vogelezang
 
Model-Driven Spreadsheet Development
Model-Driven Spreadsheet DevelopmentModel-Driven Spreadsheet Development
Model-Driven Spreadsheet DevelopmentJácome Cunha
 
The Study of the Large Scale Twitter on Machine Learning
The Study of the Large Scale Twitter on Machine LearningThe Study of the Large Scale Twitter on Machine Learning
The Study of the Large Scale Twitter on Machine LearningIRJET Journal
 
Computer aided design, computer aided manufacturing, computer aided engineering
Computer aided design, computer aided manufacturing, computer aided engineeringComputer aided design, computer aided manufacturing, computer aided engineering
Computer aided design, computer aided manufacturing, computer aided engineeringuniversity of sust.
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...DataBench
 
51477813 45498199-sonia-f-sap-fico-project
51477813 45498199-sonia-f-sap-fico-project51477813 45498199-sonia-f-sap-fico-project
51477813 45498199-sonia-f-sap-fico-projectArup Bose, PMP
 
Eng Cal Reduced1
Eng Cal Reduced1Eng Cal Reduced1
Eng Cal Reduced1rtmote
 
Model-driven Spreadsheets
Model-driven SpreadsheetsModel-driven Spreadsheets
Model-driven SpreadsheetsJácome Cunha
 
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...DataBench
 
New Developments in New-Product Development
New Developments in New-Product DevelopmentNew Developments in New-Product Development
New Developments in New-Product DevelopmentPTC
 
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014Recent DEA Applications to Industry: A Literature Review From 2010 To 2014
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014inventionjournals
 
MGT 410Homework Set 3Provide a short answer to each of the f
MGT 410Homework Set 3Provide a short answer to each of the fMGT 410Homework Set 3Provide a short answer to each of the f
MGT 410Homework Set 3Provide a short answer to each of the fDioneWang844
 
Dutch Industry Experience - Presentatie Jan Baan
Dutch Industry Experience - Presentatie Jan BaanDutch Industry Experience - Presentatie Jan Baan
Dutch Industry Experience - Presentatie Jan BaanSalesforce_Benelux
 
Machine Learning for Finance Master Class
Machine Learning for Finance Master Class Machine Learning for Finance Master Class
Machine Learning for Finance Master Class QuantUniversity
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLPaco Nathan
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLPaco Nathan
 

Similaire à Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14 (20)

Spreadsheet Engineering
Spreadsheet EngineeringSpreadsheet Engineering
Spreadsheet Engineering
 
Summer School DSL 2013 - SpreadSheet Engineering
Summer School DSL 2013 - SpreadSheet EngineeringSummer School DSL 2013 - SpreadSheet Engineering
Summer School DSL 2013 - SpreadSheet Engineering
 
Varied encounters with data science (slide share)
Varied encounters with data science (slide share)Varied encounters with data science (slide share)
Varied encounters with data science (slide share)
 
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank Vogelezang
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank VogelezangBest Practices in Software Cost Estimation - Metrikon 2015 - Frank Vogelezang
Best Practices in Software Cost Estimation - Metrikon 2015 - Frank Vogelezang
 
Model-Driven Spreadsheet Development
Model-Driven Spreadsheet DevelopmentModel-Driven Spreadsheet Development
Model-Driven Spreadsheet Development
 
The Study of the Large Scale Twitter on Machine Learning
The Study of the Large Scale Twitter on Machine LearningThe Study of the Large Scale Twitter on Machine Learning
The Study of the Large Scale Twitter on Machine Learning
 
Computer aided design, computer aided manufacturing, computer aided engineering
Computer aided design, computer aided manufacturing, computer aided engineeringComputer aided design, computer aided manufacturing, computer aided engineering
Computer aided design, computer aided manufacturing, computer aided engineering
 
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
Benchmarking for Big Data Applications with the DataBench Framework, Arne Ber...
 
51477813 45498199-sonia-f-sap-fico-project
51477813 45498199-sonia-f-sap-fico-project51477813 45498199-sonia-f-sap-fico-project
51477813 45498199-sonia-f-sap-fico-project
 
Eng Cal Reduced1
Eng Cal Reduced1Eng Cal Reduced1
Eng Cal Reduced1
 
Model-driven Spreadsheets
Model-driven SpreadsheetsModel-driven Spreadsheets
Model-driven Spreadsheets
 
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
Improving Business Performance Through Big Data Benchmarking, Todor Ivanov, B...
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
New Developments in New-Product Development
New Developments in New-Product DevelopmentNew Developments in New-Product Development
New Developments in New-Product Development
 
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014Recent DEA Applications to Industry: A Literature Review From 2010 To 2014
Recent DEA Applications to Industry: A Literature Review From 2010 To 2014
 
MGT 410Homework Set 3Provide a short answer to each of the f
MGT 410Homework Set 3Provide a short answer to each of the fMGT 410Homework Set 3Provide a short answer to each of the f
MGT 410Homework Set 3Provide a short answer to each of the f
 
Dutch Industry Experience - Presentatie Jan Baan
Dutch Industry Experience - Presentatie Jan BaanDutch Industry Experience - Presentatie Jan Baan
Dutch Industry Experience - Presentatie Jan Baan
 
Machine Learning for Finance Master Class
Machine Learning for Finance Master Class Machine Learning for Finance Master Class
Machine Learning for Finance Master Class
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Data Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area MLData Workflows for Machine Learning - SF Bay Area ML
Data Workflows for Machine Learning - SF Bay Area ML
 

Plus de Jácome Cunha

Energy Efficiency Across 
Programming Languages
Energy Efficiency Across 
Programming LanguagesEnergy Efficiency Across 
Programming Languages
Energy Efficiency Across 
Programming LanguagesJácome Cunha
 
Explaining Spreadsheets with Spreadsheets
Explaining Spreadsheets with SpreadsheetsExplaining Spreadsheets with Spreadsheets
Explaining Spreadsheets with SpreadsheetsJácome Cunha
 
Automatically Inferring ClassSheet Models from Spreadsheets
Automatically Inferring ClassSheet Models from SpreadsheetsAutomatically Inferring ClassSheet Models from Spreadsheets
Automatically Inferring ClassSheet Models from SpreadsheetsJácome Cunha
 
On Understanding Data Scientists
On Understanding  Data ScientistsOn Understanding  Data Scientists
On Understanding Data ScientistsJácome Cunha
 
Systematic Spreadsheet Construction Processes @ VL/HCC 2017
Systematic Spreadsheet Construction Processes @ VL/HCC 2017Systematic Spreadsheet Construction Processes @ VL/HCC 2017
Systematic Spreadsheet Construction Processes @ VL/HCC 2017Jácome Cunha
 
jStanley: Placing a Green Thumb on Java Collections
jStanley: Placing a Green Thumb on  Java CollectionsjStanley: Placing a Green Thumb on  Java Collections
jStanley: Placing a Green Thumb on Java CollectionsJácome Cunha
 
Type-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web ServicesType-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web ServicesJácome Cunha
 
MDSheet – Model-Driven Spreadsheets
MDSheet – Model-Driven SpreadsheetsMDSheet – Model-Driven Spreadsheets
MDSheet – Model-Driven SpreadsheetsJácome Cunha
 

Plus de Jácome Cunha (12)

Energy Efficiency Across 
Programming Languages
Energy Efficiency Across 
Programming LanguagesEnergy Efficiency Across 
Programming Languages
Energy Efficiency Across 
Programming Languages
 
LMCC - 30 Anos
LMCC - 30 AnosLMCC - 30 Anos
LMCC - 30 Anos
 
Explaining Spreadsheets with Spreadsheets
Explaining Spreadsheets with SpreadsheetsExplaining Spreadsheets with Spreadsheets
Explaining Spreadsheets with Spreadsheets
 
Automatically Inferring ClassSheet Models from Spreadsheets
Automatically Inferring ClassSheet Models from SpreadsheetsAutomatically Inferring ClassSheet Models from Spreadsheets
Automatically Inferring ClassSheet Models from Spreadsheets
 
On Understanding Data Scientists
On Understanding  Data ScientistsOn Understanding  Data Scientists
On Understanding Data Scientists
 
Systematic Spreadsheet Construction Processes @ VL/HCC 2017
Systematic Spreadsheet Construction Processes @ VL/HCC 2017Systematic Spreadsheet Construction Processes @ VL/HCC 2017
Systematic Spreadsheet Construction Processes @ VL/HCC 2017
 
jStanley: Placing a Green Thumb on Java Collections
jStanley: Placing a Green Thumb on  Java CollectionsjStanley: Placing a Green Thumb on  Java Collections
jStanley: Placing a Green Thumb on Java Collections
 
Type-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web ServicesType-Safe Evolution of 
Web Services
Type-Safe Evolution of 
Web Services
 
MDSheet – Model-Driven Spreadsheets
MDSheet – Model-Driven SpreadsheetsMDSheet – Model-Driven Spreadsheets
MDSheet – Model-Driven Spreadsheets
 
Talk
TalkTalk
Talk
 
Talk at IS-EUD '11
Talk at IS-EUD '11Talk at IS-EUD '11
Talk at IS-EUD '11
 
Talk at VL/HCC '11
Talk at VL/HCC '11Talk at VL/HCC '11
Talk at VL/HCC '11
 

Dernier

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Dernier (20)

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Spreadsheet Engineering @ OSU - EECS Colloquium - 02/24/14

  • 1. Spreadsheet Engineering Jácome Cunha jacome@di.uminho.pt www.di.uminho.pt/~jacome Researcher @ HASLab / INESC TEC & Universidade do Minho, Portugal Oregon State University (office 2077) OSU - EECS Colloquium - 02/24/14
  • 2. Agenda I. Motivation II. Spreadsheets Meet Models III. Models for Spreadsheets – ClassSheets IV. Inferring ClassSheets V. Embedding ClassSheets VI. Evolution! VII. Model-Driven Spreadsheets VIII. Summary 2
  • 4. 4
  • 5. Why do Spreadsheets matter? Financial intelligence firm CODA reports that 95% of all U.S. firms use spreadsheets for financial reporting. Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008 5
  • 6. Why do Spreadsheets matter? They are the programming language of choice by nonprofessional programmers, a.k.a. end users. In the U.S. alone, the number of end-user programmers is conservatively estimated at 11 million, compared to only 2.75 million other, professional programmers. Estimating the numbers of end users and end-user programmers, Christopher Scaffidi, Mary Shaw, and Brad Myers, VL/HCC 2005 6
  • 7. 7
  • 8. Why do Spreadsheets matter? In 2004, RevenueRecognition.com (now Softtrax) had the International Data Corporation interview 118 business leaders. IDC found that 85% were using spreadsheets in financial reporting and forecasting. Sarbanes-Oxley: What About all the Spreadsheets?, Raymond R. Panko and Nicholas Ordway, 2008 8
  • 9. In fact, spreadsheets lack: ● Abstraction ● ● Encapsulation Type system ● Testing ● ● IDE ... 9
  • 10. And the consequences may be... Around 200 people who thought their only experience of the London 2012 Olympic Games would be minor heats of synchronised swimming have received an unexpected upgrade to the men’s 100m final following an embarrassing ticketing mistake. ... Locog said the error occurred in the summer, between the first and second round of ticket sales, when a member of staff made a single keystroke mistake and entered ‘20,000’ into a spreadsheet rather than the correct figure of 10,000 remaining tickets. The Telegraph, 04 January 2012 10
  • 11. And the consequences may be... In a 2010 paper* Carmen Reinhart, now a professor at Harvard Kennedy School, and Kenneth Rogoff, an economist at Harvard University...argued that GDP growth slows to a snail’s pace once government-debt levels exceed 90% of GDP. The 90% figure quickly became ammunition in political arguments over austerity...This week a new piece of research poured fuel on the fire by calling the 90% finding into question.. Economy losses of $10 billion/year! The Economist, 17 April 2013 Harvard University economists Carmen Reinhart and Kenneth Rogoff have acknowledged making a spreadsheet calculation mistake in a 2010 research paper, “Growth in a Time of Debt”, which has been widely cited to justify budgetcutting. Business Week, 18 April 2013 11
  • 12. II. Spreadsheets Meet Models [PEPM'09, VL/HCC'10] 12
  • 16. Functional Dependencies ● ● ● We compute the business logic from the data, by inferring FDs They are the building blocks inferring models for (legacy) spreadsheets The better the FDs we infer, the better the model we compute! 16
  • 17. Too Many?? ["A"] -> ["B","C","D","E","F"] ["C"] -> ["A","B","D","E","F"] ["D"] -> ["A","B","C","E","F"] ["E"] -> ["A","B","C","D","F"] ["F"] -> ["A","B","C","D","E"] ["G"] -> ["H","I","J"] ["H"] -> ["G","I","J"] ["I"] -> ["G","H","J"] ["J"] -> ["G","H","I"] ["K"] -> ["L","M"] ["L"] -> ["K","M"] ["M"] -> ["K","L"] ["B","K"] -> ["A","C","D","E","F"] ["B","L"] -> ["A","C","D","E","F"] ["B","M"] -> ["A","C","D","E","F"] 17
  • 18. Accidents happen ● ● We use a data mining algorithm which produces to many accidental FDs! We introduce some spreadsheet specific heuristics to filter out “accidental” FDs 18
  • 19. Organize them ● ● Label semantics: often keys are labeled “code” or “id” Label arrangement: we prefer FDs respecting the order of columns ● Antecedent size: small keys are preferable ● Ratio: small ratio between keys and non-keys ● Single value columns: columns always with the same value appear in too many FDs 19
  • 20. Final set Pilot-Id → Pilot-Name, Phone N-Number → Model, Plane-Name Pilot-Id, N-Number, Depart, Destination, Date, Hours → {} 20
  • 21. The first model: a relational model Having computed the FDs, we can now use the FUN algorithm to produce a relational model for the spreadsheet: Pilots (Pilot-Id, Pilot-Name, Phone) Planes (N-Number, Model, Plane-Name) <Flights> (#Pilot-Id,# N-Number, Depart, Destination, Date Hours) 21
  • 22. III. Models for Spreadsheet – ClassSheets Engels and Erwig ASE'05 22
  • 23. ClassSheets - Models for Spreadsheets ClassSheets are a high-level, object-oriented formalism to specify spreadsheets 23
  • 24. ClassSheets - Models for Spreadsheets 24
  • 25. ClassSheets - Models for Spreadsheets 25
  • 30. Embedding... ● Embedding a language into another language is a recurring strategy (e.g. for DSLs) – – Users are used to the host language and do not need to learn a (complete) new language :-) – Implementation effort is much reduced :-) – ● Embedded language inherit all the power of the host language :-) It may have some restrictions :-( We embedded ClassSheets in traditional spreadsheet systems 30
  • 34. 34
  • 36. Why do Spreadsheet Models Need Evolution? ● ● Suppose now you need to add new information to the spreadsheet For instance, the number of passengers of each flight ● It would require to do several error-prone tasks ● Add columns, labels, update formulas, etc. ● We can do it automatically! 36
  • 37. Why do Spreadsheet Instances Need Evolution? ● ● ● Some evolution steps are easier to perform on the instance For instance, to add a column to one of the repetition blocks People felt the need to evolve the data 37
  • 38. 38
  • 40. (Data) Operations on Instances 40
  • 41. (Model) Operations on ClassSheets 41
  • 43. Compositional Example: Add a Column and a Class 43
  • 44. VII. MDSheet – Model-Driven Spreadsheets [ICSE'12] 44
  • 46. ● Available at http://ssaapp.di.uminho.pt ● Built out of 7886 LOC: – 3181 in Haskell, for the inference and evolution – 980 in Basic, for the embedding – 2884 in C++, for gluing all components – 340 in Perl, for compilation and setup – 722, for makefiles 46
  • 48. Acknowledgments This work has been done in collaboration with many people: Martin Erwig, João Paulo Fernandes, Jorge Mendes, Hugo Pacheco, Rui Pereira, João Saraiva, Joost Visser 48
  • 50. More? ● More at http://ssaapp.di.uminho.pt ● Querying model-driven spreadsheet ● Visually querying model-driven spreadsheets ● Detections of bad smells ● Edit assistance ● Empirical validations ● Variational spreadsheets (@ OSU) ● ... 50
  • 52. Empirical Study Settings ● 17 student from a MSc course ● 2 different spreadsheets – Microsoft budget – Local company responsible for water supply of Braga, Portugal - agere 52
  • 53. Study Setting ● Hypotheses: (1) In order to perform a given set of tasks, users spend less time when using modeldriven spreadsheets instead of plain ones. (2) Spreadsheets developed in the modeldriven environment hold less errors than plain ones. 53
  • 54. Main Results Number of tasks performed on the MS spreadsheet 54
  • 55. Main Results Error rate in the budget spreadsheet 55

Notes de l'éditeur

  1. Spreadsheet are widely used By everyone By many companies
  2. They are grate Lots of features
  3. Intended to be used by trained person Professional on Spreadsheet models/ClassSheets Not by instance/data end users
  4. This generated spreadsheet guides users in introducing correct data The spreadsheet includes mechanisms that guarantee that the spreadsheet data always conforms to the model after an user update
  5. Baseado em haskell Integrado no OO Clicanca-se em botoes Espetacular Basic