Winter 2015: Session #2
Programming on the Whiteboard
(Sarah Kremen-Hicks & Brian Gutierrez)
Previously, at DMDH...
• The work of creating usable data
• Forms that this data might take:
• markup language
• spreadsheets
Workshop #2
• Caveat Curator (challenges of working with
data)
• Programming on the whiteboard, i.e.,
conceptualizing the specific steps that you
need to take to accomplish your goals
Why this focus on data?
• Understanding your data, and your
intended actions, is a key skill for working
with any programming language or
platform.
• This is true whether you are the
programmer or whether you are working
with professional programmers.
Programming languages
are like human
languages in that they
both have phrases,
patterns, and rules.
Programming languages
are unlike human
languages in that they
aren’t for communicating
with people.
They are also unlike human
languages in that every
programming utterance
does something, i.e., causes
an action to occur.
You can get used to
patterns – even
unfamiliar ones.
The shift is in getting
used to thinking in
terms of every single
action.
Our subject matter today is all
actions that you’ll need to
think about before you work
with...
Image: Josh Lee, @wtrsld, via Twitter, January 2014.
Even when you’re just
experimenting, you need to
prep your data.
You may know your dataset
in detail already, from your
research -- but your
computer is concerned with
different levels of detail.
Becoming aware of those levels
of detail is not only helpful for
your project ideas...
...it’s also a useful skill for
working with programming
languages.
(where a stray /> or ; can break your program/website)
Caveat Curator
Data only works if your
computer can read it.
But my data is just text!
(Isn’t that easy?)
(Remember, your computer is
fairly stupid).
Formatted text
is often full of
hidden characters
your computer can’t
parse correctly.
The┘re┘sÜlt ís that yoÜr te┘xt
might come┘ oÜt looking
like┘this
whe┘n yoÜ ope┘n it in a
programming e┘nvironme┘nt.
So you need to
convert it to
plain text.
(without any of the fancy details
encoded in MS Word fonts.)
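A minimal sketch of that conversion step, assuming the text has already been read into a Python string. The replacement table and the function names `to_plain_text` and `find_non_ascii` are illustrative, not part of the workshop materials.

```python
import unicodedata

# Hypothetical table of common word-processor characters and plain equivalents.
REPLACEMENTS = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2013": "-",   # en dash
    "\u2014": "--",  # em dash
    "\u00a0": " ",   # non-breaking space
}


def to_plain_text(text):
    """Swap typographic characters for plain ones, then normalize the rest."""
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    # NFKC folds compatibility characters (such as ligatures) into standard forms.
    return unicodedata.normalize("NFKC", text)


def find_non_ascii(text):
    """List (position, character, Unicode name) for anything outside ASCII."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]
```

Running `find_non_ascii` on the converted text is one way to see what is left over. Accented letters in names are legitimate and should stay; the point is to know they are there.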
But even that can produce
unexpected errors.
Maybe you want to work with
sailing data and ports of call:
The ship you’re interested in
leaves the Ivory Coast for St.
Helena...
But when you create your map,
you get this:
The latitude/longitude
coordinate is the significant
datum.
The city name is just the
human-readable component.
Each datum needs to be
unique.
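One way to check for this problem, sketched under the assumption that the mapping error comes from one place name matching more than one location. The records and their approximate coordinates are illustrative stand-ins, and `ambiguous_names` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative records only; coordinates are approximate.
ports = [
    {"name": "St. Helena", "lat": -15.97, "lon": -5.71},    # South Atlantic island
    {"name": "St. Helena", "lat": 38.51, "lon": -122.47},   # same-named town in California
]


def ambiguous_names(records):
    """Return each name that maps to more than one coordinate pair."""
    coords_by_name = defaultdict(set)
    for record in records:
        coords_by_name[record["name"]].add((record["lat"], record["lon"]))
    return {
        name: sorted(coords)
        for name, coords in coords_by_name.items()
        if len(coords) > 1
    }
```

If this returns anything, the name alone cannot identify the place, and the coordinate pair has to serve as the unique key.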
Figuring out what sort of
unique configuration will
work best involves at
least some
experimentation.
To experiment effectively, you’ll
want to keep careful records.
If you develop categories of
information, you’ll want to
keep a record of what each
category means, and what
its limits are.
Cleaning and structuring your
data is a foundation issue that
changes, depending on the
available format of your data.
What if your data is
crowdsourced?
You can require a particular
format for submissions
You can even put
programmatic limits on the
formats available for
submission
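A rough sketch of what a programmatic limit could look like, assuming each submission arrives as a dictionary of strings. The field names (`place`, `date`, `lat`) and the rules are hypothetical; a real project would choose its own.

```python
import re

# Assumed rule: dates must look like YYYY-MM-DD.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate_submission(row):
    """Return a list of problems; an empty list means the row passes."""
    problems = []
    if not str(row.get("place", "")).strip():
        problems.append("place is empty")
    if not DATE_PATTERN.match(str(row.get("date", ""))):
        problems.append("date is not YYYY-MM-DD")
    try:
        lat = float(row.get("lat", ""))
        if not -90 <= lat <= 90:
            problems.append("lat out of range")
    except (TypeError, ValueError):
        problems.append("lat is not a number")
    return problems
```

Checks like these catch format problems only. A well-formed date can still be the wrong date, which is why scrubbing by hand remains necessary.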
But in the end, you’re still going
to need to scrub and/or
format.
This is true even for data
from supposedly reputable
sources, like government or
media organizations.
Example: Doctor Who Villains
dataset
http://tinyurl.com/doctorwhovillains
This step is no fun!
But it’s absolutely necessary.
Break!
Working with multiple types of
data:
GIS and the Spatial Turn
GIS technology has paved the
way for analyzing qualitative
data associated with cultural
experiences
“A good map is worth a thousand words,
cartographers say, and they are right: because
it produces a thousand words: it raises doubts,
ideas. It poses new questions, and forces you
to look for new answers.”
(Moretti 1998, 3–4)
Literary texts are filled with
subjective spatial data: an
author or character's
articulation of geographically
located dwellings, urban and
rural landscapes, as well as
performance spaces
Project: Mapping William
Wordsworth's Conspicuous
Consumption in The Prelude
(Brian R. Gutierrez)
Objective: to map the visual culture
events referenced in Wordsworth’s
autobiographical poem The Prelude (as
well as the ones not referenced)
Problem to solve: Prove that literary
galleries, specifically John Boydell’s
“Shakespeare Gallery,” shaped the
dramaturgical choices in the only play
written by Wordsworth. He reads
Shakespeare not through a personal copy
of the play, but through the visual and
performative texts of that time
Data: place-names, indirect
references, and all
non-referenced visual cultural
events
Access to data: Project
Gutenberg, digital archive of
British newspapers and
periodicals
What to do with that data?
Map it!!
First data set:
Literary spatial articulations
Wordsworth mentions the following place
names and references:
"Oh wonderous power of words, how sweet they are
/ According to the meaning which they bring-- /
Vauxhall and Ranelagh, I then had heard / Of your green
groves and wilderness of lamps, / Your gorgeous ladies,
fairy cataracts, / And pageant fireworks" (119-125)
"Half-rural Sadler's Wells" (267)
First, I need to know what and
where these places were in
order to identify them as
spatial data
Ex: Vauxhall and Ranelagh
Second, if I'm interested in
visual cultural experiences, I
need to identify what kind of
event occurred there: gallery,
play, etc.
Third, how would I access the data?
Answer: place-names in a book are not
under any copyright.
However, if I wanted to display sections
of the text when a viewer clicks
on a place name, then I would have to
think about copyright. The text is on
Project Gutenberg, so that's covered.
Fourth, I would have to locate any indirect
reference to visual cultural phenomena.
Ex: Wordsworth mentions two actresses by
name: Mary Robinson and Sarah Siddons.
Since I cannot map a person, I need to
investigate which plays they were in, and at which
theaters, during that moment of his life (it's an
autobiography)
Fifth, I need to research what special
events were occurring at other places
he mentions. For that, I look to The
Times (newspapers) and various
periodicals.
Sixth, because I am going to create
a map using ArcGIS, I need to
put my data in an Excel
spreadsheet so that it can be
read by the program.
What is the relationship
between the data?
Analyze the qualitative data
Humanist skill=
Dhumanist skill
Programming on the
whiteboard involves looking at
the categories of information,
and thinking about how they
interact.
Categories
• Place names
• Poetic lines
• Genre of visual/cultural event
• Spatial data (latitude/longitude)
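The categories above could be laid out as spreadsheet columns along these lines. This is a sketch only: the column names are invented for illustration, CSV is assumed as a plain-text stand-in for the Excel file, and the genre and coordinate cells are deliberately left blank because they still have to come from research.

```python
import csv
import io

# Hypothetical column names, one per category on the slide.
FIELDS = ["place_name", "poetic_lines", "event_genre", "latitude", "longitude"]

# Place names and line numbers are taken from the slides; other cells are unfilled.
rows = [
    {"place_name": "Vauxhall", "poetic_lines": "119-125",
     "event_genre": "", "latitude": "", "longitude": ""},
    {"place_name": "Ranelagh", "poetic_lines": "119-125",
     "event_genre": "", "latitude": "", "longitude": ""},
    {"place_name": "Sadler's Wells", "poetic_lines": "267",
     "event_genre": "", "latitude": "", "longitude": ""},
]


def incomplete(records):
    """Report which categories are still empty for each place."""
    report = []
    for row in records:
        missing = [field for field in FIELDS if not row.get(field)]
        if missing:
            report.append((row["place_name"], missing))
    return report


def to_csv(records):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Listing the empty cells before mapping makes the remaining research tasks explicit: each blank is a question the source text or the archives still have to answer.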
Return to the source of original
data—the literary text—to
examine how the author is
describing these phenomena
Why use ArcGIS?
Benefits of ArcGIS
• It allows the overlay of historical maps
• Trainings were available and accessible
(through DHSI and UW courses)
• As a software program, ArcGIS is
established enough to be considered robust
• Available through the UW software suite
Disadvantages of ArcGIS
• Available only for PCs
• Proprietary file format (even if input data is
open-access, the end result is not)
• Available only on an annual subscription
model (and prohibitively expensive for
scholars without campus-granted access)
In Franco Moretti’s Atlas of the
European Novel 1800-1900
(1998), he calls for a “literary
geography,” predicated on the
creation of “readerly maps”
and the use of those maps as
analytical tools.
Caveats?
The pursuit of mapping data
may exclude complex social
spaces (e.g., gendered domestic
environments)
Caveats?
Cartographical representations
should not be divorced from
their primary texts
Break!
Project: Visualizing Prosody
(Sarah Kremen-Hicks)
x / |x /|xx / | x / |x /
Sir Walter Vivian all a summer's day
/ x | / x | x / | x / | x /
Gave his broad lawns until the set of sun
Marking up a poem for
metrical scansion is encoding it
with data.
What can a computer do with
that data?
Computers are good at
counting things – like iambs.
Is it possible to predict
deviations from a metrical
norm based on author or lyric
classification?
Will authors show a tendency
for particular types of metrical
substitution?
Prepping the Data
• For proof of concept, start with one author
(Alfred, Lord Tennyson)
• Get Tennyson’s poems from Project
Gutenberg
• Hand-mark representative poems for
prosody
Programming on the Whiteboard
What should the
computer do?
Computer tasks
• Count feet per line
• Recognize | as a foot boundary
• Recognize carriage return as a line boundary
• Supply foot boundaries at beginning/end of
lines
• Count the number of areas contained within
foot boundaries for each line
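The tasks above can be sketched in a few lines, assuming the hand-marked scansion uses `x` for an unstressed syllable, `/` for a stressed one, and `|` between feet, as in the Tennyson example. The function names are illustrative.

```python
def split_feet(scansion_line):
    """Return the feet in one scansion line, e.g. ['x/', 'x/', 'xx/']."""
    feet = []
    # Stripping stray outer bars treats the line's start and end as foot boundaries.
    for chunk in scansion_line.strip().strip("|").split("|"):
        marks = "".join(chunk.split())  # drop the spaces inside a foot
        if marks:
            feet.append(marks)
    return feet


def feet_per_line(scansion_text):
    """Treat each line break as a line boundary and count the feet per line."""
    return [
        len(split_feet(line))
        for line in scansion_text.splitlines()
        if line.strip()
    ]


# The two hand-marked lines from the slide.
scansion = "x / |x /|xx / | x / |x /\n/ x | / x | x / | x / | x /"
print(feet_per_line(scansion))  # [5, 5]
```

Note how much of the code is about spacing and boundaries rather than prosody. That is the level of detail the computer cares about.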
These steps involve recognizing
each metrical foot as a unit that
contains particular accentual-
syllabic data.
x / |x /|xx / | x / |x /
Sir Walter Vivian all a summer's day
Computer tasks, cont’d.
• Identify the most common number of feet
per line
• Supply a report on lines (by number) that
deviate
• Calculate rate of deviation/adherence
• Mode = paradigm
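A possible sketch of this reporting step, assuming the feet have already been counted for each line. `deviation_report` is a hypothetical name, and it treats the mode as the metrical norm, as the slide suggests.

```python
from collections import Counter


def deviation_report(counts):
    """Summarize a non-empty list of feet-per-line counts, given in line order.

    Returns (norm, deviating_line_numbers, rate_of_deviation).
    """
    # The mode stands in for the paradigm; on a tie, the value seen first wins.
    norm = Counter(counts).most_common(1)[0][0]
    deviating = [
        line_no
        for line_no, n in enumerate(counts, start=1)
        if n != norm
    ]
    rate = len(deviating) / len(counts)
    return norm, deviating, rate
```

The rate of adherence is simply one minus the rate of deviation.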
After recognizing the foot as a
unit, the computer can calculate
what patterns of data each foot
contains.
Computer tasks, cont’d.
• Identify the most common foot type
• Identify markings within foot boundaries
• Compare markings to foot dictionary to
identify type
These tasks identify each line
as a unit composed of one or
more feet.
x / |x /|xx / | x / |x /
Sir Walter Vivian all a summer's day
(iambic pentameter with third foot anapestic
substitution)
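The foot dictionary lookup might look like the sketch below. The dictionary covers only a handful of common feet and is an assumption about how such a table could be built, not the project's actual one.

```python
from collections import Counter

# Assumed notation: x = unstressed syllable, / = stressed syllable.
FOOT_DICTIONARY = {
    "x/": "iamb",
    "/x": "trochee",
    "//": "spondee",
    "xx": "pyrrhic",
    "xx/": "anapest",
    "/xx": "dactyl",
}


def classify_feet(scansion_line):
    """Name each foot in a scansion line, using 'unknown' for unlisted patterns."""
    chunks = scansion_line.strip().strip("|").split("|")
    feet = ["".join(chunk.split()) for chunk in chunks]
    return [FOOT_DICTIONARY.get(foot, "unknown") for foot in feet if foot]


def most_common_foot(scansion_line):
    """Return the foot type that occurs most often in the line."""
    return Counter(classify_feet(scansion_line)).most_common(1)[0][0]


line = "x / |x /|xx / | x / |x /"
print(classify_feet(line))     # ['iamb', 'iamb', 'anapest', 'iamb', 'iamb']
print(most_common_foot(line))  # iamb
```

The output agrees with the hand analysis: five feet, predominantly iambic, with an anapest in third position.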
Still more computing tasks!
• Identify the most common foot type within
a poem
• Supply a report on feet (by line and foot
number) that deviate
• Calculate rate of deviation/adherence
• Mode = paradigm
Just as the feet contain
patterns, the lines contain
patterns that can be analyzed
as well.
Still more computing tasks!
• Report on types of deviations arranged by
most to least common
• Information should include location
(line/foot number), as well as prevalence of
substitution type
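A sketch of such a report, again assuming the `x`, `/`, and `|` notation and taking the iamb as the norm. The names `substitutions` and `ranked` are illustrative, and the foot table is the same assumed one a lookup step would use.

```python
from collections import Counter

# Assumed notation: x = unstressed syllable, / = stressed syllable.
FOOT_NAMES = {
    "x/": "iamb",
    "/x": "trochee",
    "//": "spondee",
    "xx": "pyrrhic",
    "xx/": "anapest",
    "/xx": "dactyl",
}


def substitutions(scansion_lines, norm="iamb"):
    """Return (foot type, line number, foot number) for each foot that is not the norm."""
    found = []
    for line_no, line in enumerate(scansion_lines, start=1):
        chunks = line.strip().strip("|").split("|")
        for foot_no, chunk in enumerate(chunks, start=1):
            marks = "".join(chunk.split())
            if not marks:
                continue  # ignore empty chunks from doubled bars
            name = FOOT_NAMES.get(marks, "unknown")
            if name != norm:
                found.append((name, line_no, foot_no))
    return found


def ranked(found):
    """Order substitution types from most to least common."""
    return Counter(name for name, _, _ in found).most_common()


poem = ["x / |x /|xx / | x / |x /", "/ x | / x | x / | x / | x /"]
found = substitutions(poem)
print(found)          # [('anapest', 1, 3), ('trochee', 2, 1), ('trochee', 2, 2)]
print(ranked(found))  # [('trochee', 2), ('anapest', 1)]
```

Keeping the line and foot numbers alongside each substitution lets the same data answer both questions: which substitutions are common, and where in the line they tend to fall.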
Deviations and their placement
within each line and each poem
should display certain patterns
unique to each author (I hope!)
Current status: I’m investigating
using the Natural Language
Toolkit to tokenize each foot;
and to establish syllables, feet,
and lines as a unique hierarchy.
Applicable Values
• Iterative development
• Failure as valuable
• Collaboration
If you are thinking about your
data, and the tasks that you
need to accomplish, then it’s
easier to determine what sort
of language or platform your
project needs.
There are countless tutorials,
online courses, etc., for almost
any programming language or
platform.
(We’re giving you a cheat sheet,
too; and http://www.dmdh.org is
your friend. So is Google.)
Learning them can be a slow
process, especially at first.
However, knowing what tasks
you’re working towards makes
it easier to understand the
purpose of the introductory
lessons.
It’s also easier to think about
how the first rules you learn
for any language or platform
might affect your goals.
And now, it’s your turn...
For this activity, we
recommend that you pair up,
or form small groups to work
together.
Group Activity
• What do you need to do with your data?
• What units might that data exist in?
• What categories do you need to create?
• What relationships need to exist between
the units and categories?
Upcoming Workshops!
• Crash Course on R: Feb 4, 12:30-2:00
(location TBD)
• Spring Workshops on Project Ideation and
Development: April 11th and April 25th
DMDH content is developed by Paige Morgan,
Sarah Kremen-Hicks, and Brian Gutierrez, with
generous support from the Simpson Center for
the Humanities at the University of Washington.
Content is available under a
Creative Commons Attribution-NonCommercial
3.0 Unported License.
Please contact Sarah at sarahkh@uw.edu with
questions.

Dmdh winter 2015 session #2

  • 1. Winter 2015: Session #2 Programming on the Whiteboard (Sarah Kremen-Hicks & Brian Gutierrez)
  • 2. Previously, at DMDH... • The work of creating usable data • Forms that this data might take: • markup language • spreadsheets
  • 3. Workshop #2 • Caveat Curator (challenges of working with data) • Programming on the whiteboard, i.e., conceptualizing the specific steps that you need to take to accomplish your goals
  • 4. Why this focus on data? • Understanding your data, and your intended actions, is a key skill for working with any programming language or platform. • This is true whether you are the programmer or whether you are working with professional programmers.
  • 5. Programming languages are like human languages in that they both have phrases, patterns, and rules.
  • 6. Programming languages are unlike human languages in that they aren’t for communicating with people.
  • 7. They are also unlike human languages in that every programming utterance does something, i.e., causes an action to occur.
  • 8. You can get used to patterns – even unfamiliar ones.
  • 9. The shift is in getting used to thinking in terms of every single action.
  • 10. Our subject matter today is all actions that you’ll need to think about before you work with...
  • 11. Image: Josh Lee, @wtrsld, via Twitter, January 2014.
  • 12. Even when you’re just experimenting, you need to prep your data.
  • 13. You may know your dataset in detail already, from your research -- but your computer is concerned with different levels of detail.
  • 14. Becoming aware of those levels of detail is not only helpful for your project ideas...
  • 15. ...it’s also a useful skill for working with programming languages. (where a stray /> or ; can break your program/website)
  • 17. Data only works if your computer can read it.
  • 18. But my data is just text! (Isn’t that easy?)
  • 19. (Remember, your computer is fairly stupid).
  • 20. Formatted text is often full of text your computer can’t parse correctly.
  • 21. The┘re┘sÜlt ís that yoÜr te┘xt might come┘ oÜt looking like┘this whe┘n yoÜ ope┘n it in a programming e┘nvironme┘nt.
  • 22. So you need to convert it to plain text. (without any of the fancy details encoded in MS Word fonts.)
  • 23. But even that can produce unexpected errors.
  • 24. Maybe you want to work with sailing data and ports of call:
  • 25. The ship you’re interested in leaves the Ivory Coast for St. Helena...
  • 26.
  • 27. But when you create your map, you get this:
  • 28.
  • 29. The latitude/longitude coordinate is the significant datum.
  • 30. The city name is just the human-readable component.
  • 31. Each datum needs to be unique.
  • 32. Figuring out what sort of unique configuration will work best involves at least some experimentation.
  • 33. To experiment effectively, you’ll want to keep careful records.
  • 34. If you develop categories of information, you’ll want to keep a record of what each category means, and what its limits are.
  • 35. Cleaning and structuring your data is a foundation issue that changes, depending on the available format of your data.
  • 36. What if your data is crowdsourced?
  • 37. You can require a particular format for submissions
  • 38. You can even put programmatic limits on the formats available for submission
  • 39. But in the end, you’re still going to need to scrub and/or format.
  • 40. This is true even for data from supposedly reputable sources, like government or media organizations.
  • 42. This step is no fun!
  • 43. But it’s absolutely necessary.
  • 45. Working with multiple types of data: GIS and the Spatial Turn
  • 46. GIS technology has paved the way for analyzing qualitative data associated with cultural experiences
  • 47. “A good map is worth a thousand words, cartographers say, and they are right: because it produces a thousand words: it raises doubts, ideas. It poses new questions, and forces you to look for new answers.” (Moretti 1998, 3–4)
  • 48. Literary texts are filled with subjective spatial data: an author or character's articulation of geographically located dwellings, urban and rural landscapes, as well as performance spaces
  • 49. Project: Mapping William Wordsworth's Conspicuous Consumption in The Prelude (Brian R. Gutierrez)
  • 50. Objective: to map the visual culture events referenced in Wordsworth’s autobiographical poem The Prelude (as well as the ones not referenced)
  • 51. Problem to solve: Prove that literary galleries, specifically John Boydell’s “Shakespeare Gallery,” shaped the dramaturgical choices in the only play written by Wordsworth. He reads Shakespeare not through a personal copy of the play, but through the visual and performative texts of that time.
  • 52. Data: place-names, indirect references, and all non-referenced visual cultural events
  • 53. Access to data: Project Gutenberg, digital archive of British newspapers and periodicals
  • 54. What to do with that data? Map it!!
  • 55. First data set: Literary spatial articulations
  • 56. Wordsworth mentions the following place names and references: "Oh wonderous power of words, how sweet they are / According to the meaning which they bring-- / Vauxhall and Ranelagh, I then had heard / Of your green groves and wilderness of lamps, / Your gorgeous ladies, fairy cataracts, / And pageant fireworks" (119-125) "Half-rural Sadler's Wells" (267)
  • 57. First, I need to know what and where these places were in order to identify them as spatial data. Ex: Vauxhall and Ranelagh
  • 58. Second, if I'm interested in visual cultural experiences, I need to identify what kind of event occurred there: gallery, play, etc.
  • 59. Third, how would I access the data? Answer: place-names in a book are not under any copyright. However, if I wanted to display passages from the text when a viewer clicks on a place name, I would have to think about copyright; since the text is on Project Gutenberg (PG), that's covered.
  • 60. Fourth, I would have to locate any indirect reference to visual cultural phenomena. Ex: Wordsworth mentions two actresses by name, Mary Robinson and Sarah Siddons. Since I cannot map a person, I need to investigate which plays they were in and at which theaters during that moment of his life (it's an autobiography).
  • 61. Fifth, I need to research what special events were occurring at other places he mentions. For that, I look to The Times (newspapers) and various periodicals.
  • 62. Sixth, because I am going to create a map using ArcGIS, I need to put my data in an Excel spreadsheet so that it can be read by the program.
  • 63.
  • 64. What is the relationship between the data?
  • 65. Analyze the qualitative data. Humanist skill = Dhumanist skill
  • 66. Programming on the whiteboard involves looking at the categories of information, and thinking about how they interact.
  • 67. Categories • Place names • Poetic lines • Genre of visual/cultural event • Spatial data (latitude/longitude)
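One way to picture these categories is as spreadsheet columns. The sketch below writes them out as CSV, a plain-text format that spreadsheet programs can open. The column names and the function `to_csv` are assumptions; the line numbers come from the slides, while the coordinates are approximate values included only to show the shape of a row.

```python
import csv
import io

# Hypothetical column names, one per category of information.
FIELDS = ["place_name", "poetic_lines", "event_genre", "latitude", "longitude"]

# Coordinates are approximate, for illustration only.
rows = [
    {"place_name": "Vauxhall", "poetic_lines": "119-125",
     "event_genre": "pleasure garden", "latitude": 51.486, "longitude": -0.122},
    {"place_name": "Ranelagh", "poetic_lines": "119-125",
     "event_genre": "pleasure garden", "latitude": 51.487, "longitude": -0.153},
    {"place_name": "Sadler's Wells", "poetic_lines": "267",
     "event_genre": "theatre", "latitude": 51.529, "longitude": -0.106},
]

def to_csv(records):
    """Return the records as CSV text, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

print(to_csv(rows))
```

Keeping latitude and longitude in their own columns, rather than inside a free-text note, is what lets a mapping program read them.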
  • 68. Return to the source of original data—the literary text—to examine how the author is describing these phenomena
  • 70. Benefits of ArcGIS • It allows the overlay of historical maps • Trainings were available and accessible (through DHSI and UW courses) • As a software program, ArcGIS is established enough to be considered robust • Available through the UW software suite
  • 71. Disadvantages of ArcGIS • Available only for PCs • Proprietary file format (even if input data is open-access, the end result is not) • Available only on an annual subscription model (and prohibitively expensive for scholars without campus-granted access)
  • 72. In Franco Moretti’s Atlas of the European Novel 1800-1900 (1998), he calls for a “literary geography,” predicated on the creation of “readerly maps” and the use of those maps as analytical tools.
  • 73. Caveats? The pursuit of mapping data may exclude complex social spaces (e.g., gendered domestic environments)
  • 74. Caveats? Cartographical representations should not be divorced from their primary texts
  • 76. Project: Visualizing Prosody (Sarah Kremen-Hicks)
        x / | x / | x x / | x / | x /
        Sir Walter Vivian all a summer's day
        / x | / x | x / | x / | x /
        Gave his broad lawns until the set of sun
  • 77. Marking up a poem for metrical scansion is encoding it with data. What can a computer do with that data?
  • 78. Computers are good at counting things – like iambs.
  • 79. Is it possible to predict deviations from a metrical norm based on author or lyric classification?
  • 80. Will authors show a tendency for particular types of metrical substitution?
  • 81. Prepping the Data • For proof of concept, start with one author (Alfred, Lord Tennyson) • Get Tennyson’s poems from Project Gutenberg • Hand-mark representative poems for prosody
  • 82. Programming on the Whiteboard What should the computer do?
  • 83. Computer tasks • Count feet per line • Recognize | as a foot boundary • Recognize carriage return as a line boundary • Supply foot boundaries at beginning/end of lines • Count the number of areas contained within foot boundaries for each line
  • 84. These steps involve recognizing each metrical foot as a unit that contains particular accentual-syllabic data.
        x / | x / | x x / | x / | x /
        Sir Walter Vivian all a summer's day
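The counting tasks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, `x` and `/` are taken to be the only syllable marks, and missing boundaries at the start and end of a line are handled implicitly by splitting on `|`.

```python
def split_feet(scansion):
    """Split one marked-up line on '|' and strip spaces inside each foot."""
    feet = [foot.replace(" ", "") for foot in scansion.strip().split("|")]
    # Drop empty pieces, which appear if a line starts or ends with '|'.
    return [foot for foot in feet if foot]

def feet_per_line(markup):
    """Treat each non-blank line of markup as a verse line; count its feet."""
    return [len(split_feet(line)) for line in markup.splitlines() if line.strip()]

# The two scanned lines from the slides, one per row.
markup = "x / |x /|xx / | x / |x /\n/ x | / x | x / | x / | x /"

print(split_feet("x / |x /|xx / | x / |x /"))
print(feet_per_line(markup))
```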
  • 85. Computer tasks, cont’d. • Identify the most common number of feet per line • Supply a report on lines (by number) that deviate • Calculate rate of deviation/adherence • Mode = paradigm
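A small sketch of the "mode = paradigm" idea, assuming the feet per line have already been counted. The function name `meter_report` and the sample counts are made up for illustration.

```python
from collections import Counter

def meter_report(feet_counts):
    """Return (most common feet-per-line, deviating line numbers, deviation rate)."""
    norm, _ = Counter(feet_counts).most_common(1)[0]
    # Line numbers are 1-based, as a reader would count them.
    deviating = [n for n, count in enumerate(feet_counts, start=1) if count != norm]
    rate = len(deviating) / len(feet_counts)
    return norm, deviating, rate

# Made-up counts for a six-line passage.
counts = [5, 5, 4, 5, 6, 5]
norm, deviating, rate = meter_report(counts)
print(f"norm: {norm} feet; deviating lines: {deviating}; rate: {rate:.2f}")
```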
  • 86. After recognizing the foot as a unit, the computer can calculate what patterns of data each foot contains.
  • 87. Computer tasks, cont’d. • Identify the most common foot type • Identify markings within foot boundaries • Compare markings to foot dictionary to identify type
  • 88. These tasks identify each line as a unit composed of one or more feet.
        x / | x / | x x / | x / | x /
        Sir Walter Vivian all a summer's day
        (iambic pentameter with third foot anapestic substitution)
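The "foot dictionary" could be as simple as a lookup table from markings to names. The table below covers only six common feet and is an assumption about what such a dictionary would contain; `classify_feet` is a hypothetical helper.

```python
# Hypothetical foot dictionary: x = unstressed syllable, / = stressed syllable.
FOOT_DICTIONARY = {
    "x/": "iamb",
    "/x": "trochee",
    "//": "spondee",
    "xx": "pyrrhic",
    "xx/": "anapest",
    "/xx": "dactyl",
}

def classify_feet(scansion):
    """Name each foot in one marked-up line; unrecognized feet become 'unknown'."""
    feet = [foot.replace(" ", "") for foot in scansion.split("|")]
    return [FOOT_DICTIONARY.get(foot, "unknown") for foot in feet if foot]

print(classify_feet("x / |x /|xx / | x / |x /"))
```

For the Tennyson line this gives four iambs with an anapest in third position, which matches the description on the slide.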
  • 89. Still more computing tasks! • Identify the most common foot type within a poem • Supply a report on feet (by line and foot number) that deviate • Calculate rate of deviation/adherence • Mode = paradigm
  • 90. Just as the feet contain patterns, the lines contain patterns that can be analyzed as well.
  • 91. Still more computing tasks! • Report on types of deviations arranged by most to least common • Information should include location (line/foot number), as well as prevalence of substitution type
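A sketch of the deviation report, assuming each line has already been reduced to a list of foot names. The function name and the output shape are assumptions; the two sample lines correspond to the two scanned Tennyson lines in the slides.

```python
from collections import Counter

def substitution_report(poem_feet):
    """Return (most common foot type, deviations ranked from most to least common).

    Each deviation is paired with its (line number, foot number) locations.
    """
    all_feet = [foot for line in poem_feet for foot in line]
    norm = Counter(all_feet).most_common(1)[0][0]
    locations = {}
    for line_no, line in enumerate(poem_feet, start=1):
        for foot_no, foot in enumerate(line, start=1):
            if foot != norm:
                locations.setdefault(foot, []).append((line_no, foot_no))
    ranked = sorted(locations.items(), key=lambda item: len(item[1]), reverse=True)
    return norm, ranked

poem = [
    ["iamb", "iamb", "anapest", "iamb", "iamb"],
    ["trochee", "trochee", "iamb", "iamb", "iamb"],
]
norm, ranked = substitution_report(poem)
print("norm:", norm)
for foot_type, places in ranked:
    print(foot_type, len(places), places)
```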
  • 92. Deviations and their placement within each line and each poem should display certain patterns unique to each author (I hope!)
  • 93. Current status: I’m investigating using the Natural Language Toolkit to tokenize each foot, and to establish syllables, feet, and lines as a unique hierarchy.
  • 95. If you are thinking about your data, and the tasks that you need to accomplish, then it’s easier to determine what sort of language or platform your project needs.
  • 96. There are countless tutorials, online courses, etc., for almost any programming language or platform. (We’re giving you a cheat sheet, too; and http://www.dmdh.org is your friend. So is Google.)
  • 97. Learning them can be a slow process, especially at first.
  • 98. However, knowing what tasks you’re working towards makes it easier to understand the purpose of the introductory lessons.
  • 99. It’s also easy to think about how the first rules you learn for any language or platform might affect your goals.
  • 100. And now, it’s your turn...
  • 101. For this activity, we recommend that you pair up, or form small groups to work together.
  • 102. Group Activity • What do you need to do with your data? • What units might that data exist in? • What categories do you need to create? • What relationships need to exist between the units and categories?
  • 103. Upcoming Workshops! • Crash Course on R: Feb 4, 12:30-2:00 (location TBD) • Spring Workshops on Project Ideation and Development: April 11th and April 25th
  • 104. DMDH content is developed by Paige Morgan, Sarah Kremen-Hicks, and Brian Gutierrez, with generous support from the Simpson Center for the Humanities at the University of Washington. Content is available under a Creative Commons Attribution-NonCommercial 3.0 Unported License. Please contact Sarah at sarahkh@uw.edu with questions.