This document summarizes lessons learned from One Week, One Tool, a collaborative project in which a team built a tool together in one week and then kept working on it remotely. It discusses managing the transition to remote collaboration and sustaining collaboration from afar. It also lists the software libraries and content providers used in the project. The document concludes with some solutions for setting expectations, communication, problem-solving, and balancing sustainability and agility when collaborating remotely.
'Sustaining Collaboration from Afar' Lessons from One Week, One Tool
1. 'Sustaining Collaboration from Afar'
Lessons from One Week, One Tool
Mia Ridge @mia_out
Institute of Historical Research, Senate House, London
Sustainable History: Ensuring today's digital history survives
3. ‘A project is not a project without an end’
Sharon Leon at OWOT
26. Software libraries used
• Natural Language Toolkit, Apache2 license
• Beautiful Soup, MIT License
• Django, custom license?
• Isotope, MIT license
• DBpedia Spotlight, Apache v.2 with other small thing
• Skeleton.css, MIT license
• Requests, Apache2 license
• bibs, GPLv3 license
• libZotero, no license listed
• Flickr API, Flickr license
• PyYAML
• dicttoxml, GPL v.2
• xmltodict, MIT license
• lxml
• OAuth
• simplejson, MIT License
27. Content providers
• International Cultural Heritage Aggregators
– Digital Public Library of America – http://dp.la
    – 4.5 million items from US cultural heritage institutions
– Europeana – http://www.europeana.eu
    – Approximately 15 million items from European collections
– Flickr Commons – http://www.flickr.com/commons
    – International collection of publicly held photos
– Trove – http://trove.nla.gov.au/
    – 370 million “Australian and online” resources
28. Some solutions
• Set expectations about technical and collaborative sustainability
• Use the richest possible communication channels; document for transparency
• Create bite-sized problems that need solving; support constructive procrastination
• Activity begets activity; so do deadlines
• Balance sustainability and agility in platforms
Firstly, thanks for having me to speak today. I'm Mia, a PhD student in digital humanities at the Open University. Before that I was a programmer and analyst at various museums. I'm going to talk about a tool created in one week over the summer, and how we've kept working on it after that week. At a technical level this is about sustaining a piece of software, but more broadly it's about creating and sustaining relationships and trust when we've all gone back to our everyday lives. I'll start by introducing the One Week One Tool concept, run through what we did in that week, what happened afterwards, and what we learned about the sustainability of digital projects.
Image: http://www.flickr.com/photos/14456531@N07/4688628004/ National Library of Scotland, found through Serendipomatic.
Initial expectations about sustainability - the expectation that the tool should be sustainable and sustained - were set in the description of the project on its one-page website. 'During the week of Sunday July 28 – Saturday August 3, 2013 ... the group will conceive a tool, outline a roadmap, develop and disseminate an initial prototype, lay the ground work for building an open source community, and make first steps toward securing the project’s long-term sustainability.' But the type of person who applies and gets in as a Fellow is probably already busy with lots of projects, so a tension around realistic sustainability was built in right from the start. When we arrived, we had no idea what we'd make. Tom S explained that the key metric was to make something useful and usable, which implies a known audience - but we had to figure out who they'd be first. There was pressure on us to have something to launch at the end of the week for real audiences to use. It was a fairly high-profile institute so there was a feeling we'd look a bit daft if we didn't have something to show at the end of our time.
But there's also this, from Sharon right at the start. Projects need deadlines... We were encouraged to think of ourselves as the core open source dev team for this project, who would need to think about how to give time to the project over the year. Plan to build something small and concrete that leaves room for other people to build on it. We also found out we'd be back at CHNM one year afterwards, so on one level the project scope expanded from one week to one year. It wasn't clear what would happen afterwards - would it be up to the community? As funders, what were NEH's expectations about the sustainability of the resource created? I'm not personally sure how that conversation happened.
There's no point holding back the surprise... We made a tool that takes a piece of content, extracts the key terms, runs them as queries against collections aggregators like Europeana, DPLA and Flickr Commons, and gives you back surprising results. We aimed to give you back something you weren’t expecting to see from your text, whether it was an academic article, blog post, historic text, or even song lyrics.
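A minimal sketch of the flow just described, in Python since that is the language the team built with. It is not Serendipomatic's actual code: the stop-word list, the function names and the `fake_provider` stand-in are all assumptions made for illustration, and a real provider would call an aggregator's search API instead of returning canned strings.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real tool needs a fuller one).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "for", "on", "with", "that", "was"}


def extract_key_terms(text, limit=5):
    """Return the most frequent non-stop-words in the text, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [term for term, _ in counts.most_common(limit)]


def find_surprises(text, providers, limit=5):
    """Run the extracted terms against every provider and pool the results."""
    terms = extract_key_terms(text, limit)
    results = []
    for name, search in providers.items():
        for item in search(terms):
            results.append({"provider": name, "item": item})
    return results


def fake_provider(terms):
    """Hypothetical stand-in for an aggregator search; makes no network calls."""
    return [f"record about {term}" for term in terms[:2]]


sample = "Maps and maps of the harbour; the harbour maps"
surprises = find_surprises(sample, {"demo": fake_provider}, limit=2)
```

Keeping each provider behind a plain callable is one way to make "add a new content provider" a small, copyable task.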
So how did we make it? We weren’t exactly locked up in a tower the whole time, but it was a pretty intense week. We didn't know at the start who was there or what we were making, let alone how we'd make it. We met up on the Sunday night and did a round of intros, but I assume I wasn’t the only one who struggled to remember everyone’s names and what they did.
Open brainstorm; Day 1. Bouncing around between succinct ideas like 'better timelines', detailed specific ideas from previous conversations, and bigger problem spaces. Bouncing up and down between levels of detail, finding our way through different types of jargon, swapping between problem spaces. It felt about as messy as that board makes it look.
Trying to harness ideas into something buildable - tool x for audience y’s need z. We came up with a list of ideas, put them on a website for voting overnight, and tweeted to ask people to vote on them and maybe tell us why they'd find them useful - we weren't going to take the results as binding but it would be useful input.
Day 2. Reducing ideas from 12 (?) to four... to one. Got stuck on the final reduction process. Picking an idea - it needed to be technically, legally and socially feasible, and something people wanted to work on. Decision before lunch. From then on, it felt properly real to me.
I felt even better once we drew things on a board. This was our technical architecture (and it still pretty much is). Some of my training is as a requirements engineer and I’ve spent a long time working as a business analyst, so I always think it’s important to create visual artefacts - something people can point at - to help make sure everyone’s thinking about the same thing in a conversation. So the first lesson is probably: make sure everyone knows what you’re trying to build, and how.
Once we'd settled on an idea, we split into teams - outreach, design and development, project managers. I was design and dev team lead. We had to make rapid technical decisions - a quick audit of tech skills to decide the build language and assign tasks. Some people were learning a new language - brave! We used whiteboards to manage tasks within the teams, and there was also lots of running around between rooms, checking in with people, trying to balance making sure everyone was ok under the pressure with letting them get on with it. Comms: whiteboards. Post-it notes. GitHub commit comments. Google Docs for working with the Outreach team.
Deployment decisions took longer - we wanted to build on CHNM services so that the server the software ran on would be maintained, even if our codebase wasn't. (Learnt that the hard way working in museums.) We'd chosen a language/framework (Python/Django) based on the skills in the room, but CHNM uses a different language (PHP), so their servers had to be set up especially. None of us had experience with Amazon Web Services (AWS), which CHNM used, so we were dependent on them to work through the setup issues. The rest of the org wasn't moving at our pace; they had their own work as well as helping us. The dev/staging server was on Heroku because we could get up and running really quickly; CHNM's servers were built with AWS - these small differences in architecture add complexity. I was worried about the AWS server being built in time so had a backup plan to go live on Heroku if necessary, and only made the call to switch over the live domain the morning of launch. This is often a tension in experimental or research projects - the platforms that support hacky work aren't usually supported by organisational IT teams, so you trade off flexibility and the latest code libraries against sustainability. At least they weren't a Windows-only system...
We used GitHub for code version control, issue tracking and documentation - a secret repository at first, then we moved it to a public repo after launch. This screen shows commits - code added to the app - by time. You can see where we started getting set up on GitHub on Tuesday afternoon, and the increasing intensity of work... All hands on deck until code freeze on Friday afternoon, a few hours before launch. For a long time we didn't know whether we'd be able to pull it off. (You can also see a few more commits after launch, tidying things up.)
So we launched, people liked it, the dev/design team went back to tweaking and coding... There's always that temptation to fix just one more thing, and being able to show people what we were working on was a new motivation. We went to dinner, had cocktails...
Image: http://www.flickr.com/photos/49487266@N07/8091747743/ First Aerial Wedding, SDASM Archives
Once the surprise was revealed, we opened up our GitHub repo - all our code and discussion was visible in commits and 'issues'. We wrote documentation to help outside people understand what was in Serendipomatic. A lot of useful activity within the team, and with people outside the team, takes place in the issues - it's not traditional project management, but we’re working with a messy inherited system for grouping work. Being open in this way has led to collaborations with strangers, e.g. working to enhance multi-lingual text parsing and API queries.
And immediately after, Tim Sherratt @wragge was able to clone our code and get a local version running in 15 minutes, then update our code to add the Trove API. So the next time we pushed code to the live server, we were able to include Tim's code and make Trove available in Serendipomatic results. This was partly the result of relationships - Tim was meant to be at OWOT but had started a new job at the National Library of Australia so couldn't go, but had followed our progress... It also helps that Tim has a great reputation so we were likely to trust code that he sent our way - if it had been a complete stranger we might have had a more formal code review process, which would have taken longer and probably not gone live that morning.
And then we left the cocoon and all went our separate ways... Some people were teaching (or taking) intense courses, one or two changed jobs, and some of us had PhDs to get back to. Transition from rich in-person communication and an all-hands-on-deck race against a very public deadline... to asynchronous, text-based communication fitted in around busy lives and other commitments - closer to a volunteer open source project. Two ways of looking at sustainability afterwards... Technical (dependencies, third-party APIs and libraries) and personal/social/cultural sustainability (open source, open conversations, open libraries).
We chip away at tasks as and when people have time, and try to coordinate around software releases. Being distributed in time and space means we rely on software like GitHub issues to keep track of conversations and decisions; the visibility of task status is vital. We also use email and Google Docs, so we have to be careful that people can find things - tricky if there are too many places to look, and ideally as much as possible is public so we're making it easy for people to help us without stumbling across no-go areas etc.
Image: http://twitpic.com/dgkfgy
Tidying up documentation, defining issues and updating code continued at a lesser pace through August, with some again through October. Usually around conference calls - deadline and peer attention as motivator! After launch, we aimed to get the release to a point where it was ‘ok if we didn’t do anything else for a year’ - good to do, but it reduces urgency for continued work. The pressure of expectations is reduced - there are lots of improvements we want to make to the graphic design, the UX, the magic moustache, but it's now just one of many options for some constructive procrastination. We were aiming for a November release but that's unlikely to happen unless people get coding over Thanksgiving.
Onto the issues... Even if you’re a proficient programmer, technical writer or project manager, there’s a lot of jargon to get your head around in GitHub. Some of its mental models can be tricky, and until you’re comfortable with them, they can be a barrier. Even during our project, we would have benefited from taking time to take everyone through the basics of GitHub on the web and merging from their machine. Then everyone would have felt they knew how to contribute, and we would have saved bottlenecks and merge conflicts later.
Reliance on git to show recent activity raises a question about what's not shown. When choosing a tool, you’re buying into a system, a way of thinking about the world. Think carefully about what kinds of behaviours and relationships it privileges. GitHub doesn’t count issues as activity, only code commits, so there's a lot of work that's not represented on the screen where geeks assign credit for work done. Outreach and design people made a Café Press store - that also doesn’t show on GitHub.
So we managed to collaborate on specifying functionality with new people, but we haven't had anyone new contribute code. Partly for lack of time to encourage and manage it? But also because the combination of skills required means you're looking at a smaller pool of potential contributors. Ideally, they'd have coding skills in the right language. Some parts of the code might be ideal for people learning Python/Django (e.g. copying API bits to add a new service provider) but any language has conventions that can trip you up.
And as something that would now be a volunteer project, probably the biggest factor is finding people with all those skills and with free time... We'd need to embark on some outreach work to break requests into reasonably-sized pieces, refactor some code to make it easier for others to update, and let people know where we need help. (Or find an organisation that wanted to invest in it.)
Open source projects also need some form of code review when accepting pull requests from someone, whether an OWOT participant or otherwise. Gatekeeping takes time, but perhaps we could devise ways to make it less of a technical task (e.g. automated builds, guidelines for code review). We also need to make sure the process is as playful and intentionally lighthearted as the original - it's finicky work but it doesn't have to be stressful. During the week itself, I trusted the decisions everyone made and the code they wrote because we had to. Hacky was ok because we were so focused on the deadline. Can you sustain that?
Image: http://www.flickr.com/photos/29454428@N08/2866498615/ State Library of New South Wales. First "Miss Australia", Beryl Mills of WA, 1927 / photographed by Sam Hood. Image found via Serendipomatic.
To really support external code or design contributions, we'd need to finish our design/UX and content guidelines, plus a description of the kinds of content providers we think would be suitable. There's also a role in keeping the project within the original spirit - playful, not academic; primary material, not journal articles, etc. Guidelines would help reduce the overhead on making and communicating decisions. There's a risk that the tacit knowledge we use to decide what's in and what's out creates a sense of insiders and outsiders - it's probably not a factor but it's something to think about.
Image: http://www.flickr.com/photos/29454428@N08/4415843058/ State Library of New South Wales
Beyond the issue of contributions for new functionality or adding content providers... We have to deal with the inevitable entropy in software projects. Browsers change, so things look different on the front end. APIs (i.e. access points to repositories) change, producing unexpected or fewer results. Software libraries change - we need to maintain things enough to stop weeds growing.
Image: http://www.flickr.com/photos/8623220@N02/6130074335/ The Library of Congress, via Serendipomatic
In other words, a list of libraries is also a list of project dependencies...
And a list of APIs is also a list of external things that can break your site...
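One way to limit that risk is to isolate each provider, so that a change in one API means fewer results rather than a broken site. This is a hedged sketch in Python; `query_all` and the two demo providers are hypothetical names for illustration, not the project's code.

```python
def query_all(providers, terms):
    """Query each provider in turn; record failures instead of raising."""
    results, failures = [], {}
    for name, search in providers.items():
        try:
            results.extend(search(terms))
        except Exception as exc:  # deliberately broad: any provider error is survivable
            failures[name] = str(exc)
    return results, failures


def working_provider(terms):
    """Hypothetical provider that behaves as expected."""
    return [f"item for {term}" for term in terms]


def broken_provider(terms):
    """Hypothetical provider whose API has changed underneath us."""
    raise RuntimeError("API changed")


results, failures = query_all(
    {"working": working_provider, "broken": broken_provider},
    ["harbour"],
)
```

The `failures` dictionary can then be logged, so someone notices when a third-party service starts misbehaving.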
• Set expectations about sustainability at the start of the project; talk to funders about their expectations for the lifetime of the project
• Use the richest communication channel possible that fits into contributors' lives - visual, voice/video chat - not just text
• Find a balance between institutional resources for sustainability and flexibility in IT platforms
• Nurture your contributors; welcome people to your team
• Activity begets activity; so do deadlines
• Documentation, tutorials, 'start here'
• Two ways of looking at sustainability afterwards... Technical (dependencies, third-party APIs and libraries) and personal/social/cultural sustainability (open source, open conversations, open libraries)
• Distributed hackdays as solution? Energy of working together plus deadline.
Conclusion - we were successful in attracting external collaborators contributing code and ideas through personal relationships and open communication. Perhaps working in public helps people feel involved? It helps to have a clear ask - add a new content provider; help us specify multi-lingual requirements (which itself came from feedback after launch). We assumed good faith on the part of commenters (even if a comment seemed negative, there's value when someone cares enough to comment) and contributors, and wanted them to share our enjoyment of the project.
Image from http://www.europeana.eu/portal/record/2022345/grabimg_php_wm_1_kv_122704.html?utm_source=api&utm_medium=api&utm_campaign=5QpzDWzoy