DT8
DevOps & Embedded
6/8/2017 3:00:00 PM
DevOps in an Embedded and Regulated Environment
Presented by:
Arjun Comar
Coveros
Brought to you by:
350 Corporate Way, Suite 400, Orange Park, FL 32073
888-268-8770 · 904-278-0524 - info@techwell.com - https://www.techwell.com/
Arjun Comar
Coveros
Arjun Comar became interested in software at the tender age of nine when his
parents got sick of him at home and threw him into a summer class at the local
community college. It took many years to satisfy his hunger for learning, and it's
become a lifelong passion. For the past few years Arjun has been in the weeds
implementing DevOps solutions for a variety of projects and organizations, from
advertising with its websites and services to healthcare with medical devices.
What matters to Arjun is bridging the gap between what we want to build and
what we actually place in the hands of our users. Let's get that right.
Agility. Security. Delivered.
DevOps in a Regulated and
Embedded Environment
By: Arjun Comar
(Was DevOps on a Legacy Project)
© COPYRIGHT 2016 COVEROS, INC. ALL RIGHTS
RESERVED
twitter: @arjuncomar email: arjun.comar@coveros.com
Agenda
• About Me
• Agile, DevOps, and Medical Devices: What’s the Problem?
• Git Flow in a Regulated World
• Expect to Deploy
• Scaling for Success and Resource Management
• Questions
About Me
• B.S. in Computer Science from the Rose‐
Hulman Institute of Technology
• Worked on everything from the Linux
kernel to computer vision.
• Interested in software quality and
correctness.
• Been with Coveros for ~2.5 years.
• Run the local HaskellDC meetup group.
About Coveros
• Coveros builds security‐critical applications using
agile methods.
• Coveros Services
• Agile transformations
• Agile development and testing
• DevOps and continuous integration
• Application security analysis
• Agile & Security training
• Government qualifications
• DCAA approved rates and accounting
• TS facility clearance
Areas of Expertise
Select Clients
Medical Devices and the Law
• It isn’t sufficient to write the code; release requires regulatory
approval.
• Approval is per feature (epic)
• Contingent on development, testing, risk mitigation, etc.
• We want short‐lived branches, but…
• If we don’t get approval for one feature, business still wants to release
the others
• Unmerge all the feature branches that went into an epic?
• Further requirements around documentation, especially:
• Design
• Testing
• Risk Management
Legacy Problems
• C code, embedded device target
• cross compilation: Windows ‐> QNX
• Some modules only built on WinXP
• Manual build, deploy, test process
• Custom hardware, custom firmware
• Old codebase, not written to be unit tested
• Unit test execution requires target environment
• Rough order of magnitude, 200 kloc codebase
• Hardware platform ~25 years old
Integration and Deployment
• Manual builds, deploy to unit test?
• Unmaintained deployment scripts
• Written by a contractor in ksh
• Last maintainer had already left the company
• Working deployments flashed the unit with a USB stick and physical dongle
• Rewrite with Chef? ...Ansible? … Bash?
• try: sh run over telnet
• No ruby, python, perl, bash, ssh, dhcp
• Network deployments/updates to a device that goes in a human
being…?
Feedback Cycles
• Deployments took ~30 minutes and required physical interaction
through the process
• Testing involved long protocols with detailed and very particular
steps
• ~5‐6 weeks for the test team, maybe 8 weeks, but at least 3‐4.
• Release cycle on the order of years.
Resource Needs and Team Size
• Business wanted multiple features in development in parallel
• Different tests take different lengths of time to run
• even when automated
• seconds ‐> weeks
• Business needed 4 teams like the one they had
• Continuous integration targets, unit test targets, deployment
testing targets, full functional test targets, partially automated test
targets
• Performance, reliability, security, durability, etc.?
But I can’t merge back daily...
• No, really. Daily merges back to develop mean that pulling an epic out
requires a virtually impossible unmerge.
• Might be legally required not to go forward with a feature
• Can’t get approval until feature is developed and tested with
known risks documented and mitigated
• Business still wants to release what they can
Can’t not integrate...
• Long lived lines of development, all separate
• Tested independently prior to release
• Business wants to release, integrate necessary branches and…
• Disaster: merge conflicts, retest everything, unknown interactions
everywhere
Extending the
workflow to deal with
regulation
Extend the git flow model
Keep epic specific code in ‘develop/epic‐
name’ branches
Use ‘feature/epic‐name/feature‐name’
branches for daily work
Merge these back daily!
Epic branches get merged back for a
release
Integrating Continuously
• Use tooling to manage the problem for you
• Have Jenkins (or your CI stack of choice) do builds by merging
develop with the epic branches first
• develop holds code that will be released; features that conflict must be
fixed
• Run the normal deployment and testing cycle on these builds
• merge conflicts are failed builds
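The CI step on this slide could be sketched roughly as follows. This is a minimal illustration in Python, not the pipeline from the talk: the names `git_merges_cleanly` and `epic_build_status` are hypothetical, the branch names only follow the 'develop/epic‐name' convention, and the Git-backed check assumes Git 2.38 or newer, where `git merge-tree --write-tree` exits 0 on a clean merge and 1 on conflicts.

```python
import subprocess
from typing import Callable, Dict, Iterable


def git_merges_cleanly(base: str, branch: str, repo: str = ".") -> bool:
    """Trial-merge two refs without touching the working tree (assumes Git >= 2.38)."""
    result = subprocess.run(
        ["git", "-C", repo, "merge-tree", "--write-tree", base, branch],
        capture_output=True,
        text=True,
    )
    if result.returncode not in (0, 1):
        # Anything other than "clean" (0) or "conflicts" (1) is a real error.
        raise RuntimeError(result.stderr.strip())
    return result.returncode == 0


def epic_build_status(
    epics: Iterable[str],
    merges_cleanly: Callable[[str, str], bool] = git_merges_cleanly,
    develop: str = "develop",
) -> Dict[str, str]:
    """Map each epic branch to READY (continue to build/deploy/test) or FAILED (merge conflict)."""
    return {
        epic: "READY" if merges_cleanly(develop, epic) else "FAILED"
        for epic in epics
    }
```

Passing the merge check in as a parameter keeps the "merge conflict means failed build" rule separate from how the trial merge is performed, so the rule can be exercised without a repository.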
Integrating even more continuously
• Still need to know if there are potential conflicts between epic
branches
• fail early, fail often, right?
• Take all the epic branches and merge them with develop
• Run a full build/deploy/test cycle on this mess as well.
• Any failures found ‐> failed build
• If it doesn’t cleanly merge, we can’t release, right?
• The software should always be ready to release; make it a business
decision, not a technical one.
Digging deeper to unearth conflict
• Better error detection and reporting:
• If we merge everything together, it looks like the later branches cause
conflicts more often
• Branches that conflict exclude each other
• Find conflicting pairs and report them both as failed
• Conflicts may only show up with the interaction of 3+ branches
• But this gets exponentially hard to detect
Do what you can
• Merge all possible epic branch pairs together, track+report failures
• Report these failures once or the team will ignore you...
• Branches that cleanly merge with everything get merged together
with development and built
• This assesses the health of the software as it exists at this moment
• This might be expensive, so do it overnight.
• Shortcuts:
• If ‘A’ merges with ‘B’, then ‘B’ merges with ‘A’
• ‘A’ always merges with ‘A’
• (You only need the top half of the n x n matrix)
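The pairwise check and its shortcuts can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the tooling from the talk: `pairwise_report` is a hypothetical name, and `merges_cleanly` stands in for whatever trial merge the CI system performs.

```python
from itertools import combinations
from typing import Callable, List, Set, Tuple


def pairwise_report(
    branches: List[str],
    merges_cleanly: Callable[[str, str], bool],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (conflicting pairs, branches that merge cleanly with every other branch)."""
    conflicts: List[Tuple[str, str]] = []
    tainted: Set[str] = set()
    # combinations() visits each unordered pair exactly once: the top half of
    # the n x n matrix, skipping the diagonal ('A' always merges with 'A').
    for a, b in combinations(branches, 2):
        if not merges_cleanly(a, b):
            conflicts.append((a, b))  # both branches of the pair are reported
            tainted.update((a, b))
    clean = [b for b in branches if b not in tainted]
    return conflicts, clean
```

For n epic branches this performs n(n-1)/2 trial merges, which is one reason to run it overnight. The `clean` list is the set of branches that would then be merged together with develop and built.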
This is a lot of work...
• Long‐lived branches are hard to deal with.
• You could even go further and build the sets of conflicting
branches that can be merged together
• This is really hard; it’s easier to ask the team to fix the mess.
• If you don’t have to do it, don’t.
• You probably don’t unless regulatory constraints make you.
Expect to Deploy
What a lifesaver
Expect?
• Tcl scripting language used to automate interactive programs
• ...like telnet and ftp
• Was used to automate testing way back in the day
• Turns out to be rather perfect for scripting deployments, testing,
etc. in this tool restricted environment
• sh, ksh, telnet, ftp
• not: bash, python, ruby, ssh, perl, etc.
Wait, why not use ...
• Yes, we could have tried to beat that wall down
• Lots of effort/expertise to produce a working build of Python for
the target environment
• QNX support would probably have been willing to help
• But loading new software onto the target environment to increase
its capabilities is fundamentally risky
• Business was understandably risk averse
• Rather limited DevOps team at this point: me, myself, and I.
A little expect script
$ cat login.expect
#!/usr/bin/expect
# Usage: login.expect <addr> <user> <pass>
set timeout 20
set addr [lindex $argv 0]
set user [lindex $argv 1]
set pass [lindex $argv 2]
spawn telnet $addr
expect "login:"
send "$user\r"
expect "Password:"
send "$pass\r"
# Wait for the root shell prompt, then hand control to the user
expect "#"
interact
Adding a little abstraction
proc login { addr user pass } {
    # spawn_id is local to a proc unless declared global;
    # declare it so the caller can keep using the session
    global spawn_id
    spawn telnet $addr
    expect {
        timeout { send_user "Could not connect\n"; exit 1 }
        eof { send_user "Connection refused\n"; exit 1 }
        "login:"
    }
    send "$user\r"
    expect "Password:"
    send "$pass\r"
    expect {
        timeout { send_user "Failed to login.\n"; exit 1 }
        "#"
    }
}
Separation of Concerns
• It only takes minor modifications to use the same logic to connect
to ftp
• Use ftp to upload deployment archive, install sh script
• Use telnet to set permissions and execute install script on archive
• Deployment logic is now separate from connecting, setup, etc.
• “talking to the target” vs “doing stuff on the target”
• This is exactly the separation chef/puppet/ansible provide
• (They also provide a whole lot of other value as well, but it’s nice to
recover any of it!)
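The "talking to the target" vs "doing stuff on the target" split can be illustrated with a small sketch. This is Python rather than the Expect used in the talk, and every name in it (`Transport`, `RecordingTransport`, `deploy`, the `/tmp` work directory, the file names in the usage note) is a hypothetical stand-in. A real transport would wrap the ftp and telnet sessions.

```python
from typing import List, Protocol


class Transport(Protocol):
    """'Talking to the target': how files and commands reach the device."""

    def upload(self, local_path: str, remote_path: str) -> None: ...

    def run(self, command: str) -> None: ...


class RecordingTransport:
    """Stand-in transport that only records calls instead of contacting a device."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def upload(self, local_path: str, remote_path: str) -> None:
        self.log.append(f"upload {local_path} -> {remote_path}")

    def run(self, command: str) -> None:
        self.log.append(f"run {command}")


def deploy(target: Transport, archive: str, installer: str, workdir: str = "/tmp") -> None:
    """'Doing stuff on the target': the deployment steps, independent of the transport."""
    target.upload(archive, f"{workdir}/{archive}")
    target.upload(installer, f"{workdir}/{installer}")
    target.run(f"chmod +x {workdir}/{installer}")
    target.run(f"{workdir}/{installer} {workdir}/{archive}")
```

Calling `deploy(RecordingTransport(), "release.tar", "install.sh")` records two uploads followed by the chmod and the install command, which shows that the deployment logic never needs to know whether ftp, telnet, or something else carried it out.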
Towards a deployment framework
• How many environments like this are out there?
• limited tooling, embedded platform, etc.
• If there are a lot… we have the start of a deployment framework to
target these environments
• Dependencies are very minimal, can be used to target virtually
anything
• With work, we could get something idempotent with clean
modularity and composability.
• A whole lot of work… Is there a market that needs this?
Scaling for Success
and Resource Management
Resource Needs
• Embedded device with potential hardware attachments for
particular tests ‐‐ virtualization is out.
• Unit tests need to run in the target environment so one target is
needed at a minimum just for rapid feedback CI.
• Basic integration testing (i.e. devint env) takes ~1 min to ~10 mins
• Fully automated functional testing takes ~10 mins to 1+ hours to
run (i.e. test env)
• Partially automated tests require interaction, need another target.
• Longer term testing (e.g. stress, durability, performance) takes
weeks and needs its own target.
• ~5 targets minimum to support development for basic CI/CD
Tackling Resource Allocation
• If a new build kicks off and reaches deployment testing while the
previous round of smoke testing is still on‐going, what happens?
• Probably: target gets bricked as OS level code is updated while the
machine is in use.
• Even if the pipeline is built carefully so these things can’t happen,
there’s always PEBKAC
• Deployment and testing tools need to be smart enough to check if
a console is available before attempting to use it
• We need a resource allocator...
Making a first pass
• Track the target state on the target
• Use an old Unix trick ‐‐ drop a lock file in a well‐known spot, and
make tools attempt to acquire the lock before using the target
• Pros: Extremely simple to implement and use; it’s a really simple
pair of shell scripts.
• Cons: If the lockfile isn’t cleaned up, the target is unavailable; if the
tool (user) doesn’t check for the lock, they could still cause
problems. It’s hard to track which targets are in use where, and there’s
no centralized management.
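The lock file trick might look something like the sketch below. In the talk the lock lived on the target and was handled by a pair of shell scripts; this Python version only illustrates the same idea on a local filesystem, and the names `acquire_lock`, `release_lock`, and `locked` are hypothetical. It shares the cons listed on the slide: a lock that is never cleaned up keeps the target unavailable.

```python
import os
from contextlib import contextmanager


def acquire_lock(path: str, owner: str) -> bool:
    """Create the lock file atomically; return False if someone else holds it."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so checking for
        # the lock and taking it happen in a single step.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock:
        lock.write(owner + "\n")  # record who holds the target
    return True


def release_lock(path: str) -> None:
    """Remove the lock file; a missing file is treated as already released."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


@contextmanager
def locked(path: str, owner: str):
    """Hold the lock for the duration of a with-block, releasing it even on errors."""
    if not acquire_lock(path, owner):
        raise RuntimeError(f"target busy: {path}")
    try:
        yield
    finally:
        release_lock(path)
```

The important design choice is that acquiring is a single atomic create rather than a "check, then create" sequence, so two builds racing for the same target cannot both succeed.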
Aside: Jenkins Pipeline
• Specifying the pipeline in Groovy
instead of shell/Jenkins XML
prevented a lot of bugs.
• acquireLock and releaseLock have
simple contracts and provide
strong guarantees with the try/finally
idiom.
• This is tricky/hard to achieve with
traditional Jenkins.
def locking(target, action) {
    try {
        acquireLock(target)
        action()
    } finally {
        releaseLock(target)
    }
}
downloadTests(latest)
locking(targetAddr) {
    deploy(targetAddr)
    runTests(targetAddr, myBuild, testTags)
}
Multiple teams, multiple workstreams
• Goal is to reduce cycle time. If one team has to wait for another
team’s build to finish before getting feedback, we’re wasting time.
• Key takeaway: we can’t effectively share environments between
parallel streams of development.
• Business wanted ~4 streams of work progressing in parallel.
• Team needs to be able to support old releases via hotfixes (~2 old,
previous release, current stream of development).
• Hardware/firmware platform changes between releases
• Test automation team needs an environment to test their tests.
• DevOps team needs to be able to test pipeline changes.
• ~40 target machines to effectively support CI/CD pipeline.
That’s a lot of equipment...
• Where do you put it all?
• Shelving/rackspace, cooling, switches, networking…
• Units are expensive; if they aren’t in use/needed, business is going
to get annoyed.
• Hard to track utilization, load, etc. from a really decentralized
place.
• We might also be able to save money / use fewer targets if we’re
more intelligent about allocating them; i.e. allocate on demand.
• Centralization also means we can start hitting nice‐to‐haves:
• console access from the web browser for debugging
• status/health check daemon reporting to the manager
Centralized Resource Management
• Pool available targets, expose REST API to acquire a target for use,
release a target, check a target, etc.
• Track target status, usage metrics, target requester statistics in
backend database.
• Set up a simple frontend to display statistics about usage, provide
a manual form to acquire a target for manual/ad‐hoc testing, etc.
• Like a library; acquire target for duration, get grumpy emails if it’s
not returned in time.
• Can be easily expanded to provide additional services over time.
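The library-style pool described on this slide could start from something as small as the sketch below. It is an in-memory illustration in Python only: the class and method names (`TargetPool`, `acquire`, `release`, `overdue`) are hypothetical, and the REST API, backend database, and frontend from the slide are deliberately left out.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Lease:
    holder: str
    expires_at: float


class TargetPool:
    """In-memory pool of targets; a real service would persist this in a database."""

    def __init__(self, targets: List[str]) -> None:
        self._leases: Dict[str, Optional[Lease]] = {t: None for t in targets}

    def acquire(self, holder: str, duration_s: float, now: Optional[float] = None) -> Optional[str]:
        """Lease the first free target to holder, or return None if all are busy."""
        now = time.time() if now is None else now
        for target, lease in self._leases.items():
            if lease is None:
                self._leases[target] = Lease(holder, now + duration_s)
                return target
        return None

    def release(self, target: str) -> None:
        """Return a target to the pool."""
        if target not in self._leases:
            raise KeyError(target)
        self._leases[target] = None

    def overdue(self, now: Optional[float] = None) -> Dict[str, str]:
        """Targets whose lease has expired, mapped to who gets the grumpy email."""
        now = time.time() if now is None else now
        return {
            t: lease.holder
            for t, lease in self._leases.items()
            if lease is not None and lease.expires_at < now
        }
```

The `now` parameter exists so the lease logic can be checked with fixed timestamps. Each method maps naturally onto one of the API operations named on the slide (acquire a target, release a target, check a target).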
Lightning Quick Recap
• Integrate continuously to keep software testable, increase quality,
and build confidence.
• Prioritize the delivery of working software.
• Fail early, fail often.
• Make your tools serve your needs.
• Set yourself up for success ‐‐ plan ahead to cover scaling needs.