We see reports all the time with headlines like "90% of data breaches caused by human error". But what does that really mean? In this talk I will cover the traditional view of human error and how it hinders our ability to develop and maintain secure systems. Other industries have improved safety and security by shifting their view of human error. We can apply many of the same concepts to software development and operations, minimizing risk and maximizing learning opportunities.
4. "...breaches caused by insiders are often unintentional. In fact, over 95 percent of these breaches are caused by human error."
IBM 2015 Cyber Security Intelligence Index
5. human error
‘Human error’ blamed for Rogers online security breach
Healthcare breaches need a cure for human errors
Human error causes most data breaches, Ponemon study finds
Human Error Blamed for Most UK Data Breaches
Human error is the root cause of most data breaches
Human error causes alarming rise in data breaches
Human Error: The Largest Information Security Risk To Your Organization
Huge rise in data breaches and it’s all your fault
Data breaches caused mostly by negligence and glitches
10. experience = bias
"Our ability to reason about the systems that we’re working with (and are part of) diminishes as their scale and interdependence increases. We can no longer rely solely on past experience, and instead have to continuously discover how systems are functioning or failing, and adapt accordingly."
Dave Zwieback - Every company is a learning company
11. “human error” - we can do better
other industries have already learned this lesson
http://amzn.com/B00Q8XCSFI
12. two views of “human error”
Old View
◦ Asks who is responsible for the outcome
◦ Sees human error as the cause of trouble
◦ Human error is random, unreliable behaviour
◦ Human error is an acceptable conclusion of an investigation
New View
◦ Asks what is responsible for the outcome
◦ Sees human error as a symptom of deeper trouble
◦ Human error is systematically connected to features of people’s tools, tasks and operating environment
◦ Human error is only the starting point for further investigation
13. “Rather than being the main instigators of an accident, operators tend to be the inheritors of system defects created by poor design, incorrect installation, faulty maintenance and bad management decisions. Their part is usually that of adding the final garnish to a lethal brew whose ingredients have already been long in the cooking.”
http://amzn.com/0521314194
14. When we’re dealing with complex systems, the magnitude of a cause is often not proportionate to the magnitude of its effect
18. warning signs
◦ security policy is not visible
◦ security is at odds with how work gets done
◦ developers use a different workflow than production
◦ documentation featuring warnings (“don’t do this in production!”)
◦ SSH + sudo
◦ talking processes, not people
◦ audits are time-consuming
19. references
Sidney Dekker
◦ “Just Culture” Lecture (video)
◦ A Field Guide to Understanding ‘Human Error’
◦ Just Culture: Balancing Safety and Accountability
◦ Human Error - James Reason
◦ The Design of Everyday Things - Don Norman
◦ Universal Principles of Design - William Lidwell
What do I mean by reformed?
It took me 5 years to understand that delivering software is about people, not just code.
“Everyone but me is an idiot” - This was my problem, not everyone else’s - I just didn’t understand their needs and motives
If you want to have more impact as an engineer, you have to learn how to deal with people
The Oz principle - described in a business book (keep it above the line)
Above the line: accountability and success
Below the line: self-victimization and failure
Recommended books - okay yeah tech books, but other good ones:
How to Win Friends and Influence People
Thinking, Fast and Slow
Talk about mentors, tech and not-tech
Talk about being an introvert a little bit
A few of the more high-profile breaches this year - almost every company is being breached
55% of attacks are by insiders
How media reports security breaches, this is the status quo
Obviously what we’re doing is not working
Let’s talk about some of the things we’re doing
Obscuring security leads to a false sense of security, which is often more dangerous than not addressing security at all.
If the security of a system is maintained by keeping the implementation of the system a secret, the entire system collapses when the first person discovers how the security mechanism works—and someone is always determined to discover these secrets.
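The point above is Kerckhoffs's principle: a system should stay secure even when everything about it except the key is public. A minimal sketch in Python (my illustration, not part of the talk) - the mechanism here is completely public stdlib HMAC-SHA256, and only the key is secret; `sign` and `verify` are hypothetical helper names:

```python
import hashlib
import hmac
import os

# The mechanism is public (HMAC-SHA256); only the key is secret.
# A leaked key can be rotated -- a leaked secret algorithm cannot.
key = os.urandom(32)  # in practice, loaded from a secrets store

def sign(message: bytes) -> bytes:
    """Compute a public, well-studied MAC over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking information through timing
    return hmac.compare_digest(sign(message), tag)

tag = sign(b"deploy app v1.2")
assert verify(b"deploy app v1.2", tag)
assert not verify(b"deploy app v6.6", tag)
```

An attacker who reads this code learns nothing useful; a homegrown "secret" scheme collapses the moment its source leaks.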
Visibility is usually the biggest blocker to adoption of security practices across an org.
Shadow IT
open vs closed source? you pick
hybrid - talk about slosilo @ Conjur (Esperanto for ‘key to open a lock’)
root cause analysis boils down a complex problem into one cause (usually who to blame)
Rarely results in enough data to remediate a problem and stop it from happening in the future
5 whys has the same problem
instead of “why did you screw up the backup?”, ask “how did you do the backup?” - we’re looking for details: what made sense at the time?
asking “how?” lets us collect multiple narratives of what happened at the time
Negative reinforcement leads to people hiding things they know to avoid punishment
Known problems > unknown problems
Someone admits they messed up, gets disciplinary action, do you think other people on the team will admit when they mess up? No
We’re not going to code our way out of these problems
Experience shapes how we do things in the future
this can be a good thing! it saves us time, employers want it.
but experience is a bias, so we need to constantly evaluate how useful our experience is in new and changing conditions
Some problems only become apparent at scale or when combined with other systems
Lots of great examples of how other industries have benefitted from shifting to the “new view” of human error (covered in next slide)
Tell story of NZ surgeon jailed for negligence
Moved from Britain
Bunch of patients died, newspaper picked it up, he went to jail
Root cause: he was negligent
Further investigation: understaffed - assistants were med students; no re-licensing even though procedures were different in NZ
Old View solution: people should do as they're told. Their attitude is the problem, fix with sanctions and shaming.
This view, the Old View, is limited in its usefulness. In fact, it can be deeply counterproductive. It has been tried for decades, without noticeable effect. Safety improvement comes from abandoning the idea that errors are causes, and that people are the major threat to otherwise safe systems.
The point of a New View ‘human error’ investigation is not to say where people went wrong (that much is easy). The point is to understand why they thought they were doing things right; why it made sense to them at the time.
Think up and out, not down and in (reductionist).
James Reason in “Human Error”
Underneath every simple, obvious story about ‘human error,’ there is a deeper, more complex story about the organization.
Complex systems are not cause == effect. We have the intuition that a large effect, a big screwup, must have a big cause and be punished accordingly.
A small inconsistency can eventually bring down a whole system.
We’re talking consistency and chaos theory.
Notions of accountability become difficult. What we want to hear diverges from how we can effect change.
Bad apples are the ghost story, the lullaby that we tell ourselves - we can feel better.
We come to the problem here, we need to figure out what accountability means so that we can make it compatible with learning.
What we’re talking about is creating a Just Culture. One where people can feel safe reporting problems.
holding people accountable is fine;
but you need to be able to show that people had the authority to live up to the responsibility that you are now asking of them;
if you can’t do that, your calls for accountability have no merit, and you’d better start looking at yourself.
At some point there is a discretionary space wherein we must hold individuals accountable.
Be very clear about where this space begins and ends. Anything you are responsible for you should have full authority over.
you cannot keep this gap small and open it when a problem happens - this is deeply unfair
How do we motivate people to take conscientious decisions in this space? Fear or invitation to participate?
Fear is going to cause issues to go unreported
accountability = share your account
The way we view failures has changed over time - now they are ‘failures of risk management’
they used to be random meaningless events in the early 1900s, in the mid-1900s acts of God, now it’s human error - only in the last 100 years has this changed
this is a consequence of our engineering prowess - complex systems are not allowed to fail
Airlines in the 80’s and 90’s were having problems with oil caps not being properly replaced on JT8D engines. Lots of good mechanics were temporarily suspended over this. Closer investigation revealed that the caps were so hot that mechanics could not properly get a hand on them to check whether they were actually sealed. A visual check was not sufficient, but supervisors did not believe mechanics telling them about the problem. They finally fixed the caps, problem solved. The bottom line: discipline without understanding the problem is ineffective.
These groups will have different needs, you need to create a feedback mechanism for each and between them.
Talk about UX here - these are really user personas - SecX?
point 1 - check it in as code, mention Conjur DSL
point 2 - a change that should be simple ends up being really hard - inflexible security system, technical debt
point 3 - secrets, SSH management
point 4 - too easy to screw up
point 5 - what do we use SSH for? changing state shouldn’t be an answer
point 6 - “I don’t care who it is, they have to follow the process!” - inflexible, shadow IT
point 7 - If audits are custom one-off events, OUR visibility into our own system is probably subpar
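Point 1 ("check it in as code") can be sketched. This is a hypothetical, minimal policy-as-code example in Python - it is not the real Conjur DSL, and `POLICY`/`can` are names I made up - but it shows the idea: the policy is plain data in version control, reviewable in a pull request, and enforceable by the same pipeline as application code:

```python
# Hypothetical policy-as-code sketch (not the actual Conjur DSL):
# the policy is data, checked into version control and reviewed
# and audited like any other code change.

POLICY = {
    "prod-db-password": {"read": {"webapp", "dba-team"}},
    "tls-private-key":  {"read": {"load-balancer"}},
}

def can(role: str, action: str, resource: str) -> bool:
    """Return True if the checked-in policy grants role the action."""
    return role in POLICY.get(resource, {}).get(action, set())

assert can("webapp", "read", "prod-db-password")
assert not can("webapp", "read", "tls-private-key")
```

Because the policy is visible and diffable, "who can read what, and since when?" becomes a `git log` question instead of a one-off audit event (point 7).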