7: Security Operations

Investigations & Forensics

Investigations

Operational or administrative investigations

seek to resolve technical issues
seek to restore normal operations
low standard of evidence
should end with a root cause analysis
- shouldn’t fix issue, should find root issue

Criminal investigations

look into possible crimes
involve fines / jail time
use the “beyond a reasonable doubt” standard for evidence

Civil investigations

resolve issues between two parties
no fines / jail time
use the “preponderance of evidence” standard for evidence

Regulatory investigations

conducted by government or industry regulators
may be civil or criminal in nature
interviews are a valuable tool for investigations
- should always be voluntary
- involuntary interviews are an interrogation — this should be left to law enforcement

Evidence types

all types of evidence are used in many ways during an investigation and trial

Real evidence

tangible items that can be taken into a court room

Documentary evidence

written info
- contracts
- logs
documentary evidence rules
- authenticity rule
  - documents must be authenticated by testimony
- best evidence rule
  - original evidence is superior to copies
- parol evidence rule
  - written contracts are presumed to be the entire agreement
  - verbal agreements are not included

Testimonial evidence

witness statements
- may not be hearsay
types
- direct evidence
  - witness provides evidence based on their own observations
- expert testimony
  - expert witness draws conclusions based on other evidence and experience

Digital forensics

investigative techniques that collect, preserve, analyze and interpret digital evidence

important
investigations must never alter evidence.
volatility
- relative permanence of a piece of evidence
  - evidence that may not last long is more volatile
  - evidence that is more permanent is less volatile
- order of volatility
  1. network traffic
  2. RAM / memory contents
  3. system / process data
  4. files
  5. logs
  6. archives
time offsets help correlate records from different sources
- check that times are correct
- check that timezones match
consider alternative sources of evidence
- video recordings
- witness statements
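
The time-offset checks above can be sketched as a small normalization step: convert every record to UTC and subtract any known clock skew before comparing sources. This is a minimal sketch, not a forensic tool; the function name `to_utc`, the `clock_skew_seconds` parameter, and the sample timestamps are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc(local_timestamp: str, utc_offset_hours: float,
           clock_skew_seconds: float = 0.0) -> datetime:
    """Convert a naive local timestamp string to a UTC datetime.

    utc_offset_hours is the source system's timezone offset from UTC.
    clock_skew_seconds is how far the source clock runs AHEAD of true
    time (hypothetical value an investigator would measure and record).
    """
    naive = datetime.strptime(local_timestamp, "%Y-%m-%d %H:%M:%S")
    source_tz = timezone(timedelta(hours=utc_offset_hours))
    aware = naive.replace(tzinfo=source_tz)
    corrected = aware - timedelta(seconds=clock_skew_seconds)
    return corrected.astimezone(timezone.utc)


# Illustrative records: a firewall logging in UTC-5 and a server logging
# in UTC whose clock is assumed to run 12 seconds fast.
firewall_event = to_utc("2024-03-01 09:15:30", utc_offset_hours=-5)
server_event = to_utc("2024-03-01 14:15:42", utc_offset_hours=0,
                      clock_skew_seconds=12)
same_moment = firewall_event == server_event
```

Once both records are on the same corrected UTC timeline, they can be ordered and matched against each other.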

System and file forensics

images take the place of original physical media
write blockers
forensic disk controller
- prevents accidental modification to disks during an investigation
hashing
- protects evidence
- provides a unique file signature
- proves that files have not been changed (integrity; supports non-repudiation when combined with digital signatures)
file metadata
- contains a lot of forensic info
  - file owner
  - creation time
  - modification times
  - geolocation
- email and HTML headers
  - contain info about sender, receiver, network path, transit time, etc.
other forensic sources
- screenshots
- process tables
- memory contents
- OS configurations
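
A minimal sketch of the hashing idea above: hash the original media and the forensic image, then compare the digests. SHA-256 is an assumption here (the notes do not name an algorithm), and in-memory byte streams stand in for real disks and image files.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Hash a binary stream in chunks so large images need not fit in RAM."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def image_matches_original(original: BinaryIO, image: BinaryIO) -> bool:
    """True when the image is bit-for-bit identical to the original."""
    return sha256_of_stream(original) == sha256_of_stream(image)


# Stand-in data; a real workflow would open the device behind a write
# blocker and the image file, both in binary read-only mode.
original = io.BytesIO(b"disk contents")
good_copy = io.BytesIO(b"disk contents")
altered_copy = io.BytesIO(b"disk contents!")

good_copy_ok = image_matches_original(original, good_copy)
original.seek(0)  # rewind before hashing the original a second time
altered_copy_ok = image_matches_original(original, altered_copy)
```

Any change to the copy, even a single byte, produces a different digest, which is what lets the hash show that evidence was not modified.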

Network forensics

Wireshark

monitors network
can capture full packet data
- requires a lot of storage

NetFlow

summarizes traffic
provides high level info
- IP address and port
- time stamp
- amount of data transferred
routers and firewalls also capture flow data using NetFlow, sFlow or IPFIX
- sFlow and IPFIX are similar to NetFlow
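
To illustrate the difference between full packet capture and flow data, the sketch below collapses individual packets into per-connection summaries holding only addresses, ports, timestamps, and byte counts. It is not the NetFlow wire format; the `Packet` type and `summarize_flows` function are hypothetical names for illustration.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Packet(NamedTuple):
    timestamp: float  # seconds since epoch
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    size_bytes: int


FlowKey = Tuple[str, int, str, int]


def summarize_flows(packets: Iterable[Packet]) -> Dict[FlowKey, dict]:
    """Aggregate packets into flow records keyed by source/destination."""
    flows: Dict[FlowKey, dict] = {}
    for p in packets:
        key = (p.src_ip, p.src_port, p.dst_ip, p.dst_port)
        flow = flows.setdefault(key, {
            "first_seen": p.timestamp,
            "last_seen": p.timestamp,
            "packets": 0,
            "bytes": 0,
        })
        flow["first_seen"] = min(flow["first_seen"], p.timestamp)
        flow["last_seen"] = max(flow["last_seen"], p.timestamp)
        flow["packets"] += 1
        flow["bytes"] += p.size_bytes
    return flows
```

The summary discards packet contents, which is why flow data needs far less storage than a full capture but cannot show what was actually sent.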

Bandwidth monitoring

reports network utilization

Software forensics

software code may be used as evidence

Intellectual property

may be used to resolve disputes between parties about intellectual property rights
- example: origin of code after a developer moves to another org

Malware origins

may be used to identify the author of malware found on a system
- example: how does the NSA know attacker code is “Russian”

Mobile device forensics

mobile devices are a goldmine
- email
- browsing history
- GPS history
- network connectivity history
device manufacturers know this, so devices are typically protected by strong encryption
requires a special set of tools and skills

Embedded device forensics

special-purpose computers found in smart devices
often found in homes and offices
- can provide info about location, presence, occupancy, temperature, etc.
modern vehicles
- often contain embedded systems
- can provide info about location, speed, time stopped, etc.

Chain of custody

chain of evidence

provides paper trail of evidence
evidence should be labeled and stored in sealed evidence containers

Evidence log events

initial collection
transfers
storage
handling / opening/resealing evidence container

Evidence log details

name of investigator
date/time
purpose
nature of action
evidence logs must be available to present in court
- can make evidence inadmissible w/out it
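
The log events and details listed above map naturally onto a small append-only record. The sketch below is one possible shape, assuming the hypothetical names `CustodyEvent` and `EvidenceLog`; a real evidence log would also need tamper protection and whatever fields local policy requires.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Event types taken from the list above.
ALLOWED_ACTIONS = {"initial collection", "transfer", "storage",
                   "opening", "resealing"}


@dataclass(frozen=True)
class CustodyEvent:
    investigator: str
    timestamp: datetime
    purpose: str
    action: str


@dataclass
class EvidenceLog:
    evidence_id: str
    events: List[CustodyEvent] = field(default_factory=list)

    def record(self, investigator: str, purpose: str, action: str,
               timestamp: Optional[datetime] = None) -> CustodyEvent:
        """Append one custody event; entries are never edited or removed."""
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {action}")
        if not self.events and action != "initial collection":
            raise ValueError("first entry must be the initial collection")
        event = CustodyEvent(investigator,
                             timestamp or datetime.now(timezone.utc),
                             purpose, action)
        self.events.append(event)
        return event
```

Each call records who handled the evidence, when, why, and what was done, which are the details a court will expect to see.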

Reporting and documenting incidents

communication is a critical part of incident response

Notification of key stakeholders

key stakeholders should be contacted
- CISO
- cybersecurity director
- other IT response teams
- business process owners
- PR staff
- legal teams
- may need to notify external agencies
  - law enforcement
  - government
  - regulatory bodies
  - other officials
- may share data w/ ISACs (information sharing and analysis centers)
automated systems are an efficient way to notify individuals

Regular updates

stakeholders need to be kept in the loop
may use same automated systems to send out status updates
may also use phone, video conferences or in-person meetings

Formal incident reports

historical documentation written at conclusion of incident
should provide details
- nature of the incident
- incident response timeline
- containment, eradication and recovery
- lessons learned
  - recommendations for improvement

eDiscovery

Preservation

legal holds
- require preservation of relevant digital / paper records
- sysadmins need to suspend auto-deletion of relevant logs

Collection

sources of electronic records
- file servers
- endpoint systems
- emails
- cloud services
security team often assists w/ gathering these records

Production

attorneys review documents for relevance
turn over relevant documents to other side
most litigation holds never move to this phase

Logging & Monitoring

SIEM (security information and event management)

AI can help w/ security data overload
has access to logs across the org
performs log correlation
- example: might gather logs from firewall, web server and database, which can be used to see trends
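
A minimal sketch of the log correlation idea above: group events from several log sources by source IP and surface addresses that appear in more than one source. The event fields (`source`, `src_ip`, `message`) and the function name are assumptions for illustration, not any SIEM product's schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def correlate_by_source_ip(events: Iterable[dict],
                           min_sources: int = 2) -> Dict[str, List[str]]:
    """Return IPs seen in at least min_sources different log sources."""
    sources_by_ip = defaultdict(set)
    for event in events:
        sources_by_ip[event["src_ip"]].add(event["source"])
    return {
        ip: sorted(sources)
        for ip, sources in sources_by_ip.items()
        if len(sources) >= min_sources
    }


# Illustrative events from three log sources.
sample_events = [
    {"source": "firewall", "src_ip": "203.0.113.7", "message": "port scan"},
    {"source": "web server", "src_ip": "203.0.113.7", "message": "SQL error"},
    {"source": "database", "src_ip": "203.0.113.7", "message": "failed login"},
    {"source": "web server", "src_ip": "198.51.100.2", "message": "GET /"},
]
suspicious = correlate_by_source_ip(sample_events)
```

No single log shows the whole picture here; the pattern only appears once the three sources are viewed together.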

Core functions

central secure collection point for logs
- all systems send logs to SIEM
source of AI
- detects patterns that other systems might miss

Dashboards

provide a centralized view of security info
can generate alerts
facilitate trend analysis
- can build graphs, etc.
offer adjustable sensitivity

SOAR (security orchestration, automation and response)

super SIEM
can use playbooks and runbooks to have automated responses to security events
playbooks
- process-focused response to security events
  - includes human and automated responses
runbooks
- automated responses to security events
  - executes immediately
  - can aid human investigators

Continuous security monitoring

facilitates real-time responses
maintains ongoing awareness of potential issues
supports org risk management decisions

Monitoring process

outlined in NIST SP 800-137
define
establish
implement
analyze/report
respond
review/update
SIEMs can assist w/ security data analysis and correlation

Analysis types

anomaly analysis
heuristic analysis
- detects outlier data points
trend analysis
- detects changes over time
behavioral analysis
- detects unusual user behavior
availability analysis
- provides uptime info
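
Two of the analysis types above can be sketched with basic statistics: outlier detection using a z-score, and trend detection by comparing the earlier and later halves of a series. The threshold of 2.0 and both function names are assumptions; production tools use considerably more robust methods.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def find_outliers(values: Sequence[float], threshold: float = 2.0) -> List[float]:
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


def trend_direction(values: Sequence[float]) -> str:
    """Compare the mean of the later half of a series with the earlier half."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    midpoint = len(values) // 2
    earlier = mean(values[:midpoint])
    later = mean(values[midpoint:])
    if later > earlier:
        return "increasing"
    if later < earlier:
        return "decreasing"
    return "flat"


# Illustrative data: failed logins per hour, with one unusual spike.
failed_logins = [10, 11, 9, 10, 12, 10, 95]
spikes = find_outliers(failed_logins)
```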

Endpoint monitoring

a single device can serve as a springboard to a larger attack, so devices should be monitored
normal monitoring monitors processor, memory and file system activity

User and entity behavior analytics

compares user activity to individual baselines
security tools can provide insight into endpoint behavior
- behavior monitoring can detect patterns related to specific exploit techniques
  - compare w/ baselines
  - look for patterns resembling known attacks

Resource Security

Physical asset management

maintaining control of physical assets starts w/ asset inventorying
- you can’t manage assets if you don’t know what you have!
asset management should follow a lifecycle technique
- for example
  1. user requests new hardware
  2. hardware is ordered and inventory record is created
  3. hardware arrives, receiving clerk records, gives to IT staff and updates inventory record
  4. IT staff images machine, affixes hardware asset tag, gives to user and updates inventory record
  5. hardware is used, reallocated and inventory record is updated
- in all steps, data updates are critical (to avoid losing assets)
media management
- tracks highly sensitive data
- often, hardware inventory software can track this as well

Change and Configuration Management

Change management

change comes frequently in IT — which is good — but change must be controlled and managed

change management
- ensures that orgs follow standard procedures for…
  - requesting,
  - reviewing,
  - approving, and
  - implementing…
- …changes to their info systems
request for change (RFC)
- a formal request to make a change which includes:
  - description of the change
  - expected impact
  - risk assessment
  - rollback steps
  - identification of those involved in the change
  - proposed schedule
  - affected configuration items (CIs)
changes made in an org should be approved by relevant authorities
- might be a user’s
- can include a change advisory board (CAB)
routine changes may be pre-approved (ex. rotating out tape backups)

Configuration and asset management

tracks specific device and system settings

baseline
- snapshot of a configuration
- can be used to identify changes to a system
  - compare the system’s current state to the baseline and note any differences
- automation allows for alerts on changes that deviate from baselines
versioning
- assigns a number to each version
  - ex. #.##.##, version.major.minor
- often used in software development
diagrams also serve as an important configuration artifact
should standardize configurations
- naming conventions
- IP address scheming
ultimate goal of change and configuration management is to help ensure a stable operating environment
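
The baseline comparison described above can be sketched as a dictionary diff: every setting whose current value differs from the baseline snapshot is reported. The setting names and the `ABSENT` marker are hypothetical; real configuration management tools track far richer state.

```python
from typing import Any, Dict, Tuple

ABSENT = "<absent>"  # marker for a setting missing on one side


def diff_against_baseline(baseline: Dict[str, Any],
                          current: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each changed setting to a (baseline value, current value) pair."""
    changes: Dict[str, Tuple[Any, Any]] = {}
    for key in sorted(baseline.keys() | current.keys()):
        before = baseline.get(key, ABSENT)
        after = current.get(key, ABSENT)
        if before != after:
            changes[key] = (before, after)
    return changes


# Illustrative snapshots of one server's settings.
baseline = {"ssh_port": 22, "password_min_length": 12, "telnet_enabled": False}
current = {"ssh_port": 2222, "password_min_length": 12, "ftp_enabled": True}
drift = diff_against_baseline(baseline, current)
```

An empty result means the system still matches its baseline; anything else is a candidate for an alert or a change-management review.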

Security Principles

Need to know and least privilege

Need to know

limits info access
having a clearance to a certain level of information doesn’t entitle someone access to all of it
- access is given w/ a valid reason
common in military and government

Least privilege

limits systems permissions to those needed for job function
implementing in the real world can be burdensome
- emergency access procedures reduce business impact

Privilege aggregation

privilege creep

jeopardizes least privilege

Separation of duties and responsibilities

Separation of duties

no individual should possess permissions that when combined allow them to perform a highly sensitive action
- ex. accountant creating a new vendor and cutting checks to that vendor
infosec pros are often called on to create controls for separation of duties
infosec pros are often the subject of separation of duties
- example: a developer can’t create code and deploy it to a production system

Two person control

aka dual control
requires authorization of two individuals to perform a sensitive action
- examples
  - missile launches
  - checks that require two signators
separation of duties and two person control reduce the likelihood of fraud
- must collude to commit fraud
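
Two-person control can be sketched as an action that refuses to run until two different authorized people have approved it. The class name `DualControlAction` and the approver names are hypothetical; a real system would authenticate each approver rather than trust a supplied name.

```python
from typing import Set


class DualControlAction:
    """A sensitive action that needs approval from two distinct people."""

    def __init__(self, description: str, authorized_approvers: Set[str],
                 required_approvals: int = 2) -> None:
        self.description = description
        self.authorized_approvers = set(authorized_approvers)
        self.required_approvals = required_approvals
        self._approvals: Set[str] = set()

    def approve(self, person: str) -> None:
        if person not in self.authorized_approvers:
            raise PermissionError(f"{person} may not approve this action")
        # A set ignores repeats, so one person approving twice counts once.
        self._approvals.add(person)

    def can_execute(self) -> bool:
        return len(self._approvals) >= self.required_approvals
```

Because repeated approvals from the same person count only once, fraud would require collusion between two approvers.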

Privileged account management

safeguard admin accounts

password vaulting
- store admin password
- may remote into a server w/ admin account username and password
  - prevents owner of admin account from even knowing password
- may provide just-in-time access
command proxying
- eliminates need for direct server access
- PAM system sends commands to services/servers as the admin account
monitoring
- logs admin account activity
credential management
- rotates passwords and keys
PAM solutions will need to provide emergency access workflows
sudo
super user do
- allows users to temporarily assume admin rights
- use should be minimized

Incident Management

Incident response program

provides structure during cybersecurity incidents
describes the policies and procedures governing cybersecurity incidents
piss poor planning yields piss poor performance
- prior planning → strong incident response
- failure to plan → disaster

Incident response plan elements

statement of purpose
strategies and goals
approach to incident response
communications w/ other groups
senior management approval
NIST SP 800-61 can be used for guidance in developing an org’s plan

Incident response team

must have personnel available 24/7

IR team components

should have diverse membership
- various parts of the org
  - management
  - infosec
  - SMEs
  - legal
  - PR
  - HR
  - physical security team
- IR service providers
  - complement an IR team
  - can provide critical resource for things not supported by the org
    - example: forensics expert
  - contract should be worked out in advance
    - don’t want to be negotiating a contract mid-incident

Incident communications plan

critical component of stakeholder management
ensures that all participants have the right info at the right time

Considerations

external communications should be limited to trusted parties
- info going public can be bad
  - bad PR
  - jeopardizes the investigation
law enforcement involvement requires careful consideration
- should consult w/ legal team
legal team will also ensure that the org is in compliance
- legislative requirements
- regulatory notification requirements
secure communications
- use to prevent inadvertent leaks

Incident identification

Internal incident data sources

monitoring is critical to effectively identify incidents
- IDS/IPS
- firewalls
- authentication systems
- vulnerability scanners
- systems event logs
- netflow records
- antimalware software
SIEM systems help in this

External incident data sources

first reports of an incident may also come from external sources
- employees
- customers
- websites
there should be a method for receiving external reports
strategic intelligence programs (ISACs) facilitate incident identification efforts
counterintelligence hinders adversaries’ ability to gather information
first responders need to act quickly after identifying an incident
- may isolate systems
  - may quarantine systems

exam tip:
the highest priority of a first responder is damage containment.

Escalation and notification

after initial containment, move to escalation and notification

Objectives

evaluate severity
- based on impact
escalate response to appropriate level
notify management and other stakeholders

Triage

low impact
- minimal potential to affect security
- normally handled by the first responder
- doesn’t require an after-hours response
moderate impact
- significant potential to affect security
- triggers reaction from incident response team
- prompt notification of management
high impact
- may cause critical damage to security
- demands full mobilization of incident response team
- requires immediate full response
- requires immediate notification of senior management
first responders must have the ability to activate the full incident response process
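
The triage levels above can be written as a lookup table from impact level to response. The field names and the `plan_response` function are hypothetical; the values restate the three tiers from these notes, not an official standard.

```python
from enum import Enum
from typing import NamedTuple


class Impact(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"


class Response(NamedTuple):
    handled_by: str
    immediate_full_response: bool
    management_notification: str


RESPONSE_PLAN = {
    Impact.LOW: Response("first responder", False, "none required"),
    Impact.MODERATE: Response("incident response team", False,
                              "prompt notification of management"),
    Impact.HIGH: Response("full incident response team", True,
                          "immediate notification of senior management"),
}


def plan_response(impact: Impact) -> Response:
    """Look up who responds and who is notified for a given impact level."""
    return RESPONSE_PLAN[impact]
```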

Mitigation

controls damage and loss to the org through containment

Containment strategy selection criteria

use criteria to choose the best containment option for the org
goal is to balance business needs and security
- need to use best judgement — there is no easy “right” answer

damage potential
evidence preservation
service availability
resource requirements
expected effectiveness
solution timeframe

attackers may detect containment actions
mitigation ends w/ stability
- business functions w/out danger to operations

Containment techniques

containment
- limits damage to confidentiality, integrity and availability

Segmentation

common network security technique
moves infected systems to a quarantine network

Isolation

moves infected systems to a network that is disconnected from the internal network

Removal

completely disconnects infected systems from network communications
orgs should select the most appropriate containment strategy for the situation
trade-off decision
- need to continue the investigation
- need to prevent further damage to systems
- need to prevent disruptions to the org

Incident eradication and recovery

eradication
- removing all traces of an incident
recovery
- restoring normal business functions
eradication and recovery go hand-in-hand and it’s often difficult to say which is which in regards to an action
attackers compromise systems
- may not know to what extent or how
- affected systems should be rebuilt to avoid future issues
- prevents the later use of backdoors
security issues that led to the incident need to be corrected
when recovering, look at the following
- endpoint security controls
  - application white-/blacklisting
  - quarantine controls
  - access controls
- enterprise security controls
  - firewall rules
  - mobile device policies
  - DLP systems
  - URL and content filters
  - updating / revoking certificates

Data sanitization techniques

prevents confidential info leakage
clearing
- overwrites data with new data
- frustrates casual analysis
purging
- more advanced techniques, such as degaussing
- frustrates laboratory analysis
- storage media is unusable by normal means
destroying
- media is obliterated and cannot be recovered
- impossible to analyze
use NIST flowchart to select most appropriate sanitization
physical destruction is the only true way to ensure that data has been deleted
deleting / formatting will never be the answer

Validation

final act of containment, eradication and recovery

Process

verify security configurations of all systems
run vulnerability scans
conduct account and permission reviews
verify that systems are logging
verify that logs are being sent to SIEM
validate that capabilities and services have been restored successfully

Post-incident activities

Lessons Learned

reflect on incident response
offer feedback to improve future incident response
can use a trained facilitator
- neutral party
- played no part in the incident response
time is critical
- people tend to forget what they did relatively quickly
example lessons learned questions
- how well did staff perform?
- were processes followed?
- were processes adequate?
- did anything inhibit the recovery?
- what should be done differently next time?
- etc.
create a report w/ lessons learned and recommendations
- be sure to follow change management when implementing any recommended changes
incident summary report
- describes response efforts
- useful for future incidents and future training

Evidence retention

need to comply w/ org policies
need to comply w/ legal requirements
- consult w/ legal team
store evidence in a secure manner
- ensure that chain of custody is maintained

Indicators of compromise

note indicators of compromise found during the incident
incorporate into monitoring systems

Incident response exercises

read-through
- team members review the incident response plan and their roles individually
- provide feedback
walk-through
tabletop exercise
- teams gather for a review, and talk through the plan
- provide feedback
simulation
- teams gather and go through a practice scenario
- provide feedback
testing strategies frequently use a combination of all test types

Personnel Safety

Personnel safety

the physical safety of employees is always top priority

exam tip:
watch for questions that ask to prioritize business operations over human life.
isolated employees should be monitored to ensure their safety
- employees working overnight
- employees working alone in a SOC, NOC, etc.
- detective controls, such as CCTV, can be used for this monitoring
travelling employees should also be monitored to ensure their safety
panic buttons
- silently alerts security to a dangerous situation when pressed
duress codes
- codes that appear to function normally, but also trigger a safety response

Emergency management

emergency management plans should be based upon risk management
fire plans
- evacuation procedures
- fire department notification
- accounting of all personnel
- regular fire drills
weather emergency plans
- dependent on site location
- blizzards
- floods
- wildfires
tornado plans
- include sheltering instructions
- regular drills
lock down plans
- in the event of workplace violence

Physical Security

Site and facility design

physical security is important to protecting info and systems
data centers contain massive amounts of sensitive data and computing resources
server rooms
- usually less secure than data centers
- often grow organically in small orgs
media storage locations
- especially if media / backups are stored off-site
- locations should have at least equal — if not better — physical security than a data center
evidence storage locations
- chain of custody must be preserved
wiring closets
- unauthorized access can result in eavesdropping and network device tampering
- distribution cabling should be protected as well
operations centers and other sensitive areas

Data center environmental controls

Cooling requirements

data centers have significant cooling requirements
excessive heat can reduce the life of equipment
old school data centers used to be very cool
- great expense to the org and environment
equipment is now less sensitive
expanded environmental envelope
- 64.4°F – 80.6°F

Humidity requirements

condensation can form if humidity is too high
static electricity can happen if humidity is too low
dew point 41.9°F – 50.0°F

HVAC and hot aisle/cold aisle

HVAC systems keep temperature and humidity in control
hot aisle/cold aisle
- servers draw in cool air from the front and expel hot air out the back
- using this idea, one can line up server racks back to back, creating cool air aisles and hot air aisles
- watch for questions that indirectly ask about hot aisle/cold aisle strategies

Data center environmental protection

natural disasters put data centers at risk

Fire

fire is a grave threat
fire requires:
- oxygen
- heat
- fuel
depriving a fire of any of these three requirements will extinguish it

Fire extinguishers

class	type	examples
A	common combustibles	wood, cloth, trash, paper, etc.
B	flammable liquids	gasoline, kerosene, oil
C	electrical	wiring, server racks
D	combustible metals	magnesium, titanium, sodium
K	kitchen	fats, oil, grease

labels on fire extinguishers contain info about the class and type of fires it can extinguish
be able to identify fire extinguisher classes

Fire suppression systems

building-wide fire suppression systems
- wet pipe approach
  - contain water in pipes that are ready to deploy in a fire
  - can be dangerous to data centers if they leak
- dry pipe approach
  - contain pipes that only fill if a valve opens during a fire alarm
- chemical fire suppression systems
  - deprive a fire of oxygen
  - dangerous to humans!

Sensors

fire detection sensors
- temperature sensors
- smoke detectors
- incipient detectors
moisture sensors

Flooding

data centers should be protected against the risk of flooding
- natural
  - flood plains, location w/in the building
- man-made
  - burst pipes, etc.
  - consider layout of pipes w/in building if possible

EMI

generated by all electrical equipment
can interfere w/ other equipment
can be used by attackers to eavesdrop
faraday cages can protect against EMI

Physical access control

Locks and entrances

locks
- restrict access through a portal (i.e. a window or door)
- preset lock
  - use a hardware lock
  - need correct key to open
  - should use key management to keep track of keys
- cipher lock
  - use a physical or electronic keypad
- biometric locks
  - use a person’s physical features
    - fingerprint, voice, retina
- card-based locks
  - use a card
    - magstripe, RFID, smart
tailgating
- following another authorized user into an area
mantraps
- small vestibules that admit one person at a time, preventing tailgating
remember to carefully maintain ACLs!

Facilities monitoring

use motion and noise detection systems
video surveillance systems
- act as deterrent and detective controls
- IR video may be useful in dark environments
- can play an important role in investigations

Other controls

fences can block foot or vehicle traffic
- bollards allow foot traffic but protect entrances from vehicles
cages can be used to protect equipment
- important in shared data centers
lighting increases intruder detection and acts as a deterrent
signage can provide legal recourse
industrial camouflage
- useful for making data centers non-descript
- drones and UAVs make it important that buildings are camouflaged from the ground and from the air

Visitor management

visitor management procedures protect against intrusions
visitor procedures should
- describe allowable visit purposes
- explain visitation approval authority
- describe requirements for unescorted access
- describe the role of visitor escorts
all visitors should be logged
all visitors should be identified w/ distinct badges
- if necessary badges should include “ESCORT REQUIRED”
cameras can provide extra monitoring of visitor areas

Physical security personnel

security guards are important to physical security
- receptionists can act as security guards
- menacing looking guards can also be used
robotic sentinels may be used in place of humans
two-person integrity
- requires two people to enter a sensitive area together
- discourages malicious activity in that area
  - requires collusion with other person
- think of two people needing to enter a bank vault
two-person control
- aka dual control
- requires authorization of two individuals to perform a sensitive action
  - examples
    - missile launches
    - checks that require two signators

Security Incident Response

event: change in state
incident: series of events that has a negative impact on an organization or their security
incident response focuses on containing damage and restoring normal operations
- minimize damage, minimize downtime!
investigations focus on gathering evidence of the attack with a goal of prosecution
framework
- response capability
- incident response / handling
- recovery and feedback

Response Capability

corporate incident response policy, procedures and guidelines should be in place
legal, HR, senior management, key business units must be involved
if in-house, incident response team should be in place
- incident response team should have:
  - list of agencies and resources to contact / report to
  - list of experts to contact
  - steps for searching for, securing and preserving evidence
  - list of items to include on reports
  - lists of how items on various systems should be treated

Incident Response / Handling

triage
- detect
- identify
- notify
investigate
contain
analyze and track

Recovery and Feedback

exam tip:
unless specified by an exam question, always assume that you are not on the incident response team.
you should report, contain, and not touch / interfere!

Recover and Repair

restore system to operations
provide greater security afterwards

Provide Feedback

most important
often overlooked
document, document, document!

Computer Forensics

discipline of proven methods of collection, preservation, validation, identification, analysis, interpretation, documentation and presentation of digital evidence
forensic principles must be followed
can’t alter evidence as doing so can make it inadmissible in court
individuals dealing with original evidence should be trained in evidence handling
all activity with evidence should be documented — chain of custody
individuals are responsible for all of their actions while in possession of evidence

Five Rules of Digital Evidence

evidence must be:
- authentic
- accurate
- complete
- convincing
- admissible

Forensic Investigation Process

1.) Identification

Locard’s Exchange Principle
- for everything taken, something is left behind
- what’s left behind can help identify the attacker

2.) Preservation

chain of custody must be well documented
- this is a history of how the evidence was:
  - collected
  - analyzed
  - transported
  - preserved
- necessary because digital evidence can easily be manipulated
hashing algorithms are used during process to show that data hasn’t been changed

3.) Collection

document!
minimize movement / handling
work on copies
work from most volatile to least
- CPU caches → RAM → HDD
capture an accurate image of the system
- need three hashes
  - hash of original hard drive from read-only system
  - hash of bit-level copy of hard drive used for analysis
  - hash of analyzed data from the copy
- all three hashes need to match
steps to collect evidence
- photo area, record what is on screens
- dump contents of memory
- power off system
- photo interior of system
- label all pieces of evidence
  - record location, who collected, how collected, date and time
- have legal and HR involved
Fourth Amendment considerations
- protects citizens from illegal searches and seizures
- only applies to law enforcement or those acting on their behalf
- citizens may also be subject to the Electronic Communications Privacy Act
- police can only gather evidence with:
  - subpoena
  - search warrants
  - voluntary consent
  - exigent circumstances

4.) Examination

yields data (analysis yields information)
document what is seen
look for signatures of known attacks
review audit logs
perform hidden data recovery

5.) Analysis

yields information (examination yields data)
what’s the root cause?
what files were altered?
what files / applications were installed?
what communications channels were used?

6.) Presentation

interpret results of investigation and present in an appropriate format
document findings
provide expert testimony

7.) Decision

result of the investigation
what to do with suspects?
- corrective action
- legal response?

Evidence Types

direct:
- can prove a fact
- doesn’t need backup information
- information provided by the five senses of a reliable witness
real:
- physical evidence
- objects used in a crime / objects left behind at crime scene
best:
- most reliable
- signed contract
secondary
- supporting evidence
- expert opinion / testimony
corroborative:
- supporting evidence
- doesn’t stand on its own
circumstantial
- indirect evidence; supports an inference but does not directly prove a fact
hearsay
- secondhand written / spoken testimony
- usually not admissible
demonstrative
- presentation-based
  - diagrams, x-rays, demonstrations

Law Enforcement Investigations

do they have the available skills to perform an investigation?
bound by Fourth Amendment, jurisdiction, Miranda rights, privacy laws
- more restricted than a citizen investigator
information is not controlled by the organization

Enticement vs. Entrapment

Enticement

tempting a potential criminal… but not actively
legal and ethical

Entrapment

tricking a person into committing a crime
illegal and unethical

Fault Management

Spares

redundant hardware
available in the event that the primary device is unusable
often associated with HDD
- hot
- warm
- cold
SLAs
MTBF and MTTR

Redundant Array of Independent Disks (RAID)

RAID-0

disk striping
writes to both disks
no redundancy or fault tolerance
provides performance improvements

RAID-1

disk mirroring
provides redundancy
least efficient use of disks (expensive)

RAID-5

disk striping with parity
provides fault tolerance
provides performance improvements
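
The parity idea behind RAID-5 can be shown with XOR: the parity block is the XOR of the data blocks, so any one lost block can be rebuilt from the others. This is a simplified sketch with two data blocks and one parity block; real RAID-5 also rotates parity across the disks, which is not modeled here. The function name `xor_blocks` is an assumption.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("need at least one block, all of equal length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Two data blocks as they might sit on two disks of the array.
data_a = b"\x0f\xf0"
data_b = b"\x33\x55"

# The parity block stored on a third disk.
parity = xor_blocks(data_a, data_b)

# If the disk holding data_b fails, XOR the survivors to rebuild it.
rebuilt_b = xor_blocks(data_a, parity)
```

The same operation rebuilds whichever single block is lost, which is why the array tolerates one disk failure but not two.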

Redundant Servers

primary server is mirrored to a secondary server
- if the primary fails, fail over to the secondary
- provides server fault tolerance

Clustering

group of servers that act as a single system
- looks like a single server to users
provides high availability
provides scalability
easier to manage
may provide redundancy, load balancing or a combination of both
- active/active
- active/passive

Uninterruptible Power Supplies (UPS)

provide temporary battery-based power to systems in event of power loss
- usually long enough for systems to execute a graceful shutdown
good features
- long battery life
- remote diagnostic software
- surge protection and line conditioning
  - spike
  - sag
  - brown out
- EMI / RFI filtering
- allowance for shutting down of systems attached to it
issues to consider
- load that the UPS / battery can support
- battery duration
- speed to take load during outage
- physical space needed
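
The load and duration questions above can be roughed out before buying. A back-of-the-envelope sketch, assuming runtime scales linearly with battery capacity and a fixed inverter efficiency; real batteries do not discharge linearly, so vendor runtime charts should take precedence. All names and numbers are hypothetical:

```python
def can_carry_load(load_watts: float, rated_watts: float) -> bool:
    """True if the attached load is within the UPS's rated output."""
    return load_watts <= rated_watts


def estimated_runtime_minutes(battery_wh: float, load_watts: float,
                              efficiency: float = 0.9) -> float:
    """Linear estimate: usable watt-hours divided by load, in minutes."""
    if load_watts <= 0:
        raise ValueError("load must be positive")
    return battery_wh * efficiency / load_watts * 60


def allows_graceful_shutdown(battery_wh: float, load_watts: float,
                             shutdown_minutes: float,
                             efficiency: float = 0.9) -> bool:
    """True if the estimated runtime covers the time needed to shut down."""
    runtime = estimated_runtime_minutes(battery_wh, load_watts, efficiency)
    return runtime >= shutdown_minutes


# Hypothetical rack: 500 W load, 1000 Wh battery, 10 minutes to shut down cleanly.
print(can_carry_load(load_watts=500, rated_watts=900))
print(allows_graceful_shutdown(battery_wh=1000, load_watts=500, shutdown_minutes=10))
```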

Backups

backups of software and data
- having backup hardware is a large part of network availability
important to be able to restore data if:
- HDD fails
- disaster takes place
- software corruption

Backup types

Full

Full backup

archive bit reset
backs up everything
takes a lot of time and space
- impractical to do daily
- usually done on weekends

Incremental

Incremental backup

archive bit reset
backs up all files that have been modified since last backup
restoration process is to restore all backups
- full → incremental → incremental

Differential

Differential backup

archive bit not reset
backs up all files that have been modified since last full backup
restoration is to restore two backups
- full → differential

Copy

Copy backup

archive bit not reset
same as full backup, except the archive bit is not reset
unscheduled backup
- used before upgrades, system maintenance, deployments, etc.
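
The four backup types differ only in which files they select and whether they reset the archive bit. A toy model of that behavior, assuming the archive bit is set whenever a file is created or modified; the `File` class and `run_backup` function are hypothetical names for illustration, not a real backup tool's API:

```python
from dataclasses import dataclass


@dataclass
class File:
    name: str
    archive_bit: bool = True  # set when the file is created or modified


def run_backup(files: list[File], kind: str) -> list[str]:
    """Return the names a backup of this kind would copy, updating archive bits."""
    if kind in ("full", "copy"):
        selected = list(files)                        # everything
    elif kind in ("incremental", "differential"):
        selected = [f for f in files if f.archive_bit]  # only changed files
    else:
        raise ValueError(f"unknown backup type: {kind}")
    if kind in ("full", "incremental"):               # these two reset the bit
        for f in selected:
            f.archive_bit = False
    return [f.name for f in selected]


files = [File("a"), File("b"), File("c")]
print(run_backup(files, "full"))          # weekend full backup
files[0].archive_bit = True               # "a" modified on Monday
print(run_backup(files, "differential"))  # Monday
files[1].archive_bit = True               # "b" modified on Tuesday
print(run_backup(files, "differential"))  # Tuesday: still includes Monday's change
```

Because a differential never resets the bit, each one keeps growing until the next full backup, which is why restoring needs only full → latest differential. Incrementals reset the bit, so each one is small but every one since the full backup is needed for a restore.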

Redundancy of staff

eliminate single points of failure — lol
cross train
job rotations
train and educate

Business Continuity Planning

Business continuity planning (BCP) vs. disaster recovery planning (DRP)

BCP
- long term focus
- focus on sustaining operations and protecting the business
- umbrella term that includes many other plans, including DRP

DRP
- short term focus
- often IT focused
- goal is to minimize effects of disaster and to take steps to resume operations in a timely manner
- deals with immediate aftermath of disaster

BCP and risk management relationship

BCP acts as a safety net for risk management
BCP acts in the event that risk mitigation steps in risk management fail
risk management: covers identified risks
BCP: covers the gaps that remain when those mitigations fail

Disruption categories

non-disaster
- an inconvenience (disruption of service, device malfunctions)

emergency / crisis
- urgent immediate event where there is a potential for injury / loss of life / property

disaster
- entire facility is unusable for longer than a day

catastrophe
- destroys a facility

organization should be prepared for each category
emergency declaration: anyone should be able to make the call (e.g. pull a fire alarm)
disaster declaration: BCP coordinator makes the call

BCP frameworks

standards help with inconsistency in terms, definitions, documents
organizations and guidance on BCP / DRP:
- DRII
- NIST SP 800-34 rev1
- ISO 27031
- BCI GPG
- (ISC)² Four Processes of Business Continuity

NIST SP 800-34 rev 1

(create graphic)

(ISC)² four processes of business continuity

1.) Project scope & planning

acquire BCP Policy Statement from senior management
conduct Business Organizational Analysis: structured analysis of business’s organizational assets
create BCP team
- should be cross-functional
- includes a project manager
- includes HR and legal reps.
analyze legal and regulatory issues related to the organization’s response to a catastrophic event

Business organization analysis

provides foundation for rest of BCP process
provides a groundwork to identify members of BCP team
evaluates:
- which departments are responsible for core services
- what critical support services are needed
- senior management and other key individuals essential to the viability of the organization

BCP team selection

gather representatives from:
- departments responsible for core services
- key support departments identified by organization analysis
- IT staff
- security staff
- legal
- HR
- senior management

Assess resource needs

BCP development
- BCP team will need resources to perform all four steps of BCP process
- major resources needed:
  - effort of BCP team
  - assistance of supporting teams called to help with development of the plan
BCP testing, training, maintenance
- will require some hardware and software
- main resource is still manpower of employees involved
BCP implementation
- in the event of full-scale BCP being conducted:
  - significant use of resources and manpower
  - BCP will become focus of most—if not all—of the organization
  - BCP team will use resources judiciously, yet decisively

Legal & regulatory compliance

senior management:
- has the ultimate legal responsibility
- may be:
  - held responsible / liable under various laws and regulations
  - sued by:
    - stockholders, if due care / diligence is not used in managing
    - employees / families, in the event of injury / death
legal and financial repercussions are a major way to attain senior management buy-in

2.) Business impact assessment

identifies and prioritizes all business processes / resources based on criticality
risk identification
- internal vs. third party assessment
- probability and impact
defines quantitative metrics to assist with prioritizing recovery focus
BIA helps set recovery priorities

Identify priorities

create a list of business procedures and their impact on the organization
often delegated to departments for accuracy and buy-in
criticality driven by the amount of loss to the organization if a resource is unavailable
maximum tolerable downtime / maximum tolerable outage (MTD / MTO)
- longest time a function can be down before causing a loss that’s unacceptable to senior management
recovery time objective (RTO)
- estimated time to recover a function in the event of a disruption
- should be less time than MTD / MTO
recovery point objective (RPO)
- tolerance for data loss
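
These three metrics can be sanity-checked against each other. A minimal sketch, assuming all values are in hours and that worst-case data loss equals the interval between backups; the `BusinessFunction` fields and the sample numbers are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    mtd_hours: float              # maximum tolerable downtime
    rto_hours: float              # recovery time objective
    rpo_hours: float              # recovery point objective (tolerable data loss)
    backup_interval_hours: float  # how often data is actually backed up


def find_gaps(fn: BusinessFunction) -> list[str]:
    """List the ways a function's recovery targets are inconsistent."""
    gaps = []
    if fn.rto_hours >= fn.mtd_hours:
        gaps.append("RTO is not shorter than MTD")
    if fn.backup_interval_hours > fn.rpo_hours:
        gaps.append("backups are less frequent than the RPO allows")
    return gaps


payments = BusinessFunction("payments", mtd_hours=4, rto_hours=6,
                            rpo_hours=1, backup_interval_hours=24)
for gap in find_gaps(payments):
    print(f"{payments.name}: {gap}")
```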

Goals

Risk identification

Risks associated with procurement & cloud

evaluate a CSP’s BCP
- examine SLAs
verify that controls are in place in person or through a third party audit (SOC)

Report	Reports On	Visibility
SOC 1	financial reporting	private
SOC 2	security and technology	private
SOC 3	security and technology	public

Probability & impact assessment

asset value × probability × impact = total risk
total risk × controls gap = residual risk
AV = asset value
probability = ARO
impact = EF
SLE = single loss expectancy
ALE = annual loss expectancy
remember that some losses can’t be quantified (e.g. loss of reputation)
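
A worked example of the terms above, using the standard relations SLE = AV × EF and ALE = SLE × ARO. The dollar amounts and rates are made up, and treating the controls gap as the fraction of risk the controls do not cover is an assumed reading of the formula in these notes:

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE = AV x EF: expected loss from one occurrence."""
    return asset_value * exposure_factor


def annual_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE x ARO: expected loss per year."""
    return sle * aro


def residual_risk(total_risk: float, controls_gap: float) -> float:
    """Residual risk = total risk x controls gap (assumed: fraction left uncovered)."""
    return total_risk * controls_gap


# Hypothetical asset: $200,000 data center, a flood destroys 25% of its value,
# and a flood is expected once every two years (ARO = 0.5).
sle = single_loss_expectancy(asset_value=200_000, exposure_factor=0.25)  # 50,000
ale = annual_loss_expectancy(sle, aro=0.5)                               # 25,000
left_over = residual_risk(ale, controls_gap=0.2)                         # about 5,000
print(sle, ale, round(left_over, 2))
```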

Resource prioritization

qualitative analysis can be used to prioritize risk
quantitative analysis is needed to…
- perform cost/benefit analysis
- justify mitigation steps
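
The cost/benefit step is usually expressed as the value of a control: ALE before the control, minus ALE after it, minus what the control costs per year. A short sketch with hypothetical figures; the function name is illustrative:

```python
def control_value(ale_before: float, ale_after: float,
                  annual_control_cost: float) -> float:
    """Net yearly benefit of a control; negative means it costs more than it saves."""
    return ale_before - ale_after - annual_control_cost


# Hypothetical: flood barriers cut ALE from $25,000 to $5,000 and cost $12,000 a year.
barriers = control_value(ale_before=25_000, ale_after=5_000, annual_control_cost=12_000)
# Hypothetical: a second data center removes the loss entirely but costs $40,000 a year.
second_site = control_value(ale_before=25_000, ale_after=0, annual_control_cost=40_000)
print(barriers, second_site)
```

A positive value justifies the mitigation on cost alone; a negative value means the control needs some other justification, such as a regulatory requirement or a loss that can't be quantified.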

3.) Continuity planning

Strategy development

examine BIA for metrics and to map controls to meet objectives
determine appropriate responses to risk
- reduce
- assign / transfer
- accept
- reject
some risk will have to be accepted, while others require an active strategy

Provisions & processes

designs specific procedures to mitigate risk to a level acceptable to senior management
three asset types
- people — always the first priority
- buildings / facilities
  - hardening provisioning: mitigating harm to a building / facility
  - alternate sites
    - mirrored site
    - leased sites
      - cold site
      - warm site
      - hot site
- infrastructure
  - redundancy of critical systems and services
  - recovery strategies
  - failover / failback
    - failover: moving to secondary device/server/system
    - failback: resuming operations of primary
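
Failover and failback can be pictured as a two-state switch. A minimal sketch, assuming a single health signal for the primary and automatic failback; in practice failback is often a manual, verified step. The class and server names are hypothetical:

```python
class ServicePair:
    """Tracks which of two systems is currently serving traffic."""

    def __init__(self, primary: str, secondary: str) -> None:
        self.primary = primary
        self.secondary = secondary
        self.active = primary

    def report_health(self, primary_healthy: bool) -> str:
        """Update the active system from the primary's health and return it."""
        if not primary_healthy and self.active == self.primary:
            self.active = self.secondary   # failover: move to the secondary
        elif primary_healthy and self.active == self.secondary:
            self.active = self.primary     # failback: resume on the primary
        return self.active


pair = ServicePair(primary="db-1", secondary="db-2")
for healthy in (True, False, False, True):
    print(healthy, "->", pair.report_health(healthy))
```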

Facility recovery

mirrored site: dedicated site owned/operated by the organization
reciprocal agreement with an internal/external entity
commercially leased site:
- cold
- warm
- hot
MOAs or SLAs

Feature	Cold Site	Warm Site	Hot Site
secondary location	yes	yes	yes
equipment at location	no	yes	yes
connectivity at location	no	yes	yes
active before failover	no	no	yes
outage measured in	weeks	days/hours	hours/minutes

Infrastructure

supports critical elements of business
- servers
- systems
- routers
- switches
- processes
- architecture
high availability
- redundancy
- resiliency
- fault tolerance
hardened systems

4.) Approval & implementation

Approval

if possible, approved by CEO or other C-level individual
indicates the business’s dedication to BCP

Implementation

create implementation guide/schedule
deploy resources
supervise implementation plan

Training & education

distribute plan on need-to-know basis
everyone should get an overview

BCP subplans

three main purposes
- protect
- recover
- sustain

Protect

Crisis Communications Plan
- dissemination of necessary information
Occupant Emergency Plan (OEP)
- procedures for minimizing injury, loss of life, property damage in the event of an emergency

Recover

Business Recovery (/Resumption) Plan (BRP)
- procedures for business operations after a disaster
Continuity of Support / IT Contingency Plan
- procedures for recovering major applications or general systems
Cyber Incident Response Plan
- procedures to detect, respond to and limit consequences of a cyber incident
Disaster Recovery Plan
- procedures to recover capabilities at an alternate site

Sustain

Continuity of Operations Plan (COOP):

Roles & Responsibilities

senior executive management
- consistent support
- final plan approvals
- prioritization of critical business functions
- allocation of resources and personnel
- oversight and approval of BCP
- directing and reviewing test results
- ensuring maintenance of current plan

senior functional management
- develop and document maintenance and testing strategies
- identify and prioritize mission critical systems
- monitor plan development and execution
- ensure that periodic testing takes place
- create various teams needed to execute BCP

BCP steering committee
- conduct BIA
- coordinate with department reps
- should include:
  - business units
  - senior management
  - IT
  - security
  - communications
  - legal
  - HR

DRP Teams

Rescue team

responsible for the immediate response to the disaster
- employee evacuation
- “crashing” server room
- etc.

Recovery team

gets alternate site up and running
restores systems in order of criticality
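
Restoring in order of criticality amounts to sorting by a criticality measure from the BIA. A tiny sketch, assuming MTD in hours is used as that measure (shortest MTD first); the system names and values are hypothetical:

```python
def recovery_order(mtd_hours_by_system: dict[str, float]) -> list[str]:
    """Systems sorted so the one with the shortest MTD is restored first."""
    return sorted(mtd_hours_by_system, key=mtd_hours_by_system.get)


# Hypothetical BIA output: system name -> maximum tolerable downtime in hours.
bia = {"email": 24, "payments": 2, "wiki": 72}
print(recovery_order(bia))
```

Systems with equal MTD keep the order they were listed in, so the BIA list can be pre-ordered to break ties.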

Salvage team

returns operations to original location or permanent facility (reconstitution)

Developing DRP teams

management should appoint members
each member must:
- understand goals of the BCP
- be familiar with the department they are responsible for
agree before event:
- who talks to the media, customers and stakeholders
  - whoever is trained in communications — can be PR rep, doesn’t have to be the CEO
- who will set up alternate communications methods
- who will set up offsite facility
- who will work on the primary facility

Types of tests

checklist test

copies of plans given to departments
functional managers review
no risk associated with test
don’t get a good assessment

structured walk-through (tabletop) test

department reps sit down and go through the plan together
more like a talk-through, am I right?
no risk associated with test
don’t get a good assessment

simulation test

go through a disaster scenario
continues up to the point of actually moving to a secondary site

parallel test

some systems are moved to the alternate site and processes work from alternate site
functionality still remains at primary site

full interruption test

original site is shut down
all processes moved to alternate site

Post-Incident Review / After Action Report

after a test or disaster:
- focus on improvement
  - what should have happened and didn’t
  - what went well
  - what happens next / what can be improved
- not a blame game — not productive use of review

Maintaining the BCP

keep plan up-to-date
- make it a part of business meetings and decisions
- centralize responsibility of updating the plan
- make the plan a part of
  - job descriptions
  - personnel evaluations
- report on BCP status regularly
if the plan is revised, original copies of the plan should be retrieved and destroyed
- you don’t want to work off of an old plan