7: Security Operations
- seek to resolve technical issues
- seek to restore normal operations
- low standard of evidence
- should end with a root cause analysis
- shouldn’t fix issue, should find root issue
- resolve issues between two parties
- no fines / jail time
- use the “preponderance of evidence” standard for evidence
- all types of evidence are used in many ways during an investigation and trial
- tangible items that can be taken into a court room
-
written info
- contracts
- logs
-
documentary evidence rules
-
authenticity rule
- documents must be authenticated by testimony
-
best evidence rule
- original evidence is superior to copies
-
parol evidence rule
- written contracts are presumed to be the entire agreement
- verbal agreements are not included
-
-
witness statements
- may not be hearsay
-
types
-
direct evidence
- witness provides evidence based on their own observations
-
expert testimony
- expert witness draws conclusions based on other evidence and experience
-
-
investigative techniques that collect, preserve, analyze and interpret digital evidence
important
investigations must never alter evidence.
-
volatility
-
relative permanence of a piece of evidence
- evidence that may not last long is more volatile
- evidence that is more permanent is less volatile
-
order of volatility
- network traffic
- RAM / memory contents
- system / process data
- files
- logs
- archives
-
-
time offsets help correlate records from different sources
- check that times are correct
- check that timezones match
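A minimal sketch of how time offsets can be applied before correlating records. The source names, UTC offsets, clock skew values and events below are all hypothetical; in practice the skew would be measured against a trusted time source at collection time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical per-source settings: UTC offset of the local clock (hours)
# and measured clock skew (seconds the source runs ahead of true time).
SOURCES = {
    "firewall": {"utc_offset_hours": -5, "skew_seconds": 0},
    "web_server": {"utc_offset_hours": 0, "skew_seconds": 90},
}

def to_utc(source: str, local_timestamp: str) -> datetime:
    """Convert a source's naive local timestamp to skew-corrected UTC."""
    cfg = SOURCES[source]
    tz = timezone(timedelta(hours=cfg["utc_offset_hours"]))
    local = datetime.fromisoformat(local_timestamp).replace(tzinfo=tz)
    return local.astimezone(timezone.utc) - timedelta(seconds=cfg["skew_seconds"])

events = [
    ("web_server", "2024-03-01T14:01:45", "login failure"),
    ("firewall", "2024-03-01T09:00:10", "connection allowed"),
]

# Sort on the normalized timestamp so records from both sources interleave correctly.
timeline = sorted((to_utc(src, ts), src, msg) for src, ts, msg in events)
for when, src, msg in timeline:
    print(when.isoformat(), src, msg)
```

Without the correction the web server event would appear to come hours after the firewall event; once both are in UTC they are five seconds apart.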
-
consider alternative sources of evidence
- video recordings
- witness statements
-
images take the place of original physical media
-
write blockers
- aka forensic disk controllers
- prevent accidental modification to disks during an investigation
-
hashing
- protects evidence
- provides a unique file signature
- proves that files have not been changed (integrity)
-
file metadata
-
contains a lot of forensic info
- file owner
- creation time
- modification times
- geolocation
-
email and HTML headers
- contain info about sender, receiver, network path, transit time, etc.
-
-
other forensic sources
- screenshots
- process tables
- memory contents
- OS configurations
- monitors network
- can capture full packet data
- requires a lot of storage
- summarizes traffic
- provides high level info
- IP address and port
- time stamp
- amount of data transferred
- routers and firewalls also capture flow data using NetFlow, sFlow or IPFIX
- sFlow and IPFIX are similar to NetFlow
- reports network utilization
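To show the difference between full packet capture and flow data, here is a rough sketch that collapses individual packets into flow summaries (addresses, port, timestamps, bytes transferred). The packet tuples are made up, and the record layout is an illustration only; it is not the actual NetFlow, sFlow or IPFIX record format.

```python
from collections import defaultdict

# Hypothetical packet records: (timestamp, src_ip, dst_ip, dst_port, bytes)
packets = [
    (100.0, "10.0.0.5", "192.0.2.10", 443, 1500),
    (100.4, "10.0.0.5", "192.0.2.10", 443, 900),
    (101.2, "10.0.0.7", "192.0.2.20", 22, 300),
]

def summarize_flows(packets):
    """Collapse packets into one summary per (src, dst, port) conversation."""
    flows = defaultdict(
        lambda: {"first_seen": None, "last_seen": None, "bytes": 0, "packets": 0}
    )
    for ts, src, dst, port, size in packets:
        flow = flows[(src, dst, port)]
        flow["first_seen"] = ts if flow["first_seen"] is None else min(flow["first_seen"], ts)
        flow["last_seen"] = ts if flow["last_seen"] is None else max(flow["last_seen"], ts)
        flow["bytes"] += size
        flow["packets"] += 1
    return dict(flows)

flows = summarize_flows(packets)
for key, summary in flows.items():
    print(key, summary)
```

The payloads are gone after summarizing, which is why flow data needs far less storage than full packet capture.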
- software code may be used as evidence
- may be used to resolve disputes between parties about intellectual property rights
- example: origin of code after a developer moves to another org
- may be used to identify author of malware found on a system
- example: how does the NSA know attacker code is “Russian”
-
mobile devices are a goldmine
- browsing history
- GPS history
- network connectivity history
-
device manufacturers know this, so devices are typically protected by strong encryption
-
requires a special set of tools and skills
-
special-purpose computers found in smart devices
-
often found in homes and offices
- can provide info about location, presence, occupancy, temperature, etc.
-
modern vehicles
- often contain embedded systems
- can provide info about location, speed, time stopped, etc.
chain of evidence
- provides paper trail of evidence
- evidence should be labeled and stored in sealed evidence containers
- initial collection
- transfers
- storage
- handling / opening/resealing evidence container
-
name of investigator
-
date/time
-
purpose
-
nature of action
-
evidence logs must be available to present in court
- can make evidence inadmissible w/out it
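A minimal sketch of an evidence log that records the fields listed above (investigator, date/time, purpose, nature of action). The class names, evidence label and people are hypothetical; a real log would also need tamper-evident storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class CustodyEntry:
    investigator: str
    timestamp: datetime
    purpose: str
    action: str  # e.g. "initial collection", "transfer", "storage", "opened/resealed"

@dataclass
class EvidenceItem:
    label: str
    log: list = field(default_factory=list)

    def record(self, investigator, purpose, action, timestamp=None):
        """Append an entry; existing entries are never edited or removed."""
        when = timestamp or datetime.now(timezone.utc)
        self.log.append(CustodyEntry(investigator, when, purpose, action))

    def is_chronological(self):
        """True when every entry is at or after the one before it."""
        times = [entry.timestamp for entry in self.log]
        return all(a <= b for a, b in zip(times, times[1:]))

disk = EvidenceItem("HDD-001")
disk.record("A. Analyst", "seizure of workstation drive", "initial collection",
            datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc))
disk.record("B. Custodian", "move to evidence locker", "transfer",
            datetime(2024, 3, 1, 11, 30, tzinfo=timezone.utc))
print(disk.is_chronological())
```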
- communication is a critical part of incident response
-
key stakeholders should be contacted
-
CISO
-
cybersecurity director
-
other IT response teams
-
business process owners
-
PR staff
-
legal teams
-
may need to notify external agencies
- law enforcement
- government
- regulatory bodies
- other officials
-
may share data w/ ISAC centers
-
-
automated systems are an efficient way to notify individuals
- stakeholders need to be kept in the loop
- may use same automated systems to send out status updates
- may also use phone, video conferences or in-person meetings
- historical documentation written at conclusion of incident
- should provide details
- nature of the incident
- incident response timeline
- containment, eradication and recovery
- lessons learned
- recommendations for improvement
- legal holds
- require preservation of relevant digital / paper records
- sysadmins need to suspend auto-deletion of relevant logs
-
sources of electronic records
- file servers
- endpoint systems
- emails
- cloud services
-
security team often assists w/ gathering these records
-
attorneys review documents for relevance
-
turn over relevant documents to other side
-
most litigation holds never move to this phase
- AI can help w/ security data overload
- has access to logs across the org
- performs log correlation
- example: might gather logs from firewall, web server and database, which can be used to see trends
- central secure collection point for logs
- all systems send logs to SIEM
- source of AI
- detects patterns that other systems might miss
- provide a centralized view of security info
- can generate alerts
- facilitate trend analysis
- can build graphs, etc.
- offer adjustable sensitivity
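A toy sketch of the log correlation idea: events from several sources are grouped by client IP, and an alert is raised when the same IP shows up in enough distinct sources. The events and the `min_sources` threshold are hypothetical; real SIEM products use far richer rules.

```python
from collections import defaultdict

# Hypothetical, already-parsed log events: (source, client_ip, message)
events = [
    ("firewall", "203.0.113.9", "port scan blocked"),
    ("web_server", "203.0.113.9", "repeated 401 responses"),
    ("database", "203.0.113.9", "failed login for admin"),
    ("web_server", "198.51.100.4", "GET /index.html 200"),
]

def correlate(events, min_sources=3):
    """Return IPs that appear in at least `min_sources` distinct log sources."""
    sources_by_ip = defaultdict(set)
    for source, ip, _message in events:
        sources_by_ip[ip].add(source)
    return {
        ip: sorted(sources)
        for ip, sources in sources_by_ip.items()
        if len(sources) >= min_sources
    }

alerts = correlate(events)
print(alerts)
```

Lowering `min_sources` makes the rule more sensitive, which is one simple reading of "adjustable sensitivity".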
-
super SIEM
-
can use playbooks and runbooks to have automated responses to security events
-
playbooks
- process-focused response to security events
- includes human and automated responses
-
runbooks
- automated responses to security events
- executes immediately
- can aid human investigators
- facilitates real-time responses
- maintains ongoing awareness of potential issues
- supports org risk management decisions
-
outlined in NIST SP 800-137
-
define
-
establish
-
implement
-
analyze/report
-
respond
-
review/update
-
SIEMs can assist w/ security data analysis and correlation
-
anomaly analysis
- aka heuristic analysis
- detects outlier data points
-
trend analysis
- detects changes over time
-
behavioral analysis
- detects unusual user behavior
-
availability analysis
- provides uptime info
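One simple way to detect outlier data points is a z-score test, sketched below. This is only one possible technique, chosen here for illustration; the threshold of 2.0 and the failed-login counts are assumptions.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` sample standard
    deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical daily failed-login counts; the last day is unusual.
failed_logins = [12, 9, 11, 10, 13, 8, 12, 95]
print(find_outliers(failed_logins))
```

With very short series a single extreme value inflates the standard deviation, so a z-score can never get very large; longer baselines or robust statistics (median-based) behave better.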
-
a single device can serve as a springboard to a larger attack, so devices should be monitored
-
normal monitoring monitors processor, memory and file system activity
-
compares user activity to individual baselines
-
security tools can provide insight into endpoint behavior
- behavior monitoring can detect patterns related to specific exploit techniques
- compare w/ baselines
- look for patterns resembling known attacks
-
maintaining control of physical assets starts w/ asset inventorying
- you can’t manage assets if you don’t know what you have!
-
asset management should follow a lifecycle technique
- for example
- user requests new hardware
- hardware is ordered and inventory record is created
- hardware arrives, receiving clerk records, gives to IT staff and updates inventory record
- IT staff images machine, affixes hardware asset tag, gives to user and updates inventory record
- hardware is used, reallocated and inventory record is updated
- in all steps, data updates are critical (to avoid losing assets)
-
media management
- tracks highly sensitive data
- often, hardware inventory software can track this as well
change comes frequently in IT — which is good — but change must be controlled and managed
-
change management
- ensures that orgs follow standard procedures for…
- requesting,
- reviewing,
- approving, and
- implementing…
- …changes to their info systems
-
request for change (RFC)
- a formal request to make a change which includes:
- description of the change
- expected impact
- risk assessment
- rollback steps
- identification of those involved in the change
- proposed schedule
- affected configuration items (CIs)
-
changes made in an org should be approved by relevant authorities
- might be a user’s
- can include a change advisory board (CAB)
-
routine changes may be pre-approved (ex. rotating out tape backups)
tracks specific device and system settings
-
baseline
- snapshot of a configuration
- can be used to identify changes to a system
- compare the system’s current state to the baseline and note any differences
- automation allows for alerts in changes that deviate from baselines
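Comparing a system's current state to its baseline can be sketched as a dictionary diff. The setting names and values below are hypothetical.

```python
def diff_against_baseline(baseline: dict, current: dict) -> dict:
    """Report settings that were added, removed, or changed since the baseline."""
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}

# Hypothetical settings for illustration.
baseline = {"ssh_root_login": "no", "firewall": "enabled", "ntp_server": "time.example.org"}
current = {"ssh_root_login": "yes", "firewall": "enabled", "telnet": "enabled"}

drift = diff_against_baseline(baseline, current)
if any(drift.values()):
    print("ALERT: configuration deviates from baseline:", drift)
```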
-
versioning
- assigns a number to each version
- ex. #.##.##, version.major.minor
- often used in software development
-
diagrams also serve as an important configuration artifact
-
should standardize configurations
- naming conventions
- IP address scheming
-
ultimate goal of change and configuration management is to help ensure a stable operating system
- limits info access
- having a clearance to a certain level of information doesn’t entitle someone access to all of it
- access is given w/a valid reason
- common in military and government
- limits systems permissions to those needed for job function
- implementing in the real world can be burdensome
- emergency access procedures reduce business impact
-
no individual should possess permissions that when combined allow them to perform a highly sensitive action
- ex. accountant creating a new vendor and cutting checks to that vendor
-
infosec pros are often called on to create controls for separation of duties
-
infosec pros are often the subject of separation of duties
- example: a developer can’t create code and deploy it to a production system
-
aka dual control
-
requires authorization of two individuals to perform a sensitive action
- examples
- missile launches
- checks that require two signatories
-
separation of duties and two person control reduce the likelihood of fraud
- must collude to commit fraud
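A small sketch of how both controls could be checked in software. The permission names and conflicting pairs are hypothetical and would come from the org's own policy.

```python
# Hypothetical pairs of permissions that one person should never hold together.
CONFLICTING_PAIRS = [
    ("create_vendor", "issue_payment"),
    ("write_code", "deploy_to_production"),
]

def sod_violations(user_permissions: dict) -> list:
    """Separation of duties: return (user, pair) for every user who holds
    both permissions of a conflicting pair."""
    violations = []
    for user, perms in sorted(user_permissions.items()):
        for pair in CONFLICTING_PAIRS:
            if set(pair) <= set(perms):
                violations.append((user, pair))
    return violations

def two_person_approved(approvers) -> bool:
    """Two-person control: the action needs at least two distinct authorizers."""
    return len(set(approvers)) >= 2

users = {
    "alice": {"create_vendor", "issue_payment"},
    "bob": {"write_code"},
}
print(sod_violations(users))
print(two_person_approved(["carol", "dave"]))
```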
safeguard admin accounts
-
password vaulting
- store admin password
- may remote into a server w/ admin account username and password
- prevents owner of admin account from even knowing password
- may provide just-in-time access
-
command proxying
- eliminates need for direct server access
- PAM system sends commands to services/servers as the admin account
-
monitoring
- logs admin account activity
-
credential management
- rotates passwords and keys
-
PAM solutions will need to provide emergency access workflows
-
sudo
- stands for “super user do”
- allows users to temporarily assume admin rights
- use should be minimized
-
provides structure during cybersecurity incidents
-
describes the policies and procedures governing cybersecurity incidents
-
piss poor planning yields piss poor performance
- prior planning → strong incident response
- failure to plan → disaster
-
statement of purpose
-
strategies and goals
-
approach to incident response
-
communications w/ other groups
-
senior management approval
-
NIST SP 800-61 can be used for guidance in developing an org’s plan
- must have personnel available 24/7
- should have diverse membership
- various parts of the org
- management
- infosec
- SMEs
- legal
- PR
- HR
- physical security team
- IR service providers
- complement an IR team
- can provide critical resource for things not supported by the org
- example: forensics expert
- contract should be worked out in advance
- don’t want to be negotiating a contract mid-incident
-
critical component of stakeholder management
-
ensures that all participants have the right info at the right time
-
external communications should be limited to trusted parties
- info going public can be bad
- bad PR
- jeopardizes the investigation
-
law enforcement involvement requires careful consideration
- should consult w/ legal team
-
legal team will also ensure that the org is in compliance
- legislative requirements
- regulatory notification requirements
-
secure communications
- use to prevent inadvertent leaks
- monitoring is critical to effectively identify incidents
- IDS/IPS
- firewalls
- authentication systems
- vulnerability scanners
- systems event logs
- netflow records
- antimalware software
- SIEM systems help in this
-
first reports of an incident may also come from external sources
- employees
- customers
- websites
-
there should be a method for receiving external reports
-
strategic intelligence programs (ISACs) facilitate incident identification efforts
-
counterintelligence hinders adversaries’ ability to gather information
-
first responders need to act quickly after identifying an incident
- may isolate systems
- may quarantine systems
exam tip:
the highest priority of a first responder is damage containment.
- after initial containment, move to escalation and notification
- evaluate severity
- based on impact
- escalate response to appropriate level
- notify management and other stakeholders
-
low impact
- minimal potential to affect security
- normally handled by the first responder
- doesn’t require an after-hours response
-
moderate impact
- significant potential to affect security
- triggers reaction from incident response team
- prompt notification of management
-
high impact
- may cause critical damage to security
- demands full mobilization of incident response team
- requires immediate full response
- requires immediate notification of senior management
-
first responders must have the ability to activate the full incident response process
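The three impact levels above can be written down as a lookup table so that escalation is consistent. This is a sketch only; the field names are hypothetical and the values paraphrase the notes.

```python
from enum import Enum

class Impact(Enum):
    LOW = "low"
    MODERATE = "moderate"
    HIGH = "high"

# Who handles the incident and who must be notified, per impact level.
ESCALATION = {
    Impact.LOW: {"handled_by": "first responder", "notify": []},
    Impact.MODERATE: {"handled_by": "incident response team", "notify": ["management"]},
    Impact.HIGH: {"handled_by": "full incident response team", "notify": ["senior management"]},
}

def escalate(impact: Impact) -> dict:
    """Look up who handles an incident of the given impact and who must be told."""
    return ESCALATION[impact]

print(escalate(Impact.MODERATE))
```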
- controls damage and loss to the org through containment
- use criteria to choose the best containment option for the org
- goal is to balance business needs and security
- need to use best judgement — there is no easy “right” answer
- damage potential
- evidence preservation
- service availability
- resource requirements
- expected effectiveness
- solution timeframe
-
attackers may detect containment actions
-
mitigation ends w/ stability
- business functions w/out danger to operations
- containment
- limits damage to confidentiality, integrity and availability
- common network security technique
- moves infected systems to a quarantine network
- moves infected systems to a network that is disconnected from the internal network
-
completely disconnects infected systems from network communications
-
orgs should select the most appropriate containment strategy for the situation
-
trade-off decision
- need to continue the investigation
- need to prevent further damage to systems
- need to prevent disruptions to the org
-
eradication
- removing all traces of an incident
-
recovery
- restoring normal business functions
-
eradication and recovery go hand-in-hand and it’s often difficult to say which is which in regards to an action
-
attackers compromise systems
- may not know to what extent or how
- affected systems should be rebuilt to avoid future issues
- prevents the later use of backdoors
-
security issues that led to the incident need to be corrected
-
when recovering, look at the following
-
endpoint security controls
- application white-/blacklisting
- quarantine controls
- access controls
-
enterprise security controls
- firewall rules
- mobile device policies
- DLP systems
- URL and content filters
- updating / revoking certificates
-
-
prevents confidential info leakage
-
clearing
- overwrites data with new data
- frustrates casual analysis
-
purging
- more advanced techniques, such as degaussing
- frustrates laboratory analysis
- storage media is unusable by normal means
-
destroying
- media is obliterated and cannot be recovered
- impossible to analyze
-
use NIST flowchart to select most appropriate sanitization
-
physical destruction is the only true way to ensure that data has been deleted
-
deleting / formatting will never be the answer
- final act of containment, eradication and recovery
- verify security configurations of all systems
- run vulnerability scans
- perform account and permission reviews
- verify that systems are logging
- verify that logs are being sent to SIEM
- validate that capabilities and services have been restored successfully
-
reflect on incident response
-
offer feedback to improve future incident response
-
can use a trained facilitator
- neutral party
- played no part in the incident response
-
time is critical
- people tend to forget what they did relatively quickly
-
example lessons learned questions
- how well did staff perform?
- were processes followed?
- were processes adequate?
- did anything inhibit the recovery?
- what should be done differently next time?
- etc.
-
create a report w/ lessons learned and recommendations
- be sure to follow change management when implementing any recommended changes
-
incident summary report
- describes response efforts
- useful for future incidents and future training
-
need to comply w/ org policies
-
need to comply w/ legal requirements
- consult w/ legal team
-
store evidence in a secure manner
- ensure that chain of custody is maintained
- note indicators of compromise (IoCs) found during the incident
- incorporate into monitoring systems
-
read-through
- team members review the incident response plan and their roles individually
- provide feedback
-
walk-through
- aka tabletop exercise
- teams gather for a review and talk through the plan
- provide feedback
-
simulation
- teams gather and go through a practice scenario
- provide feedback
-
testing strategies frequently use a combination of all test types
-
the physical safety of employees is always top priority
exam tip:
watch for questions that ask to prioritize business operations over human life.
-
isolated employees should be monitored to ensure their safety
- employees working overnight
- employees working alone in a SOC, NOC, etc.
- detective controls, such as CCTV, can be used for this monitoring
-
travelling employees should also be monitored to ensure their safety
-
panic buttons
- silently alerts security to a dangerous situation when pressed
-
duress codes
- codes that appear to function normally, but also trigger a safety response
-
emergency management plans should be based upon risk management
-
fire plans
- evacuation procedures
- fire department notification
- accounting of all personnel
- regular fire drills
-
weather emergency plans
- dependent on site location
- blizzards
- floods
- wildfires
-
tornado plans
- include sheltering instructions
- regular drills
-
lock down plans
- in the event of workplace violence
-
physical security is important to protecting info and systems
-
data centers contain massive amounts of sensitive data and computing resources
-
server rooms
- usually less secure than data centers
- often grow organically in small orgs
-
media storage locations
- especially if media / backups are stored off-site
- locations should have at least equal — if not better — physical security than a data center
-
evidence storage locations
- chain of custody must be preserved
-
wiring closets
- unauthorized access can result in eavesdropping and network device tampering
- distribution cabling should be protected as well
-
operations centers and other sensitive areas
- data centers have significant cooling requirements
- excessive heat can reduce the life of equipment
- old school data centers used to be very cool
- great expense to the org and environment
- equipment is now less sensitive
- expanded environmental envelope
- 64.4°F – 80.6°F
-
HVAC systems keep temperature and humidity in control
-
hot aisle/cold aisle
- servers draw in cool air from the front and expel hot air out the back
- using this idea, one can line up server racks back to back, creating cool air aisles and hot air aisles
- watch for questions that indirectly ask about hot aisle/cold aisle strategies
natural disasters put data centers at risk
- fire is a grave threat
- fire requires:
- oxygen
- heat
- fuel
- depriving a fire of any of these three requirements will extinguish it
class | type | examples |
---|---|---|
A | common combustibles | wood, cloth, trash, paper, etc. |
B | flammable liquids | gasoline, kerosene, oil |
C | electrical | wiring, server racks |
D | combustible metals | magnesium, titanium, sodium |
K | kitchen | fats, oil, grease |
- labels on fire extinguishers contain info about the class and type of fires it can extinguish
- be able to identify fire extinguisher classes
- building-wide fire suppression systems
-
wet pipe approach
- contain water in pipes that are ready to deploy in a fire
- can be dangerous to data centers if they leak
-
dry pipe approach
- contain pipes that only fill if a valve opens during a fire alarm
-
chemical fire suppression systems
- deprive a fire of oxygen
- dangerous to humans!
-
-
fire detection sensors
- temperature sensors
- smoke detectors
- incipient detectors
-
moisture sensors
- data centers should be protected against the risk of flooding
- natural
- flood plains, location w/in the building
- man-made
- burst pipes, etc.
- consider layout of pipes w/in building if possible
- generated by all electrical equipment
- can interfere w/ other equipment
- can be used by attackers to eavesdrop
- faraday cages can protect against EMI
-
locks
-
restrict access through a portal (i.e. a window or door)
-
preset lock
- use a hardware lock
- need correct key to open
- should use key management to keep track of keys
-
cipher lock
- use a physical or electronic keypad
-
biometric locks
- use a person’s physical features
- fingerprint, voice, retina
-
card-based locks
- use a card
- magstripe, RFID, smart
-
-
tailgating
- following another authorized user into an area
-
mantraps *
-
remember to carefully maintain ACLs!
- use motion and noise detection systems
- video surveillance systems
- act as deterrent and detective controls
- IR video may be useful in dark environments
- can play an important role in investigations
- fences can block traffic on foot or vehicles
- bollards allow foot traffic but protect entrances from vehicles
- cages can be used to protect equipment
- important in shared data centers
- lighting increases intruder detection and acts as a deterrent
- signage can provide legal recourse
- industrial camouflage
- useful for making data centers non-descript
- drones and UAVs make it important that buildings are camouflaged from the ground and from the air
- visitor management procedures protect against intrusions
- visitor procedures should
- describe allowable visit purposes
- explain visitation approval authority
- describe requirements for unescorted access
- role of visitor escorts
- all visitors should be logged
- all visitors should be identified w/ distinct badges
- if necessary badges should include “ESCORT REQUIRED”
- cameras can provide extra monitoring of visitor areas
-
security guards are important to physical security
- receptionists can act as security guards
- menacing looking guards can also be used
-
robotic sentinels may be used in place of humans
-
two-person integrity
- requires two people to enter a sensitive area together
- discourages malicious activity in that area
- requires collusion with other person
- think of two people needing to enter a bank vault
-
two-person control
- aka dual control
- requires authorization of two individuals to perform a sensitive action
- examples
- missile launches
- checks that require two signatories
- event: change in state
- incident: series of events that has a negative impact on an organization or their security
- incident response focuses on containing damage and restoring normal operations
- minimize damage, minimize downtime!
- investigations focus on the gathering evidence of the attack with a goal of prosecution
- framework
- response capability
- incident response / handling
- recovery and feedback
- corporate incident response policy, procedures and guidelines should be in place
- legal, HR, senior management, key business units must be involved
- if in-house, incident response team should be in place
- incident response team should have:
- list of agencies and resources to contact / report to
- list of experts to contact
- steps for searching for, securing and preserving evidence
- list of items to include on reports
- lists of how items on various systems should be treated
- triage
- detect
- identify
- notify
- investigate
- contain
- analyze and track
exam tip:
unless specified by an exam question, always assume that you are not on the incident response team.
you should report, contain, and not touch / interfere!
- restore system to operations
- provide greater security afterwards
- most important
- often overlooked
- document, document, document!
- discipline of proven methods of collection, preservation, validation, identification, analysis, interpretation, documentation and presentation of digital evidence
- forensic principles must be followed
- can’t alter evidence as doing so can make it inadmissible in court
- individuals dealing with original evidence should be trained in evidence handling
- all activity with evidence should be documented — chain of custody
- individuals are responsible for all of their actions while in possession of evidence
- evidence must be:
- authentic
- accurate
- complete
- convincing
- admissible
- Locard’s Exchange Principle
- for everything taken, something is left behind
- what’s left behind can help identify the attacker
- chain of custody must be well documented
- this is a history of how the evidence was:
- collected
- analyzed
- transported
- preserved
- necessary because digital evidence can easily be manipulated
- hashing algorithms are used during process to show that data hasn’t been changed
-
document!
-
minimize movement / handling
-
work on copies
-
work from most volatile to least
- CPU caches → RAM → HDD
-
capture an accurate image of the system
- need three hashes
- hash of original hard drive from read-only system
- hash of bit-level copy of hard drive used for analysis
- hash of analyzed data from the copy
- all three hashes need to match
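The three-hash check can be sketched with Python's `hashlib`. SHA-256 is an assumption here (the notes don't name an algorithm), and the byte strings stand in for real drive images, which would be streamed from behind a write blocker rather than held in memory.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()

def image_is_intact(original: bytes, working_copy: bytes, copy_after_analysis: bytes) -> bool:
    """All three digests must match for the image to be considered unaltered."""
    digests = {
        sha256_of(original),
        sha256_of(working_copy),
        sha256_of(copy_after_analysis),
    }
    return len(digests) == 1

# Stand-in bytes for illustration only.
original = b"\x00\x01disk image contents"
working_copy = bytes(original)

print(image_is_intact(original, working_copy, working_copy))          # copy untouched
print(image_is_intact(original, working_copy, working_copy + b"x"))   # copy modified
```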
-
steps to collect evidence
- photo area, record what is on screens
- dump contents of memory
- power off system
- photo interior of system
- label all pieces of evidence
- record location, who collected, how collected, date and time
- have legal and HR involved
-
Fourth Amendment considerations
- protects citizens from illegal searches and seizures
- only applies to law enforcement or those acting on their behalf
- citizens may also be subject to the Electronic Communications Privacy Act
- police can only gather evidence with:
- subpoena
- search warrants
- voluntary consent
- exigent circumstances
- yields data (analysis yields information)
- document what is seen
- look for signatures of known attacks
- review audit logs
- perform hidden data recovery
- yields information (examination yields data)
- what’s the root cause?
- what files were altered?
- what files / applications were installed?
- what communications channels were used?
- interpret results of investigation and present in an appropriate format
- document findings
- provide expert testimony
- result of the investigation
- what to do with suspects?
- corrective action
- legal response?
- direct:
- can prove a fact
- doesn’t need backup information
- information provided by the five senses of a reliable witness
- real:
- physical evidence
- objects used in a crime / objects left behind at crime scene
- best:
- most reliable
- signed contract
- secondary
- supporting evidence
- expert opinion / testimony
- corroborative:
- supporting evidence
- doesn’t stand on its own
- circumstantial
- indirect evidence that suggests a fact by inference but doesn’t prove it directly
- hearsay
- secondhand written / spoken testimony
- usually not admissible
- demonstrative
- presentation-based
- diagrams, x-rays, demonstrations
- do they have the available skills to perform an investigation?
- bound by Fourth Amendment, jurisdiction, Miranda rights, privacy laws
- more restricted than a citizen investigator
- information is not controlled by the organization
- redundant hardware
- available in the event that the primary device is unusable
- often associated with HDD
- hot
- warm
- cold
- SLAs
- MTBF and MTTR
- disk striping
- writes to both disks
- no redundancy or fault tolerance
- provides performance improvements
- primary server mirrors a secondary server
- if the primary fails, roll over to the secondary
- provides server fault tolerance
- group of servers that act as a single system
- looks like a single server to users
- provides high availability
- provides scalability
- easier to manage
- may provide redundancy, load balancing or a combination of both
- active/active
- active/passive
- provide temporary battery-based power to systems in event of power loss
- usually long enough for systems to execute a graceful shutdown
- good features
- long battery life
- remote diagnostic software
- surge protection and line conditioning
- spike
- sag
- brown out
- EMI / RFI filtering
- allowance for shutting down of systems attached to it
- issues to consider
- load that the UPS / battery can support
- battery duration
- speed to take load during outage
- physical space needed
- backups of software and data
- having backup hardware is a large part of network availability
- important to be able to restore data if:
- HDD fails
- disaster takes place
- software corruption
- archive bit reset
- backs up everything
- takes a lot of time and space
- impractical to do daily
- usually done on weekends
- archive bit reset
- backs up all files that have been modified since last backup
- restoration process is to restore all backups
- full → incremental → incremental
- archive bit not reset
- backs up all files that have been modified since last full backup
- restoration is to restore two backups
- full → differential
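The restore orders above can be sketched as a function that picks which backups are needed. It assumes a schedule uses either incrementals or differentials after each full backup, not a mix of both; the backup labels are hypothetical.

```python
def restore_chain(backups: list) -> list:
    """Return the backups needed to restore the most recent state, oldest first.

    `backups` is an oldest-first list of (label, kind) tuples, where kind is
    "full", "incremental", or "differential".
    """
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    chain = [backups[last_full]]
    later = backups[last_full + 1:]
    differentials = [b for b in later if b[1] == "differential"]
    if differentials:
        # A differential holds everything since the last full, so only the newest is needed.
        chain.append(differentials[-1])
    else:
        # Each incremental holds only changes since the previous backup, so all are needed.
        chain.extend(b for b in later if b[1] == "incremental")
    return chain

incremental_week = [("sun", "full"), ("mon", "incremental"), ("tue", "incremental")]
differential_week = [("sun", "full"), ("mon", "differential"), ("tue", "differential")]

print(restore_chain(incremental_week))
print(restore_chain(differential_week))
```

The trade-off shows up in the chain length: incrementals are quicker to take but slower to restore, and differentials are the reverse.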
- eliminate single points of failure — lol
- cross train
- job rotations
- train and educate
- BCP
- long term focus
- focus on sustaining operations and protecting the business
- umbrella term that includes many other plans, including DRP
- DRP
- short term focus
- often IT focused
- goal is to minimize effects of disaster and to take steps to resume operations in a timely manner
- deals with immediate aftermath of disaster
- BCP acts as a safety net for risk management
- BCP acts in the event that risk mitigation steps in risk management fail
- risk management: covers identified risks
- BCP: covers gaps after
- non-disaster
- an inconvenience (disruption of service, device malfunctions)
- emergency / crisis
- urgent immediate event where there is a potential for injury / loss of life / property
- disaster
- entire facility is unusable for longer than a day
- catastrophe
- destroys a facility
- organization should be prepared for each category
- emergency declaration: anyone should be able to make the call (i.e. pull a fire alarm)
- disaster declaration: BCP coordinator makes the call
- standards help with inconsistency in terms, definitions, documents
- organizations and guidance on BCP / DRP:
- DRII
- NIST SP 800-34 rev1
- ISO 27031
- BCI GPG
- (ISC)² Four Processes of Business Continuity
(create graphic)
- acquire BCP Policy Statement from senior management
- conduct Business Organizational Analysis: structured analysis of business’s organizational assets
- create BCP team
- should be cross-functional
- includes a project manager
- includes HR and legal reps.
- analyze legal and regulatory issues related to the organization’s response to a catastrophic event
- provides foundation for rest of BCP process
- provides a groundwork to identify members of BCP team
- evaluates:
- which departments responsible for core services
- what critical support services are needed
- senior management and other key individuals essential to the viability of the organization
- gather representatives from:
- departments responsible for core services
- key support departments identified by organization analysis
- IT staff
- security staff
- legal
- HR
- senior management
- BCP development
- BCP team will need resources to perform all four steps of BCP process
- major resources needed:
- effort of BCP team
- assistance of supporting teams called to help with development of the plan
- BCP testing, training, maintenance
- will require some hardware and software
- main resource is still manpower of employees involved
- BCP implementation
- in the event of full-scale BCP being conducted:
- significant use of resources and manpower
- BCP will become focus of most—if not all—of the organization
- BCP team will use resources judiciously, yet decisively
- senior management:
- has the ultimate legal responsibility
- may be:
- held responsible / liable under various laws and regulations
- sued by:
- stockholders, if due care / diligence is not used in managing
- employees / families, in the event of injury / death
- legal and financial repercussions are a major way to attain senior management buy-in
- identifies and prioritizes all business processes / resources based on criticality
- risk identification
- internal vs. third party assessment
- probability and impact
- defines quantitative metrics to assist with prioritizing recovery focus
- BIA helps set recovery priorities
- create a list of business procedures and their impact on the organization
- often delegated to departments for accuracy and buy-in
- criticality driven by the amount of loss to the organization if a resource is unavailable
- maximum tolerable downtime / maximum tolerable outage (MTD / MTO)
- longest time a function can be down before causing a loss that’s unacceptable to senior management
- recovery time objective (RTO)
- estimated time to recover a function in the event of a disruption
- should be less time than MTD / MTO
- recovery point objective (RPO)
- tolerance for data loss
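The relationship between these metrics can be checked mechanically. The sketch below is illustrative only: the `BusinessFunction` type, the sample functions, and all hour values are hypothetical, and ordering recovery by shortest MTD is an assumed heuristic rather than something the notes prescribe.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    mtd_hours: float  # maximum tolerable downtime
    rto_hours: float  # recovery time objective
    rpo_hours: float  # recovery point objective (tolerable data loss)


def rto_within_mtd(fn: BusinessFunction) -> bool:
    """The RTO should be strictly less than the MTD."""
    return fn.rto_hours < fn.mtd_hours


def recovery_order(functions):
    """Assumed heuristic: recover the function with the shortest MTD first."""
    return sorted(functions, key=lambda fn: fn.mtd_hours)


# Hypothetical values for illustration only.
functions = [
    BusinessFunction("payroll", mtd_hours=72, rto_hours=48, rpo_hours=24),
    BusinessFunction("order processing", mtd_hours=4, rto_hours=6, rpo_hours=1),
    BusinessFunction("email", mtd_hours=24, rto_hours=8, rpo_hours=4),
]

for fn in recovery_order(functions):
    status = "ok" if rto_within_mtd(fn) else "RTO is not below MTD, revisit strategy"
    print(f"{fn.name}: {status}")
```

In this made-up data, order processing is flagged because its estimated recovery time is longer than the downtime the business can tolerate.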
- evaluate a CSP’s BCP
- examine SLAs
- verify that controls are in place in person or through a third party audit (SOC)
Report | Reports On | Visibility |
---|---|---|
SOC 1 | financial reporting | private |
SOC 2 | security and technology | private |
SOC 3 | security and technology | public |
- asset value × probability × impact = total risk
- total risk × controls gap = residual risk
- AV = asset value
- probability = ARO (annualized rate of occurrence)
- impact = EF (exposure factor)
- SLE = single loss expectancy
- ALE = annual loss expectancy
- remember that some losses can’t be quantified (e.g. loss of reputation)
- qualitative analysis can be used to prioritize risk
- quantitative analysis is needed to…
- perform cost/benefit analysis
- justify mitigation steps
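A quantitative cost/benefit check usually rests on the standard relations SLE = AV × EF and ALE = SLE × ARO, with the value of a safeguard taken as the ALE reduction minus the safeguard's annual cost. The sketch below applies them to made-up numbers; every figure and function name is an assumption for illustration.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE = AV x EF, where EF is the fraction of the asset lost per event."""
    return asset_value * exposure_factor


def annual_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE x ARO, where ARO is the expected number of events per year."""
    return sle * aro


def safeguard_value(ale_before: float, ale_after: float, annual_cost: float) -> float:
    """A positive result suggests the control pays for itself."""
    return ale_before - ale_after - annual_cost


# Hypothetical figures for illustration only.
sle = single_loss_expectancy(asset_value=100_000, exposure_factor=0.25)
ale_before = annual_loss_expectancy(sle, aro=0.5)    # one event every two years
ale_after = annual_loss_expectancy(sle, aro=0.125)   # one event every eight years
value = safeguard_value(ale_before, ale_after, annual_cost=4_000)

print(f"SLE={sle}, ALE before={ale_before}, ALE after={ale_after}, value={value}")
```

Here the control lowers only the ARO. A control that limits damage instead would lower the EF, and the same functions apply.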
- examine BIA for metrics and to map controls to meet objectives
- determine appropriate responses to risk
- reduce
- assign / transfer
- accept
- reject
- some risks will have to be accepted, while others require an active strategy
- designs specific procedures to mitigate risk to a level acceptable to senior management
- three asset types
- people — always the first priority
- buildings / facilities
- hardening provisioning: mitigating harm to a building / facility
- alternate sites
- mirrored site
- leased sites
- cold site
- warm site
- hot site
- infrastructure
- redundancy of critical systems and services
- recovery strategies
- failover / failback
- failover: moving to secondary device/server/system
- failback: resuming operations of primary
- mirrored site: dedicated site owned/operated by the organization
- reciprocal agreement with an internal/external entity
- commercially leased site:
- cold
- warm
- hot
- MOAs or SLAs
Feature | Cold Site | Warm Site | Hot Site |
---|---|---|---|
secondary location | yes | yes | yes |
equipment at location | no | yes | yes |
connectivity at location | no | yes | yes |
active before failover | no | no | yes |
outage measured in | weeks | days/hours | hours/minutes |
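One way to use the table above is to pick the least expensive site type whose typical outage still fits a function's RTO. This is a rough sketch: the hour figures are assumed stand-ins for "weeks", "days/hours", and "hours/minutes", and the function name is hypothetical. The only ordering relied on is that cold sites cost least and hot sites cost most.

```python
# (site type, assumed worst-case outage in hours), ordered cheapest first.
# The hour values are illustrative assumptions, not industry figures.
SITE_TIERS = [
    ("cold", 336),  # roughly two weeks
    ("warm", 48),   # roughly two days
    ("hot", 4),     # a few hours
]


def cheapest_site_meeting_rto(rto_hours: float):
    """Return the cheapest site type whose assumed outage fits within the RTO."""
    for site, outage_hours in SITE_TIERS:
        if outage_hours <= rto_hours:
            return site
    return None  # no leased tier fits; a mirrored site may be needed


for rto in (720, 72, 8, 1):
    print(rto, "->", cheapest_site_meeting_rto(rto))
```

An RTO shorter than the assumed hot-site outage returns `None`, which in practice points toward a mirrored site or redundant infrastructure.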
- supports critical elements of business
- servers
- systems
- routers
- switches
- processes
- architecture
- high availability
- redundancy
- resiliency
- fault tolerance
- hardened systems
- if possible, approved by CEO or other C-level individual
- indicates the business’s dedication to BCP
- create implementation guide/schedule
- deploy resources
- supervise implementation plan
- distribute plan on need-to-know basis
- everyone should get an overview
- three main purposes
- protect
- recover
- sustain
- Crisis Communications Plan
- dissemination of necessary information
- Occupant Emergency Plan (OEP)
- procedures for minimizing injury, loss of life, and property damage in the event of an emergency
- Business Recovery (/Resumption) Plan (BRP)
- procedures for business operations after a disaster
- Continuity of Support / IT Contingency Plan
- procedures for recovering major applications or general systems
- Cyber Incident Response Plan
- procedures to detect, respond to and limit consequences of a cyber incident
- Disaster Recovery Plan
- procedures to recover capabilities at an alternate site
- Continuity of Operations Plan (COOP):
- senior executive management
- consistent support
- final plan approvals
- prioritization of critical business functions
- allocation of resources and personnel
- oversight and approval of BCP
- directing and reviewing test results
- ensuring maintenance of current plan
- senior functional management
- develop and document maintenance and testing strategies
- identify and prioritize mission critical systems
- monitor plan development and execution
- ensure that periodic testing takes place
- create various teams needed to execute BCP
- BCP steering committee
- conduct BIA
- coordinate with department reps
- should include:
- business units
- senior management
- IT
- security
- communications
- legal
- HR
- management should appoint members
- each member must:
- understand goals of the BCP
- be familiar with the department they are responsible for
- agree before event:
- who talks to the media, customers and stakeholders
- whoever is trained in communications — can be PR rep, doesn’t have to be the CEO
- who will set up alternate communications methods
- who will set up offsite facility
- who will work on the primary facility
- copies of plans given to departments
- functional managers review
- no risk associated with test
- don’t get a good assessment
- department reps sit down and go through the plan together
- more like a talk-through, am I right?
- no risk associated with test
- don’t get a good assessment
- go through a disaster scenario
- continues up to the point of actually moving to a secondary site
- some systems are moved to the alternate site and processes work from alternate site
- functionality still remains at primary site
- original site is shut down
- all processes moved to alternate site
- after a test or disaster:
- focus on improvement
- what should have happened and didn’t
- what went well
- what happens next / what can be improved
- not a blame game; that is not a productive use of the review
- keep plan up-to-date
- make it a part of business meetings and decisions
- centralize responsibility of updating the plan
- make the plan a part of
- job descriptions
- personnel evaluations
- report on BCP status regularly
- if the plan is revised, original copies of the plan should be retrieved and destroyed
- you don’t want to work off of an old plan