CISA Module 4 of 5 - M A Fazal & Co.

CISA Module 4 – IS Operations & Business Resilience | Study Guide

📑 Table of Contents

01 IT Operations Management & Controls
02 IT Service Management (ITSM) & ITIL
03 Hardware, Network & Infrastructure
04 Capacity & Performance Management
05 IT Asset & Configuration Management
06 Backup, Recovery & Data Management
07 Business Continuity Planning (BCP)
08 Disaster Recovery Planning (DRP)
09 Physical & Environmental Controls
10 Cloud Operations & Resilience
11 Problem, Incident & Event Management
📝 Interactive MCQ Bank (103 Questions)

IT Operations Management & Controls

IT Operations Overview

IT Operations encompasses the daily activities required to deliver reliable, secure, and efficient IT services. The IS auditor evaluates whether operations controls are adequate to ensure availability, integrity, and confidentiality of IT services and data.

Key IT Operations Functions

Function	Description	Key Controls
Job Scheduling	Automated scheduling of batch jobs, scripts, and processes	Scheduling authorization, error handling, restart/recovery procedures
Output Management	Controlling and distributing system outputs (reports, print jobs)	Output distribution lists, sensitivity labels, secure disposal
Storage Management	Managing disk, tape, SAN, and cloud storage resources	Capacity monitoring, tiered storage, retention policies
Print Management	Controlling physical output and sensitive document handling	Secure print release, printer access controls, shredding policies
Help Desk / Service Desk	First point of contact for user IT issues and requests	Ticket logging, SLA adherence, escalation procedures
System Monitoring	Real-time monitoring of systems, performance, and security events	Alerting thresholds, on-call procedures, SIEM integration

Segregation of Duties in IT Operations

Computer operators should NOT perform system programming
Operators should NOT have access to modify production programs
Operators should NOT perform security administration
No single operator should have unrestricted access to all systems
Job scheduling changes should require separate authorization
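
The segregation-of-duties rules above lend themselves to an automated check against a role or entitlement extract. The following is a minimal sketch, assuming each user's duties are available as a set of labels; the duty names and the sample users are hypothetical, not taken from any specific access management tool.

```python
# Conflicting duty pairs taken from the list above; the labels are illustrative.
CONFLICTS = {
    frozenset({"computer_operations", "system_programming"}),
    frozenset({"computer_operations", "production_program_change"}),
    frozenset({"computer_operations", "security_administration"}),
}


def sod_violations(assignments: dict[str, set[str]]) -> dict[str, list[tuple[str, str]]]:
    """Map each user to the sorted list of conflicting duty pairs they hold."""
    findings = {}
    for user, duties in assignments.items():
        # A conflict exists when both duties of a pair are held by the same user.
        held = sorted(tuple(sorted(pair)) for pair in CONFLICTS if pair <= duties)
        if held:
            findings[user] = held
    return findings


# Hypothetical entitlement extract.
sample = {
    "alice": {"computer_operations"},
    "bob": {"computer_operations", "security_administration"},
}
print(sod_violations(sample))
```

In this sample only bob is reported, because he holds both operations and security administration duties.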

Operator Activity Logs

All operator activities on critical systems must be logged. IS auditors review logs to detect:

Unauthorized commands or system access
Bypassing of normal job scheduling
Access to sensitive data outside normal job duties
Overriding of system controls or error messages

IT Operations Audit Procedures

Review operator logs for unusual activity and unauthorized commands
Verify job scheduling is authorized and monitored for failures
Test restart and recovery procedures for critical jobs
Verify output is distributed only to authorized recipients
Review shift handover procedures for continuity of operations
Verify sensitive output (payroll, financial reports) is controlled and disposed of securely

🎯 CISA Key Point: Operator logs are critical audit evidence. The IS auditor should review logs for completeness, evidence of tampering, and whether logs are reviewed regularly by management.

IT Service Management (ITSM) & ITIL

ITIL 4 Service Management Practices

ITIL 4 replaces processes with 34 practices across three categories:

Category	Key Practices
General Management	Risk management, information security management, continual improvement, knowledge management, portfolio management
Service Management	Incident management, problem management, service desk, change enablement, service level management, availability management, capacity & performance management, IT asset management, monitoring & event management, release management, service continuity management
Technical Management	Infrastructure & platform management, software development & management, deployment management

Service Level Management

SLA (Service Level Agreement): Agreement between IT and the business customer on service levels
OLA (Operational Level Agreement): Internal IT team-to-team agreement supporting an SLA
UC (Underpinning Contract): External vendor contract supporting SLA delivery

Availability Management

Key availability concepts the IS auditor must know:

Term	Definition / Formula
Availability %	(Agreed Service Time − Downtime) / Agreed Service Time × 100
MTTR	Mean Time To Repair — average time to restore a failed service
MTBF	Mean Time Between Failures — average time between service failures
MTTF	Mean Time To Failure — average time until first failure (non-repairable)
MTBSI	Mean Time Between Service Incidents — includes all incidents

🔵 Exam Formula: Availability = MTBF / (MTBF + MTTR). Higher MTBF and lower MTTR = higher availability. To improve availability: increase MTBF (prevent failures) or decrease MTTR (faster recovery).
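
As a quick numeric check of the two availability formulas above, here is a minimal Python sketch. The function names and the sample figures (a 720-hour service month with 3.6 hours of downtime, an MTBF of 500 hours, an MTTR of 2 hours) are illustrative assumptions.

```python
def availability_from_downtime(agreed_hours: float, downtime_hours: float) -> float:
    """Availability % = (agreed service time - downtime) / agreed service time x 100."""
    if agreed_hours <= 0:
        raise ValueError("agreed_hours must be positive")
    return (agreed_hours - downtime_hours) / agreed_hours * 100


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability % = MTBF / (MTBF + MTTR) x 100."""
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100


print(round(availability_from_downtime(720, 3.6), 2))  # 99.5
print(round(availability_from_mtbf(500, 2), 2))        # 99.6
print(round(availability_from_mtbf(500, 1), 2))        # 99.8 (halving MTTR helps)
```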

Continual Service Improvement (CSI)

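Continual Service Improvement is the ITIL v3 name for what ITIL 4 calls the continual improvement practice. It records improvement opportunities in the CSI register and follows the 7-Step Improvement Process: identify the improvement strategy → define what to measure → gather the data → process the data → analyze the information → present and use the information → implement improvements.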

Service Catalogue Management

The service catalogue documents all IT services offered, their descriptions, SLAs, and dependencies. IS auditors verify the catalogue is maintained, accurate, and used for service request management.

Hardware, Network & Infrastructure Controls

Network Architecture & Security Zones

Zone / Component	Purpose	Key Controls
DMZ (Demilitarized Zone)	Hosts public-facing services (web, email, DNS) between internet and internal network	Dual firewalls, no direct internet-to-internal traffic
Internal Network	Core business systems and user workstations	Firewall, IDS/IPS, NAC, VLAN segmentation
Management Network	Out-of-band management of IT infrastructure	Restricted access, separate VLAN, strong authentication
Guest Network	Internet access for visitors/contractors	Isolated from internal network, captive portal, bandwidth limits

Network Security Controls

Firewall: Packet filtering, stateful inspection, next-generation (application-aware) firewalls
IDS (Intrusion Detection System): Monitors and alerts on suspicious traffic — passive, detection only
IPS (Intrusion Prevention System): Monitors and actively blocks suspicious traffic — inline, active prevention
WAF (Web Application Firewall): Protects web applications from common attacks such as those in the OWASP Top 10 (e.g., SQL injection, XSS)
VPN (Virtual Private Network): Encrypted tunnels for remote access (IPsec, SSL/TLS)
NAC (Network Access Control): Validates device compliance before granting network access
Proxy Server: Intermediary for outbound web traffic; content filtering and caching
SIEM (Security Information & Event Management): Aggregates, correlates, and analyzes security logs

Network Protocols & Audit Considerations

Protocol	Use	Audit Note
HTTPS / TLS	Encrypted web communication	Verify TLS 1.2+ enforced; certificates valid and managed
SSH	Secure remote server management	Preferred over Telnet; verify key management practices
SNMP v3	Network device monitoring	v1/v2 are insecure; verify v3 with authentication is used
DNS	Domain name resolution	DNSSEC, split DNS, DNS logging for threat detection
NTP	Time synchronization	Accurate time is critical for log correlation and audit trails

Wireless Network Security

WPA3 is the current standard (WEP and WPA are insecure)
Enterprise wireless uses 802.1X with RADIUS authentication
Rogue access point detection is critical
Guest WiFi must be isolated from corporate networks
Regular wireless security assessments (site surveys)

Hardware Controls

Redundant hardware: RAID, redundant power supplies, dual NICs, clustering
Hardware lifecycle: Procurement, inventory, maintenance, secure disposal (data sanitization)
Server hardening: Disable unused ports/services, apply patches, remove default accounts
Endpoint controls: EDR, full disk encryption, USB port control, patch management

⚠️ CISA Exam Point: NTP (time synchronization) accuracy is a foundational control: when system clocks drift, events from different systems cannot be reliably correlated, which weakens the audit trails used in forensic investigations.

Capacity & Performance Management

Capacity Management

Capacity management ensures IT infrastructure can meet current and future business demand in a cost-effective manner. Three sub-processes:

Business Capacity Management: Translates business requirements into future IT capacity needs
Service Capacity Management: Ensures services deliver agreed performance levels
Component Capacity Management: Monitors and optimizes individual components (CPU, memory, disk, network)

Capacity Planning Process

Monitor current utilization and performance baselines
Analyze trends and forecast future demand
Model scenarios (organic growth, new applications, acquisitions)
Identify capacity gaps and thresholds
Plan infrastructure additions or optimizations
Review and update the Capacity Plan regularly
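
The trend analysis and gap identification steps above can be illustrated with a simple linear forecast. This is a sketch under stated assumptions: utilization is sampled once per month, growth is roughly linear, and the function name, the sample series, and the 80% threshold are illustrative. Real capacity models usually also account for seasonality and planned changes.

```python
from typing import Optional


def months_until_threshold(utilization: list[float], threshold: float) -> Optional[float]:
    """Fit a least-squares line to monthly utilization (%) and estimate how many
    months after the last sample the trend reaches the threshold.
    Returns None when the trend is flat or falling."""
    n = len(utilization)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(utilization) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, utilization))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope  # month index where the line hits the threshold
    return max(0.0, crossing - (n - 1))


# Disk utilization growing 4 points per month from 50%.
print(months_until_threshold([50, 54, 58, 62], threshold=80))  # 4.5
```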

Performance Metrics

Metric	Description	Threshold Action
CPU Utilization	Percentage of processor capacity used	Alert at 80%; action at sustained 90%+
Memory Utilization	RAM usage and paging/swapping frequency	Excessive paging indicates memory pressure
Disk I/O	Read/write throughput and latency	High latency impacts application response time
Network Bandwidth	Traffic volume vs. available capacity	Congestion causes packet loss and latency
Response Time	Time for system to respond to a user request	Compare against SLA targets
Throughput	Number of transactions processed per unit time	Degradation may indicate bottlenecks
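
The CPU row above distinguishes a one-off alert at 80% from action on sustained utilization of 90% or more. A minimal sketch of that logic follows; treating "sustained" as three consecutive samples is an assumption made for illustration, as are the function and parameter names.

```python
def classify_cpu(samples: list[float], alert_at: float = 80.0,
                 action_at: float = 90.0, sustained: int = 3) -> str:
    """Return 'action', 'alert', or 'ok' for a series of CPU utilization samples (%)."""
    run = 0  # length of the current run of samples at or above action_at
    for value in samples:
        run = run + 1 if value >= action_at else 0
        if run >= sustained:
            return "action"
    if samples and max(samples) >= alert_at:
        return "alert"
    return "ok"


print(classify_cpu([55, 82, 70]))      # alert
print(classify_cpu([91, 93, 95, 60]))  # action
print(classify_cpu([91, 60, 92]))      # alert (the spikes are not consecutive)
```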

Performance Monitoring Tools

SNMP-based network monitoring (Nagios, PRTG, SolarWinds)
Application Performance Monitoring (APM) — Dynatrace, New Relic, AppDynamics
Log aggregation and analytics — Splunk, ELK Stack
Cloud-native monitoring — AWS CloudWatch, Azure Monitor, Google Cloud Operations

IS Auditor's Review of Capacity Management

Is a formal capacity plan documented and reviewed regularly?
Are performance baselines established and monitored?
Are alert thresholds defined and acted upon?
Does the capacity plan incorporate business growth projections?
Is capacity data used to support IT investment decisions?

IT Asset & Configuration Management

IT Asset Management (ITAM)

ITAM tracks and manages IT assets (hardware and software) throughout their lifecycle to optimize utilization, control costs, and ensure compliance.

Lifecycle Stage	Key Controls
Procurement	Approved vendor list, purchase authorization, receiving verification
Deployment	Asset tagging, inventory recording, configuration baseline applied
In-Use / Maintenance	Patch management, license compliance, periodic physical verification
Disposal	Data sanitization (NIST SP 800-88), destruction certificates, asset deregistration

Configuration Management Database (CMDB)

The CMDB is the authoritative repository of all Configuration Items (CIs) and their relationships. It is the foundation for change management, incident management, and capacity planning.

Configuration Item (CI): Any component that needs to be managed (servers, apps, network devices, services, documents)
CI Attributes: Owner, version, status, location, relationships to other CIs
Configuration Baseline: Approved configuration state at a specific point in time
Configuration Audit: Verification that actual configurations match CMDB records
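
A configuration audit, as defined above, compares what is actually deployed with what the CMDB records. The sketch below assumes both sides are available as dictionaries keyed by CI name; the CI names, attributes, and the function name are hypothetical, and a real audit would draw the "discovered" side from a discovery or inventory tool.

```python
def configuration_audit(cmdb: dict[str, dict], discovered: dict[str, dict]) -> dict[str, list]:
    """Compare CMDB records with discovered configurations."""
    missing = sorted(set(cmdb) - set(discovered))     # recorded in CMDB, not found
    unrecorded = sorted(set(discovered) - set(cmdb))  # found, but not in CMDB
    drift = []
    for ci in sorted(set(cmdb) & set(discovered)):
        for attr, expected in cmdb[ci].items():
            actual = discovered[ci].get(attr)
            if actual != expected:
                drift.append((ci, attr, expected, actual))
    return {"missing": missing, "unrecorded": unrecorded, "drift": drift}


cmdb = {"web01": {"os": "RHEL 9", "owner": "ecommerce"}, "db01": {"os": "RHEL 8"}}
found = {"web01": {"os": "RHEL 8", "owner": "ecommerce"}, "test99": {"os": "Ubuntu 22.04"}}
print(configuration_audit(cmdb, found))
```

Each of the three result lists maps to a different finding: retired or lost assets, unauthorized or unregistered assets, and configuration drift.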

Software License Management

Maintain accurate inventory of licenses purchased and deployed
Types: perpetual, subscription, concurrent, per-seat, per-CPU
Software Asset Management (SAM) tools automate discovery and compliance
Under-licensing risk: fines and legal exposure
Over-licensing risk: unnecessary costs
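
The under- and over-licensing risks above come down to reconciling entitlements against deployments. A minimal sketch, assuming simple per-product counts; the product names and quantities are made up for illustration, and real SAM tools must also handle license metrics such as per-CPU or concurrent use.

```python
def license_position(purchased: dict[str, int], deployed: dict[str, int]) -> dict[str, int]:
    """Return purchased minus deployed per product.
    Negative = under-licensed (compliance risk); positive = unused licenses (cost)."""
    products = sorted(set(purchased) | set(deployed))
    return {p: purchased.get(p, 0) - deployed.get(p, 0) for p in products}


position = license_position(
    purchased={"OfficeSuite": 100, "CADTool": 10},
    deployed={"OfficeSuite": 120, "CADTool": 4, "DBServer": 2},
)
print(position)  # {'CADTool': 6, 'DBServer': -2, 'OfficeSuite': -20}
```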

Patch Management

Timely patching is one of the most effective security controls. IS auditors review:

Patch identification: subscriptions to vendor advisories and CVE feeds
Patch assessment: risk rating, applicability testing
Patch deployment: testing in non-production → production via change management
Patch verification: confirmation patches applied successfully
SLAs (typical targets; actual values are set by organizational policy): critical patches within 24-72 hours; high within 7-14 days; medium within 30 days
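
An auditor testing patch timeliness can compare release and deployment dates against the SLA for each severity. The sketch below uses the upper bound of each range quoted above as the deadline; the record layout, patch IDs, and dates are hypothetical.

```python
from datetime import datetime, timedelta

# Upper bounds of the ranges quoted above; real targets come from policy.
PATCH_SLA = {
    "critical": timedelta(hours=72),
    "high": timedelta(days=14),
    "medium": timedelta(days=30),
}


def overdue_patches(patches: list[dict], now: datetime) -> list[str]:
    """Return IDs of patches applied late, or still outstanding past their deadline."""
    late = []
    for patch in patches:
        deadline = patch["released"] + PATCH_SLA[patch["severity"]]
        applied = patch.get("applied")  # None means not yet applied
        if (applied or now) > deadline:
            late.append(patch["id"])
    return late


patches = [
    {"id": "P-001", "severity": "critical", "released": datetime(2024, 3, 1),
     "applied": datetime(2024, 3, 2)},
    {"id": "P-002", "severity": "critical", "released": datetime(2024, 3, 1),
     "applied": datetime(2024, 3, 6)},
    {"id": "P-003", "severity": "high", "released": datetime(2024, 3, 10)},
    {"id": "P-004", "severity": "medium", "released": datetime(2024, 2, 1)},
]
print(overdue_patches(patches, now=datetime(2024, 3, 20)))  # ['P-002', 'P-004']
```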

🎯 CISA Key: The CMDB is only valuable if it is kept accurate and up-to-date. Outdated CMDB records undermine incident resolution, change impact analysis, and capacity planning — making configuration audits essential.

Backup, Recovery & Data Management

Backup Types

Type	What Is Backed Up	Restore Time	Storage
Full Backup	All data every time	Fastest restore (single backup set)	Highest storage use
Incremental Backup	Only data changed since last backup (full OR incremental)	Slowest restore (need full + all incrementals)	Lowest storage use
Differential Backup	All data changed since last FULL backup	Medium restore (need full + last differential)	Medium storage use
Continuous Data Protection (CDP)	Every change captured in real-time	Near-zero RPO; very fast restore	Very high storage
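
The restore-time column above follows from which backup sets are needed to rebuild the latest state. The sketch below makes that explicit; it assumes a schedule that uses either incrementals or differentials after each full backup (not a mix), and the labels and function name are illustrative.

```python
def restore_chain(backups: list[tuple[str, str]]) -> list[str]:
    """backups: (label, kind) pairs, oldest first; kind is 'full', 'incr', or 'diff'.
    Returns the labels needed, in order, to restore to the most recent backup.
    Raises ValueError if there is no full backup."""
    last_full = max(i for i, (_, kind) in enumerate(backups) if kind == "full")
    chain = [backups[last_full][0]]
    later = backups[last_full + 1:]
    diffs = [label for label, kind in later if kind == "diff"]
    if diffs:
        # A differential holds everything changed since the full: only the newest is needed.
        chain.append(diffs[-1])
    else:
        # Each incremental holds only changes since the previous backup: all are needed.
        chain.extend(label for label, kind in later if kind == "incr")
    return chain


week_incr = [("Sun", "full"), ("Mon", "incr"), ("Tue", "incr"), ("Wed", "incr")]
week_diff = [("Sun", "full"), ("Mon", "diff"), ("Tue", "diff"), ("Wed", "diff")]
print(restore_chain(week_incr))  # ['Sun', 'Mon', 'Tue', 'Wed']
print(restore_chain(week_diff))  # ['Sun', 'Wed']
```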

Recovery Objectives (Critical CISA Definitions)

Term	Definition	Who Sets It
RTO (Recovery Time Objective)	Maximum acceptable time to restore a service after a disruption	Business (based on business impact)
RPO (Recovery Point Objective)	Maximum acceptable data loss measured in time (e.g., 4 hours of data)	Business (based on data value)
RCO (Recovery Consistency Objective)	How consistent the data must be after recovery (for distributed systems)	Business / IT jointly
MTPD (Maximum Tolerable Period of Disruption)	Maximum time a business function can be unavailable before unacceptable impact	Business

🔵 Key Relationship: RTO must be less than MTPD. RPO determines backup frequency — if RPO is 4 hours, backups must occur at least every 4 hours.
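
The two relationships in the callout above can be checked mechanically when reviewing a BIA against the backup schedule. A minimal sketch, with illustrative function and parameter names and made-up sample values:

```python
from datetime import timedelta


def check_recovery_objectives(rto: timedelta, rpo: timedelta, mtpd: timedelta,
                              backup_interval: timedelta) -> list[str]:
    """Return findings where the stated objectives are inconsistent."""
    findings = []
    if rto >= mtpd:
        findings.append("RTO is not less than MTPD")
    if backup_interval > rpo:
        findings.append("Backup interval exceeds RPO")
    return findings


# RPO of 4 hours but backups only every 6 hours: up to 6 hours of data could be lost.
print(check_recovery_objectives(
    rto=timedelta(hours=8), rpo=timedelta(hours=4),
    mtpd=timedelta(hours=24), backup_interval=timedelta(hours=6),
))  # ['Backup interval exceeds RPO']
```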

Backup Best Practices (3-2-1 Rule)

3 copies of data
2 different storage media types
1 copy offsite (or in cloud)
Backups must be tested regularly — untested backups cannot be relied upon
Backup media must be encrypted (especially offsite/cloud)
Backup access must be restricted to authorized personnel

Data Retention & Archiving

Retention policies must comply with legal, regulatory, and business requirements
Data classification drives retention periods (e.g., financial records are often kept for 7 years; HR record retention varies by jurisdiction)
Legal holds override normal retention/destruction schedules
Archived data must remain accessible for the retention period
Secure disposal at end of retention period (NIST SP 800-88 guidelines)

Data Replication

Type	Description	RPO
Synchronous Replication	Write committed to primary and secondary simultaneously	Near-zero (no data loss)
Asynchronous Replication	Write committed to primary first; secondary updated after	Seconds to minutes of potential data loss

Business Continuity Planning (BCP)

BCP Overview

Business Continuity Planning (BCP) ensures an organization can continue critical business functions during and after a disruptive event. BCP is broader than DRP — it covers all business functions, not just IT.

✅ Key Distinction: BCP covers ALL business functions (people, processes, facilities, suppliers). DRP is the IT subset of BCP focused on recovering IT systems and infrastructure.

BCP Development Process

Project Initiation: Management commitment, scope definition, team formation
Business Impact Analysis (BIA): Identify critical functions, dependencies, RTO, RPO, MTPD
Risk Assessment: Identify threats, vulnerabilities, and likelihood of disruptions
Strategy Development: Select recovery strategies for each critical function
Plan Development: Write BCP documenting procedures, responsibilities, and resources
Testing & Exercises: Validate the plan through various test types
Maintenance & Review: Update plan regularly and after significant changes

Business Impact Analysis (BIA)

The BIA is the foundation of BCP — it identifies what matters most and how quickly it must be restored. BIA outputs:

List of critical business functions ranked by priority
RTO and RPO for each function
MTPD (Maximum Tolerable Period of Disruption)
Resource requirements (people, systems, data, facilities)
Internal and external dependencies
Financial impact of disruption over time

⚠️ CISA Exam Hot Topic: The BIA is performed BEFORE risk assessment and strategy development, immediately after project initiation. The BIA tells you WHAT to protect; risk assessment tells you WHAT threats to protect against. The BIA is the analytical starting point.

BCP Testing Types

Test Type	Description	Disruption Risk
Document Review / Checklist Test	Review plan for completeness and currency	None
Structured Walkthrough (Tabletop)	Team verbally walks through scenarios — discussion-based	None
Simulation Test	Realistic scenario simulated; teams respond as if real (no actual failover)	Low
Parallel Test	Recovery systems activated alongside production; both run simultaneously	Medium — resource intensive
Full Interruption Test	Production systems shut down; full failover to recovery systems	HIGH — most thorough; highest risk

BCP Strategies

Do nothing (accept disruption): For non-critical functions only
Manual workarounds: Paper-based processes during IT outage
Reciprocal agreements: Mutual aid agreements with other organizations (often unreliable: hard to enforce and rarely tested)
Third-party hot/warm/cold sites: Dedicated recovery facilities
Cloud-based recovery: Cloud infrastructure for failover

Disaster Recovery Planning (DRP)

DRP Recovery Site Strategies

Site Type	Description	RTO	Cost
Hot Site	Fully equipped, powered, staffed facility with real-time data replication. Ready within hours	Hours (1-4 hrs)	High
Warm Site	Partially equipped facility; hardware ready but needs data restoration and configuration	Hours to days (12-72 hrs)	Medium
Cold Site	Shell facility with power and connectivity; no equipment (must be procured and installed)	Days to weeks	Lowest
Mobile Site	Portable recovery facility (trailer/container) that can be deployed to any location	Days	Medium
Cloud Recovery	On-demand cloud infrastructure; scales rapidly; pay-per-use	Minutes to hours	Variable (low fixed cost)
Mirrored Site	Identical duplicate of primary site; fully synchronous; near-zero RTO/RPO	Near-zero (minutes)	Highest

DRP Key Concepts

Failover: Automatic or manual switch from primary to recovery systems
Failback: Return to primary systems after disaster is resolved
Switchover: Planned transition (vs. unplanned failover)
Recovery Procedures: Step-by-step instructions for restoring systems in priority order

DRP Plan Components

Disaster declaration criteria and authority (who can declare a disaster)
Emergency contact lists (staff, vendors, regulators)
System recovery priority list (based on BIA)
Step-by-step recovery procedures for each system
Data recovery procedures (backup restoration or replication failover)
Communication plan (internal and external)
Recovery team roles and responsibilities
Return-to-normal (failback) procedures

DRP Testing

IS auditors assess whether DRP testing is:

Conducted at least annually (more frequently for critical systems)
Using documented test plans and test cases
Covering all critical systems identified in the BIA
Measuring actual RTO and RPO achievement against targets
Resulting in documented test results and lessons learned
Driving plan updates to address identified gaps

🎯 Critical Audit Point: An untested DRP is not a reliable DRP. The IS auditor must verify that recovery procedures have been tested, and that actual recovery times were measured against RTO targets — not just assumed to be achievable.

High Availability (HA) Technologies

Clustering: Multiple servers share a workload; automatic failover if one fails
Load Balancing: Distributes traffic across multiple servers for performance and redundancy
RAID (Redundant Array of Independent Disks): Disk redundancy and performance (RAID 1 mirror, RAID 5 parity, RAID 10 stripe+mirror)
UPS (Uninterruptible Power Supply): Protects against short power outages; buys time for generator startup
Generator: Long-term backup power (diesel or gas)

Physical & Environmental Controls

Physical Security — Defense in Depth

Physical security uses layered controls to protect IT assets and facilities:

Perimeter Security: Fencing, barriers, security lighting, CCTV, security guards
Building Access: Badge readers, key locks, visitor management, reception control
Data Center Access: Mantrap (airlock), biometrics, multi-factor authentication, access logs
Server Room / Rack: Locked cabinets, cable management, equipment tags

Data Center Environmental Controls

Control	Purpose	Target / Standard
Air Conditioning / CRAC	Maintain temperature and humidity for equipment reliability	ASHRAE: 18-27°C (64-81°F); 40-60% humidity
Fire Suppression	Detect and suppress fires without damaging equipment	FM-200, Novec 1230 (gaseous); water mist; NOT standard sprinklers near equipment
UPS (Uninterruptible Power Supply)	Continuous power during brief outages and voltage fluctuations	Sufficient capacity for graceful shutdown or generator startup
Generator	Extended backup power during prolonged outages	Regular testing; adequate fuel supply (72+ hours)
Raised Floor	Cable routing and cold air distribution underneath	Hot aisle / cold aisle containment configuration
Water/Leak Detection	Detect water ingress from flooding, HVAC condensation, pipes	Sensors under raised floor and near HVAC units

Mantrap (Airlock)

A mantrap consists of two interlocking doors where the first must close before the second can open. It prevents tailgating/piggybacking — one of the most important physical security controls for data centers.

CCTV & Physical Access Logs

CCTV provides deterrence and forensic evidence — cameras must cover all entry/exit points
Video must be retained for sufficient period (typically 90 days minimum)
Electronic access logs record who entered/exited and when — must be reviewed regularly
Physical access must be reviewed periodically — revoke access for leavers immediately

IS Auditor's Physical Control Review

Walk through the data center — observe controls in practice
Review access logs for unauthorized access or after-hours access
Verify visitor log is maintained and visitors are escorted
Review fire suppression system inspection, test, and maintenance records
Verify UPS and generator test records (frequency, load tested)
Check temperature and humidity monitoring and alert records

Cloud Operations & Resilience

Cloud Service Models & Operational Responsibility

Model	Customer Manages	Provider Manages
IaaS	OS, middleware, runtime, apps, data, access	Physical, network, hypervisor, storage hardware
PaaS	Applications, data, access management	Physical, network, OS, middleware, runtime
SaaS	Data classification, access management, user activity	Everything else including application

Cloud Deployment Models

Public Cloud: Shared infrastructure (AWS, Azure, GCP) — most cost-effective; regulatory concerns
Private Cloud: Dedicated cloud for one organization — greater control; higher cost
Hybrid Cloud: Combination; critical workloads private, general workloads public
Multi-Cloud: Using multiple public cloud providers — avoids vendor lock-in; complexity
Community Cloud: Shared by organizations with common concerns (healthcare, government)

Cloud Resilience Concepts

Availability Zones (AZs): Isolated data centers within a region; independent power/networking
Regions: Geographic areas containing multiple AZs — data sovereignty compliance
Auto-Scaling: Automatically adjusts capacity based on demand
Multi-Region Deployment: Ultimate resilience — survives complete regional outage
Chaos Engineering: Intentionally injecting failures to validate resilience (Netflix Chaos Monkey)

Cloud Operations Audit Considerations

Shared Responsibility Model: Auditor must assess controls on both sides
Data Residency: Verify data stored in legally compliant jurisdictions
Cloud Access Security Broker (CASB): Visibility and control over cloud app usage
Cloud Security Posture Management (CSPM): Continuous misconfiguration detection
Provider Assurance Reports: SOC 2, ISO 27001, CSA STAR certification
Exit Strategy: Data portability, format, migration plan

Cloud-Native Resilience Patterns

Circuit Breaker: Stops cascading failures by breaking connections to failing services
Bulkhead: Isolates failures to prevent them spreading across services
Retry with Backoff: Retries failed requests with increasing delays
Health Checks & Self-Healing: Kubernetes restarts failed containers automatically
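
Two of the patterns above, retry with backoff and circuit breaker, are small enough to sketch directly. The code below is a simplified illustration, not a production implementation: the class and function names are made up, the delay doubles on each attempt, and the breaker has no timed half-open state (a real one would allow a trial call after a cool-down period).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T], attempts: int = 4,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation, waiting base_delay, 2x, 4x, ... between failed attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


class CircuitBreaker:
    """Stops calling a dependency after too many consecutive failures."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.failure_threshold

    def call(self, operation: Callable[[], T]) -> T:
        if self.is_open:
            raise RuntimeError("circuit open: call skipped")
        try:
            result = operation()
        except Exception:
            self.failures += 1
            raise
        self.failures = 0  # a success resets the count
        return result
```

The `sleep` parameter is injectable so the backoff schedule can be tested without real delays. From an audit perspective, the point of both patterns is that failure handling is designed in rather than left to chance.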

Problem, Incident & Event Management

Event vs. Incident vs. Problem (ITIL Definitions)

Term	ITIL Definition	Example
Event	Any change of state with significance for service management	CPU utilization reaches 85% threshold
Incident	Unplanned interruption or reduction in quality of an IT service	Email server down; application slow
Problem	The underlying cause of one or more incidents	Memory leak causing repeated application crashes
Known Error	A problem with a documented root cause and workaround	Known bug with a documented restart workaround pending vendor patch

Incident Management Process

Detection & Logging: Incident identified and logged with timestamp, reporter, description
Classification & Prioritization: Impact × Urgency = Priority; P1-P4 typically
Investigation & Diagnosis: Identify symptoms and potential cause
Resolution & Recovery: Apply fix or workaround; restore service
Closure: Confirm with user; document resolution; update knowledge base
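
The "Impact × Urgency = Priority" step above is usually implemented as a lookup matrix. The sketch below assumes a three-level scale where 1 is the highest impact or urgency, and the cut-offs that map the product to P1-P4 are one plausible choice; each organization defines its own matrix.

```python
def incident_priority(impact: int, urgency: int) -> str:
    """impact and urgency: 1 = high, 2 = medium, 3 = low. Returns 'P1'..'P4'."""
    if impact not in (1, 2, 3) or urgency not in (1, 2, 3):
        raise ValueError("impact and urgency must be 1, 2, or 3")
    score = impact * urgency  # possible values: 1, 2, 3, 4, 6, 9
    if score == 1:
        return "P1"
    if score == 2:
        return "P2"
    if score <= 4:
        return "P3"
    return "P4"


print(incident_priority(1, 1))  # P1: high impact, high urgency
print(incident_priority(2, 2))  # P3
print(incident_priority(3, 3))  # P4
```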

Major Incident Management

Separate process for high-impact incidents (P1/P2)
Major Incident Manager coordinates response
War room / crisis bridge established
Regular stakeholder updates (every 30-60 mins)
Post-Incident Review (PIR) / Root Cause Analysis (RCA) mandatory after major incidents

Problem Management

Problem management identifies and eliminates root causes to prevent incident recurrence. Two modes:

Reactive: Triggered by recurring or major incidents
Proactive: Analysis of trends to prevent future incidents

Root Cause Analysis (RCA) Techniques:

5 Whys: Repeatedly ask "why?" to drill to root cause
Fishbone / Ishikawa Diagram: Visual cause-and-effect analysis
Fault Tree Analysis: Top-down logical diagram of failure paths
Timeline Analysis: Chronological event reconstruction

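Security Incident Response (NIST SP 800-61)

NIST SP 800-61 Rev. 2 defines four lifecycle phases and treats Containment, Eradication, and Recovery as a single phase; the table below lists them separately for study purposes.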

Phase	Key Activities
Preparation	IRP development, tools, training, communication plans
Detection & Analysis	Identify incident, classify severity, collect evidence
Containment	Short-term (isolate) and long-term containment strategies
Eradication	Remove threat (malware, attacker access, vulnerabilities)
Recovery	Restore systems; verify normal operation; monitor closely
Post-Incident Activity	Lessons learned, report, improve controls

⚠️ Evidence Preservation: During security incidents, evidence must be preserved using forensically sound methods (chain of custody, disk imaging, write blockers). Improper handling destroys admissibility in legal proceedings.

📝 Interactive MCQ Bank (103 Questions)

Information Systems Operations & Business Resilience