CISA Module 4 – IS Operations & Business Resilience | Study Guide
CISA 2025 EXAM PREP — MODULE 4

Information Systems Operations & Business Resilience

Comprehensive Study Guide with 103 Interactive MCQs · Based on ISACA CISA Review Manual 2025

01

IT Operations Management & Controls

IT Operations Overview

IT Operations encompasses the daily activities required to deliver reliable, secure, and efficient IT services. The IS auditor evaluates whether operations controls are adequate to ensure availability, integrity, and confidentiality of IT services and data.

Key IT Operations Functions

Function | Description | Key Controls
Job Scheduling | Automated scheduling of batch jobs, scripts, and processes | Scheduling authorization, error handling, restart/recovery procedures
Output Management | Controlling and distributing system outputs (reports, print jobs) | Output distribution lists, sensitivity labels, secure disposal
Storage Management | Managing disk, tape, SAN, and cloud storage resources | Capacity monitoring, tiered storage, retention policies
Print Management | Controlling physical output and sensitive document handling | Secure print release, printer access controls, shredding policies
Help Desk / Service Desk | First point of contact for user IT issues and requests | Ticket logging, SLA adherence, escalation procedures
System Monitoring | Real-time monitoring of systems, performance, and security events | Alerting thresholds, on-call procedures, SIEM integration

Segregation of Duties in IT Operations

  • Computer operators should NOT perform system programming
  • Operators should NOT have access to modify production programs
  • Operators should NOT perform security administration
  • No single operator should have unrestricted access to all systems
  • Job scheduling changes should require separate authorization

Operator Activity Logs

All operator activities on critical systems must be logged. IS auditors review logs to detect:

  • Unauthorized commands or system access
  • Bypassing of normal job scheduling
  • Access to sensitive data outside normal job duties
  • Overriding of system controls or error messages

IT Operations Audit Procedures

  • Review operator logs for unusual activity and unauthorized commands
  • Verify job scheduling is authorized and monitored for failures
  • Test restart and recovery procedures for critical jobs
  • Verify output is distributed only to authorized recipients
  • Review shift handover procedures for continuity of operations
  • Verify sensitive output (payroll, financial reports) is controlled and disposed of securely

🎯 CISA Key Point: Operator logs are critical audit evidence. The IS auditor should review logs for completeness, evidence of tampering, and whether logs are reviewed regularly by management.

02

IT Service Management (ITSM) & ITIL

ITIL 4 Service Management Practices

ITIL 4 replaces processes with 34 practices across three categories:

Category | Key Practices
General Management | Risk management, information security management, continual improvement, knowledge management, portfolio management
Service Management | Incident management, problem management, service desk, change enablement, service level management, availability management, capacity & performance management, IT asset management, monitoring & event management, release management, service continuity management
Technical Management | Infrastructure & platform management, software development & management, deployment management

Service Level Management

  • SLA (Service Level Agreement): Agreement between IT and the business customer on service levels
  • OLA (Operational Level Agreement): Internal IT team-to-team agreement supporting an SLA
  • UC (Underpinning Contract): External vendor contract supporting SLA delivery

Availability Management

Key availability concepts the IS auditor must know:

Term | Definition / Formula
Availability % | (Agreed Service Time − Downtime) / Agreed Service Time × 100
MTTR | Mean Time To Repair: average time to restore a failed service
MTBF | Mean Time Between Failures: average time a service runs between failures
MTTF | Mean Time To Failure: average time until failure of a non-repairable component
MTBSI | Mean Time Between Service Incidents: average time from the start of one incident to the start of the next (MTBF + MTTR)

🔵 Exam Formula: Availability = MTBF / (MTBF + MTTR). Higher MTBF and lower MTTR = higher availability. To improve availability: increase MTBF (prevent failures) or decrease MTTR (faster recovery).
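
As a worked illustration of the two availability formulas above, here is a minimal Python sketch. The function names and the sample figures (a 720-hour service month, 3.6 hours of downtime, an MTBF of 498 hours) are assumptions chosen for the example, not values from the CISA Review Manual.

```python
def availability_from_downtime(agreed_service_hours: float, downtime_hours: float) -> float:
    """Availability % = (agreed service time - downtime) / agreed service time * 100."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed service time must be positive")
    return (agreed_service_hours - downtime_hours) / agreed_service_hours * 100


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as a fraction = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative month: 720 agreed hours with 3.6 hours of downtime (about 99.5%).
monthly_pct = availability_from_downtime(720, 3.6)

# Illustrative component: 498 hours between failures, 2 hours to repair (about 0.996).
steady_state = availability_from_mtbf(498, 2)

# Halving MTTR raises availability without touching MTBF.
faster_repair = availability_from_mtbf(498, 1)
```

The last line shows the exam point numerically: with MTBF held constant, a shorter MTTR yields a higher availability figure.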

Continual Service Improvement (CSI)

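ITIL's approach to continually improving services. ITIL v3 defined it through the CSI register and the 7-Step Improvement Process: Identify the improvement strategy → Define what to measure → Gather data → Process data → Analyze → Present and use the information → Implement improvements. ITIL 4 carries the concept forward as the continual improvement practice.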

Service Catalogue Management

The service catalogue documents all IT services offered, their descriptions, SLAs, and dependencies. IS auditors verify the catalogue is maintained, accurate, and used for service request management.

03

Hardware, Network & Infrastructure Controls

Network Architecture & Security Zones

Zone / Component | Purpose | Key Controls
DMZ (Demilitarized Zone) | Hosts public-facing services (web, email, DNS) between internet and internal network | Dual firewalls, no direct internet-to-internal traffic
Internal Network | Core business systems and user workstations | Firewall, IDS/IPS, NAC, VLAN segmentation
Management Network | Out-of-band management of IT infrastructure | Restricted access, separate VLAN, strong authentication
Guest Network | Internet access for visitors/contractors | Isolated from internal network, captive portal, bandwidth limits

Network Security Controls

  • Firewall: Packet filtering, stateful inspection, next-generation (application-aware) firewalls
  • IDS (Intrusion Detection System): Monitors and alerts on suspicious traffic — passive, detection only
  • IPS (Intrusion Prevention System): Monitors and actively blocks suspicious traffic — inline, active prevention
  • WAF (Web Application Firewall): Protects web applications from OWASP Top 10 attacks (SQLi, XSS)
  • VPN (Virtual Private Network): Encrypted tunnels for remote access (IPsec, SSL/TLS)
  • NAC (Network Access Control): Validates device compliance before granting network access
  • Proxy Server: Intermediary for outbound web traffic; content filtering and caching
  • SIEM (Security Information & Event Management): Aggregates, correlates, and analyzes security logs

Network Protocols & Audit Considerations

Protocol | Use | Audit Note
HTTPS / TLS | Encrypted web communication | Verify TLS 1.2+ enforced; certificates valid and managed
SSH | Secure remote server management | Preferred over Telnet; verify key management practices
SNMP v3 | Network device monitoring | v1/v2c send community strings in cleartext; verify v3 with authentication and encryption is used
DNS | Domain name resolution | DNSSEC, split DNS, DNS logging for threat detection
NTP | Time synchronization | Accurate time is critical for log correlation and audit trails

Wireless Network Security

  • WPA3 is the current standard (WEP and WPA are insecure)
  • Enterprise wireless uses 802.1X with RADIUS authentication
  • Rogue access point detection is critical
  • Guest WiFi must be isolated from corporate networks
  • Regular wireless security assessments (site surveys)

Hardware Controls

  • Redundant hardware: RAID, redundant power supplies, dual NICs, clustering
  • Hardware lifecycle: Procurement, inventory, maintenance, secure disposal (data sanitization)
  • Server hardening: Disable unused ports/services, apply patches, remove default accounts
  • Endpoint controls: EDR, full disk encryption, USB port control, patch management

⚠️ CISA Exam Point: NTP (time synchronization) accuracy is a foundational control. When system clocks disagree, events from different systems cannot be reliably sequenced, which undermines log correlation and weakens the evidentiary value of audit trails used in forensic investigations.

04

Capacity & Performance Management

Capacity Management

Capacity management ensures IT infrastructure can meet current and future business demand in a cost-effective manner. Three sub-processes:

  • Business Capacity Management: Translates business requirements into future IT capacity needs
  • Service Capacity Management: Ensures services deliver agreed performance levels
  • Component Capacity Management: Monitors and optimizes individual components (CPU, memory, disk, network)

Capacity Planning Process

  1. Monitor current utilization and performance baselines
  2. Analyze trends and forecast future demand
  3. Model scenarios (organic growth, new applications, acquisitions)
  4. Identify capacity gaps and thresholds
  5. Plan infrastructure additions or optimizations
  6. Review and update the Capacity Plan regularly
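
Steps 1, 2, and 4 of the process above can be sketched as a simple trend projection. This is a minimal illustration assuming equally spaced utilization samples and a straight-line trend; the function name `periods_until_threshold` is hypothetical, and real capacity models usually account for seasonality and step changes as well.

```python
from typing import Optional, Sequence


def periods_until_threshold(utilization: Sequence[float], threshold: float) -> Optional[float]:
    """Fit a least-squares line to equally spaced samples and estimate how many
    periods after the last sample the trend reaches the threshold.
    Returns None when the trend is flat or falling."""
    n = len(utilization)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(utilization) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, utilization)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_x = (threshold - intercept) / slope
    # Clamp at zero: the threshold may already have been reached.
    return max(0.0, crossing_x - (n - 1))


# Illustrative monthly disk utilization (%), growing 5 points per month.
months_left = periods_until_threshold([50, 55, 60, 65], threshold=80)
```

With these sample figures the trend reaches 80% about three months after the last measurement, which is the kind of lead time a capacity plan needs in order to schedule procurement.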

Performance Metrics

Metric | Description | Threshold Action
CPU Utilization | Percentage of processor capacity used | Alert at 80%; action at sustained 90%+
Memory Utilization | RAM usage and paging/swapping frequency | Excessive paging indicates memory pressure
Disk I/O | Read/write throughput and latency | High latency impacts application response time
Network Bandwidth | Traffic volume vs. available capacity | Congestion causes packet loss and latency
Response Time | Time for system to respond to a user request | Compare against SLA targets
Throughput | Number of transactions processed per unit time | Degradation may indicate bottlenecks
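
The CPU thresholds in the table can be expressed as a small classification rule. The sketch below is illustrative only: the 80% and 90% figures come from the table, while treating "sustained" as three consecutive samples is an assumption made for the example.

```python
from typing import Sequence

ALERT_THRESHOLD = 80.0   # from the table: alert at 80%
ACTION_THRESHOLD = 90.0  # from the table: action at sustained 90%+
SUSTAINED_SAMPLES = 3    # assumption: "sustained" means 3 consecutive samples


def classify_cpu(samples: Sequence[float]) -> str:
    """Return 'action', 'alert', or 'ok' for a series of CPU utilization percentages."""
    run = 0
    for value in samples:
        # Count consecutive samples at or above the action threshold.
        run = run + 1 if value >= ACTION_THRESHOLD else 0
        if run >= SUSTAINED_SAMPLES:
            return "action"
    if any(value >= ALERT_THRESHOLD for value in samples):
        return "alert"
    return "ok"
```

For example, `classify_cpu([91, 95, 92])` returns `"action"`, while `classify_cpu([91, 50, 95, 92])` returns `"alert"` because the high readings are not consecutive for long enough.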

Performance Monitoring Tools

  • SNMP-based network monitoring (Nagios, PRTG, SolarWinds)
  • Application Performance Monitoring (APM) — Dynatrace, New Relic, AppDynamics
  • Log aggregation and analytics — Splunk, ELK Stack
  • Cloud-native monitoring — AWS CloudWatch, Azure Monitor, Google Cloud Operations

IS Auditor's Review of Capacity Management

  • Is a formal capacity plan documented and reviewed regularly?
  • Are performance baselines established and monitored?
  • Are alert thresholds defined and acted upon?
  • Does the capacity plan incorporate business growth projections?
  • Is capacity data used to support IT investment decisions?

05

IT Asset & Configuration Management

IT Asset Management (ITAM)

ITAM tracks and manages IT assets (hardware and software) throughout their lifecycle to optimize utilization, control costs, and ensure compliance.

Lifecycle Stage | Key Controls
Procurement | Approved vendor list, purchase authorization, receiving verification
Deployment | Asset tagging, inventory recording, configuration baseline applied
In-Use / Maintenance | Patch management, license compliance, periodic physical verification
Disposal | Data sanitization (NIST SP 800-88), destruction certificates, asset deregistration

Configuration Management Database (CMDB)

The CMDB is the authoritative repository of all Configuration Items (CIs) and their relationships. It is the foundation for change management, incident management, and capacity planning.

  • Configuration Item (CI): Any component that needs to be managed (servers, apps, network devices, services, documents)
  • CI Attributes: Owner, version, status, location, relationships to other CIs
  • Configuration Baseline: Approved configuration state at a specific point in time
  • Configuration Audit: Verification that actual configurations match CMDB records
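
A configuration audit, as described in the last bullet, amounts to comparing CMDB records with what is actually deployed. The following Python sketch shows one way to structure that comparison; the dictionary layout and the CI names in the usage note are hypothetical and do not reflect any particular CMDB product.

```python
from typing import Dict, List

Attributes = Dict[str, str]


def configuration_audit(
    cmdb: Dict[str, Attributes], discovered: Dict[str, Attributes]
) -> Dict[str, List[str]]:
    """Compare CMDB records with the discovered state and list discrepancies."""
    findings: Dict[str, List[str]] = {
        "missing_from_environment": [],
        "unrecorded_in_cmdb": [],
        "attribute_mismatch": [],
    }
    for ci_name, recorded in sorted(cmdb.items()):
        actual = discovered.get(ci_name)
        if actual is None:
            # Recorded in the CMDB but not found by discovery.
            findings["missing_from_environment"].append(ci_name)
        elif any(actual.get(key) != value for key, value in recorded.items()):
            # Found, but at least one recorded attribute differs.
            findings["attribute_mismatch"].append(ci_name)
    # Found by discovery but never recorded in the CMDB.
    findings["unrecorded_in_cmdb"] = sorted(set(discovered) - set(cmdb))
    return findings
```

Given a CMDB that lists `web01` at version 2.4 while discovery reports version 2.6, the function places `web01` under `attribute_mismatch`. Each of the three finding types points to a different control weakness: stale records, unauthorized assets, or changes that bypassed change management.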

Software License Management

  • Maintain accurate inventory of licenses purchased and deployed
  • Types: perpetual, subscription, concurrent, per-seat, per-CPU
  • Software Asset Management (SAM) tools automate discovery and compliance
  • Under-licensing risk: fines and legal exposure
  • Over-licensing risk: unnecessary costs
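
The under-licensing and over-licensing risks above come down to a per-product comparison of entitlements against deployments. Here is a minimal sketch of that reconciliation; the function name and the product names in the usage note are assumptions for illustration, and real SAM tools also handle license metrics such as concurrent or per-CPU counting.

```python
from typing import Dict, Tuple


def license_position(
    purchased: Dict[str, int], deployed: Dict[str, int]
) -> Dict[str, Tuple[str, int]]:
    """Return (status, gap) per product, where gap = purchased - deployed."""
    result: Dict[str, Tuple[str, int]] = {}
    for product in sorted(set(purchased) | set(deployed)):
        gap = purchased.get(product, 0) - deployed.get(product, 0)
        if gap < 0:
            status = "under-licensed"   # legal and financial exposure
        elif gap > 0:
            status = "over-licensed"    # unnecessary cost
        else:
            status = "compliant"
        result[product] = (status, gap)
    return result
```

For instance, 100 purchased seats of a hypothetical `office-suite` against 120 installations yields `("under-licensed", -20)`, and software that is deployed with no purchase record at all shows up with a negative gap as well.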

Patch Management

Timely patching is one of the most effective security controls. IS auditors review:

  • Patch identification: subscriptions to vendor advisories and CVE feeds
  • Patch assessment: risk rating, applicability testing
  • Patch deployment: testing in non-production → production via change management
  • Patch verification: confirmation patches applied successfully
  • SLAs: critical patches within 24-72 hours; high within 7-14 days; medium within 30 days
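
The patch SLAs in the last bullet can be tested mechanically against release and deployment timestamps. The sketch below assumes the upper bound of each range quoted above (72 hours, 14 days, 30 days) as the deadline; an organization's own policy may set different limits.

```python
from datetime import datetime, timedelta

# Assumption: the upper bound of each SLA range above is the hard deadline.
PATCH_SLA = {
    "critical": timedelta(hours=72),
    "high": timedelta(days=14),
    "medium": timedelta(days=30),
}


def patch_overdue(severity: str, released: datetime, applied_or_now: datetime) -> bool:
    """True if the patch was applied (or is still pending) beyond its SLA window."""
    return applied_or_now - released > PATCH_SLA[severity]


# Illustrative case: a critical patch released on 1 January, applied on 5 January.
late = patch_overdue("critical", datetime(2025, 1, 1), datetime(2025, 1, 5))
```

An auditor could run a check of this kind over a sample of patch records and report the proportion that missed the window for each severity level.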

🎯 CISA Key: The CMDB is only valuable if it is kept accurate and up-to-date. Outdated CMDB records undermine incident resolution, change impact analysis, and capacity planning — making configuration audits essential.

06

Backup, Recovery & Data Management

Backup Types

Type | What Is Backed Up | Restore Time | Storage
Full Backup | All data every time | Fastest restore (single backup set) | Highest storage use
Incremental Backup | Only data changed since last backup (full OR incremental) | Slowest restore (need full + all incrementals) | Lowest storage use
Differential Backup | All data changed since last FULL backup | Medium restore (need full + last differential) | Medium storage use
Continuous Data Protection (CDP) | Every change captured in real-time | Near-zero RPO; very fast restore | Very high storage
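
The restore-time column follows from which backup sets a restore needs. The Python sketch below works out that chain for a backup history; the `(label, kind)` tuple format and the day labels are assumptions made for the example.

```python
from typing import List, Tuple

Backup = Tuple[str, str]  # (label, kind); kind is "full", "incremental" or "differential"


def restore_chain(history: List[Backup]) -> List[str]:
    """Return the labels of the backup sets needed to restore to the most recent
    backup, oldest first. `history` must be ordered oldest to newest."""
    full_positions = [i for i, (_, kind) in enumerate(history) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup in history")
    last_full = full_positions[-1]
    chain = [history[last_full][0]]
    after_full = history[last_full + 1:]
    # A differential holds every change since the last full backup, so only the
    # latest one is needed and earlier sets after the full can be skipped.
    diff_positions = [i for i, (_, kind) in enumerate(after_full) if kind == "differential"]
    start = 0
    if diff_positions:
        start = diff_positions[-1]
        chain.append(after_full[start][0])
        start += 1
    # Every incremental taken after that point must be applied in order.
    chain.extend(label for label, kind in after_full[start:] if kind == "incremental")
    return chain
```

A Sunday full backup followed by three daily incrementals needs all four sets to restore, whereas a Sunday full followed by daily differentials needs only the full and the latest differential. That difference is why incremental schemes restore more slowly.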

Recovery Objectives (Critical CISA Definitions)

Term | Definition | Who Sets It
RTO (Recovery Time Objective) | Maximum acceptable time to restore a service after a disruption | Business (based on business impact)
RPO (Recovery Point Objective) | Maximum acceptable data loss measured in time (e.g., 4 hours of data) | Business (based on data value)
RCO (Recovery Consistency Objective) | How consistent the data must be after recovery (for distributed systems) | Business / IT jointly
MTPD (Maximum Tolerable Period of Disruption) | Maximum time a business function can be unavailable before unacceptable impact | Business

🔵 Key Relationship: RTO must be less than MTPD. RPO determines backup frequency — if RPO is 4 hours, backups must occur at least every 4 hours.
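
Both relationships in the note above are simple inequalities, sketched here in Python. The function names are hypothetical, and the check treats the backup interval as the worst-case data loss, which assumes every backup completes successfully.

```python
def meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """Worst-case data loss equals the backup interval, so it must not exceed the RPO."""
    return backup_interval_hours <= rpo_hours


def rto_within_mtpd(rto_hours: float, mtpd_hours: float) -> bool:
    """The recovery target must leave margin before the business tolerance runs out."""
    return rto_hours < mtpd_hours


# Illustrative: nightly backups cannot satisfy a 4-hour RPO.
nightly_ok = meets_rpo(backup_interval_hours=24, rpo_hours=4)
```

Here `nightly_ok` evaluates to `False`, which is the typical audit finding when the backup schedule was set before the BIA defined the RPO.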

Backup Best Practices (3-2-1 Rule)

  • 3 copies of data
  • 2 different storage media types
  • 1 copy offsite (or in cloud)
  • Backups must be tested regularly — untested backups cannot be relied upon
  • Backup media must be encrypted (especially offsite/cloud)
  • Backup access must be restricted to authorized personnel
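
The 3-2-1 rule and the encryption requirement above can be checked against an inventory of backup copies. This is a minimal sketch with a hypothetical `BackupCopy` record; it assumes the inventory passed in includes the production copy as one of the three.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackupCopy:
    media: str       # e.g. "disk", "tape", "cloud"
    offsite: bool
    encrypted: bool


def check_3_2_1(copies: List[BackupCopy]) -> List[str]:
    """Return a list of rule violations; an empty list means the rule is met."""
    issues: List[str] = []
    if len(copies) < 3:
        issues.append("fewer than 3 copies")
    if len({copy.media for copy in copies}) < 2:
        issues.append("fewer than 2 media types")
    if not any(copy.offsite for copy in copies):
        issues.append("no offsite copy")
    if any(copy.offsite and not copy.encrypted for copy in copies):
        issues.append("unencrypted offsite copy")
    return issues
```

An inventory of two on-site disk copies, for example, fails on three counts: too few copies, a single media type, and nothing offsite.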

Data Retention & Archiving

  • Retention policies must comply with legal, regulatory, and business requirements
  • Data classification drives retention periods (e.g., financial records are commonly kept 7 years; HR record periods vary by jurisdiction and record type)
  • Legal holds override normal retention/destruction schedules
  • Archived data must remain accessible for the retention period
  • Secure disposal at end of retention period (NIST SP 800-88 guidelines)

Data Replication

Type | Description | RPO
Synchronous Replication | Write committed to primary and secondary simultaneously | Near-zero (no data loss)
Asynchronous Replication | Write committed to primary first; secondary updated after | Seconds to minutes of potential data loss

07

Business Continuity Planning (BCP)

BCP Overview

Business Continuity Planning (BCP) ensures an organization can continue critical business functions during and after a disruptive event. BCP is broader than DRP — it covers all business functions, not just IT.

Key Distinction: BCP covers ALL business functions (people, processes, facilities, suppliers). DRP is the IT subset of BCP focused on recovering IT systems and infrastructure.

BCP Development Process

  1. Project Initiation: Management commitment, scope definition, team formation
  2. Business Impact Analysis (BIA): Identify critical functions, dependencies, RTO, RPO, MTPD
  3. Risk Assessment: Identify threats, vulnerabilities, and likelihood of disruptions
  4. Strategy Development: Select recovery strategies for each critical function
  5. Plan Development: Write BCP documenting procedures, responsibilities, and resources
  6. Testing & Exercises: Validate the plan through various test types
  7. Maintenance & Review: Update plan regularly and after significant changes

Business Impact Analysis (BIA)

The BIA is the foundation of BCP — it identifies what matters most and how quickly it must be restored. BIA outputs:

  • List of critical business functions ranked by priority
  • RTO and RPO for each function
  • MTPD (Maximum Tolerable Period of Disruption)
  • Resource requirements (people, systems, data, facilities)
  • Internal and external dependencies
  • Financial impact of disruption over time

⚠️ CISA Exam Hot Topic: The BIA is performed BEFORE risk assessment and strategy development. The BIA tells you WHAT to protect; risk assessment tells you WHAT threats to protect against. BIA is the starting point.

BCP Testing Types

Test Type | Description | Disruption Risk
Document Review / Checklist Test | Review plan for completeness and currency | None
Structured Walkthrough (Tabletop) | Team verbally walks through scenarios; discussion-based | None
Simulation Test | Realistic scenario simulated; teams respond as if real (no actual failover) | Low
Parallel Test | Recovery systems activated alongside production; both run simultaneously | Medium (resource intensive)
Full Interruption Test | Production systems shut down; full failover to recovery systems | High (most thorough; highest risk)

BCP Strategies

  • Do nothing (accept disruption): For non-critical functions only
  • Manual workarounds: Paper-based processes during IT outage
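  • Reciprocal agreements: Mutual aid agreements with other organizations (often unreliable in practice: the partner may be hit by the same event, may lack spare capacity, and the agreement is rarely tested or enforceable)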
  • Third-party hot/warm/cold sites: Dedicated recovery facilities
  • Cloud-based recovery: Cloud infrastructure for failover

08

Disaster Recovery Planning (DRP)

DRP Recovery Site Strategies

Site Type | Description | RTO | Cost
Hot Site | Fully equipped, powered, staffed facility with real-time data replication; ready within hours | Hours (1-4 hrs) | High
Warm Site | Partially equipped facility; hardware ready but needs data restoration and configuration | Hours to days (12-72 hrs) | Medium
Cold Site | Shell facility with power and connectivity; no equipment, which must be procured and installed | Days to weeks | Lowest
Mobile Site | Portable recovery facility (trailer/container) that can be deployed to any location | Days | Medium
Cloud Recovery | On-demand cloud infrastructure; scales rapidly; pay-per-use | Minutes to hours | Variable (low fixed cost)
Mirrored Site | Identical duplicate of primary site; fully synchronous; near-zero RTO/RPO | Near-zero (minutes) | Highest

DRP Key Concepts

  • Failover: Automatic or manual switch from primary to recovery systems
  • Failback: Return to primary systems after disaster is resolved
  • Switchover: Planned transition (vs. unplanned failover)
  • Recovery Procedures: Step-by-step instructions for restoring systems in priority order

DRP Plan Components

  • Disaster declaration criteria and authority (who can declare a disaster)
  • Emergency contact lists (staff, vendors, regulators)
  • System recovery priority list (based on BIA)
  • Step-by-step recovery procedures for each system
  • Data recovery procedures (backup restoration or replication failover)
  • Communication plan (internal and external)
  • Recovery team roles and responsibilities
  • Return-to-normal (failback) procedures

DRP Testing

IS auditors assess whether DRP testing is:

  • Conducted at least annually (more frequently for critical systems)
  • Using documented test plans and test cases
  • Covering all critical systems identified in the BIA
  • Measuring actual RTO and RPO achievement against targets
  • Resulting in documented test results and lessons learned
  • Driving plan updates to address identified gaps
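
The fourth criterion above, measuring actual recovery times against targets, lends itself to a simple comparison. The sketch below is illustrative: the function name `rto_gaps` and the system names in the usage note are hypothetical, and all times are expressed in hours.

```python
from typing import Dict


def rto_gaps(targets_hours: Dict[str, float], measured_hours: Dict[str, float]) -> Dict[str, str]:
    """List systems that missed their RTO in the last test or were not tested at all."""
    gaps: Dict[str, str] = {}
    for system, target in targets_hours.items():
        measured = measured_hours.get(system)
        if measured is None:
            # A BIA-critical system with no test result is itself a finding.
            gaps[system] = "not tested"
        elif measured > target:
            gaps[system] = f"exceeded target by {measured - target:g} h"
    return gaps
```

With targets of 4, 24, and 48 hours for hypothetical systems `payments`, `email`, and `hr`, and measured recoveries of 6.5 and 10 hours for the first two, the function reports `payments` as exceeding its target by 2.5 hours and `hr` as not tested.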

🎯 Critical Audit Point: An untested DRP is not a reliable DRP. The IS auditor must verify that recovery procedures have been tested, and that actual recovery times were measured against RTO targets — not just assumed to be achievable.

High Availability (HA) Technologies

  • Clustering: Multiple servers share a workload; automatic failover if one fails
  • Load Balancing: Distributes traffic across multiple servers for performance and redundancy
  • RAID (Redundant Array of Independent Disks): Disk redundancy and performance (RAID 1 mirror, RAID 5 parity, RAID 10 stripe+mirror)
  • UPS (Uninterruptible Power Supply): Protects against short power outages; buys time for generator startup
  • Generator: Long-term backup power (diesel or gas)

09

Physical & Environmental Controls

Physical Security — Defense in Depth

Physical security uses layered controls to protect IT assets and facilities:

  1. Perimeter Security: Fencing, barriers, security lighting, CCTV, security guards
  2. Building Access: Badge readers, key locks, visitor management, reception control
  3. Data Center Access: Mantrap (airlock), biometrics, multi-factor authentication, access logs
  4. Server Room / Rack: Locked cabinets, cable management, equipment tags

Data Center Environmental Controls

Control | Purpose | Target / Standard
Air Conditioning / CRAC | Maintain temperature and humidity for equipment reliability | ASHRAE: 18-27°C (64-81°F); 40-60% humidity
Fire Suppression | Detect and suppress fires without damaging equipment | FM-200, Novec 1230 (gaseous); water mist; NOT standard sprinklers near equipment
UPS (Uninterruptible Power Supply) | Continuous power during brief outages and voltage fluctuations | Sufficient capacity for graceful shutdown or generator startup
Generator | Extended backup power during prolonged outages | Regular testing; adequate fuel supply (72+ hours)
Raised Floor | Cable routing and cold air distribution underneath | Hot aisle / cold aisle containment configuration
Water/Leak Detection | Detect water ingress from flooding, HVAC condensation, pipes | Sensors under raised floor and near HVAC units

Mantrap (Airlock)

A mantrap consists of two interlocking doors where the first must close before the second can open. It prevents tailgating/piggybacking — one of the most important physical security controls for data centers.

CCTV & Physical Access Logs

  • CCTV provides deterrence and forensic evidence — cameras must cover all entry/exit points
  • Video must be retained for sufficient period (typically 90 days minimum)
  • Electronic access logs record who entered/exited and when — must be reviewed regularly
  • Physical access must be reviewed periodically — revoke access for leavers immediately

IS Auditor's Physical Control Review

  • Walk through the data center — observe controls in practice
  • Review access logs for unauthorized access or after-hours access
  • Verify visitor log is maintained and visitors are escorted
  • Test fire suppression system records and maintenance logs
  • Verify UPS and generator test records (frequency, load tested)
  • Check temperature and humidity monitoring and alert records

10

Cloud Operations & Resilience

Cloud Service Models & Operational Responsibility

Model | Customer Manages | Provider Manages
IaaS | OS, middleware, runtime, apps, data, access | Physical, network, hypervisor, storage hardware
PaaS | Applications, data, access management | Physical, network, OS, middleware, runtime
SaaS | Data classification, access management, user activity | Everything else including application

Cloud Deployment Models

  • Public Cloud: Shared infrastructure (AWS, Azure, GCP) — most cost-effective; regulatory concerns
  • Private Cloud: Dedicated cloud for one organization — greater control; higher cost
  • Hybrid Cloud: Combination; critical workloads private, general workloads public
  • Multi-Cloud: Using multiple public cloud providers — avoids vendor lock-in; complexity
  • Community Cloud: Shared by organizations with common concerns (healthcare, government)

Cloud Resilience Concepts

  • Availability Zones (AZs): Isolated data centers within a region; independent power/networking
  • Regions: Geographic areas containing multiple AZs — data sovereignty compliance
  • Auto-Scaling: Automatically adjusts capacity based on demand
  • Multi-Region Deployment: Ultimate resilience — survives complete regional outage
  • Chaos Engineering: Intentionally injecting failures to validate resilience (Netflix Chaos Monkey)

Cloud Operations Audit Considerations

  • Shared Responsibility Model: Auditor must assess controls on both sides
  • Data Residency: Verify data stored in legally compliant jurisdictions
  • Cloud Access Security Broker (CASB): Visibility and control over cloud app usage
  • Cloud Security Posture Management (CSPM): Continuous misconfiguration detection
  • Provider Assurance Reports: SOC 2, ISO 27001, CSA STAR certification
  • Exit Strategy: Data portability, format, migration plan

Cloud-Native Resilience Patterns

  • Circuit Breaker: Stops cascading failures by breaking connections to failing services
  • Bulkhead: Isolates failures to prevent them spreading across services
  • Retry with Backoff: Retries failed requests with increasing delays
  • Health Checks & Self-Healing: Kubernetes restarts failed containers automatically
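
Of the patterns above, retry with backoff is the simplest to show in code. The following is a minimal Python sketch with exponential delays; the function name, the default attempt count, and the base delay are assumptions for illustration. Production implementations usually also add random jitter and retry only on errors known to be transient.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on failure with delays of base, 2x base, 4x base, ..."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")
```

The `sleep` parameter exists so the delay schedule can be observed or replaced in tests without actually waiting. An operation that fails twice and then succeeds will wait 0.5 and then 1.0 seconds with the defaults shown.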

11

Problem, Incident & Event Management

Event vs. Incident vs. Problem (ITIL Definitions)

Term | ITIL Definition | Example
Event | Any change of state with significance for service management | CPU utilization reaches 85% threshold
Incident | Unplanned interruption or reduction in quality of an IT service | Email server down; application slow
Problem | The underlying cause of one or more incidents | Memory leak causing repeated application crashes
Known Error | A problem with a documented root cause and workaround | Known bug with a documented restart workaround pending vendor patch

Incident Management Process

  1. Detection & Logging: Incident identified and logged with timestamp, reporter, description
  2. Classification & Prioritization: Impact × Urgency = Priority; P1-P4 typically
  3. Investigation & Diagnosis: Identify symptoms and potential cause
  4. Resolution & Recovery: Apply fix or workaround; restore service
  5. Closure: Confirm with user; document resolution; update knowledge base
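
The prioritization rule in step 2 (Impact × Urgency = Priority) can be sketched as a small scoring function. The 1-to-3 scales and the score bands that map to P1 through P4 are assumptions for this example; each organization defines its own matrix.

```python
def priority(impact: int, urgency: int) -> str:
    """Map impact and urgency (each 1 = low, 2 = medium, 3 = high) to P1..P4."""
    if impact not in (1, 2, 3) or urgency not in (1, 2, 3):
        raise ValueError("impact and urgency must be 1, 2 or 3")
    score = impact * urgency  # possible products: 1, 2, 3, 4, 6, 9
    if score >= 9:
        return "P1"
    if score >= 6:
        return "P2"
    if score >= 3:
        return "P3"
    return "P4"
```

Under these assumed bands, a high-impact, high-urgency incident is P1, while a high-impact incident with medium urgency is P2.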

Major Incident Management

  • Separate process for high-impact incidents (P1/P2)
  • Major Incident Manager coordinates response
  • War room / crisis bridge established
  • Regular stakeholder updates (every 30-60 mins)
  • Post-Incident Review (PIR) / Root Cause Analysis (RCA) mandatory after major incidents

Problem Management

Problem management identifies and eliminates root causes to prevent incident recurrence. Two modes:

  • Reactive: Triggered by recurring or major incidents
  • Proactive: Analysis of trends to prevent future incidents

Root Cause Analysis (RCA) Techniques:

  • 5 Whys: Repeatedly ask "why?" to drill to root cause
  • Fishbone / Ishikawa Diagram: Visual cause-and-effect analysis
  • Fault Tree Analysis: Top-down logical diagram of failure paths
  • Timeline Analysis: Chronological event reconstruction

Security Incident Response (NIST SP 800-61)

Phase | Key Activities
Preparation | IRP development, tools, training, communication plans
Detection & Analysis | Identify incident, classify severity, collect evidence
Containment | Short-term (isolate) and long-term containment strategies
Eradication | Remove threat (malware, attacker access, vulnerabilities)
Recovery | Restore systems; verify normal operation; monitor closely
Post-Incident Activity | Lessons learned, report, improve controls

⚠️ Evidence Preservation: During security incidents, evidence must be preserved using forensically sound methods (chain of custody, disk imaging, write blockers). Improper handling destroys admissibility in legal proceedings.

M A Fazal & Co.