Re-Writing the Playbook — A detection-driven approach to Incident Response

When was the last time you looked at one of your incident response playbooks?

“Playbooks” is one of those terms that gets used in a lot of different contexts within cybersecurity. It’s an amorphous word that shifts and changes depending on the audience. If you’re talking to an engineer, they might think of a SOAR automation. If you’re talking to a CISO, security manager or cyber insurer, they might think of a 180-page monolithic document they paid a very expensive consultant for. If you’re a SOC analyst, it could be literally any one (or combination) of those things.

One of the key factors when designing a detection is the expected triage response an analyst should have if they need to investigate an alert. Before a detection gets moved to production, we as the detection engineers (in this hypothetical scenario) should be giving the analyst some guidance as to what they should look for. This is to orient your analysts and give them direction, straight from the engineer who designed and published it.

I work in a big SOC, so I’m constantly paying attention to how analysts react and respond to the detections we publish for them. Recently, I’ve been dipping my toes into the world of Incident Response by way of designing “Playbooks” an analyst can follow from detection to containment. This is a managed SOC environment, so there’s an additional customer dynamic beyond the analyst that needs to be taken into consideration when we build a detection and a set of response actions. There are more communication channels, more approvals, and more sign-offs needed from tech and non-tech folk alike to authorise an analyst (human or agentic) to do anything. This is where my involvement in designing and implementing incident response playbooks began, and where I felt we could take advantage of the term “playbook” and its amorphous definitions.

This tweet by solst/ICE highlights a pattern of behaviour I’ve seen play out many times before in enterprise cybersecurity. Company A has paid for playbook development by a four-letter acronym consultancy. The compliance people are happy, the leadership team feel assured, and the technical teams are left with a monolithic document (or several) that acts as the official incident response plan. All good, right?

These plans are then swiftly forgotten about and never seen by another human until the next cyber audit occurs (or until they’re ejected from the building the second a real incident occurs, assuming anyone remembers they still exist). Clearly, there is a need for businesses to have defined incident response playbooks. They should be tabletopped and tested. But, they need to be actionable to really stick — and the majority I’ve seen flat out are just not.

I want to highlight an approach to incident response playbooks that applies a Detection Engineering lens to the problem: a structure (if you can call it that) that could spark some inspiration for your own detection engineering and incident response teams.

The Approach

I’d like to introduce to you what I have begun calling “The Incident Response Defense Diamond™️” (Patent pending).

I am not an artist.

The diamond is comprised of 4 major layers — The Compliance layer, High Level Procedures, Detection Response Plans, and Detailed Response Steps (or as some call them, Runbooks). These sections are divided evenly based on whether they serve primarily technical or non-technical audiences. The non-technical audience here would be your C-suite, SOC management team, and policy writers. The technical audience would be engineers, analysts, and incident responders.

These technical layers could also be interpreted as human and non-human layers. A human may write the detection response plans, but an agentic AI or SOAR automation could carry out the detailed response steps and report the results to a human analyst.

The Compliance Layer

This layer is optional and exists for the organisations that have a large monolithic document that describes the organisation’s response to cyber incidents holistically. In my experience, these documents tend to be extraordinarily large and focused entirely on satisfying compliance requirements, instead of providing any kind of actual technical guidance. These documents may reference the existence of other non-technical playbooks, or collate them all inside one large document. I find that these playbooks typically get referred to with terms such as:

  • High level procedures
  • Standard Operating Procedures
  • Operations Plans

If you have any kind of control over what these documents look like, I highly recommend slicing the content out into their own scenario-based (and Defense-Diamond™️ approved) “High Level Procedures” (e.g. Phishing, Account Takeover, someone used our cloud for cryptomining, etc). This allows for much greater agility when designing policy and procedure, and will lessen the overall documentation drift and change management needed to keep these policies up to date. Once you’ve done this, the remaining compliance document should be no longer than 10–20 pages, max.

High Level Procedures

This layer represents probably what most organisations would refer to as their playbooks. These are sets of procedures based around specific scenarios that dictate how a business will respond to that kind of threat. These kinds of scenarios, like the compliance layer above, tend to fall into broad categories such as ‘Phishing’ and ‘Ransomware’. In the diamond, these documents are a high-level set of procedures that gets signed off by management, acting as your written contract allowing you to take remedial actions in case of an incident.

Your phishing playbook, for example, should dictate that we are allowed to disable user accounts, executive or standard user, under any circumstance, at any time, in XYZ systems, for any qualified incident.
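One way to make that written contract checkable by a SOAR workflow or an agentic analyst is to record the pre-approved actions as data. The sketch below is illustrative only: the class, the action name and the system names are hypothetical placeholders, not taken from any product or real procedure.

```python
from dataclasses import dataclass, field


@dataclass
class HighLevelProcedure:
    """A management-approved procedure and the actions it pre-authorises."""

    name: str
    approved_by: str
    # Maps an action name to the set of systems it may be performed in.
    approved_actions: dict[str, frozenset[str]] = field(default_factory=dict)

    def authorises(self, action: str, system: str) -> bool:
        """Return True only if the action is pre-approved for that system."""
        return system in self.approved_actions.get(action, frozenset())


# Hypothetical example: the phishing procedure permits disabling accounts
# in two named systems and nothing else.
phishing = HighLevelProcedure(
    name="Phishing",
    approved_by="CISO",
    approved_actions={
        "disable_user_account": frozenset({"identity_provider", "email_gateway"}),
    },
)

print(phishing.authorises("disable_user_account", "identity_provider"))
print(phishing.authorises("disable_user_account", "payroll"))
```

Anything not listed is denied by default, which mirrors the idea that the signed-off procedure is what grants permission to act.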

Every individual alert or detection can usually be categorised into one (or more) of these high level procedures. This is because, depending on the severity of an incident, a detection, or series of detections, could require the invocation of multiple procedures to contain and remediate.
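As a rough sketch of that mapping, each detection can carry the procedures it belongs to, and an incident made up of several detections invokes the union of them. The detection names and the helper function here are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """A detection rule tagged with the high level procedures it maps to."""

    name: str
    procedures: tuple[str, ...]


def procedures_for_incident(detections):
    """Return the procedures to invoke, de-duplicated, in first-seen order."""
    seen = {}  # dicts preserve insertion order, so this doubles as an ordered set
    for detection in detections:
        for procedure in detection.procedures:
            seen.setdefault(procedure, None)
    return list(seen)


# Hypothetical incident made up of two related detections.
incident = [
    Detection("Suspicious inbox rule created", ("Phishing", "Account Takeover")),
    Detection("Impossible travel sign-in", ("Account Takeover",)),
]

print(procedures_for_incident(incident))
```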

Detection Response Plans

Response plans are where the rubber hits the road and Detection Engineering fully crosses over into Incident Response territory. This is where the detection engineer puts into writing the steps an analyst should take to properly triage and analyse the alert. These response plans should live in a centralised wiki alongside all the other detection metadata, but they can also be embedded into the incident itself as tasks (Splunk ES, Microsoft Sentinel, etc. all support this kind of functionality). Depending on the detection and the skill of the engineer writing it, these may be high-level or incredibly specific:

1. Review the IP address’s ASN and run XYZ query to determine if legitimate.
2. Check the user’s history in Entity Behaviour Analytics to determine if other suspicious activity occurred around the alert triggering.
3. Revoke the user session (hyperlink to runbook/detailed response steps)
… so on and so forth.

Depending on the maturity of your SOC, this is a prime section for an analyst to review prior to a detection rule being deployed. After all, SOC analysts are the ones who will be dealing with the detection should it fire, and they (should) know best. By putting these into a wiki, you also gain the benefit of having SOC analysts be able to review and update these sections with additional investigation steps and queries based on learned experience, further enhancing the SOC’s analysis capability. The above example can also highlight use-cases where you can automate repetitive tasks. By being forced to write a response plan for new detections, you’re constantly forced to ask yourself if an action can be automated.
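If the response plan is stored as structured metadata next to the detection rather than as free text, it can be rendered into incident tasks and queried for automation candidates. This is a minimal sketch under that assumption: the field names, the wiki URL and the choice of which step is marked automatable are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResponseStep:
    """One triage step in a detection response plan."""

    instruction: str
    runbook_url: Optional[str] = None  # link to the detailed response steps
    automatable: bool = False          # flagged by the engineer or the SOC


def render_tasks(steps):
    """Render the plan as numbered task lines for a wiki page or incident."""
    lines = []
    for number, step in enumerate(steps, start=1):
        line = f"{number}. {step.instruction}"
        if step.runbook_url:
            line += f" (runbook: {step.runbook_url})"
        lines.append(line)
    return lines


def automation_candidates(steps):
    """Return the instructions of steps flagged as automatable."""
    return [step.instruction for step in steps if step.automatable]


# The three example steps from the plan above; the URL is a placeholder.
PLAN = [
    ResponseStep("Review the IP address's ASN and run XYZ query.", automatable=True),
    ResponseStep("Check the user's history in Entity Behaviour Analytics."),
    ResponseStep(
        "Revoke the user session.",
        runbook_url="https://wiki.example.internal/runbooks/revoke-session",
    ),
]

for task in render_tasks(PLAN):
    print(task)
print(automation_candidates(PLAN))
```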

Detailed Response Steps (Or, Runbooks)

Detailed response steps (or runbooks) are the micro-detail, technology- and client-specific steps required for an analyst to go and carry out an action. It could be something as simple as “how to log in to a firewall and validate a configuration”, all the way to a collection of common response actions for a particular environment. An example of how simple a single runbook can be is:

1. Connect to XYZ jumpbox with your XYZ account.
2. Log in to the customer’s firewall (with a hyperlink)
3. Click “Firewall Rules -> Add new rule”
4. Fill in the malicious IP address
5. Click apply.
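If a runbook like this one is handed to a SOAR automation or an agent rather than a human, the "fill in the malicious IP address" step deserves a guard rail. The sketch below uses Python's standard `ipaddress` module to validate the address and refuse anything inside networks that must never be blocked. The function name, the rule dictionary layout and the protected ranges are assumptions for illustration, not any firewall vendor's API.

```python
import ipaddress


def build_block_rule(ip_text, protected_networks):
    """Validate an address and return a deny-rule description as a dict.

    Raises ValueError if the input is not an IP address, or if it falls
    inside a network that must never be blocked.
    """
    address = ipaddress.ip_address(ip_text.strip())  # ValueError if malformed
    for network_text in protected_networks:
        if address in ipaddress.ip_network(network_text):
            raise ValueError(
                f"{address} is inside protected network {network_text}"
            )
    return {
        "action": "deny",
        "source": str(address),
        "comment": "Blocked by SOC runbook",
    }


# Hypothetical usage: the customer's internal range is protected.
rule = build_block_rule(" 198.51.100.23 ", ["10.0.0.0/8"])
print(rule)
```

The returned dictionary is only a description of the rule. Pushing it to a real firewall would still go through whatever interface and approvals the customer has agreed to.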

Or you can have a generic collection of runbooks for each client. You may have 10 customers all using Microsoft Defender, but each has its own playbooks with unique idiosyncrasies or rules of engagement. Your detailed response step could say “If the detection appears to be legitimate, initiate the customer’s phishing playbook”, which would then link off to that customer’s set of runbooks for revoking access to a user account.

This is the real technical nitty-gritty, and as a Detection Engineer, the expectation should be for the platform engineer or Service Delivery function to negotiate these processes separately. Your detection and accompanying response plan should be *generic* and be able to hook into each customer seamlessly, regardless of their individual runbooks.
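One possible way to keep the response plan generic is to have it name an abstract action and resolve the customer-specific runbook at triage time. The registry below is a hypothetical sketch: the customer identifiers, action name and URLs are placeholders, and in practice the lookup would more likely sit in a wiki or CMDB than in a Python dictionary.

```python
# Hypothetical registry: (customer, abstract action) -> that customer's runbook.
RUNBOOKS = {
    ("customer_a", "revoke_user_session"):
        "https://wiki.example.internal/customer-a/revoke-session",
    ("customer_b", "revoke_user_session"):
        "https://wiki.example.internal/customer-b/revoke-session",
}


def resolve_runbook(customer, action, runbooks=RUNBOOKS):
    """Return the runbook link a customer has agreed for an abstract action.

    Raises LookupError when nothing has been negotiated, so the gap is
    surfaced to service delivery instead of being guessed at by the analyst.
    """
    try:
        return runbooks[(customer, action)]
    except KeyError:
        raise LookupError(
            f"No runbook for action {action!r} agreed with {customer!r}"
        ) from None


# The same generic plan step resolves differently per customer.
print(resolve_runbook("customer_a", "revoke_user_session"))
print(resolve_runbook("customer_b", "revoke_user_session"))
```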

Wrapping Up

Detection Engineering is more than a framework for building and deploying detection rules — it’s the fusion center between CTI, Incident Response, SOC analysis, and the defensive goals of the business you’re defending. By enhancing your detection rules with response plans, you can bridge the gap between technical and non-technical teams, creating cohesive, actually useful incident response playbooks that are more than just expensive wastes of paper and laminate.


Re-Writing the Playbook — A detection-driven approach to Incident Response was originally published in Detect FYI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Original link: https://detect.fyi/re-writing-the-playbook-a-detection-driven-approach-to-incident-response-5269e2eb33ca?source=rss----d5fd8f494f6a---4