Secrets and Tokens: Rotation SLAs, Blast Radius, and Attacker Dwell Time

Three things before you read further

  • Long-lived secrets are not a convenience feature. They are a deferred blast radius. Every day a secret does not rotate is a day an attacker who already has it can use it without you knowing.
  • Rotation SLAs only work if you treat them like operational SLAs. Not a policy PDF. Not a quarterly review. A number, an owner, and an alert when the number is missed.
  • Blast radius is a scope problem, not an incident problem. By the time you are calculating blast radius in an incident, the scope decision that set it was made weeks or months earlier. Make it then, not under pressure.

The secret that never rotated

The secret that ruins your week did not arrive that morning. It was provisioned months or years ago, worked fine, and became invisible.

A service account gets provisioned for an integration. The developer sets a client secret, notes the expiry somewhere, and ships. The integration works. Six months later the secret is still the same one, but the developer who set it has moved to a different team. A year later nobody remembers where the secret lives or what it touches. Two years later the secret is still valid, still in use, and the answer to “what can this credential access” is a shrug followed by a Jira ticket that closes unresolved.

This is not an edge case. It is the default state of most credential management programs for non-human identities (NHIs) that have not been deliberately built otherwise.

Rotation is not the exciting part of identity security. It does not get a conference talk. It does not have a compelling demo. It is just the operational discipline that determines how bad the next incident is. And most programs treat it like a nice-to-have until the moment they need it to have been a habit.


What a rotation SLA should look like in practice

A rotation SLA is not a compliance checkbox. It is a commitment about how long a compromised credential can be used before it stops working.

That framing matters. Most teams think about rotation as a hygiene task, something you do on a schedule because policy says so. The more useful frame is attacker dwell time. If your rotation SLA for a production API key is 90 days, you are implicitly accepting that an attacker who obtained that key has up to 90 days of valid access before the next rotation invalidates it. That is the deal you are making, whether you have thought about it that way or not.

Different secrets warrant different SLAs based on three variables: sensitivity of what they access, exposure surface, and reversibility of the actions they can take.

A rough working model:

  • Human-facing OAuth tokens: Interactive access tokens should be short-lived, usually measured in minutes or hours, not weeks. Refresh tokens should be rotated on use where the platform supports it. Sender-constraining is the stronger control where available. RFC 9700 is explicit here: refresh tokens issued to public clients must be either sender-constrained or rotated on use.
  • Service-to-service API keys with write access: 30 days or less. These are the credentials with the most blast radius potential and the most likely to be long-lived by default.
  • Service-to-service API keys with read-only access: 90 days is defensible as a starting point. Google Cloud recommends rotating service account keys at least every 90 days, and AWS Config’s access-keys-rotated rule defaults to 90 days. Read-only limits what an attacker can do. It does not make the credential unimportant.
  • Database credentials for production systems: 30 days is a reasonable ceiling for static credentials. Shorter is better. Dynamic or leased credentials are better still. AWS Secrets Manager supports automated rotation on schedules as short as every four hours, and HashiCorp Vault-style dynamic database credentials reduce the lifetime of the credential itself. If rotation is disruptive, that is usually telling you something true about the fragility of the deployment process.
  • Infrastructure credentials and cloud provider keys: 30 days or less. These are often the most powerful credentials in the environment and frequently the least rotated.
  • Agent and automation tokens: Match the blast radius tier of the agent. A read-only assistant is different from a workflow agent with write access to CRM and ticketing.

These numbers are starting points, not absolutes. The right SLA for your environment depends on your threat model, your detection capability, and your operational maturity. OWASP makes the broader point well: secret lifetime is contextual and should follow what the credential protects, while privileges stay at the minimum required. What matters more than any specific number is that the number exists, is documented, has an owner, and triggers an alert when it is missed.
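
To make "a number, an owner, and an alert" concrete, here is a minimal sketch of the working model as data plus a breach check. The tier names, the Credential record, and the sla_breaches function are hypothetical names for illustration; the day counts are the starting points from the list above and should be replaced with whatever your own threat model supports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical tier names; day counts mirror the starting points listed above.
ROTATION_SLA_DAYS = {
    "service_key_write": 30,
    "service_key_read_only": 90,
    "database_static": 30,
    "infrastructure": 30,
}


@dataclass
class Credential:
    name: str
    tier: str
    owner: str
    last_rotated: datetime


def sla_breaches(credentials, now):
    """Return (credential, days_overdue) for every credential past its SLA."""
    breaches = []
    for cred in credentials:
        deadline = cred.last_rotated + timedelta(days=ROTATION_SLA_DAYS[cred.tier])
        if now > deadline:
            breaches.append((cred, (now - deadline).days))
    return breaches


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
inventory = [
    Credential("billing-writer", "service_key_write", "alice", now - timedelta(days=45)),
    Credential("metrics-reader", "service_key_read_only", "bob", now - timedelta(days=45)),
]
for cred, overdue in sla_breaches(inventory, now):
    print(f"ALERT {cred.name}: {overdue} days past SLA, owner={cred.owner}")
```

Both credentials are 45 days old, but only the write-capable key is in breach. The same age means different things at different tiers, which is the point of setting the SLA per tier rather than globally.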


Blast radius is set before the incident

Blast radius is one of those terms that gets used heavily during incident response and almost never during provisioning. That is exactly backwards.

By the time you are in an incident asking “what can this credential access,” the blast radius was already set. It was set when someone decided what scopes to request, what data to expose to the integration, and what actions to permit without an approval gate. The incident does not create the blast radius. It reveals it.

The practical implication is that blast radius reduction is a provisioning and review discipline, not a response discipline.


Scope at provisioning

The principle here is simple and almost universally ignored: request the minimum scope required for the integration to function today, not the maximum scope you might need eventually.

“We might need write access later” is not a reason to provision write access now. It is a reason to have a process for requesting write access when that day comes. The difference between those two approaches is the difference between a credential compromise that is embarrassing and one that is catastrophic.

For agent credentials specifically, this means separating read from write at the identity level, not the application level. A credential that can both read customer data and write to production systems is a privilege bundle. A bundle means one compromise unlocks everything. Split them. Different credentials, different scopes, different rotation schedules.
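
A quick way to find privilege bundles is to check whether a single credential carries both read and write-class scopes. The sketch below assumes a hypothetical "resource:action" scope naming convention and a hypothetical set of write-class actions; real platforms name scopes differently, so treat the parsing as a placeholder.

```python
# Hypothetical scope naming convention: "<resource>:<action>".
WRITE_ACTIONS = {"write", "delete", "admin"}


def action_of(scope):
    """Return the action part of a "<resource>:<action>" scope string."""
    return scope.rsplit(":", 1)[-1]


def is_privilege_bundle(scopes):
    """True when one credential carries both read and write-class scopes."""
    actions = {action_of(scope) for scope in scopes}
    return "read" in actions and bool(actions & WRITE_ACTIONS)


def split_by_access(scopes):
    """Partition scopes into a read set and a write set, one per credential."""
    read = sorted(s for s in scopes if action_of(s) == "read")
    write = sorted(s for s in scopes if action_of(s) in WRITE_ACTIONS)
    return read, write


agent_scopes = ["customers:read", "tickets:read", "deployments:write"]
if is_privilege_bundle(agent_scopes):
    read_scopes, write_scopes = split_by_access(agent_scopes)
    print("read credential:", read_scopes)
    print("write credential:", write_scopes)
```

The two resulting scope sets can then be issued as separate credentials with separate rotation schedules.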


Effective access vs provisioned access

One of the most consistent findings in identity security assessments is the gap between what a credential is supposed to be able to do and what it can actually do. Role drift, overly permissive policies, inherited access from group memberships, and forgotten direct grants all compound over time.

This is where the rotation SLA argument and the blast radius argument meet. A credential with a 90-day rotation SLA and materially more effective access than was provisioned is a bigger problem than either number suggests on its own. The SLA bounds the exposure window. The effective access determines what an attacker does inside that window.

Regular access reviews that verify effective access, not just provisioned access, are the control that keeps those two numbers in alignment. Without it, your rotation SLA is accurate about time but wrong about exposure.
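
At its core, an effective-access review is a set comparison. The sketch below assumes you can already export two lists per credential: what was approved at provisioning and what the platform reports the credential can actually reach. How you obtain the second list is platform-specific and out of scope here; the permission strings are invented for illustration.

```python
def access_drift(provisioned, effective):
    """Compare what was approved on paper with what the credential can actually do."""
    provisioned, effective = set(provisioned), set(effective)
    return {
        # Reachable but never approved: this is the drift that widens blast radius.
        "excess": sorted(effective - provisioned),
        # Approved but not actually reachable: a candidate for removal from the grant.
        "unused": sorted(provisioned - effective),
    }


# Hypothetical permission names.
provisioned = ["orders:read", "invoices:read"]
effective = ["orders:read", "invoices:read", "customers:read", "orders:write"]

drift = access_drift(provisioned, effective)
print(drift["excess"])
print(drift["unused"])
```

Anything in the excess list is access the rotation SLA is silently covering for. Review it before the next rotation, not after the next incident.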


Making rotation operational

Rotation policy is only as good as the operational muscle behind it. A policy that says secrets rotate every 30 days and a system that does not enforce or monitor that policy is not a 30-day rotation program. It is a document with good intentions.


Secrets management infrastructure

If secrets live in environment variables, CI/CD pipeline configuration, developer dotfiles, or Slack messages, you do not have a secrets management program. You have a scavenger hunt waiting to become an incident.

A secrets manager, whether HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, or Google Cloud Secret Manager, is the minimum infrastructure. The value is not just storage. It is the audit trail, the access controls, the automatic rotation support for supported credential types, and the ability to answer “who accessed this secret and when” in a way that a spreadsheet cannot.

The operational test is simple: if a secret is compromised right now, how long does it take to rotate it and push the new value to every system that needs it? If the answer involves manual steps, a communication chain, and a maintenance window, that process will be skipped or delayed under incident pressure. Build the automation before you need it.
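
One way to run that test is to script the rotation and time it. This is a minimal sketch, not a real integration: the store is a plain dict standing in for a secrets manager, and each consumer is a hypothetical callable that accepts the new value and returns True when it has picked it up.

```python
import secrets
import time


def rotate_and_time(store, name, consumers):
    """Rotate one secret, push it to every consumer, and return elapsed seconds.

    `store` is a dict standing in for a secrets manager. `consumers` maps a
    label to a callable that takes the new value and returns True on success.
    Both are placeholders for real integrations.
    """
    started = time.monotonic()
    new_value = secrets.token_urlsafe(32)
    failed = [label for label, push in consumers.items() if not push(new_value)]
    if failed:
        # Do not promote the new value; the old one stays current for a retry.
        raise RuntimeError(f"rotation of {name} aborted, push failed for: {failed}")
    store[name] = new_value
    return time.monotonic() - started


vault = {"crm-api-key": "old-value"}
received = {}
consumers = {
    "web": lambda value: received.update(web=value) is None,
    "worker": lambda value: received.update(worker=value) is None,
}
elapsed = rotate_and_time(vault, "crm-api-key", consumers)
print(f"rotated and pushed to {len(consumers)} consumers in {elapsed:.3f}s")
```

In a real environment the interesting number is not the script's runtime but everything around it: approvals, restarts, and the consumers nobody knew about until the push failed.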


Expiry alerting

Every secret should have an expiry date. Every expiry date should trigger an alert before it lapses. An expired secret that is still in use because nobody noticed is not a rotation success. It is a rotation failure with extra steps.

The alert threshold matters. A warning that fires with less lead time than the rotation takes to coordinate is not useful. Build the lead time into the threshold. A 30-day credential should alert at day 20 of its life, leaving 10 days to act. A 90-day credential should alert at day 60, leaving 30. The goal is to remove urgency from the rotation task. Urgency is how shortcuts happen.
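
The threshold arithmetic can be sketched as follows. This reads the 20-of-30 and 60-of-90 figures as credential age, which works out to two thirds of the lifetime; that reading is an assumption, as is the rule that the alert moves earlier when the rotation itself needs more lead time than the two-thirds mark leaves. The function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def alert_date(issued, lifetime_days, rotation_effort_days):
    """Date on which the expiry alert should fire.

    Assumes the alert fires at two thirds of the credential's lifetime
    (day 20 of 30, day 60 of 90), moved earlier if that would leave less
    lead time than the rotation takes to coordinate.
    """
    alert_age = lifetime_days * 2 // 3
    latest_safe_age = max(0, lifetime_days - rotation_effort_days)
    return issued + timedelta(days=min(alert_age, latest_safe_age))


issued = date(2025, 1, 1)
print(alert_date(issued, lifetime_days=30, rotation_effort_days=3))
print(alert_date(issued, lifetime_days=90, rotation_effort_days=3))
# A rotation that takes 14 days to coordinate pulls the 30-day alert forward.
print(alert_date(issued, lifetime_days=30, rotation_effort_days=14))
```

The third call is the case worth noticing: when coordination is slow, the alert has to fire earlier than the default threshold, or the lead time is fictional.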


Ownership that survives offboarding

This is the failure mode that produces the scenario at the top of this post. A secret gets provisioned, it works, and then the person who provisioned it leaves. The secret lives on. Its ownership does not.

Every credential needs a named owner and a backup owner. When the primary owner offboards, an automated trigger should reassign ownership and force a review of the credential before the next rotation cycle. Without that, you build an environment where the answer to “who owns this” is always a dead email address and a closed ticket.

This is the same point I made in the NHI Ownership Security Checklist: ownership is not administrative overhead. It is the control that makes every other control possible. Rotation, review, incident response, and offboarding all depend on having a live human accountable for each credential.
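
The offboarding trigger described in this section can be sketched as a small function. The OwnedCredential record and handle_offboarding are hypothetical names; in practice this logic would hang off whatever event your HR or identity system emits when someone leaves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OwnedCredential:
    name: str
    owner: str
    backup_owner: Optional[str]
    needs_review: bool = False


def handle_offboarding(credentials, departing_user):
    """Promote backup owners and flag every affected credential for review.

    Returns the names of credentials left with no live owner, which need
    manual escalation rather than silent survival.
    """
    orphaned = []
    for cred in credentials:
        if cred.backup_owner == departing_user:
            cred.backup_owner = None  # backup slot must be refilled
            cred.needs_review = True
        if cred.owner == departing_user:
            cred.needs_review = True
            if cred.backup_owner:
                cred.owner, cred.backup_owner = cred.backup_owner, None
            else:
                orphaned.append(cred.name)
    return orphaned


creds = [
    OwnedCredential("crm-sync", owner="dana", backup_owner="eli"),
    OwnedCredential("legacy-export", owner="dana", backup_owner=None),
    OwnedCredential("metrics-reader", owner="eli", backup_owner="dana"),
]
orphaned = handle_offboarding(creds, "dana")
print("escalate:", orphaned)
```

The orphaned list is the important output. A credential that reaches it should block the offboarding ticket from closing until someone takes ownership.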


Agent credentials deserve their own treatment

Most of what has been written here applies to all non-human credentials. Agent credentials have a few specific properties worth calling out.

Agents often accumulate credentials over time. A new connector gets added, a new tool gets integrated, and the agent’s credential surface grows. Unlike a static service account where the scope is set at provisioning and rarely revisited, an active agent deployment tends to expand. That expansion needs to trigger a blast radius review, not just a “does it still work” test.

Agents also frequently hold credentials on behalf of users, including OAuth tokens delegated from human accounts for tools the agent is authorized to use. Those delegated tokens need the same rotation treatment as direct agent credentials, with the additional complexity that rotating them may require user re-authorization. They also inherit user-side blast radius and revocation complexity, which means they cannot be treated like ordinary service secrets. Build that flow before you need it.

Finally, agents are among the most likely systems to have long-lived credentials that were “temporary” at provisioning. The pilot becomes production. The prototype never gets rebuilt with proper secrets management. The demo environment becomes a live environment. If your agent inventory or registry does not include credential hygiene as part of the registration checklist, add it now. The window for that discipline is at launch, not six months later.


What to do this week

If you manage or own non-human credentials and want to reduce blast radius without a multi-quarter program, start here.

Start with age. Pull every service account credential, API key, agent token, and automation secret older than 90 days. That is not just an inventory. It is a map of where your current blast radius is quietly sitting. Then assign an owner to every one of them. No owner means no review, no rotation, and no real response path when something breaks or gets abused.
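
The age pull can start as a few lines over whatever export your platforms give you. The sketch below assumes a hypothetical list of records with name, created, and owner fields; the field names and sample data are invented for illustration.

```python
from datetime import date


def stale_inventory(credentials, today, max_age_days=90):
    """Credentials older than the cutoff, oldest first, with unowned ones flagged."""
    stale = [c for c in credentials if (today - c["created"]).days > max_age_days]
    stale.sort(key=lambda c: c["created"])
    return [
        {
            "name": c["name"],
            "age_days": (today - c["created"]).days,
            "owner": c.get("owner") or "UNOWNED",
        }
        for c in stale
    ]


today = date(2025, 6, 1)
export = [
    {"name": "ci-deploy-key", "created": date(2025, 1, 1), "owner": "platform-team"},
    {"name": "crm-integration", "created": date(2023, 6, 1), "owner": None},
    {"name": "new-agent-token", "created": date(2025, 5, 1), "owner": "alice"},
]
for row in stale_inventory(export, today):
    print(row)
```

Sorting oldest first puts the credentials with the longest potential dwell time at the top, and the UNOWNED marker makes the ownership gap impossible to skim past.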

From there, put expiry dates on anything that does not already have one. Define rotation SLAs by blast-radius tier, not by credential label. What matters is not whether something is called an API key or a token. What matters is what it can read, write, and trigger. Then test the runbook on one non-critical credential end to end. Rotate it, push the new value everywhere it is used, and time the whole thing. That number is your real operational cost. It should shape the SLA, not the other way around.
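
Tiering by capability rather than label can be as simple as the sketch below. The mapping is an assumption that reuses the starting points from earlier in this post: anything that can write or trigger actions gets 30 days, read-only gets 90. The function name is hypothetical, and your own tiers may need more than two levels.

```python
def rotation_sla_days(can_read, can_write, can_trigger):
    """Pick a rotation SLA from what the credential can do, not what it is called.

    Assumed mapping: write or trigger capability means the 30-day tier,
    read-only means the 90-day tier.
    """
    if can_write or can_trigger:
        return 30
    if can_read:
        return 90
    raise ValueError("credential has no recorded capabilities; inventory it first")


# An "API key" and a "token" with the same capabilities land in the same tier.
print(rotation_sla_days(can_read=True, can_write=True, can_trigger=False))
print(rotation_sla_days(can_read=True, can_write=False, can_trigger=False))
```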

None of this requires new tooling. It requires discipline and a decision that rotation SLAs are operational commitments, not compliance paperwork.


Supporting references

RFC 9700 on OAuth 2.0 Security Best Current Practice is the current OAuth security baseline for refresh token rotation and privilege restriction. OWASP’s Secrets Management Cheat Sheet is a good practical reference for lifecycle, expiry, and least privilege. Google Cloud guidance on service account key rotation and AWS Config’s access-keys-rotated rule are useful anchors for the 90-day discussion. AWS Secrets Manager rotation schedules and HashiCorp Vault guidance on dynamic database credentials are useful references when you need to turn policy into actual rotation mechanics.


Related reading

NHI Ownership Security Checklist. The ownership control that makes rotation, review, and incident response possible. If you cannot answer who owns this credential, you cannot operationalize an SLA for it.

Identity Is the New Control Plane. Blast radius is determined by effective access, not provisioned intent. This is the framing behind every scope and rotation decision.
