A new paradigm and operating system for security work

Earlier this year I joined Clay, an automation and orchestration SaaS for RevOps, Sales, and Marketing. Clay enables these teams to experiment with and execute different go-to-market motions. The core primitive is a table where columns are data enrichments from various sources or LLM-based actions (including browsing the web). An example table starts with a target account's company and domain, uses LLMs to research industry, size, geographical focus, fundraising data or SEC filings, key stakeholders and recent conferences they attended, and generates talking points for approaching this lead and account.

This structure is powerful for GTM motions. Though Clay has specific ICPs in mind and builds its use cases and data partners around them, I believe a similar approach can transform security and compliance work.

Security teams struggle to separate meaningful from performative work. Impact and risk can be hard to measure. You must raise the floor across the board while raising the ceiling in areas core to your offering. You serve multiple teams, balancing deviations within centralized frameworks. As companies grow, removing implementations becomes harder: you lose track of the compliance controls that depend on them. People avoid removing anything, resulting in bloat.

I propose a paradigm and operating system to help with security and compliance work: first collect all task sources, enrich each task with internal and external knowledge to quantify its marginal security and compliance value, then use rules to orchestrate how each should be prioritized, performed, documented, and automated. Over time, the system improves by learning from previous work. Filtered views can be generated for each stakeholder. Think of Superhuman, the email client, where you see historical context for each correspondent, construct various filter rules, and even have AI draft emails.

Sources of Work

Aggregate all tasks into one constantly updated database: roadmap items, incoming requests, recurring compliance reviews, vulnerability scanners, posture management tools (e.g., CSPM, ASPM), SIEM alerts, and compliance platforms like Vanta. Start small and connect more systems over time. The more systems connected, the better the operating system performs.
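As a rough sketch, assuming a Python implementation, a unified task record could look like the following (field names and connector names are my own illustrations, not an existing schema):

    from dataclasses import dataclass, field

    @dataclass
    class Task:
        """One row in the unified work database, regardless of source."""
        id: str
        source: str   # e.g. "jira", "cspm", "siem", "vanta"
        kind: str     # e.g. "vulnerability", "compliance_review", "request"
        title: str
        raw: dict = field(default_factory=dict)          # original source payload
        enrichments: dict = field(default_factory=dict)  # filled in later

    def ingest(connector_name: str, payload: dict) -> Task:
        """Normalize a source-specific payload into a Task (mapping is per connector)."""
        return Task(
            id=f"{connector_name}:{payload['id']}",
            source=connector_name,
            kind=payload.get("type", "unknown"),
            title=payload.get("title", ""),
            raw=payload,
        )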

Enrichment

For each item, enrich from three sources:

  1. Self-enrichment: Correlate with other tasks in the database, including similar and related tasks, and ones forming parent-child hierarchies. This gives insight into how similar tasks were handled in the past, including any possible automations. Hierarchies enable grouping, summaries, and effective browsing of the database.

  2. Internal enrichment: Add context from controls, procedures, documentation, playbooks, discussions, decisions, upcoming roadmaps, and accepted risks. Creative enrichment sources include read-only access to your AWS environment, GitHub analysis for attribution, and even chatting with other team members over Slack.

  3. External enrichment: Pull from compliance frameworks, security news, threat intel, and vulnerability databases. More creatively, it can research open-source libraries on GitHub, or read security advisories and interpret whether they apply.

This enrichment provides necessary context for both humans and LLMs in later steps. It also quantifies the cost of delay, whether a contractual commitment, a regulatory obligation, or the exploitability of a vulnerability. Enrichments are stored in a structured format, with fields dependent on task type, plus free text for LLMs.
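Continuing the sketch above, an enrichment pass might populate a structure like the following (all field names are assumptions for illustration):

    # A minimal sketch of an enrichment pass over a Task. Typed fields vary by
    # task kind; a free-text field carries narrative context for LLMs.
    def enrich(task, related_tasks, internal_context, external_feeds):
        task.enrichments = {
            "self": {
                "related_ids": [t.id for t in related_tasks],
                "parent_id": None,  # filled in when a hierarchy is detected
            },
            "internal": {
                "controls": internal_context.get("controls", []),
                "accepted_risks": internal_context.get("accepted_risks", []),
            },
            "external": {
                "advisories": external_feeds.get(task.title, []),
            },
            # Cost of delay in days: contractual, regulatory, or exploit-driven.
            "cost_of_delay_days": internal_context.get("deadline_days"),
            "free_text": "",  # LLM-written narrative summary goes here
        }
        return task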

Orchestration and Automation

With context established, route tasks to different outcomes: prioritized versus de-prioritized (risk accepted), individual fixes versus one project to fix the whole class, and which team should take it, or whether AI agents should try first. Rules plus AI recommendations ensure focus on the most important work at the right abstraction.
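A minimal routing sketch, continuing the same assumed structures (rule shapes and outcome names are illustrative):

    # Ordered rules first, an AI recommendation as the fallback for anything
    # the rules do not match.
    RULES = [
        # No cost of delay on record: de-prioritize (risk accepted).
        (lambda t: t.enrichments.get("cost_of_delay_days") is None, "deprioritize"),
        # A vulnerability with applicable advisories: consider fixing the class.
        (lambda t: t.kind == "vulnerability"
                   and t.enrichments["external"]["advisories"], "fix_whole_class"),
    ]

    def route(task, ai_recommend):
        for predicate, outcome in RULES:
            if predicate(task):
                return outcome
        return ai_recommend(task)  # e.g. an LLM proposing team or agent handling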

Continuous Retrospection

The operating system continuously analyzes past work. When a task type increases in volume, it proposes a project to eliminate that class of tasks wholesale. When a control becomes thin or sloppy on evidence, it raises the control's priority. When it detects redundancy, it suggests ways to keep operations lean and meaningful.
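One piece of this retrospection, flagging task types that are growing in volume, could be as simple as the following sketch under the same assumed structures:

    from collections import Counter

    # Flag task kinds whose volume grew past a threshold, suggesting a project
    # to eliminate the class wholesale.
    def propose_projects(tasks_this_quarter, tasks_last_quarter, threshold=1.5):
        now = Counter(t.kind for t in tasks_this_quarter)
        before = Counter(t.kind for t in tasks_last_quarter)
        return [kind for kind, n in now.items()
                if before.get(kind) and n / before[kind] >= threshold]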

Custom Views

To work effectively with so much data, you can create custom filter-based views for different stakeholders, similar to database views. This allows you to prune irrelevant tasks, whether horizontally, vertically, or using more complex filters.
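For illustration, under the same assumed structures, views could be as simple as named predicates:

    # Hypothetical stakeholder views: named filters over the same database,
    # analogous to database views.
    VIEWS = {
        "engineering": lambda t: t.kind == "vulnerability",
        "grc":         lambda t: t.kind == "compliance_review",
        "leadership":  lambda t: (t.enrichments.get("cost_of_delay_days") or 0) > 30,
    }

    def view(tasks, stakeholder):
        return [t for t in tasks if VIEWS[stakeholder](t)]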

Key Differentiators

This paradigm introduces four core features:

  • Global visibility enables maximum context and prioritization

  • Enrichment adds internal and external context, which becomes especially important as more work shifts to AI agents

  • Routing engine triages and routes all tasks, reducing mental burden and maintaining team focus

  • Self-improvement through continuous retrospection


To be clear, nothing has been built, and I don't think there is readily available off-the-shelf software that would work without a bunch of customization. I would love to get feedback.

On AI agents using browsers and use cases in compliance

Agents and Browsers

In recent months, both Anthropic and OpenAI have shown how an AI agent can independently use a web browser to interact with the Internet. I am optimistic about having agents use the browser to enable new use cases, and want to share some thoughts and observations from my recent learnings in this area.

Browser vs API

The most obvious observation is the dual option of accessing websites through the browser UI or through the API. Historically, machine access leaned on the API side, where all interactions follow specific protocols and outcomes are exact. However, API support is rarely complete, leaving the browser path as the first-class citizen.

Now that AI agents can use a browser to access the web just as a human would, the complete set of available information can be accessed. Without the rigor that APIs offer, there is a layer of uncertainty, e.g., from hallucination. But arguably this is not specific to AIs, as humans can also get lost on a website, click on a wrong link, or generally take a while to figure something out.

An adaptive model is likely the answer here, where APIs are used for mainstream sites and interactions, and the browser for the long tail. In this realm, we are seeing new crawlers such as Firecrawl that can convert a website into a structured, LLM-ready format, and Stainless, which can generate API client libraries automatically. So it's possible that the future will still be API-based, but dynamically generated on demand.
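A sketch of the adaptive model, with assumed client and agent interfaces:

    # Hypothetical adaptive access: try a known API integration first,
    # fall back to driving a browser for the long tail of sites.
    def fetch(site: str, query: dict, api_clients: dict, browser_agent):
        client = api_clients.get(site)
        if client is not None:
            try:
                return client.call(query)      # exact, protocol-governed path
            except Exception:
                pass                           # e.g. endpoint not covered by the API
        return browser_agent.run(site, query)  # flexible but less deterministic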

I also think that browser-based interactions will deal a blow to companies whose moats are their collections of API integrations. It used to be that maintaining API integrations was tedious, and once you had a collection of them, it was harder for a potential competitor to enter the race. Browsers reset the race, and a library of APIs is now worth far less than previously thought.

Screen vs DOM vs Accessibility Tree

There are a few different ways to have an AI agent interact with a browser. The most classical is the DOM, which is generated by the browser and its Javascript engine; this is what automated QA testing tools such as Selenium, Playwright, and Puppeteer use. The accessibility tree is a related structure provided by the browser to make websites more accessible. Lastly, the Anthropic Computer Use demo, from late 2024, uses screenshots and asks the LLM for the next action, for example typing and clicking at specific locations on the screen.

I am optimistic about the accessibility tree, but it has a few main hurdles to overcome: performing actions through the accessibility tree, as I believe it is currently read-only; and a component-level mapping between the accessibility tree and the DOM, allowing fast switching between the two representations.
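To make the representations concrete, here is a minimal Playwright sketch in Python; note that Playwright's accessibility snapshot is read-only, matching the limitation above:

    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com")
        dom = page.content()                  # full DOM serialized as HTML
        tree = page.accessibility.snapshot()  # accessibility tree; read-only
        page.click("a")                       # actions are expressed against the DOM
        browser.close()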

It is too early for a clear winner to have emerged, but this is an interesting space to watch.

Browser enhancements as assistance

I look forward to browser-level enhancements that help AI agents navigate. For example, one problem I am seeing today is that LLMs interpreting a screenshot have trouble recognizing dropdowns, scroll bars, scroll bars within dropdown boxes, and other UI components that humans have grown accustomed to. It is probably possible to fine-tune on UI components, but I suspect there is also a long tail of custom styles (or maybe I shouldn't underestimate AI?).

Since screenshots lose too much information on UI components, we will see annotation of UI components (I believe Stagehand is trying this using CSS), or ways to take smart screenshots that are not just 2D graphics. The browser knows exactly what and where the UI components are, so it shouldn't be difficult to relay that information to the LLM.

Agents and Compliance

Somewhat related, there are some very good use cases in (security) compliance that are now within reach, and we’ve seen some companies starting to tackle them. No doubt more companies will follow, as well as established companies expanding into these.

Audit support agent to collect evidence: Historically already possible through companies like Vanta and Drata, but a browser-based approach removes auditor concerns about data integrity and, more importantly, no longer relies on vendors' API integrations. Since a security audit of a non-trivial environment requires so many screenshots and spreadsheets, there are many hours to be saved here through automation.

Crafting the best controls based on existing process/evidence: In the opposite direction, an agent can observe what is actually being done in terms of process and configuration, and come up with the best and leanest set of controls that represents the given environment. This would help two use cases. First, many startups are moving too quickly to document properly, so their security practice may actually be more advanced than what is documented. Second, many complex environments have redundant and overlapping controls accumulated through historical growth and M&A activity, to the point where people are afraid of changing key configurations or processes for fear of non-compliance. Such an agent can assess how a proposed change would impact compliance controls.

Continuous cross-checking of actual state vs written policies & procedures: This is the general case of the two use cases above, and in a way an AI-native solution doing what Vanta and Drata already do, but more suitable for complex environments. Think of a GRC agent that continuously performs housekeeping tasks. I believe Zania is heading in this direction.

An auditor that is actually AI: Most people who talk about an “AI auditor” today probably mean an audit of an AI system. I am personally most excited about replacing human auditors with AI. Setting aside accreditation bodies such as the AICPA for a moment, much of reviewing audit evidence such as documents, spreadsheets, and screenshots can already be automated, and the last 10-20% is also within reach with some development. In fact, companies use only so many email providers, cloud providers, and code hosting providers that I am surprised we have not seen more automation and AI in evidence review. LLMs are, by definition, about language, and compliance should be the perfect use case for them within the cybersecurity sector.

I really would like to see an AI agent performing security audits (perhaps with a senior human to override and allow exceptions in rare cases), as a truly impartial entity that is also not constrained by scheduling logistics and turnaround times. You also won't need to teach a junior auditor what Kubernetes is, only for them to then audit whether you've configured your Kubernetes correctly, as I have experienced. I actually think this AI auditor can send a stronger signal than human auditors because of this true independence. Ideally, for turf and commercial reasons, this AI auditor performs assessments against a neutral framework, likely NIST if the company is based in the US. My favorite is NIST 800-171.

AI-based auditor connecting into a system to assess it, skipping the evidence stage altogether: This is the continuous compliance version of the use case above, skipping all intermediary artifacts such as screenshots. Though it sounds nice, there are practical barriers, such as the auditee not wanting an auditor watching their environment 24/7.


LVSS: A framework to prioritize vulnerabilities

In this post I present a framework to prioritize vulnerabilities and determine how urgently each needs to be fixed. The motivation is that scoring systems such as CVSS show only one part of the picture, and prioritization methods such as impact × likelihood are too subjective to scale in a company and to be applied consistently by different people. To be clear, I am not trying to replace CVSS, but to complement it. To reflect the local nature of this scoring system, as opposed to CVSS's global nature, I will call this framework the Local Vulnerability Scoring System, or LVSS.

The framework

The framework is as follows: For each pair of malicious actor and sensitive resource that you care about, a Local Vulnerability Score (LVS) is to be calculated as:

LVS = # of hops + # of detectors

where a lower score is more urgent/severe.

I will now explain in detail. An actor is the subject performing an access. It can be a real human (stranger or employee) or a service. A resource can be a server, a database, a network, customer data, or PII that you hold. While you can calculate an LVS for each individual resource, it is useful to group resources by type or classification, and to start from the more important ones. Similarly, you can group actors by role or department, with the general public as another group. Pairing actors with resources quickly generates many pairs to track, so group as much as possible.

The number of hops is the number of steps for the specified actor to get to the specified resource in the system. Exploiting the vulnerability is one step, but many times additional steps are needed if defense-in-depth is practiced. For example, if a vulnerability is only present in an internal system that is accessible only to admins, and that vulnerability leads to sensitive data, then 2 hops are needed for a non-admin actor: one to become the admin, then the second to exploit the vulnerability.

The number of detectors is the number of "tripwires" that would alert between the specified actor and the specified resource, including, but not limited to, any alerts that would be triggered in the course of exploiting the vulnerability. In the example above, a suitable alert fires when an actor becomes an admin, independent of the approval process to become admin. Another suitable alert is a tripwire triggered should the vulnerability be exploited. Note that the detectors here assume a timely and capable incident response function, as logging alone is insufficient without a timely response.

A reasonable question is whether this framework can be gamed, leading to padding numbers for the sake of numbers. I believe it is aligned with meaningful security. To game this framework, one needs either to add more hops between an actor and a resource, which are unlikely to be exploitable by the same vulnerability and therefore mean more work for an attacker, or to add more alerts, which will help detect an attacker on their way to the resource. At the end of the day, this framework incentivizes improvement by putting the actor logically farther away from the resource, or by adding meaningful detection points that are monitored.
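The bookkeeping itself is trivial; here is a minimal Python sketch that pre-computes the scores for the example worked through in the next section (hops and detectors per pair are still counted by hand):

    def lvs(hops: int, detectors: int) -> int:
        return hops + detectors  # lower = more urgent

    pairs = {
        ("employee",       "DB1"): lvs(hops=1, detectors=0),  # -> 1
        ("general_public", "DB1"): lvs(hops=2, detectors=0),  # -> 2
    }

    # Work the lowest scores first.
    for (actor, resource), score in sorted(pairs.items(), key=lambda kv: kv[1]):
        print(f"{actor} -> {resource}: LVS {score}")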

An example

Let's walk through an example scenario with log4shell. Log4shell is a vulnerability in a number of log4j versions that allows a party who can trigger a log message to perform arbitrary code execution on the log-processing server via specially crafted log messages.

Let's say log4shell is present in two places in your company: a public-facing web application and an internal/employee-facing web application. Let's further assume that the internal web application is directly exploitable and, through remote code execution, directly gives an attacker enough privileges to connect to a database with customer data, considered a resource of interest (let's call it DB1). Let's say the public-facing web application contains nothing interesting (no resources of interest), but is hosted on the same network as DB1, though it does not normally connect to DB1 and holds no credentials for it. Lastly, let's assume no alerts are set up. Let us now calculate the relevant scores:

For the {employee, DB1} pair, the LVS is 1+0=1 via the internal web application. There is only one thing to exploit, the vulnerability itself, and exploiting it gives access to the internal-facing web application server, which is already set up with direct access to DB1.

For the {general public, DB1} pair, the LVS is 2+0=2 via the public-facing web application. It is insufficient to just exploit the public-facing web application; the attacker also needs to move laterally within the network to DB1. Assuming they are already on the same network, moving into DB1 likely consists of guessing DB1's credentials or bypassing authentication somehow.

(Employees can also pose as the general public and try to exploit the vulnerability through the public-facing application. Similarly, the general public can try to impersonate an employee and authenticate into the internal web application. In this example these two scenarios are no worse than the ones considered above, so I will omit them.)

Given the two scores above, it is more imperative to address the internal web application, because the {employee, DB1} pair results in the lowest score. One way to do so is by adding alerts: if employees can only access their web application over VPN, and if all VPN traffic is examined in near-real-time, then an alert could be written to detect exploit payloads in VPN traffic. On the other hand, the general public has two hops to get to DB1, and since only one of the hops is exploitable (this should be confirmed based on the impact of the vulnerability), we can depend on the additional layer of defense while we solve the more urgent LVS=1 case.

It is also possible to extend this exercise to calculate both pre- and post-vulnerability scores, which would more accurately report the state of multiple layers of defense if the vulnerability is exploitable at multiple hops.

Future work

Conceptually, a vulnerability is a failure in access control. By ensuring additional points of access control between an attacker and a resource, or by "lighting up" a failed path by way of alerts and tripwires, we compensate for individual failures and make it hard for a resource to be accessed after accounting for everything else.

One area I would like to explore more is applying access control models to formalize and reason about vulnerabilities. For example, if you have a system that is fully represented by an RBAC model, can you model a vulnerability simply as changing some evaluation functions to "allow", and let the model re-calculate and report what, if any, end-to-end policies have changed? This would allow vulnerabilities to be incorporated into access control as special policies.
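Here is a toy sketch of that idea, with all names and the policy shape assumed: a vulnerability flips one evaluation to allow, and end-to-end reachability is re-computed.

    POLICIES = {
        ("user",    "web_app"): True,   # users may reach the web app
        ("user",    "db1"):     False,  # users may not reach the database directly
        ("web_app", "db1"):     False,  # the web app has no path to the database
        ("admin",   "db1"):     True,
    }

    def reachable(policies, actor, resource):
        """Transitive closure over "allow" edges: can actor reach resource?"""
        frontier, seen = [actor], set()
        while frontier:
            node = frontier.pop()
            if node == resource:
                return True
            if node in seen:
                continue
            seen.add(node)
            frontier.extend(dst for (src, dst), allowed in policies.items()
                            if src == node and allowed)
        return False

    assert not reachable(POLICIES, "user", "db1")  # baseline: denied end to end

    # Model a vulnerability as flipping one evaluation to "allow" and re-check.
    vulnerable = dict(POLICIES, **{("web_app", "db1"): True})
    assert reachable(vulnerable, "user", "db1")    # end-to-end policy changed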

Separating CI from CD

The CI/CD tool sits at an interesting position in a modern web application stack: it is one of the few places, if not the only one, with controlling access into both production and non-production environments. As a result, it carries a big responsibility in enforcing the boundary between production and non-production systems.

As an example, here is a typical CI/CD setup in a software startup: source code is managed in an online git repository, connected to a combined CI/CD tool that is either self-hosted or consumed as a service. Development work is performed on ephemeral development branches, which anyone on the team can create and push into; hooks are also set up on development branches so that unit tests run each time code is pushed. Once code is peer reviewed and approved, and unit tests pass, the development branch can be merged into one of the special branches (e.g. 'release') where more tests run and, if they all pass, a deployment occurs to the corresponding environment. These special branches are usually gated against direct actions, and can only be pushed or merged into after unit tests and human approval.

I want to discuss two potential security concerns that result from combining CI (non-production) and CD (production). The first is the ability to influence tests and what counts as 'passing'; the second is access control of secrets. Both are possible paths for an insider, who normally does not have permission to deploy directly to production, to bypass these security controls.

In the first case, an attack scenario works as follows: a developer wishes to commit a piece of code that would not pass the necessary tests (e.g. code that would fail a security-related unit test), so they also remove the offending test cases in the same commit. When the CI tool runs its tests on the development branch, if its behavior is to run tests using the version that was just committed, it will see that all tests pass. There are legitimate needs to modify or remove tests in the course of software development, so such behavior may be hard for a casual reviewer to notice. As a result, a developer on the unprivileged side (CI) is able to modify the gate placed upon them and escalate into the privileged side (CD).

There are no good countermeasures against this attack, primarily because there are so many legitimate cases for modifying or removing test cases that it is hard to notice when an attack is happening. The only consolation is that the human review step is not bypassed and can catch something that is off. Also, all commits are logged and tied to individuals, so post hoc investigation is possible (with the handwaving assumption that the git hosting system performs authentication correctly).
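As a modest aid to that investigation, a rough heuristic (my own sketch, not an established tool) could flag commits that delete test code alongside non-test changes, for a closer look:

    import subprocess

    def suspicious_commits(repo_path, rev_range="origin/main..HEAD"):
        """Flag commits that both delete test lines and touch non-test files."""
        log = subprocess.run(
            ["git", "-C", repo_path, "log", "--numstat", "--format=%H", rev_range],
            capture_output=True, text=True, check=True).stdout
        flagged, current = [], None
        deleted_tests = touched_src = False
        for line in log.splitlines():
            parts = line.split("\t")
            if len(parts) == 1 and line:          # a commit hash line
                if current and deleted_tests and touched_src:
                    flagged.append(current)
                current, deleted_tests, touched_src = line, False, False
            elif len(parts) == 3:                 # "added<TAB>removed<TAB>path"
                added, removed, path = parts
                if "test" in path and removed not in ("-", "0"):
                    deleted_tests = True          # test code was removed
                elif "test" not in path:
                    touched_src = True            # non-test code changed too
        if current and deleted_tests and touched_src:
            flagged.append(current)
        return flagged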

The second security concern arises if the CI/CD system treats all secrets stored with it equally. A CD system naturally needs production secrets in order to perform a deploy. Such secrets may also be accessible to a not-yet-approved pull request, and a development branch can then either use production secrets directly or exfiltrate them to a developer. While secrets can be hidden from developers accessing them through normal means, the fact that these secrets must be used during a deploy makes it impossible to hide them completely; they can be exfiltrated, for example, by printing one character at a time to evade log masking. If a developer, who does not and should not have access to production secrets, is able to retrieve them through a development branch, they can probably use those secrets to escalate their privileges into the production environment.

The solution for this case is quite simple: control which branches (or another similar logic) can access which secrets. Properly, secrets should not be the only aspect separated between non-production CI and production CD; the whole execution environment should be separate, or ideally ephemeral, with a new environment provisioned from a golden image for each run.
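A minimal sketch of branch-scoped secrets (the policy shape is an assumption for illustration):

    SECRET_SCOPES = {
        "UNIT_TEST_API_KEY": {"*"},                # CI: any branch may read
        "PROD_DEPLOY_KEY":   {"release", "main"},  # CD: protected branches only
    }

    def secret_for(branch: str, name: str, vault: dict) -> str:
        allowed = SECRET_SCOPES.get(name, set())
        if "*" in allowed or branch in allowed:
            return vault[name]
        raise PermissionError(f"{name} is not available to branch {branch!r}")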

Javascript supply chain attack scenarios and mitigation methods

There are a few ways to include Javascript code in a website. Depending on who writes the code and where it is hosted, there are different supply chain security considerations and mitigation methods. I will briefly cover each case and discuss what I think are gaps in current solutions.

Third-party libraries

At the top (or leftmost) of the code lifecycle are third-party libraries included as dependencies, such as ones from NPM. One way to ensure consistency is to follow available versioning data (e.g. semver) and consciously update these third-party libraries at a reasonable frequency. Once a new set of versions is picked, you can run them through vulnerability management tools, static and dynamic code analysis tools, and testing.

It is a harder problem to verify that NPM itself has not been compromised to host malicious library code. One way to detect this attack vector is to fetch the same set of libraries at fixed versions from multiple vantage points on the Internet, or repeatedly over a period of time, and compare the results. However, this does not completely solve the issue. I am not aware of a standard solution, nor have I heard of people actively monitoring for this attack scenario.
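A rough sketch of the temporal variant: record the hash of each pinned artifact over time and alert if it ever changes (the URL pattern follows the npm registry's layout for unscoped packages):

    import hashlib, json, urllib.request

    def tarball_sha256(package: str, version: str) -> str:
        url = f"https://registry.npmjs.org/{package}/-/{package}-{version}.tgz"
        with urllib.request.urlopen(url) as resp:
            return hashlib.sha256(resp.read()).hexdigest()

    def check(package: str, version: str, ledger_path: str = "hashes.json"):
        try:
            with open(ledger_path) as f:
                ledger = json.load(f)
        except FileNotFoundError:
            ledger = {}
        key, digest = f"{package}@{version}", tarball_sha256(package, version)
        if key in ledger and ledger[key] != digest:
            raise RuntimeError(f"{key} changed upstream: possible compromise")
        ledger[key] = digest
        with open(ledger_path, "w") as f:
            json.dump(ledger, f)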

Including Javascript Files

Javascript files are mostly hosted as static assets, either on the same machine as the web server or, more commonly, on CDNs. Sometimes you are also including other people's Javascript files directly from their servers, such as Google Analytics. In the context of the Same-Origin Policy, these included files act in the parent origin. Regardless of where the file is hosted, there are two methods to declare intent.

The first is Content Security Policy (CSP), an HTTP header sent on a per-page basis. CSP can do quite a few things, but in this context it can declare a list of domains whose scripts are allowed on the page being served (e.g. Content-Security-Policy: script-src 'self' https://cdn.example.com). Any Javascript files outside of these domains would be blocked by the browser. However, the granularity stops at the domain level.

The second is Subresource Integrity (SRI), a pre-computed hash declared per include (per <script> tag) that pins the expected content of the included file. This is a fairly fine-grained way to ensure consistency, especially if the included file is versioned. It does not work if the owner of the included file is unaware of this restriction and intends to update their files at the same path without versioning, or if the included Javascript file further includes other files, which can change without being reflected in the hash.
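For reference, the SRI value is the base64 of the file's raw digest, prefixed with the algorithm name; a small Python sketch:

    import base64, hashlib

    def sri_hash(path: str) -> str:
        digest = hashlib.sha384(open(path, "rb").read()).digest()
        return "sha384-" + base64.b64encode(digest).decode()

    # The resulting value goes into the script tag's integrity attribute, e.g.
    # <script src="https://cdn.example.com/lib-1.2.3.js"
    #         integrity="sha384-..." crossorigin="anonymous"></script>
    print(sri_hash("lib-1.2.3.js"))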

One immediate improvement to SRI would be to allow hashes of all recursive dependencies, which is mostly straightforward to generate (unless URLs are constructed dynamically). It would be even better if such external Javascript files all moved to the versioned schemes found in libraries and software, so better consistency can be achieved.

There are use cases where an included Javascript file changes each time it is served, or is dynamically generated depending on the environment. In these cases, if SRI were to be used, the external parties would need an out-of-band channel to dynamically generate and deliver SRI hashes so the hashes can be embedded when these external files are included. This creates engineering and timing problems, and still does not solve the problem of a compromised external server that both serves malicious Javascript files and computes their hashes.

An even more ambitious approach is to template these external Javascript files, and separate code from data. This way, the Javascript files become less dynamic but with pre-defined variables where dynamic data can be populated. This is not dissimilar to prepared statements in SQL to prevent SQL injection attacks. The data component can be fetched without SRI and fed into templated Javascript files which are served with SRI. This approach further increases SRI coverage.

Inline Javascript Code

Inline Javascript code is not recommended these days, because blocking all inline scripts systematically prevents XSS attacks. However, if inline code is really needed, CSP also offers declaration of intent with either a pre-computed hash or a random nonce in the CSP HTTP header.

User-generated Javascript

This is a case where a website intentionally allows users to input Javascript for later presentation, either to the same user or to a different consumer, most likely through an eval(). I do not think there is a general solution that can allow this securely. In the few instances where I have seen this done, blacklist or whitelist approaches were used to reduce the allowed input to a subset of all Javascript. Of course, there are many Javascript obfuscation methods that make blacklisting or whitelisting difficult to get right.


I am still fairly new to some of these capabilities, so please let me know if there are errors in this post.

GDPR: the hard parts are still hard

I took a trip to France and England a week before GDPR went into effect, and did not really get the impression that GDPR implementations were underway in the way they were in the US. The signs were the little things, such as being asked for names and email addresses when logging into public wifi, and SNCF automatically signing me up to their marketing emails after a train ticket purchase (which ironically gave an error when I tried to unsubscribe).

To me, as a consumer, there are two main parts of GDPR. The first is marketing emails. This part has been most visible in the past few weeks as companies send consent emails asking permission for marketing communications after GDPR takes effect. Most of these emails, before and after consent, already properly implement the unsubscription workflow. The rise in popularity of commercial off-the-shelf marketing and email tools makes unsubscription more standard. The result is a win-win for both consumers (being able to unsubscribe) and websites (getting a higher quality mailing list).

The other part is the use of personal information. This aspect is arguably more in the spirit of GDPR, but is harder to enforce. As a consumer, and without a court order, it is almost impossible to know whether my information is being used, or when and where it is, or to verify that it is not being used if I do not give the proper consent. This is more of a problem when the information is processed for others to consume, as the recipient has little incentive to ensure that my rights are protected. For example, if a company is selling market reports, will the buyers really ask for the legal basis on which the underlying data was collected, and make purchasing decisions based on the answer?

In the end, I think GDPR is still a good thing. There is a fine balance between the size of the EEA market and the requirements of the regulation. If the market were too small, or the regulation too strict, businesses would simply walk away. It is also likely the most publicised regulation relating to security and privacy, bringing some healthy spotlight to the industry.