
Why I spend my nights correcting CISA's vulnerability data

It is 2:00 AM. Most of the Netherlands is asleep, but I am staring at a CVE record that is supposed to be boring.

A CVSS vector. A few fields of metadata. The kind of data you import without thinking, because it is supposed to be the stable part. The kind of data that silently powers dashboards, alert rules, and patch queues across the industry.

And it is wrong.

Not “debatable” wrong. Not “depends how you interpret the spec” wrong. Just wrong in a way that is not defensible, and that will set off alarms in the morning and send real people chasing a fake emergency.

That is the part that keeps me awake. Public vulnerability data is treated like ground truth, but too often it is incorrect. Worse, it is often impossible to audit. You get a number, you get a vector, and you are expected to trust it without seeing the reasoning that produced it.

The mess we all inherited

Public CVE data is messy right now.

On one side, you have the vendors (CNAs). Many are excellent engineers, but CVSS is a specification with sharp edges. If you do not live in it, it is easy to score by vibes. Often, everything ends up scored as critical. But if everything is critical, then nothing is.

On the other side, you have the public enrichment bodies. The NVD has the volume problem. CISA Vulnrichment has the volume problem too, just with a much healthier feedback loop. Either way, humans are trying to keep up with a rising number of CVEs under real time pressure.

Then bad automation poured fuel on the fire. Not the kind that lives by logic and refuses to guess when it cannot justify the result. The other kind. The kind that confidently produces a vector that looks legitimate until you read it twice. It has become trivially easy nowadays to vibe-code something yourself: something that looks impressive, but fails when you scale it or try to make it behave deterministically.

This is a growing problem. Public vulnerability data was already suffering from quality issues, and the vibe-coding hype is making it worse. The result is not just a slightly inaccurate dataset. Every security product and vendor building on top of this data starts off on the wrong foot, or is forced to spend immense resources gathering accurate and timely data themselves, which is neither feasible nor cheap.

From watching to fixing

I started off by trying to make people aware of this problem. For months, I was posting examples on LinkedIn, pointing out the absurdities in the scores I was seeing. It got engagement, but it didn’t change the data. It was just noise complaining about noise.

After some time, and suggestions from others in the community, I realized that pointing fingers wasn’t enough. If I wanted better data, I had to help build it. I decided to stop just watching and actually bring change.

The CISA Vulnrichment project turned out to be the perfect place for this. With NIST currently dropping the ball (leaving thousands of CVEs unenriched or deferred), CISA has stepped up to fill that gap. They are now enriching large portions of the NVD backlog, essentially becoming the de facto standard for actionable vulnerability intelligence.

Crucially, they welcome the community to give input. Unlike the black box of other agencies, CISA manages their enrichment publicly. They have a clear path for reporting problems via their public repository.

So I stopped posting screenshots and started filing issues. If you browse their tracker, you will see a user named 003random. That is me.

These corrections are all powered by my co-founder (Karel Knibbe) and me spending the last three years building this in the dark: discussing specifications, finding weird edge cases, and writing thousands of tuning and evaluation samples. Along the way, we invested (not kidding) thousands of hours in CVSS specification work and thousands more hardening our analysis pipeline.

We can now compare a traditional, human-first analysis against our automated vulnerability analysis pipeline, and we can do it in public, with receipts.

We are presenting our approach at #VulnCon26 in Scottsdale, Arizona ☀️🌵

Man vs. Machine: The results

Our approach at Volerion is weird. We did not train a model to guess severity or predict vectors, which is the direction most papers currently suggest. We deliberately did not take that route, because you would be training on the exact data we are trying to correct.

Instead, we spent a long time going back to the CVSS standard and asking a more basic question: what actually drives each metric value? This led us to a graph-based approach, where we model all realistic attack scenarios as a connected graph, so we can then programmatically derive CVSS, instead of asking the model for the vector.

So yes, we use AI to beat manual analysis, but no, we never ask the AI for the conclusion. We derive that from the facts.

This graph-based modeling is superior because it forces explicit definitions of every step in an attack chain. When you model all steps within an attack, you cannot “forget” requirements anymore, as that would break the scenario. After we have modeled all possible attack paths, the traversal engine simply follows them, and the CVSS score is a mathematical consequence of the path, instead of a checklist based on vibes.
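To make the idea concrete, here is a minimal sketch of path enumeration over an attack graph. The node names, actor tags, and toy stored-XSS chain are mine for illustration, not the production engine's schema; the point is only that each metric becomes a function of the enumerated paths rather than a model's opinion.

```python
from collections import defaultdict

# Toy attack graph for a hypothetical stored-XSS flaw: each node is one
# step, tagged with who performs it (attacker, user, or system).
steps = {
    "inject_payload":    {"actor": "attacker"},
    "victim_opens_page": {"actor": "user"},
    "script_executes":   {"actor": "system"},
    "impact":            {"actor": "system"},
}

edges = defaultdict(list)
for src, dst in [("inject_payload", "victim_opens_page"),
                 ("victim_opens_page", "script_executes"),
                 ("script_executes", "impact")]:
    edges[src].append(dst)

def attack_paths(start, goal, path=None):
    """Depth-first enumeration of all step sequences from entry to impact."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in edges[start]:
        if nxt not in path:  # ignore cycles
            yield from attack_paths(nxt, goal, path)

paths = list(attack_paths("inject_payload", "impact"))

# A metric is then derived from path properties. For example, UI is
# required when every path to impact contains a user-performed step.
ui = "R" if all(any(steps[s]["actor"] == "user" for s in p) for p in paths) else "N"
print(f"UI:{ui}")  # → UI:R
```

The real pipeline carries far richer node metadata, but the shape is the same: enumerate paths first, derive the vector second.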

When we run this against public data, we find mistakes. A lot of them. In total, we submitted 50+ issues, correcting hundreds of vulnerabilities. Each and every vulnerability so far had its vector changed. Below are some examples of how our graph catches what humans miss.

Example 1. Actions & Systems

Our graph models logical attack scenarios as distinct steps: attacker, user, and system actions. This structure is critical because it forces us to identify who must perform an action at what part of the attack.

Across several Cross-Site Scripting (XSS) vulnerabilities we analyzed, CISA data originally listed user interaction as none (UI:N) in cases where a victim still needed to render attacker-controlled content.

However, when our pipeline constructs the graph, it maps the flow of the attack. An attacker injects a payload, but that payload does nothing until a victim browses to the affected page or clicks a link. Only then does the script execute in the victim’s browser context and achieve its impact.

Our traversal engine, walking those paths, notices that the user interaction node is the mandatory bridge between the injection and the execution. Without that user action, the path to the impact node is broken. Therefore, user interaction is required (UI:R).
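The “mandatory bridge” test can be phrased as a reachability check: if deleting the user-action node disconnects the injection step from the impact node, interaction is required. The toy graph below is my own, not CISA's or our production model.

```python
# Minimal reachability sketch: UI:R iff removing the user-action node
# breaks every route from injection to impact.
EDGES = {
    "inject":  ["render"],    # attacker plants the payload
    "render":  ["execute"],   # victim opens the page or clicks the link
    "execute": ["impact"],    # script runs in the victim's browser
    "impact":  [],
}
USER_NODES = {"render"}

def reachable(start, goal, removed=frozenset()):
    """Iterative DFS that skips any node in `removed`."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen or node in removed:
            continue
        seen.add(node)
        stack.extend(EDGES.get(node, []))
    return False

# With the user step removed, no route to impact survives → UI:R.
ui = "R" if not reachable("inject", "impact", removed=USER_NODES) else "N"
print(f"UI:{ui}")  # → UI:R
```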

This correction (issue #242) is vital because it changes the CVSS severity from high to medium, but more importantly, it tells downstream consumers that this vulnerability cannot be exploited without a victim being present during the attack.

Example 2. Metadata

Modeling steps is just the beginning. The power of our approach comes from the metadata we attach to each node in the graph. These properties, such as whether the action requires privileges, the actor's proximity to the vulnerable system, or the kill chain phase of the action (delivery, exploitation), are curated so that every CVSS metric value can be derived from some combination of them.

In the case of CVE-2025-69429 (and similar vulnerabilities), the original CISA analysis treated it as a network-based attack (AV:N).

However, when our pipeline constructs the graph, it shows an attacker action with a physical proximity to the vulnerable system (inserting a USB drive). Only after that action has been completed can the attacker's proximity be relaxed to adjacent, because the attacker can then access the connected USB drive via the web interface of the NAS. When our traversal engine walks this attack path, it notes the physical requirement on the attacker and determines that the attack vector metric should be set to physical (AV:P).
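A hedged sketch of the proximity logic (the step names and ordering values are toy data I made up): the attack vector for a path is the strictest proximity any step on it demands, regardless of how relaxed later steps are.

```python
# Proximity ordering: lower number = stricter requirement on the attacker.
STRICTNESS = {"physical": 0, "local": 1, "adjacent": 2, "network": 3}
AV_LABEL = {"physical": "P", "local": "L", "adjacent": "A", "network": "N"}

# Illustrative path for the USB-on-NAS scenario described above.
path = [
    {"step": "insert_usb_drive",     "proximity": "physical"},
    {"step": "browse_nas_web_ui",    "proximity": "adjacent"},
    {"step": "read_planted_payload", "proximity": "adjacent"},
]

# The whole path is only as remote as its most demanding step.
strictest = min(path, key=lambda s: STRICTNESS[s["proximity"]])["proximity"]
print(f"AV:{AV_LABEL[strictest]}")  # → AV:P, despite the later adjacent steps
```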

This correction (issue #279) is super important, because the risk of the vulnerability decreases significantly between a network-based and physical-based attack. Getting these properties right is essential for any prioritization done downstream.

Example 3. Conditions

Our graph does not just track actions and properties; it also models conditions. These are constraints that must be satisfied before the attack can continue. In CVSS terms, these conditions map directly to Attack Complexity (AC) or Attack Requirements (AT).

For CVE-2026-24071, the initial rating missed a crucial detail: the race condition. The vulnerability relied on winning a specific timing window between a check and a use. Human analysts often overlook this or struggle to quantify it, defaulting to AC:L and AT:N.

However, when our pipeline constructs the graph, it explicitly includes the race condition as a distinct constraint. Our traversal engine simply notices that the success of the transition is not guaranteed and depends on factors outside the attacker’s direct control (timing). Because of this modeled constraint, the engine automatically flags this path as having a high attack complexity (AC:H) (or AT:P in CVSS 4.0).
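The condition-to-complexity mapping can be sketched in a few lines. The condition records and the `attacker_controlled` flag are my own naming, not the engine's real schema; the rule shown is simply that a constraint outside the attacker's control raises complexity.

```python
# Toy constraint list for a check-to-use race (illustrative data only).
conditions = [
    {"kind": "configuration", "attacker_controlled": True},   # default setting
    {"kind": "race_window",   "attacker_controlled": False},  # timing window
]

# Any constraint the attacker cannot control maps to AC:H in CVSS 3.1,
# and to AT:P in CVSS 4.0, which split this notion out of AC.
uncontrolled = any(not c["attacker_controlled"] for c in conditions)
ac = "H" if uncontrolled else "L"
at = "P" if uncontrolled else "N"
print(f"AC:{ac} / AT:{at}")  # → AC:H / AT:P
```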

This correction (issue #269) demonstrates how our engine removes the guesswork. If a race condition is required during an attack, the resulting vector must reflect that complexity. This auditable approach also makes sure that our vectors can be verified and are evidence-based.

For the interested reader: a fun vulnerability with many conditions (of various types) is CVE-2025-67647. Note that we also model configuration conditions, though those do not (as decided by our traversal engine) contribute towards a high attack complexity.

Example 4. Multiple Scenarios

Up until this point, we have covered what an attack path could look like and why our conclusions often diverge from other vectors. And the coolest part has yet to be introduced.

Sometimes, a vulnerability isn’t just one story. It can be exploited in multiple ways, under different circumstances, and with different impacts across different systems. A common pitfall in manual analysis is “scenario mixing”. Basically, cherry-picking the worst metrics from different, mutually exclusive attack paths to create a “Frankenstein” vector that describes an impossible attack.

For CVE-2024-54192 (and many others), the original CISA vector was a mix: it listed low privileges (PR:L) but also that user interaction was required (UI:R). This creates a logical contradiction by merging two distinct scenarios. The first scenario involves an attacker tricking a victim into opening a malicious file, which requires user interaction but no attacker privileges. The second scenario involves an attacker with existing privileges executing the file themselves, requiring no user interaction. Mixing these creates a vector that represents an impossible and unrealistic attack path.

However, when our pipeline constructs the graph, it separates the logical paths into two distinct scenarios. The traversal engine scores these paths independently and allows us to clearly see that mixing them violates the logic of the attack. For this vulnerability, you cannot score the penalty of user interaction if you already score a privileged attacker executing the code.
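The anti-mixing check reduces to a simple consistency test: a published vector must correspond to at least one coherent scenario. The scenario names and metric subsets below are mine, chosen to mirror the two stories described above.

```python
# Each scenario is scored independently (only PR and UI shown here).
scenarios = {
    "victim_opens_file":   {"PR": "N", "UI": "R"},  # no privileges, needs a victim
    "privileged_executes": {"PR": "L", "UI": "N"},  # low privileges, no victim
}

# The mixed "Frankenstein" vector cherry-picks from both stories.
published = {"PR": "L", "UI": "R"}

matches = [name for name, v in scenarios.items() if v == published]
if not matches:
    print("contradiction: vector matches no coherent attack path")
```

Since `matches` is empty, the published vector describes an attack that no single modeled path can produce.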

This correction (issue #284) ensured the final vector reflected a coherent, realistic attack path, rather than a mix of two conflicting stories.

Why we actually do this

We aren’t up at 2 AM because we enjoy arguing about CVSS strings. We do it because vulnerability feeds are the invisible foundation that every downstream security decision rests on. When that foundation is cracked, everything built on top of it wobbles. Dashboards, prioritization engines, patch queues, automated responses. They all rely on that single source of truth.

Bad data equals bad prioritization

This year, FIRST forecasts we might hit 100,000 published vulnerabilities (see Vulnerability Forecast 2026). The industry’s answer to this volume is automation. AI SOCs, autonomous patching, and agentic workflows. But the hard truth is simple: if the data is wrong, your product will be wrong too.

If your platform tells a customer to patch a “critical” and it turns out to be a physical-only edge case, you lose credibility fast. And the frustrating part is that it is often not even your team’s fault. Your logic is only as good as the data you base it on.

This is why we obsess over deterministic analysis and auditable scoring. Generating a score that looks right on a few samples is easy. Building a system that remains consistent, accurate, and explainable at the scale of 100k+ vulnerabilities a year is hard. But it is necessary.

For us, this is not a thought exercise or a marketing stunt. It is production reality. Every vector difference between our analysis and public data is a quality checkpoint, and we continuously compare against the CISA Vulnrichment project to harden our pipeline.

What is the solution?

Funny that you ask. We now make our high-fidelity data available through an NVD-compatible API (or via our enriched schema if you want the full depth). In most stacks, rollout is a host swap. Same ingestion flow, better source.
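To illustrate what a host swap looks like in practice: the NVD 2.0 endpoint and its `cveId` parameter below are real, but the enriched base URL is a placeholder I invented, not an actual Volerion endpoint.

```python
# NVD's real CVE API 2.0 base URL.
NVD_BASE = "https://services.nvd.nist.gov/rest/json/cves/2.0"
# Placeholder for an NVD-compatible enriched source (hypothetical host).
ENRICHED_BASE = "https://api.example-enrichment.invalid/rest/json/cves/2.0"

def cve_url(base: str, cve_id: str) -> str:
    """Build the same query against either source; only the host differs."""
    return f"{base}?cveId={cve_id}"

# The ingestion code stays untouched: swap the base, keep the query shape.
print(cve_url(ENRICHED_BASE, "CVE-2024-54192"))
```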

The payoff is straightforward. You get results you can reproduce, explain, and defend. We do this so your team does not have to burn valuable resources trying to build it in-house. Maintaining an internal enrichment pipeline is a massive distraction that will likely never reach the same accuracy, and simply is not worth the cost.

If you want to evaluate us with minimal risk, send us some CVE IDs from your current backlog that caused noise for customers. We will send you the data you could have pulled from our API, and walk you through the differences together.

If you are ready to switch, then please email me at ruben@volerion.com, or schedule a chat directly at https://calendly.com/ruben-volerion/lets-chat.


Apart from auditable CVSS, we also provide SSVC, automated CPE enrichment, affected version ranges (semver), per-product remediation, and executive summaries. Support for standards such as ATT&CK and CAPEC is coming soon.