Autonomous Penetration Testing: Now available

Apr 2, 2026

The problem, the solution, the result.

Security testing has always been a race against the clock. Today, we stopped the clock.

Every consultant knows how much knowledge, creativity, depth, and effort go into an expert-level penetration test. Most clients cannot afford one, so consultants end up working on a shoestring budget and must either cut corners and deliver a subpar product, turn down the client, or work overtime at a loss.

At Cyberware, we've delivered hundreds of penetration tests for clients across the globe, and we've never been more proud of the work we've done, but we've often had to work overtime and over budget to deliver quality work. Another major issue is the engagement window. A typical window is 1-2 weeks, which is simply not enough time for a thorough penetration test when the attack surface is large.

Human offsec consultants are good at what they do, and without a doubt everyone is passionate about it, but as a consultant with 10 years in the industry, I can tell you that the engagement window is not the only constraint. The real deadline is the moment an attacker discovers what your last assessment missed. Cyberware Autonomous Penetration Testing is designed around that deadline.

We've worked hard to build something that can replicate the ingenuity and novelty of human penetration testers, work around the engagement window and budget constraints, and ultimately solve the primary constraints every client and consultant is facing in the industry.

After two years of development and three months of closed-beta testing on active production environments, the Cyberware Autonomous Penetration Tester is now fully operational and available across three distinct tiers. The Engine is an asynchronous, cross-domain, evidence-gated, reasoning, and context-aware security expert designed to be on par with human experts.

Affordable. Scalable. Novel.

Scale without compromising.

For decades, organizations have been forced to choose between the scale of automation and the reasoning of a human.

Automated scanners process fast and scale indefinitely, but they operate entirely on pattern matching. They cannot reason about business logic, nor can they deduce what a developer assumed. The result is an avalanche of false positives and a silent failure on complex authorization bypasses.

Manual penetration testing provides exactly what scanners lack - contextual intelligence. However, it is structurally constrained. Coverage ends when the engagement window closes or when consultant fatigue sets in.

Cyberware Autonomous Pentesting is a new operational tier. It applies human-like logic and reasoning at machine speed.

No proof, no noise.

The engine operates on a proprietary progression logic that prioritizes verified impact over theoretical risk. The architecture is explicitly designed to eliminate the single most expensive and annoying problem in automated security testing: false positives.

Unverified findings waste engineering hours, delay sprint cycles, and drain triage budgets on false alarms. A report bloated with theoretical risks guarantees alert fatigue. Here, every finding is confirmed exploitable before it is submitted.
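
To make the "no proof, no noise" rule concrete, here is a minimal sketch of an evidence gate in Python. It is an illustration only, not the engine's actual progression logic, which is proprietary: the `Finding` class, the `proof_of_concept` callable, and the `evidence_gate` function are all hypothetical names chosen for this example. The idea is simply that a candidate finding reaches the report only if its proof-of-concept runs and succeeds.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Finding:
    title: str
    severity: str
    # Hypothetical: a callable that re-runs the proof-of-concept and
    # returns True only if the exploit demonstrably succeeded.
    proof_of_concept: Optional[Callable[[], bool]] = None
    confirmed: bool = False


def evidence_gate(candidates: List[Finding]) -> List[Finding]:
    """Return only the findings whose proof-of-concept executes successfully."""
    reportable = []
    for finding in candidates:
        if finding.proof_of_concept is None:
            continue  # theoretical risk with no proof: never reported
        try:
            finding.confirmed = bool(finding.proof_of_concept())
        except Exception:
            finding.confirmed = False  # a crashing PoC is not evidence
        if finding.confirmed:
            reportable.append(finding)
    return reportable
```

In this sketch a pattern match without a working exploit is dropped silently rather than downgraded to "informational", which is the design choice the section describes.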

Audit-ready evidence out of the box.

Cyberware Autonomous Pentesting produces audit-ready, evidence-backed findings appropriate for submission under:

  • PCI-DSS 4.0

    Requirement 11.3 penetration testing methodology.

  • SOC 2 Type II

    Control testing evidence for CC6 and CC7 domains.

  • ISO/IEC 27001:2022

    Annex A.8.8, management of technical vulnerabilities.

  • NIS2 Directive

    Article 21 technical security measures.

  • DORA (EU 2022/2554)

    ICT risk testing requirements.

Every proof-of-concept is reproducible. Every finding includes an exact replication path. No translation is required between the technical output and the evidence your auditors need.

Confirmed results across 40+ live environments.

In security, proof is the only currency. White-box engagements spanning web, mobile, and smart contracts accounted for 38% of high and critical findings, while black-box assessments identified 21.5% of vulnerabilities in those same severity tiers, but in different engagement scopes.

In more than one in three confirmed cases, an attacker operating against the same surface would have had a path to a material impact event like data exfiltration, privilege escalation to administrative control, financial drain or governance takeover, before any existing control would have triggered.

Zero of these findings were theoretical. All required a functional, executable proof-of-concept before they were committed to the report. This is the only honest definition of confirmed risk.

The following are a few selected case studies from live production engagements.

  • The Invisible Admin: Privilege Escalation via API Parameter Tampering

    A classic trust disconnect between a secure-looking interface and the server behind it. The platform assumed safety because administrative controls were visually hidden from lower-privilege users, as they simply did not appear in the frontend.

    By reverse-engineering the application's API communication, the engine identified an authenticated endpoint that accepted an explicit role-elevation parameter directly in the request body, a parameter entirely absent from any user-facing UI interaction. It submitted a request as a standard user, appending the administrative role value. The server validated the authentication token correctly, then honored the elevated role without any secondary permission check. The response returned a full administrative payload. The engine captured the differential between the baseline low-privilege response and the administrative response as the proof-of-concept artifact.

    A configuration file exposure was discovered independently during surface enumeration: the engine, having mapped the application's naming conventions during reconnaissance, identified a non-indexed endpoint that returned a partially serialized application configuration in response to an unauthenticated GET request. The configuration contained live OAuth client credentials in plaintext. The engine verified they were active by issuing a token request directly against the OAuth endpoint and receiving a valid access token in response, confirming the credentials had been neither rotated nor expired. The chain then ran in strict sequence: (1) the unauthenticated config read provided the credentials; (2) the credentials provided a valid authenticated identity; (3) that identity, combined with the role-elevation parameter, unlocked full administrative access. Three steps. Zero prior access required.

    Business impact: An attacker with no legitimate access could have assumed administrative control of every tenant account, resulting in a full organisational breach.

  • The Real-Time Puppet: Unauthenticated Real-Time Hub Takeover

    The REST API was correctly protected by token-based authentication. The real-time notification layer, operating on a separate connection path alongside the REST API and maintaining persistent connections to every active user session, required no authentication to establish a connection or push messages.

    The engine identified the real-time layer by parsing the application's JavaScript bundle during reconnaissance, locating the client-side hub connection initialization code and the endpoint it targeted. It established a connection to the hub with zero credentials and confirmed the server accepted it without challenge. The engine then verified the extent of control by injecting fabricated notification payloads into active authenticated user sessions: phantom alerts, spoofed security alarm signals, and broadcast messages to all connected clients simultaneously. The injected notifications were visually indistinguishable from legitimate platform alerts. All injections were verified live in a running browser session as the proof-of-concept artifact.

    The engine additionally confirmed that the message schema required to craft valid injections was fully visible in the client-side JavaScript: no additional recon or server-side access was needed to construct a convincing payload. Any actor with access to the application's public-facing JavaScript had everything needed to operate the attack.

    Business impact: The ability to inject false alerts into authenticated user sessions converts a technical gap into a human compromise tool with no additional technical sophistication required.

  • Structural RBAC Collapse: Systemic Authorization Bypass across 12 Endpoints

    The engineering team had applied rigorous authorization controls to write operations: every modification endpoint correctly returned a 403 to unauthorized users. No equivalent enforcement had been applied to read endpoints serving the same data.

    The engine tested authorization boundaries by replaying the API calls of a low-privilege authenticated session against endpoints observed during high-privilege session mapping. A low-privilege user received identical responses to those returned to senior administrators, including operational records, location data, and third-party integration configurations, with no error, no access differentiation, and no server-side indication that the access was unauthorized. The engine confirmed twelve distinct endpoints sharing the flaw through exhaustive authorization boundary testing across all observed access tiers.

    A further authorization check on a reporting endpoint returned an embedded analytics dashboard directly in the response body, including a hardcoded tenant identifier and access key, accessible to any authenticated session regardless of privilege level, and with no authentication at all from certain request paths.

    Business impact: Twelve access points silently serving restricted operational data to any logged-in user represent a persistent, undetected exfiltration surface. This is the category that produces the longest breach dwell times, precisely because it generates no errors and triggers no alerts.

  • The Credential Funnel: Three-Stage Unauthenticated Admin Compromise

    The engine identified an exposed user directory via a default API path that returned paginated records including account identifiers and login names without authentication. Separately, it discovered a legacy authentication protocol still active on the target and confirmed it accepted batched credential submissions in a single request, multiplying testing throughput by nearly two orders of magnitude relative to per-connection throttle limits, rendering standard rate limiting ineffective. The engine simultaneously identified a CORS misconfiguration permitting cross-origin reads of authenticated administrative responses from arbitrary origins.

    The three findings were correlated into a sequential chain: harvested login names fed the batched credential test; confirmed credentials provided authenticated session access; the CORS misconfiguration enabled cross-origin extraction of admin session data; and enumerated hostnames from the authenticated surface resolved to management interfaces reachable from the public internet without restriction. The chain terminated at the hosting infrastructure.

    Business impact: This chain terminates at physical hosting infrastructure. The blast radius is not a data record. It is the availability, integrity, and confidentiality of the entire operating environment.

  • The Replay: Complete Airdrop Pool Drain via Proof Reuse

    A token distribution contract deployed Merkle proof verification to validate claim entitlement, and that verification functioned correctly in isolation. The contract contained no mechanism to record whether a specific proof had been previously honoured. The engine identified the absence of claim-tracking state by auditing the contract's storage layout and confirmed exploitability by executing the same valid proof in repeated calls against a forked mainnet environment, collecting the full entitlement amount on each invocation until the entire distribution pool was drained. The proof-of-concept was fully scripted and autonomously reproducible.

    A secondary finding identified that pending claim transactions were observable in the mempool before on-chain confirmation. The engine demonstrated the ability to identify an in-flight legitimate claim, construct a replacement transaction with a higher gas price targeting the same proof, and have it confirmed first, redirecting the legitimate entitlement to a controlled address. Both findings were confirmed independently against the forked environment.

    Business impact: An empty token distribution pool is a direct, quantifiable financial loss event with an on-chain audit trail. For any protocol with regulatory exposure, it is also a demonstrable failure of financial controls.

  • Flash Vote: Instant Governance Takeover Without Cooldown

    The protocol calculated voting weight at the moment of deposit with no minimum holding period before a deposit counted toward governance decisions. The engine sourced a flash loan from an on-chain lending pool within a single atomic transaction, acquiring a controlling governance stake that exceeded the combined holdings of all other participants. It cast a decisive vote within the same transaction, then repaid the borrowed capital in full before the block closed. The entire sequence (borrow, vote, repay) executed atomically, leaving no residual position and generating no on-chain trace distinguishable from a standard governance interaction. Confirmed via a Foundry fork proof-of-concept against a pinned mainnet state.

    A parallel finding in the same engagement: during a simulated L2 sequencer outage, the protocol's price oracle continued serving stale data without staleness detection. The engine demonstrated the ability to open and close positions priced against the stale oracle state, extracting value from the spread between the stale price and the true market price for the duration of the outage. Confirmed independently against a forked environment with a simulated sequencer halt.

    Business impact: A governance takeover completed in a single transaction with zero capital committed is an existential risk to any protocol where governance controls treasury, upgrade authority, or parameter settings. The attacker leaves no position and no trace.

  • The Retroactive Reward Theft: Cross-Epoch Reward Attribution Exploit

    The liquidity mining contract calculated reward distribution at payout time using each position's current pool share, rather than snapshotting each position's share at the close of the accrual epoch. The engine confirmed the flaw by opening a liquidity position after a complete accrual epoch had elapsed, then triggering a payout call against the contract. The contract attributed the full epoch's accruals to the newly opened position and transferred rewards proportional to the epoch's total pool, none of which the position had contributed to earning. The operation required no privileged access, no price manipulation, and no special tooling: a standard deposit transaction timed after epoch close was sufficient.

    A second finding in the same codebase: the engine identified that an emergency recovery function accepted a version number from the caller and wrote it directly to the protocol's system configuration without bounds validation. Supplying a value exceeding the currently deployed version permanently overwrote the valid version reference, destroying the protocol's ability to invoke future emergency recovery, resulting in irreversible lockout of a safety mechanism triggered through the mechanism itself.

    Business impact: The secondary finding eliminates the ability to respond to a future emergency, not just the current one. It is an irreversible operational lockout of a safety mechanism, triggered through a feature designed to protect the protocol.

  • The Ghost Referral: Flash Loan Referral Qualification Bypass

    A DeFi lending protocol's referral system gated high-value referrer status on a minimum USD balance threshold, validated at the moment of registration with no minimum holding period. The engine flash-borrowed sufficient capital from an on-chain lending pool to satisfy the threshold within a single atomic transaction, called the referral registration function while the borrowed capital was in-flight, and repaid the loan within the same block, committing zero net capital. The contract recorded the controlled address as a qualifying referrer. The engine then confirmed that subsequent organic deposits under the referral address triggered fee distributions to the controlled address in perpetuity, with no mechanism to re-validate the original qualification criteria. Confirmed autonomously with a proof-of-concept demonstrating full qualification bypass at zero net position.

    Business impact: Financial reward infrastructure that can be gamed with zero committed capital creates an indefinitely exploitable drain on protocol revenue with no natural detection mechanism.
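
To illustrate the root cause in "The Replay" case study, the following is a simplified Python model of a Merkle-gated distributor. It is a sketch under stated assumptions, not the audited contract: the two-leaf tree, SHA-256 hashing, the `account:amount` leaf encoding, and every class and function name here are hypothetical. The model shows that proof verification can be perfectly correct while the absence of claim-tracking state still lets one valid proof empty the pool.

```python
import hashlib
from typing import List, Set


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def hash_pair(a: bytes, b: bytes) -> bytes:
    # Sort the pair so a proof does not need left/right position flags.
    return h(min(a, b) + max(a, b))


def verify(leaf: bytes, proof: List[bytes], root: bytes) -> bool:
    node = leaf
    for sibling in proof:
        node = hash_pair(node, sibling)
    return node == root


class Distributor:
    """Toy airdrop pool; track_claims=False models the missing state."""

    def __init__(self, root: bytes, pool: int, track_claims: bool):
        self.root = root
        self.pool = pool
        self.track_claims = track_claims
        self.claimed: Set[bytes] = set()

    def claim(self, account: str, amount: int, proof: List[bytes]) -> int:
        leaf = h(f"{account}:{amount}".encode())
        if not verify(leaf, proof, self.root):
            raise ValueError("invalid proof")
        if self.track_claims:
            if leaf in self.claimed:
                raise ValueError("already claimed")
            self.claimed.add(leaf)  # record the claim before paying out
        paid = min(amount, self.pool)
        self.pool -= paid
        return paid


def replay_until_empty(d: Distributor, account: str, amount: int,
                       proof: List[bytes], max_calls: int = 1000) -> int:
    """Resubmit the same valid proof until the pool is empty or a call fails."""
    total = 0
    for _ in range(max_calls):
        if d.pool == 0:
            break
        try:
            total += d.claim(account, amount, proof)
        except ValueError:
            break
    return total


# Two-leaf tree: alice and bob are each entitled to 100 tokens.
leaf_a, leaf_b = h(b"alice:100"), h(b"bob:100")
root = hash_pair(leaf_a, leaf_b)

vulnerable = Distributor(root, pool=300, track_claims=False)
fixed = Distributor(root, pool=300, track_claims=True)
drained = replay_until_empty(vulnerable, "alice", 100, [leaf_b])
limited = replay_until_empty(fixed, "alice", 100, [leaf_b])
```

With these inputs the untracked distributor pays out the whole pool to a single entitlement, while the tracked one stops after the first claim. The only difference between the two is the set of claimed leaves.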

All findings above were autonomously confirmed during live engagements with scripted proof-of-concepts, response differentials, on-chain fork tests, or browser-verified state changes. No theoretical extrapolation.
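
As one example of what a response differential can look like, here is a minimal Python sketch of the comparison described in the RBAC case study. It uses an in-memory stand-in instead of real HTTP calls, and everything in it is an assumption made for illustration: the `Handler` type, the role names, the endpoint paths, and the premise that the endpoint map contains only resources observed during a high-privilege session.

```python
from typing import Callable, Dict, List, Tuple

Response = Tuple[int, str]          # (status code, body)
Handler = Callable[[str], Response]  # takes a role, returns a response


def find_read_bypasses(endpoints: Dict[str, Handler],
                       low_role: str = "viewer",
                       high_role: str = "admin") -> List[str]:
    """Flag endpoints that serve a low-privilege role the admin response."""
    flagged = []
    for path, handler in sorted(endpoints.items()):
        high = handler(high_role)
        low = handler(low_role)
        # Identical successful responses mean no access differentiation.
        if high[0] == 200 and low == high:
            flagged.append(path)
    return flagged


# Hypothetical handlers standing in for a real API.
def unguarded_read(role: str) -> Response:
    return (200, '{"records": ["ops-1", "ops-2"]}')


def guarded_read(role: str) -> Response:
    if role != "admin":
        return (403, "forbidden")
    return (200, '{"settings": {"region": "eu"}}')


endpoints = {
    "/api/records": unguarded_read,
    "/api/settings": guarded_read,
}
bypasses = find_read_bypasses(endpoints)
```

Here `bypasses` contains only `/api/records`. The stored pair of low-privilege and high-privilege responses is the kind of artifact a reviewer can replay to reproduce the finding.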

Why security teams choose Cyberware.

Cross-domain testing: your web app, API, mobile client, and contracts are one connected surface.

An e-commerce platform has a web application, an API, a mobile client, and third-party payment integrations. A DeFi protocol has a frontend, an API, and on-chain contracts. A CeFi exchange has all of the above. Vulnerabilities do not respect those boundaries. A business logic flaw in the web flow may be reachable through the mobile client with a completely different surface area. An authorization gap in the API may compound an on-chain accounting error. Cyberware maps all surfaces together, tests cross-surface interaction, and correlates findings that would be invisible to tools evaluating each layer in isolation. Coverage extends to compiled and binary targets, which are fully decomposed for exposed secrets and backend logic before runtime interaction begins. Hardened security controls are bypass targets, not hard stops. Sensitive data leaking through logs, inter-process channels, or network traffic is identified across every test category. The engagement model operates regardless of access level: source code, gray-box, or fully black-box. Whether your attack surface is a single web application or a hybrid stack spanning multiple chains, APIs, and mobile interfaces, the coverage map does not close until every surface is confirmed.

Scanners leave the most dangerous gaps untouched, and are overpriced.

Whether you run a DeFi protocol, a fintech platform, or an e-commerce operation, the scanner in your current stack produces the same output: a list of known CVEs matched against known patterns. That output misses business logic flaws, authorization boundaries that collapse only under specific conditions, and attack chains built from independently low-severity observations. For a payment platform, the flaw that drains accounts is in the checkout sequence, not the CVE database. For a DeFi protocol, the exploit that empties the yield pool is in the economic logic, not the dependency list. Signature-based tooling cannot see either.

Coverage does not degrade when the attack chain gets complex.

Human testers do their best work in the first hours of an engagement. Deep multi-step chains, those requiring correlation across independently low-severity observations, demand sustained focus over time. The engine does not experience diminishing attention on complex paths. It follows every chain to its worst confirmed consequence, regardless of how many steps it takes to get there. A three-stage authorization bypass chained into a credential exposure chained into infrastructure access is handled with the same discipline as a single-step injection.

Full autonomy from start to finish.

We are transparent about what this engine is: a fully autonomous security intelligence system. It does not assist a human analyst. It operates independently, end-to-end.

Every phase of the engagement, from surface discovery through exploitation and final reporting, is executed autonomously. No task requires operator input, no finding requires human validation before it is included, and no phase requires human sign-off before the next begins. The engagement runs to completion on its configured terms.

The human is an observer, and only enters the picture in two scenarios. The first is when an external obstacle blocks access, and the agent cannot recover by itself. The agent requests the minimum intervention needed to restore it, and resumes with full autonomy.

The second is the Enterprise plan's human-on-the-loop model. This is a meaningful distinction from human-in-the-loop: the engine does not require a human in order to operate, and a human does not direct or approve its actions. Under this model, a senior security professional is available to the engine as an additive resource. The engine may surface a finding or an ambiguous signal for a second opinion, or the human may interrupt the agent to provide input or contribute context. The engagement continues either way. Human involvement here is additive, not structural.

This is not a tool that empowers the analyst. It is a system that delivers results the analyst would otherwise have to produce themselves, without the fatigue ceiling, the working-hour constraint, or the cognitive load of thinking outside the box. The analyst receives a finished, actionable report.

Strict constraints protect your production.

The engine operates within a strictly enforced constraint set. Before any test is executed, the target is rigorously verified against a defined scope. Anything outside that scope is not tested; instead, it is logged as a strategic recommendation and surfaced in the final report. Out-of-scope targets are never probed, regardless of how tempting a discovered vulnerability chain might be. This is not a configurable user setting, but a non-negotiable enforcement rule that cannot be overridden mid-engagement.
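
A scope check of this kind can be sketched in a few lines of Python. This is a hedged illustration, not the engine's enforcement code: the `Scope` and `Engagement` classes, their fields, and the example hostnames are hypothetical. The sketch shows two properties from the paragraph above: the scope is immutable once created, and an out-of-scope target is logged as a recommendation without the test ever being invoked.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)  # frozen: the scope cannot be reassigned mid-engagement
class Scope:
    hosts: FrozenSet[str]
    networks: Tuple[ipaddress.IPv4Network, ...] = ()

    def allows(self, url: str) -> bool:
        host = urlsplit(url).hostname
        if host is None:
            return False  # unparseable targets are treated as out of scope
        host = host.lower()
        if host in self.hosts:
            return True
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            return False  # a hostname that is not explicitly listed
        return any(addr in net for net in self.networks)


@dataclass
class Engagement:
    scope: Scope
    recommendations: List[str] = field(default_factory=list)

    def run_test(self, url: str, test: Callable[[str], None]) -> bool:
        """Run the test only if the target is in scope; otherwise log it."""
        if not self.scope.allows(url):
            self.recommendations.append(url)  # surfaced in the report only
            return False
        test(url)
        return True


scope = Scope(
    hosts=frozenset({"app.example.com"}),
    networks=(ipaddress.ip_network("10.0.0.0/24"),),
)
engagement = Engagement(scope)
probed: List[str] = []
engagement.run_test("https://app.example.com/login", probed.append)
engagement.run_test("https://billing.example.org/", probed.append)
```

After these two calls, `probed` holds only the in-scope URL and the out-of-scope one appears in `engagement.recommendations`. The check runs before the test callable, so there is no code path in which an out-of-scope target receives traffic.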

Organisations in production 24/7, including financial services, healthcare, and critical infrastructure, have run closed-beta engagements without incident. Out-of-scope systems are logged as strategic intelligence, not tested. This distinction matters most when the engine is operating autonomously: the engine cannot drift.

Crucially, the engine does not perform destructive operations. It is designed to identify, demonstrate, and evidence vulnerabilities without deleting, modifying, or damaging underlying data. All findings are verified through a precisely controlled proof-of-concept.

Furthermore, the agent is heavily hardened against the very environment it tests. Prompt injection attempts embedded in target responses, a widely recognized class of attack against AI-driven systems, are actively neutralized in strict alignment with OWASP AI Security guidance. The autonomous agent cannot be redirected, manipulated, or repurposed by any malicious content returned from a target. Its core instructions, scope, and operational boundaries are permanently fixed at the initialization of the engagement and cannot be overwritten by anything encountered during a test.

Ultimately, what you receive is a completely controlled, evidence-backed, and safely contained assessment. The system knows precisely what it is authorized to do and executes absolutely nothing beyond that boundary.

Stop guessing. Start testing.

Cyberware Autonomous Penetration Testing is available globally starting today.

Note from the Founders: When we integrated the Bulhack infrastructure into Cyberware last year, we signaled that an autonomous division was scaling in the background. Today's launch is the culmination of that quiet development.