Learning mode

Learning mode observes the URLs that a specific client, subnet or user-agent hits over a window of time, then synthesises a starter .policy document the operator reviews and applies. The canonical use case is first-deployment baseline discovery: "we just installed EnforceGate vX, what does our organisation actually browse?".

Configuration

Learning is initialised on engine boot from the [learning] section in engine.conf:

[learning]
database_file = "/var/lib/enforcegate/learning.db"

If the section is absent or database_file is unset, the learning subsystem stays silently disabled — any request learning … API call returns "learning subsystem not initialised". To turn learning on, add the section and restart the engine (or trigger a config reload — see configuration).

The learning database is separate from the policy database (engine.db). Wiping learning.db resets learning state without touching policy.

The workflow

The seven steps below produce a .policy document from a scoped capture window.

Step 1 — Provision a session

eghost cli
> request learning create subnet 10.1.0.0/16 5000

Three arguments:

  • <kind> — what to scope by. One of ip, subnet, ua (user-agent regex).
  • <value> — the address, CIDR or user-agent regex.
  • <uri-cap> — the hard cap on captured URIs for this session.
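
To make the three filter kinds concrete, here is a rough sketch of how a request could be tested against a session filter. It is an illustration under assumptions, not the engine's code: the function name filter_matches is hypothetical, and whether the ua regex is applied as a search or a full match is not stated in this page (a search is assumed).

```python
import ipaddress
import re


def filter_matches(kind: str, value: str, client_ip: str, user_agent: str) -> bool:
    """Return True when a request falls inside the session filter."""
    if kind == "ip":
        return ipaddress.ip_address(client_ip) == ipaddress.ip_address(value)
    if kind == "subnet":
        return ipaddress.ip_address(client_ip) in ipaddress.ip_network(value)
    if kind == "ua":
        # ASSUMPTION: unanchored search; the real engine may anchor the pattern.
        return re.search(value, user_agent) is not None
    raise ValueError(f"unknown filter kind: {kind!r}")


print(filter_matches("subnet", "10.1.0.0/16", "10.1.42.7", ""))
print(filter_matches("subnet", "10.1.0.0/16", "10.2.0.1", ""))
print(filter_matches("ua", r"^curl/", "10.1.42.7", "curl/8.5.0"))
```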

Recommended uri-cap:

Scenario                                        Suggested cap
Single host audit                               1,000
Department subnet, short window                 5,000
Whole-organisation first-deployment baseline    10,000+

By default, query strings are stripped from captured URIs (so ?utm_source=… chaff doesn't dominate). To keep them:

> request learning create subnet 10.1.0.0/16 5000 keep-query-strings true
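
The effect of query-string stripping can be illustrated with Python's standard URL parser. This is only a sketch of the behaviour described above; the function name capture_key and the exact normalisation (lower-cased host, empty path mapped to "/") are assumptions, not documented engine behaviour.

```python
from urllib.parse import urlsplit


def capture_key(url: str, keep_query_strings: bool = False) -> tuple[str, str]:
    """Reduce a URL to the (host, path) pair a learning session would record."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    path = parts.path or "/"
    if keep_query_strings and parts.query:
        path = f"{path}?{parts.query}"
    return host, path


url = "https://www.google.com/search?q=policy&utm_source=newsletter"
print(capture_key(url))                           # query string dropped
print(capture_key(url, keep_query_strings=True))  # query string kept
```

With stripping on, every /search request collapses into one captured URI, which is why the default keeps the URI cap from filling up with near-duplicates.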

The session is created in configured state with an assigned numeric ID. List sessions:

> show learning sessions
#   State       Filter                       Captured / Cap   Created
12  configured  subnet 10.1.0.0/16           0 / 5000         2026-05-28T09:14:00Z

Step 2 — Start capturing

> request learning start 12

Activates the session. Only one session can be in running state at a time — starting a second one returns "another session is already running".

Captures happen post-verdict in the engine's URL handler, so the learning filter doesn't interfere with normal policy enforcement. Every request matching the filter is recorded (host, path, hit_count) until the URI cap is reached.
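
A toy model of the capture store may help when reasoning about the cap. It assumes the cap counts distinct (host, path) pairs and that nothing further is recorded once the cap is reached; the source only says capture stops at the cap, so both details are assumptions, and the class name CaptureSession is hypothetical.

```python
class CaptureSession:
    """In-memory stand-in for one learning session's captured URIs."""

    def __init__(self, uri_cap: int) -> None:
        self.uri_cap = uri_cap
        self.hits: dict[tuple[str, str], int] = {}  # (host, path) -> hit_count

    def cap_reached(self) -> bool:
        # ASSUMPTION: the cap applies to distinct URIs, not to total requests.
        return len(self.hits) >= self.uri_cap

    def record(self, host: str, path: str) -> bool:
        """Record one matching request; return False if it was dropped."""
        if self.cap_reached():
            return False
        key = (host, path)
        self.hits[key] = self.hits.get(key, 0) + 1
        return True


session = CaptureSession(uri_cap=2)
session.record("www.google.com", "/search")
session.record("www.google.com", "/search")
session.record("mail.google.com", "/")
dropped = not session.record("www.youtube.com", "/watch")
print(session.hits, dropped)
```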

Step 3 — Let traffic flow

The session captures every URL that matches the filter. Browse through the proxy yourself or let regular users generate traffic. Re-check progress with show learning session <id>:

> show learning session 12
Session 12:
  Filter:        subnet 10.1.0.0/16
  URI cap:       5000
  State:         running
  URIs captured: 2,847
  Created:       2026-05-28T09:14:00Z
  Started:       2026-05-28T09:16:42Z

  Top URIs (by hit count):
  hits   host                          path
  1247   www.google.com                /search
   913   mail.google.com               /
   708   www.youtube.com               /watch
   ...

Step 4 — Stop the session

> request learning stop 12

Capture stops; data persists. The session moves to stopped state.

Step 5 — Synthesise a .policy document

The synthesiser groups captured URIs by host and produces one rule per unique hostname, sorted by hit count descending.

docker exec enforcegate egctl learning-analyze 12 warn --no-stats \
    > /tmp/50-warn-learned.policy

Two arguments:

  • <id> — the session ID.
  • <action> — the action attached to each synthesised rule. One of permit, deny, warn, aup.

The --no-stats flag strips the (N hits during learning session M) provenance tag from rule descriptions — use it on policies destined for production. Without --no-stats, every rule's description carries the provenance, which is useful while iterating but reads awkwardly on captive portal pages.

The output is a .policy document written to stdout. Redirect it to a file, review it, and then place it under rules.d/ (Step 6).
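
The grouping performed by the synthesiser can be sketched as follows. This illustrates per-host aggregation only: it does not emit real .policy syntax, the rule dictionaries are a hypothetical stand-in, and summing path hit counts to get a host total is an assumption. The /results row in the sample data is invented for the example.

```python
from collections import Counter

ACTIONS = {"permit", "deny", "warn", "aup"}


def synthesise(captured, session_id: int, action: str, no_stats: bool = False):
    """Collapse (host, path, hit_count) rows into one rule per host,
    sorted by total hit count, highest first."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    per_host: Counter = Counter()
    for host, _path, hit_count in captured:
        per_host[host] += hit_count  # ASSUMPTION: host total = sum of path hits
    rules = []
    for host, hits in sorted(per_host.items(), key=lambda kv: (-kv[1], kv[0])):
        tag = "" if no_stats else f"({hits} hits during learning session {session_id})"
        rules.append({"host": host, "action": action, "description": tag})
    return rules


rows = [
    ("www.google.com", "/search", 1247),
    ("mail.google.com", "/", 913),
    ("www.youtube.com", "/watch", 708),
    ("www.youtube.com", "/results", 600),  # hypothetical extra path
]
for rule in synthesise(rows, session_id=12, action="warn"):
    print(rule)
```

Note how the two www.youtube.com paths merge into a single rule, which is the per-host behaviour listed under V1 constraints.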

Step 6 — Review and apply

Open the synthesised .policy in your editor, drop hosts you don't want covered, consolidate similar entries, and adjust descriptions:

$EDITOR /tmp/50-warn-learned.policy

Validate and apply:

docker cp /tmp/50-warn-learned.policy enforcegate:/etc/enforcegate/rules.d/
docker exec enforcegate egctl request-policy-reload --dry-run    # validate
docker exec enforcegate egctl request-policy-reload              # apply

Or — in the REPL:

> request policy reload dry-run true
> request policy reload

Step 7 — Clean up

> request learning delete 12

Drops the session and its captured URIs from learning.db. Refuses if the session is still running — stop first.
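
The session lifecycle described across the steps (configured, running, stopped, with one running session at a time and no deletion while running) can be summarised as a small state machine. This is a sketch of those stated rules only; the class and method names are hypothetical, and rejecting stop on a session that is not running is an assumption.

```python
class LearningSessions:
    """Minimal model of learning-session states and their transitions."""

    def __init__(self) -> None:
        self.states: dict[int, str] = {}
        self._next_id = 1

    def create(self) -> int:
        sid = self._next_id
        self._next_id += 1
        self.states[sid] = "configured"
        return sid

    def start(self, sid: int) -> None:
        if any(state == "running" for state in self.states.values()):
            raise RuntimeError("another session is already running")
        self.states[sid] = "running"  # allowed from configured or stopped

    def stop(self, sid: int) -> None:
        if self.states[sid] != "running":
            raise RuntimeError("session is not running")  # ASSUMPTION
        self.states[sid] = "stopped"  # captured data is kept

    def delete(self, sid: int) -> None:
        if self.states[sid] == "running":
            raise RuntimeError("session is still running; stop it first")
        del self.states[sid]


sessions = LearningSessions()
first, second = sessions.create(), sessions.create()
sessions.start(first)
try:
    sessions.start(second)
except RuntimeError as err:
    print(err)
sessions.stop(first)
sessions.delete(first)
print(sessions.states)
```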

V1 constraints

Documented limitations of the current learning implementation:

  • One running session at a time. A second request learning start fails until the first is stopped.
  • Per-host synthesis only. Per-path data is captured but not surfaced in generated rules — if you need path-level granularity, hand-edit the synthesised .policy to split host rules into per-path rules using match-uri-regex.
  • Output to stdout only. No request learning export <id> <path> shortcut yet; redirect request learning analyze stdout to a file.
  • Capture does not resume on engine restart. If the engine is restarted while a session is running, a boot-time rescue moves the session to stopped to avoid stale state. After the restart, run request learning start <id> again if you want to continue.

Operational guidance

  • Start learning from a default-permit baseline. Because capture happens post-verdict, a default-permit posture lets the engine see the organisation's real traffic. With default-deny active, learning only sees the requests that already pass policy, which is not what you want for baseline discovery.
  • For a multi-day capture, set the URI cap comfortably above the number of URIs captured on day 1. When the cap is reached, capture stops and the session's "URIs captured" count stays at the cap until you stop the session.
  • Synthesised rules are starting points, not production policy. Always review before applying, especially the warn/deny action choices.
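
For the cap-sizing advice above, one possible rule of thumb is to multiply the day-1 URI count by a headroom factor and round up. The factor of 2 and the rounding step of 1,000 are illustrative assumptions, not EnforceGate recommendations, and the helper name suggested_cap is hypothetical.

```python
import math


def suggested_cap(day1_uris: int, headroom: float = 2.0, step: int = 1000) -> int:
    """Scale the day-1 URI count by a headroom factor and round up to a
    multiple of step, never returning less than one step."""
    scaled = day1_uris * headroom
    return max(step, math.ceil(scaled / step) * step)


# 2,847 URIs on day 1 (the figure shown in Step 3) with 2x headroom.
print(suggested_cap(2847))
```

If the count from show learning session <id> is still climbing steeply at the end of day 1, a larger headroom value is the safer choice.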

See the egctl reference for the full verb list, and the policies reference for the .policy file format that learning generates.