Learning mode
Learning mode observes the URLs that a specific client, subnet or user-agent hits over a window of time, then synthesises a starter .policy document the operator reviews and applies. The canonical use case is first-deployment baseline discovery: "we just installed EnforceGate vX, what does our organisation actually browse?".
Configuration
Learning is initialised on engine boot from the [learning] section in engine.conf:
If the section is absent or database_file is unset, the learning subsystem stays silently disabled — any request learning … API call returns "learning subsystem not initialised". To turn learning on, add the section and restart the engine (or trigger a config reload — see configuration).
The learning database is separate from the policy database (engine.db). Wiping learning.db resets learning state without touching policy.
The workflow
The seven steps below produce a .policy document from a scoped capture window.
Step 1: Provision a session
Three arguments:
- <kind>: what to scope by. One of ip, subnet, ua (user-agent regex).
- <value>: the address, CIDR or user-agent regex.
- <uri-cap>: the hard cap on captured URIs for this session.
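To make the three filter kinds concrete, here is a minimal Python sketch of how a request might be tested against a session's scope. It is an illustration only, not EnforceGate's implementation: the function name `matches_filter` and its parameters are hypothetical, and the user-agent regex is assumed to match anywhere in the header (the engine may anchor it differently).

```python
import ipaddress
import re


def matches_filter(kind: str, value: str, client_ip: str, user_agent: str) -> bool:
    """Return True if a request falls inside a learning session's scope.

    Hypothetical helper: kind is one of "ip", "subnet", "ua", mirroring
    the <kind> argument; value mirrors the <value> argument.
    """
    if kind == "ip":
        # Exact match on a single client address.
        return ipaddress.ip_address(client_ip) == ipaddress.ip_address(value)
    if kind == "subnet":
        # CIDR containment, e.g. 10.1.4.7 is inside 10.1.0.0/16.
        network = ipaddress.ip_network(value, strict=False)
        return ipaddress.ip_address(client_ip) in network
    if kind == "ua":
        # Assumption: the regex may match anywhere in the User-Agent header.
        return re.search(value, user_agent) is not None
    raise ValueError(f"unknown filter kind: {kind!r}")
```

For example, `matches_filter("subnet", "10.1.0.0/16", "10.1.4.7", "")` is true, while the same call with client address `10.2.0.1` is false.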
Recommended uri-cap:
| Scenario | Suggested cap |
|---|---|
| Single host audit | 1,000 |
| Department subnet, short window | 5,000 |
| Whole-organisation first-deployment baseline | 10,000+ |
By default, query strings are stripped from captured URIs (so ?utm_source=… chaff doesn't dominate). To keep them:
The session is created in configured state with an assigned numeric ID. List sessions:
> show learning sessions
# State Filter Captured / Cap Created
12 configured subnet 10.1.0.0/16 0 / 5000 2026-05-28T09:14:00Z
Step 2: Start capturing
Activates the session. Only one session can be in running state at a time — starting a second one returns "another session is already running".
Captures happen post-verdict in the engine's URL handler, so the learning filter doesn't interfere with normal policy enforcement. Every request matching the filter is recorded (host, path, hit_count) until the URI cap is reached.
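The capture behaviour described above (per-URI hit counting, query strings stripped by default, a hard cap) can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the engine's code: the class `CaptureSession` and its `keep_query` parameter are hypothetical names, the cap is assumed to count distinct URIs, and capture is assumed to stop entirely once the cap is reached.

```python
from urllib.parse import urlsplit


class CaptureSession:
    """Hypothetical in-memory model of one learning session's capture table."""

    def __init__(self, uri_cap: int, keep_query: bool = False):
        self.uri_cap = uri_cap
        self.keep_query = keep_query
        self.hits = {}  # (host, path) -> hit_count

    def record(self, url: str) -> bool:
        """Record one matching request; return False once the cap is reached."""
        # Assumption: the cap counts distinct URIs and capture stops at the cap.
        if len(self.hits) >= self.uri_cap:
            return False
        parts = urlsplit(url)
        path = parts.path or "/"
        # Query strings are dropped unless the session was asked to keep them.
        if self.keep_query and parts.query:
            path = f"{path}?{parts.query}"
        key = (parts.hostname or "", path)
        self.hits[key] = self.hits.get(key, 0) + 1
        return True
```

With the default settings, `https://www.example.com/search?q=a&utm_source=x` and `https://www.example.com/search?q=b` collapse into a single `("www.example.com", "/search")` entry with a hit count of 2.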
Step 3: Let traffic flow
The session captures whatever URLs the filter sees. Browse the proxy or let regular users use it. Re-check progress with show learning session <id>:
> show learning session 12
Session 12:
Filter: subnet 10.1.0.0/16
URI cap: 5000
State: running
URIs captured: 2,847
Created: 2026-05-28T09:14:00Z
Started: 2026-05-28T09:16:42Z
Top URIs (by hit count):
hits host path
1247 www.google.com /search
913 mail.google.com /
708 www.youtube.com /watch
...
Step 4: Stop the session
Capture stops; data persists. The session moves to stopped state.
Step 5: Synthesise a .policy document
The synthesiser groups captured URIs by host and produces one rule per unique hostname, sorted by hit count descending.
Two arguments:
- <id>: the session ID.
- <action>: the action attached to each synthesised rule. One of permit, deny, warn, aup.
The --no-stats flag strips the (N hits during learning session M) provenance tag from rule descriptions — use it on policies destined for production. Without --no-stats, every rule's description carries the provenance, which is useful while iterating but reads awkwardly on captive portal pages.
The output is a .policy document written to stdout — redirect to a file under rules.d/.
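The grouping performed by the synthesiser can be approximated with a short Python sketch: per-path hit counts are summed per host, hosts are sorted by total hits descending, and each rule carries the chosen action plus an optional provenance tag. Everything named here (`synthesise_host_rules`, the dict keys, the alphabetical tie-break) is an assumption for illustration; the real tool emits .policy syntax, which this sketch does not attempt to reproduce.

```python
from collections import Counter

ALLOWED_ACTIONS = {"permit", "deny", "warn", "aup"}


def synthesise_host_rules(hits, session_id, action, no_stats=False):
    """Collapse (host, path) -> hit_count into one rule per host.

    Returns plain dicts rather than .policy text; the output structure
    is hypothetical.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")

    # Sum hits over all paths belonging to the same host.
    per_host = Counter()
    for (host, _path), count in hits.items():
        per_host[host] += count

    rules = []
    # Highest hit count first; ties broken alphabetically (an assumption).
    for host, total in sorted(per_host.items(), key=lambda kv: (-kv[1], kv[0])):
        description = host
        if not no_stats:
            # Provenance tag in the form described for rule descriptions.
            description += f" ({total} hits during learning session {session_id})"
        rules.append({"host": host, "action": action, "description": description})
    return rules
```

Note that per-path detail is discarded at the summing step, which is why path-level rules have to be added by hand afterwards.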
Step 6: Review and apply
Open the synthesised .policy in your editor, drop hosts you don't want covered, consolidate similar entries, and adjust descriptions:
Validate and apply:
docker cp /tmp/50-warn-learned.policy enforcegate:/etc/enforcegate/rules.d/
docker exec enforcegate egctl request-policy-reload --dry-run # validate
docker exec enforcegate egctl request-policy-reload # apply
Or — in the REPL:
Step 7: Clean up
Drops the session and its captured URIs from learning.db. Refuses if the session is still running — stop first.
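The session lifecycle across the steps above (configured, running, stopped, deleted) can be summarised as a small state machine. The Python sketch below is a simplified model under assumptions, not the engine's code: `SessionRegistry`, `LearningError` and the delete error text are hypothetical, and only the "another session is already running" message comes from this page.

```python
class LearningError(Exception):
    """Hypothetical error type for refused learning operations."""


class SessionRegistry:
    """Simplified model of session states: configured -> running -> stopped."""

    def __init__(self):
        self.states = {}  # session id -> state name
        self._next_id = 1

    def provision(self) -> int:
        sid = self._next_id
        self._next_id += 1
        self.states[sid] = "configured"
        return sid

    def start(self, sid: int) -> None:
        # Only one session may be running at any time.
        for other, state in self.states.items():
            if other != sid and state == "running":
                raise LearningError("another session is already running")
        self.states[sid] = "running"

    def stop(self, sid: int) -> None:
        # Capture stops; the captured data is kept.
        self.states[sid] = "stopped"

    def delete(self, sid: int) -> None:
        # A running session must be stopped before it can be dropped.
        if self.states[sid] == "running":
            raise LearningError("session is still running; stop it first")
        del self.states[sid]
```

In this model a stopped session can be started again, which matches the restart guidance of re-issuing the start request to continue a capture.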
V1 constraints
Documented limitations of the current learning implementation:
- One session at a time. A second request learning start fails until the first is stopped.
- Per-host synthesis only. Per-path data is captured but not surfaced in generated rules; if you need path-level granularity, hand-edit the synthesised .policy to split host rules into per-path rules using match-uri-regex.
- Output to stdout only. There is no request learning export <id> <path> shortcut yet; redirect request learning analyze stdout to a file.
- Capture does not re-engage on engine restart. If the engine is restarted with a running session, a boot-time rescue moves the session to stopped to avoid stale state. After the restart, run request learning start <id> again if you want to continue.
Operational guidance
- Start learning from a default-permit baseline. Because capture happens post-verdict, a default-permit posture lets the engine see the organisation's real traffic. With default-deny active, learning only sees the requests that already pass policy, which is not what you want for baseline discovery.
- For a multi-day capture, set the URI cap comfortably above the day-1 capture rate. When the cap is reached, capture stops and the session's "URIs captured" count stays at the cap until you stop it.
- Synthesised rules are starting points, not production policy. Always review before applying, especially the warn / deny action choices.
See the egctl reference for the full verb list, and the policies reference for the .policy file format that learning generates.