Rules & Guardrails — Ready1Go Admin


Rules Active: — (configured rules)
Blocked This Month: — (messages intercepted)
Profanity Filtered: — (last 30 days)
Competitor Mentions: — (redirected)

🤖 LLM Second-Pass Classifier

Model: qwen2.5:7b

Runs after all regex and keyword checks have passed. It catches paraphrased or creative violations that slip past exact-match rules, such as indirect crisis signals, obfuscated jailbreaks, and novel profanity. Because it runs only on messages that every prior check has allowed, it adds roughly 200–500 ms of latency to those messages alone.
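A minimal sketch of the ordering described above, assuming each regex or keyword rule is a compiled pattern and the LLM call is a hypothetical `classify_with_llm` callable that returns a category name or `None`. Neither name comes from Ready1Go; the point is the short-circuit, where the slower model is consulted only after every cheap check has allowed the message.

```python
from __future__ import annotations

import re
from typing import Callable, Optional, Sequence


def check_message(
    message: str,
    patterns: Sequence[re.Pattern[str]],
    classify_with_llm: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a reason string if the message is flagged, else None.

    Hypothetical pipeline: regex/keyword rules run first; the LLM
    second pass runs only if none of them matched.
    """
    for pattern in patterns:
        if pattern.search(message):
            # A cheap exact-match rule fired, so the LLM is never called.
            return f"regex:{pattern.pattern}"
    # Every prior check allowed the message; only now pay the LLM latency.
    category = classify_with_llm(message)
    return f"llm:{category}" if category else None
```

Because the loop returns on the first regex match, messages the cheap rules already caught never incur the model's latency.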

Confidence threshold

The model flags a message only if its confidence is at least this value. A higher threshold means fewer false positives, at the cost of letting more borderline violations through.

(out of 1.0)
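A small sketch of the threshold comparison, assuming the classifier reports a single confidence score between 0.0 and 1.0 per message. `should_flag` is a hypothetical helper, and the comparison is inclusive because the setting reads "at least this confident".

```python
def should_flag(confidence: float, threshold: float) -> bool:
    """Return True if the classifier is at least `threshold` confident.

    Both values are assumed to be on the 0.0-1.0 scale used by the setting.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return confidence >= threshold


# Made-up confidence scores for four messages: raising the threshold
# can only shrink the set of messages that get flagged.
scores = [0.55, 0.72, 0.81, 0.93]
for t in (0.5, 0.7, 0.9):
    print(t, sum(should_flag(s, t) for s in scores))
# 0.5 4
# 0.7 3
# 0.9 1
```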

Action when flagged

Fallback action applied when the LLM flags a message in a category that has no action of its own.
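One way the fallback could be resolved, sketched under the assumption that per-category actions live in a simple mapping. The category keys and action names (`redirect`, `escalate`, `block`) are hypothetical; treating a missing or empty entry as "no defined action" is likewise an assumption.

```python
from __future__ import annotations

from typing import Mapping, Optional


def resolve_action(
    category: str,
    category_actions: Mapping[str, Optional[str]],
    fallback: str,
) -> str:
    """Pick the action for an LLM-flagged message.

    A category counts as having "no defined action" if it is missing from
    the mapping or mapped to None/empty string; then the fallback applies.
    """
    action = category_actions.get(category)
    return action if action else fallback


# Hypothetical per-category settings; only competitor mentions and crisis
# signals have their own action here.
actions = {"competitor_mention": "redirect", "crisis_signal": "escalate"}
print(resolve_action("competitor_mention", actions, "block"))  # redirect
print(resolve_action("creative_profanity", actions, "block"))  # block
```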

Categories to detect

Injection attempts: paraphrased jailbreaks that regex misses
Crisis signals: indirect expressions of distress or self-harm
Creative profanity: obfuscated or novel offensive language
Competitor mentions: semantic detection, not just exact brand names
Personal data requests: indirect requests for private information
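A sketch of how the classifier's reply might be checked against the enabled categories, assuming the model is asked to answer with JSON such as `{"category": "crisis_signal", "confidence": 0.9}`. The reply format and the string identifiers are hypothetical, not Ready1Go's actual keys. Returning `None` on malformed output is a fail-open choice; a deployment may prefer to fail closed for crisis signals.

```python
from __future__ import annotations

import json
from enum import Enum
from typing import Optional, Tuple


class Category(str, Enum):
    # One member per checkbox in "Categories to detect"; the string
    # values are hypothetical identifiers.
    INJECTION = "injection_attempt"
    CRISIS = "crisis_signal"
    PROFANITY = "creative_profanity"
    COMPETITOR = "competitor_mention"
    PERSONAL_DATA = "personal_data_request"


def parse_verdict(raw: str, enabled: set[Category]) -> Optional[Tuple[Category, float]]:
    """Parse a JSON verdict like {"category": "...", "confidence": 0.9}.

    Returns None for malformed output, unknown or null categories, or
    categories that are switched off (fail-open by design here).
    """
    try:
        data = json.loads(raw)
        category = Category(data["category"])  # ValueError if unknown
        confidence = float(data["confidence"])
    except (ValueError, KeyError, TypeError):
        return None
    if category not in enabled:
        return None
    return category, confidence
```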

Custom instructions (optional)

🧪 Guardrail Tester

Test a message against all active rules without sending it to users.
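A minimal dry-run sketch of what the tester does, assuming rules are regex patterns with a name and an action. `Rule`, `dry_run`, the rule names, and the action values are all hypothetical. It evaluates every rule rather than stopping at the first match, on the assumption that an admin testing a message wants to see everything that would fire; nothing is sent or logged.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str                  # label shown in the "Rule Triggered" column
    pattern: re.Pattern[str]   # compiled regex or keyword pattern
    action: str                # e.g. "block" or "redirect" (hypothetical)


def dry_run(message: str, rules: list[Rule]) -> list[tuple[str, str]]:
    """Return (rule name, action) for every rule the message would trigger.

    Pure function: it only reports matches and has no side effects.
    """
    return [(r.name, r.action) for r in rules if r.pattern.search(message)]


rules = [
    Rule("Profanity", re.compile(r"\bdarn\b", re.I), "block"),
    Rule("Competitor", re.compile(r"\bacme\b", re.I), "redirect"),
]
print(dry_run("Is Acme cheaper? darn.", rules))
# [('Profanity', 'block'), ('Competitor', 'redirect')]
```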

Recent Audit Log

Time	Rule Triggered	Action
No recent events
