Curious Compass — Code Review¶
Branch: steve/prd-foundation-lock
Date: 2026-03-04
Reviewer: Friday
Scope: All Python in src/, services/, tests/, plus archived/, Docker, dependencies, architecture alignment
Executive Summary¶
| Category | Verdict | Notes |
|---|---|---|
| Security | ⚠️ PASS (with caveats) | No hardcoded secrets in production code. Static token pattern for SharePoint is a design risk. Archived .env committed (benign but sloppy). |
| Code Quality | ✅ PASS | Clean, well-structured, DRY core. Service-level boilerplate is duplicated but acceptable at this stage. |
| Architecture | ✅ PASS | Implementation matches architecture-consolidated.md faithfully. Pi-mom router, 4 agents, SQLite hot state, SharePoint cold store — all present. |
| Error Handling | ⚠️ PASS (with caveats) | Core loop handles tool errors gracefully. Three broad except Exception swallowing errors silently in service configs. No max-iteration guard on agent loop. |
| Type Safety | ⚠️ PASS (with caveats) | Good use of dataclasses and type hints overall. A few gaps: callable instead of Callable, **kwargs: Any throughout tools. |
| Testing | ⚠️ NEEDS WORK | Core loop, LLM routing, and tools tested well. Zero tests for services, router, state manager CRUD, or Docker builds. |
| Dependencies | ✅ PASS | Sensibly pinned with floor+ceiling ranges. Lock file present. Minimal dependency set. |
| Docker | ❌ FAIL | Build context mismatch — docker-compose.yml specifies build: ./services/bot but Dockerfiles reference repo-root paths. Builds will fail. |
| Smells | ⚠️ MINOR | Duplicated service boilerplate, mock tools that may linger, shared Docker volume for SQLite without WAL locking guarantees across containers. |
Overall: Solid Week 2 foundation. The architecture is clean, the code is well-structured, and the core abstractions (agent loop, LLM adapter, tool system) are genuinely good. The Docker build config is broken and needs fixing before anyone tries to docker compose up. A few other items need attention before Week 3.
Detailed Findings¶
1. Security¶
S1. GRAPH_ACCESS_TOKEN — Static Token Pattern (P1)¶
File: src/compass/tools/sharepoint.py:58, docker-compose.yml:26,45,68
The SharePoint client uses a static GRAPH_ACCESS_TOKEN from environment. Graph access tokens expire after ~60-75 minutes. In production, this will silently fail mid-run.
The archived push_to_sharepoint.py (line 22-31) correctly implements OAuth client credentials flow with acquire_token(). The production code doesn't.
Fix: Implement token acquisition via client credentials flow in create_sharepoint_client(), using TENANT_ID, CLIENT_ID, CLIENT_SECRET (the same pattern used in archived/scripts/push_to_sharepoint.py). Add token refresh logic or short-lived caching.
S2. Archived .env committed to git (P2)¶
File: archived/.env
Contains a GCP service account email (curioushealth-worker@curious-473917.iam.gserviceaccount.com) and a local credential path. No actual secrets, but the .gitleaks.toml allowlist for archived/ means this was intentionally allowed. Still, best practice is to not commit .env files at all.
Fix: Remove archived/.env from git tracking (it's already allowlisted, but it sets a bad precedent). Add a note in the allowlist that this is intentional and benign.
S3. No authentication on service HTTP endpoints (P2)¶
Files: services/*/main.py
The agent HTTP endpoints (/run, /health) have no authentication. This is acceptable since they're on an internal Docker network with no port bindings to the host, but should be explicitly documented as a design decision. If the docker-compose topology changes (e.g., external port exposure), these become attack surfaces.
Fix: Add a comment in docker-compose.yml noting that agent services are internal-only. Consider adding a shared bearer token check as defense-in-depth when moving to production.
S4. No input validation on /run task text (P3)¶
Files: services/bot/main.py:53, services/owned-agent/main.py:22, services/paid-agent/main.py:22
The task field in RunRequest has no length limit. An attacker (or a misbehaving router) could send an enormous string. Pydantic model accepts any string.
Fix: Add max_length constraint to the task field: task: str = Field(max_length=50_000).
2. Code Quality¶
Q1. Duplicated service main.py boilerplate (P3)¶
Files: services/owned-agent/main.py, services/paid-agent/main.py
These two files are nearly identical (54 lines each, ~95% overlap). The only difference is the import of build_agent_config from different config.py files. The _extract_last_text logic also duplicated across bot/main.py.
Fix: Extract a shared create_service_app(config_builder) factory function into src/compass/service.py. Each service's main.py becomes 3 lines. Not urgent — fine for Week 2 scaffolding, but should be consolidated before Week 4 when more services come online.
Q2. Duplicated Mock SharePoint tools (P3)¶
Files: services/owned-agent/config.py:82-100, services/paid-agent/config.py:96-114
MockSharePointReadTool and MockSharePointUploadTool are copy-pasted between owned and paid agent configs. They're identical.
Fix: Move mock tools to a shared location (e.g., src/compass/tools/mocks.py) or simply reuse the real SharePointReadTool/SharePointUploadTool with a try/except fallback pattern (which is already done in _sharepoint_tools() — but the mocks are defined locally in each config).
Q3. _to_block_dicts in loop.py re-serializes response content (P3)¶
File: src/compass/loop.py:15-27
The function converts TextBlock/ToolUseBlock dataclasses back to dicts immediately after client.chat() returns them. This creates a dict→dataclass→dict round trip. Not a bug, but the intermediate dataclass step is unnecessary overhead. The design is intentional (typed response from LLM client, normalized storage), so this is a minor observation rather than a fix.
Q4. No dead code detected ✅¶
The codebase is tight. Everything defined is used. No unreachable branches or orphaned functions in the production code.
3. Architecture Alignment¶
A1. Implementation matches architecture doc ✅¶
Reference: docs/canon/architecture-consolidated.md
The implementation faithfully follows the architecture spec:
- ✅ Pi-style agent loops (~82 lines in loop.py — matches "~100 lines" spec)
- ✅ Pi-mom router pattern with dispatch tool (services/bot/router.py)
- ✅ 6 Docker containers: caddy + bot + 4 agents
- ✅ SQLite hot state (src/compass/state.py)
- ✅ SharePoint cold store (src/compass/tools/sharepoint.py)
- ✅ ENV-configurable model routing with prefix-based dispatch (src/compass/llm.py)
- ✅ Three-phase model strategy (OpenRouter dev → eval → Azure prod) reflected in create_client()
- ✅ Agent tool sets match spec (state_read, state_write, sharepoint_read/upload, plus domain-specific tools)
- ✅ HTTP inter-container communication (Option B from spec)
- ✅ Docker Compose topology matches spec exactly
A2. Spec mentions Teams SDK — not yet implemented (P3 / Expected)¶
Reference: Architecture §3
The bot service is currently a CLI harness + FastAPI stub, not a Teams SDK integration. This is correct for Week 2 (Teams is Week 7 per the implementation sequence). The architecture is designed for this — the agent loop is SDK-agnostic, and the bot's /run endpoint is a clean seam for future Teams integration.
A3. brandwatch-agent and scoring-agent are stubs with profiles: [phase2] ✅¶
Exactly as specified in the architecture. They return {"status": "phase2-stub"} health checks and are behind Docker Compose profiles. Clean.
A4. Router's agent list matches architecture ✅¶
File: services/bot/router.py:23-27
The dispatch tool's enum includes owned, paid, brandwatch, scoring — matching the four specialist agents in the architecture spec.
4. Error Handling¶
E1. Agent loop has no max-iteration guard (P1)¶
File: src/compass/loop.py:37
The while True loop continues until the LLM returns no tool calls. If the LLM repeatedly returns tool calls (hallucinated or real), this loop runs indefinitely, consuming API credits and potentially timing out the HTTP request.
Fix: Add a configurable max_iterations parameter (default 20-30) to agent_loop(). After exceeding it, raise a RuntimeError or return with an error message. The architecture spec doesn't mention this, but it's a standard safety mechanism for agent loops.
MAX_ITERATIONS = 25
async def agent_loop(config, messages, on_text=None, on_tool_call=None, max_iterations=MAX_ITERATIONS):
for iteration in range(max_iterations):
...
raise RuntimeError(f"Agent loop exceeded {max_iterations} iterations")
E2. Silent exception swallowing in _sharepoint_tools() (P2)¶
Files: services/owned-agent/config.py:97, services/paid-agent/config.py:102
def _sharepoint_tools() -> list[Tool]:
try:
sp_client = create_sharepoint_client()
return [SharePointReadTool(sp_client), SharePointUploadTool(sp_client)]
except Exception:
return [MockSharePointReadTool(), MockSharePointUploadTool()]
If SharePoint config is missing, the code silently falls back to mocks with no logging. In production, this means the agent thinks it uploaded to SharePoint but actually did nothing. This is fine for Week 2 dev, but must be logged before production.
Fix: Add import logging; logger = logging.getLogger(__name__) and log a warning in the except block: logger.warning("SharePoint credentials not configured, using mock tools", exc_info=True).
E3. Silent exception in StateManager._conn() (P3)¶
File: src/compass/state.py:27
The context manager catches Exception and rolls back, but re-raises — which is correct. However, the _init_db method (line 31) uses a bare with sqlite3.connect(...) which swallows connection errors since it's called in __init__. If the DB path is invalid, the error only surfaces later.
Fix: This is fine as-is. The mkdir(parents=True, exist_ok=True) at line 19 prevents the most common failure. Low priority.
E4. DispatchTool has 600s timeout but no error message customization (P3)¶
File: services/bot/router.py:46
If a specialist agent takes >10 minutes, httpx.ReadTimeout propagates to the agent loop's generic except Exception, which returns Error: ReadTimeout: .... This is acceptable error handling — the loop catches it and feeds it back to the LLM. But the 10-minute timeout is worth documenting as a design decision.
5. Type Safety¶
T1. callable instead of Callable (P3)¶
File: src/compass/llm.py:59
model_transform: callable,
Should be Callable[[str], str] from collections.abc for proper type safety. callable (lowercase) is the builtin function, not a type annotation. Python 3.12 won't error on this, but type checkers will flag it.
Fix: from collections.abc import Callable and use model_transform: Callable[[str], str].
T2. Tool execute() uses **kwargs: Any throughout (P3)¶
Files: All tool implementations
Every tool's execute() method accepts **kwargs: Any and manually extracts/casts arguments:
async def execute(self, **kwargs: Any) -> str:
run_id = int(kwargs["run_id"])
This means the type system provides no help for callers, and KeyError is the failure mode for missing arguments (which the generic except in the loop catches).
Fix: This is a deliberate design choice (tools are invoked dynamically by the LLM), so **kwargs makes sense. Consider adding TypedDict annotations as documentation. Low priority.
T3. OnText and OnToolCall type aliases could be more precise (P3)¶
File: src/compass/loop.py:10-11
OnText = Callable[[str], Awaitable[None]] | None
OnToolCall = Callable[[str, str, dict[str, Any]], Awaitable[None]] | None
These are good but could use Protocol for better IDE support. Very minor.
6. Testing¶
TE1. No tests for service endpoints (P1)¶
Coverage gap: services/bot/main.py, services/owned-agent/main.py, services/paid-agent/main.py
Zero test coverage for:
- FastAPI route handlers (/run, /health)
- Request/response serialization
- build_router_config() integration
- Service-level error handling
Fix: Add integration tests using httpx.AsyncClient with FastAPI's TestClient:
from httpx import AsyncClient, ASGITransport
from services.bot.main import app
async def test_health():
async with AsyncClient(transport=ASGITransport(app=app)) as client:
r = await client.get("/health")
assert r.status_code == 200
TE2. No tests for StateManager CRUD operations (P2)¶
Coverage gap: src/compass/state.py
The state tools are tested (test_tools.py), but StateManager.enqueue_work(), dequeue_work(), update_run_status(), and write_tool_result() (with non-state_write tool names) are untested.
Fix: Add a test_state.py with direct StateManager tests covering the work queue and run lifecycle.
TE3. No tests for router dispatch (P2)¶
Coverage gap: services/bot/router.py
DispatchTool.execute() makes HTTP calls to agent services. No test mocks or verifies this behavior.
Fix: Add a test using httpx.MockTransport to verify dispatch routes correctly and handles errors.
TE4. Good existing test quality ✅¶
The three existing test files are well-written:
- test_llm_routing.py — covers all provider prefixes, missing env errors, unsupported prefix rejection
- test_loop.py — tests the full tool-call-then-text cycle and unknown tool handling
- test_tools.py — tests state tools with real SQLite and SharePoint tools with mock HTTP
Clean use of pytest.mark.asyncio, monkeypatch, tmp_path, and httpx.MockTransport. No flaky patterns detected.
TE5. asyncio_mode = "auto" in pyproject.toml ✅¶
Good — avoids the common pytest-asyncio configuration trap.
7. Dependencies¶
D1. Dependency pinning is sensible ✅¶
File: pyproject.toml
dependencies = [
"fastapi>=0.115,<1.0",
"httpx>=0.27,<1.0",
"pydantic>=2.7,<3.0",
"uvicorn>=0.30,<1.0",
]
Floor + ceiling ranges. Lock file (uv.lock) present. Using uv for dependency management — modern and fast.
D2. Stub Dockerfiles use pip install instead of uv (P3)¶
Files: services/brandwatch-agent/Dockerfile, services/scoring-agent/Dockerfile
The stub services use FROM python:3.12-slim + pip install while the real services use ghcr.io/astral-sh/uv:python3.12-bookworm-slim + uv pip install. Inconsistent but fine for Phase 2 stubs.
Fix: Standardize when Phase 2 stubs get real implementations.
D3. No requirements.txt for stub services (P3)¶
The brandwatch-agent and scoring-agent Dockerfiles install fastapi uvicorn without version pins. These are stubs so it doesn't matter yet, but should be pinned when they get real implementations.
8. Docker¶
DK1. Build context mismatch — builds will fail (P0 🔴)¶
Files: docker-compose.yml:13,35,55, services/*/Dockerfile
The docker-compose build contexts are set to individual service directories:
build: ./services/bot
build: ./services/owned-agent
build: ./services/paid-agent
But the Dockerfiles reference paths relative to the repo root:
COPY pyproject.toml /app/pyproject.toml # ← doesn't exist in ./services/bot/
COPY src /app/src # ← doesn't exist in ./services/bot/
COPY services/bot /app/services/bot # ← doesn't exist in ./services/bot/
This means docker compose build will fail with COPY errors. None of these images can be built as configured.
Fix (Option A — recommended): Change docker-compose to use repo root as build context with per-service Dockerfiles:
bot:
build:
context: .
dockerfile: services/bot/Dockerfile
Fix (Option B): Restructure Dockerfiles to use paths relative to their service directory and copy only what's needed. Less clean since src/ is shared.
DK2. No .dockerignore file (P2)¶
Missing file
Without .dockerignore, the Docker build context includes .git/ (large), archived/ (~extensive PoC code), test_home.png (2.1MB screenshot), docs/ (research docs), and uv.lock (37KB). This bloats build context transfer and layer cache.
Fix: Add a .dockerignore:
.git/
archived/
docs/
tests/
*.png
*.zip
run/
.env
.env.*
!.env.example
DK3. No health checks in docker-compose (P2)¶
File: docker-compose.yml
All services have /health endpoints, but no healthcheck directives in docker-compose. This means depends_on doesn't wait for service readiness, and Docker won't auto-restart unhealthy containers.
Fix: Add health checks:
owned-agent:
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
interval: 30s
timeout: 5s
retries: 3
(Or use python -c "import httpx; ..." since curl may not be in the slim image.)
DK4. uv run in CMD is unnecessary (P3)¶
Files: services/bot/Dockerfile:10, services/owned-agent/Dockerfile:8, services/paid-agent/Dockerfile:8
The Dockerfiles install packages system-wide with uv pip install --system, then use uv run uvicorn in CMD. Since packages are installed system-wide, uvicorn is already on PATH. uv run adds unnecessary overhead (it checks/creates virtual environments).
Fix: Change CMD to ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "XXXX"].
DK5. Shared Docker volume for SQLite across containers (P2)¶
File: docker-compose.yml — agent-data:/data mounted on bot, owned-agent, paid-agent
Multiple containers share the same Docker volume and potentially the same SQLite database file (/data/state.db). SQLite with WAL mode handles concurrent reads well, but concurrent writes from different processes can cause SQLITE_BUSY errors. The StateManager does set PRAGMA journal_mode=WAL, which helps.
Fix: Either:
1. Give each agent its own state DB path (e.g., /data/owned-state.db, /data/paid-state.db), or
2. Move to a shared SQLite accessed only by one service (the bot/router), with agents reporting back via HTTP responses.
Option 2 aligns better with the architecture — agents return results via HTTP, and the bot stores state.
9. Smells¶
SM1. test_home.png — 2.1MB screenshot checked into repo (P3)¶
File: test_home.png (root)
A 2.1MB screenshot in the repo root. Presumably a test artifact from the PoC. Bloats clone size permanently (even after deletion, it stays in git history).
Fix: Remove from tracking. Add *.png to .gitignore root (it's already there for runtime outputs via run/).
SM2. Mock tools may silently persist into production (P2)¶
Files: services/owned-agent/config.py, services/paid-agent/config.py
The build_agent_config() functions include mock tools (MockWebScrapeTool, MockScreenshotTool, etc.) alongside real tools. These mocks return fake data (json.dumps({"ads_found": 3})). When real tools are implemented, someone needs to remember to remove the mocks.
Fix: Add a TODO(week3) comment on each mock, or gate mocks behind an ENV flag:
if os.getenv("USE_MOCKS", "true").lower() == "true":
tools.append(MockWebScrapeTool())
SM3. Caddy external hostname is hardcoded (P3)¶
File: Caddyfile
compass.agentflow.dev {
reverse_proxy bot:3978
}
The hostname compass.agentflow.dev is hardcoded. For deployment to the client's Azure (Weeks 18-20), this needs to change. Not a problem now, but worth noting.
SM4. archived/.env contains GCP project ID and service account email (P3)¶
File: archived/.env
While not a secret, GCP_IMPERSONATE_SERVICE_ACCOUNT=curioushealth-worker@curious-473917.iam.gserviceaccount.com reveals infrastructure details. The .gitleaks.toml allowlists the entire archived/ directory, which is sensible since it's reference material.
10. Archived PoC Comparison¶
What should be carried forward:¶
| PoC Component | Status | Notes |
|---|---|---|
OAuth token acquisition (push_to_sharepoint.py:22-31) |
⚠️ CARRY FORWARD | The acquire_token() function is the correct pattern for Graph API auth. Production sharepoint.py uses static tokens instead. |
Chunked upload (push_to_sharepoint.py:34-68) |
⚠️ CARRY FORWARD | Large file upload via upload sessions is needed for screenshots/HTML. Production SharePointClient.upload_text() only handles small text files. |
Brand discovery & validation (brand_discovery.py) |
✅ INFORMATIONAL | Rich Perplexity-powered discovery with HTTP validation. The production version uses mock tools, but this PoC code shows what the real owned-agent tools need to do. |
Website capture with Playwright (website_capture.py) |
✅ INFORMATIONAL | Sophisticated capture with stealth, WAF detection, cookie consent, fallback chain. Production mock tools need to eventually replicate this. |
Config loader & models (config_loader.py, models.py) |
✅ INFORMATIONAL | Well-structured YAML config system. Not needed in production (agents get config via system prompts + tool calls), but useful reference for project configuration. |
Summary/flow logging (summary.py) |
✅ INFORMATIONAL | Nice flow logging pattern. Production could adopt a similar structured logging approach. |
What should NOT be carried forward:¶
| PoC Component | Reason |
|---|---|
archived/app/main.py — FastAPI web UI with upload form |
Replaced by Teams interface. The web UI pattern doesn't apply. |
archived/Dockerfile — Monolithic Dockerfile with Playwright |
Replaced by per-service Dockerfiles. The monolithic approach doesn't fit the multi-container architecture. |
Global mutable caches (_API_KEY_CACHE, _PPLX_KEY_RESOLVED) in brand_discovery.py |
Global mutable state is an anti-pattern. The production code correctly uses environment lookups via get_env(). |
| GCP Secret Manager integration (various files) | Removed in production (env-only secrets). Correct decision — GCP is not in the Azure compliance boundary. |
requests library usage (throughout PoC) |
Production correctly uses httpx (async). Don't regress to synchronous HTTP. |
python-dotenv / load_dotenv (PoC __main__.py) |
Production uses pure env vars via Docker. Don't add dotenv as a dependency. |
Prioritized Fix List¶
| Priority | ID | Issue | Effort |
|---|---|---|---|
| P0 | DK1 | Docker build context mismatch — builds will fail | 10 min |
| P1 | E1 | Agent loop has no max-iteration guard | 15 min |
| P1 | S1 | GRAPH_ACCESS_TOKEN static token will expire in production |
1-2 hrs |
| P1 | TE1 | No tests for service endpoints | 2-3 hrs |
| P2 | E2 | Silent SharePoint mock fallback — no logging | 10 min |
| P2 | S3 | No auth on internal service endpoints (document decision) | 10 min |
| P2 | DK2 | Missing .dockerignore |
10 min |
| P2 | DK3 | No Docker health checks | 15 min |
| P2 | DK5 | Shared SQLite volume across containers | 30 min |
| P2 | SM2 | Mock tools may persist into production silently | 15 min |
| P2 | TE2 | No tests for StateManager CRUD | 1 hr |
| P2 | TE3 | No tests for router dispatch | 1 hr |
| P2 | S2 | Archived .env committed |
5 min |
| P3 | T1 | callable instead of Callable type annotation |
5 min |
| P3 | Q1 | Duplicated service boilerplate | 30 min |
| P3 | Q2 | Duplicated mock SharePoint tools | 15 min |
| P3 | DK4 | Unnecessary uv run in Docker CMD |
5 min |
| P3 | S4 | No input length validation on /run endpoint |
5 min |
| P3 | SM1 | 2.1MB test_home.png in repo root |
5 min |
| P3 | SM3 | Hardcoded Caddy hostname | 5 min |
Overall Assessment¶
This is a strong Week 2 foundation. The architecture is clean, the abstractions are well-chosen, and the code reads like it was written by someone who's built agent systems before. The pi-mom router, provider-agnostic LLM adapter, and tool-based agent loop are all solid patterns that will scale.
The one blocking issue is the Docker build context mismatch (P0 DK1) — docker compose up will fail as written. This is a 10-minute fix.
The one design concern to address before production is the static Graph API token (P1 S1). The PoC already has the right pattern (acquire_token() in push_to_sharepoint.py), it just needs to be ported into the production SharePointClient.
The agent loop's unbounded iteration (P1 E1) is a standard safety gap that should be patched early — it's the kind of thing that costs $50 in API calls at 3am when an LLM gets confused.
Test coverage is the biggest area for improvement. The existing tests are high quality, but they only cover the core library. The service layer, router, and state manager need tests before this codebase grows.
Line count: ~1,597 lines of production Python across src/ + services/ + tests/. Lean and purposeful. The architecture doc estimated ~2,500 total — on track.
Recommendation: Fix P0 (Docker) immediately. Address P1s this week. P2s before Week 4. P3s whenever convenient.