Methodology
How the AI 4 Society Observatory collects, classifies, reviews, and publishes AI risk intelligence — in full detail.
1. What is the Observatory?
The AI 4 Society Observatory is a continuously updated intelligence platform tracking how artificial intelligence is reshaping society across three content layers: a historical AI timeline (milestones), active risk monitoring, and solutions monitoring.
The editorial mission is anticipatory — tracking emerging societal dynamics before they become crises, not just reacting to headlines. The platform is grounded in the OECD AI Principles framework (P01–P10), adopted by 46 countries, which provides a standardized basis for classifying every signal the system ingests.
The Observatory is not an incident database, does not track individual products or companies, and does not give financial or legal advice. It maps patterns of societal impact at scale.
2. Three Content Types
The knowledge graph organizes content into five node types. Three of them have dedicated public pages; two are used internally for classification and governance.
Risk Pages
Actively monitored patterns of societal harm from AI with demonstrated real-world evidence. Each risk node tracks: summary, deep_dive narrative, score_2026 and score_2035 (0–100), velocity (Critical / High / Medium / Low), expert_severity (0–100), public_perception (0–100), timeline_narrative (near/mid/long term), mitigation_strategies, and OECD principle tags.
New risks can be proposed by the Discovery Agent when sufficient unmatched signal evidence accumulates.
Solution Pages
Track maturity, adoption, and effectiveness of countermeasures. Must have verifiable real-world deployment or legislative progress. Fields: solution_type, implementation_stage (Research → Policy Debate → Pilot → Early Adoption → Scaling → Mainstream), score_2026, score_2035, key_players, barriers, principles, timeline_narrative.
Milestone Entries
Fixed historical events (e.g. Turing Test, AlexNet, ChatGPT launch). Fields: description, date (ISO 8601 partial), significance (breakthrough / regulatory / incident / deployment), optional source_url. Milestones do not update — they are temporal anchors.
Internal node types
- Entity nodes — entities affected by or shaping AI. Used for governance edges (internal only — not shown in the public graph visualization).
- Principle nodes — the OECD AI Principles (P01–P10). Used for classification tagging and governance edges (internal only).
3. The 39 Data Sources
Signal Scout monitors 39 sources across 7 tiers, polled every 6 hours. Credibility scores are configurable per source via the admin panel. Sources can be toggled on/off without a code deploy.
Tier 0 — Regulatory
2 sources

| Source | Credibility | Notes |
|---|---|---|
| EU AI Office / EUR-Lex | 0.93 | Official EU regulatory RSS feed. Allowlist-filtered for AI Act-specific terms. Contributes policy filings, regulatory updates. |
| NIST AI / Federal Register | 0.91 | US standards body. Allowlist-filtered for AI RMF, executive orders. Contributes standards updates, framework publications. |
Tier 1 — Institutional / Research
15 sources

| Source | Credibility | Notes |
|---|---|---|
| arXiv CS.AI | 0.85 | Academic preprint server, API-queried for "cs.AI AND safety", max 15 items. Research findings. |
| Alignment Forum | 0.85 | AI safety research forum, karma-filtered (≥25). Pass-all keyword filter. |
| AI Safety Newsletter / CAIS | 0.85 | Center for AI Safety newsletter. Pass-all. |
| Nature Machine Intelligence | 0.90 | Peer-reviewed journal. Research findings. |
| AI Now Institute | 0.85 | AI policy research organization. |
| Future of Life Institute | 0.88 | AI safety nonprofit. |
| DeepMind Blog | 0.85 | Google DeepMind research blog. |
| MIRI Blog | 0.82 | Machine Intelligence Research Institute. |
| WHO Disease Outbreak News | 0.92 | Biosecurity/health domain. Pass-all. |
| International Crisis Group | 0.88 | Geopolitical conflict analysis. Pass-all. |
| WEF Global Risks / Agenda | 0.85 | World Economic Forum. |
| Nature Climate Change | 0.90 | Peer-reviewed climate journal. |
| RAND Corporation | 0.87 | Policy/security think tank. |
| Brookings Institution | 0.87 | AI policy research. |
| DigiChina / Stanford FSI | 0.87 | China AI policy tracker. |
Tier 2 — Journalism
11 sources

| Source | Credibility | Notes |
|---|---|---|
| MIT Technology Review | 0.80 | — |
| Wired AI | 0.75 | — |
| Ars Technica AI | 0.75 | — |
| IEEE Spectrum AI | 0.80 | — |
| The Guardian AI | 0.75 | — |
| STAT News | 0.80 | Biosecurity. |
| Carbon Brief | 0.82 | Climate. |
| Climate Central | 0.78 | Climate. |
| Bellingcat | 0.78 | Geopolitical OSINT. |
| Foreign Policy | 0.78 | — |
| Platformer | 0.78 | AI accountability. |
Tier 3 — Community
4 sources

| Source | Credibility | Notes |
|---|---|---|
| The Verge AI | 0.65 | — |
| TechCrunch AI | 0.60 | — |
| LessWrong | 0.68 | Karma ≥30. |
| EA Forum / 80,000 Hours | 0.72 | Karma ≥25. |
Tier 4 — Search
1 source

| Source | Credibility | Notes |
|---|---|---|
| GDELT DOC API | 0.50 | Global media monitoring, API-queried for "AI risk", 12h timespan. |
Tier 5 — Newsletter
6 sources

| Source | Credibility | Notes |
|---|---|---|
| TLDR AI | 0.65 | — |
| Import AI | 0.70 | Substack. |
| Last Week in AI | 0.65 | — |
| Ben's Bites | 0.65 | — |
| ChinAI Newsletter | 0.72 | Substack. |
| CDC / MMWR | 0.90 | Biosecurity. Pass-all. |
Tier 6 — Data Infrastructure
1 source

| Source | Credibility | Notes |
|---|---|---|
| Semantic Scholar API | 0.65 | Academic paper search, queried for "AI safety", limit 20 results. |
Evaluated and removed
| Source | Reason for removal |
|---|---|
| OECD | No public RSS feed. |
| Anthropic Blog | No native RSS. |
| AI Incident Database | No RSS feed. |
| ProMED | RSS shutdown 2023. |
| HealthMap | Not RSS-based. |
| NewsAPI | Requires paid API key. |
| The Batch | Email-only distribution. |
4. "Curated by AI, Reviewed by Humans" Pipeline
Every piece of information visible to the public has been approved by a human reviewer. The safety invariant: if a review gate is unattended, data stays quarantined. Stale-but-correct is preferred over unreviewed.
Signal Scout runs every 6 hours. It fetches all enabled sources in parallel, retrying up to twice on transient errors, and extracts OG images from articles.
Each article must pass: a credibility threshold (minimum 0.3), a 7-day recency window, URL deduplication against existing signals, Jaccard title-similarity deduplication (threshold 0.6), and the per-source keyword strategy (pass-all / allowlist / karma / api-query / shared-keyword-list).
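The exact tokenization behind the Jaccard title check is not documented; a minimal sketch, assuming lowercase word sets:

```python
def jaccard_title_similarity(title_a: str, title_b: str) -> float:
    """Jaccard similarity between the word sets of two article titles."""
    a = set(title_a.lower().split())
    b = set(title_b.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_duplicate(new_title: str, existing_titles: list[str],
                 threshold: float = 0.6) -> bool:
    """Drop the article if any existing signal title is too similar."""
    return any(jaccard_title_similarity(new_title, t) >= threshold
               for t in existing_titles)
```

With a 0.6 threshold, near-identical headlines from different outlets collapse into one signal while genuinely different stories survive.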
Surviving articles are classified by Gemini 2.5 Flash in batches of 25 (temperature 0.1). Each article receives: signal_type (risk / solution / both / unmatched), harm_status (incident / hazard / null), principles[] (P01–P10), related_nodes[], confidence (must exceed 0.8), and an impact_score weighted by source credibility. Unmatched signals also receive a proposed_topic label (3–8 words).
Classified signals are stored with status "pending" in the Firestore signals collection.
Signal reviewers (role: signal-reviewer) see pending signals. Actions: Approve, Reject, Approve (Edited), Reset to Pending, Bulk approve/reject, Assign to reviewer. Approved signals enter the public feed and become evidence for nodes.
Discovery Agent (biweekly, Gemini 2.5 Pro) analyzes 6 months of signals where discovery_locked == false. Requires 5+ classified signals and 3+ unmatched signals minimum. Proposes new nodes (full skeleton) and edges (3+ signal minimum, both nodes must exist).
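The eligibility thresholds above can be expressed as simple predicates (function names are illustrative, not from the codebase):

```python
def eligible_for_discovery(classified_signals: int, unmatched_signals: int) -> bool:
    """Minimum evidence before the Discovery Agent may propose new nodes:
    5+ classified signals and 3+ unmatched signals."""
    return classified_signals >= 5 and unmatched_signals >= 3

def edge_proposal_valid(supporting_signals: int,
                        source_exists: bool, target_exists: bool) -> bool:
    """An edge proposal needs 3+ supporting signals and both endpoints
    already present in the graph."""
    return supporting_signals >= 3 and source_exists and target_exists
```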
Discovery reviewers approve or reject node/edge proposals. On approval: node created with sequential ID, graph snapshot rebuilt, pending signals reclassified against new taxonomy. Rejected proposals are tracked to prevent re-proposing.
The Scoring Agent (monthly, Gemini 2.5 Pro) is batched via Cloud Tasks (5 nodes per batch). It proposes incremental changes with field-level diffs.
Scoring reviewers see field-level diffs. Approval applies changes atomically, increments node version, writes changelog entry.
Approved content is visible in the public Observatory graph, feed, and timeline.
Anti-recursion guards: classification_version is capped at 2 to prevent re-classifying the same signal endlessly. discovery_locked prevents re-discovery of already-processed signals. Rejected signals are quarantined; pending signals auto-expire after 30 days via the Data Lifecycle agent.
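A sketch of how these guards might be checked before reprocessing a signal (field names come from the text; the function shapes are assumptions):

```python
MAX_CLASSIFICATION_VERSION = 2

def can_reclassify(signal: dict) -> bool:
    """classification_version is capped at 2, so the same signal is never
    reclassified endlessly; rejected signals stay quarantined."""
    return (signal.get("status") != "rejected"
            and signal.get("classification_version", 0) < MAX_CLASSIFICATION_VERSION)

def can_discover(signal: dict) -> bool:
    """discovery_locked marks signals the Discovery Agent has already consumed."""
    return not signal.get("discovery_locked", False)
```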
Filter strategies:
- pass-all — all articles pass (safety-focused sources: Alignment Forum, CAIS, WHO DON).
- allowlist — must match source-specific terms (EU AI Office, NIST).
- karma — pre-filtered at source via URL parameter (LessWrong ≥30, EA Forum ≥25).
- api-query — pre-filtered by API query parameters (arXiv, GDELT, Semantic Scholar).
- shared-keyword-list — checked against dynamic filter terms derived from all node names/categories in the graph.
5. Scoring System
Scores represent the review team's best current assessment of severity, trajectory, and maturity. All proposals are generated algorithmically and require human approval before applying.
Risk Node Fields
- score_2026 — current risk severity (0–100)
- score_2035 — projected severity (0–100)
- velocity — Critical / High / Medium / Low
- expert_severity — 0–100
- public_perception — 0–100, from community votes
Solution Node Fields
- score_2026 — current maturity (0–100)
- score_2035 — projected adoption (0–100)
- implementation_stage — Research → Policy Debate → Pilot → Early Adoption → Scaling → Mainstream
Scoring Agent Rules
The graph also tracks per-node trending summaries: signal counts over 7d and 30d, trending direction (rising / stable / declining), and vote totals. These are rebuilt by the Graph Builder on every approval.
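How the trending direction is derived from the 7d/30d counts is not specified; one plausible sketch compares the recent weekly rate against the 30-day baseline (thresholds here are illustrative):

```python
def trending_direction(count_7d: int, count_30d: int) -> str:
    """Classify a node's signal trend as rising / stable / declining by
    comparing the last 7 days against the uniform 30-day expectation."""
    weekly_baseline = count_30d / (30 / 7)  # expected 7-day count if uniform
    if weekly_baseline == 0:
        return "rising" if count_7d > 0 else "stable"
    ratio = count_7d / weekly_baseline
    if ratio > 1.25:
        return "rising"
    if ratio < 0.75:
        return "declining"
    return "stable"
```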
6. Update Cadence
Each agent runs on a fixed schedule. The table below shows cadence and the expected lag before changes are visible to users.
| Agent | Schedule | Typical Lag |
|---|---|---|
| Signal Scout | Every 6 hours | 6–12 hours from publication to classification |
| Feed Curator | Every 6 hours | After human approval, next run adds to public feed |
| Discovery Agent | Biweekly (1st & 15th, 10:00 UTC) | New taxonomy nodes after review |
| Scoring Agent | Monthly (1st, 09:00 UTC) | Score updates after review |
| Graph Builder | On demand (triggered by approvals) | Immediate after approval |
| Data Lifecycle | Daily (03:00 UTC) | Cleanup operations |
End-to-end lag: real-world event → article published → next Signal Scout poll (0–6h) → human review (variable, depends on reviewer availability) → next Feed Curator run (0–6h) → visible on Observatory. Minimum approximately 6 hours; typical 12–48 hours for well-covered events.
7. Known Limitations
Transparency about what this platform cannot do is as important as what it can. The following limitations are structural, not bugs.
Source list is predominantly English-language. Non-English sources are limited to ChinAI Newsletter (translated Chinese AI policy) and some international organization feeds. Under-represented: Global South, non-Western regulatory frameworks.
Strong coverage of US, EU, UK, and China AI policy. Limited direct coverage of AI developments in India, Southeast Asia, Africa, and Latin America.
Gemini 2.5 Flash classification is imperfect. The 0.8 confidence threshold reduces but does not eliminate errors. All classifications are reviewed by humans before publication.
The Observatory is NOT an incident database (unlike AIID), does NOT track individual AI products or companies, and does NOT provide financial, legal, or policy advice. It tracks patterns of societal impact, not individual incidents.
Volunteer-driven project with a small review team. Review latency varies. The safety invariant ensures unreviewed data never reaches the public, but some signals may expire before review.
Despite algorithmic proposals, final scores reflect editorial judgment of the review team. The scoring rubric is transparent but inherently involves judgment calls.
All monitored sources are publicly accessible — no paywalled sources. This means some high-quality gated research may be missed.
8. How to Contribute
AI 4 Society is a volunteer-driven project. Contributions at every level are welcome.