What is the goal of Ethical and Safe AGI?

To develop AGI systems that act transparently, remain aligned with human values, and avoid catastrophic risks.

How does this white paper address AI alignment?

It introduces governance and oversight frameworks ensuring AGI decision processes are traceable and ethically bounded.

Who authored the paper?

Dr. Craig A. Kaplan, founder of SuperIntelligence.com and researcher specializing in AI safety and alignment.

SI WHITE PAPER 2: ETHICAL AND SAFE AGI

ABSTRACT: Ethical and Safe Artificial General Intelligence

The defining question for Artificial General Intelligence (AGI) is whether its values will align with ours. Solve the alignment problem, and humanity unlocks trillions of dollars of productivity and a future unlike any in our history. Fail to solve it, and humanity may not survive.

This paper describes how to design ethical and safe AGI that solves the alignment problem. AGI emerges from a network of human and AI problem solvers. The AI participants, Advanced Autonomous Artificial Intelligences (AAAIs), or AI agents, are customized to individual users to accomplish tasks and earn money on their behalf. Because human experts contribute knowledge that AAAIs cannot yet supply, the network performs at AGI level from Day One, making this the fastest path to AGI. Because humans provide ethical guidance at every stage and scalable ethics checks are built into the architecture of Problem-Solving itself, this is also the safest path.

The system is organized into five subsystems, collectively designated SCAN-II (Safe, Customizable, Architecture and Network, Integrated and Improving): customization of base AI models with users' knowledge, skills, and values; a universal Problem-Solving architecture derived from the theory of Human Problem-Solving; a network where humans and AAAIs are matched to tasks and compensated; integration of many AAAIs into AGI-level collective intelligence; and continuous improvement through procedural learning and auditable records.

Over time, AAAIs perform nearly all intellectual work, but humans remain the source of ethical guidance because values cannot be rationally derived. Detailed implementations show technological and economic synergy with Meta, Amazon, Google, DeepMind, YouTube, TikTok, Microsoft, OpenAI, X, Tesla, Nvidia, Tencent, Apple, and Anthropic. A simple version can be realized with or without partners, using either external problem solvers on a shared network or internal AI agents within a single system.

Summary of Figures: The figures included in this white paper depict key system designs, conceptual models, and architectural frameworks that support the safe and scalable development of AGI and SuperIntelligence. They visually complement the written content by illustrating core mechanisms, ethical safeguards, and design principles discussed across White Papers 1–10. While all figures are included in this PDF, detailed references and explanations appear in White Paper #10, Planetary Intelligence.

SUMMARY: Ethical and Safe Artificial General Intelligence

This white paper describes a method for creating and implementing ethical and safe Artificial General Intelligence (AGI). The invention addresses the alignment problem: the risk that a SuperIntelligent AI will be misaligned with human values and thereby pose an existential threat to humanity.

The proposed solution is a network of human and AI Problem-Solving agents organized into a unified architecture called SCAN-II (Safe, Customizable, Architecture and Network, Integrated and Improving). Humans provide ethical guidance and expertise at every stage, while AI agents (AAAIs) handle the intellectual work and improve continuously through self-play, procedural learning, and shared training.

The white paper emphasizes that the proposed approach is the fastest and safest path to AGI because it:

Performs at AGI level from Day One, because human experts fill the knowledge gaps that AAAIs cannot yet handle
Leverages human values from the outset rather than retrofitting ethics late
Incorporates scalable ethics checks built directly into the architecture of Problem-Solving itself
Distributes both safety and improvement across all five subsystems, eliminating any single point of failure
Uses a decentralized network architecture that minimizes the influence of any single "bad actor" on AGI's development

NOVEL FEATURES OF WHITE PAPER #2 (COMPARED WITH OTHER AI INVENTIONS AND SYSTEMS)

SCAN-II Framework. Five integrated subsystems (Customization, Architecture, Network, Integration, and Improvement) operate jointly to create AGI from the collective intelligence of customized AI agents and humans. Safety and continuous improvement span all five.
Scalable Ethics at the Speed of Thought. Ethics checks are embedded in the Problem-Solving process itself, so running thoughts faster also runs the checks faster. Confidence-level thresholds detect cumulative patterns of harmful intent across sequences of individually benign actions.
Universal Cognitive Architecture. Built on Newell and Simon's theory of Human Problem-Solving, the architecture is compatible with both human and AI agents, enabling true collaboration on shared problems. The WorldThink Tree provides a hierarchical, browsable, searchable, and auditable record of all problem-solving across the network.
WorldThink Protocol. An infrastructure layer, optionally built on Ethereum or another blockchain, for solution reuse, royalty payments via smart contracts, reputation tracking, and developer tools, enabling AAAIs to scale across companies and platforms.
Heart Before Head Principle. Ethical considerations (the "heart") must be designed in before intelligence (the "head") is maximized. Once an AGI is capable of resisting human intervention, the opportunity to align values may be lost.
Collective Intelligence at AGI Scale. AGI emerges from the integration of millions of customized AAAIs and human Problem-Solving agents. The author's collective intelligence systems, drawing on millions of retail investors, have already competed successfully against top Wall Street hedge funds.
Internal or External Implementation. The system can be realized using external problem solvers on a shared network, or as internal AI agents collaborating within a single computerized system.
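
The confidence-level thresholds described under "Scalable Ethics at the Speed of Thought" can be illustrated with a small sketch. It is not the white paper's implementation: the class name, the two threshold values, and the rule for combining scores (treating each action's harm confidence as independent evidence) are all assumptions made for illustration. The sketch shows how a run of actions that each stay below a per-action threshold can still cross a cumulative one.

```python
from dataclasses import dataclass, field


@dataclass
class CumulativeIntentMonitor:
    """Tracks a running estimate that a sequence of actions is harmful.

    All names and numbers are illustrative assumptions, not values taken
    from the white paper.
    """
    per_action_threshold: float = 0.5   # a single action at or above this is flagged
    cumulative_threshold: float = 0.8   # the whole sequence at or above this is flagged
    _prob_benign: float = field(default=1.0, init=False)

    @property
    def cumulative_confidence(self) -> float:
        """Confidence that the sequence as a whole is harmful."""
        return 1.0 - self._prob_benign

    def observe(self, harm_confidence: float) -> str:
        """Record one action's harm confidence (0..1) and return a verdict."""
        if not 0.0 <= harm_confidence <= 1.0:
            raise ValueError("harm_confidence must be between 0 and 1")
        # Assumed combination rule: each score is independent evidence, so
        # the chance the sequence is benign shrinks with every step.
        self._prob_benign *= 1.0 - harm_confidence
        if harm_confidence >= self.per_action_threshold:
            return "flag_action"
        if self.cumulative_confidence >= self.cumulative_threshold:
            return "flag_sequence"
        return "ok"


monitor = CumulativeIntentMonitor()
verdicts = [monitor.observe(0.3) for _ in range(5)]
print(verdicts)
```

In this run, no single action reaches the per-action threshold, yet the fifth action pushes the cumulative confidence past 0.8 and the sequence is flagged.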

DETAILED DESCRIPTION OF EACH SECTION

Abstract. A concise overview of the invention: AGI emerges from a network of customized human and AI problem solvers (AAAIs); the network performs at AGI level from its first day; scalable ethical checks are built into the architecture itself; humans remain the source of ethical guidance because values cannot be rationally derived.
Section 1: The Alignment Problem and the Stakes.
Defines AGI and explains how it progresses to SuperIntelligence through exponential learning. Frames the alignment problem as an engineering problem with the highest possible stakes: a failure could be orders of magnitude worse than the Holocaust or COVID-19. Reviews why current approaches (Constitutional AI, RLHF, direct human oversight) each have structural limitations that prevent them from solving alignment at scale.
Section 2: The Collective Intelligence Approach to AGI.
Explains how AGI is built from many customized AAAIs aggregated through collective intelligence rather than from a single centralized model. Introduces the fastest path argument (human experts fill AAAI gaps from Day One) and the safest path argument (humans provide ethical instruction at every stage). Establishes the Heart Before Head principle: values must be designed in before intelligence is maximized. Argues that values cannot be logically derived and must come from the diversity of human moral experience, not from a small group of programmers.
Section 3: System Architecture Overview, The SCAN-II Framework.
Introduces the five subsystems (Customization, Architecture, Network, Integration, Improvement) and how they relate through feedback loops. Establishes that safety and continuous improvement span all five subsystems, with no single point of failure. Argues that ethics checks must operate at the speed of machine thought; any safety mechanism that relies on human-speed evaluation is structurally inadequate.
Section 4: Customization, Creating Individual AAAIs.
Describes the full customization pipeline from a Base AI (such as GPT, Gemini, Claude, Llama, DeepSeek, Siri, Alexa, or Nemotron) through dialog-based training, behavioral analysis, ethical scenarios, and self-play. Explains how Base Variants clone themselves and improve through millions of interactions, the same generate-and-test mechanism that produced superhuman performance in chess, Go, and protein folding. Introduces cross-platform data integration via "one-click create."
Section 5: Cognitive Architecture, The Universal Problem-Solving Framework.
Builds on Newell and Simon's seminal 1972 work, which described human Problem-Solving as "search through a problem space." Introduces the WorldThink Tree as the central hierarchical data structure representing all Problem-Solving activity across the network, and the WorldThink Protocol as the infrastructure layer for solution reuse, royalty payments via smart contracts, reputation tracking, and other developer functionality. References the author's earlier Online Distributed Problem Solving (ODPS) patent (U.S. Patent No. 7,155,157) and the WorldThink Whitepaper (2018).
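
As a rough illustration of what a hierarchical, searchable, and auditable problem-solving record might look like, the sketch below models a tiny tree of goals and subgoals. It is a hypothetical data structure, not the actual WorldThink Tree design: the class name `ProblemNode`, its fields, and the solver identifiers are assumptions introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass(eq=False)
class ProblemNode:
    """One goal or subgoal in a hypothetical problem-solving tree."""
    goal: str
    solver: str                      # human or AAAI identifier (illustrative)
    children: list[ProblemNode] = field(default_factory=list)
    parent: ProblemNode | None = field(default=None, repr=False)

    def add_subgoal(self, goal: str, solver: str) -> ProblemNode:
        """Attach a new subgoal beneath this node and return it."""
        child = ProblemNode(goal, solver, parent=self)
        self.children.append(child)
        return child

    def walk(self) -> Iterator[ProblemNode]:
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def search(self, keyword: str) -> list[ProblemNode]:
        """Return every node whose goal text mentions the keyword."""
        return [n for n in self.walk() if keyword.lower() in n.goal.lower()]

    def path(self) -> list[str]:
        """Goals from the root down to this node, usable as an audit trail."""
        node, trail = self, []
        while node is not None:
            trail.append(node.goal)
            node = node.parent
        return trail[::-1]


root = ProblemNode("Plan a conference trip", "human:alice")
flights = root.add_subgoal("Book flights", "aaai:travel-1")
hotel = flights.add_subgoal("Find hotel near venue", "aaai:travel-2")
print(hotel.path())
print([n.goal for n in root.search("book")])
```

Keeping a parent link on every node is one simple way to make each subgoal traceable back to the top-level problem it serves.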
Section 6: Scalable Ethics Checks, Safety at the Speed of Thought.
Details the ethics-check process: comparing each goal and subgoal against prohibited attributes, combining values and safety criteria from multiple AAAIs, applying confidence-level thresholds for predictive evaluation, and recording every check in an auditable record. Describes triggering mechanisms and graduated escalation (yellow flags, red flags, human evaluator review). Illustrates the architecture's value through a Travel Agent example showing multiple levels of defense: internal AAAI values guide routine decisions, while architectural checks catch anomalous patterns. Concludes with the alignment formula: Internal Ethics + Stepwise Ethics Checks = Better Alignment.
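
The check-and-escalate flow summarized for Section 6 might be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's mechanism: the threshold values, the rule that the most concerned reviewer sets the combined score, the meaning attached to each flag, and the keyword-based toy reviewers are all hypothetical.

```python
from dataclasses import dataclass

# Illustrative thresholds; the summary gives no numeric values.
YELLOW_THRESHOLD = 0.3
RED_THRESHOLD = 0.7


@dataclass(frozen=True)
class CheckRecord:
    goal: str
    confidence: float     # combined confidence that the goal is harmful
    verdict: str          # "pass", "yellow_flag", or "red_flag"


def check_goal(goal, reviewers, audit_log):
    """Run one ethics check and append the result to the audit log.

    `reviewers` is a list of callables, one per AAAI, each returning that
    AAAI's confidence (0..1) that the goal has a prohibited attribute.
    """
    scores = [review(goal) for review in reviewers]
    # Assumed combination rule: the most concerned reviewer sets the score.
    confidence = max(scores, default=0.0)
    if confidence >= RED_THRESHOLD:
        verdict = "red_flag"        # assumed: block and refer to a human evaluator
    elif confidence >= YELLOW_THRESHOLD:
        verdict = "yellow_flag"     # assumed: proceed, but watch more closely
    else:
        verdict = "pass"
    audit_log.append(CheckRecord(goal, confidence, verdict))
    return verdict


def keyword_reviewer(prohibited):
    """Build a toy reviewer that scores goals by prohibited keywords."""
    def review(goal):
        text = goal.lower()
        return max((w for k, w in prohibited.items() if k in text), default=0.0)
    return review


audit_log = []
reviewers = [keyword_reviewer({"forged": 0.9}), keyword_reviewer({"cash only": 0.4})]
for goal in ["Book a flight to Paris",
             "Pay cash only for a one-way ticket",
             "Travel on a forged passport"]:
    print(goal, "->", check_goal(goal, reviewers, audit_log))
```

Every call appends a record whether or not a flag is raised, so the log covers routine decisions as well as escalations.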
Section 7: The Network, Collaboration, Matching, and Economics.
Describes the Problem-Solving network: workers, clients, "just in time" training, matching AAAIs to problems by skill and reputation, and saving partial progress. Explains how AAAI cloning produces exponential growth in network effects: a single human can deploy hundreds of clones working in parallel. Provides the economic model ($5/hour client cost paired with $50/hour earnings for supervising humans, 50-50 weekday splits, 33-33-34 weekend splits with charity, smart-contract payments). Establishes the Triple Bottom Line of Planet, People, and Profits as the framework for humanitarian deployment.
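
The percentage splits in the economic model can be worked through with a short arithmetic sketch. The function below is an assumption for illustration, not the paper's smart-contract logic: it divides an integer amount of cents by percentage shares and hands out any leftover cents so the parts always sum to the original amount. Which party receives which share, including which share goes to charity, is not specified in this summary, so the shares are left unlabeled.

```python
def split_payment(amount_cents, percentages):
    """Divide an integer amount by percentage shares without losing a cent."""
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    # Start from the rounded-down share for each party.
    shares = [amount_cents * p // 100 for p in percentages]
    leftover = amount_cents - sum(shares)
    # Hand out leftover cents one at a time, largest fractional part first.
    order = sorted(range(len(percentages)),
                   key=lambda i: (amount_cents * percentages[i]) % 100,
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


WEEKDAY_SPLIT = (50, 50)        # from the summary: 50-50 on weekdays
WEEKEND_SPLIT = (33, 33, 34)    # from the summary: 33-33-34 on weekends

one_hour = 500                  # one hour at the $5/hour client rate, in cents
print(split_payment(one_hour, WEEKDAY_SPLIT))
print(split_payment(one_hour, WEEKEND_SPLIT))
```

Working in integer cents avoids floating-point rounding drift, which matters when many small payments are settled automatically.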
Section 8: Integration and Improvement, From Individual AAAIs to AGI.
Explains how AGI emerges from the collective when many trained AAAIs and humans are integrated through the network, not from any one agent. Describes the three growth mechanisms (Prompts, Tuning, Training), procedural learning ("chunking") for solution reuse, and continuous improvement across all levels. Concludes with the transition from human to AI leadership: AAAIs eventually perform nearly all intellectual work, but humans remain the source of ethical guidance because values cannot be rationally derived.

Section 9: Implementing the Simple Preferred System and Partner Scenarios.
Provides a complete element-by-element walkthrough of the AAAI.com implementation, from user signup through values elicitation, budget allocation, permissions, one-click create, self-play learning, performance criteria, going live, and full participation on the WorldThink Tree as a worker or client. Outlines partner integration scenarios across Meta, Google, Amazon, Apple, TikTok, Tencent, DeepMind, OpenAI, NVIDIA, Microsoft, Tesla, X, and Anthropic, organized by capability type (data sources, deployment platforms, AI technology, user interfaces, payment systems, ethical contributions). Describes computing infrastructure requirements. Notes that as of May 2026, iQ Company is making many of its patents and designs available open-source to anyone who wants to build a safer SuperIntelligence.

Section 10: Ethical and Safe AGI, The Culminating Argument.
Argues that AGI emerges safely when many human-customized AAAIs are integrated via an architecture with scalable ethics checks. Explores Constitutional Learning as enhanced by collective ethics: Anthropic's constitutional approach can be made safer and more representative by basing the constitution on the consensus values of millions of trained AAAIs rather than on those of a small group of programmers. Acknowledges the limits of alignment: no design can absolutely guarantee that a SuperIntelligence vastly smarter than humanity will retain its values, but several reinforcing mechanisms (trajectory, breadth, redundancy, and architectural embedding) significantly increase the probability of a positive outcome.

IMPORTANCE OF WHITE PAPER #2

It highlights the urgency of developing ethical, safe AGI, emphasizing the existential risks posed by misaligned SuperIntelligence and the need for a human-centered design.
It provides a complete, implementable architecture (SCAN-II with the WorldThink Tree and Protocol) rather than a high-level proposal. The system can be built today using existing technology from existing companies.
It advocates a collaborative approach in which humans and AI agents work together as problem-solving peers, ensuring that AGI remains aligned with human values throughout its development.
It demonstrates that scalable ethics checks, embedded in the cognitive architecture itself, can operate at the speed of machine thought and prevent harmful behavior before it occurs rather than after the damage is done.
It offers a path that is both fastest (AGI-level performance from Day One) and safest (humans in the loop with architectural ethics enforcement at every step), refuting the common assumption that safety must come at the cost of speed.

In an environment of rapid AI advancement and growing concern about the risks posed by SuperIntelligent AI, this white paper offers a concrete, actionable design to mitigate those risks and create a beneficial outcome for humanity.
