erlm/README.md

# ERLM - Edge Recursive Language Model

ERLM is an AI assistant that extracts information from a massive text corpus (`RAW_CORPUS`) that it never reads directly. It works inside a persistent Python REPL and inspects the corpus only through the code it executes, because its context window is limited and large outputs would overflow it.

Here's a breakdown of its core functionality:

1.  **Blind Data Exploration:** The AI cannot view `RAW_CORPUS` directly. Instead, it must infer whether the corpus is structured, semi-structured, or unstructured by running Python code against it.

2.  **Data Engineering Approach:** The program follows a structured workflow:
    *   **Shape Discovery:** It first analyzes the `RAW_CORPUS` to determine its format (JSON, CSV, XML, log lines, etc.).
    *   **Access Layer Creation:** It then builds a set of persistent Python variables (lists, dictionaries, etc.) that give efficient access to the data. Depending on the format, this may involve splitting the text into manageable chunks, parsing log lines, or extracting relevant sections.
    *   **Dense Execution:** Finally, it uses the created access layer to perform targeted searches and extractions, avoiding redundant scanning of the entire corpus.

3.  **Limited Output:** To conserve the context window, the AI may print only small snippets of output (fewer than 1,000 characters) and is encouraged to keep summarized findings in Python variables instead.

4.  **Iterative Process:** The program operates in a series of steps, each designed to build upon the previous one. It prioritizes creating reusable tools and verifying results at each stage.

5.  **JSON Output:** All output is formatted as JSON so that it is consistent and machine-parsable.
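The three workflow stages in item 2 can be illustrated with a minimal sketch. It assumes a log-style corpus; the sample `RAW_CORPUS` contents, the `LOG_LINE` pattern, and the helper names `discover_shape` and `build_access_layer` are hypothetical and are not taken from the ERLM code.

```python
import json
import re
from collections import defaultdict

# Hypothetical stand-in; in ERLM, RAW_CORPUS already exists in the REPL.
RAW_CORPUS = "\n".join([
    "2024-01-01 10:00:00 INFO service started",
    "2024-01-01 10:00:05 ERROR disk full on /dev/sda1",
    "2024-01-01 10:01:00 WARN retrying write",
    "2024-01-01 10:02:00 ERROR write failed",
])

# Assumed log layout: date, time, level, free-text message.
LOG_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (\w+) (.*)$")


def discover_shape(corpus, sample_chars=2000):
    """Shape discovery: guess the format from a small prefix, printing nothing."""
    sample = corpus[:sample_chars]
    lines = sample.splitlines()
    if len(corpus) > sample_chars and len(lines) > 1:
        lines = lines[:-1]  # the last sampled line may be cut off mid-line
    head = sample.lstrip()[:1]
    if head in ("{", "["):
        return "json"
    if head == "<":
        return "xml"
    if lines and all(LOG_LINE.match(line) for line in lines):
        return "log"
    return "unstructured"


def build_access_layer(corpus):
    """Access layer: parse once into {level: [(line_number, message), ...]}."""
    by_level = defaultdict(list)
    for line_number, line in enumerate(corpus.splitlines()):
        match = LOG_LINE.match(line)
        if match:
            level, message = match.groups()
            by_level[level].append((line_number, message))
    return dict(by_level)


# Persistent variables that later REPL steps can reuse.
SHAPE = discover_shape(RAW_CORPUS)
BY_LEVEL = build_access_layer(RAW_CORPUS)

# Dense execution: answer from the index instead of rescanning the corpus.
errors = BY_LEVEL.get("ERROR", [])
print(json.dumps({"shape": SHAPE, "error_count": len(errors)}))
```

Because `BY_LEVEL` persists between steps, follow-up questions about a given log level only touch the index, which is the point of building the access layer before querying.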

In short, ERLM approaches a large dataset the way a human data scientist would: it explores and structures the data first, then runs targeted queries against that structure.
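The output limit and JSON convention from items 3 and 5 above could be enforced with a small helper along the following lines. This is only a sketch: the `emit` function, the `MAX_OUTPUT_CHARS` constant, and the summary fields are assumptions made for illustration, not part of ERLM itself.

```python
import json

MAX_OUTPUT_CHARS = 1000  # the limit stated in this README


def emit(result, limit=MAX_OUTPUT_CHARS):
    """Print a step result as JSON, or a short JSON summary if it is too long."""
    text = json.dumps(result, default=str)
    if len(text) >= limit:
        # Too large to print: describe the result instead of dumping it.
        # With the default limit, the 200-character preview keeps this short.
        summary = {
            "truncated": True,
            "type": type(result).__name__,
            "size": len(result) if hasattr(result, "__len__") else None,
            "preview": text[:200],
        }
        text = json.dumps(summary)
    print(text)
    return text


# The full result stays in a variable; only a bounded view is printed.
FINDINGS = list(range(5000))
small_output = emit({"matches": 3})
large_output = emit(FINDINGS)
```

The full `FINDINGS` list remains available to later steps, so nothing is lost by printing only the summary.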