erlm/README.md
2026-01-26 12:40:24 +00:00

# ERLM - Edge Recursive Language Model
This program is an AI assistant designed to extract information from a massive, unseen text corpus (`RAW_CORPUS`) without reading the text directly. It operates within a persistent Python REPL environment and has a limited context window, so it must keep its printed output small.
Here's a breakdown of its core functionality:
1. **Blind Data Exploration:** The AI cannot directly view the `RAW_CORPUS`. Instead, it must infer its structure (structured, semi-structured, or unstructured) through Python code execution.
2. **Data Engineering Approach:** The program follows a structured workflow:
   * **Shape Discovery:** It first analyzes the `RAW_CORPUS` to determine its format (JSON, CSV, XML, log lines, etc.).
   * **Access Layer Creation:** It then builds a system of persistent Python variables (lists, dictionaries, etc.) to efficiently access and manipulate the data. This involves splitting the text into manageable chunks, parsing log lines, or extracting relevant sections.
   * **Dense Execution:** Finally, it uses the created access layer to perform targeted searches and extractions, avoiding redundant scanning of the entire corpus.
3. **Limited Output:** To conserve the context window, the AI prints only small snippets of output (fewer than 1000 characters) and is encouraged to store and summarize findings in Python variables.
4. **Iterative Process:** The program operates in a series of steps, each designed to build upon the previous one. It prioritizes creating reusable tools and verifying results at each stage.
5. **JSON Output:** All outputs are formatted as JSON, ensuring consistent and parsable data.
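
The three workflow stages in item 2 can be illustrated with a minimal sketch. It is not ERLM's actual implementation: the sample corpus, the log-line layout, and the helper names `discover_shape` and `build_access_layer` are all assumptions made for illustration.

```python
import json
import re

# Hypothetical stand-in corpus; in the real REPL, RAW_CORPUS already exists.
RAW_CORPUS = "\n".join(
    f"2026-01-26 12:{i % 60:02d}:00 {'ERROR' if i % 7 == 0 else 'INFO'} "
    f"worker-{i % 3} job {i} finished"
    for i in range(500)
)


def discover_shape(text, sample_size=2000):
    """Shape discovery: guess the format from a small prefix only."""
    sample = text[:sample_size].lstrip()
    if sample.startswith(("{", "[")):
        try:
            json.loads(text)
            return "json"
        except ValueError:
            pass
    if sample.startswith("<"):
        return "xml"
    lines = sample.splitlines()
    if len(text) > sample_size:
        lines = lines[:-1]  # the last sampled line may be cut off mid-line
    lines = lines[:20]
    if lines and all(re.match(r"\d{4}-\d{2}-\d{2} ", ln) for ln in lines):
        return "log"
    if len(lines) > 1 and "," in lines[0] and len({ln.count(",") for ln in lines}) == 1:
        return "csv"
    return "unstructured"


# Assumed log layout: "<date> <time> <LEVEL> <source> <message>"
LOG_LINE = re.compile(r"(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<source>\S+) (?P<msg>.*)")


def build_access_layer(text):
    """Access layer: parse once into persistent records plus an index by level."""
    records, by_level = [], {}
    for line_no, line in enumerate(text.splitlines()):
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        record = match.groupdict()
        record["line"] = line_no
        by_level.setdefault(record["level"], []).append(len(records))
        records.append(record)
    return records, by_level


shape = discover_shape(RAW_CORPUS)
records, by_level = build_access_layer(RAW_CORPUS)

# Dense execution: answer the query from the index, not by rescanning the corpus.
errors = [records[i] for i in by_level.get("ERROR", [])]
summary = {"shape": shape, "records": len(records), "errors": len(errors)}
print(json.dumps(summary))
```

Because `records` and `by_level` persist in the REPL, later steps can reuse them for new queries without parsing `RAW_CORPUS` again.
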
In essence, this is a data extraction and analysis tool that mimics how a human data scientist works: it explores and structures a large dataset first, then runs targeted queries against it.
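
The output constraints in items 3 and 5 (small printed snippets, JSON formatting) could be enforced by a helper along the following lines. This is a sketch under assumptions: the `emit` function, its fallback fields, and the sample `findings` values are hypothetical and not taken from ERLM itself.

```python
import json

MAX_CHARS = 1000  # the printed-output limit stated in this README


def emit(payload, limit=MAX_CHARS):
    """Return payload as JSON, or a short JSON summary if it would exceed the limit."""
    text = json.dumps(payload, default=str)
    if len(text) < limit:
        return text
    # Too long: report the size and a preview, shrinking the preview until it fits.
    preview = text[: limit // 2]
    while True:
        out = json.dumps(
            {"truncated": True, "full_length": len(text), "preview": preview}
        )
        if len(out) < limit or not preview:
            return out
        preview = preview[: len(preview) // 2]


# Hypothetical findings kept in a variable; only the bounded JSON is printed.
findings = {"matches": 3, "first_match_line": 17}
print(emit(findings))
```

The full result stays available in `findings` for later steps, while the printed text remains both parsable and below the character limit.
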