Skip to main content

πŸ€– GenerateCodeNode Module

The GenerateCodeNode module dynamically generates Python code for a function named extract_data(html: str) -> dict() that extracts specific data from HTML content, adhering to a predefined output schema. It utilizes a sophisticated reasoning loop that iteratively refines the code until it successfully extracts the desired data, ensuring both syntactic correctness and semantic alignment with the user's intent.

Classes​

GenerateCodeNode​

GenerateCodeNode generates Python code to extract data from HTML based on a schema. The generated code uses the BeautifulSoup library for parsing the HTML.

Attributes​

  • llm_model: An instance of a language model client, configured for generating answers.
  • verbose (bool): A flag indicating whether to show print statements during execution.
  • output_schema: The output schema be used for the generated code returned data.
  • max_iterations (dict): Maximum number of iterations for each reasoning loop. It should have the following structure:
    {
    "overall": 10, # Maximum iterations for the overall reasoning loop
    "syntax": 3, # Maximum iterations for the syntax reasoning loop
    "execution": 3, # Maximum iterations for the execution reasoning loop
    "validation": 3, # Maximum iterations for the validation reasoning loop
    "semantic": 3 # Maximum iterations for the semantic comparison loop
    }

Methods​

  • __init__(self, input: str, output: List[str], node_config: Optional[dict] = None, node_name: str = "GenerateCode")

    • Initializes the GenerateCodeNode with the necessary configurations.
    • Args:
      • input (str): Boolean expression defining the input keys needed from the state.
      • output (List[str]): List of output keys to be updated in the state.
      • node_config (dict, optional): Additional configuration for the node.
      • node_name (str, optional): The unique identifier name for the node. Defaults to "GenerateCode".
  • execute(self, state: dict) -> dict

    • Generates Python code that extracts data from HTML based on the specified schema.
    • Leverages the analysis from the PromptRefinerNode and HtmlAnalyzerNode to improve code generation reasoning.
    • Args:
      • state (dict): The current state of the graph, containing user input, refined prompt, HTML analysis, and the reference answer.
    • Returns:
      • dict: The updated state with the generated Python code.
    • Raises:
      • KeyError: If required input keys are missing from the state.
      • RuntimeError: If the maximum number of iterations is reached without generating valid code.

Reasoning Loops​

  • overall_reasoning_loop(self, state: dict) -> dict
    • Orchestrates the iterative code refinement process, incorporating syntax, execution, validation, and semantic checks.
  • syntax_reasoning_loop(self, state: dict) -> dict
    • Iteratively refines the code until it is syntactically correct.
  • execution_reasoning_loop(self, state: dict) -> dict
    • Iteratively refines the code until it executes without errors.
  • validation_reasoning_loop(self, state: dict) -> dict
    • Iteratively refines the code until its output adheres to the specified schema.
  • semantic_comparison_loop(self, state: dict) -> dict
    • Iteratively refines the code until its output semantically aligns with the reference answer.

Note: The HtmlAnalyzerNode plays a crucial role in providing context and analysis to the GenerateCodeNode, ultimately leading to the generation of more refined and effective code.

Example Usage​

Here is an example of how to use the GenerateCodeNode class:

from generate_code_node import GenerateCodeNode

# Define a geenrate code node
generate_code_node = GenerateCodeNode(
input="user_prompt & refined_prompt & html_info & reduced_html & answer",
output=["generated_code"]
)

# Define the state
state = {
"user_prompt": "What are the attractions in Chioggia?",
"refined_prompt": "The user is asking about the attractions in Chioggia.",
"html_info": "The HTML code contains...",
"reduced_html": "<html>...</html>",
"answer": answer
}

# Execute the generate answer node
state = generate_code_node.execute(state)

# Retrieve the generated answer from the state
generated_code = state["generated_code"]

print(f"Code generated: {generated_code}")