Skip to main content

🐭 GenerateScraperNode Module

The GenerateScraperNode module implements a node responsible for generating a Python script for scraping a website using the specified library. It takes the user's prompt and the scraped content as input and generates a Python script that extracts the information requested by the user.

Classes​

GenerateScraperNode​

GenerateScraperNode is a node responsible for generating a Python script for scraping a website using the specified library.

Attributes​

  • llm_model: An instance of a language model client, configured for generating answers.
  • library (str): The Python library to use for scraping the website.
  • source (str): The website to scrape.
  • verbose (bool): A flag indicating whether to show print statements during execution.

Methods​

  • __init__(self, input: str, output: List[str], library: str, website: str, node_config: Optional[dict] = None, node_name: str = "GenerateScraper")

    • Initializes the GenerateScraperNode with a language model client, a library, a website, and a node name.
    • Args:
      • input (str): Boolean expression defining the input keys needed from the state.
      • output (List[str]): List of output keys to be updated in the state.
      • library (str): The Python library to use for scraping the website.
      • website (str): The website to scrape.
      • node_config (dict, optional): Additional configuration for the node.
      • node_name (str, optional): The unique identifier name for the node. Defaults to "GenerateScraper".
  • execute(self, state: dict) -> dict

    • Generates a Python script for scraping a website using the specified library.
    • Args:
      • state (dict): The current state of the graph. The input keys will be used to fetch the correct data from the state.
    • Returns:
      • dict: The updated state with the output key containing the generated answer.
    • Raises:
      • KeyError: If input keys are not found in the state, indicating that the necessary information for generating an answer is missing.

Example Usage​

Here is an example of how to use the GenerateScraperNode class:

from generate_scraper_node import GenerateScraperNode

# Define a generate scraper node
generate_scraper_node = GenerateScraperNode(
input="user_input & document",
output=["generated_script"],
library="requests",
website="https://example.com"
)

# Define the state
state = {
"user_input": "What information do you want to scrape from the website?",
"document": [document]
}

# Execute the generate scraper node
state = generate_scraper_node.execute(state)

# Retrieve the generated script from the state
generated_script = state["generated_script"]

print(f"Generated Script:\n{generated_script}")