Deprecated: import from "@langchain/community/document_loaders/web/imsdb" instead. This entrypoint will be removed in 0.3.0.

A loader for web pages from the IMSDB (Internet Movie Script Database) website. It extends the CheerioWebBaseLoader class.

Hierarchy

Constructors

Properties

caller: AsyncCaller
selector?: SelectorType
textDecoder?: TextDecoder
timeout: number
webPath: string

Methods

  • load()

    An asynchronous method that loads the web page using the scrape() method inherited from the base class. It selects the element with the class 'scrtext' using the $ function provided by Cheerio and extracts its text content. It then creates a single Document instance with that text as the page content and the source as metadata.

    Returns Promise<Document<Record<string, any>>[]>

    An array containing a Document instance.
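
The flow described for this method can be sketched without the library. This is a minimal illustration, not the loader's actual source: `ScriptDocument`, `SelectFn`, `scriptToDocuments`, and the `.scrtext` selector string are assumptions made for the example, and a fake `$` function stands in for a scraped page so the sketch runs without network access.

```typescript
// Illustrative stand-ins for the library types; these names are assumptions.
interface ScriptDocument {
  pageContent: string;
  metadata: { source: string };
}

// The small slice of a Cheerio-style `$` function this sketch relies on.
type SelectFn = (selector: string) => { text(): string };

// Select the 'scrtext' element, read its text, and wrap the result
// in a one-element array, with the page URL recorded as the source.
function scriptToDocuments($: SelectFn, webPath: string): ScriptDocument[] {
  const text = $(".scrtext").text();
  return [{ pageContent: text, metadata: { source: webPath } }];
}

// A fake `$` that only knows about the 'scrtext' element.
const fakeSelect: SelectFn = (selector) => ({
  text: () => (selector === ".scrtext" ? "INT. SPACESHIP - NIGHT" : ""),
});

const scriptDocs = scriptToDocuments(fakeSelect, "https://example.com/script.html");
```

The result is always an array, even though a page yields one document, which keeps the return shape consistent with loaders that produce many documents.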

  • loadAndSplit()

    Parameters

    • Optional splitter: BaseDocumentTransformer<DocumentInterface<Record<string, any>>[], DocumentInterface<Record<string, any>>[]>

    Returns Promise<Document<Record<string, any>>[]>

    A Promise that resolves with an array of Document instances, each split according to the provided TextSplitter.

    Loads the documents and splits them using the specified text splitter.

    Deprecated: call this.load() and splitter.splitDocuments() individually instead.
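
Calling the two steps individually might look like the sketch below. The interfaces `Doc`, `DocLoader`, and `DocSplitter`, the helper `loadThenSplit`, and both stub objects are hypothetical stand-ins so the example is self-contained; only the method names `load()` and `splitDocuments()` come from the text above.

```typescript
// Hypothetical minimal shapes; the real library types are richer.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}
interface DocLoader {
  load(): Promise<Doc[]>;
}
interface DocSplitter {
  splitDocuments(docs: Doc[]): Promise<Doc[]>;
}

// Step 1: load the documents. Step 2: hand them to the splitter.
async function loadThenSplit(loader: DocLoader, splitter: DocSplitter): Promise<Doc[]> {
  const loaded = await loader.load();
  return splitter.splitDocuments(loaded);
}

// Stub loader returning one document with two paragraphs.
const stubLoader: DocLoader = {
  load: async () => [
    { pageContent: "FADE IN:\n\nINT. OFFICE - DAY", metadata: { source: "stub" } },
  ],
};

// Toy splitter that breaks each document on blank lines.
const blankLineSplitter: DocSplitter = {
  splitDocuments: async (docs) => {
    const chunks: Doc[] = [];
    for (const doc of docs) {
      for (const part of doc.pageContent.split("\n\n")) {
        chunks.push({ pageContent: part, metadata: doc.metadata });
      }
    }
    return chunks;
  },
};
```

Keeping the steps separate makes it possible to inspect or filter the loaded documents before they are split.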

  • imports()

    A static method that dynamically imports the Cheerio library and returns its load function. It throws an error if the import fails.

    Returns Promise<{
        load: (
            content: string | Buffer | AnyNode | AnyNode[],
            options?: null | CheerioOptions,
            isDocument?: boolean
        ) => CheerioAPI;
    }>

    A Promise that resolves to an object containing the load function from the Cheerio library.
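
The import-or-throw pattern described here can be sketched generically. `importOrThrow` and its `installHint` parameter are hypothetical names for this illustration; the loader's own error message and internals may differ.

```typescript
// Wrap a dynamic import so that a missing optional dependency
// surfaces as a clear, actionable error instead of a raw module failure.
async function importOrThrow<T>(
  importer: () => Promise<T>,
  installHint: string
): Promise<T> {
  try {
    return await importer();
  } catch {
    throw new Error(`Optional dependency could not be loaded. ${installHint}`);
  }
}

// Intended use (not executed here, since cheerio may not be installed):
//   const { load } = await importOrThrow(
//     () => import("cheerio"),
//     "Install cheerio to use this loader."
//   );
```

Deferring the import this way means the dependency is only required by code paths that actually scrape pages.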

  • scrapeAll()

    Fetches the web documents at the given URLs and loads each one using Cheerio. It returns one CheerioAPI instance per URL.

    Parameters

    • urls: string[]

      An array of URLs to fetch and load.

    • caller: AsyncCaller
    • timeout: undefined | number
    • Optional textDecoder: TextDecoder
    • Optional options: CheerioOptions

    Returns Promise<CheerioAPI[]>

    A Promise that resolves to an array of CheerioAPI instances.
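
A rough sketch of fetching and parsing several URLs is shown below. It assumes the pages are fetched concurrently with `Promise.all`, and it leaves out the `caller`, `timeout`, and `textDecoder` handling listed above. `scrapeAllSketch`, `FetchText`, and the injected `parse` function are hypothetical names; `parse` stands in for Cheerio's load function.

```typescript
// Hypothetical fetcher signature: resolves to the raw HTML of a page.
type FetchText = (url: string) => Promise<string>;

// Fetch every URL and parse each response.
// Promise.all keeps the results in the same order as the input URLs.
async function scrapeAllSketch<P>(
  urls: string[],
  fetchText: FetchText,
  parse: (html: string) => P
): Promise<P[]> {
  return Promise.all(urls.map(async (url) => parse(await fetchText(url))));
}
```

Injecting the fetcher and parser keeps the sketch runnable offline; in practice the fetcher would also apply the timeout and text decoding.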