The size to chunk the sitemap URLs into for scraping.
Optional
headers
The headers to use in the fetch request.
Optional
selector
The selector to use to extract the text from the document. Defaults to "body".
Optional
textDecoder
The text decoder to use to decode the response. Defaults to UTF-8.
The timeout in milliseconds for the fetch request. Defaults to 10000 (10 seconds).
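As a rough illustration of how the documented defaults could be applied, the sketch below merges caller-supplied options with fallback values. The interface name `LoaderParamsSketch`, the function `resolveParams`, and the property name `timeout` are assumptions made for this example and are not taken from the library; only the default values ("body", UTF-8, 10 seconds) come from the descriptions above.

```typescript
// Hypothetical shape mirroring the optional parameters described above.
interface LoaderParamsSketch {
  headers?: Record<string, string>;
  selector?: string;
  textDecoder?: TextDecoder;
  timeout?: number; // milliseconds; the property name is an assumption
}

// Fills in the documented defaults for any option the caller omitted.
function resolveParams(params: LoaderParamsSketch = {}) {
  return {
    headers: params.headers ?? {},
    selector: params.selector ?? "body",
    textDecoder: params.textDecoder ?? new TextDecoder("utf-8"),
    timeout: params.timeout ?? 10_000,
  };
}

const defaults = resolveParams();
const custom = resolveParams({ selector: "main", timeout: 2_000 });
console.log(defaults.selector, defaults.timeout);
console.log(custom.selector, custom.timeout);
```

Using `??` rather than `||` keeps explicit falsy values such as a timeout of 0 instead of silently replacing them with the default.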
Extracts the text content from the loaded document using the selector and creates a Document instance with the extracted text and metadata. It returns an array of Document instances.
A Promise that resolves to an array of Document instances.
Optional
splitter: BaseDocumentTransformer<DocumentInterface<Record<string, any>>[], DocumentInterface<Record<string, any>>[]>
A Promise that resolves with an array of Document instances, each split according to the provided TextSplitter.
Static
imports
A static method that dynamically imports the Cheerio library and returns the load function. If the import fails, it throws an error.
A Promise that resolves to an object containing the load function from the Cheerio library.
Static
scrape
Fetches web documents from the given array of URLs and loads them using Cheerio. It returns an array of CheerioAPI instances.
An array of URLs to fetch and load.
Optional
textDecoder: TextDecoder
Optional
options: CheerioOptions & {
A Promise that resolves to an array of CheerioAPI instances.
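The chunk size mentioned for sitemap scraping can be pictured as batching the URL array before each batch is handed to a fetch step. The sketch below shows only that batching; `chunkUrls` is a hypothetical helper written for this example, not a function from the library, and the URLs are placeholders.

```typescript
// Hypothetical helper: splits a list of URLs into batches of at most `size` items.
function chunkUrls(urls: string[], size: number): string[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    // slice() stops at the end of the array, so the last batch may be shorter.
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
}

const batches = chunkUrls(
  [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
  ],
  2
);
console.log(batches);
```

Each batch could then be passed to a function that takes an array of URLs, which bounds how many pages are requested at once.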
SitemapLoaderParams
Interface representing the parameters for initializing a SitemapLoader.