Class SitemapLoader

Document loader that loads the pages listed in a sitemap. Its initialization parameters are described by the SitemapLoaderParams interface.

Hierarchy

CheerioWebBaseLoader
- SitemapLoader

Implements

SitemapLoaderParams

Constructors

constructor

new SitemapLoader(webPath, params?): SitemapLoader
Parameters
- webPath: string
- params: SitemapLoaderParams = {}
Returns SitemapLoader
Overrides CheerioWebBaseLoader.constructor
- Defined in libs/langchain-community/src/document_loaders/web/sitemap.ts:40

Properties

allowUrlPatterns

allowUrlPatterns: undefined | (string | RegExp)[]

caller

caller: AsyncCaller

chunkSize

chunkSize: number

The number of sitemap URLs scraped per batch. Defaults to 300.
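
As an illustration of what chunking the sitemap URLs means, here is a minimal sketch that splits a URL list into batches. The helper name `chunkUrls` and the example URLs are hypothetical, and this is not the library's own implementation; only the default of 300 comes from the reference above.

```typescript
// Hypothetical helper, not part of SitemapLoader's public API.
// Splits a list of URLs into consecutive batches of at most `chunkSize` items.
function chunkUrls(urls: string[], chunkSize: number = 300): string[][] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: string[][] = [];
  for (let i = 0; i < urls.length; i += chunkSize) {
    chunks.push(urls.slice(i, i + chunkSize));
  }
  return chunks;
}

// 650 placeholder URLs are split into batches of 300, 300, and 50.
const exampleUrls: string[] = [];
for (let i = 0; i < 650; i += 1) {
  exampleUrls.push(`https://example.com/page-${i}`);
}
const batches = chunkUrls(exampleUrls);
console.log(batches.map((batch) => batch.length)); // [ 300, 300, 50 ]
```

A smaller chunk size keeps fewer requests in flight per batch, at the cost of more batches.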

`Optional` headers

headers?: HeadersInit

The headers to use in the fetch request.

`Optional` selector

selector?: SelectorType

The selector to use to extract the text from the document. Defaults to "body".

`Optional` textDecoder

textDecoder?: TextDecoder

The text decoder to use to decode the response. Defaults to UTF-8.
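
To show how a custom decoder changes the result, the sketch below decodes raw response bytes with a supplied `TextDecoder`, falling back to UTF-8. The function name `decodeBody` is hypothetical and only mirrors the default described above; it is not taken from the library.

```typescript
// Hypothetical helper: decodes raw bytes with the given decoder, UTF-8 by default.
function decodeBody(
  bytes: Uint8Array,
  textDecoder: TextDecoder = new TextDecoder("utf-8"),
): string {
  return textDecoder.decode(bytes);
}

// UTF-8 bytes decode correctly with the default decoder.
const utf8Bytes = new TextEncoder().encode("café");
console.log(decodeBody(utf8Bytes)); // café

// The bytes 0x68 0x00 0x69 0x00 are "hi" in UTF-16LE and need a matching decoder.
const utf16Bytes = new Uint8Array([0x68, 0x00, 0x69, 0x00]);
console.log(decodeBody(utf16Bytes, new TextDecoder("utf-16le"))); // hi
```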

timeout

timeout: number

The timeout in milliseconds for the fetch request. Defaults to 10000 (10 seconds).
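
One common way to enforce a millisecond timeout on a fetch-style request is an `AbortController`, sketched below. This is an assumption about how such a timeout could be applied, not a description of the library's internals; `fetchWithTimeout` and the `FetchLike` type are hypothetical names, and the fetch function is injected so the sketch can run without a network.

```typescript
// Minimal fetch-like signature so the sketch can be exercised with a stub.
type FetchLike<T> = (url: string, init: { signal: AbortSignal }) => Promise<T>;

// Hypothetical helper: aborts the request if it has not settled within `timeoutMs`.
async function fetchWithTimeout<T>(
  url: string,
  timeoutMs: number,
  fetchImpl: FetchLike<T>,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { signal: controller.signal });
  } finally {
    // Always clear the timer so it cannot fire after the request has settled.
    clearTimeout(timer);
  }
}

// With a real runtime `fetch`, a 10-second limit would look like:
// const response = await fetchWithTimeout("https://example.com/sitemap.xml", 10000, fetch);
```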

webPath

webPath: string

Methods

load

load(): Promise<Document<Record<string, any>>[]>
Extracts the text content from the loaded document using the selector and creates a Document instance with the extracted text and metadata. It returns an array of Document instances.

Returns Promise<Document<Record<string, any>>[]>
A Promise that resolves to an array of Document instances.
Overrides CheerioWebBaseLoader.load
- Defined in libs/langchain-community/src/document_loaders/web/sitemap.ts:142

loadAndSplit

loadAndSplit(splitter?): Promise<Document<Record<string, any>>[]>
Loads the documents and splits them using a specified text splitter.
Parameters
- `Optional` splitter: BaseDocumentTransformer<DocumentInterface<Record<string, any>>[], DocumentInterface<Record<string, any>>[]>
Returns Promise<Document<Record<string, any>>[]>
A Promise that resolves with an array of Document instances, each split according to the provided TextSplitter.

Deprecated
Use this.load() and splitter.splitDocuments() individually.
Inherited from CheerioWebBaseLoader.loadAndSplit
- Defined in langchain-core/dist/document_loaders/base.d.ts:27
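
The replacement recommended by the deprecation note, calling `load()` and then `splitDocuments()`, can be sketched as follows. The `Doc`, `LoaderLike`, and `SplitterLike` interfaces are minimal stand-ins written for this example (not the library's types), and `paragraphSplitter` is a toy splitter used only to make the sketch runnable.

```typescript
// Minimal structural stand-ins for the library's Document, loader, and splitter.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}
interface LoaderLike {
  load(): Promise<Doc[]>;
}
interface SplitterLike {
  splitDocuments(docs: Doc[]): Promise<Doc[]>;
}

// Load first, then split, as two separate steps instead of loadAndSplit().
async function loadThenSplit(loader: LoaderLike, splitter: SplitterLike): Promise<Doc[]> {
  const docs = await loader.load();
  return splitter.splitDocuments(docs);
}

// Toy splitter for illustration only: one output document per paragraph.
const paragraphSplitter: SplitterLike = {
  async splitDocuments(docs: Doc[]): Promise<Doc[]> {
    const out: Doc[] = [];
    for (const doc of docs) {
      for (const part of doc.pageContent.split(/\n\s*\n/)) {
        const text = part.trim();
        if (text !== "") out.push({ pageContent: text, metadata: doc.metadata });
      }
    }
    return out;
  },
};
```

Keeping the two steps separate also makes it possible to inspect or filter the loaded documents before splitting them.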

parseSitemap

parseSitemap(): Promise<SiteMapElement[]>
Returns Promise<SiteMapElement[]>
- Defined in libs/langchain-community/src/document_loaders/web/sitemap.ts:61
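
For orientation, the sketch below pulls `<loc>` and `<lastmod>` values out of a standard sitemap document. It is a simplified, regex-based illustration rather than the library's parser, the `SitemapEntry` shape is an assumption (the fields of `SiteMapElement` are not listed in this reference), and it does not handle sitemap index files.

```typescript
// Assumed entry shape for this example only.
interface SitemapEntry {
  loc: string;
  lastmod?: string;
}

// Hypothetical helper: extracts one entry per <url> block that contains a <loc>.
function extractSitemapEntries(xml: string): SitemapEntry[] {
  const entries: SitemapEntry[] = [];
  const urlBlock = /<url>([\s\S]*?)<\/url>/g;
  let match: RegExpExecArray | null = urlBlock.exec(xml);
  while (match !== null) {
    const body = match[1];
    const locMatch = /<loc>\s*([\s\S]*?)\s*<\/loc>/.exec(body);
    if (locMatch !== null && locMatch[1] !== "") {
      const lastmodMatch = /<lastmod>\s*([\s\S]*?)\s*<\/lastmod>/.exec(body);
      entries.push(
        lastmodMatch !== null
          ? { loc: locMatch[1], lastmod: lastmodMatch[1] }
          : { loc: locMatch[1] },
      );
    }
    match = urlBlock.exec(xml);
  }
  return entries;
}

const sampleSitemap = `
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc><lastmod>2024-01-01</lastmod></url>
  <url><loc>https://example.com/b</loc></url>
</urlset>`;
console.log(extractSitemapEntries(sampleSitemap).map((entry) => entry.loc));
// [ 'https://example.com/a', 'https://example.com/b' ]
```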

scrape

scrape(): Promise<CheerioAPI>
Fetches the web document from the webPath and loads it using Cheerio. It returns a CheerioAPI instance.

Returns Promise<CheerioAPI>
A Promise that resolves to a CheerioAPI instance.
Inherited from CheerioWebBaseLoader.scrape
- Defined in libs/langchain-community/src/document_loaders/web/cheerio.ts:125

`Static` imports

imports(): Promise<{
    load: (content: string | Buffer | AnyNode | AnyNode[], options?: null | CheerioOptions, isDocument?: boolean) => CheerioAPI;
}>
A static method that dynamically imports the Cheerio library and returns the load function. If the import fails, it throws an error.

Returns Promise<{
    load: (content: string | Buffer | AnyNode | AnyNode[], options?: null | CheerioOptions, isDocument?: boolean) => CheerioAPI;
}>
A Promise that resolves to an object containing the load function from the Cheerio library.
Inherited from CheerioWebBaseLoader.imports
- Defined in libs/langchain-community/src/document_loaders/web/cheerio.ts:154

`Static` scrapeAll

scrapeAll(urls, caller, timeout, textDecoder?, options?): Promise<CheerioAPI[]>
Fetches web documents from the given array of URLs and loads them using Cheerio. It returns an array of CheerioAPI instances.
Parameters
- urls: string[]
  An array of URLs to fetch and load.
- caller: AsyncCaller
- timeout: undefined | number
- `Optional` textDecoder: TextDecoder
- `Optional` options: CheerioOptions & { headers?: HeadersInit }
Returns Promise<CheerioAPI[]>
A Promise that resolves to an array of CheerioAPI instances.
Inherited from CheerioWebBaseLoader.scrapeAll
- Defined in libs/langchain-community/src/document_loaders/web/cheerio.ts:83
