Class SitemapLoader

Deprecated

Import from "@langchain/community/document_loaders/web/sitemap" instead. This entrypoint will be removed in 0.3.0.

Hierarchy

CheerioWebBaseLoader
- SitemapLoader

Implements

SitemapLoaderParams

Constructors

constructor

new SitemapLoader(webPath, params?): SitemapLoader
Parameters
- webPath: string
- params: SitemapLoaderParams = {}
Returns SitemapLoader
Overrides CheerioWebBaseLoader.constructor
- Defined in langchain/src/document_loaders/web/sitemap.ts:51

Properties

allowUrlPatterns

allowUrlPatterns: undefined | (string | RegExp)[]
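
The reference gives only the type of `allowUrlPatterns`, not its matching rules. As a minimal sketch of one plausible reading (an allow-list in which a URL is kept when any pattern matches, and strings are compiled as regular expressions), the helper below filters a list of made-up URLs. The name `isAllowedUrl`, the sample data, and the matching semantics are all assumptions; the loader's actual behavior may differ.

```typescript
type UrlPattern = string | RegExp;

// Assumption: a URL is kept when at least one pattern matches it,
// and every URL is kept when no patterns are supplied.
function isAllowedUrl(url: string, patterns?: UrlPattern[]): boolean {
  if (!patterns || patterns.length === 0) {
    return true;
  }
  // new RegExp() accepts both a pattern string and an existing RegExp.
  return patterns.some((pattern) => new RegExp(pattern).test(url));
}

// Made-up URLs and patterns, for illustration only.
const candidateUrls = [
  "https://example.com/docs/intro",
  "https://example.com/blog/post",
  "https://example.com/api/v1",
];
const samplePatterns: UrlPattern[] = ["/docs/", /\/api\/v\d+$/];

const allowedUrls = candidateUrls.filter((url) => isAllowedUrl(url, samplePatterns));
console.log(allowedUrls);
```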

caller

caller: AsyncCaller

chunkSize

chunkSize: number

The size to chunk the sitemap URLs into for scraping.

Default: 300
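
To illustrate what chunking the sitemap URLs means, the sketch below splits a URL list into batches of at most `chunkSize` entries. The helper name `chunkUrls` and the sample URLs are hypothetical; this is not the loader's internal code, only one way batching of this kind could look.

```typescript
// Hypothetical helper: split a list of URLs into batches of at most `chunkSize`.
// It illustrates the idea behind `chunkSize`; it is not the loader's own code.
function chunkUrls(urls: string[], chunkSize = 300): string[][] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const batches: string[][] = [];
  for (let start = 0; start < urls.length; start += chunkSize) {
    batches.push(urls.slice(start, start + chunkSize));
  }
  return batches;
}

// 650 made-up URLs, batched with the default size of 300.
const sampleUrls = Array.from(
  { length: 650 },
  (_, i) => `https://example.com/page-${i}`
);
const urlBatches = chunkUrls(sampleUrls);
console.log(urlBatches.map((batch) => batch.length));
```

A larger chunk size means fewer, bigger batches; a smaller one keeps each round of scraping lighter at the cost of more rounds.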

`Optional` selector

selector?: SelectorType

The selector to use to extract the text from the document. Defaults to "body".

`Optional` textDecoder

textDecoder?: TextDecoder

The text decoder to use to decode the response. Defaults to UTF-8.
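
For context, `TextDecoder` is the standard Web API class for turning raw bytes into strings. The sketch below uses made-up byte arrays to show why the decoder has to match the encoding of the response; it does not involve the loader itself, and the variable names are illustrative only.

```typescript
// Bytes for "café" encoded as UTF-8 ("é" is the two bytes 0xC3 0xA9).
const utf8Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]);
// The same text encoded as UTF-16LE (two bytes per character here).
const utf16Bytes = new Uint8Array([0x63, 0x00, 0x61, 0x00, 0x66, 0x00, 0xe9, 0x00]);

// With no argument, TextDecoder decodes UTF-8.
const utf8Decoder = new TextDecoder();
const utf16Decoder = new TextDecoder("utf-16le");

// Matching decoder and encoding: both calls recover the original text.
const fromUtf8 = utf8Decoder.decode(utf8Bytes);
const fromUtf16 = utf16Decoder.decode(utf16Bytes);

// Mismatched decoder: the UTF-16LE bytes are not valid UTF-8 text for "café",
// so the result contains NUL and replacement characters instead.
const mismatched = utf8Decoder.decode(utf16Bytes);

console.log(fromUtf8 === fromUtf16, mismatched === fromUtf8);
```

If a site serves pages in a non-UTF-8 encoding, passing a decoder constructed for that encoding is presumably what this option is for.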

timeout

timeout: number

The timeout in milliseconds for the fetch request. Defaults to 10000 (10 seconds).

webPath

webPath: string

Methods

load

load(): Promise<Document<Record<string, any>>[]>
Extracts the text content from the loaded document using the selector and creates a Document instance with the extracted text and metadata. It returns an array of Document instances.

Returns Promise<Document<Record<string, any>>[]>
A Promise that resolves to an array of Document instances.
Overrides CheerioWebBaseLoader.load
- Defined in langchain/src/document_loaders/web/sitemap.ts:153

loadAndSplit

loadAndSplit(splitter?): Promise<Document<Record<string, any>>[]>
Parameters
- `Optional` splitter: BaseDocumentTransformer<DocumentInterface<Record<string, any>>[], DocumentInterface<Record<string, any>>[]>
Returns Promise<Document<Record<string, any>>[]>
A Promise that resolves with an array of Document instances, each split according to the provided TextSplitter.

Deprecated
Use this.load() and splitter.splitDocuments() individually.

Loads the documents and splits them using a specified text splitter.
Inherited from CheerioWebBaseLoader.loadAndSplit
- Defined in langchain-core/dist/document_loaders/base.d.ts:27

parseSitemap

parseSitemap(): Promise<SiteMapElement[]>
Returns Promise<SiteMapElement[]>
- Defined in langchain/src/document_loaders/web/sitemap.ts:72
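
The reference does not describe what `parseSitemap` extracts. As a rough, standalone illustration of turning sitemap XML into a list of entries, the sketch below reads the `<loc>`, `<lastmod>`, `<changefreq>`, and `<priority>` children defined by the sitemap protocol. The `SitemapEntry` type, the function names, and the regex-based approach are assumptions made for this example; the real `SiteMapElement` type and the loader's parsing logic may differ.

```typescript
// Hypothetical shape for one sitemap entry; the real SiteMapElement may differ.
interface SitemapEntry {
  loc: string;
  lastmod?: string;
  changefreq?: string;
  priority?: string;
}

// Return the trimmed text of the first <tag>...</tag> inside `xml`, if any.
function tagText(xml: string, tag: string): string | undefined {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(xml);
  return match ? match[1].trim() : undefined;
}

// Naive regex-based parser, adequate for a small well-formed sample.
// A production parser should use a real XML or HTML parser instead.
function parseSitemapXml(xml: string): SitemapEntry[] {
  const entries: SitemapEntry[] = [];
  const urlBlock = /<url>([\s\S]*?)<\/url>/g;
  let match: RegExpExecArray | null;
  while ((match = urlBlock.exec(xml)) !== null) {
    const body = match[1];
    const loc = tagText(body, "loc");
    if (!loc) {
      continue; // <loc> is required by the sitemap protocol; skip entries without it
    }
    entries.push({
      loc,
      lastmod: tagText(body, "lastmod"),
      changefreq: tagText(body, "changefreq"),
      priority: tagText(body, "priority"),
    });
  }
  return entries;
}

// Made-up sitemap document, for illustration only.
const sampleXml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-01</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/docs</loc>
  </url>
</urlset>`;

const sitemapEntries = parseSitemapXml(sampleXml);
console.log(sitemapEntries.map((entry) => entry.loc));
```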

scrape

scrape(): Promise<CheerioAPI>
Fetches the web document from the webPath and loads it using Cheerio. It returns a CheerioAPI instance.

Returns Promise<CheerioAPI>
A Promise that resolves to a CheerioAPI instance.
Inherited from CheerioWebBaseLoader.scrape
- Defined in langchain/src/document_loaders/web/cheerio.ts:122

`Static` imports

imports(): Promise<{
    load: (content: string | Buffer | AnyNode | AnyNode[], options?: null | CheerioOptions, isDocument?: boolean) => CheerioAPI;
}>
A static method that dynamically imports the Cheerio library and returns the load function. If the import fails, it throws an error.

Returns Promise<{
    load: (content: string | Buffer | AnyNode | AnyNode[], options?: null | CheerioOptions, isDocument?: boolean) => CheerioAPI;
}>
A Promise that resolves to an object containing the load function from the Cheerio library.
Inherited from CheerioWebBaseLoader.imports
- Defined in langchain/src/document_loaders/web/cheerio.ts:149

`Static` scrapeAll

scrapeAll(urls, caller, timeout, textDecoder?, options?): Promise<CheerioAPI[]>
Fetches web documents from the given array of URLs and loads them using Cheerio. It returns an array of CheerioAPI instances.
Parameters
- urls: string[]
  An array of URLs to fetch and load.
- caller: AsyncCaller
- timeout: undefined | number
- `Optional` textDecoder: TextDecoder
- `Optional` options: CheerioOptions
Returns Promise<CheerioAPI[]>
A Promise that resolves to an array of CheerioAPI instances.
Inherited from CheerioWebBaseLoader.scrapeAll
- Defined in langchain/src/document_loaders/web/cheerio.ts:86
