Class PuppeteerWebBaseLoader

Deprecated

Import from "@langchain/community/document_loaders/web/puppeteer" instead. This entrypoint will be removed in 0.3.0.

Class that extends the BaseDocumentLoader class and implements the DocumentLoader interface. It represents a document loader for scraping web pages using Puppeteer.

Example

import { PuppeteerWebBaseLoader } from "@langchain/community/document_loaders/web/puppeteer";

const loader = new PuppeteerWebBaseLoader("https://exampleurl.com", {
  launchOptions: {
    headless: true,
  },
  gotoOptions: {
    waitUntil: "domcontentloaded",
  },
});
const screenshot = await loader.screenshot();

Hierarchy

BaseDocumentLoader
- PuppeteerWebBaseLoader

Implements

DocumentLoader

Constructors

constructor

new PuppeteerWebBaseLoader(webPath, options?): PuppeteerWebBaseLoader
Parameters
- webPath: string
- Optional options: PuppeteerWebBaseLoaderOptions
Returns PuppeteerWebBaseLoader
Overrides BaseDocumentLoader.constructor
- Defined in langchain/src/document_loaders/web/puppeteer.ts:67

Properties

options

options: undefined | PuppeteerWebBaseLoaderOptions

webPath

webPath: string

Methods

load

load(): Promise<Document<Record<string, any>>[]>
Method that calls the scrape method and returns the scraped HTML content as an array of Document objects.

Returns Promise<Document<Record<string, any>>[]>
Promise that resolves to an array of Document objects.
Implementation of DocumentLoader.load
Overrides BaseDocumentLoader.load
- Defined in langchain/src/document_loaders/web/puppeteer.ts:114
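
The relationship between scrape() and load() can be pictured with a small, dependency-free sketch. It is not the library's source: the `Doc` and `Scraper` types and the `loadFromScrape` helper are hypothetical stand-ins, and recording the URL under `metadata.source` is an assumption.

```typescript
// Hypothetical stand-ins for illustration; these are not LangChain's types.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface Scraper {
  webPath: string;
  scrape(): Promise<string>;
}

// Wrap the scraped HTML in a one-element array, mirroring the
// Promise<Document[]> shape that load() returns.
async function loadFromScrape(scraper: Scraper): Promise<Doc[]> {
  const html = await scraper.scrape();
  // Assumption: the page URL is recorded under metadata.source.
  return [{ pageContent: html, metadata: { source: scraper.webPath } }];
}
```

Any object with a `webPath` string and a `scrape()` method that resolves to a string satisfies `Scraper`, so the helper can be exercised without launching a browser.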

loadAndSplit

loadAndSplit(splitter?): Promise<Document<Record<string, any>>[]>
Parameters
- Optional splitter: BaseDocumentTransformer<DocumentInterface<Record<string, any>>[], DocumentInterface<Record<string, any>>[]>
Returns Promise<Document<Record<string, any>>[]>
A Promise that resolves with an array of Document instances, each split according to the provided TextSplitter.

Deprecated
Use this.load() and splitter.splitDocuments() individually.
Loads the documents and splits them using the specified text splitter.
Implementation of DocumentLoader.loadAndSplit
Inherited from BaseDocumentLoader.loadAndSplit
- Defined in langchain-core/dist/document_loaders/base.d.ts:27
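
The replacement suggested by the deprecation note, calling load() and then splitter.splitDocuments(), might look like the sketch below. To keep it runnable without installing anything, it uses minimal structural types; `LoaderLike`, `SplitterLike`, `loadThenSplit`, and `fixedSizeSplitter` are hypothetical names, and the fixed-size splitter is only a toy stand-in for a real text splitter.

```typescript
// Minimal structural types so the sketch runs without LangChain installed.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

interface LoaderLike {
  load(): Promise<Doc[]>;
}

interface SplitterLike {
  splitDocuments(docs: Doc[]): Promise<Doc[]>;
}

// Two explicit steps in place of the deprecated loadAndSplit():
// load the documents first, then hand them to the splitter.
async function loadThenSplit(loader: LoaderLike, splitter: SplitterLike): Promise<Doc[]> {
  const docs = await loader.load();
  return splitter.splitDocuments(docs);
}

// Hypothetical toy splitter: cuts each document into chunks of `size` characters.
function fixedSizeSplitter(size: number): SplitterLike {
  return {
    async splitDocuments(docs: Doc[]): Promise<Doc[]> {
      const chunks: Doc[] = [];
      for (const doc of docs) {
        for (let start = 0; start < doc.pageContent.length; start += size) {
          chunks.push({
            pageContent: doc.pageContent.slice(start, start + size),
            metadata: doc.metadata,
          });
        }
      }
      return chunks;
    },
  };
}
```

In real code, a PuppeteerWebBaseLoader instance would presumably take the place of the `LoaderLike` argument and a LangChain text splitter the place of `fixedSizeSplitter`.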

scrape

scrape(): Promise<string>
Method that calls the _scrape method to perform the scraping of the web page specified by the webPath property.

Returns Promise<string>
Promise that resolves to the scraped HTML content of the web page.
- Defined in langchain/src/document_loaders/web/puppeteer.ts:105

screenshot

screenshot(): Promise<Document<Record<string, any>>>
Screenshot a web page and return it as a Document object where the pageContent property is the screenshot encoded in base64.

Returns Promise<Document<Record<string, any>>>
A document object containing the screenshot of the page encoded in base64.
- Defined in langchain/src/document_loaders/web/puppeteer.ts:161
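
Because pageContent holds base64 text rather than raw image bytes, it usually needs decoding before it can be written to disk or inspected. The following is a minimal sketch; `decodeBase64` and `looksLikePng` are hypothetical helper names, and the PNG check assumes the loader keeps Puppeteer's default screenshot format.

```typescript
// Decode a base64 string (such as the pageContent of the Document returned
// by screenshot()) into raw bytes.
function decodeBase64(base64: string): Uint8Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i += 1) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// The eight-byte signature that begins every PNG file.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Check whether the decoded bytes start with the PNG signature.
function looksLikePng(bytes: Uint8Array): boolean {
  return PNG_SIGNATURE.every((byte, i) => bytes[i] === byte);
}
```

The decoded bytes could then be passed to a file-writing call such as Node's `fs.writeFile` to save the image.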

`Static` imports

imports(): Promise<{
launch: ((options?: PuppeteerLaunchOptions) => Promise<Browser>);
}>
Static method that imports the necessary Puppeteer modules. It returns a Promise that resolves to an object containing the imported modules.

Returns Promise<{
launch: ((options?: PuppeteerLaunchOptions) => Promise<Browser>);
}>
Promise that resolves to an object containing the imported Puppeteer modules.
- Defined in langchain/src/document_loaders/web/puppeteer.ts:170
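
A static method of this kind is commonly built on a dynamic import, so the dependency is loaded only when it is needed. The sketch below shows that general pattern and is not the library's implementation; `lazyImport` is a hypothetical helper, and the error message is an assumption.

```typescript
// Generic lazy-loading helper. `importer` is expected to wrap a dynamic
// import, for example () => import("puppeteer").
async function lazyImport<T>(importer: () => Promise<T>, packageName: string): Promise<T> {
  try {
    return await importer();
  } catch (error) {
    // Assumption: replace the low-level failure with a friendlier message.
    throw new Error(`Could not load "${packageName}". Is it installed?`);
  }
}
```

With Puppeteer installed, `lazyImport(() => import("puppeteer"), "puppeteer")` would be expected to resolve to a module that exposes `launch`.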
