• Deprecated: Import from "@langchain/community/document_loaders/web/pdf" instead. This entrypoint will be removed in 0.3.0.

A document loader for loading data from PDFs.

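import { WebPDFLoader } from "@langchain/community/document_loaders/web/pdf";

// Pass a Blob holding the PDF bytes; the empty Blob here is only a placeholder.
const loader = new WebPDFLoader(new Blob());
const docs = await loader.load();
console.log({ docs });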
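
Because the constructor takes a Blob, raw PDF bytes need to be wrapped in one first. The following is a minimal sketch of that step; `toPdfBlob` is a hypothetical helper (not part of LangChain), and the five bytes used here are only the "%PDF-" header, not a complete, parseable PDF.

```typescript
// Hypothetical helper: wrap raw bytes in a Blob tagged with the PDF MIME type.
function toPdfBlob(bytes: ArrayBuffer): Blob {
  return new Blob([bytes], { type: "application/pdf" });
}

// Illustrative input only: the ASCII bytes of "%PDF-".
const buffer = new ArrayBuffer(5);
new Uint8Array(buffer).set([0x25, 0x50, 0x44, 0x46, 0x2d]);

const pdfBlob = toPdfBlob(buffer);
console.log(pdfBlob.size, pdfBlob.type);
```

In a browser, or any runtime with `fetch`, `await (await fetch(url)).blob()` yields a Blob directly, so a helper like this is only needed when starting from bytes.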

Constructors

  • new WebPDFLoader(blob, __namedParameters?): WebPDFLoader

    Parameters

    • blob: Blob
    • __namedParameters: {
          parsedItemSeparator: undefined | string;
          pdfjs: undefined | (() => Promise<{
              getDocument: ((src:
                  | string
                  | URL
                  | ArrayBuffer
                  | TypedArray
                  | DocumentInitParameters) => PDFDocumentLoadingTask);
              version: string;
          }>);
          splitPages: undefined | boolean;
      } = {}

    Returns WebPDFLoader
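
The second constructor argument is easier to read as a named options type. The sketch below is a simplified, hypothetical view of it: `WebPDFLoaderOptionsSketch` and `resolveOptions` are illustrative names, the `pdfjs` factory's return type is reduced to `unknown`, and the empty-string default for `parsedItemSeparator` is an assumption. The `splitPages` default of `true` is taken from the Properties listing.

```typescript
// Hypothetical, simplified shape of the constructor's options object.
// The real `pdfjs` factory resolves to a pdf.js module exposing
// `getDocument` and `version`; it is reduced to `unknown` here.
interface WebPDFLoaderOptionsSketch {
  splitPages?: boolean;
  parsedItemSeparator?: string;
  pdfjs?: () => Promise<unknown>;
}

// Fill in defaults for omitted fields.
// `splitPages = true` follows the Properties listing;
// `parsedItemSeparator = ""` is an assumption.
function resolveOptions(options: WebPDFLoaderOptionsSketch = {}) {
  const { splitPages = true, parsedItemSeparator = "", pdfjs } = options;
  return { splitPages, parsedItemSeparator, pdfjs };
}

console.log(resolveOptions());
console.log(resolveOptions({ splitPages: false, parsedItemSeparator: " " }));
```

Since every field is optional and the whole object defaults to `{}`, the loader can be constructed with the blob alone.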

Properties

blob: Blob
parsedItemSeparator: string
splitPages: boolean = true

Methods

  • loadAndSplit(splitter?): Promise<Document<Record<string, any>>[]>

    Parameters

    • Optional splitter: BaseDocumentTransformer<DocumentInterface<Record<string, any>>[], DocumentInterface<Record<string, any>>[]>

    Returns Promise<Document<Record<string, any>>[]>

    A Promise that resolves with an array of Document instances, each split according to the provided TextSplitter.

    Loads the documents and splits them using a specified text splitter.

    Deprecated: Use this.load() and splitter.splitDocuments() individually.
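
The recommended replacement, calling load() and then splitter.splitDocuments(), can be sketched as below. To keep the example self-contained it uses hypothetical structural stand-ins (`Doc`, `LoaderLike`, `SplitterLike`) and stub objects rather than the real LangChain classes; with the library, a WebPDFLoader instance and a text splitter would take their place.

```typescript
// Hypothetical minimal stand-ins for the LangChain types involved.
interface Doc {
  pageContent: string;
  metadata: Record<string, unknown>;
}
interface LoaderLike {
  load(): Promise<Doc[]>;
}
interface SplitterLike {
  splitDocuments(docs: Doc[]): Promise<Doc[]>;
}

// Load first, then split: the two steps the deprecated method combined.
async function loadThenSplit(
  loader: LoaderLike,
  splitter: SplitterLike
): Promise<Doc[]> {
  const docs = await loader.load();
  return splitter.splitDocuments(docs);
}

// Stub loader and splitter, for illustration only.
const stubLoader: LoaderLike = {
  load: async () => [
    { pageContent: "page one. page two.", metadata: { source: "blob" } },
  ],
};
const sentenceSplitter: SplitterLike = {
  // Splits each document on ". " and copies the metadata onto every piece.
  splitDocuments: async (docs) =>
    docs.flatMap((d) =>
      d.pageContent
        .split(". ")
        .map((part) => ({ pageContent: part, metadata: d.metadata }))
    ),
};

loadThenSplit(stubLoader, sentenceSplitter).then((chunks) =>
  console.log(chunks.map((c) => c.pageContent))
);
```

Keeping the two calls separate also leaves the unsplit documents available, for example to inspect per-page metadata before chunking.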