UnstructuredLoader
Only available on Node.js.
This notebook provides a quick overview for getting started with the UnstructuredLoader document loader. For detailed documentation of all UnstructuredLoader features and configurations, head to the API reference.
Overview
Integration details
Class | Package | Compatibility | Local | Python support |
---|---|---|---|---|
UnstructuredLoader | @langchain/community | Node-only | ✅ | ✅ |
Setup
To access the UnstructuredLoader document loader, you’ll need to install the @langchain/community integration package, create an Unstructured account, and get an API key.
Local
You can run Unstructured locally on your computer using Docker. To do so, you need to have Docker installed; installation instructions are available in the Docker documentation. Start the API container with:
docker run -p 8000:8000 -d --rm --name unstructured-api downloads.unstructured.io/unstructured-io/unstructured-api:latest --port 8000 --host 0.0.0.0
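To use the container started above, the loader has to be pointed at it instead of the hosted service. The following is a minimal sketch of building such options. It assumes the loader accepts an `apiUrl` option and that the container serves requests at the `/general/v0/general` path; both should be checked against the API reference. The helper name `localUnstructuredOptions` is hypothetical.

```typescript
// Minimal options shape for this sketch. `apiUrl` and `apiKey` are assumed
// option names; confirm them in the API reference before relying on them.
interface LocalLoaderOptions {
  apiUrl: string;
  apiKey?: string;
}

// Hypothetical helper: builds options that target a local Unstructured
// container. The "/general/v0/general" route is an assumption.
function localUnstructuredOptions(
  host: string = "localhost",
  port: number = 8000
): LocalLoaderOptions {
  return { apiUrl: `http://${host}:${port}/general/v0/general` };
}

const localOptions = localUnstructuredOptions();
console.log(localOptions.apiUrl);

// Possible usage, assuming the options are accepted as the second argument:
// const loader = new UnstructuredLoader("path/to/file.md", localOptions);
```

The port here matches the `-p 8000:8000` mapping in the Docker command; if you map a different host port, pass it to the helper.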
Credentials
Head to unstructured.io to sign up for Unstructured and generate an API key. Once you’ve done this, set the UNSTRUCTURED_API_KEY environment variable:
export UNSTRUCTURED_API_KEY="your-api-key"
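A missing key tends to surface only when the first request fails, so it can help to check the variable at startup. This is a small sketch of such a check; the helper name `requireApiKey` is hypothetical, and it takes the environment as a plain object so it can be exercised without touching the real process environment.

```typescript
// Hypothetical helper: returns the Unstructured API key from an
// environment-like object, or throws a descriptive error if it is missing.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env.UNSTRUCTURED_API_KEY?.trim();
  if (!key) {
    throw new Error("UNSTRUCTURED_API_KEY is not set");
  }
  return key;
}

// Demonstration with a stand-in environment object.
const demoKey = requireApiKey({ UNSTRUCTURED_API_KEY: "your-api-key" });
console.log(demoKey.length > 0);

// In an application you would typically call: requireApiKey(process.env)
```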
Installation
The LangChain UnstructuredLoader integration lives in the @langchain/community package:
npm i @langchain/community
yarn add @langchain/community
pnpm add @langchain/community
Instantiation
Now we can instantiate our loader object and load documents:
import { UnstructuredLoader } from "@langchain/community/document_loaders/fs/unstructured";
const loader = new UnstructuredLoader(
"../../../../../../examples/src/document_loaders/example_data/notion.md"
);
Load
const docs = await loader.load();
docs[0];
Document {
pageContent: '# Testing the notion markdownloader',
metadata: {
filename: 'notion.md',
languages: [ 'eng' ],
filetype: 'text/plain',
category: 'NarrativeText'
},
id: undefined
}
console.log(docs[0].metadata);
{
filename: 'notion.md',
languages: [ 'eng' ],
filetype: 'text/plain',
category: 'NarrativeText'
}
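Each loaded document carries a `category` field in its metadata, as the output above shows, which makes it straightforward to keep only certain element types. The sketch below filters on that field. `LoadedDoc` is a local stand-in type for illustration (not the library's `Document` class), `filterByCategory` is a hypothetical helper, and the sample data only mirrors the outputs shown on this page.

```typescript
// Local stand-in for a loaded document; only the fields used here.
interface LoadedDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: keep only documents whose metadata.category matches.
function filterByCategory(docs: LoadedDoc[], category: string): LoadedDoc[] {
  return docs.filter((doc) => doc.metadata.category === category);
}

// Sample data shaped like the loader output shown on this page.
const sampleDocs: LoadedDoc[] = [
  {
    pageContent: "# Testing the notion markdownloader",
    metadata: { filename: "notion.md", category: "NarrativeText" },
  },
  {
    pageContent: "Bitcoin: A Peer-to-Peer Electronic Cash System",
    metadata: { filename: "bitcoin.pdf", category: "Title" },
  },
];

const narrativeDocs = filterByCategory(sampleDocs, "NarrativeText");
console.log(narrativeDocs.map((doc) => doc.pageContent));
```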
Directories
You can also load all of the files in a directory using UnstructuredDirectoryLoader, which inherits from DirectoryLoader:
import { UnstructuredDirectoryLoader } from "@langchain/community/document_loaders/fs/unstructured";
const directoryLoader = new UnstructuredDirectoryLoader(
"../../../../../../examples/src/document_loaders/example_data/",
{}
);
const directoryDocs = await directoryLoader.load();
console.log("directoryDocs.length: ", directoryDocs.length);
console.log(directoryDocs[0]);
Unknown file type: Star_Wars_The_Clone_Wars_S06E07_Crisis_at_the_Heart.srt
Unknown file type: test.mp3
directoryDocs.length: 247
Document {
pageContent: 'Bitcoin: A Peer-to-Peer Electronic Cash System',
metadata: {
filetype: 'application/pdf',
languages: [ 'eng' ],
page_number: 1,
filename: 'bitcoin.pdf',
category: 'Title'
},
id: undefined
}
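Because a directory load returns one flat list covering every file, a quick tally per source file can help confirm which files were parsed. The following sketch groups on `metadata.filename`, which appears in the outputs above. `DirDoc` is a local stand-in type, `countByFilename` is a hypothetical helper, and the sample entries are illustrative rather than real loader output.

```typescript
// Local stand-in for a loaded document; only the fields used here.
interface DirDoc {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: count documents per source file. Documents without
// a string filename are grouped under "(unknown)".
function countByFilename(docs: DirDoc[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const doc of docs) {
    const raw = doc.metadata.filename;
    const name = typeof raw === "string" ? raw : "(unknown)";
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  return counts;
}

// Illustrative sample data.
const dirSample: DirDoc[] = [
  { pageContent: "Bitcoin: A Peer-to-Peer Electronic Cash System", metadata: { filename: "bitcoin.pdf" } },
  { pageContent: "Abstract", metadata: { filename: "bitcoin.pdf" } },
  { pageContent: "# Testing the notion markdownloader", metadata: { filename: "notion.md" } },
  { pageContent: "orphan element", metadata: {} },
];

const fileCounts = countByFilename(dirSample);
for (const [name, count] of fileCounts) {
  console.log(`${name}: ${count}`);
}
```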
API reference
For detailed documentation of all UnstructuredLoader features and configurations head to the API reference: https://api.js.langchain.com/classes/langchain_community_document_loaders_fs_unstructured.UnstructuredLoader.html