Importing WordPress Articles into Wildix Knowledge Base

7 min read
Vladimir Gorobets
Software Engineer

If your team publishes documentation, FAQs, or support articles in WordPress, you can make that content instantly searchable through Wildix AI features by importing it into a Wildix Knowledge Base. This guide walks through a simple script that fetches all published posts from a WordPress site and loads them into a Knowledge Base in one go.

How It Works

WordPress exposes its content through a built-in REST API at /wp-json/wp/v2/posts. The script:

  1. Fetches all published posts page by page from the WordPress REST API.
  2. Creates a data source and a knowledge base to hold the content.
  3. Uploads each post as an HTML document.

Prerequisites

  • Node.js 18+ (uses the built-in fetch API)
  • A valid Wildix API key with knowledge base permissions (kb:*)
  • A WordPress site with the REST API enabled (enabled by default on all modern WordPress installations)
  • The @wildix/wim-knowledge-base-client package

Installation

npm install @wildix/wim-knowledge-base-client

Key Concepts

  • Knowledge Base: A collection of data sources that can be searched together.
  • Data Source: A logical grouping of documents — in this case, all posts from one WordPress site.
  • Document: A single indexed piece of content. Each WordPress post becomes one document.

The hierarchy is: Knowledge Base → Data Sources → Documents

Setting Up the Client

import { KnowledgeBaseClient } from '@wildix/wim-knowledge-base-client';

const API_KEY = process.env.WILDIX_API_KEY!;
const WP_BASE_URL = process.env.WP_BASE_URL!; // e.g. https://yourblog.example.com

let _client: KnowledgeBaseClient | undefined;

function getClient(): KnowledgeBaseClient {
  if (!_client) {
    _client = new KnowledgeBaseClient({
      token: { token: () => Promise.resolve(API_KEY) },
      env: 'production',
    });
  }
  return _client;
}

Fetching WordPress Posts

The WordPress REST API caps per_page at 100 posts per response, so we need to paginate through all pages to get the full list:

interface WpPost {
  id: number;
  slug: string;
  link: string;
  date: string; // ISO 8601 — when the post was first published
  modified: string; // ISO 8601 — when the post was last edited
  title: { rendered: string };
  content: { rendered: string };
  excerpt: { rendered: string };
}

async function fetchAllWordPressPosts(): Promise<WpPost[]> {
  const posts: WpPost[] = [];
  let page = 1;

  while (true) {
    const url = `${WP_BASE_URL}/wp-json/wp/v2/posts?status=publish&per_page=100&page=${page}`;
    const response = await fetch(url);

    // WordPress returns 400 when the requested page exceeds the total number of pages
    if (response.status === 400) break;

    if (!response.ok) {
      throw new Error(`WordPress API error: ${response.status} ${response.statusText}`);
    }

    const batch: WpPost[] = await response.json();
    if (batch.length === 0) break;

    posts.push(...batch);
    page++;
  }

  return posts;
}
note

The function above collects all posts into a single array before returning. That is fine for small and medium-sized sites, but on a WordPress instance with thousands of articles, buffering the full dataset can consume significant memory. In that case, avoid buffering everything upfront and instead process each page as it arrives: pass a callback or an async generator to the caller so that uploading starts while subsequent pages are still being fetched:

async function forEachWordPressPost(
  callback: (post: WpPost) => Promise<void>,
): Promise<void> {
  let page = 1;

  while (true) {
    const url = `${WP_BASE_URL}/wp-json/wp/v2/posts?status=publish&per_page=100&page=${page}`;
    const response = await fetch(url);
    if (response.status === 400) break;
    if (!response.ok) throw new Error(`WordPress API error: ${response.status}`);

    const batch: WpPost[] = await response.json();
    if (batch.length === 0) break;

    // Process this page immediately — no need to wait for all pages to load
    for (const post of batch) {
      await callback(post);
    }

    page++;
  }
}

Usage:

await forEachWordPressPost(async (post) => {
  await importPost(dataSource.id, post);
  console.log(`Imported: ${post.title.rendered}`);
});

The Import Script

Step 1 — Create a Data Source

A data source groups all the WordPress posts together:

import {
  CreateDataSourceCommand,
  DataSourceType,
} from '@wildix/wim-knowledge-base-client';

async function createDataSource(name: string) {
  const { dataSource } = await getClient().send(
    new CreateDataSourceCommand({
      name,
      type: DataSourceType.FILES,
    }),
  );
  return dataSource;
}

Step 2 — Create a Knowledge Base

The knowledge base links the data source to the search engine:

import { CreateKnowledgeBaseCommand } from '@wildix/wim-knowledge-base-client';

async function createKnowledgeBase(name: string, dataSourceId: string) {
  const { knowledgeBase } = await getClient().send(
    new CreateKnowledgeBaseCommand({
      name,
      description: 'All published posts from the WordPress blog',
      dataSources: [dataSourceId],
    }),
  );
  return knowledgeBase;
}

Step 3 — Import Posts

Each WordPress post is uploaded as an HTML document. The post's numeric ID is used as originalId — a stable identifier that survives slug or title changes:

import { CreateDocumentCommand } from '@wildix/wim-knowledge-base-client';

async function importPost(dataSourceId: string, post: WpPost) {
  await getClient().send(
    new CreateDocumentCommand({
      dataSourceId,
      originalId: String(post.id),
      originalFormat: 'html',
      originalName: `${post.slug}.html`,
      title: post.title.rendered,
      content: post.content.rendered,
      url: post.link,
      // Strip HTML tags from the excerpt to get clean plain text
      description: post.excerpt.rendered.replace(/<[^>]+>/g, '').trim(),
      metadata: {
        createdAt: post.date,
        updatedAt: post.modified,
      },
    }),
  );
}
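One caveat: the regex above removes tags but leaves HTML entities such as &amp;amp; or &amp;hellip; in the description, and WordPress excerpts frequently contain them. A small helper that also decodes the most common entities; the decodeExcerpt name and the entity table are illustrative additions, not part of the Wildix API:

```typescript
// Entities that WordPress excerpts commonly contain — illustrative subset.
const ENTITIES: Record<string, string> = {
  '&amp;': '&',
  '&lt;': '<',
  '&gt;': '>',
  '&quot;': '"',
  '&#039;': "'",
  '&nbsp;': ' ',
  '&hellip;': '…',
};

// Strip tags, then decode the entities listed above; unknown entities pass through.
function decodeExcerpt(html: string): string {
  return html
    .replace(/<[^>]+>/g, '')
    .replace(/&[a-z#0-9]+;/gi, (entity) => ENTITIES[entity] ?? entity)
    .trim();
}
```

For full entity coverage a dedicated HTML parsing library would be more robust, but a lookup table like this keeps the script dependency-free.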

Complete Script

Here is the full script combining all steps above:

import {
  KnowledgeBaseClient,
  CreateDataSourceCommand,
  DataSourceType,
  CreateKnowledgeBaseCommand,
  CreateDocumentCommand,
} from '@wildix/wim-knowledge-base-client';

const API_KEY = process.env.WILDIX_API_KEY!;
const WP_BASE_URL = process.env.WP_BASE_URL!;

// ── Client ────────────────────────────────────────────────────────────────────

let _client: KnowledgeBaseClient | undefined;

function getClient() {
  if (!_client) {
    _client = new KnowledgeBaseClient({
      token: { token: () => Promise.resolve(API_KEY) },
      env: 'production',
    });
  }
  return _client;
}

// ── WordPress API ─────────────────────────────────────────────────────────────

interface WpPost {
  id: number;
  slug: string;
  link: string;
  date: string; // ISO 8601 — when the post was first published
  modified: string; // ISO 8601 — when the post was last edited
  title: { rendered: string };
  content: { rendered: string };
  excerpt: { rendered: string };
}

async function fetchAllWordPressPosts(): Promise<WpPost[]> {
  const posts: WpPost[] = [];
  let page = 1;

  while (true) {
    const url = `${WP_BASE_URL}/wp-json/wp/v2/posts?status=publish&per_page=100&page=${page}`;
    const response = await fetch(url);
    if (response.status === 400) break;
    if (!response.ok) throw new Error(`WordPress API error: ${response.status}`);
    const batch: WpPost[] = await response.json();
    if (batch.length === 0) break;
    posts.push(...batch);
    page++;
  }

  return posts;
}

// ── Main import ───────────────────────────────────────────────────────────────

async function importWordPressToKnowledgeBase() {
  console.log('Fetching WordPress posts...');
  const posts = await fetchAllWordPressPosts();
  console.log(`Found ${posts.length} published posts.`);

  const { dataSource } = await getClient().send(
    new CreateDataSourceCommand({
      name: 'WordPress Blog',
      type: DataSourceType.FILES,
    }),
  );
  console.log(`Data source created: ${dataSource.id}`);

  const { knowledgeBase } = await getClient().send(
    new CreateKnowledgeBaseCommand({
      name: 'WordPress Blog KB',
      description: 'All published posts from the WordPress blog',
      dataSources: [dataSource.id],
    }),
  );
  console.log(`Knowledge base created: ${knowledgeBase.id}`);

  let imported = 0;
  for (const post of posts) {
    await getClient().send(
      new CreateDocumentCommand({
        dataSourceId: dataSource.id,
        originalId: String(post.id),
        originalFormat: 'html',
        originalName: `${post.slug}.html`,
        title: post.title.rendered,
        content: post.content.rendered,
        url: post.link,
        description: post.excerpt.rendered.replace(/<[^>]+>/g, '').trim(),
        metadata: { createdAt: post.date, updatedAt: post.modified },
      }),
    );
    imported++;
    console.log(`  [${imported}/${posts.length}] ${post.title.rendered}`);
  }

  console.log(`\nDone! ${imported} posts imported.`);
  console.log(`Knowledge Base ID: ${knowledgeBase.id}`);
}

importWordPressToKnowledgeBase().catch(console.error);

Running the Script

Set your environment variables and run the script:

WILDIX_API_KEY=your-api-key \
WP_BASE_URL=https://yourblog.example.com \
npx ts-node import-wordpress.ts

Expected output:

Fetching WordPress posts...
Found 48 published posts.
Data source created: ds-xxxxxxxx
Knowledge base created: kb-xxxxxxxx
[1/48] Getting Started with Our Platform
[2/48] How to Reset Your Password
[3/48] FAQ: Billing and Subscriptions
...
[48/48] Release Notes — February 2026

Done! 48 posts imported.
Knowledge Base ID: kb-xxxxxxxx

What's Next: Full Synchronization

The script above is a one-time import — it creates a fresh data source and knowledge base from scratch. A production-ready synchronization script that runs on a schedule is more involved:

  • Detect new posts: compare WordPress post IDs against documents already in the data source.
  • Detect updated posts: compare the WordPress modified timestamp against the updatedAt value stored in document metadata.
  • Handle deletions: remove documents for posts that have been unpublished or deleted in WordPress.
  • Paginate ListDocumentsCommand: load existing documents page by page before comparing.

For now, the simplest approach for recurring imports is to delete the data source and knowledge base before re-running this script. Proper incremental sync will be covered in a follow-up guide.
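The update-detection step above comes down to comparing two timestamps per post. A minimal sketch, assuming documents come back carrying the metadata that the import script writes; the ExistingDocument shape and the needsUpdate helper are hypothetical, so check the actual ListDocumentsCommand response schema before relying on them:

```typescript
// Hypothetical shape for documents returned by the knowledge base API —
// verify against the real ListDocumentsCommand response schema.
interface ExistingDocument {
  originalId: string;
  metadata?: { updatedAt?: string };
}

// A post needs re-importing when WordPress edited it after our last import.
function needsUpdate(
  post: { id: number; modified: string },
  existing: ExistingDocument | undefined,
): boolean {
  if (!existing) return true; // new post, not in the data source yet
  const storedAt = existing.metadata?.updatedAt;
  if (!storedAt) return true; // no timestamp recorded, re-import to be safe
  return new Date(post.modified).getTime() > new Date(storedAt).getTime();
}
```

Defaulting to "re-import" whenever the stored timestamp is missing keeps the sync idempotent at the cost of some redundant uploads.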

For the full list of available endpoints and request/response schemas, see the Knowledge Base API Reference.