
PageAgent by Alibaba:
How to Add an AI Copilot to Any Website With a Few Lines of Code

Category:  Web Development
Date:  
Author:  Joyboy Team
About the author

Joyboy Team

Joyboy's editorial team writes practical guides on software, apps, automation, and digital product delivery.

There's a category of problem that keeps coming up in complex web applications — dashboards, ERP systems, CRMs, admin panels — where the interface is technically functional but genuinely difficult to use. Multi-step workflows buried behind nested menus. Forms with twenty fields in the wrong order. Data that takes six clicks to find. Users who need training just to do basic tasks.

The traditional solution is a UX redesign, which is expensive, time-consuming, and often incomplete. In 2026, an alternative is gaining ground: add an AI agent that lets users describe what they want in plain language and executes it on their behalf.

PageAgent is an open-source MIT-licensed library that embeds an AI agent directly into your frontend. The developer built it because they believe there's a massive design space for deploying general agents natively inside the web apps we already use, rather than treating the web merely as a dumb target for isolated bots.

It was reviewed and tested as recently as March 6, 2026. In that demonstration it inspected a page automatically, found a search form, interpreted a natural language date instruction like "show me articles from the day before yesterday", correctly calculated the date, and executed the search, all without any human clicking.

This guide covers what PageAgent is, how it fundamentally differs from other browser automation tools, which AI models it supports, and how to integrate it into your web application step by step.

What Makes PageAgent Different From Every Other Browser Automation Tool

There's been a flood of browser automation tools lately, but most of them feel like they were built for demos, not real work. They spin up headless browsers, require you to hand over your credentials, and break the moment a page layout changes. PageAgent takes a fundamentally different approach.

The core idea is simple: PageAgent is a JavaScript agent that lives inside the web page itself. No mandatory browser extension, no Python scripts, no headless Chrome instances. You drop it into a page (or use the optional browser extension) and it can understand and interact with the DOM directly. Because it runs in your actual browser session, it uses whatever login state you already have. No sharing passwords, no cookie juggling, no OAuth dance with a third-party service.

Unlike tools like Browser-Use that control the entire browser from the outside, PageAgent is designed as an embedded component that lives inside your website. You drop it into your app and your users can talk to the page directly. It takes a DOM-first approach rather than relying on visual recognition. PageAgent uses high-intensity DOM dehydration — stripping the DOM down to its essential structure — and pure text processing to understand page layouts. This makes it faster and more precise than screenshot-based alternatives.

PageAgent operates fundamentally differently from traditional browser automation frameworks by running within the browser's JavaScript execution context rather than controlling the browser from the outside.

In practical terms: it's a JavaScript library you import into your project. It reads your page's DOM, understands the structure and interactive elements, accepts a natural language instruction, and executes it — clicking buttons, filling forms, navigating menus, scrolling, searching — exactly as a human user would, but triggered by a text command.
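
To make the DOM-first idea concrete, here is a minimal sketch of what "dehydrating" a page into text might look like. It is not PageAgent's actual algorithm: the node shape (`tag`, `attrs`, `text`, `children`), the `INTERACTIVE_TAGS` list, and the `dehydrate` function are all hypothetical, and it walks plain objects rather than live DOM nodes so that it runs anywhere.

```javascript
// Illustrative only: a toy "dehydration" pass over a plain-object tree.
// PageAgent's real implementation is not shown in this article.
const INTERACTIVE_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea'])

function dehydrate(node, out = []) {
  if (INTERACTIVE_TAGS.has(node.tag)) {
    // Give each interactive element an index the model can refer to
    const index = out.length
    const label = node.text || node.attrs?.placeholder || node.attrs?.name || ''
    out.push(`[${index}] <${node.tag}> ${label}`.trim())
  }
  // Non-interactive wrappers are skipped, but their children are still visited
  for (const child of node.children ?? []) {
    dehydrate(child, out)
  }
  return out
}

const page = {
  tag: 'form',
  children: [
    {
      tag: 'div',
      attrs: { class: 'wrapper' },
      children: [
        { tag: 'input', attrs: { name: 'query', placeholder: 'Search articles' } },
      ],
    },
    { tag: 'button', text: 'Search' },
  ],
}

console.log(dehydrate(page).join('\n'))
// [0] <input> Search articles
// [1] <button> Search
```

The point of the sketch is the size reduction: layout wrappers disappear, and the model sees only a short, indexed list of things it can act on.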

What PageAgent Can Do — Real Use Cases

The use cases PageAgent is most directly suited for:

SaaS AI Copilot. Ship an AI copilot in your product in a few lines of code, with no backend rewrite needed.

Smart Form Filling. Turn 20-click workflows into one sentence. A good fit for ERP, CRM, and admin systems.

Accessibility. Make any web app usable through natural language, including voice commands and screen readers.

Multi-page Agent. Extend your agent's reach across browser tabs with the optional Chrome extension.

For UAE businesses specifically, the most immediately valuable applications are:

ERP and business system simplification. If your team uses an ERP, accounting system, or CRM that requires navigating complex menus to accomplish routine tasks — "create a purchase order for supplier X for 50 units of product Y" — PageAgent can execute that workflow from a single sentence.

Customer-facing AI assistance. Embed PageAgent in your customer portal and let customers describe what they need help with rather than navigating your interface to find it. "Show me my last three invoices" or "update my delivery address" becomes a text command rather than a UX challenge.

Legacy system modernisation. Rather than rebuilding an outdated internal tool from scratch, embed PageAgent and give users a natural language layer on top of the existing interface. The underlying system doesn't need to change — the interaction model does.

Accessibility. Any web application with PageAgent embedded becomes usable through natural language, which makes it accessible to users who struggle with complex visual interfaces regardless of the reason.

Supported Models — The Full Picture

PageAgent works with any OpenAI-compatible API endpoint, which means it supports a wide range of models in 2026.
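
"OpenAI-compatible" generally means the provider accepts a POST to `{baseURL}/chat/completions` with a bearer token and a body containing `model` and `messages`. The sketch below shows that request shape. `buildChatRequest` is a hypothetical helper for illustration only; how PageAgent builds its requests internally is an assumption, not something this article confirms.

```javascript
// Hypothetical helper: the request an OpenAI-compatible client would typically send.
function buildChatRequest({ baseURL, apiKey, model }, messages) {
  return {
    // Strip trailing slashes so the path is joined cleanly
    url: `${baseURL.replace(/\/+$/, '')}/chat/completions`,
    init: {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ model, messages }),
    },
  }
}

const request = buildChatRequest(
  { baseURL: 'https://api.deepseek.com/v1/', apiKey: 'YOUR_API_KEY', model: 'deepseek-chat' },
  [{ role: 'user', content: 'Click the login button' }],
)

console.log(request.url)
// https://api.deepseek.com/v1/chat/completions
// To actually send it: await fetch(request.url, request.init)
```

This is why switching providers in the examples that follow only means changing `model`, `baseURL`, and `apiKey`.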

Using Qwen (Alibaba's own model — recommended for free tier):

The fastest way to try PageAgent is with the free Demo LLM. The demo CDN uses a free testing LLM API — only Qwen and DeepSeek are available on the free demo.

javascript
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: 'YOUR_DASHSCOPE_API_KEY',
  language: 'en-US',
})

await agent.execute('Click the login button')

Get your DashScope API key from Alibaba Cloud Model Studio at dashscope.aliyuncs.com. Qwen3.5-Plus was released February 16, 2026 — it's Alibaba's latest and most capable model with significant improvements in reasoning and agentic performance.

Using Claude (Anthropic — recommended for production):

javascript
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'claude-sonnet-4-6',
  baseURL: 'https://api.anthropic.com/v1',
  apiKey: process.env.ANTHROPIC_API_KEY,
  language: 'en-US',
})

await agent.execute('Find the invoice for TechCorp and download it as PDF')

Claude Sonnet 4.6 is the recommended model for PageAgent in production — its strong instruction following, large context window, and reliable tool use produce the most consistent results on complex multi-step page interactions.

Using GPT-5 (OpenAI):

javascript
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'gpt-5',
  baseURL: 'https://api.openai.com/v1',
  apiKey: process.env.OPENAI_API_KEY,
  language: 'en-US',
})

await agent.execute('Fill in the customer form with the data from the spreadsheet I uploaded')

Using DeepSeek (budget-friendly, strong performance):

DeepSeek V3.2 achieves approximately 90% of GPT-5's performance at 1/50th the cost — making it an attractive option for high-volume PageAgent deployments where cost per interaction matters.

javascript
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'deepseek-chat',
  baseURL: 'https://api.deepseek.com/v1',
  apiKey: process.env.DEEPSEEK_API_KEY,
  language: 'en-US',
})

await agent.execute('Search for all orders placed in the last 7 days and export them')

Using OpenRouter (access all models with one API key — recommended for flexibility):

OpenRouter gives access to 40+ models including Claude Sonnet 4.6, GPT-5.4, Gemini 3.1 Pro, and DeepSeek V3.2 through a single API key, making it easy to switch models or test which performs best for your specific use case.

javascript
import { PageAgent } from 'page-agent'

const agent = new PageAgent({
  model: 'anthropic/claude-sonnet-4-6',
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
  language: 'en-US',
})

await agent.execute('Navigate to the reports section and generate this month\'s sales summary')

OpenRouter is particularly useful if you want to run PageAgent in a multi-model setup — using a cheaper model for simple interactions and escalating to a more capable model for complex multi-step tasks.
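
A multi-model setup of this kind could be sketched as a small routing function that picks a model slug before the agent is created. The keyword heuristic, the `pickModel` name, and the `deepseek/deepseek-chat` slug are assumptions for illustration; check OpenRouter's model list for the exact identifiers.

```javascript
const CHEAP_MODEL = 'deepseek/deepseek-chat'        // assumed OpenRouter slug
const CAPABLE_MODEL = 'anthropic/claude-sonnet-4-6' // slug used in this article

// Words that hint at a multi-step task (a rough heuristic, not a rule)
const MULTI_STEP_HINTS = /\b(and|then|after|every|all|export|generate)\b/i

function pickModel(instruction) {
  const wordCount = instruction.trim().split(/\s+/).length
  const looksComplex = wordCount > 8 || MULTI_STEP_HINTS.test(instruction)
  return looksComplex ? CAPABLE_MODEL : CHEAP_MODEL
}

console.log(pickModel('Click the login button'))
// deepseek/deepseek-chat
console.log(pickModel("Navigate to the reports section and generate this month's sales summary"))
// anthropic/claude-sonnet-4-6
```

In practice you would tune the heuristic against your own logs, since instruction length is only a weak signal of how many steps a task needs.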

Step 1: Install PageAgent

PageAgent is an npm package — installation is straightforward in any JavaScript project.

bash
# npm
npm install page-agent

# yarn
yarn add page-agent

# pnpm
pnpm add page-agent

# bun
bun add page-agent

For use as a bookmarklet or CDN script (no build step required):

html
<!-- Add to any webpage via script tag for quick testing -->
<script src="https://cdn.jsdelivr.net/npm/page-agent/dist/page-agent.umd.js"></script>

The CDN approach is the fastest way to test PageAgent on any existing webpage without modifying your build setup.

Step 2: Basic Integration

The simplest possible integration takes three lines of meaningful code:

javascript
import { PageAgent } from 'page-agent'

// Initialise with your chosen model
const agent = new PageAgent({
  model: 'claude-sonnet-4-6',           // Model to use
  baseURL: 'https://api.anthropic.com/v1', // API endpoint
  apiKey: 'your-api-key',               // Your API key
  language: 'en-US',                    // Response language
})

// Execute a natural language instruction
const result = await agent.execute('Click the submit button')
console.log(result)

Never expose your API key in client-side code. For production use, proxy the API calls through your backend:

javascript
// Client-side: proxy through your own backend
const agent = new PageAgent({
  model: 'claude-sonnet-4-6',
  baseURL: '/api/ai-proxy',    // Your backend endpoint
  apiKey: 'proxy-auth-token',  // Auth token for your proxy
  language: 'en-US',
})
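
javascript
// Backend proxy (Node.js/Express example)
// Assumes PageAgent appends /chat/completions to baseURL, as
// OpenAI-compatible clients usually do; adjust the route if it differs.
const express = require('express')
const app = express()
app.use(express.json()) // Parse JSON bodies so req.body is populated

app.post('/api/ai-proxy/chat/completions', async (req, res) => {
  // Validate the request is from your authenticated user
  // (req.user is set by your own auth middleware, not shown here)
  if (!req.user) return res.status(401).json({ error: 'Unauthorized' })

  try {
    // Forward to the provider's OpenAI-compatible endpoint, so the body
    // format matches what an OpenAI-compatible client sends
    const response = await fetch('https://api.anthropic.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.ANTHROPIC_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(req.body),
    })

    // Pass the provider's status code through along with the body
    const data = await response.json()
    res.status(response.status).json(data)
  } catch (error) {
    res.status(502).json({ error: 'Upstream request failed' })
  }
})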
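
Because the proxy spends your API credits, it may be worth restricting what clients can ask it to forward. Here is a minimal sketch, assuming an OpenAI-style request body. `ALLOWED_MODELS`, `MAX_MESSAGES`, and `validateProxyBody` are hypothetical names, and the limits are placeholders.

```javascript
const ALLOWED_MODELS = new Set(['claude-sonnet-4-6'])
const MAX_MESSAGES = 100 // placeholder limit

// Returns an error string, or null when the body is acceptable
function validateProxyBody(body) {
  if (!body || typeof body !== 'object') return 'Body must be a JSON object'
  if (!ALLOWED_MODELS.has(body.model)) return `Model not allowed: ${body.model}`
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    return 'messages must be a non-empty array'
  }
  if (body.messages.length > MAX_MESSAGES) return 'Too many messages'
  return null
}

console.log(validateProxyBody({ model: 'gpt-5', messages: [] }))
// Model not allowed: gpt-5
console.log(validateProxyBody({
  model: 'claude-sonnet-4-6',
  messages: [{ role: 'user', content: 'Click the submit button' }],
}))
// null
```

In the Express handler you would call this before forwarding and respond with status 400 whenever it returns a message.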

Step 3: Building a User-Facing AI Copilot Interface

For a real product integration, you'll want a proper UI that lets users type instructions and see results. Here's a complete React component:

jsx
import { useState } from 'react'
import { PageAgent } from 'page-agent'

// Initialise agent once outside the component
const agent = new PageAgent({
  model: 'claude-sonnet-4-6', // Plain Anthropic model name, since the proxy forwards to Anthropic
  baseURL: '/api/ai-proxy',
  apiKey: 'your-proxy-token',
  language: 'en-US',
})

export function AICopilot() {
  const [instruction, setInstruction] = useState('')
  const [status, setStatus] = useState('idle') // idle | running | done | error
  const [result, setResult] = useState(null)

  const handleExecute = async () => {
    if (!instruction.trim()) return
    
    setStatus('running')
    setResult(null)
    
    try {
      const output = await agent.execute(instruction)
      setResult(output)
      setStatus('done')
    } catch (error) {
      setResult({ error: error.message })
      setStatus('error')
    }
  }

  return (
    <div className="ai-copilot-panel">
      <div className="copilot-header">
        <span className="copilot-icon">🤖</span>
        <span>AI Assistant</span>
      </div>
      
      <div className="copilot-input-row">
        <input
          type="text"
          value={instruction}
          onChange={(e) => setInstruction(e.target.value)}
          onKeyDown={(e) => e.key === 'Enter' && handleExecute()}
          placeholder="Tell me what to do... (e.g. 'Show last month invoices')"
          disabled={status === 'running'}
        />
        <button
          onClick={handleExecute}
          disabled={status === 'running' || !instruction.trim()}
        >
          {status === 'running' ? 'Running...' : 'Go'}
        </button>
      </div>
      
      {status === 'running' && (
        <div className="copilot-status">
          Analysing page and executing your request...
        </div>
      )}
      
      {status === 'done' && (
        <div className="copilot-result success">
          ✓ Done
        </div>
      )}
      
      {status === 'error' && (
        <div className="copilot-result error">
          ✗ {result?.error || 'Something went wrong'}
        </div>
      )}
      
      {/* Suggested quick commands */}
      <div className="copilot-suggestions">
        {[
          'Show this month\'s summary',
          'Export current view to CSV',
          'Filter by last 7 days',
        ].map((suggestion) => (
          <button
            key={suggestion}
            onClick={() => setInstruction(suggestion)}
            className="suggestion-chip"
          >
            {suggestion}
          </button>
        ))}
      </div>
    </div>
  )
}

Step 4: Advanced Configuration

PageAgent exposes several configuration options that matter for production use:

javascript
const agent = new PageAgent({
  model: 'claude-sonnet-4-6',
  baseURL: '/api/ai-proxy',
  apiKey: 'your-proxy-token',
  language: 'en-US',           // en-US, ar-AE, zh-CN, etc.
  
  // Maximum number of steps the agent will take
  // Default changed from 20 to 40 in latest release
  maxSteps: 40,
  
  // Enable experimental llms.txt support
  // Agent fetches the site's llms.txt for extra context
  experimentalLlmsTxt: true,
  
  // Callback for observing agent steps as they happen
  onStep: (step) => {
    console.log(`Step ${step.index}: ${step.action} — ${step.description}`)
  },
  
  // Callback when execution completes
  onComplete: (result) => {
    console.log('Execution complete:', result)
  },
  
  // Callback on error
  onError: (error) => {
    console.error('Agent error:', error)
  }
})
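
If you want those step callbacks to feed a UI or an audit trail instead of the console, one option is a small collector. This sketch assumes the step shape used in the example above (`index`, `action`, `description`). `createStepLog` is a hypothetical helper, not part of PageAgent.

```javascript
function createStepLog(limit = 40) {
  const entries = []
  return {
    entries,
    // Pass this function as the onStep callback
    onStep(step) {
      entries.push({
        index: step.index,
        summary: `${step.action}: ${step.description}`,
        at: new Date().toISOString(),
      })
      // Keep only the most recent `limit` entries
      if (entries.length > limit) entries.shift()
    },
  }
}

const log = createStepLog(2)
log.onStep({ index: 1, action: 'click', description: 'Open the Reports menu' })
log.onStep({ index: 2, action: 'type', description: 'Enter the date range' })
log.onStep({ index: 3, action: 'click', description: 'Press Generate' })

console.log(log.entries.map((entry) => entry.summary))
// [ 'type: Enter the date range', 'click: Press Generate' ]
```

Because `onStep` closes over `entries` rather than relying on `this`, it can be handed to the agent configuration directly as `onStep: log.onStep`.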

Arabic language support for UAE applications:

PageAgent supports Arabic language instructions natively — important for UAE business applications serving Arabic-speaking users:

javascript
const agent = new PageAgent({
  model: 'claude-sonnet-4-6', // Plain Anthropic model name, since the proxy forwards to Anthropic
  baseURL: '/api/ai-proxy',
  apiKey: 'your-token',
  language: 'ar-AE',  // Arabic (UAE)
})

// Users can now give instructions in Arabic
await agent.execute('أظهر لي فواتير الشهر الماضي')
// "Show me last month's invoices"
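
If your application serves both Arabic and English speakers, you might choose the `language` value per instruction rather than hard-coding it. Here is a minimal sketch using a Unicode script check. `detectLanguage` is a hypothetical helper, and treating any Arabic-script text as `ar-AE` is a simplifying assumption.

```javascript
// Matches any character in the Arabic script (property escapes need the `u` flag)
const ARABIC_SCRIPT = /\p{Script=Arabic}/u

function detectLanguage(instruction, fallback = 'en-US') {
  return ARABIC_SCRIPT.test(instruction) ? 'ar-AE' : fallback
}

console.log(detectLanguage('أظهر لي فواتير الشهر الماضي'))
// ar-AE
console.log(detectLanguage("Show me last month's invoices"))
// en-US
```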

Step 5: Install the Chrome Extension for Multi-Tab Automation

The Page Agent Chrome extension adds cross-tab automation — it can operate across multiple browser tabs and pages. The extension performs DOM analysis locally in your browser. When you initiate a task, sanitized page structure is sent to the LLM API you configure. Your data is never collected or stored.

Install from the Chrome Web Store by searching "Page Agent Ext" or visiting the link from the PageAgent GitHub repository. The extension is rated 4.9 out of 5 stars and supports Bring Your Own LLM — use OpenAI, Anthropic, or any compatible API with full data control.

Once installed, configure your API key in the extension settings. You can then initiate multi-page automations directly from the extension panel — useful for workflows that span your CRM, your email, and your project management tool simultaneously.

Choosing the Right Model for Your Use Case

Not all tasks need the same model. Here's a practical guide to model selection for PageAgent in 2026:

For simple, single-step interactions (click a button, fill a field, navigate to a page): Use DeepSeek V3.2 or Qwen3.5-Plus. DeepSeek V3.2 at $0.27/$1.10 per million tokens delivers strong performance on straightforward tasks at a fraction of the cost of frontier models.

For medium complexity workflows (multi-step form filling, conditional navigation, data extraction): Use Claude Sonnet 4.6 or GPT-5. The stronger instruction following and reasoning of these models handles ambiguity and multi-step planning more reliably.

For complex, long-running automations (researching across pages, synthesising information, handling exceptions): Use Claude Opus 4.6. The larger context window and superior reasoning handles tasks with many steps and conditional branches most reliably.

For production at scale with cost sensitivity: Use OpenRouter with model fallbacks — primary model set to Claude Sonnet 4.6 for quality, with DeepSeek V3.2 as fallback for high-volume simpler tasks:

javascript
// OpenRouter model fallbacks are configured per request (a `models` array
// in the request body). This config sets only the primary model.
const agent = new PageAgent({
  model: 'anthropic/claude-sonnet-4-6',
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
  language: 'en-US',
  // Whether PageAgent can pass a fallback list through is not covered here;
  // a backend proxy is one place to add the `models` array to each request.
})
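
To compare models on cost, it can help to put numbers on a single task. The sketch below applies the DeepSeek V3.2 prices quoted above, reading them as input and output prices per million tokens (an assumption about the quoted figures). The token counts are made-up placeholders, not measurements.

```javascript
// Prices in USD per million tokens, read as input/output (assumption)
const PRICING = { inputPerMillion: 0.27, outputPerMillion: 1.10 }

function estimateCostUSD(inputTokens, outputTokens, pricing = PRICING) {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMillion +
    (outputTokens / 1_000_000) * pricing.outputPerMillion
  )
}

// Placeholder workload: 10 steps, each sending 4,000 tokens and receiving 200
const cost = estimateCostUSD(10 * 4000, 10 * 200)
console.log(cost.toFixed(4))
// 0.0130
```

Under these placeholder numbers a ten-step task costs a little over one cent. Swap in another model's prices and your own measured token counts to compare.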

What PageAgent Is Not Good For

Honest assessment means acknowledging the limitations alongside the capabilities.

PageAgent works best on pages with well-structured, semantic HTML. Pages that rely heavily on canvas rendering, complex SVG interfaces, or heavily obfuscated DOM structures give the agent less to work with and produce less reliable results.

It is a client-side tool — it cannot perform server-side actions, access data that isn't visible in the browser, or interact with systems outside the current browser session. For backend automation, combine PageAgent with n8n or Make for the server-side components.

Complex conditional workflows with many exception branches — "if the form shows error X, do Y, but if it shows error Z, do something different" — can hit the maxSteps limit on complicated pages. For these cases, breaking the task into smaller sequential instructions produces more reliable results than one large compound instruction.
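
Breaking a task into smaller instructions can be sketched as a loop that runs them one at a time and stops at the first failure. `runSequentially` is a hypothetical helper. It assumes only that the agent exposes the `execute(instruction)` method used throughout this guide, and the stub agent below stands in for a real one so the example runs on its own.

```javascript
async function runSequentially(agent, instructions) {
  const completed = []
  for (const instruction of instructions) {
    try {
      await agent.execute(instruction)
      completed.push(instruction)
    } catch (error) {
      // Stop here so later steps do not run against an unexpected page state
      return { ok: false, completed, failedAt: instruction, error: error.message }
    }
  }
  return { ok: true, completed }
}

// Stub agent for demonstration: fails on any instruction mentioning "export"
const stubAgent = {
  async execute(instruction) {
    if (instruction.includes('export')) throw new Error('Export button not found')
    return 'done'
  },
}

runSequentially(stubAgent, [
  'Open the orders page',
  'Filter by last 7 days',
  'Click export to CSV',
]).then((outcome) => console.log(outcome))
```

With the stub, the first two instructions complete and the third is reported in `failedAt`, which gives the UI something specific to show the user instead of a generic failure.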

The Bigger Picture

The developer's thesis behind PageAgent is worth taking seriously: there's a massive design space for deploying general agents natively inside web apps we already use. Rather than building separate automation tools that interact with web interfaces from the outside, the future is agents that live inside the product — understanding its context, using its session, and acting on behalf of its users.

For UAE businesses investing in web applications in 2026, PageAgent represents one of the most practical ways to add genuine AI capability to an existing product without a backend rewrite. The integration is a few lines of JavaScript. The model choices are flexible and cost-scalable. The use cases — particularly for ERP, CRM, and complex business tool simplification — are immediately relevant.

It's not trying to replace your browser — it's trying to make the one you already use a lot smarter. For users who have been struggling with complex interfaces, that's a meaningful upgrade. For product teams who want to ship an AI copilot without building one from scratch, it's a significant shortcut.

Want an AI copilot built into your web application?

At Joyboy, we integrate AI agent capabilities — including PageAgent — into web applications for UAE businesses, turning complex interfaces into natural language experiences. Talk to us about your project.