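Zero-Cost Local LLM: Build a Streaming Chat App with Next.js + Ollama + Qwen3

Hi everyone! Today I want to share a very practical build: a streaming AI chat app that runs entirely on your own machine, at no cost. You don't need a paid API key or any complicated configuration. Follow along step by step and you'll have your own local AI assistant!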

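I. The end result

Here is what we will have by the end:

A large language model (Qwen3:4B) running locally
Next.js acting as the backend and forwarding the model output as a stream
A frontend that renders the AI's reply in real time, with a full typewriter effect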

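II. Prerequisites

1. Install and start Ollama

Ollama is an excellent tool for running large models locally, and it supports most mainstream open-source models.

Download and install: visit the official Ollama website and download the installer for your system. Windows, macOS, and Linux are all supported.

Verify the installation: once it is installed, open a terminal and run: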

```bash
ollama --version
```

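If a version number is printed, the installation succeeded!

2. Download the Qwen3:4B model

Qwen is Alibaba's family of open-source models. Qwen3:4B is small and fast, which makes it a good fit for an ordinary computer.

In the terminal, run: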

```bash
ollama pull qwen3:4b
```

Once the download finishes, we can give it a quick test:

```bash
ollama run qwen3:4b
```

If you can chat with the AI normally, the model is ready! Type /bye or press Ctrl+D to exit (Ctrl+C only interrupts the current reply).
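
If you would rather confirm from code that the model is installed, Ollama's HTTP API lists local models at `GET /api/tags`. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`; `hasModel` and `checkModel` are illustrative helper names, not part of Ollama.

```typescript
// The part of Ollama's GET /api/tags response that this sketch relies on.
interface TagsResponse {
  models: { name: string }[]
}

// Pure helper (hypothetical name): is `model` among the locally installed models?
function hasModel(tags: TagsResponse, model: string): boolean {
  return tags.models.some((m) => m.name === model)
}

// Hypothetical wrapper: ask the local Ollama server which models it has.
// Assumes the default port 11434 unless another base URL is passed in.
async function checkModel(
  model: string,
  baseUrl = 'http://localhost:11434',
): Promise<boolean> {
  const res = await fetch(`${baseUrl}/api/tags`)
  if (!res.ok) throw new Error(`Ollama responded with status ${res.status}`)
  return hasModel((await res.json()) as TagsResponse, model)
}

// Example:
// checkModel('qwen3:4b').then((ok) =>
//   console.log(ok ? 'model is ready' : 'run: ollama pull qwen3:4b'))
```

If `fetch` itself throws here, the Ollama service is most likely not running.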

III. Set up the Next.js project

If you don't have a Next.js project yet, you can create one quickly:

```bash
npx create-next-app@latest my-ai-app
cd my-ai-app
```

IV. Core implementation: the API route

This is the most important step: we create a Next.js API route that talks to Ollama and streams the output back to the browser.

In app/api/ollama/route.ts:

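```typescript
import { NextRequest } from 'next/server'

export async function POST(request: NextRequest) {
  try {
    const { prompt, model = 'qwen3:4b' } = await request.json()

    if (!prompt) {
      return new Response('Prompt is required', { status: 400 })
    }

    // Ask Ollama for a streamed completion; the body arrives as newline-delimited JSON.
    const ollamaResponse = await fetch('http://localhost:11434/api/generate', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, prompt, stream: true }),
    })

    if (!ollamaResponse.ok || !ollamaResponse.body) {
      return new Response('Ollama request failed', { status: 500 })
    }

    const reader = ollamaResponse.body.getReader()
    const encoder = new TextEncoder()
    const decoder = new TextDecoder()

    const stream = new ReadableStream<Uint8Array>({
      async start(controller) {
        let buffer = '' // incomplete trailing line carried over between chunks
        let finished = false

        // Parse one JSON line and forward its `response` text to the client.
        const handleLine = (line: string) => {
          if (!line.trim()) return
          try {
            const data = JSON.parse(line)
            if (data.response) controller.enqueue(encoder.encode(data.response))
            if (data.done) finished = true
          } catch {
            // Ignore malformed lines (and enqueue failures after a client disconnect).
          }
        }

        try {
          while (!finished) {
            const { done, value } = await reader.read()
            if (done) break

            // A network chunk may end mid-line, so only complete lines are parsed.
            buffer += decoder.decode(value, { stream: true })
            const lines = buffer.split('\n')
            buffer = lines.pop() ?? ''
            lines.forEach(handleLine)
          }
          handleLine(buffer) // final line without a trailing newline, if any
        } finally {
          reader.releaseLock()
          try {
            controller.close()
          } catch {
            // The stream was already closed or cancelled by the client.
          }
        }
      },
      // Runs when the client disconnects or aborts; stop reading from Ollama too.
      cancel() {
        return reader.cancel().catch(() => undefined)
      },
    })

    // No Transfer-Encoding header here: the server applies chunked framing itself.
    return new Response(stream, {
      headers: { 'Content-Type': 'text/plain; charset=utf-8' },
    })
  } catch (error) {
    console.error('Ollama API error:', error)
    return new Response('Internal server error', { status: 500 })
  }
}
```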

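Key technical points:

The core of stream forwarding: a ReadableStream is used to build a custom streaming response
Data parsing: Ollama returns one JSON object per line, and we parse line by line and extract the response field. A network chunk can end in the middle of a line, so a robust parser keeps the incomplete tail and joins it with the next chunk
Transfer-Encoding: chunked: this is how HTTP/1.1 delivers a body of unknown length piece by piece. The server adds it automatically when the response body is a stream, so it does not need to be set by hand, and HTTP/2 does not use it at all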
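
The line-by-line parsing described above can be sketched in isolation. This is a minimal illustration, assuming only that each line of Ollama's stream is a JSON object with an optional `response` string; `createLineSplitter` and `extractText` are hypothetical helper names.

```typescript
// Minimal NDJSON line splitter (illustrative helper, not part of Ollama or Next.js).
// Feed it decoded text chunks; it returns only complete lines and keeps the
// unfinished tail until the next chunk arrives.
function createLineSplitter() {
  let buffer = ''
  return {
    push(chunk: string): string[] {
      buffer += chunk
      const lines = buffer.split('\n')
      buffer = lines.pop() ?? '' // last element is the (possibly empty) partial line
      return lines.filter((line) => line.trim() !== '')
    },
    // Call once at end of stream to get a final line that had no trailing newline.
    flush(): string[] {
      const rest = buffer.trim()
      buffer = ''
      return rest ? [rest] : []
    },
  }
}

// Join the generated text found in a batch of complete JSON lines.
function extractText(lines: string[]): string {
  return lines
    .map((line) => (JSON.parse(line) as { response?: string }).response ?? '')
    .join('')
}

// A JSON object split across two network chunks is still parsed correctly.
const splitter = createLineSplitter()
const first = splitter.push('{"response":"Hel') // no complete line yet
const second = splitter.push('lo","done":false}\n{"response":" world","done":true}\n')
console.log(first.length, extractText(second)) // 0 "Hello world"
```

Splitting each chunk on its own, without the buffer, would try to parse `{"response":"Hel` as JSON, fail, and silently drop that piece of text.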

V. Frontend: a custom Hook + streaming display

To keep the code reusable and maintainable, we wrap the stream-handling logic in a custom Hook.

1. Create `useOllamaStream.ts`

```typescript
'use client'

import { useState, useCallback, useRef, useEffect } from 'react'

interface UseOllamaStreamOptions {
  onChunk?: (chunk: string) => void
  onError?: (error: Error) => void
  onComplete?: () => void
}

export function useOllamaStream(options: UseOllamaStreamOptions = {}) {
  const [response, setResponse] = useState('')
  const [isLoading, setIsLoading] = useState(false)
  const [error, setError] = useState<string | null>(null)
  const abortControllerRef = useRef<AbortController | null>(null)

  // Keep the latest callbacks in a ref so sendMessage keeps a stable identity
  // even when the caller passes a new options object on every render.
  const optionsRef = useRef(options)
  useEffect(() => {
    optionsRef.current = options
  })

  const cancel = useCallback(() => {
    abortControllerRef.current?.abort()
    abortControllerRef.current = null
  }, [])

  const sendMessage = useCallback(async (prompt: string) => {
    if (!prompt.trim()) return

    setIsLoading(true)
    setResponse('')
    setError(null)

    const controller = new AbortController()
    abortControllerRef.current = controller

    try {
      const res = await fetch('/api/ollama', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt }),
        signal: controller.signal,
      })

      if (!res.ok) {
        throw new Error(`HTTP error! status: ${res.status}`)
      }

      const reader = res.body?.getReader()
      const decoder = new TextDecoder()

      if (reader) {
        while (true) {
          const { done, value } = await reader.read()
          if (done) break

          // stream: true keeps multi-byte characters (e.g. Chinese) intact
          // when they are split across two chunks.
          const chunk = decoder.decode(value, { stream: true })
          if (!chunk) continue
          setResponse(prev => prev + chunk)
          optionsRef.current.onChunk?.(chunk)
        }
      }

      optionsRef.current.onComplete?.()
    } catch (err) {
      if (err instanceof DOMException && err.name === 'AbortError') {
        setError('Request cancelled')
      } else {
        const errorMessage = err instanceof Error ? err.message : 'An error occurred'
        setError(errorMessage)
        optionsRef.current.onError?.(err instanceof Error ? err : new Error(errorMessage))
      }
    } finally {
      setIsLoading(false)
      abortControllerRef.current = null
    }
  }, [])

  return {
    response,
    isLoading,
    error,
    sendMessage,
    cancel,
  }
}
```
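
One detail in the read loop matters for Chinese (or any non-ASCII) output: a multi-byte UTF-8 character can be split across two chunks. The sketch below uses only the standard `TextEncoder`/`TextDecoder` APIs to show what the `{ stream: true }` option of `decode` changes; `decodeChunks` is a hypothetical helper written for this illustration.

```typescript
// Decode a sequence of byte chunks the way a stream read loop would.
// `streaming` toggles the { stream: true } option of TextDecoder.decode.
function decodeChunks(chunks: Uint8Array[], streaming: boolean): string {
  const decoder = new TextDecoder()
  let text = ''
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: streaming })
  }
  // Flush bytes the decoder may still be holding (only relevant when streaming).
  return text + decoder.decode()
}

// "你好" is 6 bytes in UTF-8; cut it in the middle of the first character.
const bytes = new TextEncoder().encode('你好')
const chunks = [bytes.slice(0, 2), bytes.slice(2)]

console.log(decodeChunks(chunks, true))  // characters are reassembled across chunks
console.log(decodeChunks(chunks, false)) // contains U+FFFD replacement characters
```

Without the option, each chunk is decoded as if it were a complete document, so a character cut in half shows up as replacement characters in the chat window.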

2. Create the main page `page.tsx`

```tsx
'use client'

import React, { useState, useCallback, useMemo, useRef, useEffect } from 'react'
import { useOllamaStream } from './useOllamaStream'

const ResponseDisplay = React.memo(({ response, isLoading }: { response: string; isLoading: boolean }) => {
  const responseRef = useRef<HTMLDivElement>(null)

  // Follow the output: scroll to the bottom whenever new text arrives.
  useEffect(() => {
    if (responseRef.current) {
      responseRef.current.scrollTop = responseRef.current.scrollHeight
    }
  }, [response])

  if (!response && !isLoading) return null

  return (
    <div style={{ 
      border: '1px solid #ccc', 
      padding: '15px', 
      borderRadius: '5px',
      maxHeight: '400px',
      overflowY: 'auto',
    }} ref={responseRef}>
      <h3 style={{ marginTop: 0 }}>Response:</h3>
      <div style={{ whiteSpace: 'pre-wrap', wordBreak: 'break-word' }}>
        {response}
        {isLoading && <span style={{ opacity: 0.5 }}>▋</span>}
      </div>
    </div>
  )
})

ResponseDisplay.displayName = 'ResponseDisplay'

export default function Page() {
  const [prompt, setPrompt] = useState('')

  const { response, isLoading, error, sendMessage, cancel } = useOllamaStream()

  // The same button sends a prompt or cancels the running request.
  const handleSubmit = useCallback((e: React.FormEvent) => {
    e.preventDefault()
    if (isLoading) {
      cancel()
    } else {
      sendMessage(prompt)
    }
  }, [prompt, isLoading, sendMessage, cancel])

  const containerStyle = useMemo(() => ({
    maxWidth: '800px',
    margin: '0 auto',
    padding: '20px',
  }), [])

  const textareaStyle = useMemo(() => ({
    width: '100%',
    minHeight: '100px',
    padding: '10px',
    marginBottom: '10px',
    fontSize: '16px',
    resize: 'vertical' as const,
  }), [])

  const buttonStyle = useMemo(() => ({
    padding: '10px 20px',
    fontSize: '16px',
    // The button stays clickable while loading (it becomes "Cancel").
    cursor: 'pointer',
    backgroundColor: isLoading ? '#ff4444' : '#0070f3',
    color: 'white',
    border: 'none',
    borderRadius: '5px',
    marginRight: '10px',
  }), [isLoading])

  return (
    <div style={containerStyle}>
      <h1>Hello InstantMind</h1>
      
      {error && (
        <div style={{ 
          backgroundColor: '#ffebee', 
          color: '#c62828', 
          padding: '10px', 
          borderRadius: '5px', 
          marginBottom: '15px' 
        }}>
          Error: {error}
        </div>
      )}
      
      <form onSubmit={handleSubmit} style={{ marginBottom: '20px' }}>
        <textarea
          value={prompt}
          onChange={(e) => setPrompt(e.target.value)}
          placeholder="Enter your prompt here..."
          style={textareaStyle}
          disabled={isLoading}
        />
        <div>
          <button
            type="submit"
            style={buttonStyle}
          >
            {isLoading ? 'Cancel' : 'Send'}
          </button>
        </div>
      </form>
      
      <ResponseDisplay response={response} isLoading={isLoading} />
    </div>
  )
}
```

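VI. Performance highlights

useCallback for functions: avoids re-creating handlers on every render, as long as their dependencies stay stable
useMemo for styles: an inline style object is a new object on every render, and useMemo keeps its identity stable. This matters mainly when the object is passed to a memoized child component; for plain DOM elements the gain is small
React.memo for the child component: ResponseDisplay re-renders only when response or isLoading changes
AbortController for cancellation: generation can be cancelled midway, which makes for a better experience
Auto-scroll to the bottom: the view follows the output when the content gets long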
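
One possible refinement of the auto-scroll behavior, as a sketch: the `useEffect` in `ResponseDisplay` always jumps to the bottom, which can fight a reader who has scrolled up to reread something. A common alternative is to follow new output only while the view is already near the bottom. The helper name `isNearBottom` and the 40-pixel threshold are assumptions made for this example.

```typescript
// Values read from a scrollable element (same property names as on HTMLElement).
interface ScrollMetrics {
  scrollTop: number
  clientHeight: number
  scrollHeight: number
}

// Hypothetical helper: true when the viewport is within `threshold` pixels of the bottom.
function isNearBottom(el: ScrollMetrics, threshold = 40): boolean {
  return el.scrollHeight - el.scrollTop - el.clientHeight <= threshold
}

// Intended use: record the position in an onScroll handler (before new text
// grows the content), then follow the output only if the reader was near the bottom:
//   onScroll: stick.current = isNearBottom(el)
//   effect:   if (stick.current) el.scrollTop = el.scrollHeight
console.log(isNearBottom({ scrollTop: 580, clientHeight: 400, scrollHeight: 1000 })) // true
console.log(isNearBottom({ scrollTop: 100, clientHeight: 400, scrollHeight: 1000 })) // false
```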

VII. Run the project

Start the Next.js development server:

```bash
npm run dev
```

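Then open http://localhost:3000/instamind and try asking a question! (This path assumes page.tsx lives at app/instamind/page.tsx; if you put it at app/page.tsx, open http://localhost:3000 instead.)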

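VIII. FAQ

Q: What if the connection to Ollama fails? A: Make sure the Ollama service is running, and check that http://localhost:11434 is reachable.

Q: Responses are slow? A: Try a smaller model such as qwen3:1.7b, or upgrade your hardware.

Q: Can I switch to another model? A: Of course! Just change the model parameter in the API route. Any model that Ollama supports will work, once you have pulled it with ollama pull.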
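
Because the route reads an optional `model` field from the request body (falling back to `qwen3:4b`), a model can also be chosen per request from the client, without editing the route. A minimal sketch follows; `buildChatRequest` is a hypothetical helper, and the model it names must already have been pulled.

```typescript
// Build the fetch options for POST /api/ollama, optionally overriding the model.
// When `model` is omitted, the route falls back to its own default.
function buildChatRequest(
  prompt: string,
  model?: string,
): { method: string; headers: Record<string, string>; body: string } {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(model ? { prompt, model } : { prompt }),
  }
}

// Example: ask a smaller model for a faster answer.
// fetch('/api/ollama', buildChatRequest('Hello!', 'qwen3:1.7b'))
```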

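Summary

Today we built:

✅ A local Ollama service
✅ Qwen3 model download and local inference
✅ Stream forwarding through a Next.js API route
✅ Streaming response display on the frontend
✅ Extras such as request cancellation and auto-scroll

This whole setup is completely free and all data stays on your machine, so privacy is excellent. Go give it a try!

If you found this article helpful, don't forget to like, bookmark, and follow! Questions are welcome in the comments.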

📦 Full code

The code for this post has been published as v0.0.1:

👉 GitHub Release - v0.0.1
