Building a site search for my blog with Gemini's Google Search grounding

2025-03-02 | 8 min read | engineering gatsbyjs cloudflare typescript gemini google-ai-studio

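I wanted to build a site search that accepts natural language.

There seem to be several ways to do this, but for now I built one with Google's Gen AI SDK, grounded with Google Search.

I put a natural-language search button on the top page. The project is on the free tier, so no matter how many searches are run there should be no charge.

The screen looks like the following: a modal search UI of the kind that is common these days. Incidentally, the accuracy (the probability that it behaves the way I intended) is very low. This is probably not a good way to apply the feature.

Update 2025/06/18

A policy had since been introduced that requires the UI (HTML) returned with the search results to be displayed, so I removed the search feature. In this respect it seems to be the same as Google Custom Search.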

When you use grounding with Google Search, and you receive Search suggestions in your response, you must display the Search suggestions in production and in your applications. For more information on grounding with Google Search, see Grounding with Google Search documentation for Google AI Studio or Vertex AI. The UI code (HTML) is returned in the Gemini response as renderedContent, and you will need to show the HTML in your app, in accordance with the policy.

https://google.github.io/adk-docs/tools/built-in-tools/#google-search
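For reference, here is a minimal sketch of pulling that HTML out of a response. It assumes the grounding metadata exposes it at groundingMetadata.searchEntryPoint.renderedContent; the local types and the name getSearchSuggestionsHtml are made up for this example and only cover the fields used.

```typescript
// Minimal local types covering only the fields used here (assumed shape).
interface SearchEntryPointLike {
  renderedContent?: string
}
interface CandidateLike {
  groundingMetadata?: { searchEntryPoint?: SearchEntryPointLike }
}
interface ResponseLike {
  candidates?: CandidateLike[]
}

// Hypothetical helper: returns the Search suggestions HTML of the first
// candidate, or undefined when the response carries no such HTML.
function getSearchSuggestionsHtml(response: ResponseLike): string | undefined {
  return response.candidates?.[0]?.groundingMetadata?.searchEntryPoint
    ?.renderedContent
}
```

The returned string would then have to be rendered as-is somewhere in the search modal, which is the part I did not want to build.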

Environment

typescript: ^5.0.0
@google/genai: 0.6.1

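Implementation

If the request to the Gemini API is made from the client, the API key is exposed, so I used the Functions feature of Cloudflare Pages.

Request to the Gemini API

This generates a summary and the sources for articles within the specified site. The data is retrieved via Google Search.

Results whose source is not targetSite are filtered out before returning (see the part that checks web.title).

As you would expect, the return value carries a variety of formats and numbers that make it easy to use.

Reference for the return value: https://ai.google.dev/gemini-api/docs/grounding?hl=ja&lang=javascript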

// functions/api/search.ts
import {
  GoogleGenAI,
  DynamicRetrievalConfigMode,
  GenerateContentConfig,
} from "@google/genai"
 
interface ContactBody {
  query: string
}
 
export interface SearchResult {
  url?: string | undefined
  summary?: string | undefined
}
 
// The API key is passed in by the Pages Function handler (context.env.API_KEY),
// so it is not read at module level.
const targetSite = "blog.uni-3.app"

const generationConfig: GenerateContentConfig = {
  temperature: 0.5,
  topP: 0.0,
  topK: 40,
  maxOutputTokens: 8192,
  responseModalities: [],
  responseMimeType: "text/plain",
  // Prompt (Japanese): "Fetch articles related to the input query from the
  // ${targetSite} site and answer with a summary of them as related articles"
  systemInstruction: `入力クエリに関する記事を${targetSite}のサイトから取得し、関連記事として要約して回答して`,
  tools: [
    {
      googleSearch: {
        dynamicRetrievalConfig: {
          dynamicThreshold: 0.0, // always ground
          mode: DynamicRetrievalConfigMode.MODE_DYNAMIC,
        },
      },
    },
  ],
}

async function performSearchAndSummarize(
  apiKey: string,
  searchQuery: string,
): Promise<SearchResult[]> {
  try {
    const ai = new GoogleGenAI({ apiKey })

    const response = await ai.models.generateContent({
      model: "gemini-2.0-flash",
      config: { ...generationConfig },
      contents: [
        // Prompt (Japanese): "Query: <searchQuery> Target site: <targetSite>"
        `クエリ: ${searchQuery} 対象サイト: ${targetSite}`,
      ],
    })

    // Parse the grounding metadata of the first candidate
    const metaData = response.candidates?.[0]?.groundingMetadata
    const supports = metaData?.groundingSupports
    const chunks = metaData?.groundingChunks
    if (!supports || !chunks) {
      // no valid answer
      return []
    }

    const results: SearchResult[] = []
    for (const support of supports) {
      // Each support points at the chunk(s) backing it; use the first one
      const groundingIndex = support.groundingChunkIndices?.[0]
      if (groundingIndex === undefined) continue
      const web = chunks[groundingIndex]?.web
      // Keep only sources whose title matches the target site
      if (!web?.title || !targetSite.includes(web.title)) continue
      results.push({ url: web.uri, summary: support.segment?.text })
    }
    return results
  } catch (error: any) {
    console.error("An error occurred during search or summarization:", error)
    return []
  }
}
 
export const onRequest = async (context: any) => {
  const url = new URL(context.request.url)
  const searchQuery = url.searchParams.get("query")
  const API_KEY = context.env.API_KEY
 
  if (!searchQuery) {
    return new Response(JSON.stringify({ error: "Search query is required" }), {
      status: 400,
      headers: { "Content-Type": "application/json" },
    })
  }
  try {
    const result = await performSearchAndSummarize(API_KEY, searchQuery)
    return new Response(JSON.stringify(result), {
      headers: { "Content-Type": "application/json" },
    })
  } catch (error: any) {
    console.error("Error in generate content:", error)
    return new Response(
      JSON.stringify({ error: error.message || "Internal Server Error" }),
      {
        status: 500,
        headers: { "Content-Type": "application/json" },
      },
    )
  }
}
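The filter step above compares the chunk's web.title against targetSite with a substring check. A slightly stricter variant is sketched below. It assumes, as the filter above does, that web.title holds the domain of the source page, which is not guaranteed. isFromTargetSite is a hypothetical helper and not part of the SDK.

```typescript
// Hypothetical helper: true when `title` names the target site itself or
// one of its parent domains (e.g. "uni-3.app" for "blog.uni-3.app").
function isFromTargetSite(
  title: string | undefined,
  targetSite: string,
): boolean {
  if (!title) return false // missing or empty titles never match
  const t = title.trim().toLowerCase()
  const site = targetSite.toLowerCase()
  // Require something that looks like a domain, so a bare "app" is rejected
  if (!t.includes(".")) return false
  // Exact match, or a parent domain that ends on a label boundary
  return t === site || site.endsWith(`.${t}`)
}
```

Compared with a plain includes, this rejects empty titles and fragments such as "app", at the cost of a few more lines.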

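Client-side processing

Only the request part is shown. The response can be received as the SearchResult type.

// encodeURIComponent keeps spaces and non-ASCII characters in the query intact
const response = await fetch(`/api/search?query=${encodeURIComponent(searchQuery)}`, {
  method: "GET",
  headers: {
    "Content-Type": "application/json",
  },
})
const data: SearchResult[] = await response.json()
setSearchResults(data)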
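A slightly more defensive version of this request might look like the following. It is only a sketch: buildSearchUrl, toSearchResults and fetchSearchResults are names invented here, and it relies on the API above returning an array on success and an { error } object on failure.

```typescript
interface SearchResult {
  url?: string
  summary?: string
}

// Build the request URL; encodeURIComponent keeps spaces and non-ASCII
// characters (e.g. Japanese queries) intact in the query string.
function buildSearchUrl(query: string): string {
  return `/api/search?query=${encodeURIComponent(query)}`
}

// Anything that is not an array (such as an { error } object) is treated
// as "no results" so the UI always receives a list.
function toSearchResults(data: unknown): SearchResult[] {
  return Array.isArray(data) ? (data as SearchResult[]) : []
}

// Hypothetical wrapper around the fetch call.
async function fetchSearchResults(query: string): Promise<SearchResult[]> {
  const response = await fetch(buildSearchUrl(query))
  if (!response.ok) return [] // 400 / 500 responses carry no results
  return toSearchResults(await response.json())
}
```

In the component this would be used as setSearchResults(await fetchSearchResults(searchQuery)).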

Configuration

Google AI Studio

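API keys are created at https://aistudio.google.com/apikey. A GCP project is required at this point. Whether the free tier is used is determined by whether billing is set up for that project.

The model, parameters, and system prompt can also be tried out and tweaked from the Google AI Studio screen. Grounding with Google Search can be enabled near the bottom of the right sidebar.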

ai studio setting

Clicking Get code also gives you code with the current settings already applied, which is handy.

ai studio code

cloudflare functions

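This is not the main topic, but I used it, so here is a note.

I was already hosting gatsbyjs on Cloudflare Pages, and it looked like an API could be added easily, so I used it. If you write functions in the prescribed form under the functions directory at the root of the blog repository, they are created automatically at deploy time. The API path mirrors the directory structure under functions (file-based routing).

I set the API_KEY environment variable in the Pages settings, and after that it became usable just by pushing the code to the repository.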
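As a minimal sketch of the shape described above: a file at functions/api/hello.ts (a made-up path for this example) would be served at /api/hello, and the variable set in the Pages settings arrives on context.env. The Context type is hand-written with only the fields used here; the real context object has more.

```typescript
// functions/api/hello.ts (hypothetical file, served at /api/hello)

// Only the parts of the Pages Functions context used in this sketch.
interface Context {
  request: Request
  env: { API_KEY?: string }
}

export const onRequest = (context: Context): Response => {
  // Environment variables from the Pages settings are read from context.env
  if (!context.env.API_KEY) {
    return new Response(JSON.stringify({ error: "API_KEY is not set" }), {
      status: 500,
      headers: { "Content-Type": "application/json" },
    })
  }
  // Echo the path to show how the file location maps to the route
  const { pathname } = new URL(context.request.url)
  return new Response(JSON.stringify({ ok: true, path: pathname }), {
    headers: { "Content-Type": "application/json" },
  })
}
```

Because the key is only read inside the function, it never reaches the browser.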

Reference

https://developers.cloudflare.com/pages/functions/get-started/

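Thoughts

Grounding with Google Search is supported by the SDK, so the implementation itself was easy. For site search, however, the range of sources was too broad. After a few searches it starts answering from sites other than the target one, which makes it hard to use.

Restricting the output is difficult, so when you want answers based on a specific range of information, narrowing down the source data seems to be the best approach.

Vertex AI apparently also has a feature for storing website content and using it for grounding, but I was scared of the charges and decided against it.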

https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/overview?hl=ja#ground-private

I took a quick look at the pricing table, but there are so many billable items that it makes my head spin.

https://cloud.google.com/generative-ai-app-builder/pricing?hl=ja

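This kind of search feature seems to be generally called AI search. I suppose the name applies when the AI-generated answer is the main output. I could not trace a strict definition, so I do not know which name actually fits.

Other possible approaches include vector search using embeddings, or fetching the sources via MCP and returning related articles. Compared with building RAG and searching against it, there is no storage to manage, so it is structurally simpler. Once MCP becomes more widespread, separating the search out as a tool and having the output produced step by step might make things easier to follow.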
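To make the embedding option a little more concrete, here is a rough sketch of the ranking step only. It assumes each article already has an embedding vector computed by some embedding model (not shown) and stored next to its URL. The Article type, rankArticles, and the idea of passing the query as a precomputed vector are all invented for illustration.

```typescript
interface Article {
  url: string
  embedding: number[]
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0 // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Return the topK articles most similar to the query vector.
function rankArticles(
  queryEmbedding: number[],
  articles: Article[],
  topK = 3,
): Article[] {
  return [...articles] // copy so the caller's array is not reordered
    .sort(
      (x, y) =>
        cosineSimilarity(queryEmbedding, y.embedding) -
        cosineSimilarity(queryEmbedding, x.embedding),
    )
    .slice(0, topK)
}
```

Unlike the grounding approach, the candidates here can only come from the stored articles, so results from other sites cannot appear.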