
# Full-Text Search (FTS) Index

> Create and tune BM25-based full-text search indexes in LanceDB.

export const FtsIndexWait = "table_name = \"fts-index-wait\"\n\ntable = db.open_table(table_name)\ntable.create_fts_index(\"text\")\n\nindex_name = \"text_idx\"\ntable.wait_for_index([index_name])\n";

export const FtsIndexNested = "from lancedb.query import MatchQuery, PhraseQuery\n\ntable = db.open_table(\"fts-index-nested\")\n\n# Index a text leaf inside a struct column using a dotted path.\ntable.create_fts_index(\"payload.text\", with_position=True)\n\n# The same dotted path works in MatchQuery and PhraseQuery.\nmatches = (\n    table.search(MatchQuery(\"puppy\", \"payload.text\")).limit(5).to_list()\n)\nphrases = (\n    table.search(PhraseQuery(\"puppy runs\", \"payload.text\"))\n    .limit(5)\n    .to_list()\n)\n";

export const FtsIndexCreate = "table_name = \"fts-index-create\"\ntable = db.open_table(table_name)\ntable.create_fts_index(\"text\")\n";

export const FtsIndexAsync = "import asyncio\n\nimport lancedb\nimport polars as pl\nfrom lancedb.index import FTS\n\ndata = pl.DataFrame(\n    {\n        \"id\": [1, 2],\n        \"text\": [\n            \"His first language is spanish\",\n            \"Her first language is english\",\n        ],\n    }\n)\n\nasync def main(data: pl.DataFrame):\n    uri = \"ex_lancedb\"\n    db = await lancedb.connect_async(uri)\n    tbl = await db.create_table(\"my_text\", data=data, mode=\"overwrite\")\n\n    await tbl.create_index(\"text\", config=FTS(language=\"English\"))\n\n    response = await tbl.search(\"spanish\", query_type=\"fts\")\n    result = await response.limit(1).to_polars()\n    print(result)\n    return result\n\nif __name__ == \"__main__\":\n    asyncio.run(main(data))\n";

LanceDB provides performant full-text search based on BM25, so you can add keyword-based search to your retrieval pipelines. This page shows how to create and configure FTS indexes in LanceDB OSS and Enterprise with both the synchronous and asynchronous APIs.
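
For intuition about how BM25 ranks documents, here is a minimal sketch of classic Okapi BM25 in plain Python. It is not LanceDB's implementation: the `bm25_scores` helper, the lowercase whitespace tokenizer, and the `k1`/`b` values are illustrative assumptions. The formula rewards rare terms (high IDF) over common ones and lets repeated terms saturate instead of growing linearly.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with classic Okapi BM25.

    Tokenization is a naive lowercase whitespace split (an assumption);
    LanceDB's tokenizer pipeline and parameters may differ.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "His first language is spanish",
    "Her first language is english",
]
print(bm25_scores("spanish", docs))
```

Here `spanish` appears in only one document, so it earns a higher IDF than `first`, which appears in both.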

<Note>
  In LanceDB Enterprise, the `create_fts_index` API returns immediately, but the index is built asynchronously in the background.
</Note>

## Creating FTS Indexes

### Synchronous API

Use `create_fts_index` with synchronous LanceDB connections:

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {FtsIndexCreate}
  </CodeBlock>
</CodeGroup>

To block until the index is ready, pass its name to `wait_for_index`. An FTS index on the `text` column is named `text_idx` by default:

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {FtsIndexWait}
  </CodeBlock>
</CodeGroup>

`wait_for_index(...)` waits until the named FTS index exists and `index_stats(...)` reports `num_unindexed_rows == 0`. It can time out if writes keep adding rows faster than the index catches up. If a table has multiple FTS indexes, specify the target text column when querying instead of relying on implicit selection.
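
Conceptually, waiting on an index is a poll-with-deadline loop. The sketch below models only the `num_unindexed_rows == 0` condition. `wait_until_indexed` and the `get_unindexed_rows` callable are hypothetical stand-ins, not LanceDB APIs, and the real `wait_for_index` also checks that the index exists. The simulated writer shows why a timeout is possible: if writes keep the count above zero, the loop gives up at the deadline.

```python
import time


def wait_until_indexed(get_unindexed_rows, timeout_s=300.0, poll_s=1.0):
    """Poll a hypothetical row-count source until it reports zero.

    `get_unindexed_rows` stands in for reading `num_unindexed_rows`
    from index statistics. Raises TimeoutError at the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = get_unindexed_rows()
        if remaining == 0:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{remaining} rows still unindexed after {timeout_s}s")
        time.sleep(poll_s)


# Simulated index that catches up on the third poll.
backlog = iter([250, 100, 0])
wait_until_indexed(lambda: next(backlog), timeout_s=5, poll_s=0.01)

# Simulated writer that keeps the backlog from ever draining.
try:
    wait_until_indexed(lambda: 10, timeout_s=0.05, poll_s=0.01)
except TimeoutError as exc:
    print(exc)  # prints: 10 rows still unindexed after 0.05s
```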

### Asynchronous API

When using async connections (`connect_async`), use `create_index` with the `FTS` configuration:

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {FtsIndexAsync}
  </CodeBlock>
</CodeGroup>

<Note>
  The `create_fts_index` method is not available on `AsyncTable`. Use `create_index` with an `FTS` config instead.
</Note>

## Nested field paths

FTS indexes can target text leaves inside struct columns by passing a dotted path (for example, `payload.text`). The same path works for [`MatchQuery`](/search/full-text-search) and [`PhraseQuery`](/search/full-text-search), and for the `columns` argument on async `nearest_to_text` queries.

You can point an index at any string leaf nested in a struct, regardless of depth. The struct container itself isn't indexable: you have to name a specific text field.

<CodeGroup>
  <CodeBlock filename="Python" language="Python" icon="python">
    {FtsIndexNested}
  </CodeBlock>
</CodeGroup>

LanceDB rejects paths that don't resolve to a text leaf:

* A struct container (for example, `payload`): raises `ValueError: FTS index cannot be created ...`.
* A non-text leaf such as an integer or float (for example, `payload.count`): raises the same error.
* A path that doesn't exist in the schema (for example, `payload.missing`): raises `ValueError: Field path ... not found`.
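
The three rejection rules can be mirrored by walking a dotted path through a nested schema. This is a hypothetical model, not LanceDB code: the schema is plain nested dicts, `resolve_fts_path` is an illustrative helper, and treating `string` and `large_string` as the accepted text types is an assumption. The error messages only imitate the prefixes quoted above.

```python
# Hypothetical schema model: a dict is a struct, a string names a leaf type.
SCHEMA = {
    "id": "int64",
    "payload": {
        "text": "string",
        "count": "int64",
        "meta": {"title": "large_string"},
    },
}

# Assumed set of leaf types that count as text.
TEXT_TYPES = {"string", "large_string"}


def resolve_fts_path(schema, path):
    """Walk a dotted path and return the leaf type if it is indexable text."""
    node = schema
    for part in path.split("."):
        # Stepping into a leaf, or naming a missing field, means no such path.
        if not isinstance(node, dict) or part not in node:
            raise ValueError(f"Field path {path!r} not found")
        node = node[part]
    # Struct containers and non-text leaves are both rejected.
    if isinstance(node, dict) or node not in TEXT_TYPES:
        raise ValueError(f"FTS index cannot be created on {path!r}")
    return node


print(resolve_fts_path(SCHEMA, "payload.meta.title"))  # large_string
```

Because the walk handles any depth, `payload.meta.title` resolves just like `payload.text`, which matches the rule that any string leaf can be indexed regardless of nesting.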

The async API accepts the same dotted paths through `create_index`:

```python Python icon="python" theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
from lancedb.index import FTS

# Run inside an async function; `async_table` is an AsyncTable opened via connect_async.
await async_table.create_index("payload.text", config=FTS(with_position=True))
```

## Configuration Options

### FTS Parameters

| Parameter           | Type       | Default     | Description                                                                                       |
| :------------------ | :--------- | :---------- | :------------------------------------------------------------------------------------------------ |
| `with_position`     | bool       | `False`     | Store token positions (required for phrase queries)                                               |
| `base_tokenizer`    | str        | `"simple"`  | Text splitting method (`simple`, `whitespace`, `raw`, `ngram`, `jieba/*`, or `lindera/*`)         |
| `language`          | str        | `"English"` | Language for stemming/stop words                                                                  |
| `max_token_length`  | int        | `40`        | Maximum token size; longer tokens are omitted                                                     |
| `lower_case`        | bool       | `True`      | Lowercase tokens                                                                                  |
| `stem`              | bool       | `True`      | Apply stemming (`running` → `run`)                                                                |
| `remove_stop_words` | bool       | `True`      | Drop common stop words                                                                            |
| `ascii_folding`     | bool       | `True`      | Normalize accented characters                                                                     |
| `custom_stop_words` | list\[str] | `None`      | Extra stop words to drop in addition to the language defaults. Requires `remove_stop_words=True`. |
| `ngram_min_length`  | int        | `3`         | Minimum n-gram length. Applies only when `base_tokenizer="ngram"`.                                |
| `ngram_max_length`  | int        | `3`         | Maximum n-gram length. Applies only when `base_tokenizer="ngram"`.                                |
| `prefix_only`       | bool       | `False`     | Index only prefix n-grams rather than all substrings. Applies only when `base_tokenizer="ngram"`. |
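
To see how `ngram_min_length`, `ngram_max_length`, and `prefix_only` interact, here is a plain-Python sketch of n-gram generation for a single token. It follows the parameter descriptions in the table, not Lance's tokenizer source. Details such as output order and the handling of tokens shorter than `ngram_min_length` are assumptions and may differ in practice.

```python
def ngrams(token, min_len=3, max_len=3, prefix_only=False):
    """Return n-grams of `token` with lengths from min_len to max_len.

    With prefix_only=True, only grams starting at position 0 are kept.
    """
    grams = []
    for n in range(min_len, max_len + 1):
        # All start offsets, or just the start of the token for prefixes.
        starts = [0] if prefix_only else range(len(token) - n + 1)
        for i in starts:
            if i + n <= len(token):  # skip grams longer than the token
                grams.append(token[i : i + n])
    return grams


print(ngrams("hello"))                          # ['hel', 'ell', 'llo']
print(ngrams("hello", 2, 4, prefix_only=True))  # ['he', 'hel', 'hell']
```

In this model, prefix-only grams produce far fewer entries per token, which suits prefix (search-as-you-type) matching but cannot match a substring from the middle of a word.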

<Note title="Key parameters">
  * `max_token_length` can filter out base64 blobs or long URLs.
  * Leaving `with_position` off (the default) keeps the index smaller, but phrase queries are then unavailable.
  * `ascii_folding` helps with international text (e.g., “café” → “cafe”).
</Note>
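
The filters in the table run as a chain over the base tokenizer's output. The sketch below approximates such a chain in plain Python to show how `max_token_length`, `lower_case`, stop-word removal (including `custom_stop_words`), and `ascii_folding` change the emitted tokens. It is an illustration under assumptions, not Lance's tokenizer: the regex split, the tiny stop-word list, and the filter order are placeholders, and stemming is omitted.

```python
import re
import unicodedata

# Tiny illustrative stop-word list; real language defaults are much larger.
STOP_WORDS = {"a", "an", "and", "is", "of", "the"}


def normalize(text, max_token_length=40, lower_case=True,
              remove_stop_words=True, custom_stop_words=None,
              ascii_folding=True):
    """Approximate a token-filter chain; stemming is deliberately omitted."""
    # Assumed "simple"-style split: runs of Unicode word characters.
    tokens = re.findall(r"\w+", text)
    stop = STOP_WORDS | set(custom_stop_words or [])
    out = []
    for tok in tokens:
        if len(tok) > max_token_length:  # length in characters here
            continue  # drop overlong tokens such as base64 blobs
        if lower_case:
            tok = tok.lower()
        if remove_stop_words and tok in stop:
            continue
        if ascii_folding:
            # Decompose accented letters, then drop the combining marks.
            decomposed = unicodedata.normalize("NFKD", tok)
            tok = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(tok)
    return out


print(normalize("The Caf\u00e9 is open"))  # ['cafe', 'open']
```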

Model-backed tokenizers such as `jieba/default` and `lindera/ipadic` require tokenizer model files in Lance's language model home. Lance looks under the default platform data directory for `lance/language_models`, or you can set `LANCE_LANGUAGE_MODEL_HOME` to point to another model root. For example, `jieba/default` is resolved under `<model-home>/jieba/default/...`.
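
The lookup rule above amounts to: use `LANCE_LANGUAGE_MODEL_HOME` if it is set, otherwise the platform default, then append the tokenizer name. The hypothetical `model_dir` helper below spells that out so you can check where model files should go. It is not a Lance API, and because the platform data directory varies by OS, the default root is passed in rather than computed. Exporting the variable in your shell before starting the process is the safest way to set it.

```python
import os
from pathlib import Path


def model_dir(tokenizer, default_home):
    """Hypothetical helper: where a model-backed tokenizer's files would live.

    `default_home` stands in for the platform data directory joined with
    "lance/language_models"; LANCE_LANGUAGE_MODEL_HOME overrides it.
    """
    override = os.environ.get("LANCE_LANGUAGE_MODEL_HOME")
    root = Path(override) if override else Path(default_home)
    # "jieba/default" resolves to <root>/jieba/default.
    return root / tokenizer


os.environ["LANCE_LANGUAGE_MODEL_HOME"] = "/opt/lance-models"
print(model_dir("jieba/default", "/unused").as_posix())  # /opt/lance-models/jieba/default
```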

### Phrase Query Configuration

Enable phrase queries by setting:

| Parameter           | Required Value | Purpose                                       |
| :------------------ | :------------- | :-------------------------------------------- |
| `with_position`     | `True`         | Track token positions for phrase matching     |
| `remove_stop_words` | `False`        | Preserve stop words for exact phrase matching |
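
A phrase match needs each term's positions (hence `with_position`), and the terms must occur at consecutive positions. The sketch below builds a tiny positional index with and without stop-word removal: once stop words are dropped, the index has no entry for `of` or `the`, so an exact phrase containing them cannot be verified. This is a conceptual model with hypothetical helper names; Lance's phrase matching, including how it treats stop words in the query, may differ.

```python
# Illustrative stop words only.
STOP_WORDS = {"a", "of", "the"}


def build_positional_index(text, remove_stop_words):
    """Map each indexed token to the positions where it occurs."""
    index = {}
    for pos, tok in enumerate(text.lower().split()):
        if remove_stop_words and tok in STOP_WORDS:
            continue  # the token is never recorded, so its position stays empty
        index.setdefault(tok, []).append(pos)
    return index


def phrase_match(index, phrase):
    """True if every phrase term occurs at consecutive positions."""
    terms = phrase.lower().split()
    if any(t not in index for t in terms):
        return False
    return any(
        all(start + i in index[t] for i, t in enumerate(terms))
        for start in index[terms[0]]
    )


text = "the lord of the rings"
kept = build_positional_index(text, remove_stop_words=False)
dropped = build_positional_index(text, remove_stop_words=True)
print(phrase_match(kept, "lord of the rings"))     # True
print(phrase_match(dropped, "lord of the rings"))  # False: "of" and "the" not indexed
```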

## Indexing nested string fields

You can build an FTS index on a string field inside a struct by passing its full dotted path, like `nested.text`. The same path is used when you query the index through `fts_columns`, and the indexed column is reported back as the full path from `list_indices()`.

```python theme={"theme":{"light":"vitesse-light","dark":"catppuccin-mocha"}}
# Schema: pa.struct([pa.field("text", pa.string())]) stored under the `nested` column.
table.create_fts_index("nested.text")

results = (
    table.search("puppy", query_type="fts", fts_columns="nested.text")
    .limit(5)
    .to_list()
)
```

<Note>
  Use the canonical Lance path: dot-separate each struct field from root to leaf (for example, `metadata.author.name`). The same convention applies to scalar and vector indexes.
</Note>
