@robbrad robbrad commented Jul 22, 2025

Pull Request: Add Strands Agents Documentation Sync and Search Functionality

Overview

This PR adds comprehensive documentation sync and search capabilities to the Strands Agents MCP Server. It introduces scripts to automatically sync documentation from the Strands Agents docs repository and builds an intelligent search index to make documentation easily accessible through the MCP protocol.

Key Changes

Documentation Management

• Added sync_docs.py script to synchronize documentation from the Strands Agents docs repository
• Implemented indexer.py to build relationships and cross-references between markdown files
• Created a comprehensive document index (document_index.json) for efficient searching

MCP Server Enhancements

• Enhanced server implementation with new search and documentation retrieval capabilities
• Added fuzzy search, smart search, and concept exploration functionality
• Implemented learning path generation for guided documentation exploration

Testing and CI/CD

• Added extensive test suite for documentation indexing and search functionality
• Implemented GitHub Actions workflow for automatic documentation synchronization
• Added comprehensive testing guides and workflows

Documentation

• Added detailed documentation on running a local MCP server
• Created guides for the documentation sync process
• Added testing guides and workflows
• Updated README with improved installation and usage instructions

Documentation Structure

The PR adds a well-organized documentation structure covering:
• API Reference
• Examples (Python, CDK, deployment options)
• User Guide (concepts, deployment, observability, security)

Testing

All new functionality is thoroughly tested with unit and integration tests:
• Document indexing and search capabilities
• MCP tool registration and execution
• GitHub Action workflows
• Complete end-to-end workflows

Impact

This PR significantly enhances the Strands Agents MCP Server by providing:

  1. Comprehensive documentation access through the MCP protocol
  2. Intelligent search capabilities for finding relevant documentation
  3. Concept exploration and learning path generation
  4. Automatic documentation synchronization from the main docs repository

Related Issues

Addresses the need for improved documentation access and search capabilities in the Strands Agents MCP Server.

@robbrad robbrad requested a review from a team as a code owner July 22, 2025 21:07
@robbrad robbrad changed the title from "feat: Adding scripts to sync and index strands-agents docs and toolin…" to "feat: Add Strands Agents Documentation Sync and Search Functionality" Jul 22, 2025
conftest.py Outdated

Might want to remove this if it's empty?

Author

Needed for pytest.

Pytest uses this file to set the Python path at the repository root.
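
A minimal sketch of the mechanism being described, assuming pytest's default `prepend` import mode: for a `conftest.py`, pytest walks upward to the first directory that has no `__init__.py` and puts that directory on `sys.path`. The helper name `prepend_basedir` and the demo layout are hypothetical and only imitate that lookup; they are not pytest's own code.

```python
import tempfile
from pathlib import Path


def prepend_basedir(conftest_path: Path) -> Path:
    """Return the directory the 'prepend' import mode would add to sys.path.

    Walk upward from the file's directory until reaching a directory that
    has no __init__.py; that directory is the import base.
    """
    directory = conftest_path.resolve().parent
    while (directory / "__init__.py").exists():
        directory = directory.parent
    return directory


# Hypothetical layout: <root>/conftest.py next to <root>/src_pkg/__init__.py
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "conftest.py").touch()
    (root / "src_pkg").mkdir()
    (root / "src_pkg" / "__init__.py").touch()
    demo_basedir = prepend_basedir(root / "conftest.py")
    demo_is_root = demo_basedir == root
```

With an empty `conftest.py` at the repository root, the root itself becomes importable, which is why the file can matter even with no fixtures in it.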

@robbrad
Author

robbrad commented Jul 22, 2025

supersedes #16

@yonib05
Member

yonib05 commented Jul 22, 2025

This is super cool. What are your thoughts about having the documentation be pulled down from the public repo dynamically? Either cloning the repo or downloading the zip from GitHub. This way it's always up to date.

Another option would be to use some sort of web crawler implementation to download the Strands site (probably less practical than downloading the repo).
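
A rough sketch of the "download the zip" option, assuming the docs live as markdown files under a `docs/` folder of the archive. The function name `extract_markdown`, the `docs/` prefix, and the demo file names are hypothetical. The demo builds the archive in memory so it runs without network access; in practice the bytes would come from an HTTP download of the repository archive.

```python
import io
import zipfile
from pathlib import PurePosixPath


def extract_markdown(archive_bytes: bytes, docs_prefix: str = "docs/") -> dict[str, str]:
    """Return {relative_path: text} for every .md file under docs_prefix.

    GitHub archives wrap everything in one top-level folder
    (for example "<repo>-<branch>/"), so that first path component is dropped.
    """
    documents: dict[str, str] = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            parts = PurePosixPath(info.filename).parts
            relative = "/".join(parts[1:])  # strip the archive's top-level folder
            if relative.startswith(docs_prefix) and relative.endswith(".md"):
                documents[relative] = archive.read(info).decode("utf-8")
    return documents


def build_demo_archive() -> bytes:
    """Build a tiny in-memory archive so the sketch runs offline."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docs-main/docs/index.md", "# Home\n")
        archive.writestr("docs-main/docs/user-guide/quickstart.md", "# Quickstart\n")
        archive.writestr("docs-main/mkdocs.yml", "site_name: demo\n")
    return buffer.getvalue()


demo_docs = extract_markdown(build_demo_archive())
```

Compared with a checked-in copy of the docs, this keeps the repository small, at the cost of needing network access when the index is built.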

…g to the MCP to search

fix: correct uvx command
@robbrad robbrad force-pushed the feature-docs-sync-scripts branch from 0208d2e to dacc1e6 Compare July 23, 2025 05:49
Member

@dbschmigelski dbschmigelski left a comment

Thanks again for raising this. Left some cleanliness comments but will likely come back after giving the indexer a thorough review.

I think the main question we have to answer is how the indexing should be done.

The approach now scans all markdown files. This seems to be useful especially for the related documents tool you added.

The alternative is to leverage https://strandsagents.com/latest/llms.txt.

https://llmstxt.org/ states the following, but it is not exactly applicable given we know the content is markdown. So I am curious what the quality differences are between the crawling approach and this predefined set.

Large language models increasingly rely on website information, but face a critical limitation: context windows are too small to handle most websites in their entirety. Converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.

While websites serve both human readers and LLMs, the latter benefit from more concise, expert-level information gathered in a single, accessible location. This is particularly important for use cases like development environments, where LLMs need quick access to programming documentation and APIs.
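
To make the "predefined set" option concrete, here is a small sketch that reads the link list out of an llms.txt file. It assumes the layout described at llmstxt.org (H2 sections containing `- [title](url): notes` entries). The `SAMPLE` text is invented for illustration and is not the content of the Strands file, and `parse_llms_txt` is a hypothetical helper.

```python
import re

# One list entry in the llms.txt format: "- [title](url): optional notes"
LINK_PATTERN = re.compile(
    r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$"
)


def parse_llms_txt(text: str) -> list[dict[str, str]]:
    """Collect the linked documents from an llms.txt file, tagged by H2 section."""
    entries = []
    section = ""
    for line in text.splitlines():
        if line.startswith("## "):
            section = line[3:].strip()
            continue
        match = LINK_PATTERN.match(line)
        if match:
            entries.append({
                "section": section,
                "title": match.group("title"),
                "url": match.group("url"),
                "notes": (match.group("notes") or "").strip(),
            })
    return entries


SAMPLE = """# Example Project

> Short summary of the project.

## Docs

- [Quickstart](https://example.com/quickstart.md): Getting started
- [API](https://example.com/api.md)

## Optional

- [Changelog](https://example.com/changelog.md)
"""

demo_entries = parse_llms_txt(SAMPLE)
```

The resulting list could seed the index directly, whereas scanning every markdown file also picks up pages the curated list leaves out.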

@@ -0,0 +1,67 @@
"""
Member

looks like this can go?

@@ -0,0 +1,89 @@
"""
Member

I'm not sure we need this file. Seems like this can either be deleted or added as a test if we need.

@@ -0,0 +1,232 @@
#!/usr/bin/env python3
Member

I think this can be deleted as well and just handled by the pyproject.toml

@@ -0,0 +1,7 @@
#!/usr/bin/env python
Member

needed?

@@ -0,0 +1,3 @@
[pytest]
markers =
    asyncio: mark a test as an asyncio coroutine
Member

should be able to be removed

@@ -0,0 +1,13 @@
"""
Member

delete

)
logger = logging.getLogger('docs-sync')

def compare_files(file1, file2):
Member

Checksum? Or do we even need this? What is the cost of just writing the file regardless of whether it's different, since we are already spending the time reading it (and it's < 100 files)?
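
The two options raised here can be sketched side by side. The body of `compare_files` is not shown in this conversation, so `file_digest` and `sync_file` below are hypothetical stand-ins: one path skips the write when SHA-256 digests match, the other writes unconditionally.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_file(source: Path, target: Path, skip_unchanged: bool = True) -> bool:
    """Copy source over target; return True if the target was written."""
    if skip_unchanged and target.exists() and file_digest(source) == file_digest(target):
        return False
    target.write_bytes(source.read_bytes())
    return True


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.md"
    dst = Path(tmp) / "target.md"
    src.write_text("# Title\n", encoding="utf-8")
    first_write = sync_file(src, dst)    # target missing, so it is written
    second_write = sync_file(src, dst)   # identical content, so it is skipped
    src.write_text("# Title v2\n", encoding="utf-8")
    third_write = sync_file(src, dst)    # content changed, so it is written
    always_write = sync_file(src, dst, skip_unchanged=False)  # unconditional write
```

For a set of fewer than 100 small files the unconditional write is likely cheap enough; the digest check mainly helps if untouched modification times matter downstream.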


pkg_resources = resources.files("strands_mcp_server")

mcp = FastMCP(
-    "strands-agents-mcp-server",
+    "strands-agents-mcp-server-fuzzy",
Member

nit: don't know if we need to change this

## Available Tools:

1. **get_document** - Retrieve a specific document by file path
2. **fuzzy_search_documents** - Fuzzy search documents with intelligent matching
Member

can we reduce the number of tools by just having smart_search? Were you seeing worse quality?
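
For context on what a fuzzy title match might look like, here is a minimal sketch using the standard library's `difflib.SequenceMatcher`. The internals of `fuzzy_search_documents` and `smart_search` are not shown in this conversation, so this is an assumption about the general technique, not the PR's implementation. `fuzzy_score`, `search_titles`, the threshold, and the titles are illustrative.

```python
from difflib import SequenceMatcher


def fuzzy_score(query: str, candidate: str) -> float:
    """Similarity in [0, 1] between a query and a candidate title, case-insensitive."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def search_titles(query: str, titles: list[str], threshold: float = 0.5) -> list[str]:
    """Return titles ranked by fuzzy similarity, dropping those below threshold."""
    scored = [(fuzzy_score(query, title), title) for title in titles]
    ranked = sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
    return [title for _, title in ranked]


TITLES = ["Quickstart", "Model Providers", "Observability", "Deployment"]
demo_hits = search_titles("quikstart", TITLES)  # misspelled query still ranks "Quickstart" first
```

If a single search tool applied a scoring step like this internally, a separate fuzzy tool might not be needed, which is the consolidation the question is pointing at.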

branches:
- main
# Run on schedule (daily at midnight UTC) - keeps documentation current even without code changes
schedule:
Member

I don't think we need the cron job; I would rather trigger this from https://github.com/strands-agents/strandsagents.com/blob/main/.github/workflows/build-deploy.yml
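
One way such a cross-repository trigger could work is GitHub's `repository_dispatch` event, where the docs build sends a POST to the target repository's `/dispatches` endpoint after deploying. The sketch below only builds the request and does not send it. The owner, repo, token, and `docs-updated` event type are placeholders, and nothing in this conversation confirms that this mechanism was the one intended.

```python
import json
import urllib.request


def build_dispatch_request(owner: str, repo: str, token: str, event_type: str) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch call to the GitHub REST API."""
    body = json.dumps({"event_type": event_type}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/dispatches",
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; a real call would pass the request to urllib.request.urlopen().
demo_request = build_dispatch_request("example-owner", "example-repo", "TOKEN", "docs-updated")
```

The receiving workflow would then listen for that event type instead of running on a schedule, so a sync happens only when the docs actually change.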

@dbschmigelski
Member

Closing as #21 resolved this

5 participants