feat: Add Strands Agents Documentation Sync and Search Functionality #19
Conversation
conftest.py
Might want to remove this if it's empty?
Needed for pytest. Pytest uses this file to anchor the Python path at the repository root.
supersedes #16
This is super cool. What are your thoughts about having the documentation be pulled down from the public repo dynamically? Either cloning the repo or downloading the zip from GitHub. This way it's always up to date. Another option would be to use some sort of web crawler implementation to download the Strands site (probably less practical than downloading the repo).
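The zip-download option suggested above could be sketched roughly as follows. The archive URL and destination directory are assumptions for illustration, not anything from this PR:

```python
import io
import urllib.request
import zipfile


def markdown_members(names):
    """Filter zip member names down to markdown files only."""
    return [n for n in names if n.endswith(".md")]


def download_docs(zip_url, dest="docs"):
    """Download a GitHub source zip and extract only its markdown files.

    zip_url is assumed to look like
    https://github.com/<org>/<docs-repo>/archive/refs/heads/main.zip
    """
    with urllib.request.urlopen(zip_url) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))
    archive.extractall(dest, members=markdown_members(archive.namelist()))
```

Extracting only the markdown members keeps the sync cheap even though the zip contains the whole repo.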
Thanks again for raising this. Left some cleanliness comments but will likely come back after giving the indexer a thorough review.
I think the main question we have to answer is how the indexing should be done.
The approach now scans all markdown files. This seems to be useful especially for the related documents tool you added.
The alternative is to leverage https://strandsagents.com/latest/llms.txt.
https://llmstxt.org/ states the following, but it is not exactly applicable here given we know the content is already markdown. So I am curious what the quality differences are between the crawling approach and this predefined set.
Large language models increasingly rely on website information, but face a critical limitation: context windows are too small to handle most websites in their entirety. Converting complex HTML pages with navigation, ads, and JavaScript into LLM-friendly plain text is both difficult and imprecise.
While websites serve both human readers and LLMs, the latter benefit from more concise, expert-level information gathered in a single, accessible location. This is particularly important for use cases like development environments, where LLMs need quick access to programming documentation and APIs.
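If the llms.txt route were taken, fetching and parsing the index could be fairly small. A minimal parser sketch, assuming the link-list format described by the llmstxt.org proposal (none of this code is from the PR):

```python
import re

# llms.txt lists docs as markdown links, optionally with a description:
#   - [Quickstart](https://strandsagents.com/.../quickstart.md): Getting started
LINK_RE = re.compile(r"-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?")


def parse_llms_txt(text):
    """Extract (title, url, description) tuples from an llms.txt body."""
    entries = []
    for line in text.splitlines():
        m = LINK_RE.match(line.strip())
        if m:
            entries.append((m["title"], m["url"], m["desc"] or ""))
    return entries
```

Each entry's URL could then be fetched directly, since the targets are already markdown.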
@@ -0,0 +1,67 @@
"""
looks like this can go?
@@ -0,0 +1,89 @@
"""
I'm not sure we need this file. Seems like this can either be deleted or added as a test if we need.
@@ -0,0 +1,232 @@
#!/usr/bin/env python3
I think this can be deleted as well and just handled by pyproject.toml.
@@ -0,0 +1,7 @@
#!/usr/bin/env python
needed?
@@ -0,0 +1,3 @@
[pytest]
markers =
    asyncio: mark a test as an asyncio coroutine
This should be able to be removed.
@@ -0,0 +1,13 @@
"""
delete
)
logger = logging.getLogger('docs-sync')

def compare_files(file1, file2):
Checksum? Or do we even need this? What is the cost of writing the file even if it's different, since we are already spending the time reading it (and it's < 100 files)?
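For reference, a checksum-based variant of `compare_files` could look like the sketch below, though as the comment notes, unconditionally writing may be simpler at this file count. This is an illustrative stand-in, not the PR's implementation:

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """SHA-256 of a file, read in chunks to avoid loading it whole."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_files(file1, file2):
    """Return True if the two files have identical contents."""
    return file_digest(file1) == file_digest(file2)
```

The chunked read keeps memory flat, but both files are still read in full, so the win over just writing is only the avoided write.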
pkg_resources = resources.files("strands_mcp_server")

mcp = FastMCP(
-    "strands-agents-mcp-server",
+    "strands-agents-mcp-server-fuzzy",
nit: don't know if we need to change this
## Available Tools:

1. **get_document** - Retrieve a specific document by file path
2. **fuzzy_search_documents** - Fuzzy search documents with intelligent matching
Can we reduce the number of tools by just having smart_search? Were you seeing worse quality?
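If the tools were collapsed into a single smart_search, the ranking could be sketched with stdlib difflib as below. This is a stand-in for whatever matcher the PR actually uses, which may also weight body text and links:

```python
from difflib import SequenceMatcher


def smart_search(query, documents, limit=5):
    """Rank documents by fuzzy similarity between the query and their titles.

    documents maps file path -> title.
    """
    scored = [
        (SequenceMatcher(None, query.lower(), title.lower()).ratio(), path)
        for path, title in documents.items()
    ]
    scored.sort(reverse=True)
    return [path for _score, path in scored[:limit]]
```

One tool with a good scorer keeps the MCP tool surface small, which tends to help model tool selection.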
branches:
  - main
# Run on schedule (daily at midnight UTC) - keeps documentation current even without code changes
schedule:
I don't think we need the cron job, would rather trigger from https://github.com/strands-agents/strandsagents.com/blob/main/.github/workflows/build-deploy.yml
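A cross-repo trigger like the one suggested might look like the sketch below. The `repository_dispatch` event type, secret name, and repo path are assumptions, not settled names:

```yaml
# In this repo's sync workflow: listen for a dispatch instead of a cron schedule.
on:
  repository_dispatch:
    types: [docs-updated]   # event name is an assumption

# In strandsagents.com's build-deploy.yml, a final step would send the event
# (requires a token with access to this repo):
#   - name: Notify mcp-server of docs update
#     run: |
#       curl -X POST \
#         -H "Authorization: Bearer ${{ secrets.MCP_SERVER_DISPATCH_TOKEN }}" \
#         -H "Accept: application/vnd.github+json" \
#         https://api.github.com/repos/strands-agents/mcp-server/dispatches \
#         -d '{"event_type": "docs-updated"}'
```

This way the docs sync runs exactly when the site rebuilds, instead of polling daily.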
Closing as #21 resolved this
Pull Request: Add Strands Agents Documentation Sync and Search Functionality
Overview
This PR adds comprehensive documentation sync and search capabilities to the Strands Agents MCP Server. It introduces scripts to automatically sync documentation from the Strands Agents docs repository and builds an intelligent search index to make documentation easily accessible through the MCP protocol.
Key Changes
Documentation Management
• Added sync_docs.py script to synchronize documentation from the Strands Agents docs repository
• Implemented indexer.py to build relationships and cross-references between markdown files
• Created a comprehensive document index (document_index.json) for efficient searching
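At a high level, the cross-reference index described above amounts to a scan over markdown files that records each file's title and outgoing links. A simplified stand-in for indexer.py (the index schema here is illustrative, not the PR's actual format):

```python
import json
import re
from pathlib import Path

# Markdown links pointing at other .md files, e.g. [deploy](deploy.md)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)]+\.md)\)")


def build_index(docs_dir):
    """Map each markdown file to its title and the .md files it links to."""
    index = {}
    for path in Path(docs_dir).rglob("*.md"):
        text = path.read_text(encoding="utf-8")
        first_line = text.splitlines()[0] if text else ""
        title = first_line.lstrip("# ").strip() or path.stem
        index[str(path.relative_to(docs_dir))] = {
            "title": title,
            "links": LINK_RE.findall(text),
        }
    return index


def save_index(index, out_path="document_index.json"):
    Path(out_path).write_text(json.dumps(index, indent=2))
```

The recorded links are what make a "related documents" tool cheap at query time: relationships are computed once at index-build time.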
MCP Server Enhancements
• Enhanced server implementation with new search and documentation retrieval capabilities
• Added fuzzy search, smart search, and concept exploration functionality
• Implemented learning path generation for guided documentation exploration
Testing and CI/CD
• Added extensive test suite for documentation indexing and search functionality
• Implemented GitHub Actions workflow for automatic documentation synchronization
• Added comprehensive testing guides and workflows
Documentation
• Added detailed documentation on running a local MCP server
• Created guides for the documentation sync process
• Added testing guides and workflows
• Updated README with improved installation and usage instructions
Documentation Structure
The PR adds a well-organized documentation structure covering:
• API Reference
• Examples (Python, CDK, deployment options)
• User Guide (concepts, deployment, observability, security)
Testing
All new functionality is thoroughly tested with unit and integration tests:
• Document indexing and search capabilities
• MCP tool registration and execution
• GitHub Action workflows
• Complete end-to-end workflows
Impact
This PR significantly enhances the Strands Agents MCP Server by providing automated documentation synchronization and intelligent search over the synced documentation.
Related Issues
Addresses the need for improved documentation access and search capabilities in the Strands Agents MCP Server.