RubyLLM 1.6.4: Multimodal Tools & Better Schemas 🖼️
Maintenance release bringing multimodal tool responses, improved rake tasks, and important fixes for Gemini schema conversion. Plus better documentation and developer experience!
🖼️ Tools Can Now Return Files and Images
Tools can now return rich content with attachments, not just text! Perfect for screenshot tools, document generators, and visual analyzers:
class ScreenshotTool < RubyLLM::Tool
  description "Takes a screenshot and returns it"
  param :url, desc: "URL to screenshot"
  def execute(url:)
    screenshot_path = capture_screenshot(url) # Your screenshot logic
    # Return a Content object with text and attachments
    RubyLLM::Content.new(
      "Screenshot of #{url} captured successfully",
      [screenshot_path] # Can be file path, StringIO, or ActiveStorage blob
    )
  end
end
# The LLM can now see and analyze the screenshot
chat = RubyLLM.chat.with_tool(ScreenshotTool)
response = chat.ask("Take a screenshot of ruby-lang.org and describe what you see")
This opens up powerful workflows:
- Visual debugging: Screenshot tools that capture and analyze UI states
- Document generation: Tools that create PDFs and return them for review
- Data visualization: Generate charts and have the LLM interpret them
- Multi-step workflows: Chain tools that produce and consume visual content
Works with all providers that support multimodal content.
🔧 Fixed: Gemini Schema Conversion
Gemini's structured output was not preserving all schema fields, and integer schemas were being converted to number. The conversion logic now handles both cases correctly:
# Preserve description
schema = {
  type: 'object',
  description: 'An object',
  properties: {
    example: {
      type: "string",
      description: "a brief description about the person's time at the conference"
    }
  },
  required: ['example']
}
# Define schema with both number and integer types
schema = {
  type: 'object',
  properties: {
    number1: {
      type: 'number',
    },
    number2: {
      type: 'integer',
    }
  }
}
Also added tests covering simple and complex schemas, nested objects and arrays, all constraint attributes, nullable fields, descriptions, and property ordering for objects.
Thanks to @BrianBorge for reporting and working on the initial PR.
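To make the Gemini fix more concrete, here is a minimal sketch of a schema conversion that keeps descriptions and does not collapse integer into number. It is not RubyLLM's internal converter: the `to_gemini_schema` helper and the `GEMINI_TYPES` table are hypothetical names used for illustration, and the sketch only assumes that Gemini expects upper-case type names such as INTEGER and NUMBER.

```ruby
# Illustrative only: a hypothetical helper, not RubyLLM's internal converter.
GEMINI_TYPES = {
  'object'  => 'OBJECT',
  'array'   => 'ARRAY',
  'string'  => 'STRING',
  'number'  => 'NUMBER',
  'integer' => 'INTEGER', # stays INTEGER instead of collapsing to NUMBER
  'boolean' => 'BOOLEAN'
}.freeze

def to_gemini_schema(schema)
  schema = schema.transform_keys(&:to_sym)
  result = { type: GEMINI_TYPES.fetch(schema[:type].to_s) }

  # Carry over fields that are easy to drop by accident.
  result[:description] = schema[:description] if schema[:description]
  result[:required] = schema[:required].map(&:to_s) if schema[:required]

  # Recurse into nested object properties and array items.
  if schema[:properties]
    result[:properties] = schema[:properties].to_h do |name, sub_schema|
      [name.to_sym, to_gemini_schema(sub_schema)]
    end
  end
  result[:items] = to_gemini_schema(schema[:items]) if schema[:items]

  result
end

converted = to_gemini_schema({
  type: 'object',
  description: 'An object',
  properties: {
    number1: { type: 'number' },
    number2: { type: 'integer' }
  }
})

puts converted[:description]                 # prints: An object
puts converted[:properties][:number2][:type] # prints: INTEGER
```

The sketch uses `fetch` for the type lookup so that an unknown type raises an error instead of silently falling back to a default.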
🛠️ Developer Experience: Improved Rake Tasks
Consolidated Model Management
All model-related tasks are now streamlined and better organized:
# Default task now runs overcommit hooks + model updates
bundle exec rake
# Update models, generate docs, and create aliases in one command
bundle exec rake models
# Individual tasks still available
bundle exec rake models:update # Fetch latest models from providers
bundle exec rake models:docs # Generate model documentation
bundle exec rake models:aliases # Generate model aliases
The tasks have been refactored from 3 separate files into a single, well-organized models.rake file following Rails conventions.
Release Preparation
New comprehensive release preparation task:
# Prepare for release: refresh cassettes, run hooks, update models
bundle exec rake release:prepare
This task:
- Automatically refreshes stale VCR cassettes (>1 day old)
- Runs overcommit hooks for code quality
- Updates models, docs, and aliases
- Ensures everything is ready for a clean release
Cassette Management
# Verify cassettes are fresh
bundle exec rake release:verify_cassettes
# Refresh stale cassettes automatically
bundle exec rake release:refresh_stale_cassettes
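For readers curious what a ">1 day old" staleness check might look like, here is a rough sketch based on file modification times. It is not the code behind the rake tasks above: the `stale_cassettes` helper, the `ONE_DAY` constant, and the assumption that cassettes are `.yml` files under a single directory are all illustrative.

```ruby
# Illustrative only: a hypothetical helper, not the release:* task implementation.
ONE_DAY = 24 * 60 * 60 # seconds

# Returns the cassette files under `dir` whose last modification
# is more than `max_age` seconds before `now`.
def stale_cassettes(dir, max_age: ONE_DAY, now: Time.now)
  Dir.glob(File.join(dir, '**', '*.yml')).select do |path|
    now - File.mtime(path) > max_age
  end.sort
end

# Example usage, assuming cassettes live in spec/fixtures/vcr_cassettes:
stale_cassettes('spec/fixtures/vcr_cassettes').each do |path|
  puts "Stale cassette: #{path}"
end
```

Passing `now:` explicitly keeps the check easy to test, since the cutoff does not depend on the wall clock at call time.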
📚 Documentation Updates
- Redirect fix: /installation now properly redirects to /getting-started
- Badge refresh: README badges updated to bust GitHub's cache
- Async pattern fix: Corrected supervisor pattern example in agentic workflows guide to avoid "Cannot wait on own fiber!" errors
🧹 Additional Updates
- Appraisal gemfiles updated: All Rails version test matrices refreshed
- Test coverage: New specs for multimodal tool responses
- Provider compatibility: Verified with latest API versions
Installation
gem 'ruby_llm', '1.6.4'
Full backward compatibility maintained. The multimodal tool support is opt-in: existing tools continue working as before.
Full Changelog: 1.6.3...1.6.4