feat(filesystem): add streaming `get_file_hash` tool for cryptographic digests (md5/sha1/sha256) #2516

Pucciano · 2025-08-09T14:32:51Z

Add a streaming file-hash tool to the filesystem server with Zod-validated
inputs, allowed-roots enforcement, and optional digest encoding.

Description

This PR adds a new tool, get_file_hash, to the filesystem MCP server.

- Computes cryptographic digests via Node `crypto.createHash` + streaming `fs.createReadStream` (efficient on large files)
- Supported algorithms (policy gate): md5, sha1, sha256 (default: sha256)
- Output encoding: "hex" (default) or "base64" (optional)
- Rejects non-regular files (directories/devices); respects roots/realpath checks
- Zod input schema + ListTools registration
- README updated with a tool entry consistent with existing docs
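
For reviewers who want the streaming approach in one place, here is a minimal sketch of hashing a file through `crypto.createHash` and `fs.createReadStream`. It is not the code in this PR: the names `hashFileStreaming` and `demoHashTempFile` are hypothetical, and roots/realpath checks and the policy gate are deliberately left out.

```typescript
import { createHash } from "node:crypto";
import { createReadStream, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

type DigestEncoding = "hex" | "base64";

// Hypothetical helper (illustration only): feeds the file to the hash chunk by
// chunk, so memory use stays bounded regardless of file size.
function hashFileStreaming(
  filePath: string,
  algorithm: string = "sha256",
  encoding: DigestEncoding = "hex",
): Promise<string> {
  return new Promise<string>((resolve, reject) => {
    // createHash throws for unknown/unavailable algorithms; inside the
    // executor that turns into a rejected promise.
    const hash = createHash(algorithm);
    const stream = createReadStream(filePath);
    stream.on("error", reject);
    stream.on("data", (chunk) => hash.update(chunk));
    stream.on("end", () => resolve(hash.digest(encoding)));
  });
}

// Usage sketch: write the classic "abc" vector to a temp file and hash it
// with the defaults (sha256, hex).
async function demoHashTempFile(): Promise<string> {
  const dir = mkdtempSync(join(tmpdir(), "hash-demo-"));
  const file = join(dir, "abc.txt");
  writeFileSync(file, "abc");
  return hashFileStreaming(file);
}
```

Streaming matters mainly for large evidence files: reading them whole with `fs.readFile` would hold the entire content in memory, while the stream only holds one chunk at a time.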

Server Details

Server: filesystem
Changes to: tools (new tool), unit tests (new tests), docs (README tools section)

Motivation and Context

I’m a computer forensics expert; verifying file integrity is critical to chain of
custody. Standards (e.g., ISO/IEC 27037, SWGDE) emphasize hashing digital evidence.
NIST recommends collision-resistant hashes (SHA-2); SHA-1/MD5 remain for legacy
identification but not for collision-sensitive uses. This tool defaults to SHA-256
while retaining MD5/SHA-1 for interoperability. Keeping the algorithm set small
improves DFIR compatibility and simplifies model prompts.

Providing get_file_hash inside the filesystem server lets LLM-driven workflows
compute/compare hashes under the same allowed-roots and realpath/symlink controls
as other file operations—no external copying, consistent and auditable results.

How Has This Been Tested?

- Environment: macOS 15.5, LM Studio 0.3.22
- MCP client: LM Studio (server built via Docker and used as a Docker-mapped MCP server; tool discovered via ListTools)
- Models:
  - qwen/qwen3-coder-30b (benefits from explicit “when to use” + args in prompt)
  - openai/gpt-oss-120b (works with concise descriptions)
  - mistralai/devstral-small-2507 (tool calls succeed)
- Unit tests: text vectors ("abc", "ForensicShark"), small binary snippet, encodings (hex/base64), non-regular paths rejected (directory, symlink to directory, device), and unsupported algorithms (e.g., sha512, crc32, whirlpool) rejected by policy
- Manual: end-to-end via stdio within LM Studio; expected digests returned; clear error on unavailable algorithms (FIPS/build)
- Platform note: not tested on Windows
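
As an illustration of the text-vector checks listed above, the following sketch compares in-memory digests of "abc" against the widely published md5/sha1/sha256 values and confirms that the base64 output decodes to the same bytes as the hex output. The names `ABC_VECTORS` and `findVectorMismatches` are hypothetical and this is not the PR's test file; the "ForensicShark" digests are not reproduced here.

```typescript
import { createHash, getHashes } from "node:crypto";

// Published digests of the ASCII string "abc" (hex encoded).
const ABC_VECTORS: Record<string, string> = {
  md5: "900150983cd24fb0d6963f7d28e17f72",
  sha1: "a9993e364706816aba3e25717850c26c9cd0d89d",
  sha256: "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
};

// Hypothetical check (illustration only): returns the algorithms whose hex
// digest differs from the published vector, or whose base64 digest does not
// decode to the same bytes.
function findVectorMismatches(): string[] {
  const available = getHashes();
  const mismatches: string[] = [];
  for (const algorithm of Object.keys(ABC_VECTORS)) {
    // Skip algorithms this Node/OpenSSL build does not offer (e.g. md5 under FIPS).
    if (available.indexOf(algorithm) === -1) continue;
    const expectedHex = ABC_VECTORS[algorithm];
    const hex = createHash(algorithm).update("abc").digest("hex");
    const base64 = createHash(algorithm).update("abc").digest("base64");
    const base64AsHex = Buffer.from(base64, "base64").toString("hex");
    if (hex !== expectedHex || base64AsHex !== expectedHex) {
      mismatches.push(algorithm);
    }
  }
  return mismatches;
}
```

An empty array from `findVectorMismatches()` means every available algorithm matched its vector in both encodings.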

Breaking Changes

None. Additive only.

Types of changes

New feature (non-breaking)
Documentation update
Bug fix
Breaking change

Checklist

Follows MCP security best practices (roots-restricted, realpath/symlink checks)
README updated (tool entry)
Code follows repo style guidelines
Appropriate error handling (unsupported algorithms / non-regular files)
Inputs documented (path, algorithm, encoding)
Tested with an LLM client (LM Studio)
New and existing tests pass locally
CI not included in this PR (out of scope)

Additional context

Paths should be relative to the allowed base directory (as returned by
list_directory) when calling the tool from clients.
Tool description made concise and Qwen-friendly; instructs models to return
only the digest string.
Only md5, sha1, sha256 are supported by design; no extended algorithms
or env toggles in this PR.

Introduce a streaming file-hash tool using `crypto.createHash` and `fs.createReadStream`. Validates input via Zod and respects allowed-roots path checks.

- Tool: `get_file_hash`
- Args: { path: string, algorithm: "md5"|"sha1"|"sha256", encoding?: "hex"|"base64" }
- Output: digest as hex (default) or base64
- Handler: added to CallToolRequest; included in ListTools

Notes:
- Rejects non-regular files (e.g., directories/devices)
- Fails fast if algorithm is unavailable in this Node/OpenSSL build (e.g., FIPS) with a clear error message.

…ed testing

Extract `getFileHash` from `index.ts` into a standalone module to avoid pulling server bootstrap and top-level await into unit tests. This decouples hashing logic from transport setup and other side effects, allowing tests to import the function directly. No behavior change: the server now imports `getFileHash` from `hash-file.ts`. This prepares the codebase for comprehensive unit tests covering success and error paths.

Add a policy gate to `getFileHash` that rejects any algorithm not in {md5, sha1, sha256}. Motivation: these are the widely used hashes in digital forensics; keeping the list small helps interoperability with DFIR tools and simplifies model/tool prompts. Also prepares unit tests to assert failure on unsupported algorithms (e.g., sha512, crc32, etc.). Runtime availability is still checked via crypto.getHashes to surface FIPS/build issues cleanly. Default remains sha256; md5/sha1 are retained for legacy sets.
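
The two-stage check described in this commit (policy allow-list first, runtime availability second) could look roughly like the sketch below. The names `ALLOWED_ALGORITHMS` and `resolveAlgorithm` and the error wording are assumptions for illustration, not the identifiers used in `hash-file.ts`.

```typescript
import { getHashes } from "node:crypto";

// Policy allow-list: the only algorithms the tool accepts by design.
const ALLOWED_ALGORITHMS = ["md5", "sha1", "sha256"] as const;
type AllowedAlgorithm = (typeof ALLOWED_ALGORITHMS)[number];

// Hypothetical gate (illustration only): rejects anything outside the policy
// set, then confirms the algorithm exists in this Node/OpenSSL build.
function resolveAlgorithm(requested: string = "sha256"): AllowedAlgorithm {
  if ((ALLOWED_ALGORITHMS as readonly string[]).indexOf(requested) === -1) {
    throw new Error(
      `Unsupported algorithm "${requested}"; allowed: ${ALLOWED_ALGORITHMS.join(", ")}`,
    );
  }
  // A FIPS-mode or stripped-down build may lack md5 or sha1 even though
  // the policy allows them.
  if (getHashes().indexOf(requested) === -1) {
    throw new Error(`Algorithm "${requested}" is not available in this Node/OpenSSL build`);
  }
  return requested as AllowedAlgorithm;
}
```

Keeping the policy check separate from the availability check lets the error message distinguish "not allowed by design" (e.g., sha512) from "allowed but missing in this build".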

Add unit tests covering hashing of the text "ForensicShark" across md5/sha1/sha256 with expected digests. Validate rejection of non-regular paths (directory, symlink to directory, device like /dev/null when present). Verify hashing of a small binary snippet across all three algorithms. Assert that unsupported algorithms (e.g. sha512, crc32, whirlpool) throw per the policy gate. Exercise both encodings (hex and base64) for text and binary cases. Tests are platform-aware: skip the device case on Windows and create a junction for the directory symlink on Windows.
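
One way to express the non-regular-path rejection exercised by these tests is a `stat`-based guard, sketched below. This is an assumption about the approach, not the PR's implementation: `assertRegularFile`, `isAccepted`, and `demoRegularFileCheck` are hypothetical names, and the sketch uses `fs.statSync`, which follows symlinks, so a symlink to a directory is reported as a directory and rejected.

```typescript
import { mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical guard (illustration only): throws unless the path resolves to
// a regular file. Directories and device nodes fail the isFile() check.
function assertRegularFile(candidate: string): void {
  const stats = statSync(candidate); // follows symlinks
  if (!stats.isFile()) {
    throw new Error(`Not a regular file: ${candidate}`);
  }
}

// Convenience wrapper that reports acceptance as a boolean.
function isAccepted(candidate: string): boolean {
  try {
    assertRegularFile(candidate);
    return true;
  } catch (err) {
    return false;
  }
}

// Usage sketch: a small binary file is accepted, its parent directory is not.
function demoRegularFileCheck(): { file: boolean; directory: boolean } {
  const dir = mkdtempSync(join(tmpdir(), "regular-demo-"));
  const file = join(dir, "sample.bin");
  writeFileSync(file, Buffer.from([0x00, 0xff, 0x10]));
  return { file: isAccepted(file), directory: isAccepted(dir) };
}
```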

…comments

Refine the `get_file_hash` tool description to be concise and Qwen-friendly with properly escaped quotes. Document encoding as optional with default "hex"; require an absolute path and state the input must be a regular file under allowed directories (not directories/devices). Instruct models to return only the digest string. This improves tool-calling reliability and matches the Zod schema and server defaults.

Pucciano added 7 commits August 8, 2025 17:21

Docs(README): include get_file_hash feature details

ea629ad

refactor(index.ts): remove unused getFileHash function and related …

14b402b

…comments

Pucciano marked this pull request as ready for review August 9, 2025 14:39

olaservo added the server-filesystem (Reference implementation for the Filesystem MCP server - src/filesystem) and enhancement (New feature or request) labels Aug 13, 2025
