Description
Summary
This report details the investigation and resolution of a critical bug that rendered the `mcp-server-memory` toolset non-functional, causing "Unexpected non-whitespace character after JSON" errors. The root cause was a combination of a race condition in the server's file-writing logic and an environment misconfiguration that prevented the execution of the corrected code. The issue was ultimately resolved after the user manually deleted the corrupted `memory.json` file, allowing the fixed server environment to take over.
Root Cause Analysis
The investigation revealed a multi-faceted root cause:

- Code Bug: The server's code lacked a file lock on the `memory.json` file. This created a race condition where concurrent writes would corrupt the file, as demonstrated by the provided `test_race_condition.py` script.
- Data Corruption: The corrupted `memory.json` file was the direct cause of the JSON parsing errors: the server would attempt to read the malformed file and fail.
- Environment Issue: The primary blocker was an environment misconfiguration. The system was using `npx` to run an old, buggy version of the server from the internet instead of the locally compiled and fixed version that included the race condition fix.
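To make the failure mode concrete, the snippet below (an illustration written for this report, not code from the server) shows how a file that ends up holding a complete JSON document followed by leftover bytes from an earlier, longer write produces exactly this class of parse error:

```python
import json

# Simulate a shorter payload landing over a longer stale one without proper
# truncation or locking: valid JSON ends up followed by trailing garbage.
old = json.dumps({"entries": [{"id": 1}, {"id": 2}]})   # earlier, longer write
new = json.dumps({"entries": [{"id": 3}]})              # later, shorter write
corrupted = new + old[len(new):]  # the old file's tail survives past the new JSON

try:
    json.loads(corrupted)
except json.JSONDecodeError as e:
    # Python labels this condition "Extra data"; Node.js reports the same
    # situation as "Unexpected non-whitespace character after JSON"
    print(f"Parse failed: {e.msg}")
```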
Evidence of the Bug (Race Condition)
The following Python script simulates the original buggy logic. It starts two processes that attempt to write to the same JSON file concurrently without a lock. This can result in a corrupted file, demonstrating the race condition.
Bug Simulation Code
```python
import json
import os
from multiprocessing import Process, Manager

# The file to be shared and potentially corrupted
TEST_FILE = 'test_memory.json'

# This function simulates the buggy load/save logic from the TypeScript code
def buggy_operation(process_id, new_entry, shared_dict):
    """Simulates a single process reading, updating, and writing the file."""
    try:
        # 1. Load the graph (read from file)
        try:
            with open(TEST_FILE, 'r') as f:
                data = f.read()
            # In a race condition, this data could be incomplete
            graph = json.loads(data) if data else {'entries': []}
        except (FileNotFoundError, json.JSONDecodeError):
            graph = {'entries': []}
        # 2. Add the new entry
        graph['entries'].append(new_entry)
        # 3. Save the graph (write back to file)
        with open(TEST_FILE, 'w') as f:
            # This is the critical section where the race condition happens.
            # One process can overwrite the other's changes.
            json.dump(graph, f)
        shared_dict[process_id] = 'Success'
    except Exception as e:
        # If a json.JSONDecodeError happens, it means we read a corrupted file
        shared_dict[process_id] = f'Error: {type(e).__name__} - {e}'

def main():
    """Run two buggy operations in parallel to try and trigger the race condition."""
    # Clean up previous test file if it exists
    if os.path.exists(TEST_FILE):
        os.remove(TEST_FILE)
    with Manager() as manager:
        shared_dict = manager.dict()
        # Create two processes that will run the buggy operation concurrently
        p1 = Process(target=buggy_operation, args=(1, {'id': 1, 'data': 'from_process_1'}, shared_dict))
        p2 = Process(target=buggy_operation, args=(2, {'id': 2, 'data': 'from_process_2'}, shared_dict))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        print("--- Process Results ---")
        for pid, result in shared_dict.items():
            print(f"Process {pid}: {result}")
    print("\n--- Final File Content ---")
    try:
        with open(TEST_FILE, 'r') as f:
            content = f.read()
        print(content)
        # Final check: is the file valid JSON?
        final_data = json.loads(content)
        print("\n--- Final Parsed Data ---")
        print(final_data)
        if len(final_data.get('entries', [])) != 2:
            print("\n*** BUG CONFIRMED: Data was lost. Expected 2 entries. ***")
        else:
            print("\n*** TEST PASSED (Race condition not triggered this time) ***")
    except (FileNotFoundError, json.JSONDecodeError) as e:
        print("*** BUG CONFIRMED: File is corrupted or does not exist. ***")
        print(f"Error: {e}")
    finally:
        # Clean up the test file
        if os.path.exists(TEST_FILE):
            os.remove(TEST_FILE)

if __name__ == '__main__':
    main()
```
The Fix (Locking Mechanism)
The corrected code uses a file-based lock to ensure that only one process can write to the `memory.json` file at a time, preventing the race condition. The following script demonstrates the fixed logic, using a `multiprocessing.Lock` as a stand-in for the server's file lock.
Fix Verification Code
```python
import os
import json
from multiprocessing import Process, Manager, Lock

class FixedKnowledgeGraphManager:
    def __init__(self, memory_file_path, lock):
        self.memory_file_path = memory_file_path
        self.lock = lock

    def _load_graph_safe(self):
        """Safely loads the graph from the JSON file."""
        try:
            with open(self.memory_file_path, 'r') as f:
                data = f.read()
            return json.loads(data) if data else {'entities': [], 'relations': []}
        except FileNotFoundError:
            return {'entities': [], 'relations': []}

    def _save_graph_safe(self, graph):
        """Safely saves the graph to the JSON file."""
        with open(self.memory_file_path, 'w') as f:
            json.dump(graph, f, indent=2)

    def _execute_with_lock(self, operation, *args, **kwargs):
        """Executes a given file operation with a lock."""
        with self.lock:
            graph = self._load_graph_safe()
            result, updated_graph = operation(graph, *args, **kwargs)
            self._save_graph_safe(updated_graph)
            return result

    def create_entities(self, graph, entities):
        new_entities = [e for e in entities if e['name'] not in {ex['name'] for ex in graph['entities']}]
        graph['entities'].extend(new_entities)
        return new_entities, graph

    def create_relations(self, graph, relations):
        # Simple add for testing; a real implementation would check for duplicates
        graph['relations'].extend(relations)
        return relations, graph

    # Wrapper methods that will be called by the test processes
    def safe_create_entities(self, entities):
        return self._execute_with_lock(self.create_entities, entities)

    def safe_create_relations(self, relations):
        return self._execute_with_lock(self.create_relations, relations)

TEST_FILE = 'test_memory_fixed.json'

def safe_operation(process_id, entities_to_add, relations_to_add, shared_dict, lock):
    """Performs concurrent operations using the fixed manager."""
    try:
        manager = FixedKnowledgeGraphManager(TEST_FILE, lock)
        manager.safe_create_entities(entities_to_add)
        manager.safe_create_relations(relations_to_add)
        shared_dict[process_id] = 'Success'
    except Exception as e:
        shared_dict[process_id] = f'Error: {type(e).__name__} - {e}'

def main():
    """Run two safe operations in parallel to verify the fix."""
    if os.path.exists(TEST_FILE):
        os.remove(TEST_FILE)
    with Manager() as process_manager:
        shared_dict = process_manager.dict()
        lock = Lock()
        # Define data for two processes
        data_p1 = {
            'entities': [{'name': 'Human', 'entityType': 'Actor'}],
            'relations': [{'from': 'Human', 'to': 'Start_Bank', 'relationType': 'is_at'}]
        }
        data_p2 = {
            'entities': [{'name': 'Goat', 'entityType': 'Actor'}],
            'relations': [{'from': 'Goat', 'to': 'Start_Bank', 'relationType': 'is_at'}]
        }
        p1 = Process(target=safe_operation, args=(1, data_p1['entities'], data_p1['relations'], shared_dict, lock))
        p2 = Process(target=safe_operation, args=(2, data_p2['entities'], data_p2['relations'], shared_dict, lock))
        p1.start()
        p2.start()
        p1.join()
        p2.join()
        print("--- Process Results ---")
        for pid, result in shared_dict.items():
            print(f"Process {pid}: {result}")
    print("\n--- Final File Content ---")
    # We need a final read outside the lock to verify the final state
    try:
        with open(TEST_FILE, 'r') as f:
            final_graph = json.load(f)
        print(json.dumps(final_graph, indent=2))
        num_entities = len(final_graph.get('entities', []))
        num_relations = len(final_graph.get('relations', []))
        print(f"\nFound {num_entities} entities and {num_relations} relations.")
        if num_entities == 2 and num_relations == 2:
            print("\n*** BUG FIXED: All data was written successfully. ***")
        else:
            print("\n*** TEST FAILED: Data was lost. Expected 2 entities and 2 relations. ***")
    except (FileNotFoundError, json.JSONDecodeError) as e:
        print(f"Error reading final file: {e}")
    finally:
        if os.path.exists(TEST_FILE):
            os.remove(TEST_FILE)

if __name__ == '__main__':
    main()
```
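Note that the verification script passes a `multiprocessing.Lock` created in a common parent process, which is sufficient for the test but cannot coordinate unrelated processes the way the real server must. A cross-process sketch for POSIX systems, using an advisory `fcntl.flock` on a sidecar lock file (the file names here are made up for illustration, not taken from the server):

```python
import fcntl
import json
import os
from multiprocessing import Process

DATA_FILE = 'demo_memory.json'
LOCK_FILE = DATA_FILE + '.lock'

def locked_append(entry):
    """Read-modify-write the JSON file under an exclusive OS-level file lock."""
    with open(LOCK_FILE, 'w') as lf:
        fcntl.flock(lf, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        try:
            try:
                with open(DATA_FILE) as f:
                    graph = json.load(f)
            except (FileNotFoundError, json.JSONDecodeError):
                graph = {'entries': []}
            graph['entries'].append(entry)
            with open(DATA_FILE, 'w') as f:
                json.dump(graph, f)
        finally:
            fcntl.flock(lf, fcntl.LOCK_UN)

if __name__ == '__main__':
    procs = [Process(target=locked_append, args=({'id': i},)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    with open(DATA_FILE) as f:
        print(len(json.load(f)['entries']))  # all 4 entries survive; none lost
    os.remove(DATA_FILE)
    os.remove(LOCK_FILE)
```

Because `flock` is advisory and keyed to the file rather than to a shared parent, it serializes writers even when they were launched independently, which matches the multi-client scenario an MCP server actually faces.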
Relation to Previously Reported Bugs
This issue is consistent with previously reported bugs in the `modelcontextprotocol/servers` repository, suggesting a systemic problem that has now been addressed:

- Direct Confirmation: This bug is a direct parallel to issue #2412 ("Claude will return soon" after commands like "list_allowed_directories"), which reported the exact same "Unexpected non-whitespace character" error, although for the filesystem server. This confirms the problem was not isolated to a single server.
- Related Symptom: The server timeouts and unexpected exits reported in issue #719 ("Requested timeout") are also likely symptoms of the same underlying server instability caused by this bug.
Conclusion
The "Unexpected non-whitespace character" bug is resolved. The resolution required both a code fix (implementing a file lock to prevent the race condition) and an environment fix (ensuring the correct, locally built server is executed). The user's manual deletion of the corrupted `memory.json` file was the final step that enabled the corrected environment to function properly.