@Copilot Copilot AI commented Sep 3, 2025

This PR fixes the NotImplementedError raised when finetuning from .pth (frozen/scripted) models with the PyTorch backend.

Problem

Users encountered a NotImplementedError when trying to finetune from frozen models:

dp --pt train input.json -t dpa2.pth --use-pretrain-script

The error occurred because the get_finetune_rules() function unconditionally loaded finetune models with torch.load() and weights_only=True, which fails for .pth files: those are created with torch.jit.save() and must be loaded with torch.jit.load().

Solution

Updated get_finetune_rules() function in deepmd/pt/utils/finetune.py:

  • Added file extension detection to use appropriate loading method
  • .pt files: torch.load() with weights_only=True (existing behavior)
  • .pth files: torch.jit.load() and extract model params via get_model_def_script()
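The extension-based dispatch can be sketched as follows. This is a simplified illustration, not the actual deepmd source; the real get_finetune_rules() performs the torch.load()/torch.jit.load() calls shown in the comments.

```python
from pathlib import Path


def select_finetune_loader(model_file: str) -> str:
    """Pick the loading strategy from the checkpoint's file extension.

    Simplified sketch of the dispatch described above; the real code
    calls torch.load(model_file, weights_only=True) for ".pt" and
    torch.jit.load(model_file) for ".pth".
    """
    suffix = Path(model_file).suffix
    if suffix == ".pt":
        # Checkpoint file: torch.load(model_file, weights_only=True)
        return "torch.load"
    if suffix == ".pth":
        # Frozen/scripted model: torch.jit.load(model_file), then
        # extract the model params via get_model_def_script()
        return "torch.jit.load"
    raise ValueError(f"Unsupported model file extension: {suffix}")
```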

Updated training logic in deepmd/pt/train/training.py:

  • Added proper .pth support in model resuming/loading logic
  • Used strict=False when loading state dict from .pth files to handle different key structures
  • Gracefully handle missing optimizer state and step info in frozen models
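The training-side handling above can be illustrated with a hypothetical helper (not the actual training.py code): keys missing from either side are ignored, mirroring what strict=False does in PyTorch's load_state_dict(), and metadata absent from frozen models falls back to fresh defaults.

```python
def load_partial_state(model_state: dict, checkpoint: dict) -> dict:
    """Mimic strict=False loading and graceful metadata fallback.

    Hypothetical sketch of the behavior described above: frozen (.pth)
    checkpoints may use different key structures and carry no optimizer
    state or step counter.
    """
    # Copy only parameters whose names the model actually has,
    # silently skipping extra or missing keys (strict=False semantics).
    merged = dict(model_state)
    for key, value in checkpoint.get("model", {}).items():
        if key in merged:
            merged[key] = value
    # Frozen models carry no optimizer state or step info:
    # fall back to a fresh optimizer and step 0.
    optimizer_state = checkpoint.get("optimizer")  # None -> reinitialize
    start_step = checkpoint.get("step", 0)
    return {"model": merged, "optimizer": optimizer_state, "step": start_step}
```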

The implementation follows the existing pattern used in the change_bias() function, ensuring consistency across the codebase.

Testing

  • Added comprehensive test cases covering both .pt and .pth finetune workflows
  • Verified backward compatibility with existing .pt finetune functionality
  • Tested error handling for invalid file extensions
  • Manual CLI testing confirms end-to-end workflow works correctly

Users can now successfully finetune from both checkpoint (.pt) and frozen (.pth) models:

# Works with checkpoint files (existing functionality)
dp --pt train input.json --finetune model.pt --use-pretrain-script

# Now works with frozen models (new functionality)  
dp --pt train input.json --finetune model.pth --use-pretrain-script

Fixes #4262.



@Copilot Copilot AI changed the title [WIP] Encounting NotImplementedError when finetuning a Single-Task .pth model feat(pt): Add support for finetuning from .pth (frozen) models Sep 3, 2025