feat: initial implementation of extension malware check #4272
Description
Goose users need to install extensions to get the full utility they want. However, our stdio MCP extensions often come from package ecosystems that can contain malware.
Although it's not possible to completely protect users from all forms of malware (and other supply chain attacks), implementing some basic checks for known-bad packages better protects all Goose users! ❤️
Implementation
When a user attempts to activate an extension, we parse out the package name and, where possible, its version, then query the OSV API to determine whether the package is known to contain malware (malware records are identified by the MAL- prefix in their IDs).
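To make the MAL- filtering step concrete, here is a minimal sketch in Python. It is illustrative only, not the PR's actual code. It assumes the documented shape of an OSV /v1/query response, where matching records are listed under a "vulns" key and each record carries an "id"; the function names are hypothetical.

```python
from typing import Any

# Records sourced from ossf/malicious-packages use IDs such as "MAL-2024-1234".
MALWARE_PREFIX = "MAL-"


def is_malware_record(vuln: dict[str, Any]) -> bool:
    """Return True if a single OSV record is a malicious-package report."""
    record_id = vuln.get("id", "")
    return isinstance(record_id, str) and record_id.startswith(MALWARE_PREFIX)


def malware_ids(response: dict[str, Any]) -> list[str]:
    """Return the IDs of malware records in an OSV /v1/query response.

    Ordinary vulnerability records (CVE, GHSA, PYSEC, ...) are ignored,
    because this check is deliberately limited to malware.
    """
    return [v["id"] for v in response.get("vulns", []) if is_malware_record(v)]
```

A non-empty result means the package should be flagged. An empty result, including a response with no "vulns" key at all, means OSV has no malware report for it.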
If the HTTP call fails or the response cannot be parsed, we fail open so that the check never disrupts users.
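The fail-open behaviour can be sketched as follows. The HTTP call uses only the standard library and the public endpoint https://api.osv.dev/v1/query. The names fetch_osv and should_block, and the 5-second timeout, are hypothetical choices for illustration. The fetch function is injectable so the logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def fetch_osv(body: dict, timeout: float = 5.0) -> dict:
    """POST a query body to the OSV API and return the decoded JSON response."""
    req = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def should_block(body: dict, fetch: Callable[[dict], dict] = fetch_osv) -> bool:
    """Block activation only if OSV positively reports a MAL- record.

    Network errors, timeouts, HTTP errors (all OSError subclasses), invalid
    JSON (ValueError) and unexpected response shapes (AttributeError /
    TypeError) all fail open by returning False.
    """
    try:
        response = fetch(body)
        vulns = response.get("vulns", [])
        return any(str(v.get("id", "")).startswith("MAL-") for v in vulns)
    except (OSError, ValueError, AttributeError, TypeError):
        # Fail open: an unreachable or malformed OSV response must not
        # prevent the extension from starting.
        return False
```

Failing open means an attacker who can block requests to api.osv.dev also disables the check. The PR accepts that trade-off in exchange for never breaking extension activation.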
Note: This implementation only supports the PyPI and npm package ecosystems.
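One way to recover the package from an extension's launch command is sketched below. It assumes npm packages are launched via npx and PyPI packages via uvx. Everything here is hypothetical and simplified: the PackageRef type, the parsing rules, and the decision to skip flags that take a value (for example uvx --from). The ecosystem strings "npm" and "PyPI" are the names OSV uses for these ecosystems.

```python
import os
from typing import NamedTuple, Optional


class PackageRef(NamedTuple):
    ecosystem: str          # OSV ecosystem name: "npm" or "PyPI"
    name: str
    version: Optional[str]  # None when the command does not pin a version


def _split_npm(spec: str) -> tuple[str, Optional[str]]:
    # Scoped names start with "@", so search for the version separator after index 0.
    at = spec.find("@", 1)
    if at == -1:
        return spec, None
    return spec[:at], spec[at + 1:] or None


def _split_pypi(spec: str) -> tuple[str, Optional[str]]:
    # uvx accepts name@version; name==version is tolerated as a pip-style pin.
    for sep in ("==", "@"):
        if sep in spec:
            name, version = spec.split(sep, 1)
            return name, version or None
    return spec, None


def parse_extension_command(cmd: str, args: list[str]) -> Optional[PackageRef]:
    """Map an extension's launch command to the package it runs.

    Returns None for unsupported runners so the malware check is skipped.
    Simplification: the first argument not starting with "-" is taken as
    the package spec, so flags that take a value are not handled.
    """
    runner = os.path.basename(cmd)
    positional = [a for a in args if not a.startswith("-")]
    if not positional:
        return None
    spec = positional[0]
    if runner == "npx":
        return PackageRef("npm", *_split_npm(spec))
    if runner == "uvx":
        return PackageRef("PyPI", *_split_pypi(spec))
    return None
```

Returning None for anything else (docker, a local binary, and so on) keeps unsupported formats on the same fail-open path as a failed lookup.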
Handling of unspecified package versions
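One key choice here was how to handle the @latest syntax and commands where no version is provided: in both cases, we check whether any version of the package has a malware finding. The trade-off is that a package is flagged if any of its past versions was malicious, even if the latest version is clean. Broadly speaking, it will be rare for an extension to be compromised, have a malware finding filed, and then continue to be used with those versions still in its history, but it "could" happen, and I'm open to adjusting the approach if there are major upfront concerns.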
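The "any version" rule maps naturally onto the OSV query body. In the sketch below, a pinned version is sent in the "version" field. Otherwise the field is left out, on the understanding that OSV then returns records affecting any version of the package. The function name and the set of values treated as unpinned are hypothetical.

```python
from typing import Optional

# Values treated as "no specific version" (hypothetical; adjust as needed).
UNPINNED = {None, "", "latest"}


def build_osv_query(ecosystem: str, name: str, version: Optional[str]) -> dict:
    """Build the JSON body for POST https://api.osv.dev/v1/query.

    A pinned version narrows the query to that release. For @latest or a
    missing version, "version" is omitted so the query covers every
    version of the package.
    """
    query: dict = {"package": {"name": name, "ecosystem": ecosystem}}
    if version not in UNPINNED:
        query["version"] = version
    return query
```

Resolving @latest to a concrete version number first would be more precise. It would also require an extra registry lookup per activation, which is one reason the broader query is a reasonable starting point.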
OSV/Malicious Packages
The OSV project provides vulnerability and malware information for open-source packages. The malware information is populated from https://github.com/ossf/malicious-packages and is broadly a trustworthy data source.
Although OSV also contains vulnerability data, this change only checks for malware. We could potentially use this interface to inform users of CVEs in the future, but we'd want to make that part of the product much easier for all users to understand!
Testing