Import Transformers into Spark NLP 🚀 #5669
Import Models into Spark NLP
Table of Contents
- Overview
- Quick Start
- Compatibility Matrix
- Importing Pretrained Models to Spark NLP
Overview
Since version 3.1.0, Spark NLP 🚀 has supported importing pretrained models from Hugging Face 🤗 and TensorFlow Hub into equivalent Spark NLP annotators.
This means you can bring your favorite Transformer architectures such as BERT, RoBERTa, DistilBERT, DeBERTa, XLM-RoBERTa, Longformer, CamemBERT, XLNet, and many others directly into Spark NLP pipelines for tasks such as text embeddings, sequence and token classification, question answering, text generation, computer vision, and speech processing.
With every release, we extend this compatibility to cover more architectures and runtimes.
Quick Start
Basic Model Import
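The snippet below is a minimal sketch of the import pattern, assuming you have already exported a BERT model from Hugging Face into a local folder following one of the notebooks linked below. The folder name `./export_path/bert-base-cased`, the column names, and the save path are illustrative.

```python
import sparknlp
from sparknlp.annotator import BertEmbeddings

# Start a Spark session with Spark NLP on the classpath.
spark = sparknlp.start()

# Load the exported Hugging Face model into the equivalent Spark NLP annotator.
# "./export_path/bert-base-cased" is a placeholder for the folder produced by
# the export step in the import notebooks.
bert = (
    BertEmbeddings.loadSavedModel("./export_path/bert-base-cased", spark)
    .setInputCols(["document", "token"])
    .setOutputCol("embeddings")
)

# Save it once in Spark NLP format ...
bert.write().overwrite().save("./spark_nlp_models/bert_base_cased")

# ... and reload it later like any other Spark NLP annotator.
reloaded = BertEmbeddings.load("./spark_nlp_models/bert_base_cased")
```

The same loadSavedModel / save / load pattern applies to the other importable annotators; only the annotator class and its input/output columns change.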
👉 Explore runnable end-to-end examples in our Notebook Gallery Repository. You’ll find Colab/Jupyter notebooks for each annotator and runtime (TensorFlow, ONNX, OpenVINO, Llama.cpp).
Compatibility Matrix
The compatibility matrix is organized by task category (representative annotator classes are sketched after this list):
- Text Embeddings
- Sequence Classification
- Token Classification
- Question Answering
- Text Generation
- Computer Vision
- Speech Processing
- Large Language Models
- Vision-Language Models
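As a rough orientation, each category maps to a family of Spark NLP annotator classes that all follow the loadSavedModel pattern shown in the Quick Start. The mapping below is illustrative rather than exhaustive, and class availability depends on the Spark NLP version you run.

```python
# Illustrative mapping from task category to one representative
# Spark NLP annotator class; not an exhaustive list.
from sparknlp.annotator import (
    BertEmbeddings,                  # Text Embeddings
    BertForSequenceClassification,   # Sequence Classification
    BertForTokenClassification,      # Token Classification
    BertForQuestionAnswering,        # Question Answering
    GPT2Transformer,                 # Text Generation
    ViTForImageClassification,       # Computer Vision
    WhisperForCTC,                   # Speech Processing
    AutoGGUFModel,                   # Large Language Models (Llama.cpp / GGUF)
    CLIPForZeroShotClassification,   # Vision-Language Models
)
```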
Importing Pretrained Models to Spark NLP
We provide a comprehensive collection of end-to-end notebooks for importing and converting pretrained models into Spark NLP. These resources cover all major annotators and runtimes (a minimal ONNX example is sketched after the list):
- HuggingFace to Spark NLP (TensorFlow)
- HuggingFace to Spark NLP (ONNX)
- HuggingFace to Spark NLP (OpenVINO)
- HuggingFace to Spark NLP (Llama.cpp)
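To give a flavor of what these notebooks walk through, here is a minimal sketch of the ONNX path, assuming the transformers and optimum packages are installed. The model id, export folder, and asset file names (assets/vocab.txt, assets/labels.txt) are illustrative; the exact asset layout each annotator expects varies, so follow the matching notebook for your architecture.

```python
from pathlib import Path
import shutil

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoConfig, AutoTokenizer

import sparknlp
from sparknlp.annotator import DistilBertForSequenceClassification

MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative model
EXPORT_DIR = Path("onnx_export")

# 1. Export the Hugging Face model to ONNX with Optimum and save the tokenizer
#    files next to it.
ORTModelForSequenceClassification.from_pretrained(MODEL_ID, export=True).save_pretrained(EXPORT_DIR)
AutoTokenizer.from_pretrained(MODEL_ID).save_pretrained(EXPORT_DIR)

# 2. Place the tokenizer vocabulary and label file where the annotator looks
#    for them (layout shown here is an assumption; see the matching notebook).
assets = EXPORT_DIR / "assets"
assets.mkdir(exist_ok=True)
shutil.copy(EXPORT_DIR / "vocab.txt", assets / "vocab.txt")

config = AutoConfig.from_pretrained(MODEL_ID)
labels = [config.id2label[i] for i in range(len(config.id2label))]
(assets / "labels.txt").write_text("\n".join(labels))

# 3. Load the ONNX export into the equivalent Spark NLP annotator and save it
#    in Spark NLP format for reuse in pipelines.
spark = sparknlp.start()
classifier = (
    DistilBertForSequenceClassification.loadSavedModel(str(EXPORT_DIR), spark)
    .setInputCols(["document", "token"])
    .setOutputCol("class")
)
classifier.write().overwrite().save("spark_nlp_models/distilbert_sst2_onnx")
```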