[SPARK-51348][BUILD][SQL] Upgrade Hive to 4.1 #52099
@vrozov
@dongjoon-hyun Do you have other suggestions because you had experience to add
To @sarutak, I have no opinion on the PR yet because it apparently hasn't passed CI. What I can say for now is that the Apache Spark community doesn't allow the kind of regression this PR introduces. I'm not sure why the Hive 4.1 upgrade proposal forces us to downgrade dependencies. I hope it's a mistake, and I want to make sure we avoid this kind of hidden change:
- <antlr4.version>4.13.1</antlr4.version>
+ <antlr4.version>4.9.3</antlr4.version>
@dongjoon-hyun
Sorry, but I'm not using Apache Hive these days in any form, not even a Hive Metastore, so it's a little hard for me to help with this PR. From my previous experience, I can say that this PR will require significant verification effort even after CI passes. May I ask whether @vrozov and @sarutak use Hive 4.1 or this patch internally at production level with AWS Glue? If so, it would be a great reference for the community.
@dongjoon-hyun
@vrozov Can you share the background if possible?
@dongjoon-hyun @sarutak Unfortunately, it is not a mistake. Hive uses ANTLR 4.9.3, which is not compatible with 4.10.x and above: "Mixing ANTLR 4.9.3 and 4.10 can lead to errors that point to a version mismatch. A very common Java error looks like this:"
One possible solution is to shade antlr4 in Spark, so that there is no conflict between the Spark and Hive versions. Please let me know what you think. If that sounds reasonable, I'll open a separate PR for shading.
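For readers unfamiliar with shading, the idea can be sketched with a `maven-shade-plugin` relocation. This is only an illustration under stated assumptions and is not taken from this PR: the target prefix `org.sparkproject.antlr.v4.runtime` and the choice of module to apply it in are hypothetical, and a real change would also have to cover the generated parser classes that reference the runtime.

```xml
<!-- Hypothetical sketch, not part of this PR: relocate the ANTLR runtime bundled
     with Spark so it cannot clash with the ANTLR 4.9.3 runtime that Hive uses. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <includes>
            <!-- Bundle the ANTLR runtime into the shaded jar -->
            <include>org.antlr:antlr4-runtime</include>
          </includes>
        </artifactSet>
        <relocations>
          <relocation>
            <!-- Package prefix of the ANTLR 4 runtime classes -->
            <pattern>org.antlr.v4.runtime</pattern>
            <!-- Assumed target prefix; any unique package name would work -->
            <shadedPattern>org.sparkproject.antlr.v4.runtime</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With a relocation like this, Spark's own parser would load the relocated runtime classes, while Hive would keep resolving the unshaded `org.antlr.v4.runtime` package at version 4.9.3.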
What changes were proposed in this pull request?
Upgrade Hive dependency to 4.1.0
Why are the changes needed?
Apache Hive 1.x, 2.x and 3.x are EOL and are no longer supported by the Apache Hive community.
Does this PR introduce any user-facing change?
Yes, it drops support for EOL versions of Hive 1.x, 2.x and 3.x.
There is no change in Spark behavior, but there are several changes in Hive 4.x compared to Hive 2.3:
How was this patch tested?
A few tests required modification due to changed Hive behavior; otherwise, all existing Spark tests pass.
Was this patch authored or co-authored using generative AI tooling?
No