Skip to content

Conversation

tompng
Copy link
Member

@tompng tompng commented Aug 20, 2025

x = BigDecimal('9'*(9<<26)); x * x
# 270 days(estimated) → 29 seconds

BigDecimal('9'*(9<<27)).div(BigDecimal('7'*(9<<26)), 9<<26)
# 84 days(estimated) → 184 seconds

NTT(Numeric Theory Translation) multiplication

Calculates multiplication/convolution using NTT with three primes.

# Calculate convolution in mod prime1, prim2 and prime3
conv1 = ntt_inverse(ntt(a, prime1).zip(ntt(b, prime1)).map{ _1 * _2 }, prime1)
conv2 = ...
conv3 = ...
# Restore actual convolution from conv1, conv2, conv3
conv = restore_convolution_from_modulo(conv1, conv2, conv3)

Consider calculating convolution of two arrays. Each array is of size N with array[i] in 0..999999999.
Maximum value of convolution[i] is 999999999**2 * N. This value is larger than 64bit and smaller than 96bit, so we need three 32-bit primes: 29<<27|1, 26<<27|1, 24<<27|1.
These are three largest 32-bit primes that satisfies P > 999999999, and P-1 need to be a multiple of large powers of two.
Constraints from this primes, maximum N is 1<<27.

Combination of primes/sizes

BASE Primes N estimated speed (smaller is better) memo
10**9 32bit x 3 1<<27 1 (baseline) This pull request
10**3 32bit x 1 1<<12 1 N is too small
10**6 32bit x 2 1<<24 1 N is small
10**6 64bit x 1 1<<24 2 N is small, needs uint128_t
10**14 64bit x 2 1<<34 12/7 Needs uint128_t

Multiplication of various size bigdecimal

Considering xx_xx_xx * yy
Calculate by convolution(ntt(xx_xx_xx), ntt(00_00_yy)) is possible, but repeating convolution(ntt(xx), ntt(yy)) is faster.

# aaaa_bbbb_cccc * yyyy
ntt_y = ntt(yyyy) # Calculate once and reuse
convolution(ntt(aaaa), ntt_y)
convolution(ntt(bbbb), ntt_y)
convolution(ntt(cccc), ntt_y)

If n_significant_digits is both larger than 1<<<26 == 603979776, multiplication fails with Error(too large).

Newton-Raphson division

X / Y can be calculated by X * Yinv
and Yinv can be calculated only by add/sub/mult using Newton's method.

x = 0.1
10.times { x = x * (2 - 7 * x) }
x #=> 0.14285714285714285 (== 1/7.0)

Division of various size bigdecimal

Considering 1111_1111_1111_1111.div(7777_7777_7777)
Required precision is 4. Calculating inverse of 7777_7777_7777 in 4 digit is enough.
1111_1111_1111_1111 * 1.285e-12 == 1428

Considering 1111_1111_1111_1111.div(7777)
We can calculate this by repeating xxxx_xxxx.divmod(7777) 3 times.
Calculating inverse of 7777 in 4 digit is enough.

Generic case: Split X into several blocks
xxx_xxxxx_xxxxx_xxxxx / yyyy
Can be calculated by repeating xxxx_xxxxx.divmod(yyyy) 3 times.
xxx_xxxx_xxxx / yyyyy
Can be calculated by repeating xxxxx_xxxx.divmod(yyyyy) 2 times.
xxxxx_xxxxxxx / yyyyy
Can be calculated by repeating xxxxxx_xxxxxxx.divmod(yyyyy) 1 time.

@tompng tompng force-pushed the ntt_mult_div branch 7 times, most recently from 2752dee to 8a5780b Compare August 25, 2025 11:24
Performs ntt with three primes (29<<27|1, 26<<27|1, 24<<27|1)
@tompng tompng force-pushed the ntt_mult_div branch 2 times, most recently from fabc2d2 to 236c1ff Compare August 27, 2025 16:57
Improve performance of huge divisions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant