A comprehensive, production-ready Fast Fourier Transform (FFT) library with automatic algorithm selection, GPU acceleration, and cross-platform support.
- Automatic Algorithm Selection: New intelligent API that chooses the best algorithm
- GPU Acceleration: CUDA support for NVIDIA GPUs and Metal Performance Shaders for Apple Silicon
- Redesigned API: Simplified interface with `fft_auto()` for ease of use
- Cross-Platform: Full compatibility across Linux, macOS, and Windows
- Bug Fixes: Critical Bluestein algorithm fix for prime-sized transforms
- Radix-2 DIT/DIF: Classic Cooley-Tukey implementations
- Radix-4: Higher radix with roughly 25% fewer complex multiplications than radix-2
- Split-Radix: Among the lowest arithmetic operation counts for power-of-two sizes
- Bluestein: Arbitrary size FFTs (with v2.0.0 bug fixes)
- Mixed-Radix: Efficient for composite sizes
- Automatic Selection: Let the library choose the best algorithm
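How the library actually picks an algorithm is not described here, so the following is only a minimal sketch of what size-based selection could look like. The enum, the function names, and the "largest prime factor of at most 7" cutoff are all hypothetical and are not part of the library's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for illustration; not the library's identifiers. */
typedef enum {
    ALGO_RADIX4,
    ALGO_RADIX2,
    ALGO_MIXED_RADIX,
    ALGO_BLUESTEIN
} algo_t;

bool is_power_of_two(size_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* A power of four is a power of two whose single set bit sits at an even position. */
bool is_power_of_four(size_t n) {
    if (!is_power_of_two(n)) return false;
    unsigned bit = 0;
    while ((n >> bit) != 1) bit++;
    return bit % 2 == 0;
}

/* Largest prime factor by trial division; cheap enough for a one-off planning step. */
size_t largest_prime_factor(size_t n) {
    size_t largest = 1;
    for (size_t p = 2; p * p <= n; p++) {
        while (n % p == 0) { largest = p; n /= p; }
    }
    return n > 1 ? n : largest;
}

/* Map a transform size to an algorithm family. */
algo_t choose_algorithm(size_t n) {
    assert(n > 0);
    if (is_power_of_four(n)) return ALGO_RADIX4;
    if (is_power_of_two(n)) return ALGO_RADIX2;
    /* Assumed cutoff: sizes built from small primes suit mixed-radix. */
    if (largest_prime_factor(n) <= 7) return ALGO_MIXED_RADIX;
    /* Large prime factors (including prime n) go to Bluestein. */
    return ALGO_BLUESTEIN;
}
```

Under these assumed rules, 1024 maps to radix-4, 512 to radix-2, 1000 to mixed-radix, and the prime 1009 to Bluestein. A real planner would likely also weigh measured timings and hardware.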
- NVIDIA CUDA: Massive speedups on NVIDIA GPUs
- Apple Metal: Optimized for M1/M2/M3 processors
- Automatic GPU Detection: Falls back to CPU if GPU unavailable
- Audio spectrum analysis with windowing
- Digital filtering (low-pass, high-pass, band-pass)
- Fast convolution
- Power spectrum estimation
- 2D image FFT processing
- SIMD vectorization (SSE, AVX, AVX-512, NEON)
- Multi-threaded execution with OpenMP
- Cache-optimized memory access
- Fixed-point arithmetic for embedded systems
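To make the fixed-point item concrete, here is a minimal sketch of Q15 arithmetic, a format commonly used for FFTs on embedded targets. The type and function names are hypothetical, and the number format the library actually uses is not stated here.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t q15_t;  /* Q15: value = raw / 32768, range [-1, 1) */

/* Convert with round-to-nearest, clamping the one value that would overflow. */
q15_t q15_from_double(double v) {
    assert(v >= -1.0 && v < 1.0);
    int32_t r = (int32_t)(v * 32768.0 + (v >= 0 ? 0.5 : -0.5));
    if (r > INT16_MAX) r = INT16_MAX;
    return (q15_t)r;
}

double q15_to_double(q15_t q) {
    return (double)q / 32768.0;
}

/* Multiply with rounding and saturation. The product is widened to 32 bits
 * so it cannot overflow. Assumes arithmetic right shift for negative values,
 * which mainstream compilers provide. */
q15_t q15_mul(q15_t a, q15_t b) {
    int32_t p = (int32_t)a * (int32_t)b;   /* Q30 intermediate */
    p = (p + (1 << 14)) >> 15;             /* round, then scale back to Q15 */
    if (p > INT16_MAX) p = INT16_MAX;      /* only (-1) * (-1) reaches this */
    return (q15_t)p;
}
```

A fixed-point butterfly is built from multiplies like this one, usually with a right shift per stage to keep intermediate sums in range.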
#include <math.h>      // sin()
#include <fft_auto.h>
int main() {
int n = 1024;
complex_t* signal = fft_alloc_complex(n);
// Generate signal
for (int i = 0; i < n; i++) {
signal[i] = sin(2 * PI * 50 * i / 1000.0); // 50 Hz tone, 1000 Hz sample rate
}
// Automatic FFT - chooses best algorithm and uses GPU if available
fft_auto(signal, signal, n, -1); // -1 for forward FFT
// Process results...
fft_free(signal);
return 0;
}
// Create optimized plan
fft_plan_t plan = fft_plan_dft_1d(n, signal, signal, -1,
FFT_MEASURE | FFT_PREFER_GPU);
// Execute multiple times with same plan
for (int i = 0; i < 1000; i++) {
generate_signal(signal, n);
fft_execute(plan);
process_spectrum(signal, n);
}
fft_destroy_plan(plan);
- C compiler with C99 support (GCC 4.8+, Clang 3.4+, MSVC 2015+)
- Optional: CUDA Toolkit 11.0+ for NVIDIA GPU support
- Optional: Xcode 12+ for Metal support on macOS
# Clone repository
git clone https://github.com/muditbhargava66/FFT-implementation-in-C.git
cd FFT-implementation-in-C
# Quick build
./quickstart.sh
# Or manual build
make all # Build everything
make gpu-demo # Build GPU demonstrations
make install # Install system-wide
# For OpenMP support
brew install gcc
export CC=gcc-13
# Metal support is automatic on Apple Silicon
# For CUDA support
# Install CUDA Toolkit from NVIDIA
# Build with GPU support
make all
# Use WSL or MinGW
# Visual Studio project coming soon
#include <fft_auto.h>
// Allocate aligned memory
complex_t* data = fft_alloc_complex(1024);
// Perform FFT with automatic optimization
fft_auto(data, data, 1024, FFT_FORWARD);
// For inverse FFT
fft_auto(data, data, 1024, FFT_INVERSE);
// Free memory
fft_free(data);
// Check GPU availability
if (fft_gpu_available()) {
printf("GPU: %s\n", fft_gpu_get_device_name());
}
// Prefer the GPU; the library falls back to the CPU if no GPU is available
fft_plan_t plan = fft_plan_dft_1d(n, in, out, -1, FFT_PREFER_GPU);
double* real_signal = fft_alloc_real(1024);
complex_t* spectrum = fft_alloc_complex(513); // n/2 + 1
fft_plan_t plan = fft_plan_r2c_1d(1024, real_signal, spectrum, FFT_ESTIMATE);
fft_execute(plan);
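The `n/2 + 1` output length above follows from the conjugate symmetry of a real signal's spectrum: X[n-k] = conj(X[k]). The sketch below shows how the full spectrum could be rebuilt from the stored half. The helper names are hypothetical, and it assumes the half spectrum is stored in bins 0 to n/2, as in FFTW-style real transforms.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Number of complex bins a real-to-complex transform of length n keeps. */
size_t r2c_output_len(size_t n) {
    assert(n > 0);
    return n / 2 + 1;
}

/* Rebuild all n bins from the stored half using X[n-k] = conj(X[k]). */
void expand_half_spectrum(const double complex *half, double complex *full, size_t n) {
    size_t m = r2c_output_len(n);
    for (size_t k = 0; k < m; k++) full[k] = half[k];
    for (size_t k = m; k < n; k++) full[k] = conj(half[n - k]);
}
```

For `n = 1024` this gives 513 stored bins, matching the allocation above. The remaining 511 bins carry no new information, which is why the real-input transform skips them.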
Size (points) | CPU (AVX2) | GPU (CUDA) | Speedup |
---|---|---|---|
1K | 0.08 ms | 0.02 ms | 4x |
16K | 1.8 ms | 0.15 ms | 12x |
256K | 35 ms | 1.2 ms | 29x |
1M | 150 ms | 4.5 ms | 33x |
Size (points) | CPU (NEON) | GPU (Metal) | Speedup |
---|---|---|---|
1K | 0.06 ms | 0.03 ms | 2x |
16K | 1.2 ms | 0.20 ms | 6x |
256K | 28 ms | 2.1 ms | 13x |
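The speedup column in the tables above is the CPU time divided by the GPU time. The small helpers below reproduce that arithmetic and add a throughput figure. They are hypothetical, and the throughput example assumes that "1M" means 1,048,576 points.

```c
#include <assert.h>

/* Speedup of a GPU run over a CPU run, both measured in milliseconds. */
double speedup(double cpu_ms, double gpu_ms) {
    assert(cpu_ms > 0.0 && gpu_ms > 0.0);
    return cpu_ms / gpu_ms;
}

/* Throughput in millions of points per second for an n-point transform. */
double mpoints_per_second(double n_points, double elapsed_ms) {
    assert(n_points > 0.0 && elapsed_ms > 0.0);
    return n_points / (elapsed_ms * 1000.0);
}
```

For the 1M CUDA row, `speedup(150.0, 4.5)` is about 33.3, which the table rounds to 33x, and `mpoints_per_second(1048576.0, 4.5)` is roughly 233 million points per second.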
Full documentation is available at: https://fft-implementation-in-c.readthedocs.io/
We welcome contributions! See CONTRIBUTING.md for guidelines.
- Additional GPU backends (OpenCL, ROCm)
- More algorithms (Prime Factor, Winograd)
- Language bindings (Python, Julia, Rust)
- Performance optimizations
The v2.0.0 API is mostly backward compatible. Key changes:
// Old API (v1.x)
radix2_dit_fft(signal, n, FFT_FORWARD);
// New API (v2.0) - automatic optimization
fft_auto(signal, signal, n, -1);
// Or use planning for repeated transforms
fft_plan_t plan = fft_plan_dft_1d(n, signal, signal, -1, FFT_MEASURE);
fft_execute(plan);
- Original FFT algorithm by Cooley and Tukey
- Inspired by FFTW's planning approach
- GPU implementations based on cuFFT and Metal Performance Shaders
- Community contributors and testers
Note: This is a high-performance library suitable for production use. For educational purposes, explore the `algorithms/core/` directory for well-documented implementations.
Star the repo and consider contributing!
Contact: @muditbhargava66 | Report Issues: Issue Tracker | Documentation | Discussions
© 2025 Mudit Bhargava. MIT License