About the Project
Engineered a scalable, distributed algorithm for approximating the Max-Cut problem on large graphs using the Cross-Entropy (CE) method. The solution used a hybrid MPI+CUDA architecture organized as a distributed 'island model' on the AiMOS supercomputer. Each MPI rank managed an independent CE process on NVIDIA V100 GPUs, using massively parallel kernels to sample and evaluate millions of candidate partitions per second. Bitwise optimizations accelerated the kernels, and periodic synchronization between islands maintained global search efficiency and prevented stagnation in local optima. The project achieved near-linear speedup in strong-scaling tests on up to 16 GPUs and found solutions within 1.12% of the best-known results on the G-Set benchmark.
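
For readers unfamiliar with the CE method, the core loop of a single island can be sketched on the CPU as follows. This is a minimal illustration, not the project's MPI+CUDA code: the function names (cut_value, ce_max_cut), the parameters (samples, elite_frac, alpha, iters), the bitmask encoding of partitions, and the pinning of vertex 0 to one side are all assumptions made for the example. The bitmask cut count shows only one plausible kind of bitwise optimization; the project's actual kernels may differ.

```python
import random


def cut_value(adj_masks, side_mask):
    """Count cut edges for a partition packed into an integer bitmask.

    Bit v of side_mask is 1 when vertex v is on side 1; adj_masks[v] has
    bit u set when the edge (v, u) exists. (Hypothetical encoding.)
    """
    full = (1 << len(adj_masks)) - 1
    total = 0
    for v, nbrs in enumerate(adj_masks):
        # Select the vertices on the opposite side from v.
        opposite = (~side_mask & full) if (side_mask >> v) & 1 else side_mask
        total += bin(nbrs & opposite).count("1")
    return total // 2  # each cut edge was counted from both endpoints


def ce_max_cut(n, edges, samples=200, elite_frac=0.1, alpha=0.7,
               iters=40, seed=0):
    """Single-island Cross-Entropy search; returns (best_cut, best_mask)."""
    rng = random.Random(seed)
    adj = [0] * n
    for u, v in edges:
        adj[u] |= 1 << v
        adj[v] |= 1 << u

    p = [0.5] * n   # p[v] = probability that vertex v is placed on side 1
    p[0] = 0.0      # assumption: pin vertex 0 to side 0 to break symmetry
    n_elite = max(1, int(samples * elite_frac))
    best_val, best_mask = -1, 0

    for _ in range(iters):
        # 1. Sample candidate partitions from independent Bernoulli(p[v]).
        scored = []
        for _ in range(samples):
            mask = 0
            for v in range(n):
                if rng.random() < p[v]:
                    mask |= 1 << v
            scored.append((cut_value(adj, mask), mask))

        # 2. Keep the elite fraction with the largest cuts.
        scored.sort(reverse=True)
        elite = scored[:n_elite]
        if elite[0][0] > best_val:
            best_val, best_mask = elite[0]

        # 3. Move p toward the elite bit frequencies (smoothed update).
        for v in range(1, n):
            freq = sum((m >> v) & 1 for _, m in elite) / n_elite
            p[v] = alpha * freq + (1 - alpha) * p[v]

    return best_val, best_mask


if __name__ == "__main__":
    # Complete graph on 4 vertices; its maximum cut has 4 edges.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(ce_max_cut(4, k4))
```

In an island-model setting, each rank would run a loop like this with its own seed and periodically exchange information such as the best partition or the probability vector; the exact exchange used in the project is not specified here.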