-
Real-Time Cryptocurrency Trade Correlation Engine: A High-Performance C++ Implementation
Production-quality C++ system for real-time cryptocurrency trade aggregation and correlation detection across multiple exchanges
-
From 0.37x to 18.7x: Building a High-Performance SIMD Library with AVX-512 Speedups in Data Science, Inference, & HPC Workloads
A technical walkthrough of building a high-performance SIMD library, reaching speedups of up to 18.7x through masked operations, support for multiple data types, and advanced CPU feature detection.
-
Lock-Free Queues with Advanced Memory Reclamation: A Deep Dive into Epoch-Based Reclamation and Hazard Pointers
Understanding how modern concurrent systems solve the memory reclamation problem in lock-free data structures
-
From 245s to 0.37s: Optimizing an MPI Traveling Salesman Solver
A technical walkthrough of four iterations of an MPI-based TSP solver, achieving a 635× performance improvement through algorithmic enhancements, hybrid parallelization, and careful engineering.
-
Level 3 mini_malloc: A Security-Enhanced Memory Allocator with Debugging Features
A technical deep dive into mini_malloc, a memory allocator that showcases security-enhanced design patterns and debugging infrastructure. It demonstrates arena-based concurrency, immediate coalescing, dual allocation strategies, and corruption detection mechanisms. The post includes the complete implementation (~800 lines), comprehensive test coverage, and a detailed performance comparison against the system malloc.