Nezha: Deployable and High-Performance Consensus Using Synchronized Clocks
- Develop deadline-ordered multicast (DOM), a common primitive based on synchronized
clocks
- Develop Nezha, a consensus protocol built on DOM. Nezha has been shown to outperform
typical consensus protocols,
including Multi-Paxos, Raft, NOPaxos, Fast Paxos, and EPaxos.
- Nezha is open-sourced here,
and we plan to integrate it with industrial systems that require
high-performance consensus.
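A minimal sketch of the idea behind deadline-ordered multicast, assuming the sender stamps each message with a deadline (its send time plus a latency bound) and each receiver holds messages until its synchronized clock passes that deadline, then releases them in deadline order. The names `Message`, `stamp`, and `DomReceiver` are hypothetical and are not taken from the Nezha code base.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    # Ordering uses (deadline, sender_id); the payload is ignored when comparing.
    deadline: float
    sender_id: int
    payload: str = field(compare=False)


def stamp(send_time: float, latency_bound: float, sender_id: int, payload: str) -> Message:
    """Sender side (assumed): deadline = send time + estimated latency bound."""
    return Message(send_time + latency_bound, sender_id, payload)


class DomReceiver:
    """Receiver side (assumed): buffer messages, release them in deadline order."""

    def __init__(self) -> None:
        self._pending: list[Message] = []

    def on_receive(self, msg: Message) -> None:
        heapq.heappush(self._pending, msg)

    def release(self, now: float) -> list[Message]:
        # Release every buffered message whose deadline has passed on the local clock.
        released = []
        while self._pending and self._pending[0].deadline <= now:
            released.append(heapq.heappop(self._pending))
        return released
```

Under these assumptions, two receivers that see the same messages in different arrival orders still release them in the same order, provided every message arrives before its deadline.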
CloudEx: Fair-Access Financial Trading System in Cloud
- Implement a fair trading mechanism in CloudEx and provide user-friendly APIs
- Optimize the performance of CloudEx
- Implement fault tolerance for CloudEx
Rima: An RDMA-Accelerated Model-Parallelized Solution to Large-Scale Matrix
Factorization
- Redesign the architecture of distributed matrix factorization, using a ring-based
architecture instead of a parameter-server (PS) architecture to eliminate the
centralized bottleneck and data redundancy
- Design a one-step transformation strategy to halve the communication workload for
large-scale matrix factorization
- Design three partial-randomness strategies to make the algorithm more robust
- Overlap disk I/O with communication and computation to hide its overhead
- Conduct a comparative testbed experiment between Rima and DSGD
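A minimal sketch of how a ring-based layout can avoid a central parameter server, assuming each worker keeps its own partition fixed and passes one rotating block to its successor after every step. This is a generic ring rotation schedule, not Rima's actual implementation; `ring_schedule` and `successor` are hypothetical names.

```python
def successor(worker: int, num_workers: int) -> int:
    """Neighbor that receives this worker's block after each step."""
    return (worker + 1) % num_workers


def ring_schedule(num_workers: int) -> list[list[int]]:
    """schedule[s][w] is the index of the rotating block held by worker w at step s.

    Block b starts on worker b and moves one hop along the ring per step, so no
    block is ever held by two workers at once and no central server is involved.
    """
    return [
        [(w - s) % num_workers for w in range(num_workers)]
        for s in range(num_workers)
    ]


schedule = ring_schedule(3)
# schedule == [[0, 1, 2], [2, 0, 1], [1, 2, 0]]
```

After `num_workers` steps every worker has processed every block exactly once, and each worker only ever communicates with its ring neighbors.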
HiPS: Hierarchical Parameter Synchronization in Large-Scale Data Center Network
- Incorporate data center network (DCN) topology into distributed machine learning (DML)
to boost performance
- Design highly efficient synchronization algorithms on top of server-centric topologies
to better exploit the benefits of RDMA
- Implement a prototype of BCube+HiPS in TensorFlow, and conduct comparative experiments
on a 2-layer BCube testbed
LOS: High Performance and Strong Compatible User-level Network Stack
- Design and implement a user-level stack based on DPDK to achieve high performance and
strong compatibility
- Implement a user-level Netfilter using a dynamic library
- Port Nginx to LOS without changing its source code