Nezha: Deployable and High-Performance Consensus Using Synchronized Clocks
- Develop deadline-ordered multicast (DOM), a common primitive based on synchronized clocks (a minimal sketch follows this list)
- Develop Nezha, a consensus protocol built on DOM; Nezha has been shown to outperform typical consensus protocols including Multi-Paxos, Raft, NOPaxos, Fast Paxos, and EPaxos
- Nezha is open-sourced, and we plan to integrate it with industrial systems that require high-performance consensus
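The DOM idea can be illustrated with a minimal, single-process Python sketch (not the actual Nezha implementation; DomSender, DomReceiver, clock_ns, and the 2 ms delay bound are all hypothetical): the sender stamps each message with a deadline equal to its send time plus a one-way delay bound, and each receiver buffers messages and releases them in deadline order once its synchronized clock passes the deadline, so receivers that get the same messages release them in the same order.

```python
import heapq
import time

def clock_ns() -> int:
    """Stand-in for a synchronized clock (e.g., Huygens); just the local clock here."""
    return time.monotonic_ns()

class DomSender:
    """Stamps every message with a deadline = send time + a one-way delay bound."""
    def __init__(self, delay_bound_ns: int):
        self.delay_bound_ns = delay_bound_ns

    def stamp(self, msg):
        return (clock_ns() + self.delay_bound_ns, msg)

class DomReceiver:
    """Buffers stamped messages and releases them in deadline order once the
    clock has passed the deadline."""
    def __init__(self):
        self._heap = []   # (deadline, seq, msg)
        self._seq = 0     # tie-breaker for equal deadlines

    def on_message(self, stamped):
        deadline, msg = stamped
        heapq.heappush(self._heap, (deadline, self._seq, msg))
        self._seq += 1

    def release_ready(self):
        """Pop every buffered message whose deadline has already passed."""
        now = clock_ns()
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, _, msg = heapq.heappop(self._heap)
            ready.append(msg)
        return ready

if __name__ == "__main__":
    sender = DomSender(delay_bound_ns=2_000_000)   # hypothetical 2 ms bound
    receiver = DomReceiver()
    for payload in ("a", "b", "c"):
        receiver.on_message(sender.stamp(payload))
    time.sleep(0.005)                              # let the deadlines pass
    print(receiver.release_ready())                # -> ['a', 'b', 'c']
```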
CloudEx: A Next-Generation High-Frequency Financial Trading System in the Cloud
- Integrate the Huygens clock-synchronization API to implement a fairness mechanism across traders (a minimal sketch follows this list)
- Implement and continuously optimize the main features of CloudEx
- Currently focusing on the fault tolerance of CloudEx
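A rough sketch of the fairness idea under synchronized clocks (an illustration only, not the CloudEx codebase; FairSequencer, sync_clock_ns, and the 1 ms hold window are hypothetical): each order is held for a short window after its gateway-stamped submission time and then matched in submission order, so a trader on a slower network path is not penalized as long as its orders arrive within the window.

```python
import heapq
import time

def sync_clock_ns() -> int:
    """Stand-in for a Huygens-synchronized clock; just the local clock here."""
    return time.monotonic_ns()

class FairSequencer:
    """Holds each order until submit_ts + hold_ns, then releases orders in
    submission-time order rather than arrival order."""
    def __init__(self, hold_ns: int):
        self.hold_ns = hold_ns
        self._pending = []   # (submit_ts, trader_id, order)

    def submit(self, trader_id, order, submit_ts=None):
        ts = sync_clock_ns() if submit_ts is None else submit_ts
        heapq.heappush(self._pending, (ts, trader_id, order))

    def pop_ready(self):
        """Return orders whose hold window has expired, earliest submission first."""
        now = sync_clock_ns()
        ready = []
        while self._pending and self._pending[0][0] + self.hold_ns <= now:
            _, trader_id, order = heapq.heappop(self._pending)
            ready.append((trader_id, order))
        return ready

if __name__ == "__main__":
    seq = FairSequencer(hold_ns=1_000_000)   # hypothetical 1 ms hold window
    seq.submit("trader_B", "SELL 10")        # B's order arrives first ...
    seq.submit("trader_A", "BUY 10",         # ... but A submitted it earlier
               submit_ts=sync_clock_ns() - 500_000)
    time.sleep(0.002)
    print(seq.pop_ready())                   # A's order is released before B's
```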
Fela: Flexible and Elastic Distributed Machine Learning
- Incorporate flexible parallelism and an elastic tuning mechanism to accelerate distributed machine learning
- Integrate a variety of scheduling policies (inspired by token scheduling) to optimize DML performance (a scheduling sketch follows this list)
- Implement the prototype atop PyTorch, using a carefully designed virtual-layer mechanism
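As an illustration of one possible elastic scheduling policy (not Fela's actual scheduler; the job names and throughput curves below are made up), the sketch greedily hands each additional worker to the job with the largest marginal throughput gain:

```python
def greedy_allocate(total_workers, speedup):
    """Give each additional worker to the job with the largest marginal gain,
    where speedup[job][k] is the (pre-profiled) throughput with k workers."""
    alloc = {job: 0 for job in speedup}
    for _ in range(total_workers):
        def gain(job):
            k, curve = alloc[job], speedup[job]
            return curve[k + 1] - curve[k] if k + 1 < len(curve) else 0.0
        best = max(alloc, key=gain)
        if gain(best) <= 0:
            break                      # no job benefits from another worker
        alloc[best] += 1
    return alloc

if __name__ == "__main__":
    # Hypothetical profiled throughput (samples/s) at 0..4 workers per job.
    curves = {
        "job_sgd":  [0, 100, 180, 230, 260],
        "job_adam": [0, 80, 150, 200, 240],
    }
    print(greedy_allocate(4, curves))  # -> {'job_sgd': 2, 'job_adam': 2}
```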
R2-SVM: Large-Scale Support Vector Machine Acceleration
- Redesign the tree-based structure of existing distributed solutions to large-scale SVM (e.g., Cascade-SVM, DC-SVM)
- Design an interchangeable block-rotation strategy to eliminate skewed updates of the Lagrange multipliers (a minimal rotation sketch follows this list)
- Incorporate a hybrid synchronous-parallel mode to accelerate the algorithm's convergence
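A minimal sketch of the block-rotation schedule only (the SVM solver itself is omitted; rotation_schedule and the worker/block counts are hypothetical): each round, every worker moves on to the next data block in round-robin order, so every block's Lagrange multipliers are updated equally often.

```python
def rotation_schedule(num_workers, num_rounds):
    """In round r, worker w owns block (w + r) % num_workers, so after
    num_workers rounds every worker has visited every block exactly once."""
    return [{w: (w + r) % num_workers for w in range(num_workers)}
            for r in range(num_rounds)]

if __name__ == "__main__":
    for r, assignment in enumerate(rotation_schedule(num_workers=3, num_rounds=3)):
        print(f"round {r}: worker -> block {assignment}")
    # round 0: {0: 0, 1: 1, 2: 2}
    # round 1: {0: 1, 1: 2, 2: 0}
    # round 2: {0: 2, 1: 0, 2: 1}
```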
Rima: An RDMA-Accelerated Model-Parallelized Solution to Large-Scale Matrix Factorization
- Redesign the architecture of distributed matrix factorization: use a ring-based architecture instead of a PS-based one to eliminate the centralized bottleneck and data redundancy (a minimal ring sketch follows this list)
- Design a one-step transformation strategy that halves the communication workload of large-scale matrix factorization
- Design three partial-randomness strategies to make the algorithm more robust
- Overlap disk I/O with communication and computation
- Conduct comparative testbed experiments between Rima and DSGD
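A toy NumPy sketch of the ring-based idea (DSGD-style block rotation with no parameter server); ring_mf, the block layout, and every hyperparameter here are hypothetical and unrelated to Rima's actual RDMA implementation. Each worker keeps its user-factor block and rotates the item-factor blocks around the ring after a local SGD pass, so factors are exchanged only between ring neighbors.

```python
import numpy as np

def ring_mf(R_blocks, rank=8, rounds=4, lr=0.01, reg=0.1, seed=0):
    """R_blocks[w][b] is the (users_w x items_b) rating block; zeros mean 'unrated'.
    Worker w always holds U[w]; the item blocks V[b] rotate around the ring."""
    rng = np.random.default_rng(seed)
    W = len(R_blocks)
    U = [rng.standard_normal((R_blocks[w][0].shape[0], rank)) * 0.1 for w in range(W)]
    V = [rng.standard_normal((R_blocks[0][b].shape[1], rank)) * 0.1 for b in range(W)]
    owner = list(range(W))                      # owner[w] = item block currently at worker w
    for _ in range(rounds):
        for w in range(W):                      # local SGD pass on the resident block pair
            b = owner[w]
            R = R_blocks[w][b]
            for i, j in zip(*R.nonzero()):
                ui, vj = U[w][i].copy(), V[b][j].copy()
                err = R[i, j] - ui @ vj
                U[w][i] += lr * (err * vj - reg * ui)
                V[b][j] += lr * (err * ui - reg * vj)
        owner = owner[1:] + owner[:1]           # rotate item blocks to the next worker
    return U, V

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    R = rng.integers(0, 5, size=(6, 6)).astype(float)        # toy 6x6 rating matrix
    blocks = [[R[0:3, 0:3], R[0:3, 3:6]],                     # 2 workers x 2 item blocks
              [R[3:6, 0:3], R[3:6, 3:6]]]
    U, V = ring_mf(blocks)
    print("factor block shapes:", U[0].shape, V[0].shape)     # (3, 8) (3, 8)
```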
SmartPS: Accelerating Distributed Machine Learning by Smart Parameter Server
- Design a new parameter abstraction (Parameter Unit) for distributed machine learning (DML)
- Incorporate four strategies to accelerate DML under the PS-based architecture: selective update, proactive update, straggler assistance, and blocking unnecessary pushes (selective update is sketched after this list)
- Incorporate a priority-based transmission strategy to mitigate the performance gap between workers (especially in heterogeneous clusters), and conduct a comparative evaluation with 17 VPSes on Aliyun
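A minimal sketch of selective update combined with priority ordering (SelectiveServer, pull_updates, and the 0.05 threshold are hypothetical; the real SmartPS tracks more state than this): the server transmits only parameter units whose accumulated change exceeds the threshold, largest change first.

```python
import numpy as np

class SelectiveServer:
    """PS-side sketch: track per-unit accumulated change since the last push and,
    on each pull, send only units whose change exceeds `threshold`, biggest first."""
    def __init__(self, units, threshold=0.05):
        self.params = {k: np.array(v, dtype=float) for k, v in units.items()}
        self.snapshot = {k: v.copy() for k, v in self.params.items()}  # last pushed version
        self.threshold = threshold

    def apply_gradient(self, key, grad, lr=0.1):
        self.params[key] -= lr * np.asarray(grad, dtype=float)

    def pull_updates(self):
        """Return {unit: new value} for units that changed enough, by priority."""
        changed = sorted(
            ((float(np.linalg.norm(v - self.snapshot[k])), k) for k, v in self.params.items()),
            reverse=True)
        updates = {}
        for delta, key in changed:
            if delta <= self.threshold:
                break                                  # the rest changed even less
            updates[key] = self.params[key].copy()
            self.snapshot[key] = self.params[key].copy()
        return updates

if __name__ == "__main__":
    ps = SelectiveServer({"unit_a": [1.0, 1.0], "unit_b": [1.0, 1.0]})
    ps.apply_gradient("unit_a", [5.0, 5.0])   # large change
    ps.apply_gradient("unit_b", [0.1, 0.1])   # tiny change, stays below the threshold
    print(list(ps.pull_updates()))            # -> ['unit_a']
```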
HiPS: Hierarchical Parameter Synchronization in Large-Scale DML
- Combine data center network (DCN) topology with distributed machine learning (DML) to boost performance
- Design highly efficient synchronization algorithms on top of server-centric topologies to better exploit the benefits of RDMA (a hierarchical-sync sketch follows this list)
- Implement a prototype of BCube+HiPS in TensorFlow, and conduct comparative experiments on a 2-layer BCube testbed
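A minimal NumPy sketch of two-level hierarchical synchronization (the actual HiPS algorithms over BCube and RDMA are considerably more involved; the grouping below is made up): gradients are summed within each server group first, the per-group sums are then combined across groups so only one aggregate per group crosses the upper level of the topology, and the global average is broadcast back to every worker.

```python
import numpy as np

def hierarchical_sync(grads_by_group):
    """grads_by_group[g] is the list of gradient vectors produced by group g's workers."""
    num_workers = sum(len(g) for g in grads_by_group)
    group_sums = [np.sum(g, axis=0) for g in grads_by_group]   # 1) intra-group reduce
    global_sum = np.sum(group_sums, axis=0)                    # 2) inter-group reduce
    return global_sum / num_workers                            # 3) broadcast this average

if __name__ == "__main__":
    # Two server groups with two workers each (e.g., two BCube server groups).
    group0 = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
    group1 = [np.array([5.0, 6.0]), np.array([7.0, 8.0])]
    print(hierarchical_sync([group0, group1]))   # -> [4. 5.]
```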
Hosepipe: Resource Management in Data Centers
- Design new schemes to control and manage bandwidth resources in multi-tenant data center environments (a minimal rate-limiting sketch follows this list)
- Implement a kernel module and test its performance on a 4-node cluster
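The kernel module itself is not reproduced here; as a stand-in, the sketch below captures per-tenant rate limiting as a simple user-space token bucket in Python, with made-up rate and burst values.

```python
import time

class TokenBucket:
    """Per-tenant limiter sketch: tokens refill at `rate_bps` bytes/s up to `burst`
    bytes, and a packet is admitted only if enough tokens are available."""
    def __init__(self, rate_bps, burst):
        self.rate_bps = rate_bps
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def admit(self, pkt_bytes):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate_bps)
        self.last = now
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes
            return True
        return False        # caller queues or drops the packet

if __name__ == "__main__":
    bucket = TokenBucket(rate_bps=1_000_000, burst=10_000)   # hypothetical ~1 MB/s tenant cap
    print([bucket.admit(1500) for _ in range(8)])            # first ~6 admitted, then throttled
```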
LOS: A High-Performance and Highly Compatible User-Level Network Stack
- Design and implement a user-level network stack based on DPDK to achieve high performance and strong compatibility
- Implement a user-level Netfilter as a dynamic library
- Port Nginx onto LOS without changing its source code
Software Vulnerability Modeling
- Model software vulnerabilities and evaluate software trustworthiness
- Conduct an extensive study on common public vulnerability datasets
Database Intrusion Detection
- Design a novel clustering-based intrusion detection algorithm (a minimal clustering sketch follows this list)
- Participate in the development of a database intrusion detection system
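A minimal sketch of the clustering-based idea using scikit-learn's KMeans (not the deployed system; the feature representation and the threshold rule are hypothetical): cluster feature vectors of normal queries, then flag a new query as suspicious when it lies far from every learned cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_profile(normal_features, k=3, seed=0):
    """Cluster normal query features (e.g., tables touched, rows read, time of day)
    and derive a distance threshold from the training distances."""
    km = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(normal_features)
    dists = np.min(km.transform(normal_features), axis=1)
    return km, dists.mean() + 3 * dists.std()        # hypothetical cutoff

def is_intrusion(km, threshold, query_features):
    """Flag the query if it is far from every cluster of normal behavior."""
    return float(np.min(km.transform([query_features]))) > threshold

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 0.5, size=(200, 2))     # toy "normal" query features
    km, thr = fit_profile(normal)
    print(is_intrusion(km, thr, [0.2, -0.1]))        # close to normal behavior -> False
    print(is_intrusion(km, thr, [8.0, 8.0]))         # far from all clusters   -> True
```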
Artificial Neural Network
- Study the basics of artificial neural networks
- Apply typical intelligent algorithms (e.g., ANN, PSO) to signal processing (a minimal PSO sketch follows this list)
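A minimal particle swarm optimization (PSO) sketch over a toy sphere objective, standing in for the actual signal-processing cost functions; the constants are conventional PSO defaults rather than values from the original work.

```python
import numpy as np

def pso(objective, dim=2, n_particles=20, iters=100, seed=0,
        w=0.7, c1=1.5, c2=1.5, bound=5.0):
    """Each particle is pulled toward its own best position (c1) and the swarm's
    best position (c2), with inertia w; positions are clipped to [-bound, bound]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-bound, bound, (n_particles, dim))   # positions
    v = np.zeros_like(x)                                 # velocities
    pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, -bound, bound)
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(np.min(pbest_val))

if __name__ == "__main__":
    sphere = lambda p: float(np.sum(p ** 2))   # toy objective; a real use would plug in a filter-design cost
    best, val = pso(sphere)
    print(best, val)                           # converges near the origin
```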