About Me

Dr. Qi Zheng is an Assistant Professor at the College of Electronic and Information Engineering, Shenzhen University. She earned her bachelor’s and master’s degrees in information engineering from Huazhong University of Science and Technology, where she conducted research under the guidance of Wei Yuan and Xinge You, respectively. She then pursued her Ph.D. in computer science at the University of Sydney, working with Dacheng Tao, where she focused on multimodal learning.

Research Focus

Embodied Intelligence: Indoor Embodied Navigation & Manipulation
3D Human-Object Interaction (HOI): Reconstruction, Generation, Skill-learning

Join Our Team in Fall 2026

We have a limited number of openings for dedicated Master’s and Ph.D. students. If our research aligns with your ambitions, let’s build the future of intelligent interaction together.

News

Feb 2026: One paper on SLAM was accepted by TIM. Congrats to Yuanbiao!
Apr 2025: HOI-TG was selected as a CVPR highlight (<3%). Congrats to Zhenrong!
Mar 2025: One paper on Point Cloud Completion was accepted by TMM. Congrats to Junkang!
Feb 2025: One paper on HOI reconstruction was accepted by CVPR. Congrats to Zhenrong!
Feb 2025: One paper on Visual-Inertial Odometry (VIO) was accepted by RA-L. Congrats to Changshi!
Sep 2024: One paper on Image Paragraph Captioning was accepted by CVIU.
Jul 2024: One paper on Vision-Language Navigation (VLN) was accepted by IJCV.
Apr 2024: One paper on Embodied Planning was accepted by IJCAI. Congrats to Kanxue!
Mar 2023: One paper on Video Representation Learning was accepted by CVPR as a Highlight. Congrats to Heng!

Publications

(# indicates a student I advised or co-advised)

Bubble

Li D, Peng B, Li C, Qiao N, Zheng Q, et al. An Atomic Skill Library Construction Method for Data-Efficient Embodied Manipulation. [arXiv]

Journal

Yang Y#, Feng D, Zheng Q, Zhuang Y. VIR-Fusion: A Robust Visual-Inertial-UWB Fusion SLAM System for Corner Cases[J]. IEEE Transactions on Instrumentation and Measurement, 2026. [paper]
Ma J, Wang S, Zheng Q, Mai X. Geometric Continuity and Consistency Learning for Self-Supervised Point Cloud Completion[J]. IEEE Transactions on Multimedia, 2025. [paper]
Mu C#, Feng D, Zheng Q, et al. A Robust and Efficient Visual-Inertial Initialization with Probabilistic Normal Epipolar Constraint[J]. IEEE Robotics and Automation Letters, 2025. [paper] [code]
Zheng Q, Liu D, Wang C, et al. ESceme: Vision-and-language navigation with episodic scene memory[J]. International Journal of Computer Vision, 2025, 133(1): 254-274. [paper] [code]
Zheng Q, Wang C, Wang D. Bypass network for semantics driven image paragraph captioning[J]. Computer Vision and Image Understanding, 2024, 249: 104154. [paper]
Zheng Q, Wang C Y, Wang D, et al. Visual superordinate abstraction for robust concept learning[J]. Machine Intelligence Research, 2023, 20(1): 79-91. [paper]
Zheng Q, Gong M, You X, et al. A unified B-spline framework for scale-invariant keypoint detection[J]. International Journal of Computer Vision, 2022, 130(3): 777-799. [paper] [code-MATLAB] [code-C++]
Zheng Q, Yu S, You X. Coarse-to-fine salient object detection with low-rank matrix recovery[J]. Neurocomputing, 2020, 376: 232-243. [paper]
Jiang Z, Yuan W, Leung H, You X, Zheng Q. Coalition formation and spectrum sharing of cooperative spectrum sensing participants[J]. IEEE Transactions on Cybernetics, 2016, 47(5): 1133-1146.

Conference

Wang Z#, Zheng Q, et al. End-to-End HOI Reconstruction Transformer with Graph-based Encoding[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2025. (Highlight) [paper] [code]
Zheng Q. Cross-modal contrastive learning for robust reasoning in VQA[C]//2024 7th International Conference on Pattern Recognition and Artificial Intelligence (PRAI). IEEE, 2024: 905-913. [paper]
Li K#, Yu B, Zheng Q, et al. MuEP: A Multimodal Benchmark for Embodied Planning with Foundation Models[C]//International Joint Conference on Artificial Intelligence (IJCAI). 2024: 129-138. [paper] [code]
Zhang H, Liu D, Zheng Q, et al. Modeling video as stochastic processes for fine-grained video representation learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 2225-2234. (Highlight) [paper] [code]
Zheng Q, Wang C, Tao D. Syntax-aware action targeting for video captioning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 13096-13105. [paper] [code]
Xu J, You X, Zheng Q, et al. Robust Multi-view Subspace Learning Through Structured Low-Rank Matrix Recovery[C]//PRCV 2018, Guangzhou, China, November 23-26, 2018: 427-439. [paper]
Yu C, Zhao X, Zheng Q, et al. Hierarchical bilinear pooling for fine-grained visual recognition[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 574-589. [paper] [code]
Zheng Q, Zhang P, You X, et al. Hierarchical learning for salient object detection[C]//International Conference on Security, Pattern Analysis, and Cybernetics (SPAC). IEEE, 2017: 192-197. [paper]
Zheng Q, Zhang P, You X. Saliency Detection by Compactness Diffusion[C]//BMVC. 2017. [paper]

Teaching

2026 Spring: An Introduction to Robotics
2025 Fall: Complex Functions and Field Theory
2025 Spring: An Introduction to Robotics
2024 Fall: Complex Functions and Field Theory
2024 Spring: An Introduction to Robotics