Hi, I'm Minhao Fan. I am currently a PhD student at Nanyang Technological University (NTU), supervised by Prof. Weichen Liu. I received my B.S. degree in Intelligence Science and Technology from Peking University in 2020. While my previous research focused on data mining and computer vision, my current interests lie primarily in Agentic AI and efficient Multimodal Large Language Models (MLLMs). I focus on minimizing the effort required to solve complex real-world problems with MLLM-based agentic flows, enabling systems to operate more autonomously while requiring minimal prompt-engineering expertise from users. Efficient training and inference techniques for LLMs, such as PEFT, in-context learning, FlashAttention, and KV caching, are further aspects of my research. Previously, I worked as a research intern at the Spatial and Temporal Restoration Understanding and Compression Team (STRUCT) at WICT under the supervision of Prof. Jiaying Liu, and at Vision CAIR at KAUST under the supervision of Prof. Mohamed Elhoseiny. I also worked with the Data to Knowledge Lab at Rice University, supervised by Prof. Xia (Ben) Hu.
We present a comprehensive study and evaluation of existing single-image low-light enhancement algorithms from the perspectives of both human perception and machine vision. Beyond traditional low-level-vision evaluations, we make the first attempt to define and address a novel task, i.e., face detection under low-light conditions, to explore the potential of image enhancement methods to benefit high-level vision tasks, both offline and in an end-to-end manner.
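A minimal sketch of the offline evaluation protocol described above: enhance a low-light image first, then run an off-the-shelf face detector on the result. The gamma-correction step is only a stand-in for any enhancement method under study, and the file path is hypothetical.

```python
import cv2
import numpy as np

def gamma_enhance(img, gamma=0.5):
    """Placeholder enhancement: simple gamma correction brightens dark regions."""
    lut = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(img, lut)

def detect_faces(img):
    """Run OpenCV's Haar-cascade face detector on a BGR image."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

low_light = cv2.imread("dark_scene.jpg")           # hypothetical input image
baseline = detect_faces(low_light)                 # detection without enhancement
enhanced = detect_faces(gamma_enhance(low_light))  # detection after enhancement
print(f"faces before: {len(baseline)}, after: {len(enhanced)}")
```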
Based on the observation that different objects and backgrounds have different material, reflection, and perspective attributes, the regions of a single low-light image may require different adjustments regarding contrast, illumination, and noise. We propose a three-part enhancement pipeline that effectively utilizes semantic-layer information. Specifically, we extract the segmentation layer as well as the reflectance and illumination layers, and enhance each region (e.g., sky, ground, and objects in outdoor scenes) concurrently.
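A much-simplified, classical sketch of the idea (not the paper's learned model): decompose the image into illumination and reflectance under a single-scale Retinex assumption, then adjust the illumination per semantic region. The segmentation mask and the per-region gamma values are my assumptions for illustration.

```python
import cv2
import numpy as np

def retinex_decompose(img):
    """Single-scale Retinex: estimate illumination with a large Gaussian blur,
    then recover reflectance as image / illumination."""
    img = img.astype(np.float32) + 1.0
    illumination = cv2.GaussianBlur(img, (0, 0), sigmaX=30)
    reflectance = img / illumination
    return illumination, reflectance

def enhance_by_region(img, seg_mask, region_gammas):
    """Apply a different gamma curve to the illumination of each semantic
    region (e.g. sky, ground, objects), then recompose with the reflectance."""
    illumination, reflectance = retinex_decompose(img)
    norm = illumination / 255.0
    out = np.zeros_like(illumination)
    for region_id, gamma in region_gammas.items():
        m = (seg_mask == region_id)[..., None]   # broadcast mask over channels
        out = np.where(m, (norm ** gamma) * 255.0, out)
    return np.clip(reflectance * out, 0, 255).astype(np.uint8)

# Hypothetical usage: region 0 = sky (mild), 1 = ground, 2 = objects (strong lift)
# enhanced = enhance_by_region(img, seg_mask, {0: 0.9, 1: 0.6, 2: 0.4})
```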
Starting from an initial exploration of Visual Chain-of-Thought (VCoT) and prompt engineering/programming, we integrated multimodal understanding into a programmable prompting codebase. We then conducted experiments on both traditional and recent Visual Question Answering (VQA) benchmarks to identify the challenges posed by increasingly complex multimodal tasks. With a continuously updated set of tools such as Object Detection, Google Lens, and Wikipedia Retrieval, we are now developing an agentic flow that handles reasoning- and knowledge-based VQA tasks with planning and reflection.
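A skeleton of such an agentic flow, under stated assumptions: `llm`, and the `object_detect` and `wiki_retrieve` stubs, are hypothetical stand-ins for the actual model and tool endpoints; the plan/act/reflect loop is the part being illustrated.

```python
# Hypothetical tool registry; real tools (Object Detection, Google Lens,
# Wikipedia Retrieval) would live behind these callables.
TOOLS = {
    "object_detect": lambda image: "detected: [person, bicycle]",    # stub
    "wiki_retrieve": lambda query: "retrieved: background passage",  # stub
}

def agentic_vqa(llm, image, question, max_steps=5):
    """Plan -> act -> reflect loop: the LLM picks a tool (or answers),
    observes the tool result, and revises its plan until it commits."""
    context = f"Question: {question}"
    for _ in range(max_steps):
        decision = llm(f"{context}\nTools: {list(TOOLS)}\n"
                       "Reply 'CALL <tool> <arg>' or 'ANSWER <text>'.")
        if decision.startswith("ANSWER"):
            return decision.removeprefix("ANSWER").strip()
        _, tool, arg = decision.split(maxsplit=2)
        observation = TOOLS[tool](arg if tool != "object_detect" else image)
        # Reflection step: fold the observation back into the working context.
        context += f"\nObservation from {tool}: {observation}"
    return llm(f"{context}\nGive your best final answer.")
```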
Applying Graph Neural Networks to Sales Volume Forecasting
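Since this project is listed only by title, here is a generic sketch of the setup I would assume: stores (or products) as graph nodes with past-sales features, and a two-layer GCN regressing next-period sales volume per node. The use of torch_geometric and all shapes are assumptions, not the project's actual design.

```python
import torch
from torch_geometric.nn import GCNConv

class SalesGNN(torch.nn.Module):
    """Two-layer GCN mapping node features (e.g. past sales, promotions)
    to a scalar sales-volume forecast per node."""
    def __init__(self, in_dim, hidden_dim=32):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.head = torch.nn.Linear(hidden_dim, 1)

    def forward(self, x, edge_index):
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        return self.head(h).squeeze(-1)

# Hypothetical graph: 100 stores, 8 features each, edges linking related stores.
x = torch.randn(100, 8)
edge_index = torch.randint(0, 100, (2, 400))
model = SalesGNN(in_dim=8)
forecast = model(x, edge_index)  # one predicted volume per store
```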
Portrait Matting on Mobile Devices: Towards App Development with Limited Memory
Led the development of intelligent engineering design powered by artificial intelligence.
Minhao Fan, Wenjing Wang, Wenhan Yang, & Jiaying Liu. Integrating Semantic Segmentation and Retinex Model for Low-Light Image Enhancement. ACM International Conference on Multimedia (ACM MM), 2020.
Jiaying Liu, Dejia Xu, Wenhan Yang, Minhao Fan, & Haofeng Huang. Benchmarking Low-Light Image Enhancement for Human Perception and Machine Intelligence. International Journal of Computer Vision (IJCV), 2020.
I developed a bot using a greedy algorithm for the game of Ataxx on Botzone, in the course 'Introduction to Computation'.
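A sketch of the greedy policy, assuming standard Ataxx rules on a 7x7 board: enumerate every legal clone (distance 1) or jump (distance 2), score each by its immediate piece gain from captures of adjacent enemies, and pick the best. The board encoding (0 empty, +1/-1 players) is my assumption, not the Botzone one.

```python
from itertools import product

N = 7  # standard Ataxx board size

def moves(board, me):
    """Yield (src, dst, is_jump): clones to distance-1 cells, jumps to distance-2."""
    for r, c in product(range(N), repeat=2):
        if board[r][c] != me:
            continue
        for dr, dc in product(range(-2, 3), repeat=2):
            nr, nc = r + dr, c + dc
            if 0 <= nr < N and 0 <= nc < N and board[nr][nc] == 0:
                yield (r, c), (nr, nc), max(abs(dr), abs(dc)) == 2

def gain(board, move, me):
    """Immediate piece gain: captured adjacent enemies, +1 if cloning
    (a clone adds a piece; a jump only relocates one)."""
    (r, c), (nr, nc), is_jump = move
    captured = sum(
        1
        for dr, dc in product((-1, 0, 1), repeat=2)
        if 0 <= nr + dr < N and 0 <= nc + dc < N
        and board[nr + dr][nc + dc] == -me
    )
    return captured + (0 if is_jump else 1)

def greedy_move(board, me):
    """Pick the legal move with the best immediate gain."""
    return max(moves(board, me), key=lambda m: gain(board, m, me), default=None)
```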
This project includes a simulation program of the WeChat game 'The Strongest Projectile'. We also implemented a Deep Q-Network (DQN) that taught an agent to play the game. (Ref: https://github.com/RuntianZ/IRL)
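A compact sketch of the DQN training loop, assuming a minimal environment interface (`reset() -> state`, `step(a) -> (state, reward, done)`); the network size and hyperparameters are illustrative, not the project's actual values.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping a game state to one Q-value per discrete action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, s):
        return self.net(s)

def train_dqn(env, state_dim, n_actions, episodes=500,
              gamma=0.99, eps=0.1, batch=32):
    q = QNet(state_dim, n_actions)
    opt = torch.optim.Adam(q.parameters(), lr=1e-3)
    buffer = deque(maxlen=10_000)  # experience replay buffer
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = q(torch.as_tensor(s, dtype=torch.float32)).argmax().item()
            s2, r, done = env.step(a)
            buffer.append((s, a, r, s2, done))
            s = s2
            if len(buffer) >= batch:
                S, A, R, S2, D = map(
                    lambda t: torch.as_tensor(t, dtype=torch.float32),
                    zip(*random.sample(buffer, batch)))
                # One-step TD target: r + gamma * max_a' Q(s', a').
                target = R + gamma * q(S2).max(1).values * (1 - D)
                pred = q(S).gather(1, A.long().unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(pred, target.detach())
                opt.zero_grad(); loss.backward(); opt.step()
    return q
```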
A simple file server that lets users manage their files as well as set up groups with friends for file sharing. (Ref: https://github.com/XFW-go/PKU-Web-Project)
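A minimal sketch of the upload/download core of such a server, written here with Flask and a hypothetical storage directory (the linked repo may be structured differently); group sharing would add an ownership check on top.

```python
import os
from flask import Flask, request, send_from_directory, abort
from werkzeug.utils import secure_filename

UPLOAD_DIR = "uploads"  # hypothetical storage directory
os.makedirs(UPLOAD_DIR, exist_ok=True)
app = Flask(__name__)

@app.route("/files", methods=["POST"])
def upload():
    """Store an uploaded file under a sanitized name."""
    f = request.files["file"]
    name = secure_filename(f.filename)
    f.save(os.path.join(UPLOAD_DIR, name))
    return {"stored": name}, 201

@app.route("/files/<name>")
def download(name):
    """Serve a previously uploaded file; 404 if missing."""
    safe = secure_filename(name)
    if not os.path.exists(os.path.join(UPLOAD_DIR, safe)):
        abort(404)
    return send_from_directory(UPLOAD_DIR, safe)

if __name__ == "__main__":
    app.run(port=8000)
```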
M3-VQA leverages translation for multilingual inputs, retrieval-augmented generation (RAG) for knowledge grounding, and in-context learning (ICL) with Chain-of-Thought prompting for accurate reasoning. (Ref: https://github.com/AmuroEita/M3-VQA/tree/main)
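A skeleton of that pipeline as described: translate the multilingual question, retrieve grounding passages, then answer with a Chain-of-Thought prompt built from in-context examples. `llm`, `translate`, and `retriever` are hypothetical callables standing in for the repo's actual components.

```python
def m3_vqa(llm, translate, retriever, image, question, icl_examples):
    """Translation -> RAG -> ICL + Chain-of-Thought, per the pipeline above."""
    # 1. Normalize the multilingual input to English.
    q_en = translate(question, target_lang="en")
    # 2. Retrieval-augmented grounding: fetch supporting knowledge.
    passages = retriever(q_en, top_k=3)
    # 3. In-context learning with CoT: worked examples, then the real query.
    shots = "\n\n".join(
        f"Q: {ex['q']}\nReasoning: {ex['cot']}\nA: {ex['a']}"
        for ex in icl_examples)
    prompt = (f"{shots}\n\nKnowledge:\n" + "\n".join(passages) +
              f"\n\nQ: {q_en}\nLet's think step by step.")
    return llm(prompt, image=image)
```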
We explore the SVG generation and editing abilities of LLMs and try to improve their performance through in-context learning and LoRA fine-tuning. (Ref: https://github.com/XFW-go/LLM4SVG_Gen_Edit)
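A minimal sketch of the LoRA fine-tuning setup, assuming a Hugging Face causal LM with the `peft` library; the model name and target modules below are placeholders that would depend on the actual SVG-generation backbone.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # placeholder backbone, not the project's
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters on the attention projections; only these weights train.
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # tiny fraction of the full model

# Training would then proceed on (prompt, SVG) pairs, e.g. a text description
# as the input and the target SVG markup as the label sequence.
```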