Hi, I'm Minhao Fan. I am currently a PhD student at Nanyang Technological University (NTU), supervised by Prof. Weichen Liu. I received my B.S. degree in Intelligence Science and Technology from Peking University in 2020. While my previous research focused on data mining and computer vision, my current interests lie primarily in Agentic AI and efficient Multimodal Large Language Models (MLLMs). I focus on minimizing the effort required to solve complex real-world problems with MLLM-based agentic flows, enabling systems to operate more autonomously while requiring minimal prompt-engineering expertise from users. Efficient training and inference techniques for LLMs, such as PEFT, in-context learning, FlashAttention, and KV caching, are further aspects of my research. Previously, I worked as a research intern at the Spatial and Temporal Restoration Understanding and Compression Team (STRUCT) at WICT under the supervision of Prof. Jiaying Liu, and at Vision CAIR at KAUST under the supervision of Prof. Mohamed Elhoseiny. I also worked with the Data to Knowledge Lab at Rice University, supervised by Prof. Xia (Ben) Hu.
We present a comprehensive study and evaluation of existing single-image low-light enhancement algorithms from the perspectives of both human perception and machine vision. Beyond traditional low-level-vision evaluations, we make the first attempt to define and address a novel task, i.e., face detection under low-light conditions, to explore the potential of image enhancement methods to benefit high-level vision tasks, both offline and in an end-to-end manner.
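A minimal sketch of the offline evaluation protocol described above: enhance a low-light image first, then run an off-the-shelf face detector on the result. The gamma-correction step is only a stand-in for any enhancement method under study, and the file path is hypothetical.

```python
import cv2
import numpy as np

def gamma_enhance(img, gamma=0.5):
    """Placeholder enhancement: simple gamma correction brightens dark regions."""
    lut = ((np.arange(256) / 255.0) ** gamma * 255).astype(np.uint8)
    return cv2.LUT(img, lut)

def detect_faces(img):
    """Run OpenCV's Haar-cascade face detector on a BGR image."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

low_light = cv2.imread("dark_scene.jpg")           # hypothetical input image
baseline = detect_faces(low_light)                 # detection without enhancement
enhanced = detect_faces(gamma_enhance(low_light))  # detection after enhancement
print(f"faces before: {len(baseline)}, after: {len(enhanced)}")
```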
Based on the observation that different objects and backgrounds have different material, reflection, and perspective attributes, the regions of a single low-light image may require different adjustments regarding contrast, illumination, and noise. We propose a three-part enhancement pipeline that effectively utilizes semantic-layer information. Specifically, we extract the segmentation layer as well as the reflectance and illumination layers, and enhance each region (e.g., sky, ground, and objects in outdoor scenes) concurrently.
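A much-simplified, classical sketch of the idea (not the paper's learned model): decompose the image into illumination and reflectance under a single-scale Retinex assumption, then adjust the illumination per semantic region. The segmentation mask and the per-region gamma values are my assumptions for illustration.

```python
import cv2
import numpy as np

def retinex_decompose(img):
    """Single-scale Retinex: estimate illumination with a large Gaussian blur,
    then recover reflectance as image / illumination."""
    img = img.astype(np.float32) + 1.0
    illumination = cv2.GaussianBlur(img, (0, 0), sigmaX=30)
    reflectance = img / illumination
    return illumination, reflectance

def enhance_by_region(img, seg_mask, region_gammas):
    """Apply a different gamma curve to the illumination of each semantic
    region (e.g. sky, ground, objects), then recompose with the reflectance."""
    illumination, reflectance = retinex_decompose(img)
    norm = illumination / 255.0
    out = np.zeros_like(illumination)
    for region_id, gamma in region_gammas.items():
        m = (seg_mask == region_id)[..., None]   # broadcast mask over channels
        out = np.where(m, (norm ** gamma) * 255.0, out)
    return np.clip(reflectance * out, 0, 255).astype(np.uint8)

# Hypothetical usage: region 0 = sky (mild), 1 = ground, 2 = objects (strong lift)
# enhanced = enhance_by_region(img, seg_mask, {0: 0.9, 1: 0.6, 2: 0.4})
```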
Starting from an initial exploration of Visual Chain-of-Thought (VCoT) and prompt engineering/programming, we integrated multimodal understanding into a programmable prompting codebase. We then conducted experiments on both traditional and recent Visual Question Answering (VQA) benchmarks to identify the challenges posed by increasingly complex multimodal tasks. With a continuously updated set of tools such as Object Detection, Google Lens, and Wikipedia Retrieval, we are now developing an agentic flow that handles reasoning- and knowledge-based VQA tasks with planning and reflection.
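A skeleton of such an agentic flow, under stated assumptions: `llm`, and the `object_detect` and `wiki_retrieve` stubs, are hypothetical stand-ins for the actual model and tool endpoints; the plan/act/reflect loop is the part being illustrated.

```python
# Hypothetical tool registry; real tools (Object Detection, Google Lens,
# Wikipedia Retrieval) would live behind these callables.
TOOLS = {
    "object_detect": lambda image: "detected: [person, bicycle]",    # stub
    "wiki_retrieve": lambda query: "retrieved: background passage",  # stub
}

def agentic_vqa(llm, image, question, max_steps=5):
    """Plan -> act -> reflect loop: the LLM picks a tool (or answers),
    observes the tool result, and revises its plan until it commits."""
    context = f"Question: {question}"
    for _ in range(max_steps):
        decision = llm(f"{context}\nTools: {list(TOOLS)}\n"
                       "Reply 'CALL <tool> <arg>' or 'ANSWER <text>'.")
        if decision.startswith("ANSWER"):
            return decision.removeprefix("ANSWER").strip()
        _, tool, arg = decision.split(maxsplit=2)
        observation = TOOLS[tool](arg if tool != "object_detect" else image)
        # Reflection step: fold the observation back into the working context.
        context += f"\nObservation from {tool}: {observation}"
    return llm(f"{context}\nGive your best final answer.")
```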
Applying Graph Neural Networks to Sales Volume Forecasting
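Since this project is listed only by title, here is a generic sketch of the setup I would assume: stores (or products) as graph nodes with past-sales features, and a two-layer GCN regressing next-period sales volume per node. The use of torch_geometric and all shapes are assumptions, not the project's actual design.

```python
import torch
from torch_geometric.nn import GCNConv

class SalesGNN(torch.nn.Module):
    """Two-layer GCN mapping node features (e.g. past sales, promotions)
    to a scalar sales-volume forecast per node."""
    def __init__(self, in_dim, hidden_dim=32):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.head = torch.nn.Linear(hidden_dim, 1)

    def forward(self, x, edge_index):
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        return self.head(h).squeeze(-1)

# Hypothetical graph: 100 stores, 8 features each, edges linking related stores.
x = torch.randn(100, 8)
edge_index = torch.randint(0, 100, (2, 400))
model = SalesGNN(in_dim=8)
forecast = model(x, edge_index)  # one predicted volume per store
```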
Portrait Matting on Mobile Devices: Towards App Development with Limited Memory
Led the development of intelligent engineering design powered by artificial intelligence.
Minhao Fan, Wenjing Wang, Wenhan Yang, & Jiaying Liu. Integrating Semantic Segmentation and Retinex Model for Low-Light Image Enhancement. ACM International Conference on Multimedia (ACM MM), 2020.
Jiaying Liu, Dejia Xu, Wenhan Yang, Minhao Fan, & Haofeng Huang. Benchmarking Low-Light Image Enhancement for Human Perception and Machine Intelligence. International Journal of Computer Vision (IJCV), 2020.
I developed a bot using a greedy algorithm for the game of Ataxx on Botzone, in the course 'Introduction to Computation'.
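A sketch of the greedy policy, assuming standard Ataxx rules on a 7x7 board: enumerate every legal clone (distance 1) or jump (distance 2), score each by its immediate piece gain from captures of adjacent enemies, and pick the best. The board encoding (0 empty, +1/-1 players) is my assumption, not the Botzone one.

```python
from itertools import product

N = 7  # standard Ataxx board size

def moves(board, me):
    """Yield (src, dst, is_jump): clones to distance-1 cells, jumps to distance-2."""
    for r, c in product(range(N), repeat=2):
        if board[r][c] != me:
            continue
        for dr, dc in product(range(-2, 3), repeat=2):
            nr, nc = r + dr, c + dc
            if 0 <= nr < N and 0 <= nc < N and board[nr][nc] == 0:
                yield (r, c), (nr, nc), max(abs(dr), abs(dc)) == 2

def gain(board, move, me):
    """Immediate piece gain: captured adjacent enemies, +1 if cloning
    (a clone adds a piece; a jump only relocates one)."""
    (r, c), (nr, nc), is_jump = move
    captured = sum(
        1
        for dr, dc in product((-1, 0, 1), repeat=2)
        if 0 <= nr + dr < N and 0 <= nc + dc < N
        and board[nr + dr][nc + dc] == -me
    )
    return captured + (0 if is_jump else 1)

def greedy_move(board, me):
    """Pick the legal move with the best immediate gain."""
    return max(moves(board, me), key=lambda m: gain(board, m, me), default=None)
```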
This project includes a simulation program of the WeChat game 'The Strongest Projectile'. We also implemented a Deep Q-Network (DQN) that taught an agent to play the game. (Ref: https://github.com/RuntianZ/IRL)
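A compact sketch of the DQN training loop, assuming a minimal environment interface (`reset() -> state`, `step(a) -> (state, reward, done)`); the network size and hyperparameters are illustrative, not the project's actual values.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping a game state to one Q-value per discrete action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, s):
        return self.net(s)

def train_dqn(env, state_dim, n_actions, episodes=500,
              gamma=0.99, eps=0.1, batch=32):
    q = QNet(state_dim, n_actions)
    opt = torch.optim.Adam(q.parameters(), lr=1e-3)
    buffer = deque(maxlen=10_000)  # experience replay buffer
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = q(torch.as_tensor(s, dtype=torch.float32)).argmax().item()
            s2, r, done = env.step(a)
            buffer.append((s, a, r, s2, done))
            s = s2
            if len(buffer) >= batch:
                S, A, R, S2, D = map(
                    lambda t: torch.as_tensor(t, dtype=torch.float32),
                    zip(*random.sample(buffer, batch)))
                # One-step TD target: r + gamma * max_a' Q(s', a').
                target = R + gamma * q(S2).max(1).values * (1 - D)
                pred = q(S).gather(1, A.long().unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(pred, target.detach())
                opt.zero_grad(); loss.backward(); opt.step()
    return q
```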
A simple file server that lets users manage their files as well as set up groups with friends for file sharing. (Ref: https://github.com/XFW-go/PKU-Web-Project)
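A minimal sketch of the upload/download core of such a server, written here with Flask and a hypothetical storage directory (the linked repo may be structured differently); group sharing would add an ownership check on top.

```python
import os
from flask import Flask, request, send_from_directory, abort
from werkzeug.utils import secure_filename

UPLOAD_DIR = "uploads"  # hypothetical storage directory
os.makedirs(UPLOAD_DIR, exist_ok=True)
app = Flask(__name__)

@app.route("/files", methods=["POST"])
def upload():
    """Store an uploaded file under a sanitized name."""
    f = request.files["file"]
    name = secure_filename(f.filename)
    f.save(os.path.join(UPLOAD_DIR, name))
    return {"stored": name}, 201

@app.route("/files/<name>")
def download(name):
    """Serve a previously uploaded file; 404 if missing."""
    safe = secure_filename(name)
    if not os.path.exists(os.path.join(UPLOAD_DIR, safe)):
        abort(404)
    return send_from_directory(UPLOAD_DIR, safe)

if __name__ == "__main__":
    app.run(port=8000)
```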
M3-VQA leverages translation for multilingual inputs, retrieval-augmented generation (RAG) for knowledge grounding, and in-context learning (ICL) with Chain-of-Thought prompting for accurate reasoning. (Ref: https://github.com/AmuroEita/M3-VQA/tree/main)
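A skeleton of that pipeline as described: translate the multilingual question, retrieve grounding passages, then answer with a Chain-of-Thought prompt built from in-context examples. `llm`, `translate`, and `retriever` are hypothetical callables standing in for the repo's actual components.

```python
def m3_vqa(llm, translate, retriever, image, question, icl_examples):
    """Translation -> RAG -> ICL + Chain-of-Thought, per the pipeline above."""
    # 1. Normalize the multilingual input to English.
    q_en = translate(question, target_lang="en")
    # 2. Retrieval-augmented grounding: fetch supporting knowledge.
    passages = retriever(q_en, top_k=3)
    # 3. In-context learning with CoT: worked examples, then the real query.
    shots = "\n\n".join(
        f"Q: {ex['q']}\nReasoning: {ex['cot']}\nA: {ex['a']}"
        for ex in icl_examples)
    prompt = (f"{shots}\n\nKnowledge:\n" + "\n".join(passages) +
              f"\n\nQ: {q_en}\nLet's think step by step.")
    return llm(prompt, image=image)
```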
We explore the SVG generation and editing abilities of LLMs and try to improve their performance through in-context learning and LoRA fine-tuning. (Ref: https://github.com/XFW-go/LLM4SVG_Gen_Edit)
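A minimal sketch of the LoRA fine-tuning setup, assuming a Hugging Face causal LM with the `peft` library; the model name and target modules below are placeholders that would depend on the actual SVG-generation backbone.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # placeholder backbone, not the project's
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters on the attention projections; only these weights train.
config = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # tiny fraction of the full model

# Training would then proceed on (prompt, SVG) pairs, e.g. a text description
# as the input and the target SVG markup as the label sequence.
```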