Xinyu Huang (黄新宇)

I am a fourth-year Ph.D. student at the School of Computer Science, Fudan University, advised by Prof. Rui Feng and Prof. Yuejie Zhang. I am also a research intern at OPPO Research Institute, supervised by Researcher Youcai Zhang. I have been fortunate to work with Prof. Yandong Guo and Prof. Lei Zhang.

My research interests include computer vision and multi-modality. I created the Recognize Anything Model (RAM) Family, a series of powerful open-source image recognition models.

Email / Scholar / GitHub / Zhihu

Research (* indicates equal contribution)
Recognize Anything Plus Model (RAM++)
Open-Set Image Tagging with Multi-Grained Text Supervision

Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, Lei Zhang
arXiv, 2023
arXiv / code

RAM++ is the next generation of RAM, which can recognize any category with high accuracy, including both predefined common categories and diverse open-set categories.

Recognize Anything Model (RAM)
Recognize Anything: A Strong Image Tagging Model

Youcai Zhang*, Xinyu Huang*, Jinyu Ma*, Zhaoyang Li*, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang
CVPR 2024, Multimodal Foundation Models Workshop
project page / arXiv / demo / code

RAM is an image tagging model that can recognize any common category with high accuracy.

Tag2Text Vision-Language Model
Tag2Text: Guiding Vision-Language Model via Image Tagging

Xinyu Huang, Youcai Zhang, Jinyu Ma, Weiwei Tian, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Lei Zhang
ICLR 2024
project page / arXiv / demo / code

Tag2Text is a vision-language model guided by tagging that supports both tagging and comprehensive captioning.

IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training
Xinyu Huang, Youcai Zhang, Ying Cheng, Weiwei Tian, Ruiwei Zhao, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Xiaobo Zhang
ACM MM, 2022
arXiv / code

We propose IDEA to provide more explicit textual supervision (including multiple valuable tags and texts composed of multiple tags) for visual models.

Simple and Robust Loss Design for Multi-Label Learning with Missing Labels
Youcai Zhang*, Yuhao Cheng*, Xinyu Huang*, Fei Wen, Rui Feng, Yaqian Li, Yandong Guo
arXiv, 2021
arXiv / code

Multi-label learning in the presence of missing labels (MLML) is a challenging problem. Based on an observation, we propose two simple yet effective methods via robust loss design.

Projects & Resources
Recognize Anything Family
Project Creator/Owner
2K+ stars!

We provide the Recognize Anything Model (RAM) Family, which demonstrates superior image recognition ability!

RAM-Grounded-SAM
Project Co-Leader
10K+ stars!

The RAM Family marries Grounded-SAM to automatically perform recognition, detection, and segmentation on an image! The RAM Family showcases powerful image recognition capabilities!