Xinyu Huang (黄新宇)
I am a fourth-year Ph.D. student at the School of Computer Science, Fudan University, advised by Prof. Rui Feng and Prof. Yuejie Zhang. I am also a research intern at OPPO Research Institute, supervised by Researcher Youcai Zhang. I have been fortunate to work with Prof. Yandong Guo and Prof. Lei Zhang.
My research interests include computer vision and multi-modality. I created the Recognize Anything Model (RAM) Family, a series of powerful open-source image recognition models.
I expect to graduate in June 2025 and am open to both academic positions and industrial research positions. Please download my resume, and do not hesitate to email me if you're interested :)
Email / Scholar / GitHub / Zhihu
Research (* indicates equal contribution)
Recognize Anything Plus Model (RAM++)
Open-Set Image Tagging with Multi-Grained Text Supervision
Xinyu Huang, Yi-Jie Huang, Youcai Zhang, Weiwei Tian, Rui Feng, Yuejie Zhang, Yanchun Xie, Yaqian Li, Lei Zhang
arXiv, 2023
arXiv / code
RAM++ is the next generation of RAM. It recognizes any category with high accuracy, covering both predefined common categories and diverse open-set categories.
Recognize Anything Model (RAM)
Recognize Anything: A Strong Image Tagging Model
Youcai Zhang*, Xinyu Huang*, Jinyu Ma*, Zhaoyang Li*, Zhaochuan Luo, Yanchun Xie, Yuzhuo Qin, Tong Luo, Yaqian Li, Shilong Liu, Yandong Guo, Lei Zhang
CVPR 2024, Multimodal Foundation Models Workshop
project page / arXiv / demo / code
RAM is an image tagging model that can recognize any common category with high accuracy.
Tag2Text Vision-Language Model
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang, Youcai Zhang, Jinyu Ma, Weiwei Tian, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Lei Zhang
ICLR 2024
project page / arXiv / demo / code
Tag2Text is a vision-language model guided by image tagging that supports both tagging and comprehensive captioning simultaneously.
IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training
Xinyu Huang, Youcai Zhang, Ying Cheng, Weiwei Tian, Ruiwei Zhao, Rui Feng, Yuejie Zhang, Yaqian Li, Yandong Guo, Xiaobo Zhang
ACM MM, 2022
arXiv / code
We propose IDEA to provide more explicit textual supervision for visual models, including multiple valuable tags and texts composed of multiple tags.
Simple and Robust Loss Design for Multi-Label Learning with Missing Labels
Youcai Zhang*, Yuhao Cheng*, Xinyu Huang*, Fei Wen, Rui Feng, Yaqian Li, Yandong Guo
arXiv, 2021
arXiv / code
Multi-label learning in the presence of missing labels (MLML) is a challenging problem. Based on an observation, we propose two simple yet effective methods via robust loss design.
Recognize Anything Family
Project Creator/Owner
2.4K+ stars!
We provide the Recognize Anything Model (RAM) Family, which demonstrates superior image recognition ability!
RAM-Grounded-SAM
Project Co-Leader
14K+ stars!
The RAM Family marries Grounded-SAM to automatically recognize, detect, and segment objects in an image, showcasing its powerful image recognition capabilities!