CHU WEI's HOMEPAGE

W. Xu, J. Wang, L. Xie, J. He, H. Zhou, T. Wang, X. Wan, J. Chen, C. Qu, and W. Chu (2024) LogicMP: A neuro-symbolic approach for encoding first-order logic constraints, ICLR 2024 (View Abstract)

Inf Team (2024) INF's open-source 34B large language models, Technical Report 2024 (View Abstract)

Inf Team (2024) Towards trustworthy large language models in industry domains, Technical Report 2024 (View Abstract)

W. Hong, W. Ren, J. Lao, L. Xie, L. Zhong, J. Wang, J. Chen, H. Liu, W. Chu (2024) Training object detectors from scratch: an empirical study in the era of vision transformer, Int. J. Comput. Vis. 132(8): 2929-2942 (View Abstract)

X. Dong, Q. Guo, T. Gan, Q. Wang, J. Wu, X. Ren, Y. Cheng, W. Chu (2024) SNP-S3: Shared network pre-training and significant semantic strengthening for various video-text tasks, IEEE Trans. Circuits Syst. Video Technol. 34(4): 2525-2535 (View Abstract)

X. Tan, B. Li, X. Qiu, J. Huang, Y. Xu, W. Chu (2024) Robust deep Hawkes Process under label noise of both event and occurrence, ECAI 2024: 2870-2877 (View Abstract)

S. Shi, X. Tan, X. Qiu, C. Qu, K. Nie, Y. Cheng, W. Chu, Y. Xu, Y. Qi (2024) ULMR: Unlearning large language models via negative response and model parameter average, EMNLP 2024 (Industry Track): 755-762 (View Abstract)

X. Tan, L. Cheng, X. Qiu, S. Shi, Y. Cheng, W. Chu, Y. Xu, Y. Qi (2024) Enhancing personalized headline generation via offline goal-conditioned reinforcement learning with large language models, KDD 2024: 5762-5772 (View Abstract)

X. Tan, L. Cheng, X. Qiu, S. Shi, Y. Cheng, W. Chu, Y. Xu, Y. Qi (2024) Enhancing task performance in continual instruction fine-tuning through format uniformity, SIGIR 2024: 2384-2389 (View Abstract)

J. Wang, J. He, W. Xu, R. Li, and W. Chu (2023) Learning to discover various Simpson's paradoxes, KDD 2023: 5092-5103 (View Abstract)

X. Yan, T. Song, Y. Jiao, J. He, J. Wang, R. Li, and W. Chu (2023) Spatio-temporal hypergraph learning for next POI recommendation, SIGIR 2023: 403-412 (View Abstract)

J. Lao, W. Hong, X. Guo, Y. Zhang, W. Jian, J. Chen, and W. Chu (2023) Simultaneously short- and long-term temporal modeling for semi-supervised video semantic segmentation, CVPR 2023 (View Abstract)

T. Pan, F. Xu, X. Yang, S. He, C. Jiang, Q. Guo, F. Qian, X. Zhang, Y. Cheng, L. Yang, and W. Chu (2023) Boundary-aware backward-compatible representation via adversarial learning in image retrieval, CVPR 2023 (View Abstract)

C. Jiang, H. Liu, X. Yu, Q. Wang, Y. Cheng, J. Xu, Z. Liu, Q. Guo, W. Chu, M. Yang, and Y. Qi (2023) Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning, ACM Multimedia 2023: 4626-4636 (View Abstract)

L. Zhang, X. Yan, J. He, R. Li, and W. Chu (2023) DRGCN: Dynamic evolving initial residual for deep graph convolutional networks, AAAI 2023 (View Abstract)

W. Li, C. Zou, M. Wang, F. Xu, J. Zhao, R. Zheng, Y. Cheng, and W. Chu (2023) DC-Former: Diverse and compact transformer for person re-identification, AAAI 2023 (View Abstract)

J. Zhang, Z. Li, H. Fang, J. Wu, Z. Shen, J. Zheng, W. Chu, W. Duan, and P. Xu (2023) China's First Natural Language-based AI ChatBot Trader, WWW (Companion Volume) 2023: 1253-1256 (View Abstract)

J. Xu, W. Xu, M. Sun, T. Wang, and W. Chu (2022) Extracting trigger-sharing events via an event matrix, Findings of the Association for Computational Linguistics: EMNLP 2022 (View Abstract)

Q. Guo, K. Yao, and W. Chu (2022) Switch-BERT: Learning to model multimodal interactions by switching attention and input, ECCV 2022: 330-346 (View Abstract)

T.-T. Liang, X. Chu, Y. Liu, Y. Wang, Z. Tang, W. Chu, J. Chen, and H. Ling (2022) CBNet: A composite backbone network architecture for object detection, IEEE Trans. Image Process. 31: 6893-6906 (View Abstract)

W. Hong, J. Lao, W. Ren, J. Wang, J. Chen, W. Chu (2022) Training object detectors from scratch: An empirical study in the era of vision transformer, CVPR 2022: 4652-4661 (View Abstract)

H. Wang, T.-W. Chang, T. Liu, J. Huang, Z. Chen, C. Yu, R. Li, W. Chu (2022) ESCM2: Entire space counterfactual multi-task model for post-click conversion rate estimation, SIGIR 2022: 363-372 (View Abstract)

K. Ji, J. Liu, W. Hong, L. Zhong, J. Wang, J. Chen, W. Chu (2022) CRET: Cross-modal retrieval transformer for efficient text-video retrieval, SIGIR 2022: 949-959 (View Abstract)

M. Li, X. Lin, X. Chen, J. Chang, Q. Zhang, F. Wang, T. Wang, Z. Liu, W. Chu, D. Zhao and R. Yan (2022) Keywords and instances: A hierarchical contrastive learning framework unifying hybrid granularities for text generation, ACL 2022: 4432-4441 (View Abstract)

F. Yu, K. Huang, M. Wang, Y. Cheng, W. Chu, and C. Li (2022) Width & depth pruning for vision transformers, AAAI 2022: 3143-3151 (View Abstract)

H. Huang, Y. Wang, Z. Chen, Y. Zhang, Y. Li, Z. Tang, W. Chu, J. Chen, W. Lin, and K.-K. Ma (2022) CMUA-Watermark: A cross-model universal adversarial watermark for combating deepfakes, AAAI 2022: 989-997 (View Abstract)

L. Chao, J. He, T. Wang and W. Chu (2021) PairRE: Knowledge graph embeddings via paired relation vectors, ACL 2021: 4360-4369 (View Abstract)

F. Xu, M. Wang, W. Zhang, Y. Cheng and W. Chu (2021) Discrimination-aware mechanism for fine-grained representation learning, CVPR 2021: 813-822 (View Abstract)

Recently, with the emergence of retrieval requirements for particular individuals within the same superclass (e.g., birds, persons, cars), the fine-grained recognition task has attracted a significant amount of attention from academia and industry. In the fine-grained recognition scenario, the inter-class differences are quite diverse and subtle, which makes it challenging to extract all the discriminative cues. The traditional training mechanism optimizes the overall discriminativeness of the whole feature. It may stop early when some feature elements have been trained to distinguish training samples well, leaving other elements of the feature insufficiently trained. This results in a less generalizable feature extractor that captures only the major discriminative cues and ignores subtle ones. Therefore, there is a need for a training mechanism that enforces the discriminativeness of all the elements in the feature so as to capture more of the subtle visual cues. In this paper, we propose a Discrimination-Aware Mechanism (DAM) that iteratively identifies insufficiently trained elements and improves them. DAM increases the number of well-learned elements, so that the feature extractor captures more visual cues. In this way, a more informative representation is learned, which brings better generalization performance. We show that DAM can be easily applied to both proxy-based and pair-based loss functions, and thus can be used in most existing fine-grained recognition paradigms. Comprehensive experiments on the CUB-200-2011, Cars196, Market-1501, and MSMT17 datasets demonstrate the advantages of our DAM-based loss over related state-of-the-art approaches.

W. Hong, P. Guo, W. Zhang, J. Chen and W. Chu (2021) LPSNet: A lightweight solution for fast panoptic segmentation, CVPR 2021: 16746-16754 (View Abstract)

W. Hong, K. Ji, J. Liu, J. Wang, J. Chen and W. Chu (2021) GilBERT: Generative vision-language pre-training for image-text retrieval, SIGIR 2021: 1379-1388 (View Abstract)

C. Jiang, K. Huang, S. He, X. Yang, W. Zhang, X. Zhang, Y. Cheng, L. Yang, Q. Wang, F. Xu, T. Pan and W. Chu (2021) Learning segment similarity and alignment in large-scale content based video retrieval, ACM MM 2021 (View Abstract)

K. Chen, W. Xu, X. Cheng, X. Zou, Y. Zhang, L. Song, T. Wang, Y. Qi and W. Chu (2020) Question directed graph attention network for numerical reasoning over text, EMNLP 2020:6759–6768 (View Abstract)

L. Chao, J. Chen and W. Chu (2020) Variational connectionist temporal classification, ECCV 2020:460-476 (View Abstract)

X. Cheng, W. Xu, K. Chen, T. Wang, S. Jiang, F. Wang, W. Chu and Y. Qi (2020) SpellGCN: Incorporating phonological and visual similarities into language models for Chinese Spelling Check, ACL 2020:871-881 (View Abstract)

X. Lin, W. Jian, J. He, T. Wang, and W. Chu (2020) Generating informative conversational response using recurrent knowledge-interaction and knowledge-copy, ACL 2020:41-52 (View Abstract)

F. Xu, W. Zhang, Y. Cheng and W. Chu (2020) Metric learning with equidistant and equidistributed triplet-based loss for product image search, WWW 2020:57-65 (View Abstract)

S. Wang, B. Zhu, C. Li, M. Wu, J. Zhang, W. Chu, and Y. Qi (2020) Riemannian proximal policy optimization, Computer and Information Science 13(3) (View Abstract)

W. Zhang, Y. Cheng, X. Guo, Q. Guo, J. Wang, Q. Wang, C. Jiang, M. Wang, F. Xu and W. Chu (2020) Automatic car damage assessment system: reading and understanding videos as professional insurance inspectors, AAAI 2020:13646-13647 Demonstration Track (View Abstract)

W. Huang, X. Cheng, K. Chen, T. Wang, W. Chu (2020) Towards fast and accurate neural Chinese word segmentation with multi-criteria learning, COLING 2020:2062-2072 (View Abstract)

C. Li, X. Yan, X. Deng, Y. Qi, W. Chu, L. Song, J. Qiao, J. He and J. Xiong (2019) Latent Dirichlet allocation for Internet price war, AAAI 2019:639-646 (View Abstract)

X. Cheng, W. Xu, T. Wang, W. Chu, W. Huang, K. Chen and J. Hu (2019) Variational semi-supervised aspect-term sentiment analysis via transformer, CoNLL 2019:961-969 (View Abstract)

W. Huang, X. Cheng, T. Wang and W. Chu (2019) BERT-based multi-head selection for joint entity-relation extraction, NLPCC (2) 2019:713-723 (View Abstract)

W. Sui, Q. Zhang, J. Yang and W. Chu (2018) A novel integrated framework for learning both text detection and recognition, ICPR 2018:2233-2238 (View Abstract)

T. Yin, X. Deng, Y. Qi, W. Chu, J. Pan, X. Yan and J. Xiong (2018) Personalized behavior prediction with encoder-to-decoder structure, NAS 2018:1-10 (View Abstract)

J. Yu, M. Qiu, J. Jiang, J. Huang, S. Song, W. Chu and H. Chen (2018) Modelling domain relationships for transfer learning on retrieval-based question answering systems in E-commerce, ACM International Conference on Web Search and Data Mining (WSDM-11):682-690 (View Abstract)

M. Qiu, P. Zhao, K. Zhang, X. Shi, X. Wang, J. Huang and W. Chu (2017) A short-term rainfall prediction model using multi-task convolutional neural networks, IEEE International Conference on Data Mining (ICDM) (View Abstract)

F. Li et al. (2017) AliMe Assist: an intelligent assistant for creating an innovative E-commerce experience, ACM International Conference on Information and Knowledge Management (CIKM) (View Abstract) Best Demo Award

M. Qiu, F.-L. Li, S. Wang, X. Gao, Y. Chen, W. Zhao, H. Chen, J. Huang and W. Chu (2017) AliMe Chat: A Sequence to Sequence and Rerank based Chatbot Engine, Annual Meeting of the Association for Computational Linguistics (ACL-55 Short Paper) (View Abstract)

J. Yang, Y. Chen, S. Wang, L. Li, C. Meng, M. Qiu, W. Chu (2017) Practical lessons of distributed deep learning, Workshop on Principled Approaches to Deep Learning, at ICML (View Abstract)

B. Bi, H. Ma, B. Hsu, W. Chu, K. Wang and J. Cho (2015) Learning to recommend related entities to search users, ACM International Conference on Web Search and Data Mining (WSDM-08):139-148 (View Abstract)

J. Yan, W. Chu, R. W. White (2014) Cohort modeling for enhanced personalized search, ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-37) (View Abstract)

X. Li, C. Guo, W. Chu, Y. Wang, J. Shavlik (2014) Deep learning powered in-session contextual ranking using clickthrough data, Workshop on Personalization: Methods and Applications, at Neural Information Processing Systems (NIPS) (View Abstract)

H. Wang, X. He, M. Chang, Y. Song, R. W. White, W. Chu (2013) Personalized ranking model adaptation for web search, ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-36) (View Abstract)

R. W. White, W. Chu, A. Hassan, X. He, Y. Song, H. Wang (2013) Enhancing personalized search by mining and modeling task behavior, International World Wide Web Conference (WWW-22) (View Abstract)

H. Wang, Y. Song, M. Chang, X. He, R. W. White, W. Chu (2013) Learning to extract cross-session search tasks, International World Wide Web Conference (WWW-22):1353-1364 (View Abstract)

T. Moon, W. Chu, L. Li, Z. Zheng, Y. Chang (2012) An online learning framework for refining recency search results with user click feedback, Transactions on Information Systems 30(4) (View Abstract)

L. Li, W. Chu, J. Langford, T. Moon, and X. Wang (2012) An unbiased offline evaluation of contextual bandit algorithms with generalized linear models, Journal of Machine Learning Research - Workshop and Conference Proceedings 26 (JMLR W&CP-26) (View Abstract)

P. Bennett, R. W. White, W. Chu, S. Dumais, P. Bailey, F. Borisyuk and X. Cui (2012) Modeling and measuring the impact of short and long-term behavior on search personalization, ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-35) (View Abstract) Test of Time Award

W. Chu, M. Zinkevich, L. Li, A. Thomas, and B. Tseng (2011) Unbiased online active learning in data streams, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-17) (View Abstract)

L. Zhang, J. Yang, W. Chu, and B. Tseng (2011) A machine-learned proactive moderation system for auction fraud detection, ACM Conference on Information and Knowledge Management (CIKM-20 Short Paper) (View Abstract)

L. Li, W. Chu, J. Langford, and X. Wang (2011) Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms, ACM International Conference on Web Search and Data Mining (WSDM-04) 297-306 (View Abstract) Best Paper Award

W. Chu, L. Li, L. Reyzin, and R. E. Schapire (2011) Contextual bandits with linear payoff functions, International Conference on Artificial Intelligence and Statistics (AISTATS-14) (View Abstract)

T. Moon, L. Li, W. Chu, C. Liao, Z. Zheng, and Y. Chang (2010) Online learning for recency search ranking using real-time user feedback, International Conference on Information and Knowledge Management (CIKM-19 Short Paper) 1501-1504 (View Abstract)

L. Li, W. Chu, J. Langford, and R. E. Schapire (2010) A contextual-bandit approach to personalized news article recommendation, International World Wide Web Conference (WWW-19) 661-670 (View Abstract) Seoul Test of Time Award

S.-T. Park and W. Chu (2009) Pairwise preference regression for cold-start recommendation, ACM Recommender Systems (RecSys-03):21-28 (View Abstract)

W. Chu and Z. Ghahramani (2009) Probabilistic models for incomplete multi-dimensional arrays, International Conference on Artificial Intelligence and Statistics (AISTATS-12):89-96 (View Abstract)

W. Chu and S.-T. Park (2009) Personalized recommendation on dynamic content using predictive bilinear models, International World Wide Web Conference (WWW-18):692-700 (View Abstract)

W. Chu, et al. (2009) A case study of behavior-driven conjoint analysis on Yahoo! Front Page Today Module, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-15 Industry Track):1097-1104 (View Abstract)

R. Silva, W. Chu and Z. Ghahramani (2007) Hidden common cause relations in relational learning, Neural Information Processing Systems (NIPS-20):1345-1352 (View Abstract)

K. Yu and W. Chu (2007) Gaussian process models for link analysis and transfer learning, Neural Information Processing Systems (NIPS-20):1657-1664 (View Abstract)

P. K. Shivaswamy, W. Chu and M. Jansche (2007) A support vector approach to censored targets, IEEE International Conference on Data Mining (ICDM-07):655-660 (View Abstract)

W. Chu and S. S. Keerthi (2007) Support vector ordinal regression, Neural Computation 19(3):792-815 (View Abstract)

V. Sindhwani, W. Chu and S. S. Keerthi (2007) Semi-supervised Gaussian process classifiers, International Joint Conferences on Artificial Intelligence (IJCAI-20):1059-1064 (View Abstract)

W. Chu, V. Sindhwani, Z. Ghahramani and S. S. Keerthi (2006) Relational learning with Gaussian processes, Neural Information Processing Systems (NIPS-19):289-296 (View Abstract)

K. Yu, W. Chu, S. Yu, V. Tresp and Z. Xu (2006) Stochastic relational models for discriminative link prediction, Neural Information Processing Systems (NIPS-19):1553-1560 (View Abstract)

S. K. Shevade and W. Chu (2006) Minimum enclosing spheres formulations for support vector ordinal regression, IEEE International Conference on Data Mining (ICDM-06):1054-1058 (View Abstract)

W. Chu, Z. Ghahramani, R. Krause and D. L. Wild (2006) Identifying protein complexes in high-throughput protein interaction screens using an infinite latent feature model, Pacific Symposium on Biocomputing (PSB-11):231-242 (View Abstract)

W. Chu (2006) Model selection: an empirical study on two kernel classifiers, International Joint Conference on Neural Networks (IJCNN-06):1673-1679

W. Chu, Z. Ghahramani, A. Podtelezhnikov and D. L. Wild (2006) Bayesian segmental models with multiple sequence alignment profiles for protein secondary structure and contact map prediction, IEEE/ACM Transactions on Computational Biology and Bioinformatics 3(2):98-113 (View Abstract)

W. Chu, S. S. Keerthi, C. J. Ong and Z. Ghahramani (2006) Bayesian support vector machines for feature ranking and selection, In I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, editors, Feature Extraction, Foundations and Applications Springer:403-418

W. Chu, Z. Ghahramani, F. Falciani and D. L. Wild (2005) Biomarker discovery with Gaussian processes in microarray gene expression data, Bioinformatics 2005(21):3385-3393 (View Abstract)

W. Chu and Z. Ghahramani (2005) Gaussian processes for ordinal regression, Journal of Machine Learning Research 6(Jul):1019-1041 (View Abstract)

W. Chu, C. J. Ong and S. S. Keerthi (2005) An improved conjugate gradient scheme to the solution of least squares SVM, IEEE Transactions on Neural Networks 16(2):498-501 (View Abstract)

S. S. Keerthi and W. Chu (2005) A matching pursuit approach to sparse Gaussian process regression, Neural Information Processing Systems (NIPS-18):643-650 (View Abstract)

W. Chu and Z. Ghahramani (2005) Preference learning with Gaussian processes, International Conference on Machine Learning (ICML-22):137-144 (View Abstract)

W. Chu and S. S. Keerthi (2005) New approaches to support vector ordinal regression, International Conference on Machine Learning (ICML-22):145-152 (View Abstract)

W. Chu and Z. Ghahramani (2005) Extensions of Gaussian processes for ranking: semi-supervised and active learning, Workshop on Learning to Rank at NIPS-18:29-34 (View Abstract)

W. Chu, Z. Ghahramani and D. L. Wild (2004) A graphical model for protein secondary structure prediction, International Conference on Machine Learning (ICML-21):161-168 (View Abstract)

W. Chu, Z. Ghahramani and D. L. Wild (2004) Protein secondary structure prediction using sigmoid belief networks to parameterize segmental semi-Markov models, European Symposium on Artificial Neural Networks (ESANN-05):81-86

W. Chu, S. S. Keerthi and C. J. Ong (2004) Bayesian support vector regression using a unified loss function, IEEE Transactions on Neural Networks 15(1):29-44 (View Abstract)

W. Chu (2003) Bayesian approach to support vector machines, Doctoral Dissertation, National University of Singapore (View Abstract)

K. Duan, S. S. Keerthi, W. Chu, S. K. Shevade and A. N. Poo (2003) Multi-category classification by soft-max combination of binary classifiers, Multiple Classifier Systems (MCS-04) Lecture Notes in Computer Science 2709 Springer:125-134

W. Chu, S. S. Keerthi and C. J. Ong (2003) Bayesian trigonometric support vector classifier, Neural Computation 15(9):2227-2254 (View Abstract)

W. Chu, S. S. Keerthi and C. J. Ong (2002) A general formulation for support vector machines, International Conference on Neural Information Processing (ICONIP-09)

W. Chu, S. S. Keerthi and C. J. Ong (2002) A new Bayesian design method for support vector classification, International Conference on Neural Information Processing (ICONIP-09)

S. S. Keerthi, et al. (2002) A machine learning approach for the curation of Biomedical literature - KDD Cup 2002 (Task 1), SIGKDD Explorations Newsletter, 4(2) Honorable Mention

W. Chu, S. S. Keerthi and C. J. Ong (2001) A unified loss function in Bayesian framework for support vector regression, International Conference on Machine Learning (ICML-18):51-58

"Towards trustworthy large language models in industry domains" Tech Report 2024, INF Team

"INF's open-source 34B large language models" Tech Report 2024, INF Team

"Question directed graph attention network for numerical reasoning over text" EMNLP 2020, 1st place on AI2's DROP leaderboard, cited by the GPT-4 Technical Report

"Knowledge graph and real-life applications", presented at CogX 2020

"SpellGCN: incorporating phonological and visual similarities into language models for Chinese spelling check" ACL 2020, code available on GitHub

Chief Scientist & Co-Founder, Inf Tech, 2023.04 to present

Senior Director of Engineering, AI Dept, Ant Group, 2017.08 to 2023.04

Director of Engineering, Alibaba Cloud, Alibaba Group, 2014.11 to 2017.08

Principal Applied Scientist Lead, Bing, Microsoft, 2011.05 to 2014.11

Scientist, Yahoo! Labs, 2008.01 to 2011.05

Associate Research Scientist, CCLS, Columbia University, 2006.01 to 2008.01

Research Fellow, Gatsby Unit, University College London, 2003.02 to 2006.01

Seoul Test of Time Award, The Web Conference (WWW), 2023

Test of Time Award, ACM SIGIR, 2022

Best Demo Award, ACM CIKM, 2017

Best Paper Award, ACM WSDM, 2011

Super Star Team Award, Yahoo!, 2008

Honorable Mention Team, ACM KDD CUP, 2002
