Combining CV with NLP tasks, with an emphasis on Image/Video Captioning, VQA, Paragraph Description Generation, and Medical Report Generation.
- Image/Video Captioning
- Paragraph Description Generation
- Visual Question Answering
- Medical Report Generation
- Medical Image Processing
- Medical Datasets
- Natural Image Tasks
- Metrics
- Others
- CNN-RNN
- Show and Tell: A Neural Image Caption Generator, Oriol Vinyals et al, CVPR 2015, Google(pdf)
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, Kelvin Xu et al, ICML 2015(pdf)(code)
- Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, PAMI 2016(pdf)(code)
- Areas of Attention for Image Captioning, ICCV 2017(pdf)
- Rethinking the Form of Latent States in Image Captioning, ECCV 2018, CUHK(pdf)
- Recurrent Fusion Network for Image Captioning, ECCV 2018, Tencent AI Lab, Fudan University(pdf)
- Move Forward and Tell: A Progressive Generator of Video Descriptions, ECCV 2018, CUHK(pdf)
- Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks, CVPR 2016(pdf)
- CNN-CNN
- Reinforcement Learning
- Others
- A Neural Compositional Paradigm for Image Captioning, NIPS 2018, CUHK(pdf)
- CNN-RNN
- DenseCap: Fully Convolutional Localization Networks for Dense Captioning, Justin Johnson et al, CVPR 2016, Stanford(homepage)(code)
- A Hierarchical Approach for Generating Descriptive Image Paragraphs, Jonathan Krause et al, CVPR 2017, Stanford(homepage)(dense-caption code)
- Recurrent Topic-Transition GAN for Visual Paragraph Generation, ICCV 2017
- Diverse and Coherent Paragraph Generation from Images, ECCV 2018(code)
- CNN-RNN
- Multi-level Attention Networks for Visual Question Answering, CVPR 2017
- Motion-Appearance Co-Memory Networks for Video Question Answering, 2018
- Deep Attention Neural Tensor Network for Visual Question Answering, ECCV 2018, HIT
- Question-Guided Hybrid Convolution for Visual Question Answering, Peng Gao et al, ECCV 2018, CUHK(pdf)
- CNN-RNN
- Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation, CVPR 2016(pdf)
- TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays, Xiaosong Wang et al, CVPR 2018, NIH(pdf)(author's homepage)
- On the Automatic Generation of Medical Imaging Reports, Baoyu Jing et al, ACL 2018, CMU(pdf)(author's homepage)
- Multimodal Recurrent Model with Attention for Automated Radiology Report Generation, Yuan Xue et al, MICCAI 2018, PSU
- Reinforcement Learning
- Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation, Christy Y. Li et al, NIPS 2018, CMU(pdf)(author's homepage)
- Others
- TextRay: Mining Clinical Reports to Gain a Broad Understanding of Chest X-rays, MICCAI 2018(pdf)
- Detection
- CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning, 2018, Andrew Ng
- Attention-Guided Curriculum Learning for Weakly Supervised Classification and Localization of Thoracic Diseases on Chest Radiographs, Yuxing Tang et al, MICCAI 2018, NIH(pdf)
- DeepRadiologyNet: Radiologist Level Pathology Detection in CT Head Images
- Lesion Region Detection Method for Lung CT Images (original title: 肺部CT图像病变区域检测方法)
- Benign/Malignant Lung Tumor Prediction Method Based on Quantitative Radiomics (original title: 基于定量影像组学的肺肿瘤良恶性预测方法)
- Enhancement
- Super resolution
- Image Super-Resolution Using Deep Convolutional Networks
- Deeply-Recursive Convolutional Network for Image Super-Resolution
- Segmentation
- U-Net: Convolutional Networks for Biomedical Image Segmentation, MICCAI 2015
- A 3D Coarse-to-Fine Framework for Automatic Pancreas Segmentation
- NIH Chest X-ray8/14(download link)(kaggle's download link)
- ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases, CVPR 2017, NIH(pdf)
- Open-i Chest X-Ray(download link)
- Radiology Objects in COntext(ROCO)
- Radiology Objects in COntext (ROCO): A Multimodal Image Dataset, MICCAI 2018(intro)(pdf)(download)
- Detection
- You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
- BLEU
- BLEU: a method for automatic evaluation of machine translation, Kishore Papineni et al, ACL 2002(pdf)
- CIDEr
- CIDEr: Consensus-based Image Description Evaluation, CVPR 2015(pdf)
- Visual Commonsense Reasoning (VCR)
- From Recognition to Cognition: Visual Commonsense Reasoning, Rowan Zellers et al, 2018, Paul G. Allen School(homepage)(pdf)
- Language Model
- Word Representations
- Deep contextualized word representations, Matthew E. Peters et al, NAACL 2018, Paul G. Allen School(homepage)(pdf)(code-tf)