cike14/Papers
forked from wangleihitcs/Papers

Tasks combining CV and NLP, with a focus on generating text from images.

Intro

Tasks that combine CV and NLP, with an emphasis on image/video captioning, VQA, paragraph description generation, and medical report generation.

Papers and Code/Notes

Image/Video Captioning

  • CNN-RNN

    • Show and Tell: A Neural Image Caption Generator, Oriol Vinyals et al, CVPR 2015, Google(pdf)
    • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, Kelvin Xu et al, ICML 2015(pdf)(code)
    • Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, PAMI 2016(pdf)(code)
    • Areas of Attention for Image Captioning, ICCV 2017(pdf)
    • Rethinking the Form of Latent States in Image Captioning, ECCV 2018, CUHK(pdf)
    • Recurrent Fusion Network for Image Captioning, ECCV 2018, Tencent AI Lab, Fudan University(pdf)
    • Move Forward and Tell: A Progressive Generator of Video Descriptions, ECCV 2018, CUHK(pdf)
    • Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks, CVPR 2016(pdf)
  • CNN-CNN

  • Reinforcement Learning

    • Improving Reinforcement Learning Based Image Captioning with Natural Language Prior, 2018, Tencent/IBM(pdf)
    • End-to-End Video Captioning with Multitask Reinforcement Learning(pdf)
  • Others

    • A Neural Compositional Paradigm for Image Captioning, NIPS 2018, CUHK(pdf)

Paragraph Description Generation

  • CNN-RNN
    • DenseCap: Fully Convolutional Localization Networks for Dense Captioning, Justin Johnson et al, CVPR 2016, Stanford(homepage)(code)
    • A Hierarchical Approach for Generating Descriptive Image Paragraphs, Jonathan Krause et al, CVPR 2017, Stanford(homepage)(dense-caption code)
    • Recurrent Topic-Transition GAN for Visual Paragraph Generation, ICCV 2017
    • Diverse and Coherent Paragraph Generation from Images, ECCV 2018(code)

Visual Question Answering

  • CNN-RNN
    • Multi-level Attention Networks for Visual Question Answering, CVPR 2017
    • Motion-Appearance Co-Memory Networks for Video Question Answering, 2018
    • Deep Attention Neural Tensor Network for Visual Question Answering, ECCV 2018, HIT
    • Question-Guided Hybrid Convolution for Visual Question Answering, Peng Gao et al, ECCV 2018, CUHK(pdf)

Medical Report Generation

  • CNN-RNN

    • Learning to Read Chest X-Rays: Recurrent Neural Cascade Model for Automated Image Annotation, CVPR 2016(pdf)
    • TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays, Xiaosong Wang et al, CVPR 2018, NIH(pdf)(author's homepage)
    • On the Automatic Generation of Medical Imaging Reports, Baoyu Jing et al, ACL 2018, CMU(pdf)(author's homepage)
    • Multimodal Recurrent Model with Attention for Automated Radiology Report Generation, Yuan Xue, MICCAI 2018, PSU
  • Reinforcement Learning

    • Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation, Christy Y. Li et al, NIPS 2018, CMU(pdf)(author's homepage)
  • Other

    • TextRay: Mining Clinical Reports to Gain a Broad Understanding of Chest X-rays, MICCAI 2018(pdf)

Medical Image Processing

  • Detection

    • CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning, 2018, Andrew Ng
    • Attention-Guided Curriculum Learning for Weakly Supervised Classification and Localization of Thoracic Diseases on Chest Radiographs, Yuxing Tang et al, MICCAI 2018, NIH(pdf)
    • DeepRadiologyNet: Radiologist Level Pathology Detection in CT Head Images
    • Method for Detecting Lesion Regions in Lung CT Images
    • Method for Predicting Benign vs. Malignant Lung Tumors Based on Quantitative Radiomics
  • Enhancement

    • Super-resolution
      • Image Super-Resolution Using Deep Convolutional Networks
      • Deeply-Recursive Convolutional Network for Image Super-Resolution
  • Segmentation

    • U-Net: Convolutional Networks for Biomedical Image Segmentation, MICCAI 2015
    • A 3D Coarse-to-Fine Framework for Automatic Pancreas Segmentation

Medical Datasets

Natural Image Tasks

  • Detection
    • You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016

Metrics

  • BLEU
    • BLEU: a method for automatic evaluation of machine translation, Kishore Papineni et al, ACL 2002(pdf)
  • CIDEr
    • CIDEr: Consensus-based Image Description Evaluation, CVPR 2015(pdf)
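BLEU, listed above, scores a candidate caption by its clipped (modified) n-gram precision against one or more reference captions, takes the geometric mean of those precisions, and multiplies by a brevity penalty that punishes captions shorter than the references. The following is a minimal sentence-level sketch in Python, assuming pre-tokenized input, uniform weights over 1- to 4-grams, no smoothing, and the closest reference length for the brevity penalty. The function names are illustrative only, and published captioning results normally come from corpus-level evaluation tooling, so scores from this sketch will not match reported numbers exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Sentence-level BLEU sketch: uniform weights, clipped n-gram precision,
    brevity penalty against the closest reference length, no smoothing."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            # Without smoothing, a single zero precision makes the geometric mean zero.
            return 0.0
        log_precisions.append(math.log(clipped / total))

    c = len(candidate)
    # Effective reference length: the one closest to c (ties go to the shorter reference).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    cand = "the cat sat on".split()
    refs = ["the cat sat on the mat".split()]
    # Every n-gram matches, so only the brevity penalty exp(1 - 6/4) applies.
    print(round(sentence_bleu(cand, refs), 4))  # 0.6065
```

The clipping step is what stops a degenerate caption such as "the the the the the the the" from scoring a perfect unigram precision: each repeated word is credited at most as many times as it appears in a single reference.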

Others

  • Visual Commonsense Reasoning (VCR)
    • From Recognition to Cognition: Visual Commonsense Reasoning, Rowan Zellers et al, 2018, Paul G. Allen School(homepage)(pdf)
  • Language Models
    • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin et al, 2018, Google AI Language(pdf)(code)
  • Word Representations
    • Deep contextualized word representations, Matthew E. Peters et al, NAACL 2018, Paul G. Allen School(homepage)(pdf)(code-tf)
