
FAVORITE PAPERS

Introduction

Over the past half decade, researchers around the world have released more and more AI papers each year.  Accordingly, it has become very hard to keep up and to distinguish the papers meriting attention from the ones offering little new information or containing dubious results.

 

Using citations as a barometer for a paper's importance is a decent start, but it fails to surface interesting ideas that do not yet have the results to justify an immediate mass response from the AI research community.  It also leads to reading many papers that reuse the same ideas.

For that reason, I wanted to curate a list of papers for others to use as a starting point for their exploration of the wild world of AI research.  This list of papers was formulated with the intention of providing the following:

  • interesting ideas  (the introduction of a new concept or a clear extension of an important old one)

  • seminal work  (advancing a previous state-of-the-art benchmark)

  • diversity  (avoiding too many papers from any one sub-domain of AI)

Image Classification

ResNet Architecture

December 2015

A radically new and accurate model representing one of the biggest breakthroughs in modern CNNs.  It changed the paradigm for most future CNNs: by observing that very deep CNNs train faster and more accurately when layer outputs are combined by addition rather than by compounding multiplication, the authors identified gradient flow as one of the key factors in trainability.
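
To make the additive skip connection concrete, here is a minimal residual block sketch in PyTorch (my own illustration, not the paper's exact block, which also uses projection shortcuts when shapes change):

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # The additive shortcut gives gradients a direct path back to
        # earlier layers, which is what makes very deep stacks trainable.
        return F.relu(out + x)
```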

DenseNet

August 2016

This CNN extended the ResNet idea by concatenating each layer's output to the inputs of subsequent layers, shortening the path that error signals travel from a model's last layers back to its first.
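
A minimal sketch of the concatenation pattern (my own simplification; the actual dense block uses BN-ReLU-Conv composite layers and transition layers between blocks):

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer sees the concatenation of all previous feature maps."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_channels + i * growth_rate, growth_rate, 3, padding=1)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate everything produced so far along the channel axis.
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)
```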

Spatially Adaptive Computation Time

December 2016

An interesting model (expanding on Alex Graves's ACT paper) in which certain sections of an image can be processed by the CNN for longer than other sections, depending on each section's complexity. This idea of making processing time dynamically dependent on the complexity of the input attempts to mimic how humans process things.
 

Dual Path Networks

July 2017

Model combining DenseNet and ResNet into a single model with impressive results. It argues that the two models' channels carry different information, and that by combining them it's possible to get a bit of the best of both.

SliceNet

July 2017

Architecture inspired by Xception and ByteNet. Introduces "super"-separable convolution to achieve state-of-the-art parameter efficiency.

Squeeze-Excite

September 2017

Paper introducing a new network building block called the "Squeeze-Excite" layer. Essentially, it uses a model's activations at a given layer to suppress some channels of activations and emphasize others.
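
A minimal sketch of the block (my own; it assumes the usual bottleneck design with a reduction ratio, here 16):

```python
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Reweight channels using globally pooled statistics."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # "squeeze": global context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                          # per-channel gates in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                               # "excite": rescale channels
```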

ShuffleNet v2

July 2018

Model leveraging the parameter efficiency of super-separable convolutions plus a channel-shuffling technique to create a very small yet powerful CNN (ideally suited for running on devices with limited memory and compute).
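
The shuffle itself is a tiny tensor operation; here is my rendering of the commonly used reshape-transpose trick:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    """Interleave channels across groups so grouped convolutions can mix
    information between groups."""
    b, c, h, w = x.shape
    x = x.view(b, groups, c // groups, h, w)  # split channels into groups
    x = x.transpose(1, 2).contiguous()        # swap group and channel axes
    return x.view(b, c, h, w)                 # flatten back
```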

Image Segmentation and Localization
Faster R-CNN

June 2015

Multi-step model for generating object region proposals and then refining them for object localization.  At the time, this was a breakthrough paper in object localization and shaped a considerable amount of future research.

The One Hundred Layers Tiramisu

November 2016

Model leveraging recent concepts such as layer concatenation (from DenseNet), skip connections, and deconvolutions to produce state-of-the-art image segmentation.

YOLO v3

April 2018

A model taking a very different approach from R-CNN to object localization.  It performs the task in a single step by dividing the image into grid regions with candidate anchor boxes and then outputting bounding boxes and class scores for each anchor box.
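
To make the single-step output concrete, here is a shape-level sketch (entirely my own, with illustrative sizes: an S×S grid, B anchors per cell, and C classes):

```python
import torch

S, B, C = 13, 3, 80                            # grid size, anchors per cell, classes
head_out = torch.randn(1, B * (5 + C), S, S)   # raw output of the conv head

# One prediction per (anchor, cell): 4 box offsets + 1 objectness + C classes.
pred = head_out.view(1, B, 5 + C, S, S)
box_offsets = pred[:, :, 0:4]                  # (tx, ty, tw, th) per anchor box
objectness  = pred[:, :, 4].sigmoid()          # is there an object here at all?
class_prob  = pred[:, :, 5:].sigmoid()         # per-class scores for each box
```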

Efficient Video Object Segmentation

February 2018 

Super promising model for segmenting frames in a video: it uses the initial frame and then subsequent spatial guides (e.g. the predicted object mask of the previous frame) to slightly modulate the parameters of the main segmentation network for each frame. Significantly faster than existing methods with almost identical accuracy.

Video Processing
Learning to Generate Long-term Future via Hierarchical Prediction

April 2017

Multi-component model using separate pose and image modules to collectively predict future frames in a sequence from a clip.  It is intended for use with single-actor sequences and performs extremely well.

Predictive-Corrective Networks

April 2017

A very interesting concept: using differences between frame embeddings as the input to layers, in a design meant to mirror human visual processing and inspired by Kalman filters. The subtraction operation also provides a way for gradient information to flow between time steps.

I3D and Kinetics Dataset

May 2017

Groundbreaking model from DeepMind for video processing using two-stream 3D convolutions, initialized by "inflating" the weights of a pretrained 2D CNN into 3D. The paper also introduces an enormous new video dataset called Kinetics.
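
The inflation trick is simple to state in code; a minimal sketch (my own, following the paper's description of tiling a 2D kernel across time and rescaling so a "boring" video of repeated frames produces the same activations):

```python
import torch

def inflate_conv_weight(w2d: torch.Tensor, time_dim: int) -> torch.Tensor:
    """Expand a 2D kernel (out, in, kH, kW) to 3D (out, in, kT, kH, kW) by
    tiling across time and dividing by kT, so a video of identical frames
    yields the same activations as the original 2D network on one frame."""
    w3d = w2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1)
    return w3d / time_dim
```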

Extraction and Classification of Diving Clips

May 2017

Paper dividing the sports classification problem into three steps: first, temporal cropping; then, localizing and tracking the main individual(s) in the clip; and finally, predicting the actions with high accuracy.  An interesting approach that combines multiple accepted video processing ideas and uses each step to help the next with its objective.

Learning to Learn from Noisy Web Videos

June 2017

Fascinating paper showing a technique for gathering new training examples from the web without needing them all to be accurately labelled.  It leverages the change in error rate from training a classifier with new data selected by a Deep Q-Network.
            

Visual Interaction Network (VIN)

June 2017

Paper by DeepMind attempting to predict future physical states from video data using only a handful of frames. The key components are a visual encoder, dynamics predictor, and a state decoder. The core engine of the model uses relational reasoning to process dynamics between objects.

What Actions are Needed for Understanding Human Actions in Videos?

August 2017

A high-level overview of the common themes and methodologies in current video processing research.  It presents concepts shared by the most successful models and also explores areas for future research.

Temporal Relational Reasoning in Videos

November 2017

Model borrowing the overarching idea of DeepMind's relational reasoning work and applying it to videos.  Essentially, the model processes frame sequences of different lengths through simple networks and combines their outputs with an addition operation in order to reason about the combination. It works quite well and is an extremely promising approach.
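
A rough sketch of the multi-scale relation idea (my own simplification: one small MLP per subsequence length, with outputs summed; the paper samples several random frame subsets per scale instead of a single evenly spaced one):

```python
import torch
import torch.nn as nn

class TemporalRelation(nn.Module):
    """Sum of per-scale MLPs applied to ordered subsets of frame features."""
    def __init__(self, feat_dim, num_frames, hidden=256, num_classes=10):
        super().__init__()
        self.scales = list(range(2, num_frames + 1))   # 2-frame, 3-frame, ...
        self.heads = nn.ModuleList([
            nn.Sequential(nn.Linear(k * feat_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, num_classes))
            for k in self.scales
        ])

    def forward(self, frames):                         # (batch, T, feat_dim)
        b, t, _ = frames.shape
        logits = 0
        for k, head in zip(self.scales, self.heads):
            idx = torch.linspace(0, t - 1, k).round().long()  # ordered frame subset
            logits = logits + head(frames[:, idx].reshape(b, -1))
        return logits
```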

Learning 3D Human Dynamics from Video

December 2018

An interesting recent model combining ideas from the "Learning to Generate Long-term Future via Hierarchical Prediction" paper (above) and DeepMind's series of papers on imagination modules to provide information about action dynamics.

Sequential Input Processing
PixelRNN, WaveNet, and ByteNet

Mid-2016

A series of transformative papers by DeepMind applying skip connections and dilated convolutions to various types of sequential input in a new way.  Variants of WaveNet have been used in smart devices for years as the preeminent network for text-to-speech.
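
A minimal sketch of a stack of dilated causal convolutions (my own; WaveNet itself adds gated activations plus residual and skip connections on top of this pattern):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalStack(nn.Module):
    """Each layer doubles its dilation, so the receptive field grows
    exponentially with depth while outputs never see future samples."""
    def __init__(self, channels, num_layers):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=2, dilation=2 ** i)
            for i in range(num_layers)
        ])

    def forward(self, x):                          # x: (batch, channels, time)
        for conv in self.convs:
            pad = conv.dilation[0]                 # left-pad keeps it causal
            x = torch.relu(conv(F.pad(x, (pad, 0))))
        return x
```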

Adaptive Computation Time (ACT)

March 2016

An exceptional paper addressing the inconvenient fact that certain inputs require more processing than others, something DNNs have never really accounted for.  This RNN lets the model determine how long it needs to process each input.  It does not yield groundbreaking results on character prediction, but it's an idea that could easily reappear as a great solution in a different context.
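
A heavily simplified sketch of the halting mechanism (my own; the real ACT forms a weighted average of intermediate states, adds a "ponder cost" to the loss, and halts per example rather than per batch):

```python
import torch
import torch.nn as nn

class ACTCell(nn.Module):
    """Repeat an RNN cell on one input until the cumulative halting
    probability crosses 1 - eps, up to a hard step limit."""
    def __init__(self, input_size, hidden_size, eps=0.01, max_steps=10):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.halt = nn.Linear(hidden_size, 1)
        self.eps, self.max_steps = eps, max_steps

    def forward(self, x, h):
        total_halt = torch.zeros(x.size(0), 1)
        for _ in range(self.max_steps):
            h = self.cell(x, h)
            total_halt = total_halt + torch.sigmoid(self.halt(h))
            # Complex inputs keep halting probability low and therefore
            # buy themselves more pondering steps.
            if bool((total_halt >= 1 - self.eps).all()):
                break
        return h
```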

Attention Is All You Need (Transformer Model)

June 2017

A widely used sequence encoder-decoder framework with state-of-the-art results.  This model has served as the core encoder-decoder module for a number of other prominent frameworks.
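
At the heart of the model is scaled dot-product attention, which the paper defines as softmax(QK^T / sqrt(d_k))V; a minimal sketch of that one operation, ignoring masking and multiple heads:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # similarities
    return torch.softmax(scores, dim=-1) @ v                  # weighted values
```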

Dilated Recurrent Neural Networks

October 2017 

A multi-layer RNN whose connections are designed to be more effective for long-range relationships: nodes at higher layers skip larger numbers of time steps.  The skip pattern is similar to that of a dilated convolution.

Recurrent Relational Networks

November 2017

A great model for solving multi-step problems over sequential input. Having achieved state-of-the-art results on previous relational reasoning benchmarks, the paper introduces a new, more difficult reasoning dataset and also solves Sudoku puzzles with the new model.

Nested LSTMs

January 2018 

A nice variant of a traditional LSTM where the inner LSTM can store longer-term memories and the outer LSTM tends to keep more fluid memories.  It seems to work well in practice too.

Deep Reinforcement Learning
Curiosity-driven Exploration by Self-supervised Prediction

May 2017

Paper kicking off a sequence of seminal deep RL papers that use imagination and curiosity to drive learning in environments with sparse rewards. In this model, an Intrinsic Curiosity Module enables constant learning, even in states with no external reward.
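
A compressed sketch of the intrinsic reward (my own; the actual module learns its feature space phi with an inverse dynamics model and predicts in that space rather than in raw pixels):

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """Intrinsic reward = error in predicting the next state's features."""
    def __init__(self, feat_dim, num_actions):
        super().__init__()
        self.predict = nn.Sequential(
            nn.Linear(feat_dim + num_actions, 256), nn.ReLU(),
            nn.Linear(256, feat_dim))

    def intrinsic_reward(self, phi_state, action_onehot, phi_next):
        phi_pred = self.predict(torch.cat([phi_state, action_onehot], dim=-1))
        # Poorly predicted (surprising) transitions earn large rewards,
        # nudging the agent toward dynamics it has not yet mastered.
        return 0.5 * (phi_pred - phi_next).pow(2).sum(dim=-1)
```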

Learning Model-based Planning from Scratch

July 2017

Novel paper by DeepMind using deep RL to create a dual-pronged model in which a "manager" module decides either to "imagine" or to "act" in an environment, drawing on several imagination-based planning strategies.  The model also keeps a history that it can reference to make better decisions going forward.

Rainbow: Combining Improvements in Deep Reinforcement Learning

October 2017

A nice overview of the current state of RL on the whole and an exploration of the techniques that have shown promise when used in conjunction with each other.

Temporal Difference Models

February 2018

A deep RL model successfully combining the advantages of model-based and model-free learning. This paper was heralded for its success in overcoming many of the normal limitations of traditional RL approaches.

Data Efficient Hierarchical Reinforcement Learning

May 2018

Paper discussing an approach to using a multi-level hierarchical RL controller to solve more complex tasks without the normal constraints of task-specific design and on-policy training.  These models can also be trained with a limited number of environment interactions.

            

Unsupervised Meta-Learning for Reinforcement Learning

June 2018

Fascinating paper proposing a family of unsupervised meta-learning algorithms with experimental results demonstrating accelerated reinforcement learning procedures without the need for manual task design. 

Relational Deep Reinforcement Learning

June 2018

Excellent paper by DeepMind presenting a model using self-attention to iteratively reason about the relationships between entities in a scene and to guide a model-free policy.  Ultimately, the model performs at state-of-the-art levels on StarCraft mini-games and shows high-level ability to generalize to new tasks.

Other Domains
Learning to Learn by Gradient Descent by Gradient Descent

June 2016

Interesting paper exploring the idea of using a DNN to replace a hand-designed optimizer when training RNNs and CNNs. Though the results were highly biased towards specific use-cases, the paper opens up new possibilities for using deep learning to improve deep learning.

Learning from Simulated and Unsupervised Images through Adversarial Training

December 2016

Innovative paper discussing a way to reduce the impact of one of deep learning's major bottlenecks: lack of training data.  The paper generates new training data by post-processing simulated images through a GAN.

In Defense of the Triplet Loss

March 2017

Paper referenced in my blog post [link] outlining common uses and methodologies for deploying the triplet loss function.  The paper also suggests a new alternative method for generating triplet batches to improve training efficiency.
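
For reference, a minimal sketch of the loss itself (my own; the paper's "batch hard" variant instead mines the hardest positive and negative per anchor within each batch):

```python
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull the anchor toward the positive and away from the negative until
    the two distances differ by at least `margin`."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```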

Time-Contrastive Networks

April 2017

An excellent multi-part model leveraging the triplet loss function for action recognition.  The model groups radically different-looking versions of the same action together, and parlays that success into marked improvements in a larger framework for making robots mimic human movements and actions with their own bodies.

Hierarchical Representations for Efficient Architecture Search

November 2017

One of the better papers on using AI to search for new, optimal CNN architectures. Relative to other papers, this one incorporates more degrees of freedom and flexibility into its architecture search. Ultimately, the search produces models yielding near state-of-the-art accuracy.

Mastering Chess and Shogi by Self-Play

December 2017

Paper discussing the evolution of AlphaGo (the initial Go-playing AI created by DeepMind) into AlphaZero (a general game-playing AI capable of mastering many games, including chess and shogi).
            

Adversarial Patch

December 2017

Paper in AI security identifying a consistent weakness of CNNs: vulnerability to a targeted "patch attack".  In essence, the paper shows that many different CNN architectures can be tricked into outputting a desired incorrect class by adding a patch to the input image.  This finding is very disturbing, both because of the number of production AI systems susceptible to the attack and because of the prospect of similar future attacks on other AI models and frameworks.

Efficient Neural Architecture Search via Parameter Sharing

February 2018

Paper outlining a new method for finding accurate CNN and RNN architectures in days (as opposed to weeks) by sharing weights and adding a couple of slightly restrictive rules. This methodology is currently the best at balancing final accuracy against speed of search.

Automatic Paper Summary Generation from Visual and Textual Information

November 2018

A novel approach to the task of Paper Summary Generation (PSG) using a combination of a vision-based supervised component detector and a language-based unsupervised important-sentence extractor.  The model produces very promising results and ideas for further exploration within the domain.

Massively Distributed SGD: Training ImageNet/ResNet-50 in Minutes

November 2018

A very recent paper on the increasingly popular exploration of mechanisms to speed up training.  In this case, a variety of traditional methods are combined with several new ones to train a ResNet-50 on ImageNet in a distributed fashion over the course of just a handful of minutes.
