Sunday, December 24, 2017

Notes from ICCV 2017

Top Acceptance Rates
Video and Language - 53.8%
Autonomous Driving - 50%
Large-scale Optimization - 45%

Total - 29%

Favourite Papers on Video and Recognition


A Read-Write Memory Network for Movie Story Understanding
 - Question answering task for large-scale, multimodal movie story understanding

Temporal Tessellation: A Unified Approach for Video Analysis
 - General approach to video understanding inspired by semantic transfer techniques
 - A test video is processed by forming correspondences between its clips and the clips of reference videos with known semantics, following which, reference semantics can be transferred to the test video.

Unsupervised Action Discovery and Localization in Videos
 - Training data - unlabeled data without bounding box annotations
 - The proposed approach (a) discovers action class labels and (b) spatio-temporally localizes actions in videos

Dense-Captioning Events in Videos
 - Introduce the task of dense-captioning events, which involves both detecting and describing events in a video.
 - Identify all events in a single pass of the video
 - Introduce a variant of an existing proposal module that is designed to capture both short events and long events that span minutes
 - Introduce a new captioning module that uses contextual information from past and future events to jointly describe all events.
 - New dataset - ActivityNet Captions - 849 video hours with 100k total descriptions

Learning long-term dependencies for action recognition with a biologically-inspired deep network
 - Biological neural systems are typically composed of both feedforward and feedback connections
 - shuttleNet - consists of several processors, each of which is a GRU while associated with multiple groups of hidden states
 - All processors inside shuttleNet are loop connected to mimic the brain's feedforward and feedback connections, in which they are shared across multiple pathways in the loop connection.

Compressive Quantization for Fast Object Instance Search in Videos
 - Object instance search in videos, where efficient point-to-set (image-to-video) matching is essential
 - Jointly optimizing vector quantization method to compress M object proposals extracted from each video into only k binary codes, where k << M
 - Similarity between the query object and the whole video can be determined by the Hamming distance between the query's binary code and the video's best-matched binary code
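
The matching step in the last bullet can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: the binary codes are stored as Python integers, the 8-bit codes and video names are made up, and the function names (hamming, video_distance, rank_videos) are hypothetical. It shows only the point-to-set rule: a video's distance to the query is the Hamming distance to its best-matched code.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def video_distance(query_code: int, video_codes: List[int]) -> int:
    """Point-to-set distance: Hamming distance to the video's best-matched code."""
    return min(hamming(query_code, code) for code in video_codes)


def rank_videos(query_code: int, videos: Dict[str, List[int]]) -> List[str]:
    """Video ids sorted from most similar to least similar to the query."""
    return sorted(videos, key=lambda vid: video_distance(query_code, videos[vid]))


# Made-up 8-bit codes: each video is summarised by k = 2 codes.
videos = {
    "video_a": [0b10110010, 0b01100111],
    "video_b": [0b00001111, 0b11110000],
}
query = 0b10110110
ranking = rank_videos(query, videos)
```

Because each video is reduced to k codes rather than M proposals, the search cost per video is k XOR-and-popcount operations.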

Complex Event Detection by Identifying Reliable Shots From Untrimmed Videos
 - Formulate as a MIL problem by taking each video as a bag and the video shots in each video as instances
 - New MIL method, which simultaneously learns a linear SVM classifier and infers a binary indicator for each instance in order to select reliable training instances from each positive or negative bag
 - The objective function balances the weighted training errors against an l1-l2 mixed-norm regularization term, which adaptively selects reliable shots that are as diverse as possible

Spatio-Temporal Person Retrieval via Natural Language Queries
 - Person retrieval from multiple videos
 - Output a tube which encloses the person described by the query
 - New dataset
 - Design a model that combines methods for spatio-temporal human detection and multimodal retrieval

Joint Discovery of Object States and Manipulation Actions
 - Automatically discover the states of objects and the associated manipulation actions
 - Given a set of videos for a particular task, we propose a joint model that learns to identify object states and to localize state-modifying actions

Pixel-Level Matching for Video Object Segmentation using Convolutional Neural Networks
 - The network aims to distinguish the target area from the background on the basis of the pixel-level similarity between two object units
 - The proposed network represents a target object using features from different depth layers in order to take advantage of both the spatial details and the category-level semantic information

Joint Detection and Recounting of Abnormal Events by Learning Deep Generic Knowledge
 - Recounting of abnormal events - explaining why they are judged to be abnormal
 - Integrate a generic CNN model and environment-dependent anomaly detectors
 - Learn a CNN with multiple visual tasks to exploit semantic information that is useful for detecting and recounting abnormal events
 - Appropriately plugging the model into anomaly detectors

TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals
 - TURN jointly predicts action proposals and refines the temporal boundaries by temporal coordinate regression
 - Fast computation is enabled by unit feature reuse: a long untrimmed video is decomposed into video units, which are reused as basic building blocks of temporal proposals
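
The unit-reuse idea in the bullets above can be illustrated roughly as follows. This is a toy sketch under assumptions, not TURN itself: the unit features are made-up two-dimensional vectors, mean pooling stands in for whatever pooling the network uses, and the regression offsets are hard-coded where a trained regressor would predict them. All function names are hypothetical.

```python
from typing import List, Tuple


def split_into_units(num_frames: int, unit_size: int) -> List[Tuple[int, int]]:
    """Decompose a video into consecutive units given as [start, end) frame ranges."""
    return [(s, min(s + unit_size, num_frames)) for s in range(0, num_frames, unit_size)]


def pool_units(unit_features: List[List[float]], start_unit: int, end_unit: int) -> List[float]:
    """Mean-pool the precomputed features of units [start_unit, end_unit)."""
    span = unit_features[start_unit:end_unit]
    dim = len(span[0])
    return [sum(f[d] for f in span) / len(span) for d in range(dim)]


def refine(start_unit: int, end_unit: int, start_offset: int, end_offset: int) -> Tuple[int, int]:
    """Shift proposal boundaries by regressed offsets, expressed in units."""
    return start_unit + start_offset, end_unit + end_offset


units = split_into_units(num_frames=100, unit_size=16)

# Unit features are computed once (toy values here) and reused by every proposal.
unit_features = [[float(i), 1.0] for i in range(len(units))]

proposal_feature = pool_units(unit_features, 2, 5)
refined = refine(2, 5, start_offset=-1, end_offset=1)  # offsets assumed, not predicted
```

The expensive step, extracting a feature per unit, happens once per video; every candidate proposal afterwards is just a cheap pooling over a slice of that list.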

Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions
 - Zero-shot localization and classification of human actions in video
 - Spatial-aware object embedding
 - Build embedding on top of freely available actor and object detectors
 - Exploit the object positions and sizes in the spatial-aware embedding to demonstrate a new spatio-temporal action retrieval scenario with composite queries

Temporal Dynamic Graph LSTMs for Action-Driven Video Object Detection
 - Weakly supervised object detection from videos
 - Use action descriptions as supervision
 - But, objects of interest that are not involved in human actions are often absent in global action descriptions
 - Propose a novel temporal dynamic graph LSTM (TD-Graph LSTM), which enables global temporal reasoning by constructing a dynamic graph that is based on temporal correlations of object proposals and spans the entire video
 - The missing label issue for each individual frame can thus be significantly alleviated by transferring knowledge across correlated object proposals in the whole video


Open Set Domain Adaptation
 - Domain adaptation in open sets - only a few categories of interest are shared between source and target data
 - The proposed method fits in both closed and open set scenarios
 - The approach learns a mapping from the source to the target domain by jointly solving an assignment problem that labels those target instances that potentially belong to the categories of interest present in the source dataset

FoveaNet: Perspective-aware Urban Scene Parsing
 - Estimate the perspective geometry of a scene image through a convolutional network which integrates supportive evidence from contextual objects within the image
 - FoveaNet "undoes" the camera perspective projection - analyzing regions in the space of the actual scene, and thus provides much more reliable parsing results
 - Introduce a new dense CRFs model that takes the perspective geometry as a prior potential

Generative Modeling of Audible Shapes for Object Perception
 - Present a novel, open-source pipeline that generates audio-visual data, purely from 3D shapes and their physical properties
 - Synthetic audio-visual dataset - Sound-20K for object perception tasks
 - Auditory and visual information play complementary roles in object perception, and the representation learned on synthetic audio-visual data can transfer to real-world scenarios

VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
 - Transfer human supervision between the previously separate tasks

 - Establishing semantic correspondences between images depicting different instances of the same object or scene category
 - CNN architecture for learning a geometrically plausible model for semantic correspondence
 - Uses region proposals as matching primitives, and explicitly incorporates geometric consistency in its loss function

 - The real-world noisy labels exhibit multi-modal characteristics as the true labels, rather than behaving like independent random outliers
 - Propose a unified distillation framework to use “side” information, including a small clean dataset and label relations in knowledge graph, to “hedge the risk” of learning from noisy labels.
 - Propose a suite of new benchmark datasets

 - Presents a framework for localization or grounding of phrases in images using a large collection of linguistic and visual cues
 - Model the appearance, size, and position of entity bounding boxes, adjectives that contain attribute information, and spatial relationships between pairs of entities connected by verbs or prepositions
 - Special attention given to relationships between people and clothing or body parts mentions, as they are useful for distinguishing individuals. 
 - Automatically learn weights for combining these cues and at test time, perform joint inference over all phrases in a caption

 - Leverage the strong correlations between the predicate and the <subj, obj> pair (both semantically and spatially) to predict predicates conditioned on the subjects and the objects.
 - Use knowledge of linguistic statistics to regularize visual model learning
 - Obtain linguistic knowledge by mining from both training annotations (internal knowledge) and publicly available text, e.g., Wikipedia (external knowledge), computing the conditional probability distribution of a predicate given a <subj, obj> pair
 - Distill this knowledge into the deep model to achieve better generalization
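
The linguistic prior in these notes, the conditional distribution of a predicate given a <subj, obj> pair, can be estimated by simple counting. The sketch below assumes the mined annotations or text are already parsed into (subj, predicate, obj) triples; the triples and the function name predicate_distribution are made up for illustration, and the distillation step into the deep model is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]  # (subj, predicate, obj)


def predicate_distribution(triples: Iterable[Triple]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Estimate P(predicate | subj, obj) by relative frequency over the triples."""
    counts = defaultdict(Counter)
    for subj, pred, obj in triples:
        counts[(subj, obj)][pred] += 1
    return {
        pair: {pred: n / sum(c.values()) for pred, n in c.items()}
        for pair, c in counts.items()
    }


# Made-up triples standing in for mined annotations or text.
triples = [
    ("person", "ride", "horse"),
    ("person", "ride", "horse"),
    ("person", "ride", "horse"),
    ("person", "feed", "horse"),
    ("person", "wear", "hat"),
    ("person", "wear", "hat"),
]
dist = predicate_distribution(triples)
```

Here dist[("person", "horse")] puts most of its mass on "ride", which is the kind of prior that can regularize a visual model when the image evidence is ambiguous.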

 - Introduce an end-to-end multi-task objective that jointly learns object-action relationships
 - Proposed architecture can be used for zero-shot learning of actions

 - Inspired by module networks, this paper proposes a model for visual reasoning that consists of a program generator that constructs an explicit representation of the reasoning process to be performed, and an execution engine that executes the resulting program to produce an answer
 - Both the program generator and the execution engine are implemented by neural networks, and are trained using a combination of backpropagation and REINFORCE
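
The split between a program and an execution engine can be made concrete with a purely symbolic stand-in. In the paper both parts are neural networks; here the program is written by hand instead of generated, the modules are plain Python functions over a made-up scene description, and every name (MODULES, execute, filter_red, and so on) is hypothetical.

```python
from typing import Any, Callable, Dict, List

Scene = List[Dict[str, str]]

# Toy scene description standing in for image features.
scene: Scene = [
    {"shape": "cube", "color": "red"},
    {"shape": "sphere", "color": "red"},
    {"shape": "cube", "color": "blue"},
]

# Each module maps the previous module's output (and the scene) to a new output.
MODULES: Dict[str, Callable[[Any, Scene], Any]] = {
    "scene": lambda state, sc: list(sc),
    "filter_red": lambda state, sc: [o for o in state if o["color"] == "red"],
    "filter_cube": lambda state, sc: [o for o in state if o["shape"] == "cube"],
    "count": lambda state, sc: len(state),
}


def execute(program: List[str], sc: Scene) -> Any:
    """Run the modules named in the program in order, chaining their outputs."""
    state = None
    for name in program:
        state = MODULES[name](state, sc)
    return state


# "How many red things are there?" written as an explicit program.
answer = execute(["scene", "filter_red", "count"], scene)
```

The explicit program is what makes the reasoning inspectable: changing the question changes the sequence of modules, not the modules themselves.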

Wednesday, May 3, 2017

Superintelligence and Singularity

This is another essay I wrote for a class at Maryland.


Understanding and emulating human intelligence has been the target of artificial intelligence researchers for a long time now. However, human-level artificial intelligence is not the final destination. Most researchers seem to think that in the next few decades we will start developing technologies that improve upon human intelligence. This could happen either by increasing human intelligence or by creating artificial intelligence which surpasses human intelligence. This will lead to a positive feedback loop where improving intelligence will lead to more technology which improves intelligence further. The mathematician I. J. Good called this process an “intelligence explosion” [5]. This means that even a small improvement in intelligence will lead to immense changes within a short period. This event is called a technological singularity (or the Singularity, here). The Singularity is an event where the runaway intelligence growth far surpasses any human comprehension or control. Ray Kurzweil defines the Singularity as a future period during which the pace of technological change will be so rapid, its impacts so deep, that human life will be irreversibly transformed [6].

An agent that possesses intelligence far surpassing that of the brightest and the most gifted humans [1] is called a superintelligence. Many philosophers and AI researchers believe that once we achieve superintelligence, the Singularity is not far behind [4; 7]. And they believe that we are not far from achieving superintelligence. This raises questions about what such a superintelligence might do. Some people believe that this raises major existential risks for humans [4]. Others think that this will be extremely useful for humans [6]. However, everyone agrees that the Singularity is an event which will change or end the way we live. Vernor Vinge says that the change will be comparable to the rise of human life on Earth [7]. Eliezer Yudkowsky believes that the next few decades could determine the future of intelligent life. He says that superintelligence is the single most important issue in the world right now [2]. I. J. Good wrote: “The first ultraintelligent machine is the last invention that man need ever make” [5].

In this essay, I will present some paths that might lead to superintelligence, and hence, the Singularity. I will also discuss the ways in which such an agent might affect human lives and some steps to be taken to avoid the “major existential risks”.


There are several ways through which superintelligence could be achieved. It is extremely difficult to predict exactly which one will ultimately lead to superintelligence. However, most researchers believe that some combination of the following is likely to be the reason [4; 7; 8; 6; 2].

Artificial superintelligence
In this scenario, humans will create an artificial intelligence matching human intelligence. But, since an AI operates at much higher speeds than humans, it will be able to rewrite its own source code and create higher intelligence within a very short time leading to an intelligence explosion.

Biomedical improvements
Humans will increase their intelligence by enhancing the functioning of their biological brains. This could be achieved, for example, through drugs, selective breeding, or manipulation of genes. Such cognitive enhancements will accelerate science and technology. This will enable humans to increase their intelligence further. Higher cognitive capabilities will also enable humans to understand their own brains better and thus build a superintelligent AI.

Brain-to-computer interfaces
We will be able to build technology that can directly interface with human brains. This means that we will achieve intelligence amplification through brain-machine interface. There will be no difference between man and machine. They will become a single entity.

Networks and organisations
Networks and organisations that link humans with one another will become sufficiently efficient to be considered a superhuman being. This is an example of a collective superintelligence. Such a network will be efficient in the sense that the barriers to communication are reduced or removed. All of humanity will become one superintelligent being.


Regardless of how science achieves superintelligence, its impact on intelligent life will be immense. This will be an event similar to the origin of human life on Earth [7]. What will a superintelligent being do? This is an important question. It is also unanswerable before a superintelligence actually emerges. Unlike the space of human intelligence, the space of all possible superintelligences is vast [2]. Yudkowsky says that the impact of the intelligence explosion depends on exactly what kind of minds go through the tipping point [2]. Vinge argues that what the superintelligence will do is absolutely unpredictable [7]. You have to be as intelligent as the superintelligence to understand its motivations and actions. On the other hand, Kurzweil believes that technological developments typically follow smooth exponential curves and thus we can predict the arrival of new technology and its impacts [6; 3]. (He makes several such predictions in his book, which I will discuss in a bit.)

Given all of this, there are two main camps of thought about the future: the pessimists and the optimists. The first camp believes that the development of a superintelligence poses a major existential crisis [4]. Bostrom argues that an intelligence explosion will not give us time to adapt. Once someone finds one of the several keys to creating a superintelligence, we will have anywhere from a few hours to a few weeks till it achieves complete world dominance. This is not enough to form strategies for dealing with such a dramatic change. He believes that the default outcome of this event is doom. The first such system will quickly achieve a decisive strategic advantage and become a singleton, eliminating all competing superintelligent systems. Even if programmed with a goal to serve humanity, such an agent might have a convergent instrumental reason to eliminate threats to itself. It might consider the same humans it is supposed to serve as hindrances in achieving its goals. The pessimist camp says that there are several malignant failure modes for a superintelligent system. The agent might find some way of satisfying its final goals which violates the intentions of the programmers who defined the goals. Or the agent might transform large parts of the universe into infrastructure needed to satisfy its goals. This will prevent humanity from realising its “full axiological potential” [4]. Bostrom also argues that controlling such an agent is almost impossible.

On the other hand, the optimists believe that development of a superintelligence will be beneficial for humanity. Ray Kurzweil says - “The Singularity will allow us to transcend the limitations of our biological bodies and brains. We will be able to live long (as long as we want)...fully understand human thinking and will vastly extend and expand its reach”. He believes that the Singularity will be achieved through brain-machine interface. He envisions a world that is still human but that transcends our biological roots. In his world, there will be no distinction between brain and machine or physical and virtual reality. Kurzweil says that the intelligence will still represent the human civilization. Others in the optimist camp believe that the superintelligent agents will be benevolent gods. Such agents can develop cures for currently incurable diseases, can crack the aging problem, and can find ways to eliminate all human suffering.

The impact of the Singularity is a very contentious issue. However, everyone agrees that it will be immense and the development of a superintelligence will be a world changing event. Such an event also raises moral and ethical issues. Should the superintelligent agent be given moral status? If so, how much? Should the agent be considered on par with humans? Or should it be given a higher moral status? These are important questions and have significant implications.

I believe that development of superintelligence represents the next level in evolution of intelligent beings. I think that if a truly superintelligent being is created, then it has every right to attain world dominance just like we do now. Such an agent might decide to eliminate humans or we might become that agent. But this should not stop us from trying to understand intelligence and build intelligent systems. However, we have to be absolutely sure that such an agent is superintelligent, i.e., is better than humans in all respects. Unless we are sure of that, we have to be extremely careful.

[4] Nick Bostrom. Superintelligence: Paths, dangers, strategies. OUP Oxford, 2014.
[5] Irving John Good. Speculations concerning the first ultraintelligent machine. Advances in computers, 6:31–88, 1966.
[6] Ray Kurzweil. The singularity is near: When humans transcend biology. Penguin, 2005.
[7] Vernor Vinge. The coming technological singularity: How to survive in the post-human era. In Proceedings of a Symposium Vision-21: Interdisciplinary Science & Engineering in the Era of CyberSpace, held at NASA Lewis Research Center (NASA Conference Publication CP-10129), 1993.
[8] Vernor Vinge. Signs of the singularity. IEEE Spectrum, 45(6), 2008.

Thursday, April 13, 2017

False Memories

This is an essay I wrote for a class at Maryland.


Each of us remembers an event or events which none of our friends and relatives remember. You might remember getting lost in a mall while on a family trip or witnessing an accident or, like I did recently, taking a group photograph at a friend’s wedding. However, your friends and family remember something completely different about the day of the event and they all agree on what happened. You think all your friends just have very poor memories and they must have forgotten the event. But, chances are, you are the one who doesn’t remember what happened. The event you so clearly remember might not have ever happened or might have happened very differently. What is happening here? Are you losing your mind or are your friends playing a prank?

False memory is a well-studied psychological phenomenon of a person recalling something which either did not occur or occurred differently. When you remember taking a photograph at your friend’s wedding and no such photograph exists, you have somehow created a false memory. In this essay, I will discuss some studies which show how easy it is to acquire false memories. Studying false memories can shed light on how the human brain stores and retrieves memories.

When false memories begin influencing the orientation of a person’s life, the condition is called false memory syndrome (FMS). Though it is not recognised as a psychiatric illness, FMS can affect the “identity and relationships” of a person [1]. In some cases, the whole identity of a person can change because of a false memory of a traumatic experience. Understanding this phenomenon will help us understand ideas about identity and consciousness. Neurological study of patients suffering from FMS can help unlock secrets of the memory creation and storage process.

The concepts of false memory and false memory syndrome are closely related to the phenomenon of confabulation, which is the process of creating false memories without the intention to deceive. There are profound legal issues related to confabulation and false memories. How do you find out if a person has an intention of deceiving? How much do you trust eye-witness testimony? Which of the thousands of claims of repressed memories of childhood sexual abuse do you believe? These are important questions for the judiciary and for psychologists trying to understand human behaviour. Another related effect is the source-monitoring error. This happens when you incorrectly attribute the source of a memory or piece of information. You might attribute a fact that you know to a book when you actually saw it in a video.

In the next few sections, I will explore some of these phenomena and present some experiments which might make you question every memory you have.


Scientists have discovered several ways of creating false memories in people. Photos, speech, or text have all been used to create these false memories. I will describe some very simple examples of memory distortion and false memory implantation. First, I discuss a very influential study by Loftus et al. which shows how language can create false memories.

Recalling incorrect information due to language of the question
In [7], Loftus and Palmer showed videos of cars hitting each other to a few subjects. They then asked the subjects to estimate the speed of the cars. They found that using different words to describe the accident led to different estimates of the speed. For example, the question “About how fast were the cars going when they smashed into each other?” [7] led the subjects to give higher speed estimates than the same question asked with the verbs bumped, collided, contacted, or hit. They observed similar trends for the question “Did you see any broken glass?” [7]. This showed that human memories are extremely susceptible to suggestion and can be influenced by changing just a single word.

Similar studies which changed an article (“Did you see a stop sign?” vs. “Did you see the stop sign?”) or an adjective (“How tall was the basketball player?” vs. “How short was the basketball player?”) in the question led to differing accounts of events [2]. This is because using a particular word instead of others causes subjects to create certain presuppositions which colour their judgment about the events in question. This raises questions about the reliability of the recalled memories. Dr. Loftus has written extensively about the unreliability of memories recalled through prolonged searches for them [6]. She says that the rise in cases of child abuse involving repressed memories is alarming. The possibility of these recalled memories actually being false memories should not be ignored. In some cases, the psychiatrists themselves might be responsible for creating these false memories in the subjects through techniques like age regression, hypnosis, guided visualisation, etc.

Another study involving the use of language for creating false memories dealt with remembering lists of words.

False memories through lists
The authors of [8] show that even college students who are “professional memorisers” can falsely remember words not present in a list which they were asked to remember. Subjects were given lists of words related to a concept (the nonpresented word), without the concept itself appearing in the list. For example, a subject might have been given the list bed, alarm, rise, dream, ... etc. All these words are usually associated with sleep but the word sleep is not explicitly mentioned in the list. The recall rate of the nonpresented word was very high in the subjects. This led the authors to conclude that all memory is constructive in nature. This was in contradiction to the theory of reproductive and reconstructive memories proposed by Bartlett and Burt [4], which was the prevalent belief at that time. The theory said that list learning paradigms come under rote reproduction, which causes few errors. On the other hand, rich material like stories encourages constructive processes which form associations and connections between different parts of the material. Retrieval of these memories leads to more errors. By showing incorrect recall of words in lists, the authors of [8] showed that the distinction between reproductive and reconstructive memories was ill-founded.

Obviously, language is not the only source of false memory creation. The next section describes a study which shows that visual information can also lead to false memories.

Photographs with news articles
Photographs accompanying a news article can help cement the content better [10]. In their experiments, the authors of [10] showed newspaper headlines to subjects. Some of these headlines were accompanied by photos which were tangentially related to the headline. Also, some of these headlines were false, that is, the events described in the headlines had never actually happened. After reading the headlines and seeing the photographs, where present, the subjects were asked whether they remembered the events described in the headlines. The authors found that photos mattered. For both true and false headlines, people remembered more of the events described by the headlines which were accompanied by photographs. In remembering the events described by the false newspaper headlines, people had created false memories of the events. And they created more false memories for the events which had photos associated with them. The authors claimed that this could be explained by Rubin’s basic systems approach to memory [9]. This theory says that memory is a result of multiple systems and subsystems - visual, auditory, language etc. - which interact and reinforce each other. Providing stimulus to multiple subsystems leads to reinforcement of each subsystem and that helps in creating stronger memories.

From all these ways of creating false memories, we can clearly say that the study of the phenomenon of false memory can help in answering several questions about the human brain and how it encodes, stores, and retrieves information.

However, false memories are not just a personal phenomenon. Whole societies and communities can create false memories among their members. Similar false memories can be shared by many people or the whole community.

Collective False Memory
Very recently in a lecture, someone mentioned that Jimmy Carter held a nuclear engineering degree. A lot of people in the audience agreed with this fact. However, on checking, I found out that he actually did not hold a nuclear engineering degree. This is an example of a collective false memory - a memory shared by multiple people which is incorrect. This phenomenon is also called the ‘Mandela’ effect due to several people around the world incorrectly remembering that Nelson Mandela died in the 1980s. Social reinforcement of false memories is held to be one of the leading causes of collective false memory. Suggestibility of people under similar circumstances can also lead to the creation of collective false memories.

The study of false memories can give important clues as to how human memory is encoded, stored, manipulated, and retrieved. Studying retroactive interference and the misinformation effect [3] can help us in understanding the encoding process for memories. Retroactive interference is the process by which information presented later interferes with the information already stored in the brain. This causes the earlier information or memories to be modified or completely erased. This effect can be clearly seen at play in several studies which create false memories (e.g., the case of false memory creation through language). Neurological studies conducted during false memory experiments can reveal the areas of the brain being affected by the incorrect information.

False memories are also related to imagination. In [5], the authors demonstrated “imagination inflation” - the phenomenon that simply imagining a childhood event increases the confidence of the subjects that the event actually happened. Studying this further might help us understand how we imagine, what the process of forming pictures in the “mind’s eye” is, and how imagination is related to memory.

Studying false memory, like any other peculiar human behaviour, can provide important information about the human brain and the mind.

[1] memory syndrome.
[2] memory.
[3] effect.
[4] Frederic Charles Bartlett and Cyril Burt. Remembering: A study in experimental and social psychology. British Journal of Educational Psychology, 3(2):187–192, 1933.
[5] Maryanne Garry, Charles G Manning, Elizabeth F Loftus, and Steven J Sherman. Imagination inflation: Imagining a childhood event inflates confidence that it occurred. Psychonomic Bulletin & Review, 3(2):208–214, 1996.
[6] Elizabeth Loftus. Memory distortion and false memory creation. Bulletin of the American Academy of Psychiatry and Law, 24(3):281–295, 1996.
[7] Elizabeth F Loftus and John C Palmer. Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of verbal learning and verbal behavior, 13(5):585–589, 1974.
[8] Henry L Roediger and Kathleen B McDermott. Creating false memories: Remembering words not presented in lists. Journal of experimental psychology: Learning, Memory, and Cognition, 21(4):803, 1995.
[9] David C Rubin. The basic-systems model of episodic memory. Perspectives on Psychological Science, 1(4):277–311, 2006.
[10] Deryn Strange, Maryanne Garry, Daniel M Bernstein, and D Stephen Lindsay. Photographs cause false memories for the news. Acta psychologica, 136(1):90–94, 2011.