Artificial Intelligence That Is Able to Understand Object Relationships
When humans look at a scene, they see objects and the relationships between them. Many deep learning models struggle to see the world this way because they don't understand the entangled relationships between individual objects. Without knowledge of these relationships, a robot designed to help someone in a kitchen would have difficulty following commands like "pick up the spatula that is to the left of the stove and place it on top of the cutting board."
In an effort to solve this problem, MIT researchers have developed a model that understands the underlying relationships between objects in a scene. Their model represents individual relationships one at a time, then combines these representations to describe the overall scene. This enables the model to generate more accurate images from text descriptions, even when the scene includes several objects that are arranged in different relationships with one another.
This work could be applied in situations where industrial robots must perform intricate, multistep manipulation tasks, like stacking items in a warehouse or assembling appliances. It also moves the field one step closer to enabling machines that can learn from and interact with their environments more as humans do.
“When I look at a table, I can’t say that there is an object at XYZ location. Our minds don’t work like that. In our minds, when we understand a scene, we really understand it based on the relationships between the objects. We think that by building a system that can understand the relationships between objects, we could use that system to more effectively manipulate and change our environments,” says Yilun Du, a PhD student in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and co-lead author of the paper.
Du wrote the paper with co-lead authors Shuang Li, a CSAIL PhD student, and Nan Liu, a graduate student at the University of Illinois at Urbana-Champaign; as well as Joshua B. Tenenbaum, a professor of computational cognitive science in the Department of Brain and Cognitive Sciences and a member of CSAIL; and senior author Antonio Torralba, the Delta Electronics Professor of Electrical Engineering and Computer Science and a member of CSAIL. The research will be presented at the Conference on Neural Information Processing Systems in December.
One relationship at a time
The framework the researchers developed can generate an image of a scene based on a text description of objects and their relationships, like “A wood table to the left of a blue stool. A red couch to the right of a blue stool.”
Their system would break these sentences down into two smaller pieces that describe each individual relationship (“a wood table to the left of a blue stool” and “a red couch to the right of a blue stool”), and then model each part separately. Those pieces are then combined through an optimization process that generates an image of the scene.
The researchers used a machine-learning technique called energy-based models to represent the individual object relationships in a scene description. This technique enables them to use one energy-based model to encode each relational description, and then compose them together in a way that infers all objects and relationships.
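The composition idea above can be illustrated with a toy sketch: each relation gets its own energy function, the scene's total energy is their sum, and the scene is refined by minimizing that sum. The coordinate-based energy functions, margin of 1.0, and finite-difference optimizer below are illustrative assumptions for intuition only; the paper's models operate on images, not object coordinates.

```python
# Toy sketch of composing energy-based models (EBMs).
# Assumption: a "scene" is a dict of object name -> [x, y] position,
# and each relation is scored by a hand-made quadratic energy.

def energy_left_of(scene, a, b):
    # Low (zero) energy when object a sits at least 1.0 left of object b.
    return max(0.0, scene[a][0] - scene[b][0] + 1.0) ** 2

def energy_right_of(scene, a, b):
    # Low (zero) energy when object a sits at least 1.0 right of object b.
    return max(0.0, scene[b][0] - scene[a][0] + 1.0) ** 2

def total_energy(scene, relations):
    # Compose per-relation EBMs by summing their energies.
    return sum(e(scene, a, b) for e, a, b in relations)

def optimize(scene, relations, steps=500, lr=0.05, eps=1e-4):
    # Minimize the composed energy by finite-difference gradient descent.
    for _ in range(steps):
        for name in scene:
            for d in range(2):
                scene[name][d] += eps
                e_plus = total_energy(scene, relations)
                scene[name][d] -= 2 * eps
                e_minus = total_energy(scene, relations)
                scene[name][d] += eps  # restore, then take a descent step
                scene[name][d] -= lr * (e_plus - e_minus) / (2 * eps)
    return scene

# "A wood table to the left of a blue stool. A red couch to the
# right of a blue stool." -> two relations, optimized jointly.
scene = optimize(
    {"table": [0.0, 0.0], "stool": [0.0, 0.0], "couch": [0.0, 0.0]},
    [(energy_left_of, "table", "stool"),
     (energy_right_of, "couch", "stool")],
)
```

After optimization the table ends up left of the stool and the couch right of it, even though each relation was modeled independently — the key point being that new relations can be added to the sum without retraining anything.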
Understanding complex scenes
The researchers compared their model to other deep learning methods that were given text descriptions and tasked with generating images that displayed the corresponding objects and their relationships. In each instance, their model outperformed the baselines.
They also asked humans to evaluate whether the generated images matched the original scene description. In the most complex examples, where descriptions contained three relationships, 91 percent of participants concluded that the new model performed better.
“One interesting thing we found is that for our model, we can increase our sentence from having one relation description to having two, or three, or even four descriptions, and our approach continues to be able to generate images that are correctly described by those descriptions, while other methods fail,” Du says.
The researchers also showed the model images of scenes it hadn’t seen before, as well as several different text descriptions of each image, and it was able to successfully identify the description that best matched the object relationships in the image.
And when the researchers gave the system two relational scene descriptions that described the same image but in different ways, the model was able to understand that the descriptions were equivalent.
The researchers were impressed by the robustness of their model, especially when working with descriptions it hadn’t encountered before.
Tweets We Found Interesting:
A New #MachineLearning Model Could Enable #Robots to Understand Interactions in The World in The Way We Do
#ArtificialIntelligence #AI #Robotic
scitechdaily.com/artificial-int…— George Obeid (@_Georgeobeid)
11:08 AM • Nov 29, 2021
MIT researchers are finding new ways to make Artificial Intelligence that better understands the world around it. This model can understand the relationships between objects placed together.
#AI #ML #futurism #IntelligenceFactory #digitaltransformation #DX
— Brett Gould (@BrettGouldDX)
10:00 PM • Nov 29, 2021
New machine learning model allows AI to understand complex relationships between objects.
— 1440 Daily Digest (@Join1440)
11:28 PM • Nov 30, 2021
Articles related to the topic:
Artificial intelligence that understands object relationships | MIT News | Massachusetts Institute of Technology — news.mit.edu
MIT researchers developed a machine learning model that understands the underlying relationships between objects in a scene and can generate accurate images of scenes from text descriptions.
Machine-learning model could enable robots to understand interactions in the way humans do — techxplore.com
When humans look at a scene, they see objects and the relationships between them. On top of your desk, there might be a laptop that is sitting to the left of a phone, which is in front of a computer monitor.
Artificial intelligence that understands the relationships between objects - How smart Technology changing lives — voonze.com
At the moment, Artificial Intelligence is not as intelligent as in the movies. It manages to learn a lot in a short time, reach conclusions from what it has seen, and create new things, but always using information that humans have given it before. We are still a…