The Long And Winding Road To Artificial General Intelligence

As a young graduate student in the mid-1990s, I became fascinated by the work of Yann LeCun.

Yann, one of the original innovators in Artificial Intelligence (AI) and one of the pioneers of Deep Learning, had already been thinking about AI for over a decade. His original paper on Convolutional Neural Networks introduced the idea that we could actually teach computers how to “learn” by using neural networks and gradient-based training techniques built on linear algebra. If we showed a neural network thousands of examples of an object, we could teach it to recognize similar shapes and similar objects in other images.
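To make that idea concrete, here is a minimal sketch of gradient-based learning, written in PyTorch purely for illustration (the tiny network, layer sizes, and random stand-in data are my own assumptions, not LeCun’s original architecture): show the network labeled examples, measure how wrong its predictions are, and nudge the weights in the direction that reduces the error.

```python
# Minimal sketch: teach a tiny convolutional network to classify images
# by gradient descent. Everything here (sizes, data, learning rate) is
# illustrative; real training would use thousands of labeled images.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # learn local shape detectors
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # score 10 possible object classes
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Stand-in batch: 32 grayscale 28x28 images and their "correct answer" labels.
images = torch.randn(32, 1, 28, 28)
labels = torch.randint(0, 10, (32,))

for step in range(100):                    # in practice, loop over many labeled batches
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)  # how wrong the network currently is
    loss.backward()                        # gradients via backpropagation
    optimizer.step()                       # nudge the weights to reduce the error
```

After enough of these small corrections over enough labeled examples, the network begins to recognize similar shapes in images it has never seen.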

The idea that we could build a computer model for “intelligence” and teach it to learn was fascinating to me. I spent the next few years building such models, training networks to predict hurricanes and weather patterns, and to distinguish benign from malignant cancerous masses.

Later, after leaving academia, I continued to build Deep Learning networks: to analyze financial price data and make trade decisions, to identify patterns in cybersecurity and, in my current startup, to distinguish weeds from crops in farms, spot invasive species in the Florida Everglades, and detect diseases and nutritional issues in plants before we can see them with the naked eye.

The models have been successful; some of them have even been incorporated into widely adopted industry products. Yet I would not necessarily say that they are intelligent. Their predictions are close but never one hundred percent correct, and most models exist to provide a second opinion alongside decisions made by actual people.

So it is not surprising to see Yann’s latest revelation: after forty years working in this space, he no longer believes that we are on the right path to achieving general intelligence in AI. We will never get there following the path we are on, which relies on supervised learning and reinforcement learning. Supervised learning is what most people use to create machine learning applications; to train a network, we need thousands, or even tens of thousands, of labeled samples. Reinforcement learning trains a network by trial and error: we have to show the network its errors over and over for it to learn what it should and shouldn’t do. This is not the way that humans or animals learn. The networks we create with these techniques are specialized and brittle, good only for specific tasks. And they do, at times, make absurd mistakes.
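To illustrate the trial-and-error loop Yann is objecting to, here is a minimal tabular Q-learning sketch (the five-state toy environment, rewards, and parameters are all made up for illustration): the agent only improves by acting, observing the gap between what it expected and what actually happened, and correcting its estimates over hundreds of repeated trials.

```python
# Minimal sketch of reinforcement learning by trial and error (tabular
# Q-learning on a made-up 5-state chain where only the last state pays off).
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]  # estimated value of each action

def env_step(state, action):
    # Toy environment: action 1 moves right, action 0 moves left;
    # the agent is rewarded only for reaching the rightmost state.
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

alpha, gamma, epsilon = 0.1, 0.9, 0.2
for episode in range(500):                  # many repeated trials before it behaves well
    state = 0
    for _ in range(20):
        if random.random() < epsilon:       # sometimes explore at random
            action = random.randrange(n_actions)
        else:                               # otherwise exploit the current best guess
            action = Q[state].index(max(Q[state]))
        next_state, reward = env_step(state, action)
        # Learn from the error between the prediction and what actually happened.
        error = reward + gamma * max(Q[next_state]) - Q[state][action]
        Q[state][action] += alpha * error
        state = next_state
```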

In our continuing search for general intelligence in AI, we tend to throw very large amounts of data at the problem. We build and train bigger and bigger models on bigger and bigger datasets, thinking that if we include enough data we can represent every possible scenario. Yet no matter how much data we use, how large the networks are, or how many trials we run, we have never been able to scale these models beyond relatively narrow tasks. We will not arrive at human-level intelligence simply by training larger models on more data.

Animals and humans, on the other hand, can learn new tasks very quickly and with little data. They rely on background knowledge of how the world works, accumulated by watching the world from the time they are born, not only by trial and error. We usually call this knowledge common sense. It lets animals and humans predict the consequences of actions and reason about them, which in turn allows them to plan future steps.

After all this time we still haven’t figured out how to give machines common sense. Animals and humans learn by observing their environment, but we have not been able to teach machines to do the same: to understand how the world works, to reason, and to learn. This may not be a problem if our objective is to keep using AI to automate tasks that don’t require common sense, or to help us make crucial decisions while an actual person oversees the outcome. But it will be an issue if our goal is anything beyond that, such as caring for a baby or driving autonomously through a very crowded street.

According to Yann, we don’t need gigantic brains to have common sense, and we don’t need to keep throwing in more data or making the networks bigger to figure this out. The real questions are what the right learning paradigm is for achieving human-level general intelligence, and what the right network architecture is, rather than whether to use supervised or reinforcement learning. Yann believes we need to pivot to a different Deep Learning architecture.

But I tend to disagree; I believe the pivot needs to be much bigger than that. For true machine-level intelligence, the pivot should include not only the architecture we use to build our networks, but also the programming language, the hardware, the chips, and the substrate our technology runs on.

Life runs on carbon, not silicon. It runs on molecular programming over a conductive element (carbon), not on binary programming over silicon semiconductors. If we truly want to understand general intelligence, we need to begin with the basics: how do the lessons learned by carbon-based life become inherent knowledge? How do carbon-based protocols really work? And what is the essence of intelligence in carbon-based life?

How difficult would it be to leverage carbon-based technology to reach this goal? 

It sounds overwhelming until we realize that the integrated circuit was only invented in the late 1950s, and the silicon transistor just a few years before that. It took only a few decades to go from the idea of the transistor to having supercomputing capabilities in our pockets, phones capable of recognizing facial features, tracking our location, showing movies, and much more. With increasing demand for chips in evolving areas such as medical science, vaccine development, genome research, and quantum computing, the limits of silicon are becoming more apparent. We need chips that are smaller, faster, and more efficient, but shrinking silicon is reaching its limits. The components in these chips are approaching atomic scale; soon we will be unable to shrink them further. The energy needed to power these chips grows unmanageable as more components are added, and keeping them cool has proven to be a challenge.

Carbon, on the other hand, in forms such as graphene and carbon nanotubes, conducts electricity well and dissipates heat efficiently. Thousands of patents for carbon chip technology have been granted just in the last few years, so we are collectively already thinking about this transition. And with the current focus on rebuilding our chip manufacturing capabilities, allocating a good portion of that investment to carbon chips may be a good idea. By 2030, about 25% of the world’s energy, most of which is produced by burning fossil fuels, could be consumed by electronic devices if nothing is done to make them more energy efficient. The consequences of not pivoting, or of delaying the pivot, could be severe.

Taking a step back and looking at the situation more broadly, with a change in perspective, will help. Rather than fixating on the problems we currently face in this space, such as scaling limits, heat dissipation, escalating energy consumption, and our inability to replicate common sense in our models, we can return to the original intent of the technology: to imitate life’s processes, automation, and intelligence. We already have a working model of the end state, built on carbon: life itself.

The demand for change is there; we need to focus on the supply side. It is a true innovator’s dilemma opportunity, and a space that I will be following closely.
