Wednesday, June 1, 2022

Ideas for Machine Learning

A lot of what is called "AI" is less "Artificial Intelligence" and more "Machine Learning". The differences between the two are technical and rather boring, so few people talk about them. From a marketing perspective, "Artificial Intelligence" sounds better, so more people use that term.

But whichever term you use, I think we can agree that the field of "computers learning to think" has yielded dismal results. Computers are fast and literal, good at numeric computation and even good at running video games.

It seems to me that our approach to Machine Learning is not the correct one. We've been at it for decades, and our best systems suffer from fragility, providing wildly different answers for similar inputs.

That approach (from what I can tell) is to build a Machine Learning system, train it on a large set of inputs, and then have it match other inputs to the training set. The approach tries to match similar aspects to similar aspects.

I have two ideas for Machine Learning, although I suspect that they will be rejected by the community.

The first idea is to change the basic mechanism of Machine Learning. Instead of matching similar inputs, design systems which minimize errors. That is, balance the identification of objects with the identification of errors.

This is a more complex approach, as it requires some basic knowledge of an object (such as a duck or a STOP sign) and then it requires analyzing aspects and classifying them as "matching", "close match", "loose match", or "not a match". I can already hear the howls of practitioners, switching their mechanisms to something more complex.

But as loud as those complaints may be, they will be a gentle whisper compared to the reaction of my second idea: Switch from 2-D photographs to stereoscopic photographs.

Stereoscopic photographs are pairs of photographs of an object, taken by two cameras some distance apart. By themselves they are simple photographs. Together, they allow for the calculation of depth of objects. (Anyone who has used an old "Viewmaster" to look at a disk of transparencies has seen the effect.)

A stereoscopic photograph should allow for better identification of objects, because one can tell that items in the photograph are in the same plane or different planes. Items in different planes are probably different objects. Items in the same plane may be the same object, or may be two objects in close proximity. It's not perfect, but it is information.

The objections are, of course, that the entire corpus of inputs must be rebuilt. All of the 2-D photographs used to train ML systems are now invalid. Worse, a new collection of stereoscopic photographs must be taken (not an easy task), stored, classified, and vetted before they can be used.

I recognize the objections to my ideas. I understand that they entail a lot of work.

But I have to ask: is the current method getting us what we want? Because if it isn't, then we need to do something else.

No comments: