Showing posts with label JVM. Show all posts

Wednesday, October 7, 2020

Platforms are not neutral

We like to think that our platforms (processor, operating system, and operating environment) are neutral, at least when it comes to programming languages. After all, why should a processor care if the code that it executes (machine code) was generated by a compiler for C#, or Java, or Fortran, or Cobol? Ditto for the operating system. Does Windows care if the code came from a C++ program? Does Linux care if the code came from Go?

And if processors and operating systems don't care about code, why should a platform such as .NET or the JVM care about code?

One could argue that the JVM was designed for Java, and that it has the data types and operations that are needed for Java programs and not other languages. That argument is correct: the JVM was built for Java. Yet people have built compilers that convert other languages to the JVM bytecode. That list includes Clojure, Lisp, Kotlin, Groovy, JRuby, and Jython. All of those languages run on the JVM. All of those languages use the same data types as Java.

The argument for .NET is somewhat different. The .NET platform was designed for multiple languages. When announced, Microsoft supplied not only the C# compiler but also compilers for C++, VB, and Visual J++. Other companies have added compilers for many other languages.

But those experiences do not mean that the platforms are unbiased.

The .NET platform, and specifically the Common Language Runtime (CLR) was about interoperability. The goal was to allow programs written in different languages to work together. For example, to call a function in Visual Basic from a function in C++.

To achieve this interoperability, the CLR requires languages to use a common set of data types. These common types include 32-bit integers, 64-bit floating-point numbers, and strings. Prior to .NET, the different language compilers from Microsoft all had different ideas about numeric types and string types. C and C++ use NUL-terminated strings, but Visual Basic used length-prefixed (counted) strings. (Thus, a string in Visual Basic could contain NUL characters, but a string in C or C++ could not.) There were differences in floating-point representations as well.
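A small sketch in Java (whose strings, like Visual Basic's, carry a length count rather than a terminator) shows the practical difference between the two layouts:

```java
public class StringLayouts {
    public static void main(String[] args) {
        // A length-counted string can hold an embedded NUL character,
        // because its length is stored separately from its contents.
        String s = "AB\0CD";
        System.out.println(s.length());       // 5 -- the NUL counts as a character
        System.out.println(s.indexOf('\0'));  // 2 -- it sits in the middle

        // A C-style NUL-terminated string would appear to end at the NUL;
        // this is what C's strlen() would report for the same bytes.
        int cStyleLength = s.indexOf('\0');
        System.out.println(cStyleLength);     // 2
    }
}
```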

Notice that these common types align with the C data types. The CLR's agreement on data types works well for languages whose types already match C's. The .NET version of Visual Basic (VB.NET) had to change its data types to comply with the rules of the CLR, which made VB.NET quite a bit different from the previous Visual Basic.

The CLR works for languages that use C-style data types. The CLR supports custom data types, which is nice, and necessary for languages that do not use C-style data types, but then one loses interoperability, and interoperability was the major benefit of .NET.

The .NET platform favors C-style data types: fixed-size binary integers, binary floating-point values, and simple character strings.

The JVM also favors C-style data types.

Many languages use C-style data types.

What languages don't use C-style data types?

Cobol, for one. Cobol was developed prior to C, and it has its own ideas about data. It allows numeric values with PICTURE clauses, which can define limits and also formatting. Some examples:

   05  AMOUNT1          PIC 999.99.
   05  AMOUNT2          PIC 99999.99.
   05  AMOUNT3          PIC 999999.99.
   05  AMOUNT4          PIC 99.99.

(The '05' at the front of each line is not a part of the variable, but indicates how the variable is part of a larger structure.)

These four different values are numeric, but they do not align well with any of the CLR types. Thus, they cannot be exchanged with other programs in .NET.
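As a sketch of the mismatch, here is one way a PIC 999.99 field might be emulated on the JVM using BigDecimal (on the CLR, System.Decimal could play the same role). The method name and the overflow rule are my own invention for illustration, not what any actual Cobol compiler does:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class Picture {
    // A hypothetical mapping: a PIC 999.99 field becomes a decimal value
    // with a fixed scale of 2 and at most 3 digits before the point.
    static BigDecimal toPic999v99(BigDecimal value) {
        BigDecimal scaled = value.setScale(2, RoundingMode.HALF_UP);
        if (scaled.precision() - scaled.scale() > 3) {
            throw new ArithmeticException("overflow for PIC 999.99");
        }
        return scaled;
    }

    public static void main(String[] args) {
        System.out.println(toPic999v99(new BigDecimal("123.456"))); // 123.46
    }
}
```

Even with such a mapping, the value is no longer a plain CLR or JVM numeric type, so exchanging it with modules written in other languages requires conversion at every boundary.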

There are compilers for Cobol that emit .NET modules. I don't know how they work, but I suspect that they either use custom types (which are not easily exchanged with modules from other languages) or they convert the Cobol-style data to a C-style value (which would incur a performance penalty).

Pascal has a similar problem with data types. Strings in Pascal are length-count strings, not NUL-terminated strings. Pascal has "sets" which can contain a set of values. The notion of a set translates poorly to other languages. C, C++, Java, and C# can use enums to do some of the work, but sets in Pascal are not quite enums.
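Java's EnumSet comes close to a Pascal set over an enumeration, though Pascal also allows sets over subranges of integers and characters, which EnumSet cannot express. A rough sketch:

```java
import java.util.EnumSet;

public class PascalSet {
    enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }

    public static void main(String[] args) {
        // Pascal: workdays := [Mon..Fri];  weekend := [Sat, Sun];
        EnumSet<Day> workdays = EnumSet.range(Day.MON, Day.FRI);
        EnumSet<Day> weekend  = EnumSet.complementOf(workdays);

        System.out.println(workdays.contains(Day.WED)); // true
        System.out.println(weekend);                    // [SAT, SUN]
    }
}
```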

Pascal also has definite ideas about memory management and pointers, and those ideas do not quite align with memory management in C (or .NET). With care, one can make it work, but Pascal is not a native .NET language any more than Cobol.

Fortran is another language that predates the .NET platform, and doesn't work well on it. Fortran is a simpler language than Cobol or Pascal, and concentrates on numeric values. The numeric types can convert to the CLR numeric types, so compiling and exchanging data is possible.

Fortran's strength was speed. It was (and still is) one of the fastest languages for numeric processing. Much of that speed comes from its static memory layout, something I have not seen preserved by compilers that translate Fortran to .NET modules. Thus, Fortran on .NET loses its advantage: it is not fast, and I fear it never will be.

Processors, too, are biased. Intel processors handle binary numeric values for integers and floating-point values, but not BCD values. IBM S/360 processors (and their descendants) can handle BCD data. (BCD data is useful for financial transactions because it avoids many issues with floating-point representations.)
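The floating-point issue is easy to demonstrate on the JVM, where BigDecimal emulates in software the exact decimal arithmetic that BCD hardware provides natively:

```java
import java.math.BigDecimal;

public class DecimalVsBinary {
    public static void main(String[] args) {
        // Binary floating point cannot represent 0.1 or 0.2 exactly,
        // so the sum drifts -- a real problem for financial totals.
        double binary = 0.1 + 0.2;
        System.out.println(binary == 0.3);  // false

        // Decimal arithmetic is exact for these values.
        BigDecimal decimal = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        System.out.println(decimal);        // 0.3
    }
}
```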

Our platforms are biased. We often don't see that bias, most likely because we use only a single platform. (A single processor type, a single operating system, a single programming language.) The JVM and .NET platforms are biased towards C-style data types.

There are different approaches to data and to computation, and we're limiting ourselves by limiting our expertise. I suspect that in the future, developers will rediscover the utility of data types that are not C-style types, especially the PICTURE-specified numeric types of Cobol. As C and its descendants are ill-equipped to handle such data, we will see new languages with the new (old) data types.


Saturday, January 28, 2017

A multitude of virtual machines

In IT, the term "virtual machine" has multiple meanings. We use the term to identify pretend servers with pretend disk space and pretend devices, all hosted on a real (or "physical") server. Cloud computing and even plain old (non-cloud) data centers have multiple instances of virtual machines.

We also use the term to identify the pretend processors used by various programming languages. Java has its JVM, C# has the .NET processor, and other languages have their own imaginary processors. It is an old concept, made popular by Java in the mid-1990s but going back to the 1980s with the UCSD p-System and even into the 1960s.

The two types of virtual machines are complementary. The former duplicates the hardware (usually for a PC) and provides virtual instances of everything in a computer: disk, graphics card, network card, USB and serial ports, and even a floppy disk (if you want). The one thing it doesn't virtualize is the processor; the hypervisor (the program controlling the virtual machines) relies on the physical processor.

The latter is a fictitious processor (with a fictitious instruction set) that is emulated by software on the physical processor. It has no associated hardware, and the term "virtual processor" might have been a better choice. (I have no hope of changing the name now, but I will use the term for this essay.)
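A toy example makes the idea concrete: a fictitious stack processor, with an instruction set I invented for this sketch, emulated entirely in software.

```java
public class TinyVM {
    // A fictitious stack processor: each opcode is one int;
    // PUSH takes its operand from the next slot in the code array.
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0;                       // stack pointer
        for (int pc = 0; ; ) {            // program counter
            switch (code[pc++]) {
                case PUSH: stack[sp++] = code[pc++]; break;
                case ADD:  { int b = stack[--sp], a = stack[--sp];
                             stack[sp++] = a + b; break; }
                case MUL:  { int b = stack[--sp], a = stack[--sp];
                             stack[sp++] = a * b; break; }
                case HALT: return stack[sp - 1];
            }
        }
    }

    public static void main(String[] args) {
        // Byte-code for (2 + 3) * 4
        int[] program = { PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT };
        System.out.println(run(program)); // 20
    }
}
```

Real virtual processors like the JVM and the CLR are vastly more elaborate, but the essential arrangement -- a software loop decoding a made-up instruction set -- is the same.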

It is the virtual processor that interests me. Or rather, it is the number of virtual processors that exist today.

We are blessed (or cursed) with a large number of virtual processors. Oracle's Java uses one called "JVM". Microsoft uses one called "CLR" (for "common language runtime"). Perl uses a virtual processor (two, actually; one for Perl 5 and a different one for Perl 6). Python uses a virtual processor. Ruby, Erlang, PHP, and Javascript all use virtual processors.

We are awash in virtual processors. It seems that each language has its own, but that's not true. The languages Groovy, Scala, Clojure, Kotlin, JRuby, and Jython all run on the JVM. Microsoft's CLR runs C#, F#, VB.NET, IronPython, and IronRuby. Even BEAM, the virtual processor for Erlang, supports Elixir, LFE, Joxa, Efene, eml, and others.

I will point out that not every language uses a virtual processor. C, C++, Go, and Swift all produce executable code. Their code runs on the real processor. While more efficient, an executable is bound to the processor instruction set, and you must recompile to run on a different processor.

But back to virtual processors. We have a large number of virtual processors. And I have to think: "We've been here before".

The PC world long ago settled on the Intel x86 architecture. Before it did, we had a number of processors, from Intel (the 8080, 8085, 8086, and 8088), Zilog (the Z-80), Motorola (the 6800, 6808, and 6809), and MOS (the 6502).

The mainframe world saw many processors, before the rise of the IBM System/360 processor. Its derivatives are now the standard for mainframes.

Will we converge on a single architecture for virtual processors? I see no reason for such convergence in the commercial languages. Oracle and Microsoft have nothing to gain by adopting the other's technology. Indeed, one using the other would make them beholden to the competition for improvements and corrections.

The open source community is different, and may see convergence. An independent project, providing support for open source languages, may be possible. It may also make sense, allowing the language maintainers to focus on their language-specific features and remove the burden of maintaining the virtual processor. An important factor in such a common virtual processor is the interaction between the language and the virtual processor.

Open source has separated and consolidated other items. Sometimes we settle on a single solution, sometimes several. The kernel settled on Linux. The windowing system settled on X, with a few desktop environments (such as KDE and GNOME) above it. File systems and compiler back ends have likewise consolidated to a handful of choices.

Why not the virtual processor?

Tuesday, December 9, 2014

Open source .NET is less special and more welcoming

The Microsoft "toolchain" (the CLR, the .NET framework libraries, and the C# compiler) was special. It was Microsoft's property, guarded jealously and subject to Microsoft's whims. It was also the premiere platform and set of tools for development in Windows and for Windows. If you were serious about application development (for Windows), you used the Microsoft tools.

There were other toolchains. The Java set includes the JVM and the Java compiler. The major scripting languages (Perl, Python, Ruby, and PHP) each have their own runtime engines and class libraries. None were considered special in the way that the Microsoft toolchain was special. (The other toolchains were -- and still are -- considered good, and some people considered them superior to the Microsoft toolchain, but even most non-Microsoft fans would admit that the Microsoft toolchain was of high quality.)

Microsoft's announcement to open the .NET framework and the C# compiler changes that status. Microsoft wants to expand .NET to the Linux and MacOS platforms. They want to expand their community of developers. All reasonable goals; Microsoft clearly sees opportunities beyond the Windows platform.

What interests me is my reaction to the announcement. For me, opening the .NET framework and moving it to other platforms reduces the "specialness" of .NET. The Microsoft toolchain becomes just another toolchain. It is no longer the acknowledged leader for development on Windows.

The demotion of the Microsoft toolchain is accompanied by a promotion of the Java toolchain. Before, the Microsoft toolchain was the "proper" way to develop applications for Windows. Now, it is merely one way. Before, the Java toolchain was the "rebel" way to develop applications for Windows. Now, it is on par with the Microsoft toolchain.

I feel more comfortable developing a Java application to run on Windows. I also feel comfortable developing an application in .NET to run on Windows or Linux. (Yes, I know that the Linux version of .NET is not quite ready. But I'm comfortable with the idea.)

I think other folks will be comfortable with the idea. Comfortable enough to start experimenting with the .NET framework as people have experimented with the Java toolchain. Folks have created new languages to run under the JVM. (Clojure, Scala, and Groovy are popular ones, and there are lots of obscure ones.) I suspect that people avoided experimenting with the Microsoft toolchain because they feared changes or repercussions from Microsoft. Perhaps we will see experiments with the CLR and the .NET framework. (Perhaps new versions of the IronPython and IronRuby projects, too.)

By opening their toolchain, Microsoft has made it more accessible, technically and psychologically. They have reduced the barriers to innovation. I'm looking forward to the results.

Sunday, November 7, 2010

Where Apple is falling behind

Apple is popular right now. It has a well-received line of products, from MacBooks to iPhones to iPads. It has easy-to-use software, from OS X to iTunes and GarageBand. It has beaten Microsoft and Google in the markets that it chooses.

Yet in one aspect, Apple is falling behind.

Apple is using real processor technology, not the virtual processors that Microsoft and Google use. By "virtual processors", I mean the virtual layers that separate the application code from the physical processor. Microsoft has .NET with its virtual processor, its IL code, and its CLR. Google uses Java and Python, and those languages also have the virtual processing layer.

Most popular languages today have a virtual processing layer. Java uses its Java Virtual Machine (JVM). Perl, Python, and Ruby use their virtual processors.

But Apple uses Objective-C, which compiles to the physical processor. In this, Apple is alone.

Compiling to the physical processor has the advantage of performance. The virtual processors of the JVM and .NET (and Perl, and Python...) impose a performance cost. A lot of work has been done to reduce that cost, but the cost is still there. Microsoft's use of .NET for its Windows Mobile offerings means higher demands for processor power and a higher draw from the battery. An equivalent Apple product can run with less power.

Compiling to a virtual processor also has advantages. A virtual environment can be opened to debuggers and performance monitors far more easily than a physical processor can. Writing a debugger or a performance monitor for a virtual processor environment is therefore easier and less costly.

The languages which use virtual processors all have the ability for class introspection (or reflection, as some put it). I don't know enough about Objective-C to know if this is possible, but I do know that C and C++ don't have reflection. Reflection makes it easier to create unit tests and perform some type checking on code, which reduces the long-term cost of the application.
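Java's reflection API illustrates the kind of introspection these languages offer: a program can ask a class about itself at run time, which is exactly how unit-test frameworks discover test methods.

```java
import java.lang.reflect.Method;

public class ReflectDemo {
    public static void main(String[] args) throws Exception {
        // Look up a class by name and ask it about itself at run time.
        Class<?> c = Class.forName("java.lang.String");
        System.out.println(c.getName());  // java.lang.String

        // A unit-test framework uses this kind of query to find
        // methods by name or annotation, with no source code needed.
        for (Method m : c.getMethods()) {
            if (m.getName().equals("isEmpty")) {
                System.out.println(m);
            }
        }
    }
}
```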

The other benefit of virtual processors is freedom from the physical processor, or rather from the processor line. Programs written to the virtual processor can run anywhere the virtual processor layer is present. This is how Java can run anywhere: the byte-code is the same; only the virtual processor changes from physical processor to physical processor.

The advantages of performance are no longer enough to justify a physical processor. Virtual processors have advantages that help developers and reduce the development costs.

Is it possible that Apple is working on a virtual processor of its own? The technology is well within their abilities.

I suspect that Apple would build their own, rather than use any of the existing virtual processors. The two biggies, .NET and Java, are owned by companies not friendly to Apple. The engines for Perl, Python, and Ruby are nice but perhaps not suitable to the Apple set of applications. An existing engine is not in Apple's best interests. They need their own.

Apple doesn't need the virtual processor engine immediately, but they will need it soon -- perhaps within two years. But there is more to consider.

Apple has pushed the Objective-C, C, and C++ languages for its development platform. For the iPhone, iPod, and iPad, it has all but banished other languages and technologies. But C, C++, and Objective-C are poorly suited for virtual processors. Apple will need a new language.

Given Apple's desire for reliable (that is, crash-free) applications, the functional languages may appeal to them. Look for a move from Objective-C to either Haskell or Erlang, or possibly a new language developed at Apple.

It will be a big shift for Apple, and their developers, but in the long run beneficial.