Fitzpatrick's Fabulous Future

Wednesday, April 10, 2019

Program language and program size

Can programs be "too big"? Does it depend on the language?

In the 1990s, the two popular programming languages from Microsoft were Visual Basic and Visual C++. (Microsoft also offered Fortran and an assembler, and I think COBOL, but they were used rarely.)

I used both Visual Basic and Visual C++. With Visual Basic it was easy to create a Windows application, but the applications in Visual Basic were limited. You could not, for example, launch a modal dialog from within a modal dialog. Visual C++ was much more capable; you had the entire Windows API available to you. But the construction of Visual C++ applications took more time and effort. A simple Visual Basic application could be "up and running" in a minute. The simplest Visual C++ application took at least twenty minutes. Applications with dialogs took quite a bit of time in Visual C++.

Visual Basic was better for small applications. They could be written quickly, and changed quickly. Visual C++ was better for large applications. Larger applications required more design and coding (and more testing) but could handle more complex tasks. Also, the performance benefits of C++ were realized only in large applications.

(I will note that Microsoft has improved the experience since those early days of Windows programming. The .NET framework has made a large difference. Microsoft has also improved the dialog editors and other tools in what is now called Visual Studio.)

That early Windows experience got me thinking: are some languages better at small programs, and other languages better at large programs? Small programs written in languages that require a lot of code (verbose languages) have a disadvantage because of the extra work. Visual C++ was a verbose language; Visual Basic was not -- or was less verbose. Other languages weigh in at different points on the scale of verbosity.

Consider a "word count" program. (That is, a program to count the words in a file.) Different languages require different amounts of code. At the small-program end of the scale we have languages such as AWK and Perl. At the large end of the scale we have COBOL.

(I am considering lines of code here, and not executable size or the size of libraries. I don't count run-time environments or byte-code engines.)

I would much rather write (and maintain) the word-count program in AWK or Perl (or Ruby or Python). Not because these languages are modern, but because the program itself is small. (Trivial, actually.) The program in COBOL is large; COBOL has some string-handling functions (but not many) and it requires a fair amount of overhead to define the program. A COBOL program is long, by design. The COBOL language is a verbose language.
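For comparison, here is a minimal word-count sketch in C++, a language that sits somewhere between AWK and COBOL on the verbosity scale. It assumes a "word" is any run of non-whitespace characters, roughly the rule that `wc -w` uses. The function names are illustrative, not from any library.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Count whitespace-separated words read from any input stream.
std::size_t count_words(std::istream& in) {
    std::size_t count = 0;
    std::string word;
    // operator>> skips leading whitespace and then reads one word.
    while (in >> word) {
        ++count;
    }
    return count;
}

// Convenience overload for text already in memory.
std::size_t count_words_in_string(const std::string& text) {
    std::istringstream in(text);
    return count_words(in);
}

// Count the words in a file; returns 0 if the file cannot be opened.
std::size_t count_words_in_file(const std::string& path) {
    std::ifstream file(path);
    if (!file) {
        return 0;
    }
    return count_words(file);
}
```

Even this compact version needs headers, types, and a loop. In AWK the whole job is roughly `{ n += NF } END { print n }`, which is rather the point.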

Thus, there is an incentive to build small programs in certain languages. (I should probably say that there is an incentive to build certain programs in certain languages.)

But that is on the small end of the scale of programs. What about the other end? Is there an incentive to build large programs in certain languages?

I believe that the answer is yes. Just as some languages are good for small programs, other languages are good for large programs. The languages that are good for large programs have structures and constructs which help us humans manage and understand the code at large scale.

Over the years, we have developed several techniques we use to manage source code. They include:

Multiple source files (#include files, copybooks, separate compiled files in a project, etc.)
A library of subroutines and functions (the "standard library")
A repository of libraries (CPAN, CRAN, gems, etc.)
The ability to define subroutines
The ability to define functions
Object-oriented programming (the ability to define types)
The ability to define interfaces
Mix-in fragments of classes
Lambdas and closures

These techniques help us by partitioning the code. We can "lump" and "split" the code into different subroutines, functions, modules, classes, and contexts. We can define rules to limit the information that is allowed to flow between the multiple "lumps" of a system. Limiting the flow of information simplifies the task of programming (or debugging, or documenting) a system.
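As a small illustration of limiting the flow of information between "lumps", here is a sketch of a class that exposes a narrow public interface and keeps its data private. Other parts of a program can only add words and ask for counts; they cannot reach into the storage. The class name `WordTally` and its methods are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <string>

// A hypothetical "lump" of a larger system. Callers see only add(), count(),
// and distinct(); the std::map is private and could be replaced without
// changing any caller.
class WordTally {
public:
    // Record one occurrence of a word.
    void add(const std::string& word) { ++counts_[word]; }

    // How many times a word was recorded (0 if never seen).
    std::size_t count(const std::string& word) const {
        auto it = counts_.find(word);
        return it == counts_.end() ? 0 : it->second;
    }

    // Number of distinct words recorded so far.
    std::size_t distinct() const { return counts_.size(); }

private:
    std::map<std::string, std::size_t> counts_;  // hidden from the rest of the program
};
```

The design choice is the point: because the rest of the program can touch only three member functions, a reader (or debugger) of the larger system can ignore how the tally is stored.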

Is there a point when a program is simply "too big" for a language?

I think there are two concepts lurking in that question. The first is a relative answer, and the second is an absolute answer.

Let's start with a hypothetical example. A thought experiment, if you will.

Let's imagine a program. It can be any program, but it is small and simple. (Perhaps it is "Hello, world!") Let's pick a language for our program. As the program is small, let's pick a language that is good for small programs. (It could be Visual Basic or AWK.)

Let's continue our experiment by increasing the size of our program. As this was a hypothetical program, we can easily expand it. (We don't have to write the actual code -- we simply expand the code in our mind.)

Now, keeping our program in mind, and remembering our initial choice of a programming language, let us consider other languages. Is there a point when we would like to switch from our chosen programming language to another language?

The relative answer applies to a language when compared to a different language. In my earlier example, I compared Visual Basic with Visual C++. Visual Basic was better for small programs, Visual C++ for large programs.

The exact point of change is not clear. It wasn't clear in the early days of Windows programming, either. But there must be a crossover point, where the situation changes from "better in Visual Basic" to "better in Visual C++".

The two languages don't have to be Visual Basic and Visual C++. They could be any pair. One could compare COBOL and assembler, or Java and Perl, or Go and Ruby. Each pair has its own crossover point, but the crossover point is there. Each pair of languages has a point at which it is better to select the more verbose language, because of its capabilities at managing large code.

That's the relative case, which considers two languages and picks the better of the two. Then there is the absolute case, which considers only one language.

For the absolute case, the question is not "Which is the better language for a given program?", but "Should we write a program in a given language?". That is, there may be some programs which are too large, too complex, too difficult to write in a specific programming language.

Well-informed readers will be aware that a program written in a language that is "Turing complete" can be translated into any other programming language that is also "Turing complete". That is not the point. The question is not "Can this program be written in a given language?" but "Should this program be written in a given language?".

That is a much subtler question, and much more subjective. I may consider a program "too big" for language X while another might consider it within bounds. I don't have metrics for such a decision -- and even if I did, one could argue that my cutoff point (a complexity value of 2000, say) is arbitrary and the better value is somewhat higher (perhaps 2750). One might argue that a more talented team can handle programs that are larger and more complex.

Someday we may have agreed-upon metrics, and someday we may have agreed-upon cutoff values. Someday we may be able to run our program through a tool for analysis, one that computes the complexity and compares the result to our cut-off values. Such a tool would be an impartial judge for the suitability of the programming language for our task. (Assuming that we write programs that are efficient and correct in the given programming language.)

Someday we may have all of that, and the discipline to discard (or re-design) programs that exceed the boundaries.

But we don't have that today.

Thursday, March 28, 2019

Spring cleaning

Spring is in the air! Time for a general cleaning.

An IT shop of any significant size will have old technologies. Some folks will call them "legacy applications". Other folks try not to think about them. But a responsible manager will take inventory of the technology in his (or her) shop and winnow out those that are not serving their purpose (or are posing threats).

Here are some ideas for tech to get rid of:

Perl: I have used Perl. When the alternatives were C++ and Java, Perl was great. We could write programs quickly, and they tended to be small and easy to read. (Well, sort of easy to read.)

Actually, Perl programs were often difficult to read. And they still are difficult to read.

With languages such as Python and Ruby, I'm not sure that we need Perl. (Yes, there may be a module or library that works only with Perl. But they are few.)

Recommendation: If you have no compelling reason to stay with Perl, move to Python.

Visual Basic and VB.NET: Visual Basic (the non-.NET version) is old and difficult to support. It will only become older and more difficult to support. It does not fit in with web development -- much less cloud development. VB.NET has always played second chair to C#.

Recommendation: Migrate from VB.NET to C#. Migrate from Visual Basic to anything (except Perl).

Any version of Windows other than Windows 10: Windows 10 has been with us for years. There is no reason to hold on to Windows 8 or Windows 7 (or Windows Vista).

If you have applications that can run only on Windows 7 or Windows 8, you have an application that will eventually die.

You don't have to move to Windows 10. You can move some applications to Linux, for example. If people are using only web-based applications, you can issue them Chromebooks or low-end Windows laptops.

Recommendation: Replace older versions of Windows with Windows 10, Linux, or Chrome OS.

CVS and Subversion: Centralized version control systems require administration, which translates into expense. Distributed version control systems often cost less to administer, once you teach people how to use them. (The transition is not always easy, and the conversion costs are not zero, but in the long run the distributed systems will cost you less.)

Recommendation: Move to git.

Everyone has old technology. The wise manager knows about it and decides when to replace it. The foolish manager ignores the old technology, and often replaces it when forced to by external events, and not at a time of his choosing.

Be a wise manager. Take inventory of your technology, assess risk, and build a plan for replacements and upgrades.

Wednesday, March 27, 2019

Something new for programming

One of my recent side projects involved R and R Studio. R is a programming language, an interpreted language with powerful data manipulation capabilities.

I am not impressed with R and I am quite disappointed with R Studio. I have ranted about them in a previous post. But in my ... excitement ... over R and R Studio, I missed the fact that we have something new in programming.

That something new is a new form of IDE, one that has several features:

on-line (cloud-based)
mixes code and documentation
immediate display of output
can share the code, document, and results

I call this new model the 'document, code, results, share' model. I suppose we could abbreviate it to 'DCRS', but even that short form seems a mouthful. It may be better to stick to "online IDE".

R Studio has a desktop version, which you install and run locally. It also has a cloud-based version -- all you need is a browser, an internet connection, and an account. The online version looks exactly like the desktop version -- something that I think will change as the folks at R Studio add features.

R Studio puts code and documentation into the same file, using a variant of the Markdown language (named 'Rmd').

The concept of comments in code is not new. Comments are usually short text items that are denoted with special markers ('//' in C++ and '#' in many languages). The model has always been: code contains comments and the comments are denoted by specific characters or sequences.

Rmd inverts that model: You write a document and denote the code with special markers ('$$' for TeX and '```' for code). Instead of comments (small documents) in your code, you have code in your document.

R Studio runs your code -- all of it or a section that you specify -- and displays the results as part of your document. It is smart enough to pick through the document and identify the code.
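R Studio's actual processing is more involved, but the inverted model is easy to sketch. The function below walks a document line by line and collects the lines between pairs of ``` fences, treating everything else as prose. It assumes fences always begin at the start of a line and ignores chunk options such as `{r}`; the name `extract_code_chunks` is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split a document into code chunks: the text between lines that begin with ```.
// Prose outside the fences is skipped. Chunk options (e.g. {r}) are ignored.
std::vector<std::string> extract_code_chunks(const std::string& document) {
    const std::string fence(3, '`');  // three backticks
    std::vector<std::string> chunks;
    std::istringstream in(document);
    std::string line;
    std::string current;
    bool in_code = false;
    while (std::getline(in, line)) {
        if (line.compare(0, fence.size(), fence) == 0) {
            if (in_code) {  // closing fence: the chunk is complete
                chunks.push_back(current);
                current.clear();
            }
            in_code = !in_code;  // an opening fence starts a new chunk
        } else if (in_code) {
            current += line;
            current += '\n';
        }
    }
    return chunks;  // an unclosed final chunk is dropped in this sketch
}
```

Note the inversion: the program looks for the markers around the code, where a compiler looks for the markers around the comments.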

The concept of code and documentation in one file is not exclusive to R Studio. There are other tools that do the same thing: Jupyter notebooks, Mozilla's Iodide, and Mathematica (possibly the oldest of the lot). Each allows for text and code, with output. Each also allows for sharing.

At a high level, these online IDEs do the same thing: Create a document, add code, see the results, and share.

Over the years, we've shared code through various means: physical media (punch cards, paper tape, magnetic tape, floppy disk), shared storage locations (network disks), and version-control repositories (CVS, Subversion, GitHub). All of these methods required some effort. The new online IDEs reduce that effort; no need to attach files to e-mail, just send a link.

There are a few major inflection points in software development, and I believe that this is one of them. I expect the concept of mixing text and code and results will become popular. I expect the notion of sharing projects (the text, the code, the results) will become popular.

I don't expect all programs (or all programmers) to move to this model. Large systems, especially those with hard performance requirements, will stay in the traditional compile-deploy-run model with separate documentation.

I see this new model of document-code-results as a new form of programming, one that will open new areas. The document-code-results combination is a good match for sharing work and results, and is close in concept to academic and scientific journals (which contain text, analysis, and results of that analysis).

Programming languages have become powerful, and that supports this new model. A Fortran program for simulating water in a pipe required eight to ten pages; the Matlab language can perform the same work in roughly half a page. Modern languages are more concise and can present their ideas without the overhead of earlier computer languages. A small snippet of code is enough to convey a complex study. This makes them suitable for analysis and especially suitable for sharing code.

It won't be traditional programmers who flock to the document-code-results-share model. Instead it will be non-programmers who can use the analysis in their regular jobs.

The online IDE supports a project with these characteristics:

small code
multiple people
output is easily visualizable
sharing and enlightenment, not production

We've seen this kind of change before. It happened with the early microcomputers and first PCs, when "civilians" (that is, people other than professional programmers) bought computers and taught themselves programming in BASIC. It happened slightly later with spreadsheets, when other "civilians" bought computers and taught themselves VisiCalc and Lotus 1-2-3 (and later, Excel). The "spreadsheet revolution" made computing available to many non-programmers, with impressive results. The "online IDE" could do the same.

Tuesday, March 19, 2019

C++ gets serious

I'm worried that C++ is getting too ... complicated.

I am not worried that C++ is a dead language. It is not. The C++ standards committee has adopted several changes over the years, releasing new C++ standards: C++11, C++14, and C++17, the most recent. C++20 is in progress. Compiler vendors are implementing the new standards. (Microsoft has done an admirable job in the latest versions of its C++ compiler.)

But the changes are impressive -- and intimidating. Even the names of the changes are daunting:

contracts, with preconditions and postconditions
concepts
transactional memory
ranges
networking
modules
concurrency
coroutines
reflection
spaceship operator

Most of these do not mean what you think they mean (unless you have been reading the proposed standards). The spaceship operator is familiar to anyone who has worked in Perl or Ruby. The rest may sound familiar but are quite specific in their design and use, and they are probably very different from your first guess.

Here is an example of the range-based for loop (a C++11 feature, not to be confused with the proposed Ranges library), which simplifies the common "iterate over a collection" loop:

int array[5] = { 1, 2, 3, 4, 5 };
for (int& x : array)
    x *= 2;    // x is a reference, so each element is doubled in place

This is a nice improvement. Notice that the loop does not spell out STL iterators; the compiler handles the iteration for us.
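For contrast, here is roughly what the range-based loop saves you from writing. Both functions below double every element of a `std::vector`; the first spells out the iterators by hand, the second lets the range-based for loop call `begin()` and `end()` behind the scenes. The function names are illustrative.

```cpp
#include <vector>

// Pre-C++11 style: the iterators are written out by hand.
void double_all_with_iterators(std::vector<int>& values) {
    for (std::vector<int>::iterator it = values.begin(); it != values.end(); ++it) {
        *it *= 2;
    }
}

// C++11 range-based for: the compiler obtains begin() and end() for us.
void double_all_with_range_for(std::vector<int>& values) {
    for (int& x : values) {
        x *= 2;
    }
}
```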

Somewhat more complex is an implementation of the spaceship operator:

template <typename T, typename U>
struct pair {
    T t;
    U u;

    auto operator<=> (pair const& rhs) const
        -> std::common_comparison_category_t<
            decltype(std::compare_3way(t, rhs.t)),
            decltype(std::compare_3way(u, rhs.u))>
    {
        if (auto cmp = std::compare_3way(t, rhs.t); cmp != 0)
            return cmp;

        return std::compare_3way(u, rhs.u);
    }
};
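What the spaceship operator automates is lexicographic three-way comparison: compare the first members, and only if they are equal compare the second. As a sketch that needs no C++20 support, here is that logic written by hand, returning a negative, zero, or positive int in the style of `strcmp`. The names `Pair` and `compare3` are hypothetical. (In the final C++20 standard, declaring `auto operator<=>(const Pair&) const = default;` asks the compiler to generate this memberwise comparison.)

```cpp
// Hypothetical pair type compared with a hand-written three-way comparison.
template <typename T, typename U>
struct Pair {
    T t;
    U u;
};

// Three-way compare of two values: negative if a < b, zero if equal,
// positive if a > b. Only operator< is required of V.
template <typename V>
int compare3(const V& a, const V& b) {
    if (a < b) return -1;
    if (b < a) return 1;
    return 0;
}

// Lexicographic comparison: the second member decides only when the
// first members are equal.
template <typename T, typename U>
int compare3(const Pair<T, U>& lhs, const Pair<T, U>& rhs) {
    int cmp = compare3(lhs.t, rhs.t);
    if (cmp != 0) {
        return cmp;
    }
    return compare3(lhs.u, rhs.u);
}
```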

That code seems... not so obvious.

The non-obviousness of code doesn't end there.

Consider passing a function foo (which may be overloaded) to an algorithm by wrapping it in a lambda. There are two forms: one for value types, and one for all types (value and reference types).

For simple value types, we can write the following code:

std::for_each(vi.begin(), vi.end(), [](auto x) { return foo(x); });

The most generic form:

#define LIFT(foo) \
    [](auto&&... x) \
        noexcept(noexcept(foo(std::forward<decltype(x)>(x)...))) \
        -> decltype(foo(std::forward<decltype(x)>(x)...)) \
    { return foo(std::forward<decltype(x)>(x)...); }

I will let you ponder that bit of "trivial" code.

Notice that the last example uses a #define macro to do its work, with '\' characters to continue the macro across multiple lines.
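Why would anyone write such a macro? The name of an overloaded function cannot be passed directly to an algorithm such as `std::transform`, because the compiler does not know which overload to pick. Wrapping the name in a generic lambda defers that choice to each call. The sketch below repeats the LIFT macro (with `std::forward<decltype(x)>` spelled out) so that it is self-contained, and applies it to a hypothetical overloaded function `twice`.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// The generic "lift": wrap an overload set in a forwarding generic lambda.
#define LIFT(foo) \
    [](auto&&... x) \
        noexcept(noexcept(foo(std::forward<decltype(x)>(x)...))) \
        -> decltype(foo(std::forward<decltype(x)>(x)...)) \
    { return foo(std::forward<decltype(x)>(x)...); }

// A hypothetical overload set. Passing the bare name "twice" to
// std::transform would not compile: the overload cannot be chosen.
int twice(int x) { return 2 * x; }
std::string twice(const std::string& s) { return s + s; }

std::vector<int> twice_all(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    // LIFT(twice) is a lambda; overload resolution happens inside it, per call.
    std::transform(in.begin(), in.end(), out.begin(), LIFT(twice));
    return out;
}
```

The simple `[](auto x) { return foo(x); }` form works here too; the longer form earns its keep when arguments are references or move-only types, and when callers care about `noexcept`.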

* * *

I have been pondering that code (and more) for some time.

- C++ is becoming more capable, but also more complex. It is now far from the "C with Classes" that was the start of C++.

- C++ is not obsolete, but it is for applications with specific needs. C++ does offer fine control over memory management and can provide predictable run-time performance, which are advantages for embedded applications. But if you don't need the specific advantages of C++, I see little reason to invest the extra effort to learn and maintain C++.

- Development work will favor other languages, mostly Java, C#, Python, JavaScript, and Go. Java and C# have become the "first choice" languages for business applications; Python has become the "first choice" for one's first language. The new features of C++, while useful for specific applications, will probably discourage the average programmer. I'm not expecting schools to teach C++ as a first language again -- ever.

- There will remain a lot of C++ code, but C++'s share of "the world of code" will become smaller. Some of this is due to systems being written in other languages. But I'm willing to bet that the total lines of code for C++ (if we could measure it) is shrinking in absolute numbers.

All of this means that C++ development will become more expensive.

There will be fewer C++ programmers. C++ is not the language taught in schools (usually) and it is not the language taught in the "intro to programming" courses. People will not learn C++ as a matter of course; only those who really want to learn it will make the effort.

C++ will be limited to the projects that need the features of C++, projects which are larger and more complex. Projects that are "simple" and "average" will use other languages. It will be the complicated projects, the projects that need high performance, the projects that need well-defined (and predictable) memory management which will use C++.

C++ will continue as a language. It will be used on the high-end projects, with specific requirements and high performance. The programmers who know C++ will have to know how to work on those projects -- amateurs and dabblers will not be welcome. If you are managing projects, and you want to stay with C++, be prepared to hunt for talent and be prepared to pay.

Thursday, March 14, 2019

A Relaxed Waterfall

We're familiar with the two development methods: Waterfall and Agile. Waterfall operates in a sequence of large steps: gather requirements, design the system, build the system, test the system, and deploy the system; each step must wait for the prior step to complete before it starts. Agile uses a series of iterations that each involve specifying, implementing and testing a new feature.

Waterfall's advantage is that it promises delivery on a specific date. Agile makes no such promise, but instead promises that you can always ship whatever you have built.

Suppose there was a third method?

How about a modified version of Waterfall: the normal Waterfall but no due date -- no schedule.

This may seem a bit odd, and even nonsensical. After all, the reason people like Waterfall is the big promise of delivery on a specific date. Bear with me.

If we change Waterfall to remove the due date, we can build a very different process. The typical Waterfall project runs a number of phases (analysis, design, coding, etc.) and there is pressure, once a phase has been completed, to never go back. One cannot go back; the schedule demands that the next phase begin. Going back from coding, say, because you find ambiguities in the requirements, means spending more time in the analysis phase and that will (most likely) delay the coding phase, which will then delay the testing phase, ... and the delays reach all the way to the delivery date.

But if we remove the delivery date, then there is no pressure of missing the delivery date! We can move back from coding to analysis, or from testing to coding, with no risk. What would that give us?

For starters, the process would be more like Agile development. Agile makes no promise about a specific delivery date, and neither does what I call the "Relaxed Waterfall" method.

A second effect is that we can now move backwards in the cycle. If we complete the first phase (Analysis) and start the second phase (Design) and then find errors or inconsistencies, we can move back to the first phase. We are under no pressure to complete the Design phase "on schedule" so we can restart the analysis and get better information.

The same holds for the shift from Design to the third phase (Coding). If we start coding and find ambiguities, we can easily jump back to Design (or even Analysis) to resolve questions and ensure a complete specification.
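The difference from classic Waterfall can be stated as a rule about which phase transitions are allowed. In the sketch below (the enum and function names are hypothetical, and the rule is one reading of the description above), work may move forward one phase at a time, as in Waterfall, but may also jump back to any earlier phase, a move that a fixed schedule discourages.

```cpp
// The phases of the process, in order.
enum class Phase { Analysis = 0, Design = 1, Coding = 2, Testing = 3, Deployment = 4 };

// Classic Waterfall under schedule pressure: forward only, one phase at a time.
bool waterfall_allows(Phase from, Phase to) {
    return static_cast<int>(to) == static_cast<int>(from) + 1;
}

// Relaxed Waterfall: forward one phase, or back to any earlier phase.
bool relaxed_waterfall_allows(Phase from, Phase to) {
    int f = static_cast<int>(from);
    int t = static_cast<int>(to);
    return t == f + 1 || t < f;
}
```

Skipping ahead (say, from Analysis straight to Testing) remains forbidden in both models; only the backward moves differ.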

While Relaxed Waterfall may sound exactly like Agile, it has differences. We can divide the work into different teams, one team handling each phase. You can have a team that specializes in analysis and the documentation of requirements, a second team that specializes in design, a third team for coding, and a fourth team for testing. The advantage is that people can specialize; Agile requires that all team members know how to design, code, test, and deploy a product. For large projects the latter approach may be infeasible.

This is all speculation. I have not tried to manage a project with Relaxed Waterfall techniques. I suspect that my first attempt might fail. (But then, early attempts with traditional Waterfall failed, too. We would need practice.) And there is no proof that a project run with Relaxed Waterfall would yield a better result.

It was merely an interesting musing.

But maybe it could work.

Monday, March 4, 2019

There is no Linux desktop

Every year, Linux enthusiasts hope that the new year will be the "year of the Linux desktop", the year that Linux dethrones Microsoft Windows as the chief desktop operating system.

I have bad news for the Linux enthusiasts.

There is no Linux desktop.

More specifically, there is not one Linux desktop. Instead, there is a multitude. There are multiple Linux distributions ("distros" in jargon) and it seems that each has its own ideas about the desktop. Some emulate Microsoft Windows, in an attempt to make it easy for people to convert from Windows to Linux. Other distros do things their own (and presumably better) way. Some distros focus on low-end hardware, others focus on privacy. Some focus on forensics, and others are tailored for tinkerers.

Distributions include: Debian, Ubuntu, Mint, SuSE, Red Hat, Fedora, Arch Linux, Elementary, Tails, Kubuntu, CentOS, and more.

The plethora of distributions splits the market. No one distribution is the "gold standard". No one distribution is the leader.

Here's what I consider the big problem for Linux: The split market discourages some software vendors from entering it. If you have a new application, do you support all of the distros or just some? Which ones? How do you test all of the distros that you support? What do you do with customers who use distros that you don't support?

Compared to Linux, the choice of releasing for Windows and macOS is rather simple. Either you support Windows or you don't. (And by "Windows" I mean "Windows 10".) Either you support macOS or you don't. (The latest version of macOS.) Windows and macOS each provide a single platform, with a single installation method, and a single API. (Yes, I am simplifying here. Windows has multiple ways to install an application, but it is clear that Microsoft is transitioning to the Universal app.)

I see nothing to reduce the number of Linux distros, so this condition will continue. We will continue to enjoy the benefits of multiple Linux distributions, and I believe that to be good for Linux.

But it does mean that the Evil Plan to take over all desktops will have to wait.

Wednesday, February 27, 2019

R shows that open source permits poor quality

We like to think that open source projects are better than closed source projects. Not just cheaper, but better -- higher quality, more reliable, and easier to use. But while high quality and reliability and usability may be the result of some open source projects, they are not guaranteed.

Consider the R tool chain, which includes the R interpreter, the Rmd markdown language, the Rstudio IDE, and commonly used models built in R. All of these are open source, and all have significant problems.

The R interpreter is a large and complicated program. It is implemented in multiple programming languages: C, C++, Fortran -- and Java seems to be part of the build as well. To build R you need compilers for all of these languages, and you also need lots of libraries. The source code is not trivial; it takes quite a bit of time to compile the R source and get a working executable.

The time for building R concerns me less than the mix of languages and the number of libraries. R sits on top of a large stack of technologies, and a problem in any piece can percolate up and become a problem in R. If one is lucky, R will fail to run; if not, R will run and use whatever data happens to be available after the failure.

The R language itself has problems. It uses one-letter names for common functions ('t' to transpose a matrix, 'c' to combine values into a list) which means that these letters are not available for "normal" variables. (Or perhaps they are, if R keeps variables and functions in separate namespaces. But even then, a program would be confusing to read.)

R also suffers from too many data containers. One can have a list, which is different from a vector, which is different from a matrix, which is different from a data frame. The built-in libraries all expect data of some type, but woe to him who uses one when a different structure is expected. (Some functions do the right thing, and others complain about a type mismatch.)

Problems are not confined to the R language. The Rmd markdown language is another problem area. Rmd is based on Markdown, which has problems of its own. Rmd inherits these problems and adds more. A document in Rmd can contain plain text, markdown for text effects such as bold and underline, blocks of R code, and blocks of TeX. Rmd is processed into regular Markdown, which is then processed into the output form of your choice (PDF, HTML, MS-Word, and a boatload of other formats).

Markdown allows you to specify line breaks by typing two space characters at the end of a line. (Invisible markup at the end of a line! I thought 'make' had poor design with TAB characters at the front of lines.) Markdown also allows you to force a line break with a backslash at the end of a line, which is at least visible -- but Rmd removes this capability and requires the invisible characters at the end of a line.

The Rstudio IDE is perhaps the best of the different components of R, yet it too has problems. It adds packages as needed, but when it does it displays status messages in red, a color usually associated with errors or warnings. It allows one to create documents in R or Rmd format, asking for a name. But for Rmd documents, the name you enter is not the name of the file; it is inserted into the file as part of a template. (Rmd documents contain metadata in the document, and a title is inserted into the metadata.) When creating an Rmd document in Rstudio, you have to start the process, enter a name to satisfy the metadata, see the file displayed, and then save the file -- Rstudio then asks you (again) for a name -- but this time it is for the file, not the metadata.

The commonly used models (small or not-so-small programs written in R or a mix of languages) are probably the worst area of the R ecosystem. The models can perform all sorts of calculations, and the quality of the models ranges from good to bad. Some models, such as those for linear programming, use variables and formulas to specify the problem you want solved. But the variables of the model are not variables in R; do not confuse the two separate things with the same name. There are two namespaces (one for R and one for the model) and each namespace holds variables. The programmer must mentally keep variables sorted. Referring to a variable in the wrong namespace yields the expected "variable not found" error.

Some models have good error messages, others do not. One popular model for linear programming, upon finding a variable name that has not been specified for the model's namespace, simply reports "A variable was specified that is not part of the model." (That's the entire message. It does not report the offending name, nor even a program line number. You have to hunt through your code to find the problem name.)
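Reporting the offending name costs very little. As a sketch (the class `Model` and its methods are hypothetical, not the API of any R package), here is a model that keeps its own namespace of variables, separate from the host program's variables, and names the missing variable when a lookup fails.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// A hypothetical optimization model with its own namespace of variables,
// separate from the variables of the host program.
class Model {
public:
    // Declare a model variable with an initial value.
    void add_variable(const std::string& name, double value = 0.0) {
        vars_[name] = value;
    }

    // Look up a model variable; on failure, report which name was not found.
    double value_of(const std::string& name) const {
        auto it = vars_.find(name);
        if (it == vars_.end()) {
            throw std::out_of_range("variable '" + name + "' is not part of the model");
        }
        return it->second;
    }

private:
    std::map<std::string, double> vars_;  // the model's namespace
};
```

One extra string concatenation turns a hunt through the code into a single glance at the message.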

Given the complexity of R, the mix of languages in Rmd, the foibles of Rstudio, and the mediocre quality of commonly used extensions to R, I can say that R is a problem. The question is, how did this situation arise? All of the components are open source. How did open source "allow" such poor quality?

It's a good question, and I don't have a definite answer. But I do have an idea.

The commonly known successful open source projects (Linux, Perl, Python, Ruby, and LibreOffice) have conditioned us to think of open source as a way to develop quality projects. These projects are well-respected and popular with many users. They have endured; each is over ten years old. (There are other well-respected open source projects, too.)

When we think of these projects, we see success and quality. Success and quality are the result of hard work, dedication, and a bit of luck. These projects had all of those elements. Open source, by itself, is not enough to force a result of high quality.

These successful projects have been run by developers, and more importantly, for developers. That is certainly true of the early, formative years of Linux, and true for any open source programming language. I suspect that the people working on LibreOffice are primarily developers.

I believe that this second concept does not hold for the R ecosystem. I suspect that the people working on the R language, the Rmd markdown format, and especially the people building the commonly used models are first and foremost data scientists and analysts, and developers second. They are building R for their needs as data scientists.

(I omit Rstudio from this list. It appears that Rstudio is built to be a commercial endeavor, which means that their developers are paid to be developers. It makes status messages in red even more embarrassing.)

I will note that the successful open source projects have had an individual as a strong leader for the project. (Linux has Linus Torvalds, Perl has Larry Wall, etc.) I don't see a strong individual leading the R ecosystem or any component. These projects are -- apparently -- run by committee, or built by separate teams. It is a very Jeffersonian approach to software development, one which may have an effect on the quality of the result. (Again, this is all an idea. I have not confirmed this.)

Where does this leave us?

First, I am reluctant to trust any important work to R. There are too many "moving pieces" for my taste -- too many technologies, too many impediments to good code, too many things that can go wrong. The risks outweigh the benefits.

Second, in the long term, we may move away from R. The popularity of R comes not from the R language but from the ability to create (or use) linear programming models. Someone will create a platform for analysis, with the ability to define and run linear programming models. It will "just work", to borrow a phrase from Apple.

Moving away from R and the current toolchain is not guaranteed. The current tools may have achieved a "critical mass" of acceptance in the data science community, and the cost of moving to a different toolchain may be viewed as unacceptable. In that case, the data science community can look forward to decades of struggles with the tools.

The real lesson is that open source does not guarantee quality software. R and the models are open source, but the quality is... mediocre. High quality requires more than just open source. Open source may be, as the mathematicians say, "necessary but not sufficient". We should consider this when managing any of our projects. Starting a project as open source will not guarantee success, nor will converting an existing project to open source. A successful project needs more: capable management, good people, and a vision and definition of success.