Wednesday, April 22, 2020
Three levels of Python programming
The first level is what we typically think of as programming in Python. It is writing Python code. This is the impression one gets from an "introduction to Python" class. The first program of "Hello, World" is written in Python, as are the successive programs in the class. Programs become more complex, with the addition of functions and later classes to organize larger and larger programs.
In this level, all of the code is Python. It is Python from top to bottom. And it works, for simple applications.
For some applications, it is not "Python all the way down". Some applications are complex. They must manage large quantities of data, and perform a significant number of calculations, and they must do it quickly. A Python-only solution is not a satisfactory solution, because Python is interpreted and slow.
At this point, programmers include carefully-constructed modules that perform calculations quickly. The modules "numpy" and "scipy" are the common modules, but there are many.
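The same level-two mindset shows up even within the standard library: instead of writing a calculation yourself, you find the module that already performs it. A small sketch using the stdlib "statistics" module (numpy and scipy apply the same idea at much larger scale):

```python
import statistics

# Level one: write the calculation yourself, in pure Python.
def mean_by_hand(values):
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

# Level two: import a module that already performs the calculation.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
assert mean_by_hand(data) == statistics.mean(data)  # both give 5.0
print(statistics.pstdev(data))  # population standard deviation: 2.0
```

The hand-written version works, but the library version is already tested, already documented, and (in the case of numpy and scipy) already fast.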
This is the second level of programming in Python. It is not often thought of as "programming in Python" or even "programming". It is more often thought of as "importing modules and using the classes and functions in those modules".
That mindset makes sense. This work is less about Python and more about knowing which modules are available and which functions they provide. The task of programming is different; instead of writing all of the code, one assembles a solution from pre-packaged modules and uses Python to connect the various pieces.
That is why I think of it as a second level of programming. It is a different type of programming, a different type of thinking. It is not "how can I write code?" but instead "what existing code will perform this computation?".
Which brings us to the third level.
The third level of Python programming is building your own module. The existing Python modules, if they do what you need, are fast and effective. But if they do not do what you need, then they are not helpful.
Writing your own solution in Python will result in a slow program -- perhaps unacceptably slow. Therefore, as a last resort, one writes one's own module (in C or C++) and imports it into the main Python program.
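Writing a full CPython extension module in C is too long for an example here, but the standard library's "ctypes" module demonstrates the same idea in miniature: Python calling compiled C code directly. A sketch, assuming a system where the C math library can be located:

```python
import ctypes
import ctypes.util

# Locate and load the C math library; the file name is
# platform-dependent (e.g. "libm.so.6" on Linux).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C function's signature so ctypes can convert
# arguments and the return value correctly.
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))  # the C library's sqrt, called from Python: 3.0
```

A hand-built C or C++ extension module follows the same pattern at a deeper level: the compiled code does the heavy computation, and Python merely orchestrates it.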
This is, purists will argue, programming not in Python but in C or C++. They have a point -- it is writing C or C++ code.
But when the objective is to build a system to perform a specific task, and the top layer of the application is written in Python, then one can argue that the C code is merely an extension of the same application.
Or, one can think of the task as creating a system in multiple modules and multiple languages, not a single program in a single programming language, and using the best language for each piece of the system.
Python programming (or systems development) is often less about coding in a particular language and more about solving problems. With Python, we have three levels at which we can solve those problems.
Wednesday, February 28, 2018
Backwards compatibility but not for Python
Most languages are designed so that later changes are carefully constructed to avoid breaking older programs. This is a tradition from the earliest days of programming. New versions of FORTRAN and COBOL were introduced with new features, yet the newer compilers accepted the older programs. (Probably because the customers of those expensive computers would have been quite upset to learn that an "upgrade" had broken their existing programs.)
Since then, almost every language has followed this tradition. BASIC, Pascal, dBase (and Clipper and XBase), Java, Perl, ... they all strove for (and still strive for) backwards compatibility.
The record is not perfect. A few exceptions do come to mind:
- In the 1990s, multiple releases of Visual Basic broke compatibility with older versions as Microsoft decided to improve the syntax.
- Early versions of Perl changed syntax. Those changes were Larry Wall deciding on improvements to the syntax.
- The C language changed syntax for the addition-assignment and related operators (from =+ to +=) which resolved an ambiguity in the syntax.
- C++ broke compatibility with a scoping change in "for" statements. That was its only such change, to my knowledge.
But there is one language that is an exception.
That language is Python.
Python has seen a number of changes over time. I should say "Pythons", as there are two paths for Python development: Python 2 and Python 3. Each path has multiple versions (Python 2.4, 2.5, 2.6, and Python 3.4, 3.5, 3.6, etc.).
The Python 3 path was started as the "next generation" of Python interpreters, and it was started with the explicit statement that it would not be compatible with the Python 2 path.
Not only are the two paths different (and incompatible), versions within each path (or at least the Python 3 path) are sometimes incompatible. That is, some things in Python 3.6 are different in Python 3.7.
I should point out that the changes between versions (Python 3.6 and 3.7, or even Python 2 and 3) are small. Most of the language remains the same across versions. If you know Python 2, you will find Python 3 familiar. (The familiarity may cause frustration as you stumble across one of the compatibility-breaking changes, though.)
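Two of the best-known compatibility-breaking changes between the paths, runnable under Python 3:

```python
# 1. Integer division: in Python 2, 1/2 was 0 (integer division);
#    in Python 3 it is 0.5, and // gives the old floor behavior.
print(1 / 2)    # 0.5
print(1 // 2)   # 0

# 2. print: a statement in Python 2 ("print x"), a function in
#    Python 3. The Python 2 form is now a syntax error:
try:
    compile("print 'hello'", "<example>", "exec")
except SyntaxError:
    print("Python 2 print statement rejected by Python 3")
```

Small changes on paper, but each one is a line that must be found and fixed when migrating a program from one path to the other.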
Should we care? What does it mean for Python? What does it mean for programming in general?
One could argue that changes to a programming language are necessary. The underlying technology changes, and programming languages must "keep up". Thus, changes will happen, either in many small changes or one big change. The latter often is a shift away from one programming language to another. (One could cite the transition from FORTRAN to BASIC as computing changed from batch to interactive, for example.)
But that argument doesn't hold up against other evidence. COBOL, for example, has been popular for transaction processing and remains so. C and C++ have been popular for operating systems, device drivers, and other low-level applications, and remain so. Their backwards-compatible growth has not appreciably diminished their roles in development.
Other languages have gained popularity and remain popular too. Java and C# have strong followings. They, too, have not been hurt by backwards-compatibility.
Python is an opportunity to observe the behavior of the market. We have been working on the assumption that backwards-compatibility is desired by the user base. This assumption may be a false one, and the Python approach may be a good start to observe the true desires of the market. If successful (and Python is successful, so far) then we may see other languages adopt the "break a few things" philosophy for changes.
Of course, there may be some demand for languages that keep compatibility across versions. It may be a subset of the market, something that isn't visible with only one language breaking compatibility, but only visible when more languages change their approach. If that is the case, we may see some languages advertising their backwards-compatibility as a feature.
Who knows? The market demand for backwards-compatibility may come from Python users themselves. As Python gains popularity (and it is gaining popularity) and more and more individuals and organizations build Python projects, they may find Python's approach unappealing.
Let's see what happens!
Monday, June 22, 2015
Static languages for big and dynamic languages for small
Now that we have the disclaimer out of the way...
I have found that I write programs differently in dynamically-typed languages than I do in statically-typed languages.
There are many differences between the two language sets. C++ is a big jump up from C. Java and C# are, in my mind, a bit simpler than C++. Python and Ruby are simpler than Java and C# -- yet more powerful.
Putting language capabilities aside, I have examined the programs I have written. I have two distinct styles, one for statically-typed languages and a different style for dynamically typed languages. The big difference in my two styles? The names of things.
Programs in any language need names. Even the old Dartmouth BASIC needs names for variables, and one can argue that with the limited namespace (one letter and one optional digit) one must give more thought to names in BASIC than in any other language.
My style for statically-typed languages is to give variables and functions semantic names. Names for functions are usually verb phrases, describing the action performed by the function. Names for variables usually describe the data contained by the variable.
My style for dynamically-typed languages is different. Names for functions typically describe the data structure that is returned by the function. Names for variables typically describe the data structure contained by the variable (or referenced by it, if you insist).
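The difference can be sketched in a few lines (the function names here are hypothetical, chosen only to show the two styles):

```python
# Statically-typed style: the declared types carry the "what",
# so the name carries the action, as a verb phrase.
def load_totals(rows):
    return [float(r) for r in rows]

# Dynamically-typed style: with no declarations to lean on, the
# name stands in for the type, describing the structure returned.
def float_list(rows):
    return [float(r) for r in rows]

assert load_totals(["1.5", "2.5"]) == float_list(["1.5", "2.5"]) == [1.5, 2.5]
```

Both functions do the same work; only the naming philosophy differs.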
Perhaps this difference is due to my familiarity with the older statically-typed languages. Perhaps it is due to the robust IDEs for C++, Java, and C# (for Python and Ruby I typically use a simple text editor).
I find dynamically-typed languages much harder to debug than statically-typed languages. Perhaps this is due to the difference in tools. Perhaps it is due to my unfamiliarity with dynamically-typed languages. But perhaps it is simply easier to analyze and debug statically-typed languages.
If that is true, then I can further state that it is better to use a statically-typed language for large projects. It may also be better to use a dynamically-typed language for smaller programs. I'm not sure how large 'large' is in this context, nor am I sure how small 'small' is. Nor do I know the cross-over point, at which it is better to switch from one to the other.
But I think it is worth thinking about.
Actually, I tend to write FORTRAN in all programming languages. But that is another matter.
Friday, December 26, 2014
Google, Oracle, and Java
Apple has a cozy walled garden for its technology: Apple devices running Apple operating systems and Apple-approved apps written in Apple-controlled languages (Objective-C and now Swift).
Microsoft is building a walled garden for its technology. Commodity devices with standards set by Microsoft, running Microsoft operating systems and apps written in Microsoft-controlled languages (C#, F#, and possibly VB.NET). Microsoft does not have the same level of control over applications as Apple; desktop PCs allow anyone with administrator privileges to install any app from any source.
Google has a walled garden for its technology (Android), but its control is less than that of Apple or Microsoft. Android runs on commodity hardware, with standards set by Google. Almost anyone can install apps on their Google phone or tablet. And interestingly, the Android platform apps run in Java, a language controlled by... Oracle.
This last aspect must be worrying to Google. Oracle and Google have a less than chummy relationship, with lawsuits about the Java API. Basing a walled garden on someone else's technology is risky.
What to do? If I were Google, I would consider changing the language for the Android platform. That's not a small task, but the benefits may outweigh the costs. Certainly their current apps would have to be re-written for the new language. A run-time engine would have to be included in Android. The biggest task would be convincing the third-party developers to change their development process and their existing apps. (Some apps may never be converted.)
Which language to pick? That's an easy call. It should be a language that Google controls: Dart or Go. Dart is designed as a replacement for JavaScript, yet could be used for general applications. Go is, in my opinion, the better choice. It *is* designed for general applications, and includes support for concurrency.
A third candidate is Python. Google supports Python in their App Engine cloud platform, so they have some familiarity with it. No one company controls it (Java was controlled by Sun prior to Oracle) so it is unlikely to be purchased.
Java was a good choice for launching the Android platform. I think the languages Go and Python are better choices for Android now.
Let's see what Google thinks.
Sunday, August 17, 2014
Reducing the cost of programming
Costs include infrastructure (disk space for compiler, memory) and programmer training (how to write programs, how to compile, how to debug). Notice that the load on the programmer can be divided into three: infrastructure (editor, compiler), housekeeping (declarations, memory allocation), and business logic (the code that gets stuff done).
Symbolic assembly code was better than machine code. In machine code, every instruction and memory location must be laid out by the programmer. With a symbolic assembler, the computer did that work.
COBOL and FORTRAN reduced cost by letting the programmer not worry about the machine architecture, register assignment, and call stack management.
BASIC (and time-sharing) made editing easy, eliminated compiling, and made running a program easy. Results were available immediately.
Today we are awash in programming languages. The big ones today (C, Java, Objective C, C++, BASIC, Python, PHP, Perl, and JavaScript -- according to Tiobe) are all good at different things. That is perhaps not a coincidence. People pick the language best suited to the task at hand.
Still, it would be nice to calculate the cost of the different languages. Or if numeric metrics are not possible, at least rank the languages. Yet even that is difficult.
One can easily state that C++ is more complex than C, and therefore conclude that programming in C++ is more expensive than C. Yet that's not quite true. Small programs in C are easier to write than equivalent programs in C++. Large programs are easier to write in C++, since the ability to encapsulate data and group functions into classes helps one organize the code. (Where 'small' and 'large' are left to the reader to define.)
Some languages are compiled and some are interpreted, and one can argue that a separate step to compile is an expense. (It certainly seems like an expense when I am waiting for the compiler to finish.) Yet languages with compilers (C, C++, Java, C#, Objective-C) all have static typing, which means that the editor built into an IDE can provide information about variables and functions. When editing a program written in one of the interpreted languages, on the other hand, one does not have that help from the editor. The interpreted languages (Perl, Python, PHP, and JavaScript) have dynamic typing, which means that the type of a variable (or function) is not constant but can change as the program runs.
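That dynamic typing is easy to demonstrate in Python, where a type belongs to a value rather than to a variable:

```python
# The same name can hold values of different types as the
# program runs; no declaration constrains it.
x = 42
print(type(x).__name__)   # int
x = "forty-two"
print(type(x).__name__)   # str
x = [4, 2]
print(type(x).__name__)   # list
```

This flexibility is convenient to write and, as noted above, hard for an editor to reason about: the editor cannot know what `x` is without running the program.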
Switching from an "expensive" programming language (let's say C++) to a "reduced cost" programming language (perhaps Python) is not always possible. Programs written in C++ perform better. (On one project, the C++ program ran for several hours; the equivalent program in Perl ran for several days.) C and C++ let one have access to the underlying hardware, something that is not possible in Java or C# (at least not without some add-in trickery, usually involving... C++.)
The line between "cost of programming" and "best language" quickly blurs, and the difficulty of nailing down the costs for the different dimensions of programming (program design, speed of coding, speed of execution, ability to control hardware) gets in our way.
In the end, I find that it is easy to rank languages in the order of my preference rather than in an unbiased scheme. And even my preferences are subject to change, given the nature of the project. (Is there existing code? What are other team members using? What performance constraints must we meet?)
Reducing the cost of programming is really about trade-offs. What capabilities do we desire, and what capabilities are we willing to cede? To switch from C++ to C# may mean faster development but slower performance. To switch from PHP to Java may mean better organization of code through classes but slower development. What is it that we really want?
Monday, May 19, 2014
The shift to cloud is bigger than we think
Our programs ran "under" (or "on top of") an operating system. Our programs were also fussy -- they would run on one operating system and only that operating system. (I'm ignoring the various emulators that have come and gone over time.)
The operating system was the "target", it was the "core", it was the sun around which our programs orbited.
So it is rather interesting that the shift to cloud computing is also a shift away from operating systems.
Not that cloud computing is doing away with operating systems. Cloud computing coordinates the activities of multiple, usually virtualized, systems, and those systems run operating systems. What changes in cloud computing is the programming target.
Instead of a single computer, a cloud system is composed of multiple systems: web servers, database servers, and message queues, typically. While those servers and queues must run on computers (with operating systems), we don't care about them. We don't insist that they run any specific operating system (or even use a specific processor). We care only that they provide the necessary services.
In cloud computing, the notion of "operating system" fades into the infrastructure.
As cloud programmers, we don't care if our web server is running Windows. Nor do we care if it is running Linux. (The system administrators do care, but I am taking a programmer-centric view.) We don't care which operating system manages our message queues.
The level of abstraction for programmers has moved from operating system to web services.
That is a significant change. It means that programmers can focus on a higher level of work.
Hardware-tuned programming languages like C and C++ will become less important. Not completely forgotten, but used only by the specialists. Languages such as Python, Ruby, and Java will be popular.
Operating systems will be less important. They will be ignored by the application level programmers. The system architects and sysadmins, who design and maintain the cloud infrastructure, will care a lot about operating systems. But they will be a minority.
The change to services is perhaps not surprising. We long ago shifted away from processor-specific code, burying that work in our compilers. COBOL and FORTRAN, the earliest languages, were designed to run on different processors. Microsoft insulated us from the Windows API with MFC and later the .NET framework. Java separated us from the processor with its virtual machine. Now we take the next step and bury the operating system inside of web services.
Operating systems won't go away. But they will become less visible, less important in conversations and strategic plans. They will be more of a commodity and less of a strategic advantage.
Tuesday, July 31, 2012
... or the few
I suppose it has been this way since the elder days, when programming meant wiring plug-boards. Programming languages and tools have been changing since the first symbolic assemblers.
There have been two types of changes ("improvements"?) for programming: changes that benefit the individual programmer and changes that benefit the team.
Changes that benefit the individual include:
- FORTRAN
- Unix, which let a programmer use many tools and even build his own
- Integrated development environments
- Interactive debuggers
- Faster processors and increased memory
- Unit tests (especially automated unit tests)
- Diagnostic programs such as 'lint'
- Managed environments with garbage collection (eliminating the need for 'delete' operations)
- Early forms of version control systems
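The automated unit tests in the list above can be as small as a few lines with Python's "unittest" module (the function under test here is hypothetical):

```python
import unittest

def add(a, b):
    # A hypothetical function under test.
    return a + b

class TestAdd(unittest.TestCase):
    def test_integers(self):
        self.assertEqual(add(2, 3), 5)

    def test_strings(self):
        self.assertEqual(add("foo", "bar"), "foobar")

# Run the tests programmatically and report the outcome.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAdd)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all tests passed:", result.wasSuccessful())
```

Because the tests run without human intervention, the individual programmer gets fast feedback, and the team gets a safety net against regressions.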
Changes that benefit the team include:
- COBOL, with its separation of concerns and support for discrete modules
- Early operating systems that scheduled jobs and managed common resources
- The 'patch' program, which allowed for updates without transmitting the entire source
- Network connections (especially internet connections)
- Code reviews
- Distributed version control systems
- github
One can argue that any of these technologies help individuals and teams. A better debugger may directly help the programmer, yet it helps the team by making the developer more productive. Some technologies are better for the individual, and others better for the team. But that's not the point.
The point is that we, as an industry, look to improve performance for individuals and teams, not just individuals. We look at teams as more than a collection of individuals. If we considered teams nothing more than a group of people working on the same project, we would focus our efforts on individual performance and nothing else. We would develop better tools for individuals, and not for teams.
Some languages are designed for teams, or at least have aspects designed for teams. The Python and Go languages have strong ideas about code style; Python enforces rules for indentation and Go rules for bracket placement. They are not the first; Visual Basic would auto-space and auto-capitalize your source code. That may have been a function of the editor, but since the editor and interpreter were distributed together one can consider the action part of the language.
Python's indentation rules, Go's bracket rules, and Visual Basic's auto-capitalization are all beneficial to the individual programmer, but they are more beneficial to the team. They enforce style upon the source code. Such enforced style ensures that all members of the team (indeed, all members of those programming communities) use the same style. Programmers can more easily move from one project to another, and from code contributed by one person to another. Some programming teams (using other languages) enforce local styles, but these languages have it built into their DNA.
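Python's enforcement is not merely convention: an improperly indented block is rejected by the compiler itself, as a few lines show:

```python
# A correctly indented block compiles; a body at the wrong
# indentation level is a compile-time error, not a style nit.
good = "if True:\n    x = 1\n"
bad = "if True:\nx = 1\n"   # body not indented

compile(good, "<example>", "exec")   # accepted

try:
    compile(bad, "<example>", "exec")
except IndentationError as e:
    print("rejected:", e.msg)
```

No code review is needed to catch the violation; the language will not run the program at all.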
Enforced style is a move against the "cowboy coder", the rebel programmer who insists on doing things "his way". Programming languages are not democracies -- team members don't get to vote on the rules of the language they are using -- but this is a step towards governance.
Let's count that as maturation.
Saturday, January 7, 2012
Predictions for 2012
Here are my predictions for computing in the coming year: