2017-05-20

On scripting languages


Foreword


Historically, the difference between scripting languages and real programming languages has been seen as the presence or absence of a compilation step. However, from time to time we have seen interpreters for languages that were originally meant to be compiled, and we have also seen compilers for languages that were originally meant to be interpreted. Furthermore, some scripting engines today internally compile to bytecode, while many compiled languages do the same, and in both cases the bytecode can be executed either by first translating it to machine code or by directly interpreting it, thus further blurring the distinction. So, compiled vs. interpreted does not seem to be the real differentiating factor between scripting languages and real programming languages. Nonetheless, we can usually tell a scripting language when we see one. So, what is it that we see?


I would suggest that the actual differentiating factor between scripting languages and real programming languages is nothing but the presence or absence of strong typing, by which I mean typing that is declared in the code and checked before the program runs. In other words, it boils down to the presence or absence of semantic checking.

The trend of strongly typed languages to be compiled and of weakly typed languages to be interpreted can be explained in full as a consequence of the primary choice of strong vs. weak typing:
  • If a language is strongly typed then a large class of semantic errors in the code is detectable, so a compilation step is very useful because it will unfailingly locate all errors of that class before attempting to run.
  • If a language is weakly typed then semantic errors can only be detected by running the code and observing the resulting failures. Compilation can only reveal syntactic errors, which can also be detected by any halfway decent IDE, so the need to parse all of the code in advance is lessened. 
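As a minimal sketch of the second bullet, assume Python as the language without compile-time type checking. The function and the order data below are made up for illustration. The definition is accepted without complaint, and the mistake only surfaces when the faulty branch actually runs.

```python
def describe(order):
    """Return a short label for an order (hypothetical example function)."""
    if order["quantity"] > 100:
        # Semantic mistake: concatenating a str with an int.
        # Nothing flags this when the function is defined.
        return "bulk: " + order["quantity"]
    return "regular"


# The common path works, so the mistake can go unnoticed for a long time.
print(describe({"quantity": 5}))

# Only an input that reaches the faulty branch reveals the error, at run time.
try:
    describe({"quantity": 500})
except TypeError as error:
    print("failed at run time:", error)
```

A test suite that never exercises an order of more than 100 items would pass with this mistake in place.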
TypeScript is the odd exception to the rule, and this is to be expected, because the impetus for the creation of TypeScript was vastly different from that of other scripting languages: as we shall see further down, most scripting languages come into existence as little toy projects and are invariably one-man efforts. TypeScript was the result of a group effort backed by a big company (Microsoft) starting with the realization that JavaScript is unfortunately here to stay, and setting out specifically to correct one of its major deficiencies, namely the lack of typing.

So, this leaves us with the following postulation:
Real Languages = Strongly Typed = Semantic Checking = Usually compiled.
Scripting Languages = Weakly Typed = No Semantic Checking = Usually interpreted.
And yet, many people tend to like scripting languages, and tend to actually write lots of code in them, supposedly because they are "easier". This brings to mind the famous quote by Edsger W. Dijkstra:
[...] some people found error messages they couldn't ignore more annoying than wrong results, and, when judging the relative merits of programming languages, some still seem to equate "the ease of programming" with the ease of making undetected mistakes.
(From Edsger W. Dijkstra, On the foolishness of "natural language programming". The original context of the quote was Natural Language Programming, but the quote itself refers to programming languages in general.)

Arguments I hear in favor of scripting languages


Argument: It is easy to write code in it; look, the "hello, world!" program is a one-liner.

Rebuttal: What this means is that this scripting language is a very good choice for writing the "hello, world!" program. The ease with which you may write "hello, world!" is no indication whatsoever of the ease with which a non-trivial system may be developed, tested, debugged, maintained, and extended. On the contrary, a scripting language which makes it possible for you to write "hello, world!" in a single line achieves this by introducing a few trade-offs: it offers built-in functionality without the need to explicitly import it, which in turn means that there are identifiers always in scope even when not needed; it does not require code to be placed in classes, which means that it is either not object-oriented, or it mixes paradigms; and it does not require code to be placed in functions, which means that either its syntax is trivial, or again, it mixes paradigms. The moment you write anything non-trivial, you will of course need to be able to import namespaces, to put all code in methods, and to put all methods in classes, so the fact that the language does not require them buys you absolutely nothing.

Argument: No, I mean it is really terse. There are many things besides "hello, world!" that I can write in one line.

Rebuttal: Sure, you can write them in one line. But can you read them? One of the most important aspects of code is readability (second only to correctness), but terse code is not necessarily easy to read. (If that were the case, then Perl would be the most readable language ever.) Terseness usually represents a trade-off between verbosity and complexity. The more terse the language, the less code you have to read, but the more complex that code is, so it is up for debate whether it is more readable, and it certainly tends to be harder to debug. Terseness appears to be the modern trend, so as real programming languages keep evolving they are also receiving features that make them more and more terse; take lambdas and the fluent style of invocations, for example. So, terseness is not the exclusive domain of scripting languages, and to the extent that scripting languages fare better in this domain it is debatable whether that is an advantage or a disadvantage.
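To make the trade-off concrete, here is a small sketch in Python. The sample text and the helper names are invented for illustration. Both versions compute the two most frequent words; which one is easier to read and to step through in a debugger is exactly the point under debate.

```python
from collections import Counter

text = "the cat and the hat and the bat"

# Terse: everything in a single expression.
terse = sorted(Counter(text.split()).items(), key=lambda kv: (-kv[1], kv[0]))[:2]

# Expanded: the same steps, one per statement.
counts = Counter()
for word in text.split():
    counts[word] += 1


def by_count_then_word(item):
    """Sort key: highest count first, ties broken alphabetically."""
    word, count = item
    return (-count, word)


ranked = sorted(counts.items(), key=by_count_then_word)
expanded = ranked[:2]

print(terse)
print(expanded)
```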

Argument: There are lots of libraries for it.

Rebuttal: Seriously? There are more libraries for your scripting language than there are for Java?

Argument: I don't have to compile it; I just write my code and run it.

Rebuttal: I also just write my code and run it. When I hit the "launch" button, my IDE compiles my code in the blink of an eye and runs it. The difference between you and me is that if I have made any semantic mistakes in my code, I will be told so before wasting my time trying to run it. But what am I saying, being told that there are semantic mistakes in your code probably counts as a disadvantage for you, right?

Argument: I am not worried about errors, because I use testing.

Rebuttal: Testing is an indispensable quality assurance mechanism for software, but it does not, in and of itself, guarantee correctness. It is too custom-made, too subjective, and too fragmentary. You can easily forget to test something, you can easily test the wrong thing, and you can easily test "around" a bug, accidentally creating tests that pretty much require the bug to be in place in order to pass. Despite these deficiencies, testing is still very important, but it is nothing more than a weapon in our arsenal against bugs. This arsenal happens to also include another weapon, which is closer to the forefront of the battle against bugs than testing is, and it is comprehensive, generic, 100% objective, and definitive. This weapon is called strong typing. Alas, this hard-won realization from times of yore seems to be lost on the modern generation of programmers, who think they are going to re-invent everything.
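Testing "around" a bug can be sketched as follows, assuming Python and a made-up helper named parse_port. The test was written by observing what the function returns, so it keeps passing only as long as the bug stays in place.

```python
def parse_port(value):
    """Meant to return the port as an int (hypothetical helper)."""
    # Bug: the conversion to int was forgotten, so a str comes back.
    return value.strip()


def test_parse_port():
    # This assertion encodes the buggy behavior: it expects a str.
    assert parse_port(" 8080 ") == "8080"


test_parse_port()  # passes

# The bug surfaces later, far from its cause.
try:
    next_port = parse_port(" 8080 ") + 1
except TypeError as error:
    print("failed at run time:", error)
```

Had the return type been declared as an integer in a language that checks declarations at compile time, the return statement would have been rejected before any test ran.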

Argument: It has lots and lots of built-in features.

Rebuttal: Sure, and that's why scripting languages are not entirely useless. If the only thing that matters is to accomplish a certain highly self-contained goal of severely limited scope in as little time as possible, then please, by all means, do go ahead and use your favorite scripting language with its awesome built-in features. However, if the project is bound to take on a life of its own, you are far better off investing a couple of minutes to create a project in a real programming language, and to include the external libraries that will give you the same functionality in that language. Built-in features do not only come with benefits; in contrast to libraries, they are much more difficult to evolve, because even a minute change in them may break existing code, resulting in people being reluctant to migrate to the latest version of the language. Furthermore, built-in features usually have to be supported forever, even after better alternatives have been invented, or after they simply go out of style, so over time scripting languages tend to gather lots of unnecessary baggage. We have tried feature-bloated programming languages, with Ada for example, and the consensus is that they are not the way to go.

Argument: But really, it is so much easier! Look here, in one statement I obtain a list and assign its elements to individual variables!

Rebuttal: That's great, I bet this has slashed your time to market by half. What happens if the number of elements in the list differs from the number of variables that you assign it to? I bet there is no error, because you don't want to be bothered with errors, right? Seriously, my compiled language of choice has its own unique, arcane syntax quirks which I could, if I wanted to, claim make things so much easier for me. Some of them are not even that arcane. For example, instead of having to add comments within a method about each one of the typeless arguments that it accepts, explaining what the actual type of the argument is, so that the IDE can parse those comments and provide me with some rudimentary argument type checking, I get to simply declare the type of each argument together with the argument, as part of the syntax of the language! Imagine that!
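What actually happens on a mismatch depends on the language. As one data point, here is a sketch assuming Python, where load_point is a made-up function. The mismatch is reported, but only at run time, when the statement executes. JavaScript's destructuring, by comparison, silently leaves the extra variable undefined.

```python
def load_point():
    """Hypothetical data source that returns one element too few."""
    return [3, 4]


# Matching counts: unpacking works.
x, y = load_point()
print(x, y)

# Mismatched counts: Python does report an error,
# but only when this line is actually executed.
try:
    x, y, z = load_point()
except ValueError as error:
    print("failed at run time:", error)
```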

Argument: It is trendy. It is hip.

Rebuttal: No contest here. I can't argue with hipsters.

The problems with scripting languages


The nonsense


I don't need to say much here; just watch "Wat" by Gary Bernhardt from CodeMash 2012. It is only 4 minutes long: https://www.destroyallsoftware.com/talks/wat

The reason for all this nonsense is that these languages are hacks.

When the foundation that you are working on is a hack, then either anything you build upon it will in turn be a hack, or you are going to put an enormous effort into circumventing the hackiness of the foundation in order to build something reasonable over it. Why handicap yourself?

The errors


Lack of semantic checking means that mistakes can be made which will not be caught at the earliest possible moment, that is, during compilation, or better yet, during editing in any decent IDE. Therefore, lack of semantic checking necessarily means that more errors will survive until the code is run.

It is a given fact that a certain percentage of errors will always pass testing and make it to production, which in turn inescapably means that there will be a somewhat increased number of bugs in production.

This alone is enough to classify scripting languages as unsuitable for anything but the most trivial tasks, and the debate should be over right there; we should not need to say anything more.

The handicapped IDE


Lack of semantic checking means that your IDE cannot provide you with many useful features that you get with strongly typed languages. Specifically, some or all of the following features are either limited in functionality or missing altogether:
  1. Context-sensitive auto-completion. Since any parameter to any function can be of any type, the IDE usually has no clue as to which of the variables in scope may be passed as a certain parameter of a certain function and which may not. Therefore, it cannot be smart about suggesting what to auto-complete, so it has to suggest everything that happens to be in scope. Most of these suggestions are preposterous, and some can even be classified as treacherous.
  2. Member auto-completion. Since any variable can be of any type, the IDE usually has no clue as to what member fields and functions are exposed by any given variable. Therefore, either it cannot give suggestions, or it will suggest every single member of every single known type and the kitchen sink.
  3. Listing all usages of a type. Since any variable can be of any type, the IDE usually has no clue as to where a given type is used, or if it is used at all. Contrast this with strongly typed languages where the IDE can very accurately list all usages of any given type and even provide you with visual clues about unused types.
  4. Type-sensitive search. If you have multiple different types where each one of them contains, say, a `Name` member, you cannot search for all references to the `Name` member of only one of those types. You have to use text search, which will also yield every irrelevant namesake in the results. This can be okay in tiny projects, but it very quickly becomes non-viable as the project size increases.
  5. Refactoring. When the IDE has no knowledge of the semantics of your code, it is incapable of performing various useful refactoring operations on it. IDEs that offer refactoring features on untyped languages are actually faking it; they should not be calling it refactoring, they should be calling it cunning search and replace. Needless to say, it is not always correct, it does sometimes severely mess up the code, and you have no way of knowing that your code has been messed up until you try running it, because remember, there is no semantic checking.
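The `Name` situation from the list above can be sketched in Python, used here as the scripting language, with its optional type annotations standing in for a typed language's declarations. Customer, Product, and both functions are hypothetical. The point is only that a tool can answer "which `name` is this?" when a type is declared, and has nothing to go on when it is not.

```python
from dataclasses import dataclass
from typing import get_type_hints


@dataclass
class Customer:
    name: str


@dataclass
class Product:
    name: str


def label_untyped(item):
    # Could be a Customer, a Product, or anything else with a .name member.
    return item.name.upper()


def label_typed(item: Customer) -> str:
    # The declared type tells tools exactly which .name this is.
    return item.name.upper()


# What a tool can learn about each function's parameters:
print(get_type_hints(label_untyped))  # nothing to work with
print(get_type_hints(label_typed))    # parameter and return types are known
```

A text search for "name" matches both classes and both functions, while only the second function gives a tool enough information to tie the member to Customer.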


That little performance issue


Performance is generally not an issue, for the most part because scripting languages tend to be used in situations where performance is not required. (There are of course a few odd cases where performance matters and yet a scripting language is chosen, and these do in fact suffer the performance consequences; take Node.js for example.)

In cases where performance matters but the task at hand is relatively isolated, performance is again not an issue for scripting languages because external libraries are quickly developed to handle those tasks. (These external libraries are written in guess what -- real programming languages.)

Having explained that performance is usually not an issue, let us also quickly mention before moving on that on computationally expensive tasks, such as iterating over all pixels of an image to manipulate each one of them, there is no way that a scripting language will perform anywhere close to Java, just as there is no way that Java will perform anywhere close to C++, just as there is no way that C++ will perform anywhere close to Assembly. Stop arguing about this.
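The pixel example and the point about external libraries can be sketched together, assuming Python running on the CPython interpreter. The function names and the tiny stand-in image are made up. The first version walks the pixels in interpreted code; the second hands the same loop to bytes.translate, which CPython implements in C. No timings are claimed here; measure on your own data.

```python
def invert_pure(pixels: bytes) -> bytes:
    """Invert 8-bit pixel values with an interpreted loop, one pixel at a time."""
    out = bytearray(len(pixels))
    for i, value in enumerate(pixels):
        out[i] = 255 - value
    return bytes(out)


# Lookup table mapping each byte value v to 255 - v.
INVERT_TABLE = bytes(255 - v for v in range(256))


def invert_native(pixels: bytes) -> bytes:
    """Same result, but the per-pixel loop runs inside bytes.translate."""
    return pixels.translate(INVERT_TABLE)


image = bytes([0, 64, 128, 255] * 4)  # stand-in for real image data
assert invert_pure(image) == invert_native(image)
print(list(invert_native(image)[:4]))  # [255, 191, 127, 0]
```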

The horrible syntax


Most scripting languages suffer from a severe case of capriciously arcane and miserably grotesque syntax. No, beauty is not in the eye of the beholder, and there is only a certain extent up to which aesthetics are subjective. PHP is way past that extent. The syntax of scripting languages tends to suffer either because their priorities are all wrong by design, or because they were hacked together in a weekend without too much thought, or simply due to plain incompetence on the part of their creators.

Scripting languages that have their priorities wrong are, for example, all the shell scripting languages. Their priorities are wrong by design, because they aim to make strings (filenames) look and feel as if they are identifiers, so that you can type commands without having to enclose them in quotes, as if this convenience were the most important thing ever. If all we ever wanted to do with scripts was to list sequences of programs to execute, then this convenience would perhaps be all that we would care for, but the moment we need to use any actual programming constructs in the scripts, what we have in our hands is a string escaping nightmare of epic proportions.
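A small taste of that escaping problem, sketched with Python's standard shlex module, which splits and quotes strings following POSIX-style shell rules. The command and the filenames are made up for illustration.

```python
import shlex

# An unquoted filename containing a space is split into two separate words.
print(shlex.split("rm my file.txt"))

# Quoting keeps the filename together as one word.
print(shlex.quote("my file.txt"))

# A filename with an apostrophe needs a layered escape:
# close the single quotes, add a double-quoted apostrophe, reopen.
print(shlex.quote("it's mine.txt"))
```

The last line prints a string made of three adjacent quoted pieces, which is what a script author has to produce by hand whenever a value like this is spliced into a command.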

A scripting language that owes its bad syntax to being hastily hacked together is JavaScript. Brendan Eich, its creator, has admitted that the prototype of JavaScript was developed in 10 days, and that the language was never meant for anything but short snippets. He is honest enough to speak of his own creation in derogatory terms, and to accept blame. (See TEDxVienna 2016, opening statement, "Hello, I am to blame for Javascript".) Also, pretty much anyone deeply involved with JavaScript will admit that it has serious problems. One of the most highly acclaimed books on the language is Douglas Crockford's JavaScript: The Good Parts, published by O'Reilly. You can take the title of the book as a hint.

A scripting language that owes its horrific syntax to lack of competence on the part of its creator is PHP. Rasmus Lerdorf, its creator, is quoted in the Wikipedia article about PHP as saying "I don’t know how to stop it, there was never any intent to write a programming language […] I have absolutely no idea how to write a programming language, I just kept adding the next logical step on the way."

So, from the above it should be obvious that most scripting languages are little pet projects that were created by individuals who simply wanted to prove to themselves that they could actually build something like that, without originally intending them to be used outside their own workbench.

The cheapness


The lack of semantic checking in scripting languages is often not a deliberate design choice, but a consequence of the very limited effort that usually goes into creating them. In many cases the creators of scripting languages would not know how to add semantic checking to the language even if they wanted to. In all cases, the amount of work required to add semantic checking would be orders of magnitude more than the total amount of work that went into the creation of the language in the first place. 

In this sense, the comparison between scripting languages and real programming languages is a lot like comparing children's tinker toy tools with tools for professionals. Sure, a plastic screwdriver is inexpensive, very lightweight and easy to use, but try screwing anything but plastic screws with it.  (I was going to add "you cannot hurt yourself with it", but this analogy does not transfer to programming: you can very easily hurt yourself with a scripting language.)

What scripting languages are good for


Scripting languages are useful when embedded within applications written in real programming languages, mainly as evaluators of user-supplied expressions, or, in the worst case, as executors of user-supplied code snippets.
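A minimal sketch of the expression-evaluator case follows, written in Python only to keep the example short. The function name evaluate and the whitelist of operators are assumptions made for illustration, not a recommended design. The host parses the user's text and walks the syntax tree itself, so anything outside plain arithmetic on numbers and named variables is rejected.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def is_number(value):
    """True for int and float, but not for bool."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def evaluate(expression, variables=None):
    """Evaluate an arithmetic expression over numbers and named variables."""
    variables = variables or {}

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and is_number(node.value):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))


print(evaluate("price * quantity - 5", {"price": 20, "quantity": 3}))  # 55

# Function calls are not on the whitelist, so this is refused.
try:
    evaluate("__import__('os').getcwd()")
except ValueError as error:
    print("rejected:", error)
```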

Scripting languages are useful when shortening the development time from the initial launch of the code editor to the first run of the program is far more important than everything else. Under "everything else" we really mean everything: understandability, maintainability, performance, even correctness.

Scripting languages are useful when the program to be written is so trivial, and its expected lifetime is so short, that it is hardly worth the effort of creating a new folder with a new project file in it. The corollary of this is that if it is worth creating a project for it, then it is worth using a real programming language for it.

Scripting languages are useful when the code to be written is so small and simple that bugs can be detected by simply skimming through the code. The corollary of this is that if the program is to be even slightly complex, it should be written in a real programming language. (Adding insult to injury, many scripting languages tend to have such capricious write-only syntax that it is very hard to grasp what any piece of code does, let alone skim through it and vouch for it being bug-free.)

Conclusion


The fact that some scripting languages catch on and spread like wildfire simply shows how eager the industry is to adopt any contemptible piece of nonsense without any critical thinking whatsoever, as long as it helps solve some immediate problem at hand. It is a truly deplorable situation that kids nowadays learn JavaScript as their first programming language because it is so accessible to them: all you need is a browser, and one day instead of F11 for full-screen you accidentally hit F12 which opens up the developer tools, and you realize that you have an entire development environment sitting right there. The availability of JavaScript to children is truly frightening.

Usually, once a language becomes extremely popular, tools are added that try to lessen the impact of its deficiencies, so today it is possible to have some semblance of semantic checking when programming in Python or in JavaScript, but since the checking has been added as an afterthought, it is always partial, unreliable, hacky, and generally an uphill battle.
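One way to see the "afterthought" character of such checking is sketched below, assuming Python's optional type annotations; the function is made up. The annotations are available to external tools, but the interpreter itself does not enforce them.

```python
def double(x: int) -> int:
    """Annotated as int -> int, but the annotation is not enforced at run time."""
    return x * 2


print(double(21))  # 42

# The interpreter ignores the annotation: a str goes straight through.
# An external checker such as mypy is expected to flag this call,
# but only if someone runs it.
print(double("ab"))  # abab
```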

So, you might ask, what about the hundreds of thousands of successful projects written in scripting languages? Are they all junk? Do they represent a massive waste of time?  And what about the hundreds of thousands of programmers all over the world who are making extensive use of scripting languages every day and are happy with them? Are they all misguided? Can't they see all these problems? Are they all ensnared in a monstrous collective delusion?

Yep, that's exactly it. You nailed it.






Note: 
This is a draft. 
It may contain inaccuracies or mistakes. 
There are bound to be corrections after I receive some feedback.


---------------------------------------------------------------------
Scratch

See:
http://stackoverflow.com/questions/397418/when-to-use-a-scripting-language

From http://wiki.c2.com/?SeriousVersusScriptingLanguages

Scripting Languages emphasize quickly writing one-off programs
serious languages emphasize writing long-lived, maintainable, fast-running programs.
light-duty "gluing" of components and languages.

From https://danluu.com/empirical-pl/

“I think programmers who doubt that type systems help are basically the tech equivalent of an anti-vaxxer”
the effect isn’t quantifiable by a controlled experiment.
Misinformation people want to believe spreads faster than information people don’t want to believe.
