2017-05-20

On scripting languages

Teething rings (pacifiers) found on the great interwebz.

Foreword

Historically, the difference between scripting languages and real programming languages has been understood as the presence or absence of a compilation step. However, in recent decades the distinction has blurred; from time to time we have seen:

  • Interpreters for languages that were originally meant to be compiled.
  • Compilers for languages that were originally meant to be interpreted.
  • Scripting engines internally converting source code to bytecode before interpreting it.
  • Real languages compiling to bytecode which is then mostly interpreted and rarely converted to machine code.

So, compiled vs. interpreted does not seem to be the real differentiating factor; nonetheless, we can usually tell a scripting language when we see one. So, what is it that we see?

(Useful pre-reading: About these papers)

First, let us identify the three different kinds of error that can potentially occur in program code:

  • Syntax Error: this represents a violation of fundamental rules governing the form of the language; for example, in most programming languages the statement [a = ;] is a syntax error, because something is obviously missing after the equals sign and before the semicolon.
  • Semantic Error: this represents failure to respect the meaning of things; for example, in most languages the statement [a = "x" / 5;] is syntactically correct but semantically incorrect, because dividing a string by a number does not make sense. Similarly, the statement [a.increment();] may represent a semantic error if object 'a' has no method called 'increment'.
  • Logic Error: this corresponds to a mistake in our reasoning. For example, the statement [circumference = radius * π] can be correct both syntactically and semantically, but it is nonetheless flawed, because this is not how you calculate a circumference given a radius; the correct formula also involves a multiplication by 2.
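
To make the distinction concrete, here is a minimal sketch of all three kinds of error in Python (chosen here purely for illustration; any language would do):

    # Syntax error: the parser rejects the file outright, it cannot even be loaded:
    #   a = ;

    # Semantic error: syntactically fine, but dividing a string by a number is
    # meaningless; Python only notices when the line actually executes:
    try:
        a = "x" / 5
    except TypeError as e:
        print("semantic error, surfaced only at runtime:", e)

    # Logic error: syntactically and semantically fine, yet the result is wrong,
    # because the correct formula is 2 * pi * radius:
    import math
    radius = 10.0
    circumference = radius * math.pi  # bug: missing the factor of 2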

Of the three types of error that we have identified, the first and the last are unaffected by our choice of programming language:

  • A syntax error will be detected by any halfway decent IDE, regardless of whether we are using a scripting language or a real programming language.
  • A logic error is just as easy to make in any programming language, and the way we protect ourselves against it is by writing copious amounts of software tests.

Semantic Error is where different languages take vastly different approaches. This type of error is closely associated with the concept of data types: the expression ["x" / 5] is flawed because the left operand is of type string, while the right operand is of a numeric type. If the programming language is strongly typed, then semantic mistakes such as this one will always be detected during compilation, so there is never any danger of attempting to run (or ship to the customer) a program containing this kind of mistake; however, if the programming language is weakly typed, then a mistake of this kind will go undetected until an attempt is made to execute that statement, at which point there will be either a runtime error, or some other severe malfunction.
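
To illustrate the weakly typed side of this (a minimal Python sketch, using a made-up function): nothing in the code below is rejected up front; the broken call sits there quietly until the day it actually executes.

    def label(price):
        # Nothing states what 'price' is supposed to be, so nothing stops a
        # caller from passing the wrong thing.
        return "price: " + str(price / 5)

    ok = label(100)            # works
    if False:
        broken = label("x")    # semantically broken, but never executed,
                               # so the program loads and runs without complaint

In a strongly typed language, the equivalent mistake would be reported at compile time, whether the offending call is ever executed or not.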

In light of the above, I would suggest that the actual differentiating factor between real programming and scripting languages is nothing but the presence or absence of semantic checking, in other words the use of strong vs. weak typing.

TypeScript is the odd exception to the rule, and this is to be expected, because the impetus for the creation of TypeScript was vastly different from that of other scripting languages, which tend to be one-man efforts, and usually come into existence as nothing more than toy projects. In contrast, TypeScript was the result of a deliberate group effort backed by a big company (Microsoft) starting with the realization that JavaScript is unfortunately here to stay, and setting out specifically to correct one of its major deficiencies, namely the lack of strong typing.

The trend of real programming languages to be compiled and of scripting languages to be interpreted can be explained in full as a consequence of the primary choice of strong vs. weak typing:

  • If a language is strongly typed, then a compilation step is very useful to have, because it will unfailingly locate all errors that are detectable via static semantic analysis before attempting to run. 
  • If a language is weakly typed, then semantic errors cannot be detected before execution, so there is little point in parsing the code in advance. A compilation step would only reveal syntax errors, which any halfway decent IDE can detect anyway.

So, allowing for the exception of TypeScript, this leaves us with the following soft rule:

Real languages are strongly typed, employ semantic checking, and are therefore usually compiled.

Scripting languages are weakly typed, lack semantic checking, and are therefore usually interpreted.

And yet, many people like scripting languages, and write lots of code in them, supposedly because they are "easier". This brings to mind the famous quote by Edsger W. Dijkstra:

[...] some people found error messages they couldn't ignore more annoying than wrong results, and, when judging the relative merits of programming languages, some still seem to equate "the ease of programming" with the ease of making undetected mistakes.

(From Edsger W. Dijkstra, On the foolishness of "natural language programming".)

Note that the above quote is from a paper about Natural Language Programming (NLP) but the particular passage containing the quote pertains to programming languages in general. Dijkstra wrote against NLP back in the 1980s because at that time it was being considered by some fools as a viable prospect; luckily, it failed to catch on, (or naturally, if you would permit the pun,) but little did ol' Edsger know that in the decades that would follow his nightmares would come true, because scripting languages did catch on. Apparently, people love making undetected mistakes.

Various arguments I keep hearing in favor of scripting languages

Argument: It is easy to write code in it; look, the "hello, world!" program is a one-liner.

Rebuttal: What this means is that this scripting language is a very good choice, possibly even the ideal choice, for writing the "hello, world!" program. 

The ease with which you may write "hello, world!" is no indication whatsoever about the ease with which a non-trivial system may be collaboratively developed, tested, debugged, maintained, and extended.

Argument: No, I mean it is really terse. There are many things besides "hello, world!" that I can write in one line.

Rebuttal: Sure, you can write them in one line; but can you read them? 

One of the most important aspects of code is readability, (second only to correctness,) but terse code is not necessarily easy to read; if that was the case, then Perl would be the most readable language ever, but instead it enjoys the dubious distinction of being the least readable among all programming languages in general use.

Terseness usually represents a tradeoff between brevity and understandability: the more terse the code, the less of it you have to read, but also the harder it is to untangle its complexity. Thus, it is debatable whether terseness correlates with readability. Terseness also appears to be the modern trend, so as real programming languages keep evolving they are receiving features that make them more and more terse, for example tuples, lambdas, the fluent style of invocation, etc. So, terseness is not the exclusive domain of scripting languages, and to the extent that scripting languages go further in this regard, it is debatable whether that is an advantage or a disadvantage.
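
As a small illustration of the tradeoff (the data below is made up), compare a terse one-liner with the same computation spelled out:

    from itertools import groupby

    rows = [("apples", 1), ("pears", 2), ("apples", 3)]

    # Admirably terse: one line obtains the per-key totals...
    totals = {k: sum(v for _, v in g) for k, g in groupby(sorted(rows), key=lambda r: r[0])}

    # ...and the same thing spelled out; longer, but considerably easier to follow:
    totals_readable = {}
    for key, value in rows:
        totals_readable[key] = totals_readable.get(key, 0) + value

    assert totals == totals_readable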

Argument: There are lots of libraries for it.

Rebuttal: Seriously? There are more libraries for your scripting language than there are for Java?

Argument: I don't have to compile it; I just write my code and run it.

Rebuttal: I also just write my code and run it. When I hit the "launch" button, my IDE compiles my code in the blink of an eye and runs it. The difference between you and me is that if I have made any semantic mistakes in my code, I will be told so before wasting my time trying to run it. But what am I saying, being told that there are semantic mistakes in your code probably counts as a disadvantage for you, right?

The ability to just write your code and run it without any semantic checking is causing real harm in scripting languages because it prevents them from evolving. This is, for example, a reason why Python version 2.x is still enjoying widespread use despite the language having moved on to version 3.x by now: people are afraid to make the transition to version 3.x in existing projects, even though it is mostly backwards compatible with version 2.x, because it is not 100% compatible, and lack of semantic checking means that there is no way of knowing which lines of code will break unless these lines get executed.
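
As a hedged illustration of the problem (the function below is made up): the following is perfectly valid Python 2, and Python 3 will happily load it as well; nothing complains until the function is actually called, because dict.iteritems() no longer exists in Python 3.

    def report(totals):
        # Parses and imports fine under Python 3, but the first actual call
        # raises AttributeError, because iteritems() was removed -- and only
        # then do you find out that this line needed porting.
        for name, value in totals.iteritems():
            print(name, value)

A language with semantic checking would have flagged the nonexistent method without running anything; without it, your only options are to execute every line or to grep and hope.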

Argument: I can modify my program as it runs.

Rebuttal: I can also modify my program as it runs; the ability to do this is available in most real programming languages, and it is called "edit and continue" or "hot reload" depending on the language; look it up.

Modification of running code is not always applicable in real programming languages, and it does not always work; but then again, neither does it always work when you modify running code in a scripting language, because you usually already have data structures in memory that were created by the code before it was modified. In real programming languages, you are prevented from making edits to running code that would seriously foul things up; in scripting languages, you are allowed to do whatever you please, and the catastrophic consequences of doing so are your own problem.

Argument: I do not like to have to declare the type of every single variable because it is a pain.

Rebuttal: This is akin to arguing against seat belts because putting them on and taking them off is a pain. Do you have any idea of what kind of pain you are looking at if you get in a traffic accident without a seat belt?

Furthermore, freedom from having to declare the type of every single variable is not the exclusive privilege of scripting languages: in recent years type inference has been gaining ground in real programming languages, allowing us to omit declaring the type of many of the variables that we use. The difference is that in real programming languages this is done right, by means of type inference instead of type ostrichism:

  • Type inference is deterministic extra work that the compiler does for us, and it relies on having already assigned specific types to other variables, so that we do not have to repeat things that are already known to, or can be inferred by, the compiler.
  • Type ostrichism is scripting language programmers preferring to not see types and to not deal with types, as if that will make the types go away.

It might be worth taking a look at PEP 483 (https://peps.python.org/pep-0483/), in which the people responsible for the advancement of Python acknowledge that behind the scenes every variable is of course of a specific type, and discuss the potential benefits of adding a type annotation system to the language that will allow programmers to make their intentions about types explicit, so as to enjoy, at least partially, and at least as an afterthought, some of the benefits of strong typing. I quote:

These annotations can be used to avoid many kind of bugs, for documentation purposes, or maybe even to increase speed of program execution.
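
For the record, this is roughly what such annotations look like in practice (a minimal sketch; note that Python itself does not enforce them at runtime, so an external static checker such as mypy is needed to turn them into anything resembling semantic checking; the last line is the deliberate mistake):

    def discount(price: float, percent: float) -> float:
        # The annotations state the intent explicitly...
        return price * (1.0 - percent / 100.0)

    total = discount(100.0, 15.0)     # fine

    # ...but the interpreter itself will still cheerfully accept the following;
    # only the external checker will report, before anything runs, that a str
    # is not a float. At runtime, this line dies with a TypeError.
    broken = discount("100", 15.0)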

Argument: I am not worried about errors, because I use testing.

Rebuttal: Oh really? Are your tests achieving even a mere 60% code coverage as we speak? And supposing that they do, how do you feel about the fact that in the remaining 40%, every single line is liable to break due to reasons as trivial and yet as common as a typo? 

Testing is an indispensable quality assurance mechanism for software, but it does not, in and of itself, guarantee correctness. You can easily forget to test something, and you can easily test "around" a bug, essentially creating tests that pretty much require the bug to be in place in order to pass. Despite these deficiencies, testing is still very important, but it is nothing more than one weapon in our arsenal against bugs. This arsenal also happens to include another weapon, which stands closer to the forefront in the battle against bugs, and which is 100% objective and definitive. This weapon is called strong typing.
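
A hedged sketch of the kind of trivial mistake that routinely slips past a test suite (the function and test below are made up for illustration):

    def shipping_cost(weight_kg, express):
        if express:
            # Typo: 'wieght_kg' is not a defined name. Python only notices when
            # this branch actually executes (NameError); the test below never
            # takes this branch, so the typo sails straight into production.
            # A compiler would have refused to build this.
            return 10.0 + wieght_kg * 0.5
        return 5.0

    def test_shipping_cost_standard():
        assert shipping_cost(2.0, express=False) == 5.0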

Argument: It has lots and lots of built-in features.

Rebuttal: Sure, and that's why scripting languages are not entirely useless. If the only thing that matters is to accomplish a certain highly self-contained goal of severely limited scope in as little time as possible, then please, by all means, do go ahead and use your favorite scripting language with its awesome built-in features. However, if the project is bound to take a life of its own, you are far better off investing a couple of minutes to create a project in a real programming language, and to include the external libraries that will give you any extra features that you might need.

Built-in features do not only come with benefits; in contrast to libraries, they are much more difficult to evolve, because even a minute change in them may break existing code, resulting in people being reluctant to migrate to the latest version of the language. (Take the Python 2.x vs. 3.x conundrum for example.)

Furthermore, built-in features usually have to be supported forever, even after better alternatives have been invented, or after they simply go out of style and fall from grace, so over time scripting languages tend to accumulate lots of unnecessary baggage. We have tried feature-bloated programming languages before, (with Ada, for example,) and the consensus is that they are not the way to go.

Argument: But really, it is so much easier! Look here, in one statement I obtain a list and assign its elements to individual variables!

Rebuttal: That's great, I bet this has slashed your time-to-market by half. What happens if the number of elements in the list differs from the number of variables that you decompose it into? I bet there is no error, because you do not like being bothered with errors, right? 
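
What actually happens varies by scripting language: JavaScript's destructuring, for instance, silently fills the missing variables with undefined, while Python at least throws, although, true to form, only at the moment the line executes. A minimal Python sketch:

    values = [1, 2, 3]

    # One statement: obtain a list and assign its elements to individual variables.
    a, b, c = values

    # If the shape of the data ever stops matching the variables, nothing warns
    # you in advance; the mismatch surfaces only when the line executes:
    try:
        x, y = values
    except ValueError as e:
        print("discovered only at runtime:", e)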

In any case, my compiled language of choice has its own unique, arcane syntax quirks which, if I wanted to, I could claim make things so much easier for me.

Some of them are not even that arcane; for example, instead of using clunky annotations to hint to the IDE the types of my variables, so that it can then provide me with some rudimentary type checking, I get to simply declare the type of each variable as part of the actual syntax of the language! Imagine that!

Argument: I like dynamic typing. It gives me freedom.

Rebuttal: Yes, freedom to shoot yourself in the foot. Also please note that there is no such thing as "dynamic" typing; this term is just a euphemism invented by scripting language aficionados to down-play the detrimental nature of this practice. The proper term is weak typing.

Argument: I do not need type safety. I am better off without it.

Rebuttal: Right. So, you are the technological equivalent of an anti-vaxxer. (Credit: danluu)

Argument: I do not have to use an IDE, I can just use my favorite text editor.

Rebuttal: Oh sure. You are also the technological equivalent of an Amish farmer.

Argument: My scripting language is trendy. It is hip.

Rebuttal: No contest here. I can't argue with hipsters.

The problems with scripting languages

1. The nonsense

I don't need to say much here, just watch the legendary "Wat" video by Gary Bernhardt from CodeMash 2012, it is only 4 minutes long: https://www.destroyallsoftware.com/talks/wat

The reason for all this nonsense is that all these languages are hacks.

When the foundation that you are working on is a hack, then either anything you build on top of it will in turn be a hack, or you are going to be putting in an enormous effort to circumvent the hackiness of the foundation and build something reasonable on top of it. Why handicap yourself?

2. The errors

Lack of semantic checking means that the mistakes that will inevitably be made will not be caught by a compilation step. Therefore, lack of semantic checking necessarily means that there will be more errors.

It is an established fact that a certain percentage of errors will always pass testing and make it to production, which in turn inescapably means that there will be a somewhat increased number of bugs in production.

This alone is enough to classify scripting languages as unsuitable for anything but tinkering, and the debate should be over right there.

3. The crippled IDE

Lack of semantic checking means that your IDE cannot provide you with many useful features that you get with strongly typed languages. Specifically, some or all of the following features will either be limited or missing altogether:

  1. Context-sensitive argument auto-completion. Since any parameter to any function can be of any type, the IDE usually has no clue as to which of the variables in scope may be passed to a certain parameter of a certain function. Therefore, it has to suggest everything that happens to be in scope. Most of these suggestions are preposterous, some are even treacherous.

  2. Member Auto-completion. Since a variable does not have a specific type, the IDE usually has no clue as to what member fields and functions are exposed by that variable. Therefore, either it cannot give any suggestions, or it has to suggest every single member of every single known type and the kitchen sink.

  3. Listing all usages of a type. Since any variable can be of any type, the IDE usually has no clue as to where a given type is used, or if it is used at all. Contrast this with strongly typed languages where the IDE can very accurately list all usages of any given type and even provide you with visual clues about unused types.

  4. Type-sensitive search. If you have multiple different types, each of which contains, say, a `Name` member, you cannot search for all references to the `Name` member of only one of those types. You have to resort to text search, which will also yield irrelevant matches from every other type that happens to have a member with the same name. This can be okay in tiny projects, but it very quickly becomes non-viable as the project size increases.

  5. Refactoring. When the IDE has no knowledge of the semantics of your code, it is incapable of performing various useful refactoring operations on it. IDEs that nonetheless offer some limited set of refactoring features on untyped languages are actually faking it; they should not be calling it refactoring, they should be calling it Cunning Search and Replace. Needless to say, it does not always work as intended, and it does sometimes severely mess up the code. (When this happens, it is called Search and Destroy.) Furthermore, since there is no compiler, you have no way of knowing that a line of code has been messed up until that line of code gets executed, which is something that may happen very rarely for some lines of code.

4. That little performance issue

Performance is generally not an issue for scripting languages, because they tend to be used in situations where performance is not required. 

(There are of course some situations where people opt to use a scripting language despite the fact that performance matters, and in those situations people do in fact suffer the consequences of poor performance; take web servers written in Node.js, for example.)

In today's world where the majority of personal computers are running on precious battery power, it can be argued that even the tiniest bit of performance matters, but we can let that one slide, since battery technology is constantly improving.

In cases where performance matters but the task at hand is well-defined and relatively isolated, performance is again not an issue for scripting languages because external libraries tend to be quickly developed to handle those tasks. (These external libraries are written in guess what: real programming languages.)

Having explained that performance is usually not an issue, let us also quickly mention before moving on that on computationally expensive tasks, such as iterating over all pixels of an image to manipulate each one of them, and assuming a competent programmer in each language, the following statements hold true:

  • there is no way that a scripting language will perform as well as Java, just as:
  • there is no way that Java will perform as well as C++, just as:
  • there is no way that C++ will perform as well as Assembly. 

Stop arguing about this.

5. The horrible syntax

Most scripting languages suffer from a severe case of capriciously arcane and miserably grotesque syntax. No, beauty is not in the eye of the beholder; aesthetics are subjective only up to a certain extent.

The syntax of scripting languages tends to suffer due to various reasons, the most common being:

  • Their priorities are all wrong to begin with.
  • They were hastily hacked together in a very short amount of time.
  • Plain incompetence on behalf of their creators.

Scripting languages that have their priorities wrong are, for example, all the shell scripting languages. These languages aim to make strings (filenames) look and feel as if they are identifiers, so that you can type commands without having to enclose them in quotes, as if the convenience of not having to use quotes was the most important thing ever. If all we want to do in a shell script is to list a sequence of commands to execute, then this convenience is perhaps all we care for, but the moment we try to use any actual programming construct, like variables and flow control statements, what we have in our hands is a string-escaping nightmare of epic proportions.

A scripting language that owes its bad syntax to being hastily hacked together is JavaScript. Brendan Eich, its creator, has admitted that JavaScript was developed within a couple of weeks, and that the language was not meant for anything but short isolated snippets of code. He is honest enough to speak of his own creation in derogatory terms, and to accept blame. (See TEDxVienna 2016, opening statement, "Hello, I am to blame for JavaScript".)  Also, pretty much anyone deeply involved with JavaScript will admit that it has serious problems. One of the most highly acclaimed books on the language is JavaScript: The Good Parts, authored by Douglas Crockford and published by O'Reilly; you can take the title of the book as a hint.

A scripting language that owes its horrific syntax to lack of competence is PHP. Its creator, Rasmus Lerdorf, is quoted on the Wikipedia article about PHP as saying "I don’t know how to stop it, there was never any intent to write a programming language […] I have absolutely no idea how to write a programming language, I just kept adding the next logical step on the way."

So, from the above it should be obvious that most scripting languages are little toy projects that were created by individuals who simply wanted to prove that they could build something like that, without actually intending it to be used outside of their own workbench.

6. The cheapness

The lack of semantic checking in scripting languages is usually not a deliberate design choice, but instead a consequence of the very limited effort that has gone into creating them. In many cases the creators of scripting languages would not know how to add semantic checking to the language even if they wanted to. In all cases, the amount of work required to add semantic checking would have been several orders of magnitude greater than the total amount of work that went into the creation of the language in the first place.

In this sense, the comparison between scripting languages and real programming languages is a lot like comparing children's tinker toy tools with tools for professionals: sure, a plastic screwdriver is inexpensive, lightweight and easy to use, but try screwing anything but plastic screws with it.  

(I was going to also add "you cannot hurt yourself with it", but this analogy does not transfer to programming: you can very easily hurt yourself with a scripting language.)

What scripting languages are good for

  • Scripting languages used to be an easy way to write cross-platform software. This does not hold true anymore, since most major real programming languages are pretty much cross-platform nowadays.
  • Scripting languages are useful when embedded within applications, (applications written in real programming languages,) as evaluators of user-supplied expressions. (E.g. spreadsheet cell formulas.)
  • Scripting languages are useful when shortening the time from the moment you fire up the code editor to the moment you first run your program is more important than everything else. By "everything else" we really mean everything: understandability, maintainability, performance, even correctness.
  • Scripting languages are useful when the program to be written is so trivial, and its expected lifetime is so short, that it is hardly worth the effort of creating a new folder with a new project file in it. The corollary to this is that if it is worth creating a project for it, then it is worth using a real programming language.
  • Scripting languages are useful when the code to be written is so small and simple that bugs can be detected by simply skimming through the code. The corollary to this is that if the program is to be even slightly complex, it should be written in a real programming language. (Adding insult to injury, many scripting languages tend to have such a cryptic write-only syntax that it is very hard to grasp what any piece of code does, let alone skim through it and vouch for it being bug-free.)
  • The most important thing about scripting languages (and the main reason why they have become so wildly popular in recent years) is that they are useful in getting non-programmers into programming as quickly as possible.
    Most of us programmers have had a friend who was not a programmer and who one day asked us how to get into programming. The thought process should be familiar: you think about it for a moment, you start making a mental list of the things they would need in order to get started with a real programming language, and you quickly change your mind and suggest that they try Python instead, because this answer stands some chance of fitting within your friend's attention span. However, the truth of the matter is that this recommendation will only save your friend a few hours of preparatory work, and it would be a crime if it condemned them to thousands of hours wasted over the course of a career spanning several years, due to the use of an inferior programming language. This brings us to the following realization:
    Scripting languages are a lot like teething rings (pacifiers):
    It is okay to start with one, but you must get rid of it as soon as you grow some teeth.

Conclusion

The fact that some scripting languages catch on and spread like wildfire simply shows how eager the industry is to adopt any contemptible piece of nonsense without any critical thinking whatsoever, as long as it helps optimize some short-sighted concern, such as how to get non-programmers into programming as quickly as possible. It is a truly deplorable situation that kids nowadays learn JavaScript as their first programming language due to it being so accessible to them: all you need is a web browser, and one day instead of F11 for full-screen you accidentally hit F12 which opens up the developer tools, and you realize that you have an entire integrated development environment for JavaScript sitting right there, ready to use. The availability of JavaScript to small children is frightening.

Usually, once a language becomes extremely popular, tools are created to lessen the impact of its deficiencies. Thanks to the herculean efforts of the teams that develop scripting engines, and through all kinds of sorcery being done under the hood in these engines, the most popular scripting languages are considerably faster today than they used to be. However, the sorcery is not always applicable, even when it is applicable it is imperfect, and besides, it incurs a penalty of its own, so scripting languages will never match the performance of real programming languages. Also, modern IDEs have evolved to provide some semblance of semantic checking in some scripting languages, but since this checking has been added as an afterthought, it is always partial, unreliable, hacky, and generally an uphill battle.

So, you might ask, what about the hundreds of thousands of successful projects written in scripting languages? Are they all junk? And what about the hundreds of thousands of programmers all over the world who are making extensive use of scripting languages every day and are happy with them? Are they all misguided? Can't they see all these problems? Are they all ensnared in a monstrous collective delusion?

Yep, that's exactly it. You took the words right out of my mouth.

mandatory grumpy cat meme: "Scripting Languages - I Hate Them."

Also read: michael.gr - Tablecloth - A short high-tech sci-fi horror story




Note: This is a draft. It may contain inaccuracies or mistakes. There are bound to be corrections after I receive some feedback.

---------------------------------------------------------------------

Scratch

See:

http://stackoverflow.com/questions/397418/when-to-use-a-scripting-language

From http://wiki.c2.com/?SeriousVersusScriptingLanguages

Scripting Languages emphasize quickly writing one-off programs

serious languages emphasize writing long-lived, maintainable, fast-running programs.

light-duty "gluing" of components and languages.

From https://danluu.com/empirical-pl/

“I think programmers who doubt that type systems help are basically the tech equivalent of an anti-vaxxer”

the effect isn’t quantifiable by a controlled experiment.

Misinformation people want to believe spreads faster than information people don’t want to believe.

https://stackoverflow.blog/2023/01/19/adding-structure-to-dynamic-languages

