Tutorial: How is a new programming language actually formed/created?


Fortran->Algol->Cpl->Bcpl->C->C++->Java .....

Seems like every language is built upon an ancestor language. My question: do new languages extend their parent languages, or is there some kind of trick?

E.g., System.out.print() in Java: is it actually printf() in C underneath, and so on (is printf in turn built on something in CPL)?

If so, doesn't this make each successive language slower and more memory-hungry? And what separates a new language from a framework?


Each language builds on the concepts of another (this is language design). Each new language learns what worked in previous languages and what did not. Languages are also targeted at different groups: if you write a language for long-term maintainability and reliability, it will probably not be well suited to shell scripts.

Many languages are test beds for features or concepts. These are usually extremely flexible and fun, and can be very quick to code in. Basic was one of these, as are Perl, Ruby, ...

Other languages are reduced to a bare minimum and rarely change--they focus on backwards compatibility and consistency. New features are avoided and, when they are added, are tested in depth before being added to the standard (yes, this group tends to be more standards-based than the previous one). C, Java, Ada and C++ were designed to fit here. C# is possibly a crossover, with more features being added than these others but more stability than the previous group.

Now, besides what drives language features, there is how the language is built. Languages are often initially written in another language, but not in the way you are assuming. Java is mostly written in Java now, and the JVM is probably mostly hand-coded assembly, but you can be sure that C's printf is nowhere to be found in Java.

A compiled language generally consists of your program reduced to a certain set of codes (machine language or bytecode), which is then packaged with a set of routines (like System.out.println). However, println doesn't just call C's printf; instead, a library is created (written in some combination of Java, C, and assembly) that knows how to do the output itself.

In C, the library is built the same way: a combination of C and assembly that produces the machine code that actually performs printf.


How is a new programming language actually formed/created?

It's a multistage process:

  1. Pointy-headed type theorists and other professionals are continually proposing new language features. You can read about them in places like the Proceedings of the ACM Symposium on Principles of Programming Languages (POPL), which has been held annually since 1973.

  2. Many of these proposals are actually implemented in some research language; some research languages I personally find promising include Coq and Agda. Haskell is a former research language that made it big. A research language that gets 10 users is often considered a success by its designers. Many research languages never get that far.

    From research to deployment I know of two models:

  3. Model A: A talented amateur comes along and synthesizes a whole bunch of existing features, maybe including some new ideas, into a new language. The amateur has talent, charisma, and maybe a killer app. Thus C, Perl, Python, Ruby, and Tcl are born.

  4. Model P: A talented professional makes career sacrifices in order to build and promulgate a new language. The professional has talent, a deep knowledge of the field, and maybe a killer app. Thus Haskell, Lua, ML, Pascal, Scala, and Scheme are born.

    You propose another model:

  5. Model E: A talented person, whether amateur or professional, extends or modifies another language. There's a built-in user base who may be interested in the extension. Maybe users can explore new ideas without paying heavy transition costs. Thus C# and C++ are born.

My definition of a professional is someone who is paid to know about programming languages, to pass on that knowledge, and to develop new knowledge in programming languages. Unfortunately this is not the same as designing and implementing new languages, and it is not the same as making implementations that many people can use. This is why most successful programming languages are designed and built by amateurs, not professionals.

There have been quite a few interesting research languages that have had hundreds or even thousands of users yet never quite made it big. Of these, one of my favorites is probably Icon. I have argued elsewhere that nobody really knows why languages become popular.

Do new languages extend parent ones?

This is actually very rare. C++ is the most popular example, and perhaps Algol-W or PL/S came close. But it's much more common to use the parent language for inspiration only, and some languages (Python comes to mind) acknowledge multiple "parents" as sources of inspiration.

Doesn't this make every successive language slower and more memory-hungry?

Not necessarily. C++ is slower and uses more memory not because it is descended from C but because it was developed by accretion over time to the point where only a very skilled user can reliably write C++ code that is as fast as similar C code. (I want to be very clear: it's not that C++ can't be fast; it's that the cost of C code is always obvious from reading the source, and the cost of C++ code is sometimes not at all obvious from reading the source.) For more information about the evolution of C++, read Jim Waldo's book The Evolution of C++.

Java was initially slow because of just-in-time compilation and other wacky things in the implementation. They also saddled themselves with dynamic class loading, which makes it really hard to be fast (because a class could be extended dynamically at any moment). Kenny Zadeck, Roger Hoover, and David Chase built a really fast native-code compiler for Java without dynamic class loading.

For a counterexample, I think Scheme programs ran faster and used less memory than the Lisp programs that preceded them, in part because Guy Steele is both a brilliant designer and a brilliant implementor. (A rare combination, that.)

But there is something to what you say: people who lack the expertise to build a good compiler from scratch or who lack the expertise to design a whole language from scratch may well hack up an implementation of something not too different from a parent. In such cases one is quite likely to wind up with a language that is less well designed, less well implemented, slower, and using more memory than its predecessor. (Tony Hoare famously said that Algol 60 was an improvement on most of its successors [sic]).

It's also true that the more recently a language is designed, the more computing resources are available for the same price. Early C compilers had to operate effectively in as little as 128K of RAM. Today's C++ compilers face no such constraints, and there is every reason for them to use more memory: it is really cheap to populate a machine with gigabytes of RAM, and limiting one's use to mere megabytes saves nothing; the larger memory is already paid for.

Summary: Languages come into being because people want to make programming better, and they have new ideas. Languages get their start when somebody takes a whole bunch of ideas, some new and some proven, and synthesizes them into a coherent whole. It's a big job. One way to make the job easier is to draw not just on proven features, but proven designs, of one or more predecessor languages. This kind of design creates the impression of "parenthood", but actual extension or near-extension (in the case of C++ extending C) is rare. Time and space costs don't necessarily get larger as languages evolve, but it's often the case that people create languages by making existing designs more complex, and the more complex the design, the harder it is to implement efficiently. It's therefore not unusual that programs written in a new language seem slower or to use more memory than similar programs written in an ancestor language. Finally, as with all other forms of software, compilers designed and built recently tend to use more RAM and CPU than compilers built ten years ago, simply because large quantities of RAM and CPU cycles are available at bargain-basement prices.


Languages are not slow; implementations (compilers, interpreters, VMs) are slow. In theory, you could have a C++ interpreter that ran slower than a PHP implementation, or whatever. Likewise, languages do not consume memory; implementations consume memory.

A project is a language when the grammar (or syntax) is different. (You can have both a language and a framework in the same project.)

Languages are formed by the general creative process. Someone sees something kind of cool, thinks it could be better, and makes it so, and eventually you get an entirely different language.


There is a big difference, which may not be obvious, between a language being built on the concepts of a predecessor and actually being built with it. C compiles to assembly, so it is built on assembly but not necessarily with it. Often (perhaps most of the time) C compilers are actually written in C. C++ is built on C: it supports the full spectrum of C and adds a bunch of stuff on top. Java is a totally different thing, likewise .NET. They "compile" to a pseudo-assembly, referred to as IL for .NET and bytecode for Java. Both require some other step or a VM (virtual machine) to run.


Languages can be (and often are) written from scratch. But languages can (and often do) build on the concepts of prior languages. Why reinvent the wheel when there is a perfectly round one lying at your feet?


At a superficial level, languages grow from predecessors because there are users who already know the syntax and there are not too many different ways to actually do the syntax.

At a more meaningful level, a prior language may be being used in repetitive ways that could be made easier by adding some syntax and behavior. I'm thinking of how one would do OOP in C before C++ came along. Or the distinction between Fortran with GOTO and Algol with block structure: it was a pain to keep writing labels by hand when they could be generated automatically.

Personally, I could be wrong, but I don't see general-purpose languages (GPLs) evolving much further (not to say small languages won't proliferate). I do think domain-specific languages (DSLs) will continue to grow, and I think one of the key features of any GPL will be how well it assists the creation of new DSLs.

I think this because there is a continuum of representations between problem-specific data structure on the one hand, and programming languages on the other. Any data that is read by some program is, in a sense, expressed in a language, and that program is its interpreter. So the only thing that really separates the extremes is the degree of sophistication and generality of its interpreter.

What I look for in a DSL is minimum redundancy with respect to its intended problem domain. The idea: suppose there are some requirements, and a program is written (by a human, at a keyboard) to correctly implement them. Now suppose a single coherent change is made to the requirements; some amount of editing must then be done to the program to correctly implement that change. The redundancy of the language w.r.t. the domain is the size of such edits, averaged (somehow) over the space of possible changes. A very simple way to measure this is to run a diff program between the before-and-after code: the number of differences is a measure of the redundancy for that change. This is a bit long-winded, but that is what I look for to be minimized in order to say that a language is well adapted to a domain.

If the redundancy of a language is minimized, then fewer edits are required to implement functional changes; not only is the code likely to be shorter, but there are also fewer chances to introduce bugs.

The way programmers are currently taught, these ideas are in their future. Minimizing redundancy of source code is not yet seriously valued. What we have instead is bandwagons like OOP that, in spite of their obvious value, tend to lead to massively redundant code. One promising development is the rise of code generators, but again they are in danger of becoming ends in themselves rather than serving the goal of reducing source-code redundancy.


It seems to me there are two main ways that new languages get created:

1) Someone decides they need to do some specific work, and a special-purpose language would help them get that work done. That person (along with maybe some other users) finds the language useful and starts extending it to other purposes. Eventually the language supports enough that it can be considered a general-purpose language.

2) Someone versed in programming languages decides that there are problems with a current language that might be solved with a new approach (which might be a radical or an incremental change from what came before). The new language is designed from the ground up to be a general-purpose language. This might describe how Java or C# came about.

Most languages will have some similarity to others (such as your printing example) because those operations are pretty useful in most any computing context.

Then there are languages like FORTH or APL which I just can't explain...


Programming languages build upon the prior experience and knowledge of the language designer, and of the community in general, just as new cars are built using what was learned by building cars a few years ago.

I don't think it's accurate to make a blanket statement that all languages are built upon some other language that preceded it. The designer of the new language certainly should have experience using multiple existing languages, understand their strengths and weaknesses for a specific purpose, and then design their new language to incorporate all the strengths / great ideas of others and avoid the weaknesses of others as much as possible. Learn from others successes and failures. (Those who ignore history are doomed to repeat it)

As noted in other responses, building a new language is in no way tied to the performance of implementations of other languages. New languages and new implementations of the same language usually replace, not extend, prior examples. Even the performance of one implementation of a language compared to another can vary considerably: consider the Borland C++ compilers that ran circles around other C++ compilers of the same era, or compare the runtime performance of the 1.0 Java virtual machine to later implementations that added boosts like the HotSpot compiler.

To use the car example: because the Ford Model T was a car and belched black smoke, and the Tesla Roadster is a car and came after the Model T, is it true that the Tesla Roadster must also belch black smoke simply because it is a car and it came after the Model T? Of course not. The Tesla is a very different implementation of the idea of a car. Its design reflects a lot of things that have been learned since the Model T was made, but it is not a Model T.


Language designers have to take popularity into account. I've no doubt that a large part of C#'s popularity is due to the fact that its syntax isn't that different from Java's, which isn't too different from C's, and so on.

It's already hard work learning a new language's foibles, so it's easier for everybody if the syntax isn't too different from other languages.

As regards speed, it doesn't depend on the language but on the compiler used to convert that language into another form, which could be straight machine code, or assembly, or, in the case of C#, Java, etc., bytecode which is then run on a virtual machine.

Your final question is also interesting. C# and .NET are quite different beasts. When a language (like C#) is targeted at .NET, a compiler is made that can convert that language into bytecode that runs on that VM. This means that C# code can quite happily call assemblies written in VB.NET, for instance.

The same applies to Java and Scala, both written for the JVM. Scala is a functional language, while Java is an OOP language, yet each can happily call the other, since in the end it's just bytecode running on a VM.

Hope this answers your question.


As someone who works primarily on language-based projects, I think that there are two important reasons that new languages are created: boredom and frustration. When programmers are bored, they'll come up with all kinds of ridiculously ambitious projects to fill their time, and languages provide nearly endlessly interesting new challenges. When programmers are frustrated with existing tools, they tend to create new ones to fill the perceived gap. From DSLs to general-purpose languages, I find that all language projects that I have seen boil down to basically these two reasons.


Actually, new languages can be much faster. For example, C/C++ or even Java can be much faster than hand-written assembler. Go or Haskell programs, which can be parallelized easily, can beat assembler on modern hardware: what good is assembler code that is '25% faster' than a single-threaded Haskell/Go program, if flipping a switch makes the Haskell/Go version 4x faster (i.e., 3x faster than the optimized assembler) on a quad-core machine, not to mention that the code is much less buggy?

There are studies suggesting that platforms with GC are actually faster in practice, since programmers have more time for optimizing programs and hunting other errors rather than chasing memory leaks, even though the 'ideal' programs would be slower.

Also, you can have many reimplementations of a language:

  • Based on native code (assembler)
  • Based on low-level language (C, LLVM)
  • Based on a cross-platform framework (Java, Parrot)
  • Based on interpreter
  • Based on single-platform framework (.Net - yes I know about mono ;) but it is still mostly single-platform)

For example, Ruby has Ruby MRI (an interpreter), JRuby (Java), IronRuby (.NET), etc. These usually differ a lot in speed. C has numerous compilers and could, in theory, have an interpreter. Haskell has a native-code generator (GHC & co.), low-level backends (GHC -fvia-C and the newer GHC LLVM backend), and an interpreter (GHCi).

Languages generally are created by:

  1. The authors like languages A and B, so they combine them into language C.

  2. The authors have a brand-new idea (like objects or duck typing) that cannot be expressed in existing languages, or a new idiom that cannot be expressed as well in existing languages, so they create a new language with the brand-new feature.

  3. The authors find some feature terrible and want to get rid of it (for whatever reason), like enum in old Java.

In the first and third cases, the new language 'inherits' from one or more languages (usually many). But usually it is a combination of all of these.


I've voted up some of the answers above, but note also that the Lisp family of languages in some ways encourages the programmer to effectively create a new language feature whenever one is needed, producing a new dialect of Lisp. You can argue the toss about the difference between creating a new function of your own and creating a new language feature, but that is the claim Lispers make.
