Question

I'm implementing a programming language and I'm considering the following syntax:

@NamespaceX
 {
   +@ClassY <> : BaseTypeA 
    {
      +@NestedClassW<>
       {
       }

      +@MethodZ() : ReturnTypeC
       { 
         //".?" is a null-coalescing member access operator 
         @varD : ClassY =   predicateP ? objectQ.?PropertyS 
                          : predicateR ? valueB;
         @varE = predicateP ? varD  : throw Exception1();
         @f : ClassY -> ClassZ = param1 => MethodV(param1);
       }
    }

   +@UnitTest() [Test]
    { 
    }
 }

The following is the equivalent code in C# 3.0:

namespace NamespaceX
 { 
   public class ClassY : BaseTypeA
    { 
      public class NestedClassW
       { 
       }

      public ReturnTypeC MethodZ()
       { 
         ClassY varD =   predicateP ? ((objectQ!=null) ? objectQ.PropertyS
                                                       : null) 
                       : predicateR ? valueB
                       : null;

         ClassY varE;
         if (predicateP) varE = varD;
         else throw new Exception1();

         Func<ClassY,ClassZ> f = param1 => MethodV(param1);                 
       }
    } 

   public class _NamespaceX //non-type namespace members
    { 
      [Test] 
      public void UnitTest() {}
    }
 }

Which one would you rather write in? Which one is more readable? The code is terse in both examples but in the case of the C# example, once you're familiar with the idioms used I think it should be readable enough. I just want to know what you find good or bad about this syntax, not necessarily suggested improvements although those are fine, too, to explain what you're more into.

The language I'm designing is primarily an alternative to C# (but not necessarily meant for existing C# users to wholeheartedly adopt). The main consideration I had for readability is for having a single character declaration operator "@" with a single character accessibility operator ("+" for public or "-" for private) so that the name is close to the left margin where it can be easily scanned for instead of rooting through "public static void TheName ()" and such. I think that the "@" also makes it easier to scan for declarations of any kind as opposed to using a keyword which is indistinguishable from other keywords when scanning.

I'm going for both the stream-of-consciousness writing and at-a-glance reading aspects, so I want feedback on more than just the "+@" part. For me, English keywords get in the way of these two activities (more so in VB than C#), so I've decided to go with terse symbols to imply the declaration constructs:

 @foo { }          //namespace foo {}
 +@foo() : Bar {}  //public Bar foo() {}
 +@Foo<> : Bar {}  //public class Foo : Bar {}
 @foo : Bar = baz; //Bar foo = baz;
 @foo = p ? bar    //var foo = p ? bar : null;
 @foo = bar.?baz;  //var foo = (bar != null) ? bar.baz : null; 
 foo  = p  ? bar : throw E(); //if (p) foo = bar; else throw new E();

Feel free to change the identifiers to something more meaningful but please preserve the code's structure. I am open to suggestions, however, for making the C# example more readable while remaining faithful as a translation of the first example.

[I put in a close vote for this "being no longer relevant" since to really do this justice would take too much work for my purposes - Mark]

Answer 1

I would go with the C# style syntax.

I'm assuming that, because you are comparing this to C#, you are targeting your language at C# programmers.

If that's the case, you would be better off using the syntax they are already familiar with. That would allow them to focus on the unique value added by your language, and would eliminate the need for them to spend time learning a lot of new syntax. If it looks familiar, except for some cool new features, people will think "wow.. look at the cool stuff you can do in Mark's language". If the language looks too different, many people will get scared away by it, and will never notice all the cool new things your language enables.

If the only value you are planning on adding is a "shortened syntax", then I would rethink your design. People pick programming languages because of the cool things they can do with them, not because of the syntax.

Answer 2

Well, as a novice, it is far easier to guess what the C# code is doing than what your code is doing. What you've got might make sense to someone immersed in the language, but you've beaten Perl at its own game with your notation.

I think that it would be very hard to teach your notation; it would also be hard to learn. When familiar with it, you might be productive because there is less to type, but I think that redundancy helps the human (and compiler, and debugger) understand what is going on. You have to strike a balance between minimalism and verbosity (maximalism?).

Is this piece a syntax error:

     @d : y = p ? q.?s : 
              r ? b;

At least, there'd have to be an explanation of what the 'q.?s' bit means (it might somehow combine a non-null test with the member access, but it is hardly transparent).
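
Going by the C# translation given in the question, my reading of that snippet is roughly the sketch below. The types Y and Q and the helper names MemberOrNull and Pick are made up for illustration, and the rule that a missing else-branch yields null is taken from the question's own translation, not from any specification of the new language.

```csharp
using System;

// Hypothetical stand-ins for the placeholder names in the snippet.
class Y { }
class Q { public Y S; }

static class NullSafeSketch
{
    // Longhand for "q.?s": read the member only when the receiver is non-null.
    public static Y MemberOrNull(Q q)
    {
        return (q != null) ? q.S : null;
    }

    // Longhand for "@d : y = p ? q.?s : r ? b;".
    // Assumption: a conditional without an else-branch falls through to null.
    public static Y Pick(bool p, Q q, bool r, Y b)
    {
        return p ? MemberOrNull(q)
             : r ? b
             : null;
    }
}
```

So Pick(true, null, true, b) would give null rather than b or an exception, which is exactly the kind of behaviour a reader cannot guess from the symbols alone.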

If you're going down this route of more extensive use of symbols, you need to target Unicode. See also Fortress [1]. (Yes, you run into problems with current systems that are not as conversant with Unicode as they should be, but you have a far wider choice of symbols that can be used to good effect, if you are careful.)

For the benefit of those who come later, the notation used at the time I wrote my answer was:

@x
 {
   +@y <> : a 
    {
      +@z() : c
       { 
         @d : y = p ? q.?s : 
                  r ? b;
         @e = p ? d : throw exception1();
         @f : y -> y2 = x1 => z2(x1);
       }
    }

   +@x() [g]
    { 
    }
 }

When I last looked, the identifiers had been changed to the long forms now on display, so that @x became @NamespaceX, for example. While I understand that it is an illustrative example, the need to explain that what follows the first @ is a namespace and that what follows other @ symbols are classes, methods, functions, etc. is indicative of the problem with overly compact notation. Keywords used sparingly, but not too sparingly, help a lot. COBOL and SQL both have lots of keywords; C has a few keywords.

[1] http://research.sun.com/projects/plrg/

Answer 3

I'd write in http://en.wikipedia.org/wiki/Brainfuck

Answer 4

Readability and writeability of a language tend to be opposites. Your example is very writable – once one becomes familiar with the syntax and idioms of your language, they will be able to easily write in it. You have removed some of the verbosity of C#, but as a result, you have lost readability.

As an example, imagine a programmer who has worked in OOP languages, but never one with C-style syntax. Which do you think will be easier for them to reverse engineer, C# or your example? With C#, it's very clear what each code block is – a namespace, a class, and it's not hard to guess about methods and properties. Jonathan Leffler's example shows what your language is like when removing the name of the construct from the type or member name. It gets a lot more confusing.

And, what about nested types? In order to read into nested types, you'll have to start counting brackets!

Answer 5

I actually prefer languages with no more symbols than necessary. This is because I'm a touch-typist and find typing a symbol, especially one on the number keys, slower than typing a short lower-case word. That's not to say I'm a big fan of COBOL (I actually can't even write hello world in it), but I usually get annoyed when, in any language, I have to type long sequences of ^&^%$)(&^$$^&*('s. So I'd prefer that symbols appear only infrequently, and where they follow from an obvious notation, like arithmetic.

Answer 6

Personally, I like symbols, as long as they are used consistently and in a logical manner.
For example in C it's really straightforward: you use &var to get an address and use *ptr to dereference it, pointers are declared by appending a * to the type. Once you have understood the 2 characters and what they do, you just use them without thinking about it and are happy as you don't have to write "address(var)" or "deref(ptr)".

However, in Perl, you have scalars ($), arrays (@), hashes (%) and references ($).
This is where the trouble starts: to create an array, you write @array=(1,2,3). To create an array reference, you can either prefix the @ with a backslash ($arrayref=\@array) or create an anonymous array reference by putting the elements in square brackets ($arrayref=[1,2,3]). For dereferencing, you can either prefix the reference with the corresponding sigil (@$arrayref for the whole array, $$arrayref[1] for a single element) or put -> between the name and the subscript operator ($arrayref->[1]).

So if you have to create your own syntax, plan it from the beginning and use symbols only in ways that are intuitive to understand. For readability, make sure the operator precedence isn't different from what users coming from a similar language would expect, and that commonly used operations (e.g. *ptr++) are possible without many parentheses.