data Pippo = Blabla
| Azerty Double
data Xyzzy a b = Foo
| Bar Bool
| Qux a
| Thud b
| Fred (Xyzzy a b) Int
Terminology 1
Expression: computation that calculates a value
"Computation" in functional programming means evaluating expressions.
Statement: computation that is performed for its effect (modifying the state of the system)
Statements may CHANGE values, but they do not, themselves have value
"Computation" in functional programming means executing a series of statements.
Side-Effect: Anything an expression does besides calculate a value.
Functional languages allow few (Racket) or no (Haskell) side-effects.
Some people say that a statements are code with side-effects, but that's silly since that's their main goal. Statements have effects.
Identify the Expressions and Statements
In C and C++:
10 * z + 4
An expression!
Actually several different expressions which combine into one large expression.
Why? it evaluates to a value.
x++;
This is a statement, it has the effect of incrementing the value of x.
Interestingly, x++ is an expression with a side effect (since it returns a value)
while (x > 3) { x = x-1; y = y+1; }
A statement! I couldn't say x= (while (..) {..})
Terminology 2
Syntax: What the programmer writes (or is allowed to write)
the fact that "int x;" is valid code is syntax
Semantics: the meaning of what the programmer writes
the fact that the above code defines a new variable `x` as an integer is semantics
Syntax and/or Semantics?
According to the C++ standard
^ is a valid binary operator.
Syntax! is says what you are allowed to do
When applied to two integers, the ^ operator computes "bitwise exclusive-or"
Semantics! tells you want it does
If your program adds two int values and the sum is a number too big to fit an int, the results are undefined (e.g., 9223372036854775807 + 1).
Semantics, describes what happens.
A semicolon by itself is a legal statement (a "no-op")
BOTH! tells you both what you can write, and what it does.
The expression new int allocates an uninitialized integer on the heap, whereas new int() allocates the integer 0 on the heap.
Mostly Semantics, because it tells you what happens.
but also a little syntax because it gives you examples of ok code.
Two Views of Syntax
Terminology 3
Concrete Syntax: what the programmer writes
Program as a linear sequence from a character set
Keywords, punctuation, legal identifiers, ...
Rules for resolving ambiguity (e.g., interpreting 3-7 * 2)
Specified by a grammar
Abstract Syntax: essence/representation of the code
Programs as structured data (trees)
Exact keywords, punctuation discarded
No ambiguities (due to tree structure)
Variation in Concrete Syntax
\(x \,\mapsto\, x{+}1\)
math
\(\lambda x.\, x{+}1\)
logic
lambda x: x+1
Python
(lambda (x) (+ x 1))
Scheme/Racket
\x -> x+1
Haskell
[](int x) { return x+1; }
C++11
x => x+1
C#
(x) => x+1
Dart, JavaScript 6
{ x in x+1 }
Swift
fn x => x+1
SML
function x -> x+1
OCaml
function(x) { return x+1; }
JavaScript 5
Function[x,x+1]
Mathematica
Note: We're not expecting you to memorize the details concrete syntax of any non-Haskell language mentioned today. Focus on the big picture: different ways that concrete syntaxes can vary.
Wiki Extra: Variation in Concrete Syntax
Wiki Extra: Human-Friendly (?) Concrete Syntax
Cobol
MOVE 28 TO D(2).
ADD W TO X GIVING SUM.

DIVIDE 100 INTO Y GIVING Q REMAINDER R.
AppleScript
set my text item delimiters to (item i of theVariables)
set theText to every text item of theText
if class of (item i of theReplacements) is date then
set my text item delimiters to my customDateStyle(item i of theReplacements)
else
set my text item delimiters to (item i of theReplacements)
end if
set theText to theText as string
set my text item delimiters to ""
Wiki Extra: Expressive(?) Concrete Syntax
APL (Iverson, 1960's)
Fortress (Sun, 2000's)
Wiki Extra: Two Concrete Syntaxes in One Language
Dylan (Apple et al., 1990's)
define method double(x)
2 * x;
end method;
or
(define-method double (x)
(* 2 x))
C++ (up through C++17)
bool eq(int a, int b)
{
return !( (a < b) || (a > b));
}
or
bool eq(int a, int b)
<%
return not( (a < b) or (a > b));
%>
ASTs Ignore Irrelevant Details
Preserve only the important structural information from the parse
ASTs are just trees; any tree representation (from your previous experience with Racket, C++, Java, etc.) will work.
Today we'll focus on trees in Haskell
Syntax: Exercise
Write the expression below in writing in as many different ways you can:
Possible options?
ADD TWO AND THREE AND MULTIPLY THAT BY SEVEN, DIVIDING THE RESULT BY ONE PLUS ONE AND THEN SUBTRACT THAT RESULT FROM TWO
SUBTRACT FROM TWO THE RESULT OF DIVIDING SEVEN TIMES THE SUM OF TWO AND THREE BY THE SUM OF ONE AND ONE
2 7 2 3 + * 1 1 + / -
Representing Abstract Syntax
Terminology #4:
Haskell’s data declarations (which are ideal to use in representing abstract syntax and similar structured types) have been (re)invented multiple times, called:
Algebraic datatypes (or just datatypes)
Specifically, sum types (whereas tuples are product types)
Enumerations with associated values
Variant records (or just variants)
Tagged unions (or discriminated unions)
Case classes (same goals, slightly different details)
Option 1: Wrong!
What's wrong with this?
data Exp = Double
| Plus Exp Exp
| Minus Exp Exp
| Times Exp Exp
| Divide Exp Exp
The datatype is missing a label for the case with the Double.
This will not cause an error, but it also won't do what you want. Haskell will just create a value for you called Double, but it won't have any associated value.
Option 2: Inelegant!
This would be correct code:
data Exp = Num Double
| Plus Exp Exp
| Minus Exp Exp
| Times Exp Exp
| Divide Exp Exp
Extra Reminder / Example
But this is a bit inelegant. There is redundancy in that we have to pattern match four different cases when we might not care what the operator is.
Option 3: Nice!
data Op = PlusOp | MinusOp | TimesOp | DivOp
data Exp = Num Double
| BinOp Exp Op Exp
What about...
Option 4: Overgeneralizing
type Op = String
data Exp = Num Double
| BinOp Exp Op Exp
For our purposes, this is not a good implementation, because very few strings are valid operators. There are only four valid operators so it makes more sense to define a distinct Op type using data. (In a more industrial-strength and/or flexible language with user-defined operators, we might make different choices.)
Option 5: Crazy Town
Or, why not this...?
data Op = PlusOp | MinusOp | TimesOp | DivOp
data Paren = LeftParen | RightParen
data Exp = Num Double
| BinOp Exp Op Exp
| Parenthesized Paren Exp Paren
The parentheses aren't necessary since the subexpressions in the BinOp already allow us to nest expressions.
Specifically, the tree structure of Exp already encodes the order of operations (which is what parentheses are normally used for).
Also, the way Parenthesized is defined allows us to make nonsensical expressions such as Parenthesized RightParen? (Num 2.0) RightParen?
(Precedence rules that allow us to omit parentheses and say how operators are grouped belong to concrete syntax, not to abstract syntax!)
Let's make a (simple!) programming language...
S-K Combinators? It's simple, but a little too simple.
Lambda Calculus!
Lambda Calculus: Syntax
Haskell Representation
type VarName = String
data Exp = Var VarName -- variable
| Lam VarName Exp -- anonymous function
| App Exp Exp -- function application
deriving (Show)
Example
First, let's consider a simple lambda function, this is our PAIR function:
λm.λn.λs.s m n
or, more explicitly with (redundant) parentheses that's
(λm.(λn.(λs.((s m) n))))
and with the space for function application clearly marked (using the ␣ character to mark spaces), that's
λm.(λn.(λs.(s␣m)␣n))
In our Haskell representation of the abstract syntax, that'd be this Haskell expression.