For a long time I have felt that a general theory of ordinals comparable to that of the von Neumann ordinals in classical set theory is lacking in constructive type theory, and indeed that this is something of a "scandal". Ordinal notations are one thing; but "the" ordinals are another - meaning by "the" ordinals canonical representatives of well-orderings such as the von Neumann ordinals. (There is a typical lament for example in this page: search for the term "spine-chilling".) On the other hand, it's unlikely one needs to fill exactly the rôle of the notion of ordinal in classical set theory.
It seems important to `go slowly'. At certain points one stares at the ordinals defined so far, then formulates explicitly a principle that underlies their definition, and then exploits the principle to push further. If one is too hasty, the patterns of definition will come out slightly awry, the reflection principles will accordingly come out slightly wrong, and the whole enterprise will veer off in the wrong direction. (I expect there will be quite a lot of veering-off.)
It is also important and difficult to settle on some workable notation. I have not done so yet; so what follows may be less easy to follow than it needs to be.
Arguably the first few `watershed' ordinals are
0 zero, the first ordinal, with no predecessors.
Closed under practically anything: addition, multiplication,
exponentiation, ..., for trivial reasons. (In some ways zero is the
most mysterious ordinal of all!)1 one, the first non-zero ordinal, with
predecessor 0. Closed under addition and
multiplication, but not exponentiation. Closed under sums (but not
products) of series of length < 1.2 two. Closed under multiplication, and
exponentiation, but not binary addition. Closed under sums (and
products) of series of length < 2.w omega, the first infinite ordinal, with
predecessors the finite ordinals. Closed under addition,
multiplication, exponentiation, etc etc. Closed under sums (and
products) of series of finite length.w^w omega-to-the-omega, the ordinal of primitive
recursive arithmetic (or arithmetic with recursion restricted to
lowest type). This is (for example) the order type of the unary
polynomial functions from the natural numbers to the natural
numbers ordered by eventual dominance (so, highest different
exponent). The predecessors of w^w are describable by
terms built up from 0, 1 and 
w by addition and multiplication.e_0 epsilon_zero, the ordinal of first order
arithmetic, with recursion allowed at all finite types. The
predecessors of e_0 are describable by terms built up
from 0 and w by addition and
(multiplication and) exponentiation.phi_w(0), the ordinal of primitive recursive
arithmetic including (transfinite) recursion over well-founded countably
branching trees.  Actually, maybe this one's just for real obsessives.
G_0 Gamma_zero, the ordinal of type theory with
finitary data types, and an external sequence of universes closed
under the Pi quantifier. (Also ATR0, and dozens of other
systems.) Its
predecessors are described by terms built up from the terms below
e_0 by closing them additionally under a binary
operator for Veblen's 2-place function (based on the function
w^a which enumerates the non-zero ordinals closed
under binary addition).Given a normal (ie. continuous and strictly increasing) ordinal
function f, by the Veblen hierarchy over 
f, I mean the 2 place function V_f(a,b) defined
so that for each a, V_f(a) enumerates the
common fixed points of the family of functions { V_f(a') | a'
< a }.
There are lots of interesting normal functions to start with, some of them unfamiliar to me. Here are a few.
w^a, giving
the classical Veblen hierarchy. So V_f(1,0) is 
e_0, the first epsilon number, and the proof-theoretic
ordinal of first order arithmetic. Many weak subsystems of second
order arithmetic crop up in this hierarchy. The only one I can
remember is sigma-1-1 axiom of choice, which has ordinal 
V_f(e_0,0). Perhaps ramified analysis of level 1 has ordinal
V_f(2,0); this was all explored by Feferman and the
late Kurt Schütte while dinosaurs were roaming the earth, in the 50's.f, V_f(a,0) is again
normal, so one may form a normal hierarchy (which I call the second
Veblen hierarchy) on that function -- and so on. (This takes us
into the realms of the so-called "n-place" Veblen functions; I
can't remember the exact definition, or even where to look it
up.)1 + a, which
is surely the simplest non-trivial normal function. It enumerates
the non-zero ordinals. In this case V_f(a,b) equals
w^a + b. The second Veblen hierarchy on this function
is again the classical (first) Veblen hierarchy.w.(1+a) which enumerates the limit ordinals. Its
derivative is (w^w).(1+a), unless I am mistaken. The
derivative of that is (w^w^2).(1+a), UIAM. In general,
the b-th derivative V_f(b,a) stands some
chance of being (w^(w^b)).(1+a). (I should check
this.) If that's true, then the second Veblen hierarchy is based on
w^(w^a) which (strangely enough) enumerates the
infinite ordinals closed under binary multiplication (the finite
ones being 0, 1 and 2).As mentioned already, there are the `n-place' Veblen functions, and beyond that Shütte's `Klammersymbolen'. I think Anton Setzer has a way of extending the Klammersymbolen (multi-dimensionally?) up to the Bachmann-Howard ordinal. There may even be some definitions of these things in his drafts on ordinal systems.
Let U0 = {} : Set denote the empty set, and 
T0 : U0 -> Set the empty set-valued function with domain
U0.  There is surely no other choice.
Let U1 = {0} : Set denote the set with a single
element 0, and T1 : U1 -> Set the
set-valued function with domain U1 whose value at
0 is the set U0 = {}.
Let U2 : Set denote the set with a two elements
0, 1, and T2 : U2 -> Set the
set-valued function with domain U2 whose value at
0 is the set U0 = {}, and whose value at
1 is the set U1 = {0}. This family of
sets is sometimes called the boolean universe, or Jan Smith's microscopic
universe.
And so on.
Suppose we are given a family A,B. What should the
successor be? In either case the index set needs to be extended by
a new element, so we take A' = A + 1. Here 
+ means the binary disjoint union operation on sets, and
1 means the singleton set { 0 } with sole
element 0.
B' is defined by 
                  B' (i a) = B a + 1
                  B' (j 0) = {}
One alters the function B to B', having
one more new point in its domain. However the new element `goes at the
bottom'.  The old elements are themselves `shifted up by 1'.
B' is defined by 
             
                  B' (i a) = B a
                  B' (j 0) = A
One extends the function B to B', having
one more new point in its domain, and at this point taking as its
value the entire domain of B.
This operation is the analogue of the von Neumann successor
operation which maps a set x to x' = x U
{x}. One extends x by one new element, namely
the singleton {x}.
This is the same as what one gets if one looks at Erik Palmgren's next-universe operator, (the one under which the superuniverse is closed) and strips out all the closure conditions pertaining to quantifiers, given sets, logical constants, the identity type etc.
Succ^ and the second (with the empty set at the
bottom) Succ_. 
In both definitions above, the operation (on sets) X
|-> X + 1 is of key importance. Martin-Löf has
suggested that this operation deserves its own name. (See pp.
103-104 of Nordström, Petersson and Smith.) So let's take
S to be a unary type constructor with the following
formation and introduction rules:
     X : Set                                x : X
     --------- formation      0 : S X      ---------
     S X : Set                             s x : S X
I would prefer to write this information as follows, using a more compact notation vaguely like Agda's.
     S : Set -> Set 
     S X = data 0 | S (x : X )
Here `data' is a form of coproduct operator, such that one has the
opportunity to associate constructor names with the various
cofactors: in this case the names are 0 (with no components), and
s (with one component, which is an X). To
analyse such coproducts, there is a `case' construction.
        case z of 0    -> a
               |  S x  -> b x
Sometimes I miss out the `z of': in that case I mean 
\z->case z of ... . Sometimes I use pattern-matching
definitions rather than the case construction - these should always
be reducible to case constructions.
Note that when we step from (U,T) to Succ^
(U,T), or Succ_ (U,T), there is no `interesting'
interaction between the two components. (I don't mean anything
disparaging by this!) That is, U' is defined in terms of
U, but not T -- it is really an operation on
sets, `artificially' made into an operator on families. (Compare this to
the closure of a universe under a quantifier like Pi,
Sigma, or W, where there is an
`interesting' interaction.) However it is true that in the definition
of Succ^, T' depends on U (as
well of course as T), which is perhaps `mildly'
interesting.
ww is the least ordinal which contains
0 and is closed under the successor operation 
Succ_. It reflects the property of the class of ordinals
that it is non-empty (hence there is a least ordinal), and for any
ordinal there is a greater (hence a least ordinal which is strictly
greater). Clearly, one way to define w is as follows. 
       Uw = Nat = S Nat     -- the set of natural numbers
       Tw 0     = {}
       Tw (s x) = S (Tw x)
But perhaps this is already to leap ahead too far. That is what we
expect. 
Something occurs to me here: the distinction between
(A,B) to (A',B'), where the latter is the least
family extending (A,B) enjoying certain
closure properties. 
    (A',B') : Fam Set
         A' = A + ...
         B' (i a) = B a
         B' ...   = ...
(A,B) to (A',B'), where the latter is the least
family containing (A,B) enjoying certain
closure properties. 
    (A',B') : Fam Set
         A' = S A + ...
         B' (i 0)     = A 
         B' (i (s a)) = B a
         B' ...       = ...
Succ^
(A,B)? Yes! 
In this connection it may be interesting to look at one way of
expressing the Veblen derivative of a single function
f : On -> On:
       f' a  =  (f^w . (+1))^a (f^w 0)
The `(+1)' which occurs here is perhaps the correlate
of the successor operation Succ^ on families. 
Let us (for the time being) consider a `universe' to be a family
of sets containing the empty set and closed under (set) successor
S. (This is a reasonable notion of limit ordinal.)
Then the operation which given a family (A,B) produces
the least universe (A',B') extending (A,B)
is defined as follows.
     A' = A + S A' 
     B' (i a)      = B a
     B' (j 0)      = {}
     B' (j (s a')) = S (B' a')
Let us call this operation Sw. (Another name might be
Univ, if we are calling a universe a family which
contains the empty set and is closed under successor.) Note that it is
an operator on families. In a sense, it takes a family to the least
limit family extending it.  If we start with the empty family, then it
takes this to the family of finite ordinals.  There is something monadic
about Sw.  It is a closure operator, in the sense that
it is inflationary, monotone, and idempotent.
wThe ordinal w*2 is the least ordinal which contains
both 0 and w, and is closed under the
successor operation. Similarly w*(k+1) is the least
ordinal closed under the successor operation, which contains all of
0, w, w*2, ... , 
w*k. This gets us (arbitrarily close) to w^2.
But what really is the step to w^2 itself? Somehow, we
need to close under the general process of throwing in the last
closure ordinal, and then closing again under zero and successor.
If one wants to go really slowly one should probably say something like this: "after w we have sw, ssw, ... then we can build w*2 as the least universe closed under 0, s, w. Similarly we can go on and build the ordinals up to w*3 and w*3 itself is the universe closed under 0, s, w, w*2. And similarly we can define w*n for finite n. Then we wish to internalize the process which brings us from w*a to w*(a+1) as a construction on general ordinals a. Now, w*(a+1) is the ordinal closed under 0, s, extending and containing w*a."
wNow we can consider Sw as an
operator under which families can be closed, in addition to 
S. So we can define a set Uw^2 as follows:
     Uw^2 = data $0    
               | $S  (a : Uw^2) 
               | $Sw (a : Uw^2, b : Tw^2 a -> Uw^2) 
The family Tw^2 is defined as follows: 
     Tw^2 $0        = {}
     Tw^2 ($S a)    = S (Tw^2 a)
     Tw^2 ($Sw a b) = Sw (Tw^2 a) (Tw^2 . b)
It seems reasonable to regard (Uw^2,Tw^2) as one
analogue of the ordinal w^2 -- which can be regarded
as the first ordinal which is greater than 0, and
closed under both +1 and +w. But this may
not be the right point of view. 
Now w^2 is surely closed under Sw --
but should we throw in also 0 and closure under 
S (yet) again? Maybe this is a point at which we can `veer
off', and we should take care not to leap ahead too quickly.
The operator involved here may be this. Given a family of sets
(A,B), define (A',B') by
     A'   = data $b (a : A)
               | $0    
               | $S  (a' : A') 
               | $Sw (a' : A', b' : B' a' -> A') 
The family B' is defined as follows: 
     B' ($b a)      = B a
     B' $0          = {}
     B' ($S a')     = S (B' a')
     B' ($Sw a' b') = Sw (B' a') (B' . b')
Perhaps we might call this operator Sw^2. 
Let's consider (binary) addition. If we are given two families
of sets (A1,B1) and (A2,B2), we can
define another family (A,B) which is in some sense the
ordered sum of the first two: (A1,B1) + (A2,B2) (with
(A2,B2) `above' (A1,B1)).
       A = A1 + A2
       B (i a1) = B1 a1
       B (j a2) = A1 + B2 a2
This agrees with the definitions of successor, and `one'. (The
definition of addition makes more sense, perhaps, if we model
ordinals as transition
systems rather than just families of
sets.) The equations for the predecessor are 
       (i a1)[b1]   = i (a1[b1])
       (j a2)[i a1] = i a1
       (j a2)[j b2] = j (a2[b2])
Let's consider (binary) multiplication. If we are given two
families of sets (A,B) and (C,D), we can
define another family (E,F) which is in some sense the
ordered product of the first two: (C,D) * (A,B) (with
(A,B) `controlling' (C,D) so the operands
of * are `the wrong way round' for the usual sense of
ordinal arithmetic).
       E       = A * C
       F (a,c) = D c + B a * C
If we model ordinals as 
transition systems rather than just 
families of sets, the equations for the predecessor are The
equations for the predecessor are 
       (a,c)[i d]      = (a,c[d])
       (a,c)[j (b,c')] = (a[b],c')
Let's consider a general ordered sum. If we are given a family
of sets (A,B), and for each a : A a
family of sets (C a,D a), we can define another family
(E,F) which is in some sense the ordered product of
the family. This ought to generalise both binary addition (when the
index family is the family `2') and multiplication (when the
indexing is in fact constant).
       E       = (Sigma a : A) C a
       F (a,c) = D a c + (Sigma b : B a) C (a[b])
If we model ordinals as 
transition systems rather than just 
families of sets, the equations for the predecessor are The
equations for the predecessor are (just as above) 
       (a,c)[i d]      = (a,c[d])
       (a,c)[j (b,c')] = (a[b],c')
It is quite challenging to give a definition of exponentiation. Perhaps one can start with the `two to the _' operation, which in the case of linear orderings is the lexicographic order of chains which are descending in the exponent order. (I've not succeeded in defining this nicely as a transition system.) Then one might think of the `w to the _' operation, in which the descending chains are `decorated' with numerical `coefficients'. (One should also consider how to define the closely related multi-set order, better suited to non-linear orders: Thierry knows a great deal about this.) Finally, one could think about the general exponential.
General exponentiation is an enormously subtle operation. Almost
everyone overlooks the cases in which the base is zero or one. The
functions sgbar a = 0^a and sg a =
0^(0^a) are very important.
Suppose we have any family (U : Set, T : U ->
Set) and any operator Theta from
families of sets to families of sets. The operator 
Theta can be split apart into two simpler operators, as
follows.
        Phi : Fam Set -> Set        
        Psi : (Pi (A,B) : Fam Set) Phi(A,B) -> Set
        Theta (A,B) = (Phi(A,B),Psi(A,B)) : Fam Set
The least family of sets extending (U,T) and
closed under (Phi,Psi) could perhaps be
defined as follows. 
        U'  = data $t (u : U)
                 | $phi (a : U', b : T' a -> U')
                 | $psi (a : U', b : T' a -> U', c : Phi(T' a, T' . b))
        T' ($t u)       = T u 
        T' ($phi a b)   = Phi (T' a, T' . b)
        T' ($psi a b c) = Psi (T' a, T' . b) c
Let us call (U',T') Cl(Phi,Psi)(U,T). The
type of Cl is as follows. 
       Cl : (Fam Set -> Fam Set) -> Fam Set -> Fam Set
The existence of this functional is somehow equivalent to the
`Mahloness' of the universe of sets. (Maybe one should think about
whether this is some kind of `free' construction with a given
`indeterminate' family. There is something monadic about this.) 
It is natural to throw in zero and successor.
       Deriv : (Fam Set -> Fam Set) -> Fam Set
       Deriv Theta = (U',T') 
           where 
                 U' = data $0
                         | $s (a : U')
                         | $phi (a : U', b : T' a -> U')
                         | $psi (a : U', b : T' a -> U', c : Phi (T' a, T' . b) )
                 T' = case $0         -> U0
                         | $s a       -> S (T' a)
                         | $phi a b   -> Phi (T' a, T' . b)
                         | $psi a b c -> Psi (T' a, T' . b) c 
This is the least family containing the empty set, and closed under
both the successor operation, and the given operator 
Theta. The step from Theta to Der (Cl
Theta) is then something very like the Veblen derivative. 
The least family of sets containing (U,T)
and closed under Theta =
\(A,B)->(Phi(A,B),Psi(A,B)) could perhaps be defined as
follows.
         (U',T') = Cl Theta (Succ (U,T))
How does it look if we take (U0,T0) for the
`contained' family, and von-Neumann successor for the operator on
families? Then
        U = U0, which is empty,
        (so T is irrelevant), 
        Phi(A,B)       = S A,
        Psi(A,B) 0     = A
        Psi(A,B) (s a) = B a,
Hmm. I don't know what to make of that. 
Slight generalisation of the `singleton' functional. (I have the feeling that this can be subsumed in the simpler functional.) Given a family of families of sets, and a family of operators from families of sets to families of sets, to define the least family of sets containing each of the given family of sets, and closed under each of the given families of operators. (This is roughly speaking what Peter Dybjer appears to suggest gives us an analogue of the Veblen hierarchy.)
Suppose we have
(I,\i->(U i,T i)) of families of sets
(U i : Set, T i : U -> Set) where i :
I.
        Phi j : (Pi A : Set) (A -> Set) -> Set        
        Psi j : (Pi A : Set, B : A -> Set) Phi j (A,B) -> Set
where j : J and J : Set.
   U'  = data $u (i : I)
            | $t (i : I, u : U i)
            | $phi (j : J, a : U', b : T' u -> U')
            | $psi (j : J, a : U', b : T' u -> U', c : Phi j (T' u, T' . t) )
        T' ($u i)          = U i
        T' ($t i u)        = T i u 
        T' ($phi j a b)    = Phi j (T' a) (T' . b)
        T' ($psi j a b c)  = Psi j (T' a) (T' . b) c  
(A,B)
where A : Set and B : A -> Set.  So Fam Set
is a type, or large set.  Let us however ignore this "size consideration" for the moment 
and pretend that F = Fam Set is a set, and so can be the state space
of a transition system.  We can put the following transition structure on F. 
The transitions from (A,B) are defined to have type
        Tfam (A,B) = (Sigma a : A) B a -> A : Set
The destination states of the transitions are given by
        (A,B)[a,b] = (B a, B . b) : F 
where `_ . _' refers to composition.  To see that this
type checks, remember that a : A, b : B a ->
A and B : A -> Set, so B a : Set and
B . b : B a -> Set.
A chain starting from (A,B) : Fam Set has the form
            (A,B) 
              |
              |
              | (a,b) : (Sigma a : A) B a -> A
              |
              V
            (A',B') = (B a, B . b) 
              |
              |
              | (a',b') : (Sigma a' : B a) (B . b) a' -> B a
              |
              V
            (A'',B'') = ((B . b) a', B . b . b')
              |
              |
              | (a'',b'') : (Sigma a'' : (B . b) a') (B . b . b') a'' -> (B . b) a'
              |
              V
             (A''',B''') = ((B . b . b') a'', B . b . b' . b'')
and so on.  Each successive family of sets is a "internal refinement" or restriction of its predecessor.    
Now, what about size questions?  I have defined a transition system to
have a set of states.  The situation appears to be this.  For
each fixed family of sets (A,B) we can put a transition
structure on the set of "internal-families" of (A,B), ie
the set
            sfam (A,B) = (Sigma a : A) B a -> A 
Namely, we take as transition sets
             T : sfam (A,B) -> Set
             T(a,b) =  (Sigma c : B a) B (b c) -> B a
                    =  sfam (B a, B . b) 
and as destinations 
             _[_] : (Pi (a,b) : sfam (A,B)) T (a,b) -> sfam (A,B) 
             (a,b)[c,d] = (b c, b . d) 
                          where a : A
                                b : B a -> A
                                c : B a
                                d : B (b c) -> B a
In other words, we can replace talk of "sets" by talk of "small sets". 
It may be interesting to chase up a result of 
Michael Rathjen's on the strength of Martin-Löf type theory
with a super-universe, parts I and II. Part I is the Archive for
Mathematical Logic, somewhere; part II is on his web page.  I think it
concerns a version of Martin-Löf type theory without
generalised inductive definitions, but with something like a
super-universe closed under the `next universe' operator. (And natural
numbers?)  If I recall correctly, he located the strength of this
system using the Veblen hierarchy based upon the G
(Gamma) function: he calls the ordinal Phi (G_0) 0 , and
I guess that Phi is the Veblen hierarchy on 
G_a.
Another thing here is Peter Aczel's hierarchy of `diagonalisation' functionals from his JSL 1972 paper on describing ordinals by means of functionals of transfinite type. It could be that the functionals considered here are in some way analogous to these diagonalisation functions. Peter Aczel identified a certain ordinal above the Bachmann-Howard ordinal (as it were, using not epsilon numbers in the third number class, but Gamma> numbers to index the diagonalisations) with the least countable ordinal that cannot be described by diagonalisation functionals of autonomously transfinite types. Who knows, maybe this ordinal will turn up here if one thinks about functionals of transfinite type.
Another thing to look at may be Herman Jervell's constructions from the 70's. None of the Edinburgh libraries seem to have the relevant volume of LNM.
Can't one formulate in this framework a notion of regularity, and deal with regular ordinals? After all, an ordinal is regular if (it is infinite and) closed under sums of sequences which are indexed by a smaller ordinal; the property presents itself as a closure property.
There is a (very possibly completely confused) suggestion for a regularity-based definition of `the ordinals' (up to, I presume the first inaccessible) as a transition system in the middle of the following page.
How does one account for recursive definitions in Peter's style? Are these universes merely fixed points, or least fixed points?