RE: incremental linking
<---- MLj users ---- http://www.dcs.ed.ac.uk/home/mlj/doc/users.html ---->
Andrew Kennedy writes:
> David,
>
> Thanks very much for your comments.
Thank you and your co-workers for MLj.
> Before discussing incremental linking,
> I'd like to point out that in a future release of MLj (probably not ready
> for end September, but soon after that) the small amount of "separate
> compilation" that happens already (parsing, type checking, translation to
> intermediate language, limited optimisation) will be made "persistent".
(I read the paper and I am waiting eagerly for the new version. I am very glad
that the new version takes the position that interlanguage programming should
be made painless where reasonable.)
> That
> is to say, the intermediate language term and signature for each module will
> be pickled in files so that modules don't need to be recompiled between MLj
> sessions.
Would it be possible to also pickle tymap, SE, supply, and e?
> Even so, there's an awful lot that happens "post-link". When designing MLj
> we did consider module-to-class compilation, but after a certain amount of
> experimentation rejected it as having too detrimental an impact on runtime
> performance.
I'm not thinking of separate compilation. By incremental
linking/compilation I'm asking for the compiler to emit jar files as
soon as it can and to remember the compiler state between compilations
so that entities can be added to a project without restarting the
compilation from scratch (ergo the request to pickle tymap etc). Each
jar file would contain the incremental bytecode generated by adding an
entity to a project. Since class inheritance is the JVM mechanism for
link time extension of classes, the classes of a jar file would inherit
from the classes of the jar file emitted prior to it (in dependency
order). The effect of inheritance would be a set of classes that are
equivalent to the classes generated by the current batch compiler.
Entities should be made immutable so a change in source would create a
new entity and would result in an extended dependency list. New
jar files would be created for the new entity and for the entities that
depend upon it. If the source were changed for an entity early in the
dependency list, many new jar files would be generated. I can live
with that since most changes will be towards the leaves of the dependency
tree.
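(A rough Java sketch of the shape I have in mind; the class and field
names are invented, not what MLj would actually emit. Each class lives
in its own jar:)

    // step1.jar: code emitted when entity A was added to the project
    public class Step1 {
        public static int a = 42;            // a binding from entity A
    }

    // step2.jar, emitted later when entity B was added; it extends the
    // previous step, so loading step2.jar links the two increments
    public class Step2 extends Step1 {
        public static int b = a + 1;         // may use bindings from earlier steps
    }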
Of course I would like to have an interactive toplevel but that
is not my main concern. My main concern is that (at present)
conservative additions to an application require a complete new jar
file. This negates one of the JVM's greatest strengths--dynamic linking.
For example, an MLj application can use plug-ins, but only if the
plug-in implements a Java interface. This is clumsy and it inhibits
optimization.
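(For concreteness, the current plug-in story looks roughly like this;
the names are invented:)

    // Java interface that every plug-in must implement today
    public interface Plugin {
        Object run(Object arg);
    }

    // host code: load a plug-in by name and talk to it only through the
    // interface, so every value crossing the boundary is an Object
    public class Host {
        public static void main(String[] args) throws Exception {
            Plugin p = (Plugin) Class.forName(args[0]).newInstance();
            System.out.println(p.run("hello"));
        }
    }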
> Suppose that a structure is compiled down to a class.
> Presumably each value binding would translate to a static final field (or
> possibly a static method for function values). Obviously we lose
> cross-module optimisation, but modern JITs do quite a bit of inlining so we
> might make up some lost ground (note that this wasn't the case when we
> started designing MLj). However, the biggest issue is uniform data
> representation, in particular *polymorphism*. The easiest compilation
> technique for polymorphism (still used by O'Caml) is for values to be given
> uniform representations so only one version of a polymorphic function is
> required. For native-code compilers, this entails using pointers for all
> values, "boxing" non-pointer values such as ints and floats where necessary.
> For Java, it would mean using Object for all values. This is very expensive,
> requiring extra allocations on the heap *and* many downcast operations to
> get code past the verifier.
I understand this very well. This is the primary reason I am
interested in MLj. (FWIW, I have read most of the O'Caml source and a
fair bit of the MLj source so feel free to instruct me to read relevant
bits of source.)
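(To spell out the cost for the archive: under a uniform representation
a polymorphic map would compile to roughly the following shape; this is
a hypothetical sketch, not MLj output:)

    // the single shared body for 'a list -> 'b list: everything is an
    // Object, so ints are boxed and callers need downcasts
    interface Fun { Object apply(Object x); }

    public class Uniform {
        public static Object[] map(Fun f, Object[] xs) {
            Object[] ys = new Object[xs.length];
            for (int i = 0; i < xs.length; i++)
                ys[i] = f.apply(xs[i]);      // boxed values flow through the heap
            return ys;
        }
        // at an int instantiation the caller must box the elements and
        // write ((Integer) result[0]).intValue() to get the int back out
    }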
> Monomorphisation is easy and obviously more efficient, but relies on (a)
> there being a finite, statically-known number of instantiations of each
> polymorphic function (true for ML but not for Haskell) and (b) having the
> whole program available.
Since it is compiling in dependency order, an incremental compiler has
available all of the program that an entity depends upon, and since
it need not emit code before instantiation, I would think that it could
generate exactly the same code that is generated by the current batch
compiler. The difference would be that the code would be split over
multiple classes in multiple jar files. The hacked code that I wrote
to illustrate my thoughts was mostly converting the "breadth first"
map D (map C (map B (map A dependency_list)))
to a "depth first"
map (D o C o B o A) dependency_list
Incremental linking is not exactly this trivial but the hacked code
I referred to in the previous note is a fairly straightforward mutation of
the existing code. I incrementalized the transformations up to the
ApplyOpts stage. Incrementalizing the ApplyOpts stage would prob. require
delaying some optimizations until instantiation. In practice, would so
many delayed compilations be generated that there would be no advantage
to incremental linking? Or as Ian Stark suggested, could delays be
avoided by accepting less optimised code during development? I am
guessing that the fact that code explosion is not a problem means that
not many delayed compilations will be needed in practice.
> One intriguing alternative might be to simulate parametric polymorphism in
> the Java classes by inventing an extended class format that describes
> parameterized classes and polymorphic methods. GJ does this already. These
> could be instantiated to real Java classes either statically by a special
> class linker, or dynamically at class load time (code generation on the
> fly!).
I was thinking of this but did not raise it in the previous message. I
think that it is prob. a good idea. PMG (Poor Man's Generics) does
this. It is a Java compiler that does class-load time specialization
of classfiles to implement generics. The PMG class loader does only a
trivial transformation so the added delay is acceptable. The
transformations for MLj could be more time consuming but so long as they are
not much worse than standard linkers (and there exists a method that
will force all classes to be loaded at program initialization time), I
would be happy.
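(The PMG mechanism is just a custom class loader; a minimal Java
sketch, with the actual specialisation step left as a placeholder since
I do not know what MLj's rewrite would look like:)

    import java.io.*;

    // class loader that rewrites a generic classfile into a monomorphic
    // instance just before defining the class
    public class SpecializingLoader extends ClassLoader {
        protected Class findClass(String name) throws ClassNotFoundException {
            try {
                byte[] raw = readBytes(name);         // the generic classfile
                byte[] mono = specialize(raw, name);  // hypothetical MLj rewrite
                return defineClass(name, mono, 0, mono.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name);
            }
        }

        private byte[] readBytes(String name) throws IOException {
            File f = new File(name.replace('.', '/') + ".class");
            byte[] buf = new byte[(int) f.length()];
            DataInputStream in = new DataInputStream(new FileInputStream(f));
            in.readFully(buf);
            in.close();
            return buf;
        }

        private byte[] specialize(byte[] raw, String name) {
            return raw;                               // placeholder for the real rewrite
        }
    }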
In contrast to PMG, for MLj one would want a more flexible classfile
format where the polymorphism could be over base classes rather than
just subclasses of Object. The classfile representation used
internally by MLj might be sufficient. But I was wondering if Blocks
would not be better. Can the existing optimizer (including the
monomorphizer) be used on Blocks? How much optimization must be done
before closure conversion?
Along with polymorphic classfiles and classfile load-time monomorphing,
you will prob. want classfile load-time generation of curried function
variants and apply functions (the same way that O'Caml generates these
at link time).
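(By apply functions I mean the arity-dispatch helpers that O'Caml's
runtime and linker provide; in Java terms the generated classes might
look roughly like this, again with invented names:)

    // base closure class: the generic path applies one argument at a time
    public abstract class Closure {
        public abstract Object apply(Object x);

        // generic two-argument application via the one-argument path
        public Object apply(Object x, Object y) {
            return ((Closure) apply(x)).apply(y);
        }
    }

    // a known two-argument function overrides the fast path and skips the
    // intermediate closure; a load-time generator could emit such curried
    // variants on demand, much as O'Caml's linker does
    class Add extends Closure {
        public Object apply(Object x) {
            final int n = ((Integer) x).intValue();
            return new Closure() {
                public Object apply(Object y) {
                    return new Integer(n + ((Integer) y).intValue());
                }
            };
        }
        public Object apply(Object x, Object y) {
            return new Integer(((Integer) x).intValue()
                               + ((Integer) y).intValue());
        }
    }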
Thanks for reading such a long message.
-David Gurr