The recent thread "Can GC be beneficial" was quite beneficial :o) - to
me at least. I've reached a number of conclusions that allow me to
better place the conciliation between garbage collection and
deterministic finalization in the language design space - in C++ and in
general.
The following discussion focuses on C++-centric considerations, with
occasional escapes into "the right thing to do if we could break away
from the past."
Basic Tenets, Constraints, and Desiderata
=========================================
Garbage collection is desirable because:
(1) It automates a routine and error-prone task
(2) It reduces client code
(3) It improves type safety
(4) It can improve performance, particularly in multithreaded environments
On the other hand, C++ idioms based on constructors and destructors,
including, but not limited to, scoped resource management, have proven
highly useful. The broad applicability of such idioms might actually
be the single most important reason why C++ programmers shy away
from migrating to a garbage-collected C++ environment.
It follows that a set of principled methods that reconcile C++-style
programming based on object lifetime with garbage collection would be
highly desirable for fully exploiting garbage collection's advantages
within C++. This article discusses the challenges and suggests
possible designs to address them.
The constraints include compatibility with existing C++ code and coding
styles, a preference for type safety at least where it doesn't incur a
performance hit, and compatibility with the way today's garbage
collection algorithms work.
A Causal Design
===============
Claim #1: The lifetime management of objects of a class is a decision of
the class implementer, not of the class user.
In support of this claim, consider the following examples:
a) A class such as complex<double> is oblivious to destruction
timeliness because it does not allocate scarce resources that need
timely release;
b) A class such as string doesn't need to worry about destruction
timeliness within a GC (Garbage Collected) environment;
c) A class such as temporary_file does need to worry about destruction
timeliness because it allocates scarce resources that transcend both the
lifetime of the object (a file handle) and the lifetime of the program
(the file on disk that presumably temporary_file needs to delete after
usage).
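For concreteness, a minimal sketch of such a temporary_file in today's
C++ might look as follows (the names and members are illustrative only;
the point is that its destructor must run at a well-defined time for the
cleanup to mean anything):

#include <cstdio>
#include <string>

class temporary_file {
    std::FILE * handle_;   // scarce handle: must be closed promptly
    std::string path_;     // the file on disk outlives even the program unless deleted
public:
    explicit temporary_file(const std::string & path)
        : handle_(std::fopen(path.c_str(), "w+")), path_(path) {}
    std::FILE * handle() const { return handle_; }
    ~temporary_file() {
        if (handle_) std::fclose(handle_);   // release the handle
        std::remove(path_.c_str());          // remove the file from disk
    }
};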
In all of these examples, the context in which the objects are used is
largely irrelevant (barring ill-designed types that employ logical
coupling to do entirely different actions depending on their state).
There is, therefore, a strong argument that the implementer of a class
decides entirely what the destruction regime of the class shall be. This
claim will guide design considerations below.
We'll therefore assume a C++ extension that allows a class definition to
include its destruction regime:
// garbage collected
class [collected] Widget {...};
// deterministically destroyed
class [deterministic] Midget {...};
These two possible choices could be naturally complemented by the other
allowed storage classes of a class:
// garbage collected or on stack
class [collected, auto] Widget {...};
// deterministically destroyed, stack, or static storage
class [deterministic, auto, static] Midget {...};
It is illegal, however, for a class to specify both the collected and
the deterministic regime:
// illegal
class [collected, deterministic] Wrong {...};
Claim #2: Collected types cannot define a destruction-time action.
This proposal makes this claim in the wake of negative experience with
Java's finalizers.
Claim #3: Collected types can transitively only embed fields of
collected types (or pointers to them, of any depth of indirection), and
can only derive from such types.
If a collected type had a field of a non-collected type, that field
could never be destroyed, because per Claim #2 the enclosing collected
object has no destruction-time action to run the field's destructor.
If a collected type had a field of pointer to a non-collected type, one
of two things would happen:
a) A dangling pointer access might occur;
b) The resource would be kept alive indefinitely and as such could never
be destroyed (as per Claim #2).
If a collected type had a field of pointer to pointer to (notice the
double indirection) a deterministic type, that pointer's destination
would inevitably have to be somehow accessible to the garbage-collected
object. This implies that somewhere in the points-to chain, a "jump"
must exist from the collected realm into the uncollected realm (be it
automatic, static, or deterministic), which would either allow
post-destruction access or prevent the destructor from ever being called.
Design fork #1: Weak pointers could be supported. A collected type could
hold fields of type weak pointer to non-collected types. The weak
pointers are tracked and are zeroed automatically upon destruction of
the resource they point to. Further dereference attempts from the
collected realm become hard errors.
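Today's std::weak_ptr gives a rough feel for the intended behavior,
although in this emulation the zeroing is driven by shared_ptr's
reference count rather than by the proposal's deterministic destruction
(a loose sketch, not the proposed mechanism):

#include <cassert>
#include <memory>

struct File {};   // stands in for a non-collected, scarce resource

int main() {
    std::weak_ptr<File> observer;              // plays the role of the weak field
    {
        std::shared_ptr<File> owner = std::make_shared<File>();
        observer = owner;
        assert(!observer.expired());           // resource alive: access permitted
    }                                          // owner destroyed here
    assert(observer.expired());                // the weak pointer has been "zeroed":
    assert(observer.lock() == nullptr);        // no dangling access is possible
}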
Claim #4: Deterministic types must track all pointers to their
respective objects (via a precise mechanism such as reference counting
or reference linking).
If deterministic types did allow untracked pointer copying, then
post-destruction access via dangling pointers might occur. The recent
discussion in the thread "Can GC be beneficial" has shown that it is
undesirable to define post-destruction access, and it's best to leave it
as a hard run-time error.
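Absent language support, the closest approximation today is to route
every pointer to a deterministic object through a counted smart pointer,
so that all copies are tracked; the sketch below uses std::shared_ptr
merely to illustrate the discipline the compiler would automate:

#include <memory>

class Widget {};   // would be: class [deterministic] Widget {...};

int main() {
    std::shared_ptr<Widget> a(new Widget);   // count == 1
    std::shared_ptr<Widget> b = a;           // tracked copy: count == 2
    a.reset();                               // count == 1; object still alive through b
    b.reset();                               // count == 0; destroyed here, so no
                                             // post-destruction access can occur
}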
Design branch #2: For type safety reasons, disallow type-erasing
conversions from/to pointers to deterministic types:
class [deterministic] Widget {...};
Widget * p = new Widget;
void * p1 = p; // error
p = static_cast<Widget *>(p1); // error, too
Or: For compatibility reasons, allow type-erasing conversion and incur
the risk of dangling pointer access.
Design branch #3: For the purpose of having a type that stands in as a
pointer to any deterministic type (a sort of "deterministic void*"), all
deterministic classes could be thought of as inheriting (or required to
inherit) a class std::deterministic.
Design branch #3.1: std::deterministic may or may not define virtuals,
and as such may or may not constrain all deterministic classes to have
virtuals (and be suitable for dynamic_cast, among other things).
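A hedged sketch of how such a common base could be emulated in user code
today (std::deterministic does not exist; the namespace name and the
choice of a virtual destructor, per branch #3.1, are assumptions):

#include <cassert>

namespace proposed {                 // hypothetical stand-in for std::
    class deterministic {
    public:
        virtual ~deterministic() {}  // the "with virtuals" variant of branch #3.1,
                                     // which also enables dynamic_cast
    };
}

class Widget : public proposed::deterministic {};  // would be [deterministic]
class Gadget : public proposed::deterministic {};  // would be [deterministic]

int main() {
    // A "deterministic void*": it can refer to any deterministic object.
    proposed::deterministic * any = new Widget;
    assert(dynamic_cast<Widget *>(any) != 0);      // concrete type is recoverable
    delete any;                                    // safe thanks to the virtual destructor
}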
Claim #5: When an object of deterministic type is constructed in
automatic or static storage, its destructor will automatically issue a
hard error if there are any outstanding pointers to it (e.g., the
reference count is greater than one).
If that didn't happen, dangling accesses to expired stack variables
might occur:
class [deterministic] Widget {...};
Widget * p;
void Fun() {
    Widget w;
    p = &w;   // outstanding pointer to an automatic object
    // hard runtime error upon exiting this scope
}
Discussion of the basic design
==============================
The desiderata set out above and the constraints of the current C++
language create a causal chain that narrowly guides the possible designs
of an integrated garbage collection + deterministic destruction facility
in C++:
* The class author decides whether the class is deterministic or garbage
collected
* As a natural extension, the class author can decide whether objects of
that type are allowed to sit on the stack or in static storage. (The
regime of automatic and static storage will be discussed below.)
* Depending on whether a type is deterministic versus collected, the
compiler generates different code for copying pointers to the object.
Basically the compiler automates usage of smart pointers, a
widely-followed semiautomatic discipline in C++.
* The heap is conceptually segregated into two realms. You can hold
unrestricted pointers to objects in the garbage-collected realm, but the
garbage-collected realm cannot hold pointers outside of itself.
* The operations allowed on pointers to deterministic objects are
restricted.
Regime of Automatic Storage
===========================
Claim #6: Pointers to either deterministic or collected objects that are
actually stack-allocated should not escape the scope in which their
pointee object exists.
This obvious claim prompts a search for an efficient solution to a class
of problems. Here is an example:
class [auto, collected] Widget {...};
void Midgetize(Widget & obj) {
    obj.Midgetize();
}
void Foo() {
    Widget giantWidget;
    Midgetize(giantWidget);
}
To make the example above work, Foo is forced to heap-allocate the
Widget object even though the Midgetize function works on it
transitorily and stack allocation would suffice.
To address this problem, a pointer/reference modifier, "auto", can be
defined. Its semantics allow only "downward copying": an auto
pointer/reference can only be copied to one of lesser scope, never to an
object of larger scope. Examples:
void foo() {
    Widget w;
    Widget *auto p1 = &w;      // fine, p1 has lesser scope
    {
        Widget *auto p2 = &w;  // fine
        p2 = p1;               // fine
        p1 = p2;               // error! Escaping assignment!
    }
}
Then the example above can be made modularly typesafe and efficient like
this:
class [auto, collected] Widget {...};
void Midgetize(Widget &auto obj) {
    obj.Midgetize();
}
void Foo() {
    Widget giantWidget;
    Midgetize(giantWidget); // fine
}
Claim #7: "auto"-modified pointers cannot be initialized or assigned
from heap-allocated deterministic objects.
If "auto"-modified pointers manipulated the reference count, their
efficiency advantage would be lost. If they didn't, a type-unsafe
situation could easily occur.
Does operator delete still exist?
=================================
For collected objects, delete is a no-op, as it is for static or
automatic objects. On a heap-allocated deterministic object, delete can
simply check whether the reference count is 1 and, if so, null out the
pointer, destroying the sole referent. If the reference count is greater
than one, delete issues a hard error.
Note that this makes delete entirely secure. There is no way to have a
working program that issues a dangling access after delete has been
invoked.
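That behavior can be loosely emulated today with a counted pointer and
an explicit check; the helper below is purely illustrative (the proposal
would build the check into delete itself):

#include <cstdio>
#include <cstdlib>
#include <memory>

// Hypothetical stand-in for delete on a heap-allocated deterministic object.
template <class T>
void checked_delete(std::shared_ptr<T> & p) {
    if (p.use_count() > 1) {                   // outstanding pointers remain
        std::fprintf(stderr, "hard error: deleting a still-referenced object\n");
        std::abort();
    }
    p.reset();                                 // count was 1: destroy and null the pointer
}

Invoked on the last remaining pointer, it destroys the object and nulls
the pointer; invoked while other tracked pointers still exist, it stops
the program, mirroring the guarantee above.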
Regime of Static Storage
========================
Static storage has the peculiarity that it can easily engender
post-destruction access. This is because the order of module
initialization is not defined, and therefore cross-module dependencies
among objects of static duration are problematic.
This article defers discussion of the regime of static storage.
Hopefully, with help from the community, a workable solution to the
cross-module initialization problem will emerge.
Templates
=========
Claim #8: The collection regime of any type must be accessible to
templated code during compilation.
Here's a simple question: is vector<T> deterministic or collected?
If it were collected, it couldn't hold deterministic types (because at
the end of the day vector<T> must embed a T*). If it were deterministic,
collected types couldn't hold vectors of pointers to collected types,
which would be a major and gratuitous restriction.
So the right answer is: vector<T> has the same regime as T.
template <class T, class A>
class [T::collection_regime] vector { // or some other syntax
...
};
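Lacking the extension, the propagation can at least be modeled with an
ordinary nested typedef acting as a trait, which hints at how templated
code could query the regime at compile time (the tag types and the
typedef name are emulation devices, not proposed syntax):

#include <memory>

struct collected_tag {};
struct deterministic_tag {};

class Widget { public: typedef collected_tag regime; };      // a [collected] class
class File   { public: typedef deterministic_tag regime; };  // a [deterministic] class

// A container simply re-exports its element's regime, as vector<T> would.
template <class T, class A = std::allocator<T> >
class vector_like {
public:
    typedef typename T::regime regime;   // same regime as T
    // ... the usual vector interface would follow
};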
The New World: What Would It Look Like?
=======================================
With this design following almost inevitably from the initial set of
constraints, the natural question arises: what would programs look like
in a C++ with these amenities?
Below are some considerations:
* Pointer arithmetic, unions, and casts must be reconsidered (a source
of unsafety not thoroughly discussed)
* Most types would be [collected]. Only a minority of types, those that
manage non-memory resources, would live in the deterministic realm.
* Efficiency of the system will not degrade compared to today's C++. The
reduced need for reference-counted resources would allow free and fast
pointer copying for many objects; the minority that need care in
lifetime management will stay tracked by the compiler, much the way they
are likely manipulated (by hand) today anyway.
* Given that the compiler can apply advanced analysis to eliminate
reference count manipulation in many cases, the quality of built-in
reference counting would likely be superior to manually-implemented
reference counting, and on a par with careful manual manipulation of a
mix of raw and smart pointers.
----------------------
Whew! Please send any comments you have to this group. Thanks!
Andrei