This section outlines some of the library's major features. Except as necessary to avoid confusion, details of library implementation are omitted.
Exposing Classes
C++ classes and structs are exposed with a similarly terse interface. Given:
struct World
{
    void set(std::string msg) { this->msg = msg; }
    std::string greet() { return msg; }
    std::string msg;
};
The following code will expose it in our extension module:
#include <boost/python.hpp>
using namespace boost::python; // class_ and def are used unqualified below
BOOST_PYTHON_MODULE(hello)
{
    class_<World>("World")
        .def("greet", &World::greet)
        .def("set", &World::set)
    ;
}
Although this code has a certain pythonic familiarity, people sometimes find the syntax a bit confusing because it doesn't look like most of the C++ code they're used to. All the same, this is just standard C++. Because of their flexible syntax and operator overloading, C++ and Python are great for defining domain-specific (sub)languages (DSLs), and that's what we've done in Boost.Python. To break it down:
class_<World>("World")
constructs an unnamed object of type class_<World> and passes "World" to its constructor. This creates a new-style Python class called World in the extension module, and associates it with the C++ type World in the Boost.Python type conversion registry. We might have also written:
class_<World> w("World");
but that would've been more verbose, since we'd have to name w again to invoke its def() member function:
w.def("greet", &World::greet)
There's nothing special about the location of the dot for member access in the original example: C++ allows any amount of whitespace on either side of a token, and placing the dot at the beginning of each line allows us to chain as many successive calls to member functions as we like with a uniform syntax. The other key fact that allows chaining is that class_<> member functions all return a reference to *this.
So the example is equivalent to:
class_<World> w("World");
w.def("greet", &World::greet);
w.def("set", &World::set);
It's occasionally useful to be able to break down the components of a Boost.Python class wrapper in this way, but the rest of this article will stick to the terse syntax.
For completeness, here's the wrapped class in use:
>>> import hello
>>> planet = hello.World()
>>> planet.set('howdy')
>>> planet.greet()
'howdy'
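The chaining described above depends only on each call returning the object it was invoked on. As a rough illustration in Python, where returning self plays the role of returning a reference to *this, the sketch below uses a hypothetical ClassBuilder class; it is not part of Boost.Python and only mimics the shape of class_<...>.

```python
class ClassBuilder:
    """Hypothetical stand-in for class_<T>; not part of Boost.Python."""

    def __init__(self, name):
        self.name = name
        self.methods = {}

    def def_(self, name, func):
        # Record the method, then return the builder so calls can be chained.
        self.methods[name] = func
        return self


def greet(world):
    return world["msg"]


def set_msg(world, msg):
    world["msg"] = msg


# Parentheses let the chain span several lines, one call per line.
w = (
    ClassBuilder("World")
    .def_("greet", greet)
    .def_("set", set_msg)
)
```

Because def_ returns the builder, the chained form and three separate statements on a named variable build exactly the same object.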
Constructors
Since our World class is just a plain struct, it has an implicit no-argument (nullary) constructor. Boost.Python exposes the nullary constructor by default, which is why we were able to write:
>>> planet = hello.World()
However, well-designed classes in any language may require constructor arguments in order to establish their invariants. Unlike Python, where __init__ is just a specially-named method, in C++ constructors cannot be handled like ordinary member functions. In particular, we can't take their address: &World::World is an error. The library provides a different interface for specifying constructors. Given:
struct World
{
    World(std::string msg); // added constructor
    ...
we can modify our wrapping code as follows:
class_<World>("World", init<std::string>())
    ...
Of course, a C++ class may have additional constructors, and we can expose those as well by passing more instances of init<...> to def():
class_<World>("World", init<std::string>())
    .def(init<double, double>())
    ...
Boost.Python allows wrapped functions, member functions, and constructors to be overloaded to mirror C++ overloading.
Data Members and Properties
Any publicly-accessible data members in a C++ class can be easily exposed as either readonly or readwrite attributes:
class_<World>("World", init<std::string>())
    .def_readonly("msg", &World::msg)
    ...
and can be used directly in Python:
>>> planet = hello.World('howdy')
>>> planet.msg
'howdy'
This does not result in adding attributes to the World instance __dict__, which can result in substantial memory savings when wrapping large data structures. In fact, no instance __dict__ will be created at all unless attributes are explicitly added from Python. Boost.Python owes this capability to the new Python 2.2 type system, in particular the descriptor interface and property type.
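The descriptor mechanism mentioned above can be observed in plain Python. The sketch below is only an analogy, assuming modern Python 3: it uses __slots__ to suppress the per-instance __dict__ and a property stored on the class to provide read-only access, which is roughly the shape of what def_readonly sets up for a wrapped class.

```python
class World:
    # __slots__ means no per-instance __dict__ is allocated.
    __slots__ = ("_msg",)

    def __init__(self, msg):
        self._msg = msg

    # A property is a descriptor that lives on the class, not on each instance.
    # With no setter supplied, the attribute is read-only.
    msg = property(lambda self: self._msg)


planet = World("howdy")
has_dict = hasattr(planet, "__dict__")

try:
    planet.msg = "hi"
    read_only = False
except AttributeError:
    read_only = True
```

Here planet.msg is served entirely by the class-level descriptor, so the instance carries no attribute dictionary of its own.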
In C++, publicly-accessible data members are considered a sign of poor design because they break encapsulation, and style guides usually dictate the use of "getter" and "setter" functions instead. In Python, however, __getattr__, __setattr__, and since 2.2, property mean that attribute access is just one more well-encapsulated syntactic tool at the programmer's disposal. Boost.Python bridges this idiomatic gap by making Python property creation directly available to users. If msg were private, we could still expose it as an attribute in Python as follows:
class_<World>("World", init<std::string>())
    .add_property("msg", &World::greet, &World::set)
    ...
The example above mirrors the familiar usage of properties in Python 2.2+:
>>> class World(object):
...     def __init__(self, msg):
...         self.__msg = msg
...     def greet(self):
...         return self.__msg
...     def set(self, msg):
...         self.__msg = msg
...     msg = property(greet, set)
Operator Overloading
The ability to write arithmetic operators for user-defined types has been a major factor in the success of both languages for numerical computation, and the success of packages like NumPy attests to the power of exposing operators in extension modules. Boost.Python provides a concise mechanism for wrapping operator overloads. The example below shows a fragment from a wrapper for the Boost rational number library:
class_<rational<int> >("rational_int")
    .def(init<int, int>()) // constructor, e.g. rational_int(3,4)
    .def("numerator", &rational<int>::numerator)
    .def("denominator", &rational<int>::denominator)
    .def(-self)        // __neg__ (unary minus)
    .def(self + self)  // __add__ (homogeneous)
    .def(self * self)  // __mul__
    .def(self + int()) // __add__ (heterogeneous)
    .def(int() + self) // __radd__
    ...
The magic is performed using a simplified application of "expression templates" [VELD1995], a technique originally developed for optimization of high-performance matrix algebra expressions. The essence is that instead of performing the computation immediately, operators are overloaded to construct a type representing the computation. In matrix algebra, dramatic optimizations are often available when the structure of an entire expression can be taken into account, rather than evaluating each operation "greedily". Boost.Python uses the same technique to build an appropriate Python method object based on expressions involving self.
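The idea of overloading operators to describe a computation rather than perform it can be imitated at runtime in Python. The sketch below is an analogy only, not how Boost.Python is implemented: SelfPlaceholder, OpSpec, and the Rational wrapper with its def_ method are all hypothetical names, and the standard fractions.Fraction type stands in for the wrapped C++ rational class.

```python
import operator
from fractions import Fraction


class OpSpec:
    """Describes an operator to expose: the Python slot name and the function to run."""

    def __init__(self, slot, func):
        self.slot = slot
        self.func = func


class SelfPlaceholder:
    """Operators on this object build an OpSpec instead of computing a value."""

    def __neg__(self):
        return OpSpec("__neg__", operator.neg)

    def __add__(self, other):
        return OpSpec("__add__", operator.add)

    def __radd__(self, other):
        # Reached for expressions such as int() + self_.
        return OpSpec("__radd__", lambda a, b: b + a)

    def __mul__(self, other):
        return OpSpec("__mul__", operator.mul)


self_ = SelfPlaceholder()


class Rational:
    """Thin wrapper around Fraction that starts with no operators of its own."""

    def __init__(self, n, d=1):
        self.value = Fraction(n, d)

    @classmethod
    def def_(cls, spec):
        def method(*args):
            # Unwrap operands, run the recorded operation, and wrap the result.
            raw = [a.value if isinstance(a, cls) else a for a in args]
            result = cls.__new__(cls)
            result.value = spec.func(*raw)
            return result

        setattr(cls, spec.slot, method)
        return cls


Rational.def_(-self_).def_(self_ + self_).def_(self_ * self_).def_(int() + self_)
```

After the last line, Rational(1, 2) + Rational(1, 3) and 1 + Rational(1, 2) both work, even though the expressions passed to def_ never computed anything themselves.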
Inheritance
C++ inheritance relationships can be represented to Boost.Python by adding an optional bases<...> argument to the class_<...> template parameter list as follows:
class_<Derived, bases<Base1,Base2> >("Derived")
    ...
This has two effects:
- When the class_<...> is created, Python type objects corresponding to Base1 and Base2 are looked up in Boost.Python's registry, and are used as bases for the new Python Derived type object, so methods exposed for the Python Base1 and Base2 types are automatically members of the Derived type. Because the registry is global, this works correctly even if Derived is exposed in a different module from either of its bases.
- C++ conversions from Derived to its bases are added to the Boost.Python registry. Thus wrapped C++ methods expecting (a pointer or reference to) an object of either base type can be called with an object wrapping a Derived instance. Wrapped member functions of class T are treated as though they have an implicit first argument of T&, so these conversions are necessary to allow the base class methods to be called for derived objects.
Of course it's possible to derive new Python classes from wrapped C++ class instances. Because Boost.Python uses the new-style class system, that works very much as for the Python built-in types. There is one significant detail in which it differs: the built-in types generally establish their invariants in their __new__ function, so that derived classes do not need to call __init__ on the base class before invoking its methods:
>>> class L(list):
...     def __init__(self):
...         pass
...
>>> L().reverse()
>>>
Because C++ object construction is a one-step operation, C++ instance data cannot be constructed until the arguments are available, in the __init__ function:
>>> class D(SomeBoostPythonClass):
...     def __init__(self):
...         pass
...
>>> D().some_boost_python_method()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: bad argument type for built-in operation
This happened because Boost.Python couldn't find instance data of type SomeBoostPythonClass within the D instance; D's __init__ function masked construction of the base class. It could be corrected by either removing D's __init__ function or having it call SomeBoostPythonClass.__init__(...) explicitly.
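The second correction can be sketched in pure Python. SomeWrappedClass below is a hypothetical stand-in written in Python, not a real extension class, so the failure surfaces as an AttributeError rather than the TypeError that Boost.Python reports; the point is only that the derived __init__ must forward to the base one.

```python
class SomeWrappedClass:
    """Hypothetical stand-in for a wrapped C++ class: its state is built in __init__."""

    def __init__(self, msg="howdy"):
        self._msg = msg

    def greet(self):
        return self._msg


class Broken(SomeWrappedClass):
    def __init__(self):
        pass  # the base __init__ never runs, so no instance data exists


class Fixed(SomeWrappedClass):
    def __init__(self):
        # Explicitly construct the base part before using any of its methods.
        SomeWrappedClass.__init__(self, "hello")


try:
    Broken().greet()
    broken_failed = False
except AttributeError:
    broken_failed = True

fixed_greeting = Fixed().greet()
```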
Virtual Functions
Deriving new types in Python from extension classes is not very interesting unless they can be used polymorphically from C++. In other words, Python method implementations should appear to override the implementation of C++ virtual functions when called through base class pointers/references from C++. Since the only way to alter the behavior of a virtual function is to override it in a derived class, the user must build a special derived class to dispatch a polymorphic class' virtual functions:
//
// interface to wrap:
//
class Base
{
 public:
    virtual int f(std::string x) { return 42; }
    virtual ~Base() {}
};

// Takes a non-const reference because f() is a non-const member function
int calls_f(Base& b, std::string x) { return b.f(x); }

//
// Wrapping Code
//

// Dispatcher class
struct BaseWrap : Base
{
    // Store a pointer to the Python object
    BaseWrap(PyObject* self_) : self(self_) {}
    PyObject* self;

    // Default implementation, for when f is not overridden
    int f_default(std::string x) { return this->Base::f(x); }
    // Dispatch implementation
    int f(std::string x) { return call_method<int>(self, "f", x); }
};
...
    def("calls_f", calls_f);
    class_<Base, BaseWrap>("Base")
        .def("f", &Base::f, &BaseWrap::f_default)
        ;
Now here's some Python code which demonstrates:
>>> class Derived(Base):
...     def f(self, s):
...         return len(s)
...
>>> calls_f(Base(), 'foo')
42
>>> calls_f(Derived(), 'forty-two')
9
Things to notice about the dispatcher class:
- The key element which allows overriding in Python is the call_method invocation, which uses the same global type conversion registry as the C++ function wrapping does to convert its arguments from C++ to Python and its return type from Python to C++.
- Any constructor signatures you wish to wrap must be replicated with an initial PyObject* argument.
- The dispatcher must store this argument so that it can be used to invoke call_method.
- The f_default member function is needed when the function being exposed is not pure virtual; there's no other way Base::f can be called on an object of type BaseWrap, since it overrides f.
Deeper Reflection on the Horizon?
Admittedly, this formula is tedious to repeat, especially on a project with many polymorphic classes. That it is necessary reflects some limitations in C++'s compile-time introspection capabilities: there's no way to enumerate the members of a class and find out which are virtual functions. At least one very promising project has been started to write a front-end which can generate these dispatchers (and other wrapping code) automatically from C++ headers.
Pyste is being developed by Bruno da Silva de Oliveira. It builds on GCC_XML, which generates an XML version of GCC's internal program representation. Since GCC is a highly-conformant C++ compiler, this ensures correct handling of the most-sophisticated template code and full access to the underlying type system. In keeping with the Boost.Python philosophy, a Pyste interface description is neither intrusive on the code being wrapped, nor expressed in some unfamiliar language: instead it is a 100% pure Python script. If Pyste is successful it will mark a move away from wrapping everything directly in C++ for many of our users. It will also allow us the choice to shift some of the metaprogram code from C++ to Python. We expect that soon, not only our users but the Boost.Python developers themselves will be "thinking hybrid" about their own code. (Translator's note: Pyste is no longer maintained; the more recent tool is Py++.)
Serialization
Serialization is the process of converting objects in memory to a form that can be stored on disk or sent over a network connection. The serialized object (most often a plain string) can be retrieved and converted back to the original object. A good serialization system will automatically convert entire object hierarchies. Python's standard pickle module is just such a system. It leverages the language's strong runtime introspection facilities for serializing practically arbitrary user-defined objects. With a few simple and unintrusive provisions this powerful machinery can be extended to also work for wrapped C++ objects. Here is an example:
#include <string>

struct World
{
    World(std::string a_msg) : msg(a_msg) {}
    std::string greet() const { return msg; }
    std::string msg;
};

#include <boost/python.hpp>
using namespace boost::python;

struct World_picklers : pickle_suite
{
    static tuple
    getinitargs(World const& w) { return make_tuple(w.greet()); }
};

BOOST_PYTHON_MODULE(hello)
{
    class_<World>("World", init<std::string>())
        .def("greet", &World::greet)
        .def_pickle(World_picklers())
    ;
}
Now let's create a World object and put it to rest on disk:
>>> import hello
>>> import pickle
>>> a_world = hello.World("howdy")
>>> pickle.dump(a_world, open("my_world", "w"))
In a potentially different script on a potentially different computer with a potentially different operating system:
>>> import pickle
>>> resurrected_world = pickle.load(open("my_world", "r"))
>>> resurrected_world.greet()
'howdy'
Of course the cPickle module can also be used for faster processing.
Boost.Python's pickle_suite fully supports the pickle protocol defined in the standard Python documentation. Like a __getinitargs__ function in Python, the pickle_suite's getinitargs() is responsible for creating the argument tuple that will be used to reconstruct the pickled object. The other elements of the Python pickling protocol, __getstate__ and __setstate__, can optionally be provided via C++ getstate and setstate functions. C++'s static type system allows the library to ensure at compile-time that nonsensical combinations of functions (e.g. getstate without setstate) are not used.
Enabling serialization of more complex C++ objects requires a little more work than is shown in the example above. Fortunately the object interface (see next section) greatly helps in keeping the code manageable.
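To make the division of labour between constructor arguments and extra state concrete, here is a pure-Python sketch of the same protocol, assuming Python 3. It is not Boost.Python code. Python 3's pickle no longer consults __getinitargs__, so __reduce__ supplies the constructor arguments in its place, while __getstate__ and __setstate__ carry the remaining state; the visits counter is a hypothetical field added only for illustration.

```python
import pickle


class World:
    def __init__(self, msg):
        self.msg = msg
        self.visits = 0  # hypothetical extra state, not in the C++ example

    def greet(self):
        self.visits += 1
        return self.msg

    def __reduce__(self):
        # (callable, constructor args, state): the args tuple plays the role
        # that getinitargs() plays in a pickle_suite.
        return (World, (self.msg,), self.__getstate__())

    def __getstate__(self):
        return {"visits": self.visits}

    def __setstate__(self, state):
        self.visits = state["visits"]


a_world = World("howdy")
a_world.greet()

data = pickle.dumps(a_world)
resurrected_world = pickle.loads(data)
```

On loading, pickle first calls World("howdy") and then hands the saved dictionary to __setstate__, so both the message and the counter survive the round trip.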
Object Interface
Experienced 'C' language extension module authors will be familiar with the ubiquitous PyObject*, manual reference-counting, and the need to remember which API calls return "new" (owned) references or "borrowed" (raw) references. These constraints are not just cumbersome but also a major source of errors, especially in the presence of exceptions.
Boost.Python provides a class object which automates reference counting and provides conversion to Python from C++ objects of arbitrary type. This significantly reduces the learning effort for prospective extension module writers.
Creating an object from any other type is extremely simple:
object s("hello, world"); // s manages a Python string
object has templated interactions with all other types, with automatic to-python conversions. It happens so naturally that it's easily overlooked:
object ten_Os = 10 * s[4]; // -> "oooooooooo"
In the example above, 4 and 10 are converted to Python objects before the indexing and multiplication operations are invoked.
The extract<T> class template can be used to convert Python objects to C++ types:
double x = extract<double>(o);
If a conversion in either direction cannot be performed, an appropriate exception is thrown at runtime.
The object type is accompanied by a set of derived types that mirror the Python built-in types such as list, dict, tuple, etc. as much as possible. This enables convenient manipulation of these high-level types from C++:
dict d;
d["some"] = "thing";
d["lucky_number"] = 13;
list l = d.keys();
This almost looks and works like regular Python code, but it is pure C++. Of course we can wrap C++ functions which accept or return object instances.
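For comparison, the C++ snippets in this section correspond closely to the following plain Python, which may help in judging how near the object interface comes to the original language. The float() call is only a loose analogue of extract<double>, and the variable names are chosen for illustration.

```python
# Plain-Python counterparts of the C++ object-interface snippets.
s = "hello, world"
ten_Os = 10 * s[4]  # index, then repeat the resulting one-character string

d = {}
d["some"] = "thing"
d["lucky_number"] = 13
keys = list(d.keys())

# Loose analogue of extract<double>: convert a Python value to a float,
# raising an exception at runtime if the conversion is impossible.
x = float(d["lucky_number"])
```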