The original post is here.
Why did I write this article?
As I learned more about Android’s graphics system, and worked more on using the CPU and GPU in parallel to improve graphics performance on Android, I started to think that there are some big design mistakes in the Android graphics system, especially in the rendering architecture on the client side.
Some of these mistakes have been fixed after 3.x, especially from 4.1 onward, but others can never be fixed for compatibility reasons. As developers, we need to know how the Android graphics system works, how to use the new features that Android 3.x and 4.x provide, and how to optimize and work around the shortcomings of Android.
Direct Rendering
In the very beginning, the client/server architecture of Android’s graphics system chose the direct rendering mode. The Android team developed its own lightweight window compositor, SurfaceFlinger, which is responsible for creating native windows, allocating graphics buffers, and compositing the window surfaces to the display (via the framebuffer). On the client side, apps (through the Android runtime) can use the CPU to access the graphics buffer of a native window directly (through Skia), or use the GPU to manipulate it indirectly (through GLES). This direct rendering mode is more appropriate than the traditional X in desktop Linux.
Window Composition on the Server Side
In the very beginning, SurfaceFlinger used GLES1 to do the window composition (via texture operations), which caused two issues:
- The GPU may have a more efficient way to do window composition than using GLES;
- When the client side also uses GLES to render the window surface, it may compete with the server for GPU resources;
As a side note, when the GPU does not support GLES1, Android has a built-in software GLES1 stack, and it can use the copybit HAL module provided by the chip vendor to accelerate the texture operations.
So, after 3.0, Android introduced the hwcomposer HAL module to solve these issues, and abandoned the old copybit module. By implementing the hwcomposer module, a chip vendor can declare that it does all or part of the window composition itself, usually with dedicated hardware that is more efficient than using GLES. Also, after 4.1, hwcomposer can provide the vsync signal of the display, so that Android can sync three things together:
- the window composition on the server side
- the rendering of the window surface on the client side
- the refresh of the display
Rendering Architecture on the Client Side
The biggest mistake the Android team made is the rendering architecture of its GUI framework:
- It does not have a layer rendering architecture (called a scene graph in some GUI frameworks);
- It does not have a dedicated rendering thread to render the window surface;
- Its rendering used only the CPU until 3.0;
The first one has been partially supported since 3.0 and the third has been supported since 3.0, but the second problem can never be solved…
Compared to iOS
In iOS, every View has a Layer as its backing store, and an app can create more Layers for better performance. A View’s content is drawn into its Layer, and as long as the content does not change, the View does not need to draw itself again. iOS does a lot to avoid changing the content of a View: many properties of a View can be changed without affecting its content, such as background, border, opacity, position, transformation, even the geometry!!!
The composition of Layers is done by another, dedicated rendering thread, which always uses the GPU to draw each Layer to the window surface. The main thread is only responsible for handling touch events, laying out the Views again, drawing a View’s content into its Layer when necessary, etc… So the main thread only uses the CPU and the rendering thread mostly uses the GPU. I think there is only a little synchronization between these two threads, so they can run concurrently most of the time.
But in Android, the main thread needs to do everything: handle touch events, lay out the views again, dequeue/enqueue graphics buffers, draw the views’ content on the window surface, and whatever else the app needs to do… And it only used the CPU before 3.0!!! Even if the position of a View changes by just one pixel, Android needs to redraw its content along with the content of the other Views that overlap it, and content drawing is very expensive for the CPU!!!
The Improvements
A lot of improvements have been made since 3.0 to overcome the shortcomings of previous versions. Android 3.0 introduced a new hwui module, which can use the GPU to accelerate the drawing of a View’s content. It creates a hw accelerated Canvas to replace the old sw Canvas; the new Canvas uses OpenGL ES as its backend instead of SkCanvas from Skia.
Android 3.0 also introduced the DisplayList mechanism. A DisplayList can be thought of as a buffer of 2D drawing commands. Every View has its own DisplayList, and when its onDraw method is called, all the drawing commands issued through the Canvas passed in are actually stored in that DisplayList. When every DisplayList is ready, Android draws all the DisplayLists, which turns the 2D drawing commands into GLES commands that use the GPU to render the window surface. So the rendering of the View hierarchy is actually separated into two steps: generate each View’s DisplayList, and then draw the DisplayLists; the second step mostly uses the GPU.
When an app invalidates a View, Android needs to regenerate its DisplayList, but the DisplayLists of the overlapping Views can stay untouched. Also, Android 4.1 introduced DisplayList properties: a DisplayList can now have properties such as opacity, transformation, etc…, and a change to some properties of a View only changes the corresponding properties of its DisplayList, with no need to regenerate it. These two improvements save time by avoiding unnecessary regeneration of DisplayLists.
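To make the two steps more concrete, here is a minimal sketch of the record-then-replay idea in plain Java. It is not the hwui code: every name in it (DisplayListSketch, its nested Canvas, DisplayList and View, the recordCount counter) is made up for illustration, and the "replay" only builds strings where a real renderer would issue GLES calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the idea only; none of these types are Android classes.
final class DisplayListSketch {

    /** Receives drawing commands (stands in for a Canvas). */
    interface Canvas {
        void drawRect(int w, int h);
        void drawText(String text);
    }

    /** Records commands instead of executing them; properties are applied at replay time. */
    static final class DisplayList implements Canvas {
        private final List<String> commands = new ArrayList<>();
        float alpha = 1.0f;       // property: changing it needs no re-recording
        float translationX = 0f;  // property: changing it needs no re-recording

        @Override public void drawRect(int w, int h) { commands.add("rect " + w + "x" + h); }
        @Override public void drawText(String text) { commands.add("text " + text); }

        void clear() { commands.clear(); }

        /** Replays the recorded commands; a real renderer would turn them into GLES calls. */
        List<String> replay() {
            List<String> out = new ArrayList<>();
            for (String c : commands) {
                out.add(c + " @x=" + translationX + " alpha=" + alpha);
            }
            return out;
        }
    }

    /** A view that counts how often its (expensive) onDraw actually runs. */
    static final class View {
        final DisplayList displayList = new DisplayList();
        final String label;
        int recordCount = 0;
        private boolean dirty = true;

        View(String label) { this.label = label; }

        void invalidate() { dirty = true; }

        void onDraw(Canvas canvas) {
            canvas.drawRect(100, 40);
            canvas.drawText(label);
        }

        /** Step 1: regenerate the display list only if the content changed. */
        void updateDisplayListIfDirty() {
            if (!dirty) return;
            displayList.clear();
            onDraw(displayList);
            recordCount++;
            dirty = false;
        }
    }

    /** Step 2: draw every display list, re-recording only the dirty ones first. */
    static List<String> drawFrame(List<View> views) {
        List<String> frame = new ArrayList<>();
        for (View v : views) {
            v.updateDisplayListIfDirty();
            frame.addAll(v.displayList.replay());
        }
        return frame;
    }

    public static void main(String[] args) {
        View a = new View("A");
        View b = new View("B");
        List<View> views = Arrays.asList(a, b);
        drawFrame(views);                   // first frame: both views record
        a.displayList.translationX = 5f;    // property change: no re-recording
        drawFrame(views);
        a.invalidate();                     // content change: only A re-records
        drawFrame(views);
        System.out.println("A recorded " + a.recordCount + " times, B recorded " + b.recordCount + " times");
    }
}
```

In this model, moving A only touches a property and invalidating A only re-runs A’s onDraw, while B’s recorded commands are reused in every frame.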
Although Android can never have a layer rendering architecture, it did introduce some Layer support after 3.0: a View can have a Layer as its backing store. The so-called HW Layer is actually backed by an FBO. If the content of a View is very complicated and unlikely to change in the future, using a Layer may help. Also, when a View is animating (but its content does not change), caching it and its parent with Layers may also help. But use Layers with caution, because they increase memory usage. If you want to use Layers during an animation, you may need to release them when the animation finishes; Android 4.2 provides a new animation API to help you with that. Also, because Android uses GLES to draw the content of Views, most Views draw fast enough, and using a Layer may be unnecessary.
Android 4.0 also introduced a new type of native window, SurfaceTextureClient (backed by a SurfaceTexture), and its Java wrapper TextureView. An app can create and own this kind of native window and is responsible for its composition. If the content of a View is very complicated and keeps changing, TextureView will be very helpful: the app can use another thread to generate the content of the TextureView and notify the main thread to update, and the main thread can use the TextureView as a normal texture and draw it directly on the window of the current Activity. (TextureView can also replace the original SurfaceView and GLSurfaceView.)
Android 4.1 made touch event handling and UI drawing sync with the vsync signal of the display. It also uses triple buffering so that the main thread is not blocked too often while it waits for SurfaceFlinger to do the page flip and release the previous front buffer; SurfaceFlinger itself always syncs with the vsync signal of the display.
The OpenGL renderer for the hw accelerated Canvas continues to be improved and to become faster, especially for drawing complicated shapes.
But…
But Android can never have a dedicated rendering thread… Although drawing is much faster than before, and Android keeps the changes during an animation as small as possible, drawing still has to share the 16ms interval with the other jobs in the main thread to achieve 60 fps.
So…
So, as developers, we need to use the improvements in newer versions of Android as much as possible:
- Turn on the GPU acceleration switch on Android 3.0 and above;
- Use the new Animation Framework for your animation;
- Use Layer and TextureView when necessary;
- etc…
And avoid blocking the main thread as much as possible, which means:
- If handling a touch event takes too long, do it in another thread;
- If you need to load a file from the sdcard or read data from a database, do it in another thread;
- If you need to decode a bitmap, do it in another thread;
- If your View’s content is too complicated, use a Layer; if it keeps changing, use a TextureView and generate its content in another thread;
- You can even use another standalone window (such as a SurfaceView) as an overlay and render it in another thread;
- etc…
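The pattern behind most of these items is the same: run the slow job on a worker thread, then hand only the result back to the main thread. Below is a rough sketch in plain Java, not Android code. The single-thread executor named "main-sim" only stands in for the main thread’s message loop, and OffloadSketch, slowDecode and loadThenShow are hypothetical names. On Android you would use the platform’s own tools for this (AsyncTask is one of them).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: a single-thread executor plays the role of the main (UI) thread.
final class OffloadSketch {

    static ExecutorService namedSingleThread(String name) {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, name);
            t.setDaemon(true); // let the JVM exit when the demo finishes
            return t;
        });
    }

    // Stand-in for the main thread's message loop (an assumption of this sketch).
    static final ExecutorService mainThread = namedSingleThread("main-sim");
    // Background thread for slow jobs.
    static final ExecutorService worker = namedSingleThread("worker");

    /** Pretend-expensive job, e.g. decoding a bitmap or reading a file. */
    static String slowDecode(String name) {
        try {
            Thread.sleep(50); // simulated cost; far too long for one 16ms frame
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "decoded:" + name + " on " + Thread.currentThread().getName();
    }

    /** Runs the slow job on the worker, then delivers the result on the simulated main thread. */
    static CompletableFuture<String> loadThenShow(String name) {
        return CompletableFuture
                .supplyAsync(() -> slowDecode(name), worker)
                .thenApplyAsync(
                        result -> result + ", shown on " + Thread.currentThread().getName(),
                        mainThread);
    }

    public static void main(String[] args) {
        // join() blocks only this demo thread; the simulated main thread stays free meanwhile.
        System.out.println(loadThenShow("photo.png").join());
    }
}
```

The point of the sketch is that the simulated main thread only runs the short final step, so the 50ms job never takes time away from a frame.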
Golden Rules for Butter Graphics
- Whatever you can do in another thread, do it in another thread;
- Whatever you must do in the main thread, do it fast;
- Always profile, it is your dearest friend;
All you need to do is keep each loop of the main thread under the 16ms interval, and every frame will be perfect!
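The 16ms number is just the refresh interval of a 60 Hz display: 1000 ms / 60 ≈ 16.67 ms. As a rough illustration of what going over it costs, the plain Java sketch below computes how many vsync deadlines one loop of the main thread misses. FrameBudgetSketch and its methods are hypothetical helpers, the durations in main are made-up numbers, and the model assumes the loop starts exactly on a vsync, which is a simplification. It is no replacement for a real profiler.

```java
// Hypothetical helper for reasoning about the frame budget; not an Android API.
final class FrameBudgetSketch {

    /** Length of one display refresh interval in nanoseconds, about 16.67 ms at 60 Hz. */
    static long frameIntervalNanos(int refreshHz) {
        return 1_000_000_000L / refreshHz;
    }

    /**
     * How many vsync deadlines one main-thread loop misses,
     * assuming the loop starts exactly on a vsync (a simplification).
     */
    static long missedVsyncs(long workNanos, int refreshHz) {
        if (workNanos <= 0) return 0;
        return (workNanos - 1) / frameIntervalNanos(refreshHz); // ceil(work / interval) - 1
    }

    /** Times a piece of work with the monotonic clock. */
    static long timeNanos(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long[] simulatedLoops = {8_000_000L, 16_000_000L, 20_000_000L, 40_000_000L}; // made-up durations
        for (long nanos : simulatedLoops) {
            System.out.println(nanos / 1_000_000 + " ms -> missed vsyncs: " + missedVsyncs(nanos, 60));
        }
        long measured = timeNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i; // some arbitrary work to time
        });
        System.out.println("measured job: " + measured + " ns, missed vsyncs: " + missedVsyncs(measured, 60));
    }
}
```

In this model a 20ms loop already costs one dropped frame even though it is only a few milliseconds over the interval, which is why the budget has to hold for every single loop.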
The Last Word
What I thought about most when I finished this article is this: when you make a big design mistake in the very beginning and cannot change it for some reason, then no matter how hard you try to patch it later, it will never be perfect again.
The Android team made a huge effort to provide features like Strict Mode, AsyncTask, Concurrent & Background GC, hw accelerated Canvas, hw Layer, TextureView, vsync, triple buffers, etc… All of them are about just two things: do everything you can in another thread, and make what must be done in the main thread faster. These actually help a lot. But no matter how hard you try to use these features to improve your app, it is nearly impossible to get every frame perfect, because it is so hard to predict everything the main thread will do in every loop. It may suddenly jump into something totally outside your app’s control and make you blow the 16ms budget.
And what is worse: if you have a good design, you can keep improving it (for example, to use more advanced hardware), and developers do not need to know anything about it or change even one line of their code. If the app runs on a newer OS version with a faster device, it just gets the performance boost. If you have a bad design, then even though you do a lot to patch it, developers need to know why, they need to rewrite their code to use the new features, they need to know how to avoid the traps, they need to know how to stay compatible with older system versions, and many, many other things…
This is too hard for a developer who just wants to build a useful and beautiful app…