Generic Image Library Tutorial
This document will give you a jump-start in using GIL. It does not discuss the underlying design of the library and does not cover all aspects of it. You can find a detailed library design document on the main GIL web page at http://opensource.adobe.com/gil
Installation

The latest version of GIL can be downloaded from GIL's web page, at http://opensource.adobe.com/gil. GIL is approved for integration into Boost and in the future will be installed simply by installing Boost from http://www.boost.org. GIL consists of header files only and does not require any libraries to link against. It does not require Boost to be built. Including boost/gil/gil_all.hpp will be sufficient for most projects.

Example - Computing the Image Gradient

This tutorial will walk through an example of using GIL to compute image gradients. We will start with some very simple and non-generic code and make it more generic as we go along. Let us start with a horizontal gradient and use the simplest possible approximation to a gradient - central difference. The gradient at pixel x can be approximated with the half-difference of its two neighboring pixels:

    D[x] = (I[x-1] - I[x+1]) / 2

For simplicity, we will also ignore the boundary cases - the pixels along the edges of the image for which one of the neighbors is not defined. The focus of this document is how to use GIL, not how to create a good gradient generation algorithm.

Interface and Glue Code

Let us start with an 8-bit unsigned grayscale image as the input and an 8-bit signed grayscale image as the output. Here is what the interface to our algorithm looks like:
    #include <cassert>
    #include <boost/gil/gil_all.hpp>
    using namespace boost::gil;

    void x_gradient(const gray8c_view_t& src, const gray8s_view_t& dst) {
        assert(src.dimensions() == dst.dimensions());
        ...    // compute the gradient
    }
GIL makes a distinction between an image and an image view. A GIL image view is a shallow, lightweight view of a rectangular grid of pixels. It provides access to the pixels but does not own them. Copy-constructing a view does not deep-copy the pixels. Image views do not propagate their constness to the pixels and should always be taken by const reference. Whether a view is mutable or read-only (immutable) is a property of the view type.

A GIL image, on the other hand, is a view with associated ownership. It is a container of pixels: its constructor and destructor allocate and deallocate the pixels, its copy constructor performs a deep copy of the pixels, and its operator== performs a deep comparison of the pixels. Images also propagate their constness to their pixels - a constant reference to an image will not allow modifying its pixels.
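The difference between a shallow view and an owning image can be illustrated with a small standalone sketch. The types gray_view and gray_image and the function view below are hypothetical stand-ins written only with the standard library; they are not GIL's classes and only mimic the copy and constness behavior described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; these are not GIL's classes.

// A view: a pointer plus dimensions. Copying it copies no pixels.
struct gray_view {
    unsigned char* pixels;
    std::ptrdiff_t row_bytes;
    int width, height;

    // A const member returning a mutable reference: the constness of the
    // view object does not propagate to the pixels it refers to.
    unsigned char& operator()(int x, int y) const {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return pixels[y * row_bytes + x];
    }
};

// An image: owns its pixels, so copying it deep-copies them.
struct gray_image {
    int width, height;
    std::vector<unsigned char> data;

    gray_image(int w, int h) : width(w), height(h), data(std::size_t(w) * h, 0) {}
};

// Returns a mutable view over the pixels owned by the image.
inline gray_view view(gray_image& img) {
    return gray_view{img.data.data(), img.width, img.width, img.height};
}
```

Writing through a copy of a view changes the pixels seen by the original view, whereas a copy of the image keeps its own pixels.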
Most GIL algorithms operate on image views; images are rarely needed. GIL's design is very similar to that of the STL. The STL equivalent of GIL's image is a container, such as std::vector.

GIL's image views can be constructed from raw data - the dimensions, the number of bytes per row and the pixels, which for chunky views are represented with one pointer. Here is how to provide the glue between your code and GIL:
    void ComputeXGradientGray8(const unsigned char* src_pixels, ptrdiff_t src_row_bytes, int w, int h,
                               signed char* dst_pixels, ptrdiff_t dst_row_bytes) {
        gray8c_view_t src = interleaved_view(w, h, (const gray8_pixel_t*)src_pixels, src_row_bytes);
        gray8s_view_t dst = interleaved_view(w, h, (gray8s_pixel_t*)dst_pixels, dst_row_bytes);
        x_gradient(src, dst);
    }

This glue code is very fast and views are lightweight - in the above example the views have a size of 16 bytes (assuming 4-byte pointers and integers). They consist of a pointer to the top left pixel and three integers - the width, height, and number of bytes per row.

First Implementation

Focusing on simplicity at the expense of speed, we can compute the horizontal gradient like this:
    void x_gradient(const gray8c_view_t& src, const gray8s_view_t& dst) {
        for (int y=0; y<src.height(); ++y)
            for (int x=1; x<src.width()-1; ++x)
                dst(x,y) = (src(x-1,y) - src(x+1,y)) / 2;
    }
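We use the image view's operator(x,y) to access the pixel at a given location, and set each destination pixel to the half-difference of the left and right neighbors of the corresponding source pixel. A version that uses pixel iterators looks like this: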
    void x_gradient(const gray8c_view_t& src, const gray8s_view_t& dst) {
        for (int y=0; y<src.height(); ++y) {
            gray8c_view_t::x_iterator src_it = src.row_begin(y);
            gray8s_view_t::x_iterator dst_it = dst.row_begin(y);

            for (int x=1; x<src.width()-1; ++x)
                dst_it[x] = (src_it[x-1] - src_it[x+1]) / 2;
        }
    }
We use pixel iterators initialized at the beginning of each row. GIL's iterators are Random Access Traversal iterators. If you are not familiar with random access iterators, think of them as if they were pointers. In fact, in the above example the two iterator types are raw C pointers and their operator[] is plain pointer indexing.

The code to compute the gradient in the vertical direction is very similar:
    void y_gradient(const gray8c_view_t& src, const gray8s_view_t& dst) {
        for (int x=0; x<src.width(); ++x) {
            gray8c_view_t::y_iterator src_it = src.col_begin(x);
            gray8s_view_t::y_iterator dst_it = dst.col_begin(x);

            for (int y=1; y<src.height()-1; ++y)
                dst_it[y] = (src_it[y-1] - src_it[y+1])/2;
        }
    }
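Instead of looping over the rows, we loop over each column and create a y_iterator, an iterator that moves vertically.

The above version of y_gradient traverses the image column by column, which is not a cache-friendly memory access pattern. A more cache-friendly version iterates over the columns in the inner loop: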
    void y_gradient(const gray8c_view_t& src, const gray8s_view_t& dst) {
        for (int y=1; y<src.height()-1; ++y) {
            gray8c_view_t::x_iterator src1_it = src.row_begin(y-1);
            gray8c_view_t::x_iterator src2_it = src.row_begin(y+1);
            gray8s_view_t::x_iterator dst_it = dst.row_begin(y);

            for (int x=0; x<src.width(); ++x) {
                *dst_it = ((*src1_it) - (*src2_it))/2;
                ++dst_it;
                ++src1_it;
                ++src2_it;
            }
        }
    }
This sample code also shows an alternative way of using pixel iterators - instead of operator[] one can increment and dereference them.

Using Locators

Unfortunately this cache-friendly version requires the extra hassle of maintaining two separate iterators in the source view. For every pixel, we want to access its neighbors above and below it. Such relative access can be done with GIL locators:
    void y_gradient(const gray8c_view_t& src, const gray8s_view_t& dst) {
        gray8c_view_t::xy_locator src_loc = src.xy_at(0,1);
        for (int y=1; y<src.height()-1; ++y) {
            gray8s_view_t::x_iterator dst_it = dst.row_begin(y);

            for (int x=0; x<src.width(); ++x) {
                (*dst_it) = (src_loc(0,-1) - src_loc(0,1)) / 2;
                ++dst_it;
                ++src_loc.x();    // each dimension can be advanced separately
            }
            src_loc+=point2<std::ptrdiff_t>(-src.width(),1);    // carriage return
        }
    }
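The first line creates a locator pointing to the first pixel of the second row of the source view. A GIL pixel locator is very similar to an iterator, except that it can move both horizontally and vertically. Relative locations that are accessed repeatedly can also be cached, as the next version does with cache_location: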
    void y_gradient(const gray8c_view_t& src, const gray8s_view_t& dst) {
        gray8c_view_t::xy_locator src_loc = src.xy_at(0,1);
        gray8c_view_t::xy_locator::cached_location_t above = src_loc.cache_location(0,-1);
        gray8c_view_t::xy_locator::cached_location_t below = src_loc.cache_location(0, 1);

        for (int y=1; y<src.height()-1; ++y) {
            gray8s_view_t::x_iterator dst_it = dst.row_begin(y);

            for (int x=0; x<src.width(); ++x) {
                (*dst_it) = (src_loc[above] - src_loc[below])/2;
                ++dst_it;
                ++src_loc.x();
            }
            src_loc+=point2<std::ptrdiff_t>(-src.width(),1);
        }
    }
In this example src_loc[above] and src_loc[below] access the two neighbors through the locations that were cached before the loop.

Creating a Generic Version of GIL Algorithms

Let us make our x_gradient more generic. It should work with any image views, as long as they have the same number of channels. The gradient operation is to be computed for each channel independently. Here is what the new interface looks like:
    template <typename SrcView, typename DstView>
    void x_gradient(const SrcView& src, const DstView& dst) {
        gil_function_requires<ImageViewConcept<SrcView> >();
        gil_function_requires<MutableImageViewConcept<DstView> >();
        gil_function_requires<ColorSpacesCompatibleConcept<
                                    typename color_space_type<SrcView>::type,
                                    typename color_space_type<DstView>::type> >();

        ...    // compute the gradient
    }
The new algorithm now takes the types of the input and output image views as template parameters. That allows using both built-in GIL image views and any user-defined image view classes. The first three lines are optional; they use gil_function_requires to check that both arguments model image views, that the destination view is mutable, and that the two color spaces are compatible.

GIL does not require using its own built-in constructs. You are free to use your own channels, color spaces, iterators, locators, views and images. However, to work with the rest of GIL they have to satisfy a set of requirements; in other words, they have to model the corresponding GIL concept. GIL's concepts are defined in the user guide.
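The idea of checking requirements at the top of a generic function can be sketched with standard static_assert. This is only an analogy: the function halve_channels below is hypothetical, uses std::array as a stand-in for a pixel, and does not use GIL's concept classes. It shows how a mismatch in the number of channels, or a read-only destination, is reported at the entry point rather than inside the loop body.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>

// Hypothetical example, unrelated to GIL's own concept classes.
// SrcPixel and DstPixel are expected to be std::array-like types.
template <typename SrcPixel, typename DstPixel>
void halve_channels(const SrcPixel& src, DstPixel& dst) {
    // Requirements are stated up front, so a bad argument produces a
    // short, readable error message at this point.
    static_assert(std::tuple_size<SrcPixel>::value == std::tuple_size<DstPixel>::value,
                  "source and destination must have the same number of channels");
    static_assert(!std::is_const<DstPixel>::value,
                  "destination must be mutable");

    using dst_channel_t = typename DstPixel::value_type;
    for (std::size_t c = 0; c < std::tuple_size<SrcPixel>::value; ++c)
        dst[c] = static_cast<dst_channel_t>(src[c] / 2);
}
```

Calling halve_channels with a three-channel source and a four-channel destination fails to compile with the first message.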
One of the biggest drawbacks of using templates and generic programming in C++ is that compile errors can be very difficult to comprehend. This is a side-effect of the lack of early type checking - a generic argument may not satisfy the requirements of a function, but the incompatibility may be triggered deep inside a nested call, in code that is unfamiliar and hardly related to the problem. GIL uses concept checks to mitigate this problem.

The body of the generic function is very similar to that of the concrete one. The biggest difference is that we need to loop over the channels of the pixel and compute the gradient for each channel:
    template <typename SrcView, typename DstView>
    void x_gradient(const SrcView& src, const DstView& dst) {
        for (int y=0; y<src.height(); ++y) {
            typename SrcView::x_iterator src_it = src.row_begin(y);
            typename DstView::x_iterator dst_it = dst.row_begin(y);

            for (int x=1; x<src.width()-1; ++x)
                for (int c=0; c<num_channels<SrcView>::value; ++c)
                    dst_it[x][c] = (src_it[x-1][c]- src_it[x+1][c])/2;
        }
    }

Having an explicit loop for each channel could be a performance problem. GIL allows us to abstract out such per-channel operations:
    template <typename Out>
    struct halfdiff_cast_channels {
        template <typename T> Out operator()(const T& in1, const T& in2) const {
            return Out((in1-in2)/2);
        }
    };

    template <typename SrcView, typename DstView>
    void x_gradient(const SrcView& src, const DstView& dst) {
        typedef typename channel_type<DstView>::type dst_channel_t;

        for (int y=0; y<src.height(); ++y) {
            typename SrcView::x_iterator src_it = src.row_begin(y);
            typename DstView::x_iterator dst_it = dst.row_begin(y);

            for (int x=1; x<src.width()-1; ++x)
                static_transform(src_it[x-1], src_it[x+1], dst_it[x],
                                 halfdiff_cast_channels<dst_channel_t>());
        }
    }
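How a per-channel operation can be applied without a run-time channel loop is sketched below. This is not GIL's implementation of static_transform; transform_channels and transform_channels_impl are hypothetical names, std::array stands in for a pixel, and the sketch assumes C++14 for std::index_sequence.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical sketch, not GIL's implementation.

// Same shape as the function object in the tutorial: half-difference of two
// channel values, cast to the destination channel type.
template <typename Out>
struct halfdiff_cast {
    template <typename T>
    Out operator()(const T& a, const T& b) const { return Out((a - b) / 2); }
};

// The index pack I... expands into one assignment per channel, so the
// channel "loop" is unrolled at compile time.
template <typename In, typename Out, std::size_t N, typename Op, std::size_t... I>
void transform_channels_impl(const std::array<In, N>& a, const std::array<In, N>& b,
                             std::array<Out, N>& dst, Op op, std::index_sequence<I...>) {
    // A braced initializer list evaluates its elements left to right.
    const int expand[] = {0, (dst[I] = op(a[I], b[I]), 0)...};
    (void)expand;
}

template <typename In, typename Out, std::size_t N, typename Op>
void transform_channels(const std::array<In, N>& a, const std::array<In, N>& b,
                        std::array<Out, N>& dst, Op op) {
    transform_channels_impl(a, b, dst, op, std::make_index_sequence<N>());
}
```

The caller passes two source pixels, a destination pixel and the function object, in the same argument order as the static_transform call above.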
Here is how we can use our generic version with images of different types:
    // Calling with 16-bit grayscale data
    void XGradientGray16_Gray32(const unsigned short* src_pixels, ptrdiff_t src_row_bytes, int w, int h,
                                signed int* dst_pixels, ptrdiff_t dst_row_bytes) {
        gray16c_view_t src=interleaved_view(w,h,(const gray16_pixel_t*)src_pixels,src_row_bytes);
        gray32s_view_t dst=interleaved_view(w,h,(gray32s_pixel_t*)dst_pixels,dst_row_bytes);
        x_gradient(src,dst);
    }

    // Calling with 8-bit RGB data into 16-bit BGR
    void XGradientRGB8_BGR16(const unsigned char* src_pixels, ptrdiff_t src_row_bytes, int w, int h,
                             signed short* dst_pixels, ptrdiff_t dst_row_bytes) {
        rgb8c_view_t src = interleaved_view(w,h,(const rgb8_pixel_t*)src_pixels,src_row_bytes);
        rgb16s_view_t dst = interleaved_view(w,h,(rgb16s_pixel_t*)dst_pixels,dst_row_bytes);
        x_gradient(src,dst);
    }

    // Either or both the source and the destination could be planar - the gradient code does not change
    void XGradientPlanarRGB16_RGB32(const unsigned short* src_r, const unsigned short* src_g,
                                    const unsigned short* src_b, ptrdiff_t src_row_bytes, int w, int h,
                                    signed int* dst_pixels, ptrdiff_t dst_row_bytes) {
        rgb16c_planar_view_t src=planar_rgb_view(w,h, src_r,src_g,src_b, src_row_bytes);
        rgb32s_view_t dst=interleaved_view(w,h,(rgb32s_pixel_t*)dst_pixels,dst_row_bytes);
        x_gradient(src,dst);
    }

As these examples illustrate, both the source and the destination can be interleaved or planar, of any channel depth (assuming the source channel values can be assigned to the destination channel type), and of any compatible color spaces.

GIL 2.1 can also natively represent images whose channels are not byte-aligned, such as a 6-bit RGB222 image or a 1-bit Gray1 image. GIL algorithms apply to these images natively. See the design guide or sample files for more on using such images.

Image View Transformations

One way to compute the y-gradient is to rotate the image by 90 degrees, compute the x-gradient and rotate the result back. Here is how to do this in GIL:
    template <typename SrcView, typename DstView>
    void y_gradient(const SrcView& src, const DstView& dst) {
        x_gradient(rotated90ccw_view(src), rotated90ccw_view(dst));
    }
Another example: suppose we want to compute the gradient of the N-th channel of a color image. Here is how to do that:
    template <typename SrcView, typename DstView>
    void nth_channel_x_gradient(const SrcView& src, int n, const DstView& dst) {
        x_gradient(nth_channel_view(src, n), dst);
    }
View transformation functions can be combined (piped) together. For example:

    y_gradient(subsampled_view(nth_channel_view(src, 1), 2,2), dst);

GIL can sometimes simplify piped views. For example, two nested subsampled views (views that skip over pixels in X and in Y) can be represented as a single subsampled view whose step is the product of the steps of the two views.

1D pixel iterators

Let's go back to x_gradient one more time. Many image view algorithms apply the same operation to each pixel, and GIL provides an abstraction to handle them. However, our algorithm has an unusual access pattern, as it skips the first and the last column. It would be instructive to see how we can rewrite it in canonical form. The way to do that in GIL is to write a version that works for every pixel, but apply it only to the subimage that excludes the first and last column:
    void x_gradient_unguarded(const gray8c_view_t& src, const gray8s_view_t& dst) {
        for (int y=0; y<src.height(); ++y) {
            gray8c_view_t::x_iterator src_it = src.row_begin(y);
            gray8s_view_t::x_iterator dst_it = dst.row_begin(y);

            for (int x=0; x<src.width(); ++x)
                dst_it[x] = (src_it[x-1] - src_it[x+1]) / 2;
        }
    }

    void x_gradient(const gray8c_view_t& src, const gray8s_view_t& dst) {
        assert(src.width()>=2);
        x_gradient_unguarded(subimage_view(src, 1, 0, src.width()-2, src.height()),
                             subimage_view(dst, 1, 0, src.width()-2, src.height()));
    }
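Now that x_gradient_unguarded operates on every pixel of its views, we can rewrite it more compactly with 1D pixel iterators: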
    void x_gradient_unguarded(const gray8c_view_t& src, const gray8s_view_t& dst) {
        gray8c_view_t::iterator src_it = src.begin();
        for (gray8s_view_t::iterator dst_it = dst.begin(); dst_it!=dst.end(); ++dst_it, ++src_it)
            *dst_it = (src_it.x()[-1] - src_it.x()[1]) / 2;
    }
GIL image views provide begin() and end() methods that return 1D pixel iterators, which visit every pixel of the view.

STL Equivalent Algorithms

GIL provides STL equivalents of many algorithms. For example, std::transform is an STL algorithm that sets each element in a destination range to the result of a generic function taking the corresponding element of the source range. In our example, we want to assign to each destination pixel the value of the half-difference of the horizontal neighbors of the corresponding source pixel. If we abstract that operation in a function object, we can use GIL's transform_pixel_positions to do that:
    struct half_x_difference {
        int operator()(const gray8c_loc_t& src_loc) const {
            return (src_loc.x()[-1] - src_loc.x()[1]) / 2;
        }
    };

    void x_gradient_unguarded(const gray8c_view_t& src, const gray8s_view_t& dst) {
        transform_pixel_positions(src, dst, half_x_difference());
    }
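For comparison, the same operation on a single row can be sketched with the STL's std::transform alone. The function half_x_difference_row is a hypothetical helper written for this illustration; it uses the binary form of std::transform over two shifted ranges (the left neighbors and the right neighbors) and returns only the interior elements.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not part of GIL: half-difference of the horizontal
// neighbors for every interior element of one row.
std::vector<int> half_x_difference_row(const std::vector<int>& row) {
    assert(row.size() >= 2);
    std::vector<int> out(row.size() - 2);
    // First range:  left neighbors,  row[0] .. row[n-3]
    // Second range: right neighbors, row[2] .. row[n-1]
    std::transform(row.begin(), row.end() - 2, row.begin() + 2, out.begin(),
                   [](int left, int right) { return (left - right) / 2; });
    return out;
}
```

Unlike this sketch, transform_pixel_positions hands the function object a locator, so the neighbors are reached by relative access rather than by passing shifted ranges.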
GIL provides image view algorithms such as transform_pixel_positions, which passes pixel locators rather than pixel references to the function object.

Color Conversion

Instead of computing the gradient of each color plane of an image, we often want to compute the gradient of the luminosity. In other words, we want to convert the color image to grayscale and compute the gradient of the result. Here is how to compute the luminosity gradient of a 32-bit float RGB image:
    void x_gradient_rgb_luminosity(const rgb32fc_view_t& src, const gray8s_view_t& dst) {
        x_gradient(color_converted_view<gray8_pixel_t>(src), dst);
    }
In the generic version of this algorithm we might like to convert the color space to grayscale, but keep the channel depth the same. We do that by constructing the type of a GIL grayscale pixel with the same channel as the source, and color convert to that pixel type:
    template <typename SrcView, typename DstView>
    void x_luminosity_gradient(const SrcView& src, const DstView& dst) {
        typedef pixel<typename channel_type<SrcView>::type, gray_layout_t> gray_pixel_t;
        x_gradient(color_converted_view<gray_pixel_t>(src), dst);
    }
When the destination color space and channel type happen to be the same as those of the source, color conversion is unnecessary. GIL detects this case and avoids calling the color conversion code at all.

Image

The above example has a performance problem - x_gradient dereferences most source pixels twice, which causes the above code to perform color conversion twice. Sometimes it may be more efficient to copy the color converted image into a temporary buffer and use it to compute the gradient - that way color conversion is invoked once per pixel. Using our non-generic version we can do it like this:
    void x_luminosity_gradient(const rgb32fc_view_t& src, const gray8s_view_t& dst) {
        gray8_image_t ccv_image(src.dimensions());
        copy_pixels(color_converted_view<gray8_pixel_t>(src), view(ccv_image));

        x_gradient(const_view(ccv_image), dst);
    }
First we construct an 8-bit grayscale image with the same dimensions as our source. Then we copy a color-converted view of the source into the temporary image. Finally we use a read-only view of the temporary image in our x_gradient algorithm.

Creating a generic version of the above is a bit trickier:
    template <typename SrcView, typename DstView>
    void x_luminosity_gradient(const SrcView& src, const DstView& dst) {
        typedef typename channel_type<DstView>::type d_channel_t;
        typedef typename channel_convert_to_unsigned<d_channel_t>::type channel_t;
        typedef pixel<channel_t, gray_layout_t> gray_pixel_t;
        typedef image<gray_pixel_t, false> gray_image_t;

        gray_image_t ccv_image(src.dimensions());
        copy_pixels(color_converted_view<gray_pixel_t>(src), view(ccv_image));
        x_gradient(const_view(ccv_image), dst);
    }
First we use the channel_type metafunction to get the channel type of the destination view.

GIL constructs that have an associated pixel type, such as pixels, pixel iterators, locators, views and images, all model a common GIL concept, so metafunctions such as channel_type can be applied to any of them. After we get the channel type of the destination view, we use another metafunction to remove its sign (if it is a signed integral type) and then use it to generate the type of a grayscale pixel. From the pixel type we create the image type. GIL's image class is templated over the pixel type and a boolean indicating whether the image should be planar or interleaved. Single-channel (grayscale) images in GIL must always be interleaved.

There are multiple ways of constructing types in GIL. Instead of instantiating the classes directly we could have used type factory metafunctions. The following code is equivalent:
    template <typename SrcView, typename DstView>
    void x_luminosity_gradient(const SrcView& src, const DstView& dst) {
        typedef typename channel_type<DstView>::type d_channel_t;
        typedef typename channel_convert_to_unsigned<d_channel_t>::type channel_t;
        typedef typename image_type<channel_t, gray_layout_t>::type gray_image_t;
        typedef typename gray_image_t::value_type gray_pixel_t;

        gray_image_t ccv_image(src.dimensions());
        copy_and_convert_pixels(src, view(ccv_image));
        x_gradient(const_view(ccv_image), dst);
    }
GIL provides a set of metafunctions that generate GIL types - image_type, used above, is one of them.

From the image type we can use the nested typedef value_type to obtain the type of a pixel.

Virtual Image Views

So far we have been dealing with images that have pixels stored in memory. GIL allows you to create an image view of an arbitrary image, including a synthetic function. To demonstrate this, let us create a view of the Mandelbrot set. First, we need to create a function object that computes the value of the Mandelbrot set at a given location (x,y) in the image:

    // models PixelDereferenceAdaptorConcept
    struct mandelbrot_fn {
        typedef point2<ptrdiff_t>   point_t;

        typedef mandelbrot_fn       const_t;
        typedef gray8_pixel_t       value_type;
        typedef value_type          reference;
        typedef value_type          const_reference;
        typedef point_t             argument_type;
        typedef reference           result_type;
        BOOST_STATIC_CONSTANT(bool, is_mutable=false);

        mandelbrot_fn() {}
        mandelbrot_fn(const point_t& sz) : _img_size(sz) {}

        result_type operator()(const point_t& p) const {
            // normalize the coords to (-2..1, -1.5..1.5)
            double t=get_num_iter(point2<double>(p.x/(double)_img_size.x*3-2, p.y/(double)_img_size.y*3-1.5f));
            return value_type((bits8)(pow(t,0.2)*255));   // raise to power suitable for viewing
        }
    private:
        point_t _img_size;

        double get_num_iter(const point2<double>& p) const {
            point2<double> Z(0,0);
            for (int i=0; i<100; ++i) {     // 100 iterations
                Z = point2<double>(Z.x*Z.x - Z.y*Z.y + p.x, 2*Z.x*Z.y + p.y);
                if (Z.x*Z.x + Z.y*Z.y > 4)
                    return i/(double)100;
            }
            return 0;
        }
    };
We can now use GIL's virtual_2d_locator with this function object to construct a Mandelbrot view of 200x200 pixels:

    typedef mandelbrot_fn::point_t point_t;
    typedef virtual_2d_locator<mandelbrot_fn,false> locator_t;
    typedef image_view<locator_t> my_virt_view_t;

    point_t dims(200,200);

    // Construct a Mandelbrot view with a locator, taking top-left corner (0,0) and step (1,1)
    my_virt_view_t mandel(dims, locator_t(point_t(0,0), point_t(1,1), mandelbrot_fn(dims)));
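We can treat the synthetic view just like a real one. For example, let's invoke our x_gradient algorithm on a 90-degree rotated view of the Mandelbrot set and save both the original and the result: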
    gray8s_image_t img(dims);
    x_gradient(rotated90cw_view(mandel), view(img));

    // Save the Mandelbrot set and its 90-degree rotated gradient
    // (jpeg cannot save signed char; must convert to unsigned char)
    jpeg_write_view("mandel.jpg",mandel);
    jpeg_write_view("mandel_grad.jpg",color_converted_view<gray8_pixel_t>(const_view(img)));

Here is what the two files look like:

[Images: mandel.jpg and mandel_grad.jpg]
Run-Time Specified Images and Image Views

So far we have created a generic function that computes the image gradient of a templated image view. Sometimes, however, the properties of an image view, such as its color space and channel depth, may not be available at compile time. GIL's dynamic_image extension allows for working with GIL constructs that are specified at run time, also called variants. GIL provides models of a run-time instantiated image, any_image, and a run-time instantiated image view, any_image_view. The mechanisms are in place to create other variants, such as any_pixel, any_pixel_iterator, etc. Most of GIL's algorithms and all of the view transformation functions also work with run-time instantiated image views, and binary algorithms such as copy_pixels can have either or both arguments be variants.
Let's make our x_luminosity_gradient algorithm take a variant as its source view.

First, we need to make a function object that contains the templated destination view and has an application operator taking a templated source view:
    #include <boost/gil/extension/dynamic_image/dynamic_image_all.hpp>

    template <typename DstView>
    struct x_gradient_obj {
        typedef void result_type;        // required typedef

        const DstView& _dst;
        x_gradient_obj(const DstView& dst) : _dst(dst) {}

        template <typename SrcView>
        void operator()(const SrcView& src) const { x_luminosity_gradient(src, _dst); }
    };
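The second step is to provide an overload of x_luminosity_gradient that takes an image view variant and calls apply_operation, passing it the function object: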
    template <typename SrcViews, typename DstView>
    void x_luminosity_gradient(const any_image_view<SrcViews>& src, const DstView& dst) {
        apply_operation(src, x_gradient_obj<DstView>(dst));
    }
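The same dispatch pattern can be sketched with the standard library's std::variant and std::visit (C++17). This is only an analogy under stated assumptions: any_row, half_difference_obj and half_difference are hypothetical names, a row of integers stands in for an image view, and std::visit plays the role that apply_operation plays in the GIL code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical sketch using std::variant, not GIL's any_image_view:
// the element type of the source row is chosen at run time.
using any_row = std::variant<std::vector<std::uint8_t>, std::vector<std::uint16_t>>;

// Function object that holds the destination. Its templated call operator
// is instantiated once for every alternative of the variant.
struct half_difference_obj {
    std::vector<int>& dst;

    template <typename SrcRow>
    void operator()(const SrcRow& src) const {
        assert(src.size() == dst.size());
        for (std::size_t x = 1; x + 1 < src.size(); ++x)
            dst[x] = (int(src[x - 1]) - int(src[x + 1])) / 2;
    }
};

// Overload taking the run-time-typed source; std::visit selects the
// instantiation that matches the type currently stored in the variant.
inline void half_difference(const any_row& src, std::vector<int>& dst) {
    std::visit(half_difference_obj{dst}, src);
}
```

As with GIL's variants, every alternative listed in any_row causes a separate instantiation of the call operator, whether or not that type is ever used at run time.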
Here is how we can construct a variant and invoke the algorithm:
    #include <boost/mpl/vector.hpp>
    #include <boost/gil/extension/io/jpeg_dynamic_io.hpp>

    typedef boost::mpl::vector<gray8_image_t, gray16_image_t, rgb8_image_t, rgb16_image_t> my_img_types;
    any_image<my_img_types> runtime_image;
    jpeg_read_image("input.jpg", runtime_image);

    gray8s_image_t gradient(runtime_image.dimensions());
    x_luminosity_gradient(const_view(runtime_image), view(gradient));
    jpeg_write_view("x_gradient.jpg", color_converted_view<gray8_pixel_t>(const_view(gradient)));
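In this example, we create an image variant that can hold an 8-bit or 16-bit RGB or grayscale image. We then use GIL's I/O extension to load the image from file in its native color space and channel depth. If none of the allowed image types matches the image on disk, an exception will be thrown. We then construct an 8-bit signed (i.e. signed char) image to store the gradient and invoke x_luminosity_gradient on it. Finally, we save the result to another file, converting it to 8-bit unsigned first because JPEG cannot store signed char.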
Note how free functions and methods such as jpeg_read_image, dimensions and const_view work on variant types as well as on templated types.

A warning about using variants: instantiating an algorithm with a variant effectively instantiates it with every possible type the variant can take. For binary algorithms, the algorithm is instantiated with every possible combination of the two input types! This can take a toll on both the compile time and the executable size.

Conclusion

This tutorial provides a glimpse at the challenges associated with writing generic and efficient image processing algorithms in GIL. We have taken a simple algorithm and shown how to make it work with image representations that vary in bit depth, color space, ordering of the channels, and planar/interleaved structure. We have demonstrated that the algorithm can work with fully abstracted virtual images, and even images whose type is specified at run time. The associated video presentation also demonstrates that even for complex scenarios the generated assembly is comparable to that of a C version of the algorithm, hand-written for the specific image types.
Yet, even for such a simple algorithm, we are far from having fully generic and optimized code. In particular, the presented algorithms work on homogeneous images, i.e. images whose pixels have channels that are all of the same type. There are examples of images, such as a packed 565 RGB format, which contain channels of different types. While GIL provides concepts and algorithms operating on heterogeneous pixels, we leave the task of extending x_gradient as an exercise for the reader.

Second, after computing the value of the gradient we are simply casting it to the destination channel type. This may not always be the desired operation. For example, if the source channel is a float with range [0..1] and the destination is unsigned char, casting the half-difference to unsigned char will result in either 0 or 1. Instead, what we might want to do is scale the result into the range of the destination channel. GIL's channel-level algorithms might be useful in such cases.

There is a lot to be done in improving the performance as well. Channel-level operations, such as the half-difference, could be abstracted out into atomic channel-level algorithms and performance overloads could be provided for concrete channel types. Processor-specific operations could be used, for example, to perform the operation over an entire row of pixels simultaneously, or the data could be prefetched. All of these optimizations can be realized as performance specializations of the generic algorithm. Finally, compilers, while getting better over time, still fail to fully optimize generic code in some cases, such as failing to inline some functions or to put some variables into registers. If performance is an issue, it might be worth trying your code with different compilers.

Appendix

Naming convention for GIL concrete types

Concrete (non-generic) GIL types follow this naming convention:
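    ColorSpace + BitDepth + [f | s] + [c] + [_planar] + [_step] + ClassType + _t

Where ColorSpace also indicates the ordering of components. Examples are rgb, bgr, cmyk and gray. The following declarations illustrate the convention: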
    bgr8_image_t               a;     // 8-bit interleaved BGR image
    cmyk16_pixel_t             b;     // 16-bit CMYK pixel value
    cmyk16c_planar_ref_t       c(b);  // const reference to a 16-bit planar CMYK pixel
    rgb32f_planar_step_ptr_t   d;     // step pointer to a 32-bit planar RGB pixel