Tag Archives: GObject Introspection

Introducing libbytesize

Problem area

Many projects have to deal with representing sizes of storage or memory, generally as sizes in bytes. What may seem to be a trivial thing turns into hundreds of lines of code if the following things are to be covered properly:

  • using binary (GiB,…) and decimal (GB) units correctly
  • handling sizes bigger than MAXUINT64 (which is 16 EiB – 1)
  • parsing users’ input correctly with:
    • binary and decimal units
    • numeric values in various formats (traditional, scientific)
  • handling localization and internationalization correctly
    • different radix characters used in different languages
    • units being translated and typed in by users in their native format (even with non-latin scripts)
  • handling negative sizes
    • it sometimes makes sense to work with these, for example when some storage space is missing somewhere

Of course, not all projects working with sizes in bytes have hundreds of lines for dealing with the above points, but the result is then a bad user experience. In some cases, valid localized inputs are not accepted or not parsed correctly, or the users always get the English format and units no matter what the current locale and language configuration is.

One of the biggest problems I see in many projects is that binary and decimal units are not used and differentiated correctly. If something shows the value 10 G, does it mean 10 GiB and thus 10240 MiB, or is it 10 GB and thus 10000 MB? Sometimes one can find this piece of information in the documentation (e.g. man pages), but often one just has to guess and try. Fortunately only rarely, even the documented behaviour can be really surprising, for example in the case of the lvm utilities where g means GiB and G means GB. We should generally be doing a much better job of handling sizes correctly and consistently in all projects that have to handle them. However, it’s obvious that having a few hundred lines of code in every such project is nonsense.
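
To make the 10 G ambiguity concrete, here is a minimal sketch using nothing but integer arithmetic. The UNITS table and the to_bytes() helper are made up for this illustration and are not part of any library mentioned in this post:

```python
# Multipliers for a few binary (IEC) and decimal (SI) units.
# Illustrative only, this is not the libbytesize or Blivet API.
UNITS = {
    "B": 1,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3,
    "kB": 1000, "MB": 1000**2, "GB": 1000**3,
}


def to_bytes(value, unit):
    """Return the number of bytes in `value` units (integers only)."""
    return value * UNITS[unit]


ten_gib = to_bytes(10, "GiB")
ten_gb = to_bytes(10, "GB")

print(ten_gib // UNITS["MiB"])  # 10240
print(ten_gb // UNITS["MB"])    # 10000
print(ten_gib - ten_gb)         # 737418240 bytes of difference
```

The two readings of "10 G" differ by roughly 700 MiB, which is more than enough to make a partitioning or resize operation fail.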

An existing solution

One of the projects that I can gladly call a good example of how to deal with sizes in bytes is the Blivet Python package used mainly by the Anaconda OS installer (Fedora, RHEL,…). It has all the concerns mentioned above addressed in a proper and well-tested way in its class called simply Size. As the title of this post reveals, I’m trying to introduce a new library here, so the obvious question is: why invent and write something new when a good and well-tested solution already exists? The answer lies in the description of Blivet: it is written in Python, which makes its implementation of the Size class hardly usable from any other language or environment.

One step further

The obvious step towards a widely reusable solution was to rewrite Blivet’s Size class in C so that it can be used from this low-level language and from the many other languages that make it easy to use C libraries. However, again, what may seem to be an easy thing to do is not that simple at all. Blivet’s Python implementation is based on Python’s Decimal type, which is a numeric type supporting arbitrary precision and arbitrarily big numbers. Also, dealing with strings and their processing is way simpler in Python than in C.

Nevertheless, C also has some nice libraries for working with big and highly precise numbers, namely the GMP and MPFR libraries that were created as part of the GNU project and which are used, for example, by many tools and libraries doing some serious maths. So it soon became clear that writing a C implementation of the Size class shouldn’t be an overly complicated task. And that turned out to be the case.

Here it is

The result is the libbytesize library that uses GMP and MPFR together with GObject Introspection to provide a nice object-oriented API facilitating the work with sizes in bytes. It properly takes care of all the potential issues mentioned in the beginning of this post and is widely usable due to the broad support of GObject Introspection in many high-level languages. The library provides a single class called (warning: here comes the surprise) Size which right now is basically a very thin wrapper around the mpz_t type provided by the GMP library for arbitrarily big integer numbers and thus it actually stores byte sizes as numbers of bytes. That is actually the precision limitation, but since no storage provides or works with fractions of bytes, it’s no real limitation at all.
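
As a rough illustration of why an arbitrarily big integer type is needed, the following sketch uses Python’s built-in integers (which are also arbitrary-precision) as a stand-in for mpz_t. The constant names are mine, not the library’s:

```python
# Python ints stand in for GMP's mpz_t here; both are arbitrary-precision.
MAXUINT64 = 2**64 - 1
EIB = 1024**6  # bytes in one EiB

# The limit of a 64-bit unsigned integer is exactly 16 EiB - 1.
print(16 * EIB - 1 == MAXUINT64)  # True

# A size of 20 EiB does not fit into 64 bits any more...
bigger = 20 * EIB
print(bigger > MAXUINT64)         # True
print(bigger.bit_length())        # 65

# ...but as a whole number of bytes it is still stored exactly.
print(bigger // EIB)              # 20
```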

There are (at this point) four constructors [1]:

  • bs_size_new() which creates a new instance initialized to 0 B,
  • bs_size_new_from_bytes() which creates a new instance initialized to a given number of bytes,
  • bs_size_new_from_str() which creates a new instance initialized to the number of bytes the given string (e.g. "10 GiB") represents,
  • bs_size_new_from_size() which is a copy constructor.

Then there are some query functions the most important of which are the following two:

  • bs_size_convert_to() which can be used to convert a given size to some particular unit and
  • bs_size_human_readable() which gives a human-readable representation of a given size, i.e. with such a unit that the resulting number is neither too big nor too small
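
The idea behind a human-readable representation can be sketched in a few lines of Python. This is only my approximation of the concept, with a hypothetical human_readable() helper that always uses binary units and at most two decimal places. It does not claim to reproduce the exact output of bs_size_human_readable():

```python
BINARY_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def human_readable(num_bytes):
    """Pick the biggest binary unit in which the value is still >= 1."""
    sign = "-" if num_bytes < 0 else ""
    value = abs(num_bytes)

    index = 0
    while index < len(BINARY_UNITS) - 1 and value >= 1024 ** (index + 1):
        index += 1

    if index == 0:
        return f"{sign}{value} B"

    # At most two decimal places, with trailing zeros stripped.
    text = f"{value / 1024 ** index:.2f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    return f"{sign}{text} {BINARY_UNITS[index]}"


print(human_readable(10 * 1024**3))  # 10 GiB
print(human_readable(1536))          # 1.5 KiB
print(human_readable(1023))          # 1023 B
```

Note that this sketch ignores localization completely (radix character, translated units), which is exactly the part that makes a real implementation much longer.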

Last but not least, there are many methods for doing arithmetic and logical operations with sizes in bytes. It’s probably wise to mention here that not all arithmetic operations implemented for the mpz_t type are implemented for sizes. Some of them just don’t make sense: multiplication of a size by a size (what is GiB**2?), exponentiation, (square) root and others. However, there are some extra ones that don’t really make much sense for generic numbers, but are quite useful when working with sizes, namely bs_size_round_to_nearest(), which rounds a given size (up or down) to the nearest multiple of another size. That is useful, for example, if you need to know how much space an LVM LV of a requested size will take in a VG with some particular extent size.
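
The LVM example can be sketched with plain integers. The round_to_nearest() function below is a hypothetical pure-Python helper written for this post, not the signature of bs_size_round_to_nearest(). "Up" here means towards positive infinity:

```python
MIB = 1024**2


def round_to_nearest(size, multiple, round_up=True):
    """Round `size` up or down to a whole multiple of `multiple` (bytes)."""
    if multiple <= 0:
        raise ValueError("multiple must be a positive number of bytes")
    quotient, remainder = divmod(size, multiple)
    if remainder and round_up:
        quotient += 1
    return quotient * multiple


extent = 4 * MIB       # extent size of the VG
requested = 10 * MIB   # requested LV size

print(round_to_nearest(requested, extent) // MIB)                  # 12
print(round_to_nearest(requested, extent, round_up=False) // MIB)  # 8
```

So a 10 MiB LV in a VG with 4 MiB extents actually takes 12 MiB (three extents).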

Since GObject Introspection allows for having overrides and the new library is expected to be used by Blivet instead of its own Python-only implementation of the Size class, there already are Python overrides making the work with libbytesize’s Size class really simple. Here is an example Python interpreter session demonstrating the simplicity of use:

>>> from gi.repository.ByteSize import Size
>>> s = Size("10 GiB")
>>> str(s)
'10 GiB'
>>> repr(s)
'Size (10 GiB)'
>>> s2 = Size(10 * 1024**3)
>>> s2
Size (10 GiB)
>>> s + s2
Size (20 GiB)
>>> s - s2
Size (0 B)
>>> s3 = Size(s2)
>>> sum([s, s2, s3])
Size (30 GiB)
>>> -s2
Size (-10 GiB)
>>> abs(-s2)
Size (10 GiB)

And here come the dogs

I mean docs. The project is hosted on GitHub together with its documentation. The current release is 0.2, where the zero at the beginning means that it is not a stable release yet. The API is unlikely to change in any significant way for the (stable) release 1.0, but since the library is not being used in any big project right now, we are leaving ourselves some "manipulation space" for potential changes. So if you find the API of the library wrong, feel free to let us know and we might change it in your favor! If you want to get a quick but still quite comprehensive overview of the library’s API, have a look at the header file it provides.

The last thing I’d like to mention here is that the library is packaged for the Fedora GNU/Linux distribution so if you happen to be using this distribution, you can easily start playing with the library by typing this into your shell:

$ sudo dnf install libbytesize python-libbytesize ipython
$ ipython

Using ipython also gives you TAB-completion. See the interpreter session example above to get a better idea of what to type in then. Have fun and don’t forget to share your ideas in the comments!

  1. bs is the "namespace" prefix and size is the class prefix

libblockdev reaches the 1.0 milestone!

A year ago, I started working on a new storage library for low-level operations with various types of block devices — libblockdev. Today, I’m happy to announce that the library reached the 1.0 milestone which means that it covers all the functionality that has been stated in the initial goals and it’s going to keep the API stable.

A little bit of a background

Are you asking the question: "Why yet another code implementing what’s already been implemented in many other places?" That’s, of course, a very good and probably crucial question. The answer is that I and the people who were at the birth of the idea think that this is the first time such a thing is implemented in a way that makes it usable for a wide range of tools, applications, libraries, etc. Let’s start with the requirements every widely usable implementation should meet:

  1. it should be written in C so that it is usable for code written in low-level languages
  2. it should be a library, as DBus is not usable together with chroot() and things like that, and running subprocesses is suboptimal (slow, eating a lot of random data entropy, requiring the output to be parsed, etc.)
  3. it should provide bindings for as many languages as possible, in particular the widely used high-level languages like Python, Ruby, etc.
  4. it shouldn’t be a single monolithic piece required by every user code no matter how much of the library it actually needs
  5. it should have a stable API
  6. it should support all major storage technologies (LVM, MD RAID, BTRFS, LUKS,…)

If we take the candidates potentially covering the low-level operations with block devices, namely Blivet, ssm and udisks2 (now being replaced by storaged), we can easily come to the conclusion that none of them meets the requirements above. Blivet [1] covers the functionality in a great way, but it’s written in Python and thus hardly usable from code written in other languages. The same applies to ssm [2], which is also written in Python; on top of that, it’s an application and it doesn’t cover all the technologies (it doesn’t try to). udisks2 [3] and now storaged [4] provide a DBus API and don’t provide, for example, functions related to BTRFS (and even LVM in the case of udisks2).

The libblockdev library is:
  • written in C,
  • using GLib and providing bindings for all languages supporting GObject introspection (Python, Perl, Ruby, Haskell*,…),
  • modular — using separate plugins for all technologies (LVM, Btrfs,…),
  • covering all technologies Blivet supports [5] plus some more,

by which it fulfills all the requirements mentioned above. It’s only a wish, but a strong one, that every new piece of code written for low-level manipulation with block devices [6] should be written as part of the libblockdev library, tested and reused in as many places as possible instead of being written again and again in many, many places with new, old, weird, surprising and custom bugs.


As mentioned above, the library loads plugins that provide the functionality, each related to one storage technology. Right now, there are lvm, btrfs, swap, loop, crypto, mpath, dm, mdraid, kbd and s390 plugins [7]. The library itself basically only provides a thin wrapper around its plugins so that it can all be easily used via GObject introspection and so that it is easy to set up logging (and probably more in the future). However, each of the plugins can be used as a standalone shared library if that’s desired. The plugins are loaded when the bd_init() function is called [8] and changes (loading more or fewer plugins) can later be done with the bd_reinit() function. It is also possible to reload a plugin in a long-running process, for example if it gets updated. If a function provided by a plugin that was not loaded is called, the call fails with an error, but doesn’t crash, and thus it is up to the calling code to deal with such a situation.
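
The behaviour described above can be modelled with a small toy in Python. To be clear, this is not libblockdev’s API or implementation: the Library class, the PluginNotLoadedError exception and the two stand-in plugin functions are all invented here just to show the "thin wrapper plus error instead of crash" idea:

```python
class PluginNotLoadedError(Exception):
    """Raised when a function of a plugin that is not loaded is called."""


class Library:
    """Thin dispatcher that forwards calls to whichever plugins are loaded."""

    def __init__(self):
        self._plugins = {}

    def init(self, plugins):
        # Load the requested plugins (name -> {function name -> callable}).
        self._plugins = dict(plugins)

    def reinit(self, plugins):
        # Replace the set of loaded plugins, e.g. to load more or fewer.
        self._plugins = dict(plugins)

    def call(self, plugin, function, *args):
        if plugin not in self._plugins:
            raise PluginNotLoadedError(f"plugin '{plugin}' is not loaded")
        return self._plugins[plugin][function](*args)


# Stand-in "plugins" with made-up functions; real ones wrap LVM, swap, etc.
lvm = {"vg_name_is_valid": lambda name: bool(name) and "/" not in name}
swap = {"label_is_valid": lambda label: len(label) <= 16}

lib = Library()
lib.init({"lvm": lvm})
print(lib.call("lvm", "vg_name_is_valid", "fedora"))  # True

try:
    lib.call("swap", "label_is_valid", "swap0")
except PluginNotLoadedError as error:
    print("error:", error)  # the caller decides what to do now

lib.reinit({"lvm": lvm, "swap": swap})
print(lib.call("swap", "label_is_valid", "swap0"))    # True
```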

The libblockdev library is stateless from the perspective of the block device manipulations. I.e., it has some internal state (like tracking if the library has been initialized or not), but it doesn’t hold any state information about the block devices. So if you e.g. use it to create some LVM volume groups and then try to create a logical volume in a different, non-existing VG, it just fails to create it at the point where LVM realizes that such a volume group doesn’t exist. That makes the library a lot simpler and "almost thread-safe", with the word "almost" being there just because some of the technologies don’t provide any other API than running various utilities as subprocesses, which cannot generally be considered thread-safe [9].

Scope (provided functionality)

The first goal for the library was to replace the Blivet’s devicelibs subpackage that provided all the low-level functions for manipulations with block devices. That fact also defined the original scope of the library. Later, we realized that we would like to add the LVM cache and bcache support to Blivet and the scope of the library got extended to the current state. The supported technologies are defined by the list of plugins the library uses (see above) and the full list of the functions can be seen either in the project’s features.rst file or by browsing the documentation.

Tests and reliability

Right now, there are 135 tests, run both manually and by a Jenkins instance hooked up to the project’s Git repository. The tests use loop devices to test the vast majority of the functions the library provides [10]. They must be run as root, but that’s unavoidable if they should really test the functionality and not just some mocked-up stubs that we would believe behave like a real system.

The library is used by Fedora 22’s installation process as F22’s Blivet was ported to use libblockdev before the Beta release. There have been few bugs reported against the library (the majority of them related to FW RAID setups), with all of them being fixed and covered by tests for those particular use cases (based on data gathered from the logs in the bug reports).

Future plans

Although the initial goals are all covered by version 1.0 of the library, there are already many suggestions for additional functionality and also extensions for some of the functions that are already implemented (extra arguments, etc.). The most important goal for the near future is to fix reported bugs in the current version and promote the library as much as possible so that the wish mentioned above gets fulfilled. The plan for the slightly more distant future (let’s say 6-8 months) is to work on additional functionality targeting version 2.0, which will break the API for the purpose of extending and improving it.

To be more concrete, for example one of the planned new plugins is the fs plugin that will provide various functions related to file systems. One of such functions will definitely be the mkfs() function that will take a list (or dictionary) of extra options passed to the particular mkfs utility on top of the options constructed by the implementation of the function. The reason for that is the fact that some file systems support many configuration options during their creation and it would be cumbersome to cover them all with function parameters. In relation to that, at least some (if not all) of the LVM functions will also get such extra argument so that they are useful even in very specific use cases that require fine-tuning of the parameters not covered by functions’ arguments.
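
Since this function doesn’t exist yet, the following is only a sketch of how such an "extra options" argument could work. The names ExtraArg, BASE_OPTIONS and build_mkfs_command() are hypothetical and the code only builds the command line without running anything:

```python
from collections import namedtuple

# One extra option for the underlying utility: the option and its value
# (an empty value for flags). Hypothetical type, used only in this sketch.
ExtraArg = namedtuple("ExtraArg", ["opt", "val"])

# Options the implementation itself would always construct. Here it is just
# the "force" flag of the respective mkfs utility.
BASE_OPTIONS = {"ext4": ["-F"], "xfs": ["-f"]}


def build_mkfs_command(fstype, device, extra=None):
    """Return the argv for mkfs.<fstype> with extra options added on top."""
    if fstype not in BASE_OPTIONS:
        raise ValueError(f"unsupported file system: {fstype}")

    argv = ["mkfs." + fstype] + BASE_OPTIONS[fstype]
    for arg in extra or []:
        argv.append(arg.opt)
        if arg.val:
            argv.append(arg.val)
    argv.append(device)
    return argv


print(build_mkfs_command("ext4", "/dev/loop0", [ExtraArg("-L", "data")]))
# ['mkfs.ext4', '-F', '-L', 'data', '/dev/loop0']
```

The advantage of this design is that the function signature stays small while callers with very specific needs can still pass anything the underlying utility understands.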

Another potential feature is to add some clever and nice way of progress reporting to some functions that are expected to take a lot of time to finish, like lvresize(), pvmove(), resizefs() and others. It’s not always possible to track the progress because even the underlying tools/libraries don’t report it, but where possible, libblockdev should be able to pass that information to its callers, ideally in some unified way.

So a lot of work behind, much more ahead. It’s a challenging world, but I like taking challenges.

  1. a python package used by the Anaconda installer as a storage backend

  2. System Storage Manager

  3. daemon used by e.g. gnome-disks and the whole GNOME "storage stack"

  4. a fork of udisks2 adding an LVM API and being actively developed

  5. the first goal for the library was to replace Blivet’s devicelibs subpackage

  6. at higher than the most low-level layers, of course

  7. I hope that with the exception of kbd which stands for Kernel Block Devices the related technologies are clear, but don’t hesitate to ask in the comments if not.

  8. or e.g. BlockDev.init(plugins) in Python over the GObject introspection

  9. use Google and "fork shared library" for further reading

  10. 119 out of 132 to be more precise

Minimalistic example of the GLib’s GBoxedType usage


As I’ve explained in one of the previous posts, it is possible to use the
advantages of GObject introspection even with plain C non-GObject
code. It is okay to write C functions taking arguments and returning values,
call g-ir-scanner and g-ir-compiler on them and then call them from
Python or any other language supporting GObject introspection. However, that’s
not entirely true as it per se only works with elementary types like numbers and
strings, arrays of such values and structs with no pointer fields.

So what if some functions need to take or return complex values, not only numbers
or strings? And why is it only structs with no pointer fields? Let’s start with
the second question. Imagine the following situation: the caller (e.g. Python)
calls a function that returns a struct containing a number, a string and a
pointer to another struct, and the ownership transfer (extra metadata for GObject
introspection) is set to full, which means the caller takes ownership of
the returned value. What if the caller wants to copy or delete such a value? In
the case of a number, a string or an array of such values it is simple. The same
applies to a simple struct with no pointers (the introspection data documents the
struct’s fields and their types). With a pointer to another struct, however, the
caller cannot know whether and how the pointed-to data should be copied or freed.
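
The same problem can be illustrated in Python, where it shows up as the difference between a shallow and a deep copy. This is only an analogy with made-up Inner and Outer classes, not C code and not anything GObject introspection does internally:

```python
import copy


class Inner:
    def __init__(self, value):
        self.value = value


class Outer:
    """A 'struct' with a number, a string and a reference to another one."""

    def __init__(self, number, text, inner):
        self.number = number
        self.text = text
        self.inner = inner


original = Outer(1, "ahoj", Inner(42))

# A field-by-field copy only duplicates the reference ("pointer")...
shallow = copy.copy(original)
print(shallow.inner is original.inner)  # True, both share one Inner

# ...so freeing or changing the inner data through one copy affects the other.
shallow.inner.value = 0
print(original.inner.value)             # 0

# A proper copy function has to know how to duplicate the nested data too.
deep = copy.deepcopy(original)
print(deep.inner is original.inner)     # False
```

In C, nobody can generate the equivalent of deepcopy() automatically, which is why the copy and free functions have to be supplied by the author of the struct.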

GBoxedType declaration example

So the problem is missing code for copying and freeing the complex values and
the first question coming to mind is: "Can’t I simply tell the caller how to
copy and free such values?" And that’s what GLib’s GBoxedType is all
about. It is a wrapper type around plain C structs which provides information
on how to copy and free such values. Let’s have a look at a minimalistic example
showing how such a type can be declared:

#include <glib-object.h>
#include <glib.h>

#define TEST_TYPE_DATA (test_data_get_type ())
GType test_data_get_type (void);

typedef struct _TestData TestData;

struct _TestData {
    gchar *item1;
    gchar *item2;
};

/**
 * test_data_copy: (skip)
 *
 * Creates a copy of @data.
 */
TestData* test_data_copy (TestData *data);

/**
 * test_data_free: (skip)
 *
 * Frees @data.
 */
void test_data_free (TestData *data);

First, the glib-object.h and glib.h header files need to be included
because they define the types and functions necessary for the definition of a new
GBoxedType. Then a macro and a function for getting the type of the new
GBoxedType need to be declared for the type system to work with the type. Of
course, there has to be a definition of the actual struct holding the data. It
can be done in two steps as in the above example or in one step as:

typedef struct TestData {
    type1 field1;
    type2 field2;
} TestData;

defining the struct type and "non-struct" type [1] both at once, but the GLib
coding style recommends the two-step definition. The core of the declaration
are the two functions for creating a new copy and freeing a value of the new
GBoxedType, with not very surprising signatures.

[1] in C these are two different type namespaces

With definitions of the functions above it will be possible to call functions
that return a TestData* value and get the values of the item1 and
item2 fields. It would also be possible to create a new TestData object
and pass values to its fields. However, it is often useful to declare and
define one more function:

/**
 * test_data_new: (constructor)
 * @str1: string to become the .item1 field
 * @str2: string to become the .item2 field
 *
 * Returns: (transfer full): new data
 */
TestData* test_data_new (gchar *str1, gchar *str2);

It is a constructor function that, given values of the fields, returns a new
object of type TestData. It is only a convenience function here as it
just passes the values to the struct’s fields, but as you can imagine, it
can do a lot more if needed.

GObject definition example

The implementation of the functions declared above is really
straightforward. The only exception is the test_data_get_type function that
creates and registers the type in the type system:

GType test_data_get_type (void) {
    static GType type = 0;

    if (G_UNLIKELY (!type))
        type = g_boxed_type_register_static ("TestData",
                                             (GBoxedCopyFunc) test_data_copy,
                                             (GBoxedFreeFunc) test_data_free);

    return type;
}
It defines a static variable type of type GType (local to the function, but
keeping its value between calls) and if it is not set (i.e. still 0), it
assigns it a new value created by the g_boxed_type_register_static function
with arguments that are quite clear, I’d say. The use of the G_UNLIKELY macro
tells the compiler that this condition will hardly ever evaluate to TRUE,
which is a simple but useful optimization.


With the functions and types declared in the test_data.h and defined in the
test_data.c files the working introspectable library can be created with the
following commands:

$ gcc -c -o test_data.o -fPIC `pkg-config --cflags glib-2.0 gobject-2.0` test_data.c
$ gcc -shared -o libtest_data.so test_data.o
$ LD_LIBRARY_PATH=. g-ir-scanner `pkg-config --libs --cflags glib-2.0 gobject-2.0` --identifier-prefix=Test --symbol-prefix=test --namespace Test --nsversion=1.0 --library test_data --warn-all -o Test-1.0.gir test_data.c test_data.h
$ g-ir-compiler Test-1.0.gir > Test-1.0.typelib

The first two call gcc to produce the libtest_data.so shared dynamic
library that can be then loaded e.g. by Python. The third line is the invocation
of the g-ir-scanner utility that produces an XML containing the
introspection (meta)data. It gets compiler and linker flags for the libraries
required by the libtest_data.so, prefixes for identifiers (like types,
constants,…) and symbols (functions), namespace name and version, the name of
the library that should be scanned and paths to the sources that should be
scanned and -o Test-1.0.gir option that specifies the output file name. Name
of the file should match the namespace-nsversion.gir pattern. And finally
the last command compiles the Test-1.0.gir file to its binary representation
that is expected to match the same name pattern with the .typelib extension.
If you are reproducing the steps above, feel free to have a look at the produced
Test-1.0.gir file as it is quite easily readable and understandable, I’d
say. And if you are a hardcore hacker, feel free to have a look at the
.typelib file too, of course. Just remember that running cat on it may
"nicely" change your terminal’s runtime configuration [2].

[2] use reset to get the defaults back in such cases


Having the definitions, declarations and introspection (meta)data available both
in the XML and binary forms, it’s time to test the result. The easiest way is
running ipython as it provides TAB-TAB completion. It just has to be
told where to find the .typelib file and of course the libtest_data.so
library that it needs to load. Both are in the current directory so:

$ GI_TYPELIB_PATH=. LD_LIBRARY_PATH=. ipython

runs ipython in the properly set up environment. To test the library and
newly defined struct/class/object type it has to be loaded from the
gi.repository. Then it can be instantiated with the constructor or without it
and fields can be introspected (TAB-TAB) and used:

In [1]: from gi.repository import Test
In [2]: td = Test.Data()
In [3]: td.item1 = "ahoj"
In [4]: td.item2 = "cau"

In [5]: td.item1
Out[5]: 'ahoj'

In [6]: td.item2
Out[6]: 'cau'

In [7]: td2 = Test.Data.new("nazdar", "zdar")

In [8]: td2. # hit TAB-TAB
td2.copy   td2.item1  td2.item2  td2.new

In [8]: td2.item2
Out[8]: 'zdar'


That’s not entirely bad, is it? One doesn’t get an introspectable struct
completely for free if it is not trivial, but defining three (copy, free, new)
of the four functions defined above is a good practice anyway. So in the end
it’s all about adding one more function and two declarations (the TYPE macro
and the get_type function prototype) and calling two utilities producing the
introspection data. Quite easy if I think about writing language-tailored
bindings for any language that comes to my mind. And with these constructs one
gets bindings for all the languages supporting GObject introspection. To define
a new type’s method, it just needs to have a test_data prefix and take the
TestData* value as the first argument. Let me know in the comments if there
is anything unclear. If I know the answer, I’ll reply ASAP and possibly update
the post with such information.