WARNING

This is a draft

Recently, I worked on packaging different projects. On my last mission, I packaged several Java projects built with Maven and stored on Nexus for RHEL. A few weeks ago, I was packaging the barrel-rocksdb library for Debian, Fedora, CentOS and Ubuntu (and probably BSD systems sooner or later). That sounds amazing, but... really, we are in 2020 and packaging things for different systems is still a nightmare!

Out of frustration, I decided to play with some packaging formats to see if we can do something cool with all of that. I started with the RPM format and... it was a shame to find nothing about the design of an RPM package. I am not talking about the SPEC file used to build the package, but about the structures and conventions that make an RPM portable between architectures. To be clear, I was looking for RPM developer documentation. The first link found on DuckDuckGo was the description of the RPM file format. Unfortunately, the subtitle of the paper was pretty clear: "NOTE That this is a draft, and does not perfectly match existing RPM format".

After some research, I found a nice description of the RPM file format in the Linux Standard Base Core, where the C structures are defined with a short explanation. I also found Dissecting the RPM file format, a really good article written from a hacker's point of view. The topic of this post is not C but Erlang: how to port these structures and formats to this language using its many built-in features.

Before writing code, I will assume you have downloaded a small RPM package to make all the tests possible. I chose the ed editor as my guinea pig.

# Introduction

As usual, I like to show you all the steps of my projects and my thinking. So, we will start by bootstrapping our system and installing Erlang and rebar3. By the way, I am still using OpenBSD, but I think the steps will be pretty similar on any Linux distribution or other OS. I will first prepare my environment.

mkdir ~/bin
echo 'export PATH=${PATH}:${HOME}/bin' >> ~/.profile

Erlang is required, so we will use pkg_add to install it.

pkg_add erlang

rebar3 is the standard build tool for Erlang projects, but it is not installed by default and not packaged on OpenBSD. Anyway, building and installing it is trivial.

mkdir ~/src
cd ~/src
git clone https://github.com/erlang/rebar3
cd rebar3
./bootstrap
ln -s $(pwd)/rebar3 ${HOME}/bin/rebar3

Finally, I can create my library and initialize git.

mkdir ~/projects
cd ~/projects
rebar3 new lib name=rpm
cd rpm
git init .

# Structure of RPM file

An RPM file is split into four sections:

  • the lead section to identify the package;

  • the signature section to verify the integrity of the package;

  • the header section containing package information;

  • and finally the payload containing the files.

If you are interested in the source code, you can take a look at the RPM GitHub repository.

# Lead Section

The lead section has a fixed size and is defined by the rpmlead C data structure. Here is the code from the official documentation. The full size of the structure is 768 bits, or 96 bytes (4 + 1 + 1 + 2 + 2 + 66 + 2 + 2 + 16).

struct rpmlead {
    unsigned char magic[4];
    unsigned char major, minor;
    short type;
    short archnum;
    char name[66];
    short osnum;
    short signature_type;
    char reserved[16];
} ;
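
To double-check the size, the widths of the fields can simply be added up. Here is a minimal sketch; the module name rpm_lead_size and the field list are my own transcription of the structure above (they are not part of any existing library), and it assumes the compiler adds no padding, which should hold here because every 2-byte field starts at an even offset.

```erlang
-module(rpm_lead_size).
-export([fields/0, bytes/0, bits/0]).

%% Field name and size in bytes, transcribed from struct rpmlead.
fields() ->
    [ {magic, 4}
    , {major, 1}
    , {minor, 1}
    , {type, 2}
    , {archnum, 2}
    , {name, 66}
    , {osnum, 2}
    , {signature_type, 2}
    , {reserved, 16}
    ].

%% Total size of the lead in bytes.
bytes() ->
    lists:sum([Size || {_Name, Size} <- fields()]).

%% Total size of the lead in bits, the unit used in bit syntax patterns.
bits() ->
    8 * bytes().
```

Calling rpm_lead_size:bytes() gives the size of the lead in bytes, and rpm_lead_size:bits() the value to use when matching the whole lead in one bit syntax pattern.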

In Erlang, we can create a function called rpm:lead/1 that returns a high-level data structure (a map) containing the same fields. The magic pattern will be a function too, rpm:magic/0, returning the fixed bytes that identify an RPM package. Another function, rpm:magic/1, returns a boolean telling whether a bitstring starts with the magic bytes or not.

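%% EUnit provides the ?assertEqual macro used in magic_test/0.
-include_lib("eunit/include/eunit.hrl").

%% Fixed bytes found at the very beginning of every RPM package.
-spec magic() -> Return when
      Return :: bitstring().
magic() ->
  <<16#ed, 16#ab, 16#ee, 16#db>>.

%% Check whether a bitstring starts with the RPM magic bytes.
-spec magic(Bitstring) -> Return when
    Bitstring :: bitstring(),
    Return :: boolean().
magic(<<Magic:32/bitstring, _/bitstring>> = _Bitstring) ->
    Magic =:= magic();
magic(_Bitstring) ->
    %% Anything shorter than four bytes cannot be an RPM.
    false.

magic_test() ->
    ?assertEqual(true, magic(magic())),
    ?assertEqual(false, magic(<<0,0,0,0>>)).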
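
In practice, the magic bytes are usually checked against a file on disk rather than a bitstring already in memory. The following is only a sketch under that assumption: the module rpm_magic_sketch and the function is_rpm/1 are hypothetical names, and the block repeats the magic bytes instead of calling rpm:magic/0 so that it can be compiled alone. It reads just the first four bytes with file:read/2.

```erlang
-module(rpm_magic_sketch).
-export([is_rpm/1]).

%% Hypothetical helper: true when the file at Path starts with the
%% RPM magic bytes, false for any other content or a shorter file.
%% Crashes (badmatch) if the file cannot be opened.
-spec is_rpm(file:name_all()) -> boolean().
is_rpm(Path) ->
    {ok, Fd} = file:open(Path, [read, binary, raw]),
    try file:read(Fd, 4) of
        {ok, <<16#ed, 16#ab, 16#ee, 16#db>>} -> true;
        _ -> false
    after
        %% Always release the file descriptor.
        file:close(Fd)
    end.
```

The advantage over file:read_file/1 is that a large package is never loaded entirely just to look at its first bytes.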

The rpm:lead/1 function is a bit more complex, but it uses Erlang pattern matching to cut the bitstring into fields and convert them into a map. Note that none of the strings and values are sanitized.

-spec lead(Bitstring) -> Return when
      Bitstring :: bitstring(),
      Return :: {map(), Rest},
      Rest :: bitstring().
lead(<< Magic:32/bitstring    % kept as a bitstring for magic/1
      , Major:8
      , Minor:8
      , Type:16
      , Arch:16
      , Name:66/binary        % NUL-padded package name
      , OS:16
      , SignatureType:16
      , Reserved:16/binary
      , Rest/bitstring >>) ->
  true = magic(Magic),
  { #{ major => Major
     , minor => Minor
     , type => Type
     , arch => Arch
     , name => Name
     , os => OS
     , signature_type => SignatureType
     , reserved => Reserved
     },
    Rest
  }.
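
Since nothing is sanitized, the name field still carries its NUL padding and the type is a bare integer. A possible clean-up step could look like the sketch below. The module rpm_lead_sketch and both helpers are hypothetical, the name is assumed to be available as a binary, and the 0 (binary package) and 1 (source package) values come from the Linux Standard Base description of the lead.

```erlang
-module(rpm_lead_sketch).
-export([name/1, type/1]).

%% Keep only the bytes before the first NUL of the fixed-size name field.
%% A binary without any NUL byte is returned unchanged.
-spec name(binary()) -> binary().
name(Raw) ->
    hd(binary:split(Raw, <<0>>)).

%% Translate the numeric package type into an atom.
-spec type(non_neg_integer()) -> binary | source | {unknown, non_neg_integer()}.
type(0) -> binary;
type(1) -> source;
type(Other) -> {unknown, Other}.
```

For example, rpm_lead_sketch:name(<<"ed-1.14.2-4.el8", 0, 0>>) returns <<"ed-1.14.2-4.el8">>. Unknown type values are returned tagged instead of crashing, which seemed the safer choice for a file coming from outside.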
  

With hexdump, we can get an overview of the content of the RPM.

$ hexdump -C ed-1.14.2-4.el8.x86_64.rpm | head -n 6
00000000  ed ab ee db 03 00 00 00  00 01 65 64 2d 31 2e 31  |..........ed-1.1|
00000010  34 2e 32 2d 34 2e 65 6c  38 00 00 00 00 00 00 00  |4.2-4.el8.......|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 01 00 05  |................|
00000050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

The same data can be extracted directly in Erlang with the file:read_file/1 function. I tried to make the bitstring readable; the content is the same as the output of hexdump, but in decimal.

1> {ok, <<B:768/bitstring, _/bitstring>>} = file:read_file("ed-1.14.2-4.el8.x86_64.rpm").
2> B.
<< 237,171,238,219, 3,0,0,0, 
       0,1,101,100, 45,49,46,49,
       52,46,50,45, 52,46,101,108,
          56,0,0,0, 0,0,0,0,
           0,0,0,0, 0,0,0,0,
           0,0,0,0, 0,0,0,0,
           0,0,0,0, 0,0,0,0,
           0,0,0,0, 0,0,0,0,
           0,0,0,0, 0,0,0,0,
           0,0,0,0, 0,1,0,5,
           0,0,0,0, 0,0,0,0,
           0,0,0,0, 0,0,0,0
>>
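
To compare what Erlang reads with the output of hexdump, it can help to print the bytes in hexadecimal instead of decimal. Here is a small sketch, where rpm_hex_sketch, hex/1 and rows/1 are names I made up for the illustration:

```erlang
-module(rpm_hex_sketch).
-export([hex/1, rows/1]).

%% Format every byte of a binary as two lowercase hexadecimal digits.
-spec hex(binary()) -> [string()].
hex(Bin) ->
    [lists:flatten(io_lib:format("~2.16.0b", [Byte])) || <<Byte>> <= Bin].

%% Group the bytes in rows of 16, like the columns of hexdump -C.
-spec rows(binary()) -> [string()].
rows(<<>>) ->
    [];
rows(<<Row:16/binary, Rest/binary>>) ->
    [row(Row) | rows(Rest)];
rows(Last) ->
    %% Fewer than 16 bytes left: emit a final, shorter row.
    [row(Last)].

%% Join the hexadecimal bytes of one row with single spaces.
row(Bin) ->
    lists:flatten(lists:join(" ", hex(Bin))).
```

For example, rpm_hex_sketch:rows(<<237,171,238,219,3,0,0,0>>) returns ["ed ab ee db 03 00 00 00"], which matches the beginning of the first line printed by hexdump.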

# C structures and Erlang pattern matching

When I worked on my Erlang ZFS proof-of-concept implementation some years ago, I had to migrate all the internal C structures to Erlang. It was not cool, at all. At the time, I decided to do it manually, but with the idea in mind of creating something to easily import C structures (or structures from other languages) into Erlang. Maybe this is the time to do it?