WARNING
This is a draft
Recently, I have been packaging several projects. On my last mission, I packaged Java projects built with Maven and stored on Nexus for RHEL distributions. A few weeks ago, I packaged the barrel-rocksdb library for Debian, Fedora, CentOS and Ubuntu (and probably BSD systems sooner or later). That sounds great, but really, we are in 2020 and packaging software for different systems is still a nightmare!
Out of frustration, I decided to play around with some packaging formats
to see if we can do something cool with all of that. I started with
the RPM format and... it was a shame to find almost nothing about the design
of the RPM package itself. I am not talking about the SPEC file used to
build the package, but about the structures and conventions
used to make an RPM portable across architectures. To be
clear, I was looking for RPM developer documentation. The first link
found on DuckDuckGo was the description of the RPM file
format. Unfortunately,
the subtitle of the paper was pretty clear: "NOTE That this is a
draft, and does not perfectly match existing RPM format".
After some research, I found a nice description of the RPM file format in the Linux Standard Base Core, where the C structures are defined along with a short explanation. I also found Dissecting the RPM file format, a really good article written from a hacker's point of view. This post is not about C, though, but about Erlang, and how to port these structures and formats to that language using its built-in features.
Before writing code, I will assume you have downloaded a small RPM package to run all the tests. I chose the ed editor as my guinea pig.
# Introduction
As usual, I like to show you all the steps of my projects and
thinking. So, we will start by bootstrapping our system and installing
Erlang and rebar3. By the way, I am still using OpenBSD, but I think
the steps will be pretty similar on any Linux distribution or
other OS. I will first prepare my environment.
mkdir ~/bin
echo 'export PATH=${PATH}:${HOME}/bin' >> ~/.profile
Erlang is needed, so we will use pkg_add to install it.
pkg_add erlang
rebar3 is the standard project manager, but it is not installed by
default and not packaged on OpenBSD. Anyway, building and
installing it is trivial.
mkdir ~/src
cd ~/src
git clone https://github.com/erlang/rebar3
cd rebar3
./bootstrap
ln -s $(pwd)/rebar3 ${HOME}/bin/rebar3
Finally, I can create my library and initialize git.
mkdir ~/projects
cd ~/projects
rebar3 new lib name=rpm
cd rpm
git init .
# Structure of RPM file
An RPM file is split into four sections:
- the lead section to identify the package;
- the signature section to verify the integrity of the package;
- the header section containing package information;
- and finally the payload containing the files.
If you are interested in the source code, you can take a look at the RPM GitHub repository.
# Lead Section
The lead section has a fixed size and is defined by the
rpmlead
C data structure. Here is the code from the official documentation. The
fields add up to 768 bits, or 96 bytes.
struct rpmlead {
    unsigned char magic[4];
    unsigned char major, minor;
    short type;
    short archnum;
    char name[66];
    short osnum;
    short signature_type;
    char reserved[16];
};
In Erlang, we can create a function called rpm:lead/1. It will
return a high-level data structure (a map) containing the same
fields. The magic pattern will be a function too, called
rpm:magic/0, returning the fixed bytes used to identify an RPM
package. Another function, rpm:magic/1, will return a boolean
telling whether a bitstring starts with the magic bytes.
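%% Needed for the ?assert macros used in magic_test/0.
-include_lib("eunit/include/eunit.hrl").

%% The four magic bytes found at the start of every RPM file.
-spec magic() -> Return when
      Return :: bitstring().
magic() ->
    <<16#ed, 16#ab, 16#ee, 16#db>>.

%% True when the bitstring starts with the magic bytes.
-spec magic(Bitstring) -> Return when
      Bitstring :: bitstring(),
      Return :: boolean().
magic(<<Magic:32/bitstring, _/bitstring>> = _Bitstring) ->
    Magic =:= magic();
magic(_Bitstring) ->
    %% Anything shorter than 32 bits cannot hold the magic bytes.
    false.

magic_test() ->
    ?assert(magic(magic())),
    ?assertNot(magic(<<0,0,0,0>>)).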
The rpm:lead/1 function is a bit more complex, but it uses Erlang's pattern
matching to easily cut the bitstring and convert it into a
map. Note that none of the strings and values are sanitized.
-spec lead(Bitstring) -> Return when
      Bitstring :: bitstring(),
      Return :: {map(), Rest},
      Rest :: bitstring().
lead(<< Magic:4/binary
      , Major:8
      , Minor:8
      , Type:16
      , Arch:16
      , Name:66/binary
      , OS:16
      , SignatureType:16
      , Reserved:16/binary
      , Rest/bitstring >>) ->
    %% Crash early if the magic bytes are wrong.
    true = magic(Magic),
    { #{ major => Major
       , minor => Minor
       , type => Type
       , arch => Arch
       , name => Name
       , os => OS
       , signature_type => SignatureType
       , reserved => Reserved
       },
      Rest
    }.
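Since nothing is sanitized, the 66-byte name field still carries its NUL padding. Here is a minimal sketch of one way to clean it up, assuming the name is NUL-terminated like a C string. The module name `rpm_name` and the function `trim/1` are hypothetical and are not part of the code above.

```erlang
-module(rpm_name).
-export([trim/1]).

%% Hypothetical helper: keep only the bytes before the first NUL byte,
%% the way a C string would be read from a fixed-size char array.
-spec trim(binary()) -> binary().
trim(Name) when is_binary(Name) ->
    %% binary:split/2 splits on the first match only; when there is
    %% no NUL byte at all, the whole binary is returned unchanged.
    [Head | _] = binary:split(Name, <<0>>),
    Head.
```

For example, `rpm_name:trim(<<"ed-1.14.2-4.el8", 0, 0>>)` returns `<<"ed-1.14.2-4.el8">>`.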
With hexdump, we can get the big picture of the content of the RPM.
$ hexdump -C ed-1.14.2-4.el8.x86_64.rpm | head -n 6
00000000 ed ab ee db 03 00 00 00 00 01 65 64 2d 31 2e 31 |..........ed-1.1|
00000010 34 2e 32 2d 34 2e 65 6c 38 00 00 00 00 00 00 00 |4.2-4.el8.......|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 05 |................|
00000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
The same data can be extracted directly in Erlang by using the
file:read_file/1 function. I tried to make the bitstring readable;
the content is the same as the output of hexdump, but in decimal.
1> {ok, <<B:96/binary, _/binary>>} = file:read_file("ed-1.14.2-4.el8.x86_64.rpm").
2> B.
<< 237,171,238,219, 3,0,0,0,
0,1,101,100, 45,49,46,49,
52,46,50,45, 52,46,101,108,
56,0,0,0, 0,0,0,0,
0,0,0,0, 0,0,0,0,
0,0,0,0, 0,0,0,0,
0,0,0,0, 0,0,0,0,
0,0,0,0, 0,0,0,0,
0,0,0,0, 0,0,0,0,
0,0,0,0, 0,1,0,5,
0,0,0,0, 0,0,0,0,
0,0,0,0, 0,0,0,0
>>
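To make these raw bytes easier to read, here is a small sketch that rebuilds the lead from the hexdump by hand and names each field. It assumes the integers are stored big-endian, which is Erlang's default in the bit syntax, and it relies on the field sizes of `struct rpmlead` adding up to 4 + 2 + 2 + 2 + 66 + 2 + 2 + 16 = 96 bytes. The module `rpm_lead_sample` and its functions are hypothetical names used only for this illustration.

```erlang
-module(rpm_lead_sample).
-export([sample/0, decode/1]).

%% The lead of the ed package, rebuilt by hand from the hexdump.
sample() ->
    Name = <<"ed-1.14.2-4.el8">>,
    Padding = binary:copy(<<0>>, 66 - byte_size(Name)),
    << 16#ed, 16#ab, 16#ee, 16#db   % magic
     , 3, 0                         % major, minor
     , 0:16                         % type
     , 1:16                         % archnum
     , Name/binary, Padding/binary  % name, NUL padded to 66 bytes
     , 1:16                         % osnum
     , 5:16                         % signature_type
     , 0:128                        % reserved
    >>.

%% Name every field of an exactly 96-byte lead.
decode(<< 16#ed, 16#ab, 16#ee, 16#db
        , Major:8, Minor:8, Type:16, Arch:16
        , Name:66/binary
        , OS:16, SignatureType:16
        , _Reserved:16/binary >>) ->
    #{ major => Major, minor => Minor, type => Type, arch => Arch
     , name => Name, os => OS, signature_type => SignatureType }.
```

Calling `rpm_lead_sample:decode(rpm_lead_sample:sample())` gives a map with `major` 3, `minor` 0, `type` 0, `arch` 1, `os` 1 and `signature_type` 5, while `name` is still padded with NUL bytes.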
# C structures and Erlang pattern matching
When I worked on my Erlang ZFS implementation proof of concept some years ago, I had to port all the internal C structures to Erlang. It was not fun at all. At the time, I decided to do it manually, but with the idea in mind of creating something to easily import C structures (or structures from other languages) into Erlang. Maybe now is the time to do it?
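To give an idea of what such a tool could look like, here is a rough sketch, not a finished design. It assumes a C structure can be described as a list of `{Name, Type}` pairs, with only two types: big-endian unsigned integers and fixed-size byte arrays. The module `cstruct`, its field description format and its functions are all hypothetical.

```erlang
-module(cstruct).
-export([decode/2, rpmlead/0]).

%% Hypothetical description of struct rpmlead.
%% {uint, Bits} is a big-endian unsigned integer,
%% {bytes, Size} is a fixed-size byte array.
rpmlead() ->
    [ {magic, {bytes, 4}}
    , {major, {uint, 8}}
    , {minor, {uint, 8}}
    , {type, {uint, 16}}
    , {archnum, {uint, 16}}
    , {name, {bytes, 66}}
    , {osnum, {uint, 16}}
    , {signature_type, {uint, 16}}
    , {reserved, {bytes, 16}}
    ].

%% Walk the description and cut the bitstring field by field.
%% Returns the decoded fields and whatever is left after them.
-spec decode(Fields, bitstring()) -> {map(), bitstring()} when
      Fields :: [{atom(), {uint, pos_integer()} | {bytes, non_neg_integer()}}].
decode(Fields, Bitstring) ->
    decode(Fields, Bitstring, #{}).

decode([], Rest, Acc) ->
    {Acc, Rest};
decode([{Name, {uint, Bits}} | Fields], Bitstring, Acc) ->
    <<Value:Bits/unsigned-big-integer, Rest/bitstring>> = Bitstring,
    decode(Fields, Rest, Acc#{Name => Value});
decode([{Name, {bytes, Size}} | Fields], Bitstring, Acc) ->
    <<Value:Size/binary, Rest/bitstring>> = Bitstring,
    decode(Fields, Rest, Acc#{Name => Value}).
```

With this, `cstruct:decode(cstruct:rpmlead(), Binary)` returns a `{Map, Rest}` tuple for any binary starting with a 96-byte lead, and crashes with a `badmatch` when the input is too short. Signed integers, little-endian fields, nested structures and alignment padding are deliberately left out of this sketch.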