Parsing the Linux procfs

Daniel Trugman included in Programming Internals Linux CPP procfs

2021-02-21 1576 words 8 minutes

Contents

I’m writing C++ code for Unix environments for almost a decade now. And the funny thing is that in the last five years, while working for three different companies, I had to write a new procfs parsing library every time. All those companies were developing security products monitoring applications running on Linux machines, so using the procfs was inevitable. But over time, necessity became great admiration. I simply grew very fond of the amazing creation called procfs. Having the honor of writing Linux kernel code that exported values using the same mechanisms, I also experienced the other end of things and liked it even more.

Since I haven’t read many blogs posts from library authors, and it’s been a while since I got to writing, I decided to document the mental process I was going through. Hopefully it will make for an interesting reading or better yet, expose me to readers and ideas I can learn from. This blog post is all about me writing the pfs library, the decisions I’ve made and why I made them.

Styling

I try to be true to myself and admit to the faults I’m aware of. One of those is a “slight” OCD. For example: Looking at a standard-sized screen with 60 lines of code in scope, I can easily loose focus over some misaligned lines. It bugged me when I started coding, it bugs me at work and it will probably always bug me. I believe that a huge part of our ability to percept and understand code relies upon our subconscious’ ability to sync with what it sees and know what to expect. That is why any coding standard is always better than no standard. I’ve seen great developers throw away hours trying to read and understand code that was simply formatted differently than what they were used to (No, not the ones that do code reviews with IDA). And for those of you that make fun of that, just remember great developers are usually structure people with very developed logical skills. To make use of those skills, one has to make sure that all the pieces come together nicely.

All in all, I decided to prevent this issue from ever happening here by simply adding a clang-format configuration with a pre-commit hook. I really hope this project will grow and others would like to participate, and making sure the code is accessible & attractive to others is a huge deal.

Naming conventions

It was a very long and hard process. I tried at least 3 different styles but then decided to go with (almost) the same convention as the standard library.

Pros:

Familiar and feels native for most C++ developers
Used by many other widely-used libraries, including: boost, fmt & spdlog
Conveys the fact it’s a library, as most applications use different styles (Yes, I get the irony in that, but still…)

Cons:

Since type names are all lowercase, you sometimes cannot use straightforward names for your variables.

One deviation I took from the standard library was the chosen style for private enums and constant values. I felt like keeping those lowercase as well makes it really hard to follow parsing code with many constants, so I decided to change those to the C POSIX style of uppercase snake_case.

API

Personally, I write code using VIM, without any intelligent form of auto-completion. I do that when I write code in C, C++, Bash, Python or any other language I’m well acquainted with. Why am I telling you this? Because for me, predictable APIs are the everything. I like the standard library, because all containers behave the same. Yes, I know you can’t emplace_back into an std::set, but that’s an easy constraint to follow if you know your containers.

That is why I tried to keep the names of the APIs as easy to predict as possible. I asked myself, what do I expect someone to do when trying to dig some piece of information from /procfs? And the answer that came to me was:

Use an interactive shell to find the file that contains that bit
Convert the path into a function call: /proc/net/tcp6 will become procfs().get_net().get_tcp6()

But since all rules are made to be broken, there’s also a (really-tiny) single exception from that rule. Since some APIs are actually wrappers to underlying directories, for example /proc/[pid]/fd, I decided that APIs for directories are going to be in plural form: procfs().get_task([pid]).get_fds().

Break-down into sub-APIs (and sub-classes)

Having all the methods under a single API simply wasn’t an option. If you haven’t figured it out already, I’m a devout follower of certain coding practices. One of those is breaking down code into small units, each with a single responsibility.

Luckily enough, the procfs is already broken down quite nicely, and following the same structure was the trivial go-to solution.

All files directly under procfs were mapped to methods of the procfs object. And that object is the only “entry point” in the library. If one wants to interact with a specific task, it can ask the procfs object to get a task object for him. If you wish to get some information from the network namespace of that task, you can ask for the net object.

The decision to allow creation of task objects only through the procfs object obtains two goals:

Seamless support for systems where the procfs subsystem is mounted at custom locations - Once you create the root procfs object with the actual path, all the objects created by it are already aware of this “system configuration”, sparing you the headache of always passing the path when initializing additional objects during runtime.
Allow easier future extensibility. Since all properties can be seamlessly propagated down from the main procfs object into the task objects.

Types and return values

Since this is a “glorified” parser, and parsers are all about getting the information you want, I decided that return values ought to be as simple to use as possible, and decided to use only enums and structs (rather than classes). I refrained from using accessors (getters or setters) because I didn’t presume to predict all the possible use cases. It felt more natural to just return the raw information and let the caller handle it.

Next were the variable names inside the structs. I tried to keep the names as close to the ones described in the ultimate guide, proc(5). Wherever there weren’t specific names, I tried to be consistent with the naming conventions of Linux.

There were some types, such as the ip or cap_mask structs, where I wrapped the raw values and added helper methods. At the time of writing, I didn’t know what potential users of this library might want. I merely assumed useful utilities for certain types, such as the per-capability getter to capability masks (cap_mask), might definitely be a well-received enhancement.

Another decision I made to preserve simplicity and explicitness was to avoid aliases for standard container types. For example, get_cmdline returns an std::vector<std::string>, and not some cmdline_args type. I found that this explicitness in the API helpful in predicting output format, and thus supportive of my main design goal.

Error handling

I guess error handling is one of the major pain points in modern C++ writing. The official standpoint, as far as the language moderators are concerned, is to use exceptions to make it cleaner and safer. But, when big organizations as Google formally discourage exceptions in their coding guidelines, the decision is never easy. Truth be told, we have the same anti-exceptions convention at work. In my opinion, both forms can work well, and the most important thing here is to be consistent. Since I’m using the standard library, the code is susceptible to exceptions thrown from it, mostly bad_alloc . I didn’t want to wrap all my methods internally and translate allocation issues into ENOMEM, and I also wanted the ability to write RAII code. So instead, I decided to provide a consistent experience, hoping that when users call the library’s APIs, they are sensible enough to handle underlying exceptions as well.

What’s next

One of the harder decisions I had to make was: “What is the MVP?”

Every company know that this question is super important when first introducing new products and/or features to the market. If it’s too premature, customers simply won’t use it. If it’s too mature, there’s a high chance you invested a lot of work into extra features no one is going to use.

If you look at the code I published for v0.1.0 (the first, latest and only version available at this time), I decided to cover 80% of the most-interesting (my subjective view of course) files. And of course, some of those files were parsed to the bit resolution; such as capability masks from /proc/[pid]/status, whereas others are just a single string users can embed into their logs files, such as /proc/cmdline . I decided to see how people react to that before investing more time into it.

I do however have plenty of ideas in the pipeline, and I can only hope that this project gets some traction and I can put them into play.

That’s it, I hope this was an easy and interesting read. You can find the source code for this project on Github. I’ll be happy to hear you opinions, and even happier to know you are using it.