Open Sourcing in Enterprises – Managing your Code

This article follows on from a previous Introduction to Open Source in Enterprises post by Henrik Axelsson.

These days anyone can create a Git repository on GitHub.com and publish their code. Although it’s a very straightforward process, the real challenge lies in removing any sensitive information from the code before publishing it to the open source repo.

This is of particular concern for organisations that initially developed applications to suit their own needs but are now looking at open sourcing them to contribute to the wider community. Enterprise organisations are generally risk-averse at the best of times, and understandably the possibility of sensitive information being made public makes them hesitant to open source.

I speak from experience, having gone through this with one of our customers when we successfully managed to open source a centralised compliance monitoring & reporting tool called Watchmen. You can view the source code at Watchmen on GitHub, or learn more about this project via the following case study and posts:

For this article, we will focus specifically on how enterprise organisations can best manage their code when looking to open source.

CHALLENGES IN OPEN SOURCING ENTERPRISE APPLICATIONS

A common challenge for organisations is the inability to open source the code because it contains company-specific configs (e.g. IP addresses / database configs). The usual solution is to remove these configurations and replace them with dummy data before publishing to a public Git repo.

However, testing your application with dummy data can be a challenge in itself, and because the replacement is just an interim step, the changes are poorly auditable. This will eventually lead to two separate repos (public & private) with exactly the same source code but different configs.

Pulling content from the open source repo into the private repo involves redoing the config changes, and since the above process can’t be automated, the potential for human error is significant.

As a result, the time and effort required tend to discourage frequent contributions to the open source repository.

SO, WHAT CAN WE DO ABOUT IT?

Short answer – separate your organisation-specific data from the rest of the code.

You might notice that the root cause of the above challenges is a coupling of your config data (say ‘x’) with the source code (say ‘y’). Of course they need to be together so that your app can function. But bundling them in a single code structure takes away the flexibility of using any other config data (say ‘z’) with your code. So, you end up with:

‘x’ + ‘y’ = Private Repo

‘z’ + ‘y’ = Public Repo

Here, your source code ‘y’ is common to both repos, i.e. it is duplicated. Thus, it’s important to remove this coupling between data & code to be able to effortlessly open source your app.

HOW DO WE ACHIEVE THIS?

The process of decoupling data & code involves a one-off restructuring of your application codebase. This might sound like a lot of work, but it isn’t. Below are a few guidelines which might help you achieve the awesome.

  • Keep your config files in a separate folder (e.g. ./config/)
  • Refactor your code and ensure config data can be loaded from this folder
  • Make use of Git-provided tools like Git Submodules or Git Subtree to keep data & code modular
  • If you feel it’s appropriate or needed, try to employ a configure / make / make install strategy. You can leverage the configure step to get your app ready for execution. I found a decent article here which outlines how this technique works.
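
The first two guidelines can be sketched in a few lines of Python. This is only a minimal illustration, assuming your configs are stored as JSON files; the function name `load_config` and the idea of keying settings by file name are assumptions made for the example, not something taken from Watchmen.

```python
import json
from pathlib import Path


def load_config(config_dir="./config"):
    """Load every *.json file in config_dir into one dict keyed by file stem.

    Hypothetical helper: the JSON format and the naming scheme are assumptions.
    """
    root = Path(config_dir)
    if not root.is_dir():
        # Fail loudly so a missing private config is never silently ignored.
        raise FileNotFoundError(
            f"No config folder at {root}; copy or mount your configs there first."
        )
    settings = {}
    for path in sorted(root.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            settings[path.stem] = json.load(handle)
    return settings
```

With a file such as ./config/database.json in place, `load_config()["database"]` would hold its contents, and the application code never needs to contain the values themselves.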

As mentioned, if you separate your configs by moving them into a different repo, you can use ‘Git Submodules’ or ‘Subtree’ in your source code repo. This allows you to keep only your code in the open source repo and pull in all sensitive information (i.e. configs) from the private repo when you run the app. You can supply some sample configuration data in a folder like ./example.config/ (which is technically not useful to the code unless renamed to ./config/).
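
One possible way to produce the sample data for ./example.config/ is to derive it from a real config by blanking out every value. The sketch below is an assumption about how you might do that, not a step from the process described above; `redact` is a hypothetical helper, and it keeps key names as they are, so it only suits configs whose keys are not themselves sensitive.

```python
def redact(value):
    """Return a copy of value with every leaf replaced by a type placeholder."""
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    # Keep only the type name so readers know what kind of value to fill in.
    return f"<{type(value).__name__}>"
```

For example, `redact({"host": "10.0.0.5", "port": 5432})` gives `{"host": "<str>", "port": "<int>"}`, which shows the expected structure without exposing the real values.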

So, if we take our previous example of ‘x’, ‘y’ and ‘z’ repos, you will technically end up with:

‘x’ = Your configs (accessible privately)

‘y’ = Your source code (open sourced)

Hence, at execution time, you can use ‘y’ as a base application and import any config data, i.e. either ‘x’ or ‘z’, into the ./config/ folder. You can use this strategy not just for config files but also to protect any other information that you want to exclude from open sourcing.
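
As a rough sketch of that last step, the snippet below copies a chosen config set into the config folder before the app starts. The function name `install_config` and the choice to copy (rather than rename or link) the folder are assumptions for illustration. Note that it deletes whatever the target folder currently holds.

```python
import shutil
from pathlib import Path


def install_config(source_dir, target_dir="./config"):
    """Replace target_dir with a copy of the chosen config set (hypothetical helper)."""
    source = Path(source_dir)
    target = Path(target_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"Config source {source} does not exist")
    if target.exists():
        # Drop the previous config set so no stale files are left behind.
        shutil.rmtree(target)
    shutil.copytree(source, target)
    return target
```

Running `install_config("./example.config")` would set the app up with the public sample data (‘z’), while pointing it at a checkout of your private config repo would set it up with ‘x’; the source code ‘y’ stays the same in both cases.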

If you’d like more information, or have any questions about this post or open source for your organisation, please drop us a line!
