Tuesday 20 September 2016

Anatomy of a dope PHP package repository

While contributing to Construct, maintained by Jonathan Torres, I gathered some insights into the characteristics of a dope PHP package repository. This post summarises and illustrates them, so that PHP package developers have a complementary guideline for improving existing or upcoming package repositories. Jonathan Reinink did a good job of putting the PHP package checklist out there, which provides an incomplete but solid quality checklist for open-source PHP packages.

I'll distill the characteristics of a dope PHP package repository by looking at the repository artifacts Construct can generate for you when starting the development of a new PHP project or micro-package. The following tree command output shows most of the elements this post will touch upon. The artifacts in parentheses are optional and configurable from Construct, but can nonetheless have an important impact on the overall package quality.

├── <package-name>
│   ├── CHANGELOG.md
│   ├── (CONDUCT.md)
│   ├── composer.json
│   ├── composer.lock
│   ├── CONTRIBUTING.md
│   ├── (.editorconfig)
│   ├── (.env)
│   ├── (.env.example)
│   ├── (.git)
│   │   └── ...
│   ├── .gitattributes
│   ├── (.github)
│   │   ├── CONTRIBUTING.md
│   │   ├── ISSUE_TEMPLATE.md
│   │   └── PULL_REQUEST_TEMPLATE.md
│   ├── .gitmessage
│   ├── .gitignore
│   ├── (.lgtm)
│   ├── LICENSE.md
│   ├── (MAINTAINERS)
│   ├── (.php_cs)
│   ├── (phpunit.xml.dist)
│   ├── README.md
│   ├── (docs)
│   │   └── index.md
│   ├── src
│   │   └── Logger.php
│   ├── tests
│   │   └── LoggerTest.php
│   ├── .travis.yml
│   ├── (Vagrantfile)
│   └── vendor
│           └── ...

Definition of a dope PHP package repository

Before jumping into the details, let's define what can be considered a dope package repository. Being lazy, I'm simply going to reword this classic quote from Michael Feathers
> Clean code is code that is written by someone who cares.
to
> A dope PHP package repository is one that is created and maintained by someone who cares.

Artifact categories

The pyramid shown next illustrates the three main categories the artifacts of a package repository fall into.
First and most important, there's the main sourcecode, its tests or specs, and the documentation which, depending on its size, can reside in a README.md section or inside a dedicated docs directory. Using a docs directory also allows publishing the documentation via GitHub Pages. Other aspects of a package which should be documented are the chosen license, how to contribute to the package, possibly a code of conduct to comply with, and the changes made over the lifespan of the package.

Second, there's the configuration for a myriad of tools like Git, GitHub, EditorConfig, Composer, the preferred testing framework, the preferred continuous inspection / integration platform such as Scrutinizer or Travis CI, and so forth.

The final category includes tools which ease the life of maintainers and potential contributors alike. These tools can help with releasing new versions, enforcing coding standard compliance, or keeping commit messages consistent in quality and format.

Consistency

Sourcecode

All sourcecode and accompanying tests or specs should follow a coding standard (PSR-2) and have a consistent formatting style; there's nothing new here. The perfect place to communicate such requirements is the CONTRIBUTING.md file.

Tools like PHP Coding Standards Fixer or PHP_CodeSniffer, combined with a matching configuration file (.php_cs or ruleset.xml.dist) and a Composer script wrapping the command, are an ideal match to ease compliance. The Composer script cs-fix shown next will be available for maintainers and contributors alike.

composer.json
{
    "__comment": "omitted other configuration",
    "scripts": {
        "cs-fix": "php-cs-fixer fix . -vv || true"
    }
}
Formatting settings such as line endings, indentation style, and file encoding can be configured via an EditorConfig file named .editorconfig, which is honoured whenever the IDE or text editor of choice supports it.
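As a minimal sketch, an .editorconfig covering the three settings mentioned above could look like the following. The concrete values (UTF-8, LF line endings, four-space indentation) are assumptions chosen to match PSR-2 conventions and are not necessarily what Construct generates.

```ini
# .editorconfig (illustrative values, adjust to the package's conventions)
root = true

[*]
charset = utf-8
end_of_line = lf
indent_style = space
indent_size = 4
insert_final_newline = true
trim_trailing_whitespace = true

[*.md]
# Trailing whitespace can be significant in Markdown (hard line breaks).
trim_trailing_whitespace = false
```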

Artifact naming and casing

Like sourcecode formatting and naming, repository artifacts should also follow a predictable naming scheme. All documentation files should have a consistent extension like .md or .rst and the casing should be consistent throughout the package repository. Comparing
├── <package-name>
│   ├── changelog.md
│   ├── code_of_conduct.md
│   ├── ...
│   ├── .github
│   │   └── ...
│   ├── LICENSE
│   ├── Readme.md
│   ├── roadmap.rst
to
├── <package-name>
│   ├── CHANGELOG.md
│   ├── CODE_OF_CONDUCT.md
│   ├── ...
│   ├── .github
│   │   └── ...
│   ├── LICENSE.md
│   ├── README.md
│   ├── ROADMAP.md
I would favour the latter any time for its much easier reading flow and pattern matchability. The easier reading flow is achieved by the upper casing of the *.md files, which also clearly communicates their documentation character.

The configuration files for tools which accept the .dist file extension by default should all carry it, as shown next.
├── <package-name>
│   ├── build.xml.dist
│   ├── phpunit.xml.dist
│   ├── ruleset.xml.dist
│   ├── ...

Commit message format

Next to the package's changelog, incrementally growing in the CHANGELOG.md file, the Git commit messages are an important source of change communication. Therefore they should also follow a consistent format, which improves the reading flow while also leaving a professional impression. This format can be documented, once again, in the CONTRIBUTING.md file or, even better, be provided via a .gitmessage file residing in the package's Git repository.

Once more a Composer script, named configure-commit-template here, can ease configuration. Once configured, Git will use the file's content as the initial commit message when committing without the -m|--message or -F|--file options.

composer.json
{
    "__comment": "omitted other configuration",
    "scripts": {
        "configure-commit-template": "git config --add commit.template .gitmessage"
    }
}
To enforce commit message formatting adherence to the rules described by Chris Beams on a Git hook level, the git-lint-validators utility by Billie Thompson can be helpful.
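For illustration, a .gitmessage template reflecting those rules might look like the sketch below. The wording is hypothetical and not taken from Construct. With Git's default cleanup mode, lines starting with # are stripped from the final message when it is edited in the editor.

```text
# Subject line: imperative mood, capitalised, no trailing period,
# 50 characters or fewer. Leave the next line blank.

# Body: explain what changed and why rather than how,
# wrapped at 72 characters.
```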

Versioning

Release versions should follow the semantic versioning specification, aka SemVer; once again there's nothing new here. When version numbers are used in the sourcecode or CLI binaries, these should be in sync with the set Git tag. Tools like RMT or self-written tools should be utilised for this mundane task.

The code shown next illustrates such a simple self-written tool named application-version. Its main purpose is to set the provided version number in the CLI application's binary and avoid an application version and Git tag mismatch.

bin/application-version
#!/usr/bin/env php
<?php
$binApplicationName = '<bin-application-name>';
$binFile = __DIR__ . DIRECTORY_SEPARATOR . $binApplicationName;
list($void, $binFileRelative) = explode($binApplicationName, $binFile, 2);
$shortBinFilePath = $binApplicationName . $binFileRelative;

$options = getopt('v:ch', ['version:', 'current', 'verify-tag-match', 'help', 'current-raw']);

$help = <<<HELP
This command sets the version number in the {$shortBinFilePath} file:
Usage:
  application-version [options]
Options:
  -c, --current, --current-raw   The current version number
  --verify-tag-match             Verify application version and Git tag match
  -v, --version                  The version number to set
  -h, --help                     Display this help message
HELP;

if (array_key_exists('h', $options) || array_key_exists('help', $options)) {
    echo $help;
    exit(0);
}

/**
 * Return the application version.
 *
 * @param  string $binFile File holding the application version.
 * @return string
 */
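function get_application_version($binFile) {
    $matches = [];
    // Grab the first version-like number (e.g. 1.2.3) found in the binary.
    $matched = preg_match(
        '/(\d+\.)?(\d+\.)?(\*|\d+)/',
        file_get_contents($binFile),
        $matches
    );
    if ($matched !== 1) {
        echo "No version number found in {$binFile}." . PHP_EOL;
        exit(1);
    }
    return trim($matches[0]);
}

/**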
 * Return latest tagged version.
 *
 * @return string
 */
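function get_latest_tagged_version() {
    exec('git describe --tags --abbrev=0', $output, $exitCode);
    // Bail out early when the repository has no tag yet.
    if ($exitCode !== 0 || !isset($output[0])) {
        echo 'No Git tag found.' . PHP_EOL;
        exit(1);
    }
    return trim($output[0]);
}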

if (array_key_exists('verify-tag-match', $options)) {
    $applicationVersion = 'v' . get_application_version($binFile);
    $latestGitTag = get_latest_tagged_version();
    if ($applicationVersion === $latestGitTag) {
        echo "The application version and Git tag match on {$latestGitTag}." . PHP_EOL;
        exit(0);
    }
    echo "The application version {$applicationVersion} and Git tag {$latestGitTag} don't match." . PHP_EOL;
    exit(1);
}

if (array_key_exists('current-raw', $options)) {
    echo get_application_version($binFile) . PHP_EOL;
    exit(0);
}

if (array_key_exists('c', $options) || array_key_exists('current', $options)) {
    $applicationVersion = 'v' . get_application_version($binFile);
    $latestGitTag = get_latest_tagged_version();
    echo "Current version set in {$shortBinFilePath} is {$applicationVersion}." . PHP_EOL;
    echo "Current tagged version {$latestGitTag}." . PHP_EOL;
    exit(0);
}

if ($options === []) {
    echo 'No options set.' . PHP_EOL;
    exit(1);
}

$version = isset($options['version']) ? trim($options['version']) : trim($options['v']);
$fileContent = file_get_contents($binFile);
$fileContent = preg_replace(
    '/(.*define.*VERSION.*)/',
    "define('VERSION', '$version');",
    $fileContent
);
file_put_contents($binFile, $fileContent);
echo "Set version in {$shortBinFilePath} to {$version}." . PHP_EOL;
exit(0);
The application-version tool could further be utilised in Travis CI builds to avoid the earlier mentioned version differences, as shown in the next .travis.yml excerpt. On an application version and Git tag mismatch, the shown build script will break the build early.

.travis.yml
language: php

# omitted other configuration

script:
  # Verify application version and Git tag match
  - php bin/application-version --verify-tag-match
  # omitted other scripts

Lean builds

To speed up continuous integration builds, resource- and time-consuming extensions like Xdebug should be disabled when not required for measuring code coverage. The before_script shown next, tailored for Travis CI, is generated by Construct by default and might shave off a few build seconds, thereby providing faster feedback.

.travis.yml
language: php

# omitted other configuration

before_script:
  - phpenv config-rm xdebug.ini || true
  # omitted other before_scripts
To reduce email traffic, the email notifications sent by Travis CI should be reduced to a minimum as shown next or, depending on your workflow, they could be disabled altogether.

.travis.yml
language: php

# omitted other configuration

notifications:
  email:
    on_success: never
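If the workflow doesn't need build emails at all, the second option mentioned above can be sketched as follows. This relies on Travis CI accepting a boolean for the email key.

```yaml
language: php

# omitted other configuration

notifications:
  email: false
```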
Something I really would love to see supported by Travis CI is a feature to ignore a set of definable artifacts, which could be configured in a .buildignore file or the like. This way wording or spelling changes on non-build-relevant artifacts like the README.md wouldn't trigger a build and waste resources and energy. There's a related GitHub issue, and here's hoping it will be revisited in the near future.

Lean releases

To keep releases (or dists in Composer lingo) of PHP projects or micro-packages as lean as possible, their repositories should contain a complete and valid .gitattributes file. With such a file present, all export-ignored files will be excluded from release archives, thereby saving a significant amount of bandwidth and energy.

The next code shows the content of such a .gitattributes file excluding non-release-relevant files like internal tools, configuration, and documentation artifacts. If for some reason you require the complete source of a PHP project or micro-package, you can bypass the default by using Composer's --prefer-source option.

.gitattributes
* text=auto eol=lf

.editorconfig export-ignore
.gitattributes export-ignore
.github/ export-ignore
.gitignore export-ignore
.gitmessage export-ignore
.php_cs export-ignore
.travis.yml export-ignore
bin/application-version export-ignore
bin/release-version export-ignore
bin/start-watchman export-ignore
CHANGELOG.md export-ignore
LICENSE.md export-ignore
phpunit.xml.dist export-ignore
README.md export-ignore
tests/ export-ignore
To validate the .gitattributes file of a PHP project or micro-package on the repository, Git HEAD, or build level, the LeanPackageValidator CLI can be helpful.

Avoid badge posing

Badges, if used sparingly, are a handy tool for visualising some of the repository properties. It's definitely nice to immediately see the required PHP version, the current build status, or the latest version of the package, as they save you manual lookups.

Badges showing the number of downloads, code coverage, or the chosen license are in my opinion kind of poserish; they cause unnecessary requests to the badge service and are, in the case of the license, even redundant.

Why should you care about the dopeness of a PHP package repository?

Creating and maintaining a dope PHP package repository might have a positive impact on several levels. It can earn you some valuable brownie points or even provide a conversation gambit in job interviews, simply because you showcase professionalism.

Furthermore, you're more likely to get valuable and high-quality contributions from your second target audience when supportive documentation and tooling for things like issue creation, coding standards compliance, or Git commit message consistency are available.

It also might convince an end user, your main target audience, to use your package over a competing one.

Le fini

So these were my recent insights and learnings on the anatomy of a dope PHP package repository. If you're lucky enough to be attending this year's ZendCon, I recommend catching Matthew Weier O'Phinney's session about Creating PHPantastic packages. I'll definitely be waiting for the related slides.

Happy packaging.
