Helping to clean CPAN: September 2016
… where I get out my virtual broom and sweep up cruft in my assigned distribution for this month’s edition of the CPAN Pull Request Challenge.
This month’s module: PDF::API2
PDF::API2 facilitates the creation and modification of PDF files.
First impressions
In this section I try to get a feeling for the state of the module, how up to date it is, how often people are contributing to it, how many other distributions are depending on it, how many bugs/issues it currently has, what the CPANTS kwalitee is, etc.
- last commit 9th of June 2016 (https://github.com/ssimms/pdfapi2)
- latest release 9th of June 2016 (https://metacpan.org/release/PDF-API2)
- issues are disabled on GitHub
- issues are tracked on RT: https://rt.cpan.org/Public/Dist/Display.html?Name=PDF-API2
- 27 open issues on RT
- 2 open pull requests
- 22 reverse dependencies via metacpan
- 0 test failure on cpantesters: http://matrix.cpantesters.org/?dist=PDF-API2+2.028
- CPANTS report (http://cpants.cpanauthors.org/release/SSIMMS/PDF-API2-2.028)
- core issues:
- has_license_in_source_file
- fixed in PR#11
- prereq_matches_use
- fixed in PR#12
- use_strict
- has_license_in_source_file
- extra issues:
- has_known_license_in_source_file
- has_meta_json
- fixed in PR#13
- use_warnings
- fixed in PR#15
- experimental issues:
- core issues:
Initial inspection of the source code
After forking the repo and cloning a local copy, let’s have a look at the project to see what build system it uses, if the test suite works and the tests pass, if it could do with a Travis-CI config file (or if present, if it can be updated).
- uses
Dist::Zilla
- no Travis config: maybe add one?
- added Travis config in PR#5
- there is no
META.yml
orMETA.json
in the repository; looks likeMETA.yml
is automatically generated, since it’s in the dist, however there’s noMETA.json
in the dist. - needed to run
dzil authordeps --missing | cpanm
in order to rundzil test
on my box - tests pass, however several
Use of uninitialized value
warnings turn up:t/gif.t ..................... Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. t/tiff.t .................... Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. t/tiff.t .................... 1/4 Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463. Use of uninitialized value $_[1] in read at /home/cochrane/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/x86_64-linux/IO/Handle.pm line 463.
Oddly enough, these warnings don’t appear when the tests are run via
$ prove -lr t
Code Coverage
Looking at the code coverage can give an indication of code quality. If the project is well covered, this means most changes made in pull requests can be made with some confidence that any problems will be caught by the test suite. If the code coverage is low, then this is something that one could address as a pull request (or set of pull requests).
In EUMM and Build::Module
projects, one simply needs to install
Devel::Cover
and run
$ cover -test
In Dist::Zilla
projects, one needs to install the
Dist::Zilla::App::Command::cover
plugin, after which the code coverage can
be checked via:
$ dzil cover
In this distribution, the coverage is:
51.4% statement coverage; 47.5% total coverage.
This is a large code base, so likely to be a large amount of work to get these percentages up higher.
POD checks
The utility podchecker
searches through Perl source code for POD which
might not conform to the POD standard, and thus not necessarily be parseable
by all POD parsers. Fixing any issues found by podchecker
has the
positive effect of also removing any warnings noted in the project’s
documentation displayed on MetaCPAN.
Running podchecker
gives the following errors and warnings:
$ find ./lib -name '*.pm' -or -name '*.pod' | xargs podchecker |& grep ERROR # 1 errors
*** ERROR: Spurious =cut command at line 278 in file ./lib/PDF/API2/Basic/PDF/Pages.pm
$ find ./lib -name '*.pm' -or -name '*.pod' | xargs podchecker |& grep WARN # 3 warnings
*** WARNING: line containing nothing but whitespace in paragraph at line 39 in file ./lib/PDF/API2/Resource/UniFont.pm
*** WARNING: line containing nothing but whitespace in paragraph at line 46 in file ./lib/PDF/API2/Resource/UniFont.pm
*** WARNING: =item type mismatch ('definition' vs. 'number') at line 1032 in file ./lib/PDF/API2.pm
Fixed in PR#6.
Nit-picking
Some projects consider this a must, and will disallow commits to be submitted which contain trailing whitespace (the Linux kernel is an example project where trailing whitespace isn’t permitted). Other projects see whitespace cleanup as simply nit-picking. Either way one sees it personally, this could be a useful pull request to a project, so it’s worthwhile fixing and submitting; the worst that can happen is that the pull request is closed unmerged.
To look for files with trailing whitespace, run git grep ' $'
. We usually
only need to look for files with trailing whitespace and load the list
automatically into vim
:
$ vim $(git grep ' $' | cut -d':' -f1 | sort | uniq)
However, this time we’ve got binary files which match, and a PDF file which
also matches (but isn’t found by grep
to be a binary file) hence we filter
out these entries as well and print a count of the files found.
$ git grep ' $' | grep -v 'Binary file' | grep -v '\.pdf' | cut -d':' -f1 | sort | uniq | wc -l
41
Fixed in PR#7.
perlcritic
Perl::Critic will show up many
potential issues for Perl code. By simply running perlcritic
on the lib
and t
directories, one can get a further handle on the code’s quality.
$ perlcritic lib
$ perlcritic t
There are many files with basic perlcritic violations. Many Code before
strictures
, Bareword file handle
and Two argument "open"
issues.
TIL: perlcritic --verbose '%d\n'
gives a full explanation of the relevant
violation.
Perlcritic violations in tests are fixed in PR#8 and PR#9.
IO-related perlcritic violations fixed in PR#10.
Stale URLs
Links to websites can go out of date, so it’s a good idea to see if they need updating or removing. A quick grep finds all the links. After which, we just need to see which links need fixing, if any.
$ git grep 'http://\|https://'
$ git grep 'ftp://'
$ git grep 'git://'
http://deefs.net no longer exists. It’s mentioned in the Changes, so maybe it’s not a good idea (or not necessary) to update.
The links http://ns.adobe.com/iX/1.0/, http://ns.adobe.com/pdf/1.3/, http://ns.adobe.com/xap/1.0/, http://ns.adobe.com/xap/1.0/mm/, http://purl.org/dc/elements/1.1/ either no longer exist or now point elsewhere. They appear, however, in an example output of the metadata of a PDF file, hence they’re merely dated and aren’t real links as part of the documentation. Therefore, they don’t need to be updated.
Other than that, the links look ok.
Copyright notices
This probably also sits in the nit-picking category for some people, however, copyright dates (theoretically) need to be kept up to date. The appropriate copyright year is usually the year of the last release. However, if a release looks imminent, then the current year is likely to be the right candidate. Some distributions put the author’s email address on the same line as the copyright date, hence this needs to be checked as well.
Files missing current year in copyright
Here we do a case-insensitive grep over the source for the word “copyright”,
the line of which we check for the existence of a year (i.e. 4 digits), look
for the appropriate year and then clean up the grep results to get something
we can pass directly to vim
.
$ git grep -i copyright
$ vim $(git grep -i copyright | grep -P '\d{4}' | grep -v 2015 | cut -d':' -f1 | uniq)
Files missing an email address in copyright
$ vim $(git grep -i copyright | grep -P '\d{4}' | grep -v '@' | cut -d':' -f1 | uniq)
This distribution doesn’t use copyright strings in its source files and
there is no copyright date to update in dist.ini
.
Spell check POD
Good documentation can be a wonder to read. Not everyone’s docs are awesome, however we can keep the error rate to a minimum. A quick spell check will pick up most typos that don’t need to be there and fixing them can help improve the quality of a project.
In general, we want to find all files containing POD and run a spell checker
(e.g. aspell
) over all files, fixing typos we come across as we go. Not
all projects require this much effort, however here’s a general-ish way to
look for and check all POD in a project:
$ files_with_pod=$(find ./lib -name '*.pm' -o -name '*.pod' \
| xargs podchecker 2>&1 \
| grep 'pod syntax' | cut -d' ' -f1)
$ for filename in $files_with_pod
do
pod2text $filename > $filename.txt;
aspell -c $filename.txt;
done
Now look for .bak
file and check differences between it and the output
.txt
file, the process looks roughly like this:
$ find ./ -name '*.bak'
$ diff -u lib/Path/To/Module.pm.pm.txt.bak lib/Path/To/Module.pm.txt
Then update the appropriate .pm
and/or .pod
files as necessary.
Some minor typos found. Fixed in PR#14.
Kwalitee tests
Although CPANTS is the main kwalitee
reference, one can also run the kwalitee tests locally. One can use the
t/kwalitee.t
test script from
http://peateasea.de/cpan-pull-request-challenge/ for this purpose.
However, the script only uses Test::Kwalitee
which doesn’t cover as many
metrics as CPANTS.
Test::Kwalitee::Extra uses
set of metrics closer to that used on CPANTS, so replace the kwalitee_ok
call with simply use Test::Kwalitee::Extra
. More information about the
many options to Test::Kwalitee::Extra
can be found on the MetaCPAN page.
Run the kwalitee tests in an ExtUtils::MakeMaker
or Build::Module
distribution like so:
$ RELEASE_TESTING=1 prove t/99kwalitee.t
or, if the distribution uses Dist::Zilla
, run
$ dzil test --author --release
Effectively repeats the warnings found on CPANTS mentioned in the “First impressions” section.
GitHub issues
GitHub issues are disabled; all issues are handled on RT.
RT issues
Unfortunately, the time to address any RT issues ran out for this month…
Overview of the pull requests made
- Add initial Travis-CI config - https://github.com/ssimms/pdfapi2/pull/5
- Fix podchecker issues - https://github.com/ssimms/pdfapi2/pull/6
- Purge trailing whitespace - https://github.com/ssimms/pdfapi2/pull/7
- Use warnings and strict pragmas in tests - https://github.com/ssimms/pdfapi2/pull/8
- Replace string form eval with block form in tests - https://github.com/ssimms/pdfapi2/pull/9
- Fix IO-related perlcritic warnings - https://github.com/ssimms/pdfapi2/pull/10
- Add explicit LICENSE section to main source POD - https://github.com/ssimms/pdfapi2/pull/11
- Add missing prereq to dist.ini - https://github.com/ssimms/pdfapi2/pull/12
- Generate META.json when building dist - https://github.com/ssimms/pdfapi2/pull/13
- Fix typos in POD - https://github.com/ssimms/pdfapi2/pull/14
- Add warnings pragma to all files missing it - https://github.com/ssimms/pdfapi2/pull/15
- Use strict pragma wherever possible - https://github.com/ssimms/pdfapi2/pull/16
Conclusion
Getting the option parsing code working with use strict
will likely be
difficult, since options are handled sometimes as arrays and sometimes as
hashes. Some domain knowledge is likely to be necessary to get this right,
hence I’ve left this issue for this month.
Unfortunately, I only really managed to get stuck into working on this dist in the last week of the month, so consequently couldn’t do as much as I’d hoped. Since the distribution has been updated and released within the last few months, it’s likely that the current maintainer will comment on or merge these PRs sometime in the future. At the very least, I hope that these contributions have helped!