Tuesday 11 March 2008

Sniffing refactoring needs

While still messing around with the PHP_CodeSniffer Pear package, I took a somehow jealous 1000 feet look at some prominent code inspection tools of the Java community: PMD and Checkstyle. Browsing their available rule sets/checks I soon recognized that guaranteeing the coding standard adherence is just a partial aspect of these tools. The following post will focus on one of these additional aspects, which is finding problems related to the code size of the inspected code artifacts, by showing how to port two selected rules to sniffs for utilization with the PHP_CodeSniffer tool. These ported sniffs can be used to automatically spot obvious code smells in the code base and to signal the need to apply the appropriate and odour reducing activity known as refactoring. Further more a complete set of code size sniffs, going beyond the trageted realm of the sniffs to come, could be used to speed up the feedback loop and to reduce the effort for manual code reviews.

PMD has a rule named TooManyMethods which will fail when the inspected class is housing to much methods. In this case the class is probably breaking the Single responsibility principle and a refactoring i.e. Extract Class should be applied to reduce complexity and take along the additional advantages like an increased testability and a clear separation of concerns.

The following port of this rule simply counts the methods of the class under inspection and decides upon this count whether to raise an error, which will be reported, or not. It does so by analyzing the tokenised representation of the class under inspection and validating the relevant results against programmatically defined sniff criterions. For detailed information about how to roll your own sniffs take a look at the available tutorial of the PHP_CodeSniffer package.

<?php
/**
* 'File-level' PHPDoc Block
*/

/**
* This sniff counts the defined methods of a class and raises an error when
* the configurable method amount per class is exceeded.
*
* An example of the targeted code size violation is:
*
* <code>
* class One
* {
* public function method1()
* {
* ...
* }
* ... // more methods
* public function methodN()
* {
* ...
* }
* }
* </code>
*
* @category PHP
* @package PHP_CodeSniffer
* @author Raphael Stolt <raphael.stolt@gmail.com>
* @license http://matrix.squiz.net/developer/tools/php_cs/licence BSD Licence
* @version Release: @package_version@
* @link http://pear.php.net/package/PHP_CodeSniffer
*/
class Example_Sniffs_CodeSize_TooManyMethodsSniff implements PHP_CodeSniffer_Sniff
{
/**
* Defines the maximum of allowed methods in a inspected class.
*
* @var integer
*/
private $_maxMethodsPerClass = 10;

/**
* Returns the token types this sniff is interested in.
* For an overview of available tokens take a look at the
* PHP_CodeSniffer_Tokens class.
*
* @return array(int)
*/
public function register()
{
return array(T_CLASS);
}
/**
* Processes the tokens that this sniff is interested in.
*
* @param PHP_CodeSniffer_File $phpcsFile The file where the token was found.
* @param int $stackPtr The position in the stack where
* the token was found.
*
* @return void
*/
public function process(PHP_CodeSniffer_File $phpcsFile, $stackPtr)
{
$classTokenIndex = $phpcsFile->findNext(array(T_CLASS), $stackPtr);
$classNameTokenIndex = $phpcsFile->findNext(array(T_STRING), $classTokenIndex);

$className = $phpcsFile->getTokensAsString($classNameTokenIndex, 1);

$methodTokenIndex = $stackPtr;

$foundMethodTokens = 0;

// Get all methods of class and increase counter on appearance

while ($methodTokenIndex = $phpcsFile->findNext(array(T_FUNCTION), $methodTokenIndex + 1)) {
if ($methodNameTokenIndex = $phpcsFile->findNext(array(T_STRING), $stackPtr)) {
$foundMethodTokens+=1;
}
}

// Decide if there is a sniff violation and raise error in case of

if ($foundMethodTokens > $this->_maxMethodsPerClass) {
$message = "Method count in class '{$className}' exceeded "
. "the defined maximum of {$this->_maxMethodsPerClass}"
. ". Found {$foundMethodTokens} methods";
$phpcsFile->addError($message, $stackPtr);
}
}
}
?>
Another rule of PMD, falling in the rule set targeting code size violations, is called ExcessiveMethodLength and can also be easily ported to a PHP_CodeSniffer sniff. A violation of this rule/sniff, by exceeding the configurable method length, indicates again the need to put on the refactoring hat for keeping the code base in a non broken window state. The suitable refactorings in this case would be Extract Method or Move Method.
<?php
/**
* 'File-level' PHPDoc Block
*/

/**
* This sniff counts the lines of a defined method and raises an error when
* the configurable method loc is exceeded.
*
* An example of the targeted code size violation is:
*
* <code>
* class One
* {
* public function tooLongMethod()
* {
* // lots of code exceeding the configurable method loc
* }
* }
* </code>
*
* @category PHP
* @package PHP_CodeSniffer
* @author Raphael Stolt <raphael.stolt@gmail.com>
* @license http://matrix.squiz.net/developer/tools/php_cs/licence BSD Licence
* @version Release: @package_version@
* @link http://pear.php.net/package/PHP_CodeSniffer
*/
class Example_Sniffs_CodeSize_MethodLengthSniff implements PHP_CodeSniffer_Sniff
{
/**
* Defines the maximum of allowed lines in an inspected method.
*
* @var integer
*/
private $_maxMethodLoc = 100;
/**
* Defines whether to exclude comments from the method loc.
*
* @var boolean
*/
private $_exludeComments = true;
/**
* Defines whether to exclude empty lines from the method loc.
*
* @var boolean
*/
private $_excludeEmptyLines = true;

/**
* Returns the token types this sniff is interested in.
* For an overview of available tokens take a look at the
* PHP_CodeSniffer_Tokens class.
*
* @return array(int)
*/
public function register() {
return array(T_FUNCTION);
}
/**
* Processes the tokens that this sniff is interested in.
*
* @param PHP_CodeSniffer_File $phpcsFile The file where the token was found.
* @param int $stackPtr The position in the stack where
* the token was found.
*
* @return void
*/
public function process(PHP_CodeSniffer_File $phpcsFile, $stackPtr)
{
$tokens = $phpcsFile->getTokens();
$methodName = $phpcsFile->getDeclarationName($stackPtr);

$methodBeginScope = $tokens[$stackPtr]['scope_opener'] + 1;
$methodClosingScope = $tokens[$stackPtr]['scope_closer'];

// Get method body

$methodBody = $phpcsFile->getTokensAsString($methodBeginScope,
$methodClosingScope - $methodBeginScope);

$methodLines = explode("\n", $methodBody);

$funtionalMethodLines = array();

if ($this->_exludeComments) {
$commentPattern = '/^#.*' . // Hash comment
'|^\/\/.*' . // Opening comment
'|^\*.*' . // Continuing multiple line comment
'|^\*\/' . // Closing comment
'|^\/\*\*.*' . // Opening comment/doc
'|^\/\*.*\*\//i'; // Single line comment
}

// Handle each method line

foreach ($methodLines as $methodLine) {

$methodLine = str_replace(array(" ", "\t", "{"), '',
$methodLine);

if ($this->_exludeComments) {

preg_match($commentPattern, $methodLine, $matches);

if (count($matches) > 0) {
continue;
}
}
if ($this->_excludeEmptyLines) {
if (strlen($methodLine) > 1) {
$funtionalMethodLines[] = $methodLine;
}
} else {
$funtionalMethodLines[] = $methodLine;
}
}

$methodLoc = count($funtionalMethodLines);

// Decide if there is a sniff violation and raise error in case of

if ($methodLoc > $this->_maxMethodLoc) {
$message = "Method '{$methodName}' LOC exceeded the defined "
. "maximum of {$this->_maxMethodLoc} lines. Found "
. "{$methodLoc} LOC";
$phpcsFile->addError($message, $stackPtr);
}
}
}
?>
In case you want to twiddle around with these exploratory sniffs you have to put the sniff definitions in the appropriate directory i.e. PEAR/PHP/CodeSniffer/Standards/Example/Sniffs/CodeSize and create a 'coding standard' i.e. PHP_CodeSniffer_Standards_Example_ExampleCodingStandard for making the sniffs runnable and available to the inspection tool.

The closing console dump shows the two previous listed code size sniffs in action and the errors raised, after sniffing obvious refactoring needs in an inspected class.
triton:tmp stolt$ phpcs --standard=Example Test.php

FILE: /Volumes/USB DISK/tmp/Test.php
--------------------------------------------------------------------------------
FOUND 2 ERROR(S) AND 0 WARNING(S) AFFECTING 2 LINE(S)
--------------------------------------------------------------------------------
2 | ERROR | Method count in class 'Test' exceeded the defined maximum of 10.
| | Found 12 methods
5 | ERROR | Method 'longMethod' LOC exceeded the defined maximum of 100 lines.
| | Found 120 LOC
--------------------------------------------------------------------------------

5 comments:

Topbit said...

That's a really good idea to add to phpcs and make wider coding standards. I use codesniffer myself to help tidy up my source (and more often, other peoples), though adding to the existing standards, without having to put files into the PEAR subdirectory hierarchy would be useful - though it does already look like the PHP_CodeSniffer_Standards_ \ CodingStandard::getIncludedSniffs()
function lets you easily enough reuse/extend other standards.

The closed-source website I'm writing doesn't need all the same docblock tags the PEAR requires for example - being able to do away with them would make me happy - and have phpcs happy - which makes me happy.

Anonymous said...

May I advertise? :-)

I wrote a Vim function yesterday to call PHPCS and parse the output into the quickfix window:

get it

Topbit said...

Aha, it took some digging through the code for the right paths and classnames, but it is possible (http://pear.php.net/bugs/bug.php?id=11886) to give a full path to your overriding/extending standards - details http://www.topbit.co.uk/8-A-useful-idea-for-helping-to-enforce-PHP-code-standards.html

Anonymous said...

If you want to get some major tool envy going on, check out Simian. Using the PHP_CS tool, you could probably get similar results, but alas I don't have to the time to try something like that out. :-(

Raphael Stolt said...

Yeah stumbled over it while reading the CI book and sadly it doesn't support PHP. I think, in that whole inspection/analyzing tool area there 's a lot off ground to cover, PHP wise.