C++ Split Interface Refactoring with DMS

Introduction

Software bloat is an unavoidable consequence of long-lived legacy applications. Years or even decades of software enhancements create large and complex systems that are hard to understand and improve. The effects of a bloated application are escalating maintenance and enhancement cycle times and difficulty in bringing new resources up to speed.

To combat software bloat, more and more organizations are turning to automated software refactoring to reduce complexity and reinstate corporate software standards. Automated tools enable software refactorings to be applied at scale at a fraction of the cost and time of making the changes via manual recoding. The Semantic Designs DMS Toolkit enables us to deliver specific software refactorings with just a few man-weeks of effort. Through DMS' automated refactoring capabilities, our customers are able to realize substantial return on their investment within the first year. This whitepaper highlights one such example for API changes.

API Refactoring Overview

An abstract interface defines a set of related virtual methods and types that every object of a class implementing that interface provides.

A common way to implement such interfaces is to create a delegating class that contains a pointer to the implementation class and redirects all its method calls to the implementation class object. This approach results in more stable library ABIs as changes to the implementation class do not affect the abstract interface; clients of the abstract interface only need to be recompiled if the abstract interface changes.
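The delegation approach described above can be sketched as follows. This is a minimal illustration only; the names IGreeter, Greeter, DelegatedGreeter and Welcome are hypothetical and do not appear in the customer code base.

```cpp
#include <cassert>
#include <string>

// Abstract interface seen by clients (hypothetical example).
class IGreeter {
  public:
    virtual ~IGreeter() = default;
    virtual std::string Greet(const std::string &name) const = 0;
};

// Implementation class. It does not inherit from the interface,
// so it can change without touching IGreeter.
class Greeter {
  public:
    std::string Greet(const std::string &name) const {
      return "Hello, " + name;
    }
};

// Delegating class: implements the interface by forwarding every call
// to the implementation object it points to (not owned here).
class DelegatedGreeter : public IGreeter {
  public:
    explicit DelegatedGreeter(const Greeter *impl) : m_impl(impl) {
      assert(impl != nullptr);
    }
    std::string Greet(const std::string &name) const override {
      return m_impl->Greet(name);
    }
  private:
    const Greeter *m_impl;
};

// Client code depends on the abstract interface only.
std::string Welcome(const IGreeter &greeter) {
  return greeter.Greet("world");
}
```

Because Welcome only sees IGreeter, changes to Greeter require recompiling the delegating class but not the clients.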

Despite that, in long-lived software such interfaces sometimes become bloated with too many operations that are only loosely related. These "god" interfaces need to be refactored, i.e. split into smaller interfaces, each containing a set of closely related functionality.

Large systems can benefit from automation of the refactoring task, i.e. from automatically creating and using the new interfaces across the whole code base, based on a provided specification of how to distribute the methods and types of the bloated legacy interface into new lean and mean interfaces.

The DMS Software Re-engineering Toolkit (DMS) with its C++ front-end provides the core facilities necessary for automating this task. In the following we sketch how to implement such a refactoring tool with DMS.

(A customer-specific variation of) this tool has been applied successfully to a large code base consisting of over 10,000 translation units.

Specification

The methods and types from the legacy interface need to be distributed over a set of new interfaces. Thus, we need to designate a set of new interfaces, and for each of the new interfaces, a list of methods from the legacy interface that are supposed to be placed into the respective new interface. Except where overloads of the legacy interface need to be placed into different new interfaces, such methods can be designated by using only their name.

Example: Let us consider a legacy pure abstract interface for managing employees in a company with all functionality currently being provided directly by one large interface:

   class string;
   class Employee;
   class IEmployeeManagement {
     public:
       /* basic employment management */
       typedef unsigned int ID;
       virtual ID Hire(Employee &employee) = 0;
       virtual ID Lookup(const Employee &employee) = 0;
       enum Reason { Resigned, Retired, Fired };
       virtual void Terminate(const ID &employeeID, const Reason &reason) = 0;
       /* personal information management */
       class Address { string street, city; };
       virtual void AddAddress(const ID &employeeID, const Address &address) = 0;
       virtual void RemoveAddress(const ID &employeeID, const Address &address) = 0;
       class Number { string number; };
       virtual void AddHomePhone(const ID &employeeID, const Number &number) = 0;
       virtual void AddCellPhone(const ID &employeeID, const Number &number) = 0;
       virtual void RemovePhone(const ID &employeeID, const Number &number) = 0;
       /* salary management */
       class Amount { double amount; };
       class Percentage { double percentage; };
       virtual void SetPay(const ID &employeeID, const Amount amount) = 0;
       virtual Amount GiveRaise(const ID &employeeID, const Amount amount) = 0;
       virtual Amount GiveRaise(const ID &employeeID, const Percentage percentage) = 0;
       virtual void GiveBonus(const ID &employeeID, const Amount amount) = 0;
   };

This legacy interface should be split up into multiple interfaces:

  • A base interface for employee management, which is also intended to be used as the base interface from which the other (sub-)interfaces can be queried,
  • A (sub-)interface for personal information management, and
  • A (sub-)interface for salary management.

[Figure: Split Interface]

These new interfaces can be specified e.g. as follows (using a pseudo-C++ class declaration syntax):

  • Specification of new base interface for employee management:
      class IEmployeeManagement {
        public:
          typedef auto ID;
          auto Hire, Lookup;
          typedef auto Reason;
          auto Terminate;
       };
  • Specification of new queried interface for personal information management:
      class IPersonalInfoManagement {
        public:
          typedef auto Address, Number;
          auto AddAddress, RemoveAddress, AddHomePhone, AddCellPhone, RemovePhone;
       };
  • Specification of new queried interface for salary management:
      class ISalaryManagement {
        public:
          typedef auto Amount, Percentage;
          auto SetPay, GiveRaise, GiveRaise, GiveBonus;
       };

Note that all variants of the overloaded method GiveRaise are specified to go into the same new interface; as such, no type information is needed to distinguish different occurrences of the method.

Generation of New Interfaces

The specification of the new interfaces does not provide the signatures of the methods nor the definitions of the related types defined in the legacy interface. However, all this information is present in the legacy interface, and can therefore be derived from it.

[Figure: Generate Interfaces]

Thus, in order to create the new interfaces, we use DMS and its C++ front-end to parse the legacy interface, and perform name and type resolution, creating a syntax tree with associated static semantic information stored in a symbol table. From this internal representation we can collect the type definitions and method signatures as declared in the legacy interface.

We then parse the specification for the new interfaces, and check whether the methods and types of the legacy interface are distributed properly across the desired new interfaces.
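The distribution check can be pictured with a small stand-alone sketch. It is not the DMS implementation, which works on the symbol table; the types Specification and DistributionReport and the function CheckDistribution are hypothetical, and members are identified by name only, which assumes that overloads are not split across interfaces.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each new interface name to the member names assigned to it.
using Specification = std::map<std::string, std::vector<std::string>>;

struct DistributionReport {
  std::set<std::string> unassigned;   // legacy members missing from the specification
  std::set<std::string> unknown;      // specified names not declared in the legacy interface
  std::set<std::string> conflicting;  // names assigned to more than one new interface
  bool Ok() const {
    return unassigned.empty() && unknown.empty() && conflicting.empty();
  }
};

DistributionReport CheckDistribution(const std::set<std::string> &legacyMembers,
                                     const Specification &spec) {
  DistributionReport report;
  std::map<std::string, std::string> owner;  // member name -> first interface claiming it
  for (const auto &[interfaceName, members] : spec) {
    assert(!interfaceName.empty());
    for (const auto &member : members) {
      if (legacyMembers.count(member) == 0) {
        report.unknown.insert(member);
        continue;
      }
      auto [it, inserted] = owner.emplace(member, interfaceName);
      // Repeating a name within one interface is tolerated
      // (an overloaded method such as GiveRaise may be listed twice).
      if (!inserted && it->second != interfaceName) {
        report.conflicting.insert(member);
      }
    }
  }
  for (const auto &member : legacyMembers) {
    if (owner.count(member) == 0) report.unassigned.insert(member);
  }
  return report;
}
```

A report with all three sets empty means every legacy member has exactly one new home.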

The new interfaces are created by applying source-to-source transforms to their respective specification, using some auxiliary functions to access information from the legacy interface and the specification.

Transformation activities include:

  • Normalizing the specification such that each method or type uses a separate declaration.
    Using the rule specification language (RSL) for source-to-source transforms in DMS, such rules may be specified as follows:
      rule SimplifyMultipleAutoMethodDeclarators43(
             mm1:member_specification,
             ml1:member_declarator_list,
             id:IDENTIFIER,
             mm2:member_specification
           ):member_specification->member_specification
        = "\mm1\:pp_member_specification
           auto \ml1, \id;
           \mm2"
       -> "\mm1\:pp_member_specification
           auto \ml1;
           auto \id;
           \mm2".
  • Injecting the method signatures, reusing the corresponding signatures from the legacy interface:
      external pattern GetAutoMethodDeclaration(
                         id:IDENTIFIER
                       ):member_declaration
        = 'Refactoring/GetAutoMethodDeclaration'.
      rule ResolveAutoMethodDeclaration(
             id:IDENTIFIER
           ):member_declaration->member_declaration
        = "auto \id;"
       -> GetAutoMethodDeclaration(id).
    where GetAutoMethodDeclaration is an auxiliary function creating the syntax tree for the declaration of method id, using the signature from the legacy interface.
    Note that when (re-)using the signature from the legacy interface in a new interface, some types may require additional nested name specifiers as they may be in a different new interface. In a simple implementation, all member types from the legacy interface can be augmented by such nested name specifiers, and the types must actually be defined either in the same new interface they are referenced in, or in the new base interface. Further simplification can then be used to eliminate the nested name specifiers where they are not needed.
  • Injecting the type definitions, reusing the corresponding type definitions from the legacy interface:
      external pattern GetAutoTypeDeclaration(
                         id:IDENTIFIER
                       ):member_declaration
        = 'Refactoring/GetAutoTypeDeclaration'.
      rule ResolveAutoTypeDeclaration(
             id:IDENTIFIER
           ):member_declaration->member_declaration
        = "typedef auto \IdentifierOrSimpleTemplateId\(\id\);"
       -> GetAutoTypeDeclaration(id).
    where GetAutoTypeDeclaration is an auxiliary function creating the syntax trees for the type definition of type id, using the definition from the legacy interface.
    Similar to method signatures, symbols used in such type definitions may need to be adjusted.
  • Adding declarations to get the new queried interfaces from the new base interface, and to get the new base interface from the new queried interface:
      external condition IsNewBaseInterface(
                           id:IDENTIFIER
                         )
        = 'Refactoring/IsNewBaseInterface'.
      external pattern GetNewQueriedInterfaceDeclarations(
                         src:member_specification
                       ):member_specification
        = 'Refactoring/GetNewQueriedInterfaceDeclarations'.
      rule AdjustNewBaseInterface2a(
             id:IDENTIFIER,
             mm:member_specification
           ):class_specifier->class_specifier
        = "class \id { \mm }"
       -> "class \id {
             public:
               \GetNewQueriedInterfaceDeclarations\(\mm\)\:pp_member_specification
             private:
               \mm
           }"
       if IsNewBaseInterface(id).
    where IsNewBaseInterface is an auxiliary condition testing whether id represents the new base interface, and GetNewQueriedInterfaceDeclarations is an auxiliary function creating the syntax trees for the declarations of methods to query the new queried interfaces in the base interface.
  • Adding the dependencies needed to create a well-formed header file, i.e. (1) the declarations from the legacy interface header referenced in the new interface, (2) the include directives needed for other referenced symbols, and (3) forward declarations for the queried interfaces in the base interface header.
    A simple variant of this is to copy over all the includes and declarations from the legacy interface header file to the new base interface header file, and include the new base interface header file in the header files for the queried interfaces:
      external condition IsNewBaseInterfaceHeader(
                           tu:translation_unit
                         )
        = 'Refactoring/IsNewBaseInterfaceHeader'.
      external pattern GetLegacyInterfaceIncludesAndDeclarations(
                       ):declaration_seq
        = 'Refactoring/GetLegacyInterfaceIncludesAndDeclarations'.
      external pattern GetNewQueriedInterfaceForwardDeclarations(
                       ):declaration_seq
        = 'Refactoring/GetNewQueriedInterfaceForwardDeclarations'.
      rule AddBaseInterfaceDependencies1b(
             dd:declaration_seq,
             id:IDENTIFIER
           ):translation_unit->translation_unit
        = "\dd"
       -> "\GetLegacyInterfaceIncludesAndDeclarations\(\)
           \GetNewQueriedInterfaceForwardDeclarations\(\)\:pp_declaration_seq
           \dd\:pp_declaration_seq"
       if IsNewBaseInterfaceHeader("\dd").
    where IsNewBaseInterfaceHeader is an auxiliary condition testing whether the translation unit "\dd" represents the header file for the new base interface, GetLegacyInterfaceIncludesAndDeclarations is an auxiliary function fetching the includes and declarations (other than the legacy interface) from the legacy interface header, and GetNewQueriedInterfaceForwardDeclarations is an auxiliary function creating the syntax trees for forward declarations of the new queried interfaces.
To achieve the desired effect of generating the header files for the new interfaces, a significant number of additional rules is needed to deal with all possible cases.

The source-to-source rewrites can then be bundled into a rule set and applied to the specification:

  public ruleset CreateNewInterface
    = { SimplifyMultipleAutoMethodDeclarators43,
        AdjustNewBaseInterface2a,
        ResolveAutoMethodDeclaration,
        ResolveAutoTypeDeclaration,
        AddBaseInterfaceDependencies1b,
        ...
      }.

The left-hand-sides and right-hand-sides of the rules above are specified using the native syntax of C++ (with some additional embedded meta-syntax). DMS processes these rules and creates internal representations that can be applied as term rewrites to actual C++ syntax tree terms resulting from parsing C++ source code.
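To give an impression of what applying such a rule as a term rewrite means, the following toy sketch mimics the normalization rule SimplifyMultipleAutoMethodDeclarators43 on a drastically simplified stand-in for a syntax tree. It is not DMS code; the Declaration struct and the functions SplitLastDeclarator, Normalize and Format are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a syntax-tree node: one declaration with a list of declarators.
struct Declaration {
  std::string specifier;                 // e.g. "auto" or "typedef auto"
  std::vector<std::string> declarators;  // e.g. {"Hire", "Lookup"}
};

// One rewrite step: find the first declaration with several declarators and
// split off its last one, mirroring "auto \ml1, \id;" -> "auto \ml1; auto \id;".
// Returns false if the rule matches nowhere.
bool SplitLastDeclarator(std::vector<Declaration> &members) {
  for (std::size_t i = 0; i < members.size(); ++i) {
    if (members[i].declarators.size() > 1) {
      Declaration last{members[i].specifier, {members[i].declarators.back()}};
      members[i].declarators.pop_back();
      members.insert(members.begin() + i + 1, last);
      return true;
    }
  }
  return false;
}

// Apply the rule until it no longer matches (a fixpoint).
void Normalize(std::vector<Declaration> &members) {
  while (SplitLastDeclarator(members)) { }
}

// Render the member list back to text, one declaration per line.
std::string Format(const std::vector<Declaration> &members) {
  std::string out;
  for (const auto &m : members) {
    assert(!m.declarators.empty());
    out += m.specifier;
    for (std::size_t i = 0; i < m.declarators.size(); ++i) {
      out += (i == 0 ? " " : ", ") + m.declarators[i];
    }
    out += ";\n";
  }
  return out;
}
```

As in the rule, each step peels off one declarator, so a declaration with n names is normalized after n-1 rewrite steps and the original order of names is preserved.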

Applying this rule set to the specification results in syntax trees for the following header files:

  • Header file for new base interface:
      class string;
      class Employee;
      class IPersonalInfoManagement;
      class ISalaryManagement;
      class IEmployeeManagement {
        public:
          virtual IPersonalInfoManagement
                  *GetInterface(const IPersonalInfoManagement *) const = 0;
          virtual ISalaryManagement
                  *GetInterface(const ISalaryManagement *) const = 0;
        public:
          /* basic employment management */
          typedef unsigned int ID;
          virtual IEmployeeManagement::ID Hire(Employee &employee) = 0;
          virtual IEmployeeManagement::ID Lookup(const Employee &employee) = 0;
          enum Reason { Resigned, Retired, Fired };
          virtual void Terminate(const IEmployeeManagement::ID &employeeID,
                                 const Reason &reason) = 0;
      };
  • Header file for new queried interface for personal information management:
      #include "IEmployeeManagement.h"
      class IPersonalInfoManagement {
        public:
          virtual IEmployeeManagement
                  *GetInterface(const IEmployeeManagement *) const = 0;
        public:
          /* personal information management */
          class Address { string street, city; };
          class Number { string number; };
          virtual void AddAddress(const IEmployeeManagement::ID &employeeID,
                                  const IPersonalInfoManagement::Address &address) = 0;
          virtual void RemoveAddress(const IEmployeeManagement::ID &employeeID,
                                     const IPersonalInfoManagement::Address &address) = 0;
          virtual void AddHomePhone(const IEmployeeManagement::ID &employeeID,
                                    const IPersonalInfoManagement::Number &number) = 0;
          virtual void AddCellPhone(const IEmployeeManagement::ID &employeeID,
                                    const IPersonalInfoManagement::Number &number) = 0;
          virtual void RemovePhone(const IEmployeeManagement::ID &employeeID,
                                   const IPersonalInfoManagement::Number &number) = 0;
      };
  • Header file for new queried interface for salary management:
      #include "IEmployeeManagement.h"
      class ISalaryManagement {
        public:
          virtual IEmployeeManagement
                  *GetInterface(const IEmployeeManagement *) const = 0;
        public:
          /* salary management */
          class Amount { double amount; };
          class Percentage { double percentage; };
          virtual void SetPay(const IEmployeeManagement::ID &employeeID,
                              const ISalaryManagement::Amount amount) = 0;
          virtual ISalaryManagement::Amount
                  GiveRaise(const IEmployeeManagement::ID &employeeID,
                            const ISalaryManagement::Amount amount) = 0;
          virtual ISalaryManagement::Amount
                  GiveRaise(const IEmployeeManagement::ID &employeeID,
                            const ISalaryManagement::Percentage percentage) = 0;
          virtual void GiveBonus(const IEmployeeManagement::ID &employeeID,
                                 const ISalaryManagement::Amount amount) = 0;
      };

The actual header files are created from the internal syntax tree representation by using the formatter for C++ that is part of DMS' C++ support.
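The generated GetInterface methods are overloaded on the type of a pointer parameter that serves only as a tag. The following reduced sketch shows how a client might select an overload. The helper template Query, the stub classes and the methods AddressCount and Pay are hypothetical and only serve to keep the example small.

```cpp
#include <cassert>

class IPersonalInfoManagement;
class ISalaryManagement;

// Reduced base interface: only the query methods are shown.
class IEmployeeManagement {
  public:
    virtual ~IEmployeeManagement() = default;
    virtual IPersonalInfoManagement
            *GetInterface(const IPersonalInfoManagement *) const = 0;
    virtual ISalaryManagement
            *GetInterface(const ISalaryManagement *) const = 0;
};

class IPersonalInfoManagement {
  public:
    virtual ~IPersonalInfoManagement() = default;
    virtual int AddressCount() const = 0;  // hypothetical member
};

class ISalaryManagement {
  public:
    virtual ~ISalaryManagement() = default;
    virtual double Pay() const = 0;        // hypothetical member
};

// Hypothetical convenience wrapper: a typed null pointer selects the overload.
// Passing a bare nullptr would be ambiguous between the two overloads.
template<typename Interface>
Interface *Query(const IEmployeeManagement &base) {
  return base.GetInterface(static_cast<const Interface *>(nullptr));
}

// Stubs standing in for a real implementation.
class StubPersonalInfo : public IPersonalInfoManagement {
  public:
    int AddressCount() const override { return 2; }
};

class StubSalary : public ISalaryManagement {
  public:
    double Pay() const override { return 1000.0; }
};

class StubManagement : public IEmployeeManagement {
  public:
    StubManagement(IPersonalInfoManagement *personal, ISalaryManagement *salary)
      : m_personal(personal), m_salary(salary) {
      assert(personal != nullptr && salary != nullptr);
    }
    IPersonalInfoManagement
    *GetInterface(const IPersonalInfoManagement *) const override {
      return m_personal;
    }
    ISalaryManagement
    *GetInterface(const ISalaryManagement *) const override {
      return m_salary;
    }
  private:
    IPersonalInfoManagement *m_personal;
    ISalaryManagement *m_salary;
};
```

A client holding the base interface would then write, for instance, Query<ISalaryManagement>(mgmt)->Pay() to reach the salary functionality.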

Generation of Delegates

The new interfaces can be implemented using a pointer-to-implementation (pImpl) idiom, with the pointed-to object providing the actual implementation. To allow for easy modification of the legacy implementation classes, the delegating class can be described using a template.

[Figure: Generate Delegates]

This template can be derived from the new interface headers by using source-to-source rewrites that:

  • Change the interface class to a template class inheriting from the interface class, and add the delegate constructor and member:
      rule CreateTemplateDeclaration2a(
             id:IDENTIFIER,
             mm:member_specification
           ):class_specifier->template_declaration
        = "class \id { \mm }"
       -> "template<typename T = \id>
           class \ConcatenateIDENTIFIERs\(Delegated\,\id\): public \id {
             public:
               \ConcatenateIDENTIFIERs\(Delegated\,\id\)(T *delegate)
                 : m_delegate(delegate) { }
             private:
               virtual T *\GetNewDelegateMethodName\(\)() const {
                 return m_delegate;
               }
             public:
               \mm\:pp_member_specification
             private:
               T *m_delegate;
           };".
  • Rewrite method declarations to delegating method definitions:
      private pattern NameParametersInDeclarator(
                        d:declarator,
                        pdc:parameter_declaration_clause,
                        i:INT_LITERAL
                      ):declarator
        = tag.
      private pattern CreateArgumentList(
                        pdc:parameter_declaration_clause,
                        i:INT_LITERAL
                      ):initializer_list = tag.
      ...
      private rule CreateOrdinaryDelegate2(
                     dss:basic_decl_specifier_seq,
                     d:declarator
                   ):member_declaration->member_declaration
        = "virtual \dss\:decl_specifier \d = 0;"
       -> "\dss \NameParametersInDeclarator\(
                   \d\,
                   \GetParametersOfDeclarator\(\d\)\,
                   \CountParameterList\(\GetParametersOfDeclarator\(\d\)\,0\)
                 \) override {
             return \GetNewDelegateMethodName\(\)()
                      ->\GetIdentifierOfDeclarator\(\d\)(
                          \CreateArgumentList\(
                            \GetParametersOfDeclarator\(\d\)\,
                            \CountParameterList\(\GetParametersOfDeclarator\(\d\)\,0\)
                          \)
                        );
           }".
    where, among others, NameParametersInDeclarator creates an auxiliary node indicating that all anonymous parameters in the parameter declaration clause must be named, and CreateArgumentList creates an auxiliary node indicating that an argument list is to be extracted, naming the parameters in the parameter declaration clause. Additional rules are used to implement the intent indicated by these auxiliary nodes.
  • Eliminate type definitions:
      condition BasicDeclSpecifierSeqContainsTypedef(dss:basic_decl_specifier_seq)
        =     dss == "\:basic_decl_specifier_seq typedef"
           \/ [dss2:basic_decl_specifier_seq.
                 dss <: "\:basic_decl_specifier_seq typedef \dss2\:decl_specifier"]
           \/ [dss1:basic_decl_specifier_seq.
                 dss <: "\:basic_decl_specifier_seq \dss1 typedef"]
           \/ [dss1:basic_decl_specifier_seq,dss2:basic_decl_specifier_seq.
                 dss <: "\:basic_decl_specifier_seq \dss1 typedef \dss2\:decl_specifier"].
      condition DeclSpecifierSeqContainsTypedef(dss:decl_specifier_seq)
        =    [bdss:basic_decl_specifier_seq.
                dss <: "\:decl_specifier_seq \bdss"
              | BasicDeclSpecifierSeqContainsTypedef(bdss)]
          \/ [bdss:basic_decl_specifier_seq,a:attribute_specifier_seq.
                dss <: "\:decl_specifier_seq \bdss \a"
              | BasicDeclSpecifierSeqContainsTypedef(bdss)].
      rule EliminateType4d(
             mm1:member_specification,
             dss:decl_specifier_seq,
             mdl:member_declarator_list,
             mm2:member_specification
          ):member_specification->member_specification
        = "\mm1\:pp_member_specification
           \dss \mdl;
           \mm2"
       -> "\mm1\:pp_member_specification
           \mm2"
       if DeclSpecifierSeqContainsTypedef(dss).
      rule EliminateType4t(
             mm1:member_specification,
             ts:type_specifier,
             mm2:member_specification
          ):member_specification->member_specification
        = "\mm1\:pp_member_specification
           \ts;
           \mm2"
       -> "\mm1\:pp_member_specification
           \mm2".

These rules can be bundled into a ruleset and applied to the new interfaces created in the previous step.

For the new base interface this results in the following delegate template class:

      template<typename T = IEmployeeManagement>
      class DelegatedIEmployeeManagement : public IEmployeeManagement {
        public:
          DelegatedIEmployeeManagement(T *delegate): m_delegate(delegate) {
          }
        private:
          virtual T *GetEmployeeManagementDelegate() const {
            return m_delegate;
          }
        public:
          IPersonalInfoManagement
          *GetInterface(const IPersonalInfoManagement *type) const override {
            return GetEmployeeManagementDelegate()->GetInterface(type);
          }
          ISalaryManagement
          *GetInterface(const ISalaryManagement *type) const override {
            return GetEmployeeManagementDelegate()->GetInterface(type);
          }
        public:
          IEmployeeManagement::ID Hire(Employee &employee) override {
            return GetEmployeeManagementDelegate()->Hire(employee);
          }
          IEmployeeManagement::ID Lookup(const Employee &employee) override {
            return GetEmployeeManagementDelegate()->Lookup(employee);
          }
          void Terminate(const IEmployeeManagement::ID &employeeID,
                         const IEmployeeManagement::Reason &reason) override {
            GetEmployeeManagementDelegate()->Terminate(employeeID, reason);
          }
        private:
          T *const m_delegate;
      };

For the new queried interfaces, the resulting delegate template classes look similar.
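The reason for making the delegate a template can be seen in a reduced sketch. The template parameter only has to provide methods with matching names and signatures, so the implementation class need not derive from the interface, while the default argument allows a delegate to wrap any other object of the interface type. ICounter, DelegatedICounter and Counter are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical one-method interface.
class ICounter {
  public:
    virtual ~ICounter() = default;
    virtual int Add(int amount) = 0;
};

// Delegate template in the style of the generated delegates above.
template<typename T = ICounter>
class DelegatedICounter : public ICounter {
  public:
    explicit DelegatedICounter(T *delegate) : m_delegate(delegate) {
      assert(delegate != nullptr);
    }
    int Add(int amount) override { return m_delegate->Add(amount); }
  private:
    T *const m_delegate;
};

// Implementation class: unrelated to ICounter by inheritance, it owns the
// delegate through which clients reach it.
class Counter {
  public:
    Counter()
      : m_ICounter(std::make_unique<DelegatedICounter<Counter>>(this)) { }
    // The delegate stores 'this', so copying would alias the original object.
    Counter(const Counter &) = delete;
    Counter &operator=(const Counter &) = delete;
    ICounter *GetInterface(const ICounter *) const { return m_ICounter.get(); }
    int Add(int amount) { m_total += amount; return m_total; }
  private:
    int m_total = 0;
    std::unique_ptr<DelegatedICounter<Counter>> m_ICounter;
};
```

With the default template argument, DelegatedICounter<> can also be layered on top of an existing ICounter pointer, e.g. to interpose logging, without knowing the implementation class.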

Injection of New Interfaces

If the legacy interface has been realized by a delegate similar to the delegate templates for the new base and queried interfaces, there is an actual implementation class realizing the functionality, which may look like the following:

  /* implementation class header "EmployeeManagement.h" */
  #include <IEmployeeManagement.h>
  #include <DelegatedIEmployeeManagement.h>
  class EmployeeManagement {
    public:
      EmployeeManagement();
      /* get interface */
      IEmployeeManagement *GetInterface(const IEmployeeManagement *) const;
      /* basic employment management */
      IEmployeeManagement::ID Hire(Employee &employee);
      IEmployeeManagement::ID Lookup(const Employee &employee);
      void Terminate(const IEmployeeManagement::ID &employeeID, 
                     const IEmployeeManagement::Reason &reason);
      /* personal information management */
      void AddAddress(const IEmployeeManagement::ID &employeeID, 
                      const IEmployeeManagement::Address &address);
      void RemoveAddress(const IEmployeeManagement::ID &employeeID, 
                         const IEmployeeManagement::Address &address);
      void AddHomePhone(const IEmployeeManagement::ID &employeeID, 
                        const IEmployeeManagement::Number &number);
      void AddCellPhone(const IEmployeeManagement::ID &employeeID, 
                        const IEmployeeManagement::Number &number);
      void RemovePhone(const IEmployeeManagement::ID &employeeID, 
                       const IEmployeeManagement::Number &number);
      /* salary management */
      void SetPay(const IEmployeeManagement::ID &employeeID, 
                  const IEmployeeManagement::Amount amount);
      IEmployeeManagement::Amount 
      GiveRaise(const IEmployeeManagement::ID &employeeID, 
                const IEmployeeManagement::Amount amount);
      IEmployeeManagement::Amount 
      GiveRaise(const IEmployeeManagement::ID &employeeID, 
                const IEmployeeManagement::Percentage percentage);
      void GiveBonus(const IEmployeeManagement::ID &employeeID,
                     const IEmployeeManagement::Amount amount);
    private:
      std::unique_ptr<DelegatedIEmployeeManagement<EmployeeManagement>> 
      m_IEmployeeManagement;
  };

  /* implementation class translation unit "EmployeeManagement.cpp" */
  #include <EmployeeManagement.h>
  EmployeeManagement::EmployeeManagement()
    : m_IEmployeeManagement(
        std::make_unique<DelegatedIEmployeeManagement<EmployeeManagement>>(this)
      ) {
  }
  ...
  IEmployeeManagement 
  *EmployeeManagement::GetInterface(const IEmployeeManagement *) const {
    return m_IEmployeeManagement.get();
  }

This implementation class needs to be modified, replacing the legacy interface by the new base and queried interfaces generated above, such that the modified implementation class provides access to the new interfaces as needed.

[Figure: Inject Interfaces]

The transformations to be applied include:

  • Replacement of the access paths for types defined in the legacy interface based on their new location in the new interfaces:
      external condition IsMemberTypeOfLegacyInterface(
                           t:trailing_type_specifier
                         )
        = 'Refactoring/IsMemberTypeOfLegacyInterface'.
      external pattern GetMemberTypeOfNewInterfaceForMemberTypeOfLegacyInterface(
                         t:trailing_type_specifier
                       ):trailing_type_specifier
        = 'Refactoring/GetMemberTypeOfNewInterfaceForMemberTypeOfLegacyInterface'.
      rule AdjustBaseInterfaceMemberType1(
             t:simple_type_specifier
           ):trailing_type_specifier->trailing_type_specifier
        = "\t" 
       -> GetMemberTypeOfNewInterfaceForMemberTypeOfLegacyInterface("\t") 
       if IsMemberTypeOfLegacyInterface("\t").
    where IsMemberTypeOfLegacyInterface is an auxiliary condition testing whether a type specifier represents a member type from the legacy interface, and GetMemberTypeOfNewInterfaceForMemberTypeOfLegacyInterface is an auxiliary function creating a syntax tree for the corresponding member type from the proper new base or queried interface.
  • Addition of GetInterface method declarations and interface providing data members for the new queried interfaces:
      rule AddInterface1(
             ch:class_head,
             ms:member_specification
           ):class_specifier->class_specifier
        = "\ch { \ms }"
       -> "\ch {
             \ms
             public:
               \GetNewQueriedGetInterfaceMethodDeclarations\(\)
             private:
               \GetQueriedGetInterfaceFields\(\)
           }"
        if IsImplementationClass("\ch { \ms }").
    where GetNewQueriedGetInterfaceMethodDeclarations is an auxiliary function creating the syntax trees for the GetInterface methods for the new queried interfaces, and GetQueriedGetInterfaceFields is an auxiliary function creating the syntax trees for the interface providing data members for the new queried interfaces.
  • Initialization of the interface providing data members for the new queried interfaces. This requires extending the member initializer list of user-defined constructors:
      external condition IsImplementationClassConstructor(
                           fd:function_definition
                         )
        = 'Refactoring/IsImplementationClassConstructor'.
      external pattern GetNewBaseAndQueriedGetInterfaceFields(
                       ):member_specification
        = 'Refactoring/GetNewBaseAndQueriedGetInterfaceFields'.
      rule InitializeInterface1a(
             id:id_expression,
             pdc:parameter_declaration_clause,
             il:mem_initializer_list,
             fb:function_body
           ):function_definition->function_definition
        = "\id(\pdc) :\il \fb" 
       -> "\id(\pdc) :\il, \GetNewQueriedGetInterfaceFieldInitializers\(\) \fb" 
       if IsImplementationClassConstructor("\id(\pdc) :\il \fb").
    where IsImplementationClassConstructor is an auxiliary condition testing whether the function definition is a user-defined constructor for the implementation class, and GetNewQueriedGetInterfaceFieldInitializers is an auxiliary function creating the syntax trees for the member initializers of the interface providing data members for the new queried interfaces.
  • Addition of include directives for the new interfaces and delegates, and addition of GetInterface method definitions for the new queried interfaces:
      rule AddImplementationDetails1a(
             dd:declaration_seq
           ):translation_unit->translation_unit
        = "\dd" 
       -> "\GetNewQueriedInterfaceHeaders\(\)\:pp_declaration_seq 
           \GetNewQueriedDelegatedHeaders\(\)\:pp_declaration_seq 
           \dd\:pp_declaration_seq 
           \GetNewQueriedGetInterfaceMethodDefinitions\(\)"
       if IsInterfaceImplementationTranslationUnit("\dd").
    where IsInterfaceImplementationTranslationUnit is an auxiliary condition testing whether the translation unit is the translation unit for the implementation class, GetNewQueriedInterfaceHeaders creates the syntax trees for the include directives for the header files of the new queried interfaces, GetNewQueriedDelegatedHeaders creates the syntax trees for the include directives for the header files of the delegate templates for the new queried interfaces, and GetNewQueriedGetInterfaceMethodDefinitions creates the syntax trees for the definitions of the GetInterface methods for the new queried interfaces in the implementation class.

Note: As the new base interface has the same name as the legacy interface, we do not have to rename references to the legacy interface to references to the new base interface.
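The replacement of access paths for member types can be illustrated with a string-based sketch. DMS performs this step on syntax trees using symbol table information rather than on text; the table TypeOwners and the function RequalifyMemberType are hypothetical and only show the mapping that the auxiliary functions have to compute.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical lookup table derived from the specification:
// member type name -> new interface that now declares it.
using TypeOwners = std::map<std::string, std::string>;

// Rewrites "Legacy::Type" to "Owner::Type". Names that are not member types
// of the legacy interface, or that have no known owner, are returned unchanged.
std::string RequalifyMemberType(const std::string &typeName,
                                const std::string &legacyInterface,
                                const TypeOwners &owners) {
  assert(!legacyInterface.empty());
  const std::string prefix = legacyInterface + "::";
  if (typeName.compare(0, prefix.size(), prefix) != 0) {
    return typeName;  // not qualified by the legacy interface
  }
  const std::string member = typeName.substr(prefix.size());
  const auto it = owners.find(member);
  if (it == owners.end()) {
    return typeName;  // unknown member: leave untouched
  }
  return it->second + "::" + member;
}
```

For the running example, IEmployeeManagement::Address would map to IPersonalInfoManagement::Address, while IEmployeeManagement::ID stays as it is because ID remains in the base interface.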

Applying these transformations to the implementation class results in the following code:

  /* implementation class header "EmployeeManagement.h" */
  #include <IEmployeeManagement.h>
  #include <IPersonalInfoManagement.h>
  #include <ISalaryManagement.h>
  #include <DelegatedIEmployeeManagement.h>
  #include <DelegatedIPersonalInfoManagement.h>
  #include <DelegatedISalaryManagement.h>
  class EmployeeManagement {
    public:
      EmployeeManagement();
      /* get interface */
      IEmployeeManagement *GetInterface(const IEmployeeManagement *) const;
      /* basic employment management */
      IEmployeeManagement::ID Hire(Employee &employee);
      IEmployeeManagement::ID Lookup(const Employee &employee);
      void Terminate(const IEmployeeManagement::ID &employeeID, 
                     const IEmployeeManagement::Reason &reason);
      /* personal information management */
      void AddAddress(const IEmployeeManagement::ID &employeeID, 
                      const PersonalInfoManagement::Address&address);
      void RemoveAddress(const IEmployeeManagement::ID &employeeID, 
                         const PersonalInfoManagement::Address&address);
      void AddHomePhone(const IEmployeeManagement::ID &employeeID, 
                        const PersonalInfoManagement::Number&number);
      void AddCellPhone(const IEmployeeManagement::ID &employeeID, 
                        const PersonalInfoManagement::Number&number);
      void RemovePhone(const IEmployeeManagement::ID &employeeID, 
                       const PersonalInfoManagement::Number&number);
      /* salary management */
      void SetPay(const IEmployeeManagement::ID &employeeID, 
                  const ISalaryManagement::Amount amount);
      ISalaryManagement::Amount
      GiveRaise(const IEmployeeManagement::ID &employeeID, 
                const ISalaryManagement::Amount amount);
      ISalaryManagement::Amount
      GiveRaise(const IEmployeeManagement::ID &employeeID, 
                const ISalaryManagement::Percentage percentage);
      void GiveBonus(const IEmployeeManagement::ID &employeeID, 
                     const ISalaryManagement::Amount amount);
    private:
      std::unique_ptr<DelegatedIEmployeeManagement<EmployeeManagement>> 
      m_IEmployeeManagement;
    public:
      PersonalInfoManagement *GetInterface(const PersonalInfoManagement *type) const;
      ISalaryManagement *GetInterface(const ISalaryManagement *type) const;
    private:
      std::unique_ptr<DelegatedPersonalInfoManagement<EmployeeManagement>> 
      m_IPersonalInformationManagement;
      std::unique_ptr<DelegatedISalaryManagement<EmployeeManagement>> 
      m_ISalaryManagement;
  };

  /* implementation class translation unit EmployeeManagement.cpp */
  #include <EmployeeManagement.h>
  EmployeeManagement::EmployeeManagement()
    : m_IEmployeeManagement(
        std::make_unique<DelegatedIEmployeeManagement<EmployeeManagement>>(*this)
      )
    , m_IPersonalInformationManagement(
        std::make_unique<DelegatedPersonalInfoManagement<EmployeeManagement>>(*this)
      )
    , m_ISalaryManagement(
        std::make_unique<DelegatedISalaryManagement<EmployeeManagement>>(*this)
      ) {
  }
  ...
  IEmployeeManagement 
  *EmployeeManagement::GetInterface(const IEmployeeManagement *) const {
    return m_IEmployeeManagement.get();
  }
  PersonalInfoManagement 
  *EmployeeManagement::GetInterface(const PersonalInfoManagement *type) const {
    return m_IPersonalInformationManagement.get();
  }
  ISalaryManagement 
  *EmployeeManagement::GetInterface(const ISalaryManagement *type) const {
    return m_ISalaryManagement.get();
  }

Replacement of Legacy Interface Calls

When the legacy interface is split up into multiple new interfaces:

  • The access paths of the types defined in the legacy interface need to be changed to the corresponding access paths of the respective type in the new base and queried interfaces.
  • Any method call in the code base using the legacy interface may need to be changed by looking up the new queried interface, if necessary, and calling the designated method in the new interface. To modify these calls, we assume that a generic GetInterface template for looking up a requested interface is defined as follows:
      template<typename T, typename S>
      T GetInterface(const S *p) {
        return p->GetInterface((const T)nullptr);
      }
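
To make the dispatch concrete, the following minimal sketch shows how the template selects a GetInterface overload purely by the static type of a null pointer argument. The stripped-down interfaces, the Impl class, and the RaiseBy helper are hypothetical stand-ins introduced for this illustration; they are not the interfaces or delegates generated by the tool.

```cpp
#include <cassert>

// Hypothetical, stripped-down queried interface.
struct ISalaryManagement {
  virtual ~ISalaryManagement() = default;
  virtual int GiveRaise(int amount) = 0;
};

// Hypothetical, stripped-down base interface offering the lookups.
struct IEmployeeManagement {
  virtual ~IEmployeeManagement() = default;
  virtual IEmployeeManagement *GetInterface(const IEmployeeManagement *) const = 0;
  virtual ISalaryManagement *GetInterface(const ISalaryManagement *) const = 0;
};

// The generic lookup template from the text: the null pointer argument
// only carries the type used to pick the overload.
template<typename T, typename S>
T GetInterface(const S *p) {
  return p->GetInterface((const T)nullptr);
}

// Toy class implementing both interfaces directly (no delegates).
class Impl : public IEmployeeManagement, public ISalaryManagement {
  public:
    IEmployeeManagement *GetInterface(const IEmployeeManagement *) const override {
      return const_cast<Impl *>(this);
    }
    ISalaryManagement *GetInterface(const ISalaryManagement *) const override {
      return const_cast<Impl *>(this);
    }
    int GiveRaise(int amount) override { return m_pay += amount; }
  private:
    int m_pay = 100;
};

// Client code that only holds a pointer to the base interface.
int RaiseBy(IEmployeeManagement *em, int amount) {
  ISalaryManagement *sm = GetInterface<ISalaryManagement *>(em);
  assert(sm != nullptr);
  return sm->GiveRaise(amount);
}
```

The template argument is the pointer type (ISalaryManagement *), which matches the way the refactored calls in this paper are written.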

Example: A function to raise an employee's salary in a department based on individual performance and current inflation may have been defined as follows:

  void RaiseSalary(
         const Department *department, 
         const Employee &employee, 
         const IEmployeeManagement::Amount amount, 
         const IEmployeeManagement::Percentage inflation
       ) {
    IEmployeeManagement *em;
    if (department->HasSeparateEmployeeManagement())
      em = department->GetEmployeeManagement();
    else
      em = GetCoroporateEmployeeManagement();
    const IEmployeeManagement::ID employeeID = em->Lookup(employee);
    // individual performance-based adjustment
    em->GiveRaise(employeeID,amount); 
    // general inflation adjustment
    if (department->AutomaticRaises())
      em->GiveRaise(employeeID,inflation); 
  }

Due to the IEmployeeManagement legacy interface being split, any reference to a type definition in IEmployeeManagement and any method call via the pointer em to methods in IEmployeeManagement may need to be adjusted. In the following we will focus on adjusting the method calls.

Replace Calls

The most straightforward way to modify any method call is to wrap the interface designating expression in the method designating expression with the necessary GetInterface template call:

  external condition IsMethodOfLegacyInterfaceMovedToNewQueriedInterface(
                       m:id_expression
                     )
    = 'Refactoring/IsMethodOfLegacyInterfaceMovedToNewQueriedInterface'.
  external pattern CreateQueriedInterfaceWrapperForMethodDesignator(
                     pe:postfix_expression,
                     m:id_expression
                   ):postfix_expression
    = 'Refactoring/CreateQueriedInterfaceWrapperForMethodDesignator'.
  private rule AdjustMethodDesignator3o(
                 pe:postfix_expression,
                 m:id_expression
               ):postfix_expression->postfix_expression
    = "\pe->\m" 
   -> "\CreateQueriedInterfaceWrapperForMethodDesignator\(\pe\,\m\)->\m"
   if IsMethodOfLegacyInterfaceMovedToNewQueriedInterface(m).
where IsMethodOfLegacyInterfaceMovedToNewQueriedInterface is an auxiliary condition testing whether the designated method is a method moved from the legacy interface to a new queried interface, and CreateQueriedInterfaceWrapperForMethodDesignator is an auxiliary function creating the syntax tree for the interface designating expression with the GetInterface template call to get the proper interface.

As the infix operator -> may be user-defined, the type of the object designated by the postfix expression in \pe->\m may not be the legacy interface, but instead may be a proxy object that, after a chain of operator-> calls, provides the legacy interface. As such, the auxiliary function CreateQueriedInterfaceWrapperForMethodDesignator may need to add such operator calls before adding the GetInterface template call. The implementation of the function uses facilities from DMS' C++ front-end and the following additional source patterns to construct the proper syntax tree:

  public pattern CallMemberAccessOperator(
                   pe:postfix_expression
                 ):postfix_expression
    = "\pe.operator->()".
  public pattern GetPointerToInterface(
                   ts:trailing_type_specifier,
                   pe:postfix_expression
                 ):postfix_expression
    = "\:[postfix_expression = postfix_expression '(' expression_list ')'] 
       GetInterface<\ts *>(\pe)".
Applying this rule and other type-adjusting rules results in the following refactored code for the RaiseSalary function:

  void RaiseSalary(
         const Department *department, 
         const Employee &employee, 
         const ISalaryManagement::Amount amount, 
         const ISalaryManagement::Percentage inflation
       ) {
    IEmployeeManagement *em;
    if (department->HasSeparateEmployeeManagement())
      em = department->GetEmployeeManagement();
    else
      em = GetCoroporateEmployeeManagement();
    const IEmployeeManagement::ID employeeID = em->Lookup(employee);
    // individual performance-based adjustment
    GetInterface<ISalaryManagement *>(em)->GiveRaise(employeeID,amount); 
    // general inflation adjustment
    if (department->AutomaticRaises())
      GetInterface<ISalaryManagement *>(em)->GiveRaise(employeeID,inflation); 
  }

In the refactored code above, there are multiple calls to the GetInterface template for the same queried interface ISalaryManagement and the same pointer to the base interface em, which are caused by our transformation of method calls to locally adjust the interface designating expression.

A more sophisticated transform could recognize this situation and instead create a local variable for the queried interface, update that variable whenever the corresponding variable for the base interface object is assigned a new value for which later a method for the new queried interface is to be called, and use the local variable for the queried interface instead of looking up the queried interface on-the-fly.

To determine the relevant definitions feeding a use of the legacy interface in a method call, we generally need to trace data flow backwards from the use to all definitions that may provide its value. DMS' C++ front-end supports constructing fine-grained control flow graphs (with control flow nodes linking back to syntax tree fragments) and performing flow analyses over them; it also provides an implementation for computing use-definition chains over these graphs.
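
To illustrate the idea only (this is not DMS' implementation or API), the following sketch computes reaching definitions for a single variable over a hand-built control flow graph by iterating to a fixed point. The Node structure, the ReachingDefinitions function, and the node numbering are assumptions made for this example; the graph mirrors the original RaiseSalary function, tracking the variable em.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical control flow node for one tracked variable (here: em).
struct Node {
  bool defines;                    // node assigns a new value to the variable
  std::vector<std::size_t> preds;  // indices of predecessor nodes
};

// For every node, compute the set of defining nodes whose value may reach
// the entry of that node (iterative reaching-definitions analysis).
std::vector<std::set<std::size_t>>
ReachingDefinitions(const std::vector<Node> &cfg) {
  std::vector<std::set<std::size_t>> in(cfg.size()), out(cfg.size());
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t n = 0; n < cfg.size(); ++n) {
      // Definitions reaching the entry: union over all predecessors.
      std::set<std::size_t> newIn;
      for (std::size_t p : cfg[n].preds) {
        assert(p < cfg.size());
        newIn.insert(out[p].begin(), out[p].end());
      }
      // A definition kills all other definitions of the same variable.
      std::set<std::size_t> newOut =
        cfg[n].defines ? std::set<std::size_t>{n} : newIn;
      if (newIn != in[n] || newOut != out[n]) {
        in[n] = newIn;
        out[n] = newOut;
        changed = true;
      }
    }
  }
  return in;
}

// Control flow graph of the RaiseSalary example, tracking em:
//   0: IEmployeeManagement *em;                   (declaration, no value)
//   1: if (department->HasSeparateEmployeeManagement())
//   2: em = department->GetEmployeeManagement();  (definition)
//   3: em = GetCoroporateEmployeeManagement();    (definition)
//   4: em->Lookup(employee)                       (use)
//   5: em->GiveRaise(employeeID,amount)           (use)
//   6: if (department->AutomaticRaises())
//   7: em->GiveRaise(employeeID,inflation)        (use)
std::vector<Node> RaiseSalaryGraph() {
  return {
    {false, {}},     {false, {0}}, {true, {1}},  {true, {1}},
    {false, {2, 3}}, {false, {4}}, {false, {5}}, {false, {6}}
  };
}
```

For the uses of em in nodes 5 and 7, the analysis yields the definitions in nodes 2 and 3, which are the two assignments after which a variable for the queried interface would have to be updated.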

Using the control flow graph and the use-definition chains we can determine for each method call via a legacy interface designating variable where the variable has been updated; and using the symbol table we can determine for each of these variables where it is declared. With this information, we can:

  • Add a variable declaration for the new queried interface directly after the variable declaration for the legacy interface:
      rule AddQueriedInterfaceDeclarations4(
             ss1:statement_seq,
             dss:decl_specifier_seq,
             d:init_declarator,ss2:statement_seq
           ):statement_seq->statement_seq
        = "\ss1 
           \dss \d; 
           \ss2\:pp_statement_seq" 
       -> "\ss1 
           \dss \d; 
           \CreateQueriedInterfaceDeclarationsRequiredByDeclaration\(
             \GetIdentifierOfDeclarator\(\d\)
           \)\:pp_statement_seq 
           \ss2\:pp_statement_seq" 
       if DeclarationRequiresQueriedInterfaceDeclarations(GetIdentifierOfDeclarator(d)).
    Note that the auxiliary function CreateQueriedInterfaceDeclarationsRequiredByDeclaration may create syntax trees for multiple declarations to allow for multiple queried interfaces being used.
  • Update the variable for the new queried interface directly after the variable for the legacy interface has been updated:
      rule AddQueriedInterfaceAssignments4(
             ss1:statement_seq,
             id:IDENTIFIER,
             ic:initializer_clause,
             ss2:statement_seq
           ):statement_seq->statement_seq
        = "\ss1 
           \id = \ic; 
           \ss2\:pp_statement_seq" 
       -> "\ss1 
           \id = \ic; 
           \CreateQueriedInterfaceAssignmentStatementsRequiredByAssignment\(
             \id
           \)\:pp_statement_seq
           \ss2\:pp_statement_seq" 
       if AssignmentRequiresQueriedInterfaceAssignments(id).
    Note that the auxiliary function CreateQueriedInterfaceAssignmentStatementsRequiredByAssignment may create syntax trees for multiple assignments to allow for multiple queried interfaces being used.
  • Use the variable for the new queried interface instead of looking up the queried interface on-the-fly:
      rule AdjustMethodDesignator3v(
             id:IDENTIFIER,
             m:id_expression
           ):postfix_expression->postfix_expression
        = "\id->\m" 
       -> "\CreateQueriedInterfaceObjectForMethodDesignator\(\id\)->\m" 
       if MethodDesignatorRequiresQueriedInterfaceObject(id).

Assignments need to be added even if they occur as subsidiary statements of compound statements like if, for, while, etc. This can be done by applying normalization/simplification rules where we want to add an assignment for a queried interface after an existing assignment. To minimize code changes, such rules should be applied only where necessary:

  condition IsCriticalStatement(
              s:statement
            )
    =    [id:IDENTIFIER,ic:initializer_clause. 
            GetNestedStatementOfLabeledStatement(s) <: "\:statement \id = \ic;" 
          | AssignmentRequiresQueriedInterfaceAssignments(id)]
      \/ [a:attribute_specifier_seq,id:IDENTIFIER,ic:initializer_clause. 
            GetNestedStatementOfLabeledStatement(s) <: "\:statement \a \id = \ic;" 
          | AssignmentRequiresQueriedInterfaceAssignments(id)].
  rule SimplifyIfThenElseStatement1t(
         c:condition,s1:statement,
         s2:statement
       ):selection_statement->selection_statement
    = "if (\c) \s1 else \s2" 
   -> "if (\c) { \s1 } else \s2" 
   if IsCriticalStatement(s1).
  rule SimplifyIfThenElseStatement1e(
         c:condition,s1:statement,
         s2:statement
       ):selection_statement->selection_statement
    = "if (\c) \s1 else \s2" 
   -> "if (\c) \s1 else { \s2 }" 
   if IsCriticalStatement(s2).

Similar rules are needed to deal with various occurrences of declarations for the legacy interface in if, for, while, etc., for which declarations for queried interfaces are needed.

Applying these rules and other type-adjusting rules results in the following refactored code for the RaiseSalary function:

  void RaiseSalary(
         const Department *department, 
         const Employee &employee, 
         const ISalaryManagement::Amount amount, 
         const ISalaryManagement::Percentage inflation
       ) {
    IEmployeeManagement *em;
    ISalaryManagement *em_ISalaryManagement;
    if (department->HasSeparateEmployeeManagement()) {
      em = department->GetEmployeeManagement();
      em_ISalaryManagement = GetInterface<ISalaryManagement *>(em);
    } else {
      em = GetCoroporateEmployeeManagement();
      em_ISalaryManagement = GetInterface<ISalaryManagement *>(em);
    }
    const IEmployeeManagement::ID employeeID = em->Lookup(employee);
    // individual performance-based adjustment
    em_ISalaryManagement->GiveRaise(employeeID,amount); 
    // general inflation adjustment
    if (department->AutomaticRaises())
      em_ISalaryManagement->GiveRaise(employeeID,inflation); 
  }

Because the variables for the new queried interfaces are updated immediately after the corresponding variables for the base interface, their placement is not ideal. The code can be optimized further, e.g. by moving the common sub-statement out of the two branches of the if statement to a single occurrence after it.
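
One possible result of such an optimization is sketched below. It illustrates the optimization just mentioned and is not output of the tool. The stub types (Employee, Department, StubManagement), their simplified signatures, and the Pay accessor are hypothetical stand-ins that make the sketch self-contained.

```cpp
#include <cassert>

struct Employee {};
struct ISalaryManagement;

// Hypothetical, simplified versions of the new base and queried interfaces.
struct IEmployeeManagement {
  typedef int ID;
  virtual ~IEmployeeManagement() = default;
  virtual ID Lookup(const Employee &employee) = 0;
  virtual ISalaryManagement *GetInterface(const ISalaryManagement *) const = 0;
};
struct ISalaryManagement {
  typedef long Amount;
  struct Percentage { double value; };
  virtual ~ISalaryManagement() = default;
  virtual Amount GiveRaise(const IEmployeeManagement::ID &employeeID,
                           const Amount amount) = 0;
  virtual Amount GiveRaise(const IEmployeeManagement::ID &employeeID,
                           const Percentage percentage) = 0;
};

template<typename T, typename S>
T GetInterface(const S *p) {
  return p->GetInterface((const T)nullptr);
}

// Stub implementation providing both interfaces for a single employee.
class StubManagement : public IEmployeeManagement, public ISalaryManagement {
  public:
    ID Lookup(const Employee &) override { return 1; }
    ISalaryManagement *GetInterface(const ISalaryManagement *) const override {
      return const_cast<StubManagement *>(this);
    }
    Amount GiveRaise(const ID &, const Amount amount) override {
      return m_pay += amount;
    }
    Amount GiveRaise(const ID &, const Percentage percentage) override {
      return m_pay += static_cast<Amount>(m_pay * percentage.value / 100.0);
    }
    Amount Pay() const { return m_pay; }
  private:
    Amount m_pay = 1000;
};

// Stub department; own is null if there is no separate management.
struct Department {
  IEmployeeManagement *own;
  bool automaticRaises;
  bool HasSeparateEmployeeManagement() const { return own != nullptr; }
  IEmployeeManagement *GetEmployeeManagement() const { return own; }
  bool AutomaticRaises() const { return automaticRaises; }
};

IEmployeeManagement *GetCoroporateEmployeeManagement() {
  static StubManagement corporate;
  return &corporate;
}

void RaiseSalary(
       const Department *department,
       const Employee &employee,
       const ISalaryManagement::Amount amount,
       const ISalaryManagement::Percentage inflation
     ) {
  IEmployeeManagement *em;
  if (department->HasSeparateEmployeeManagement())
    em = department->GetEmployeeManagement();
  else
    em = GetCoroporateEmployeeManagement();
  // single lookup, moved out of the two branches of the if statement
  ISalaryManagement *em_ISalaryManagement = GetInterface<ISalaryManagement *>(em);
  const IEmployeeManagement::ID employeeID = em->Lookup(employee);
  // individual performance-based adjustment
  em_ISalaryManagement->GiveRaise(employeeID,amount);
  // general inflation adjustment
  if (department->AutomaticRaises())
    em_ISalaryManagement->GiveRaise(employeeID,inflation);
}
```

With the lookup hoisted, the if statement no longer needs the added curly brackets, so the function differs from the original in fewer places.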

Conclusion

Above we sketched how to split a large C++ interface into multiple smaller interfaces, and modify the whole code base to use the new smaller interfaces instead of the large legacy interface. Taking into account the intricacies of C++ and the variety of its language features requires a significant number of additional transformation rules with some auxiliary conditions and functions querying the syntax tree, name resolution results, flow analyses, and the interface specification. Our current implementation of the split interface refactoring consists of approximately 700 source-to-source rewrite rules, 30 auxiliary conditions, 50 auxiliary functions, and 70 source patterns used by these auxiliary functions to create syntax trees. Preceding the application of the rewrite rules is a planning step, precomputing some information that is needed by the auxiliary conditions and functions.

(A customer-specific variation of) the automated split interface refactoring has been applied successfully to a large code base consisting of over 10,000 translation units. The tool created the new interfaces and delegates, modified the underlying implementation class to provide the new interfaces, and adjusted the code base to use the new interfaces instead of the legacy interface without any further human involvement.

Some aspects important for applying an automated refactoring tool to production code have not been discussed above. These include:

  • Comments need to be preserved when applying transforms. To do so, we heuristically associate comments with language constructs. Rewrite rules automatically preserve comments for the root of their left-hand side. Other occurrences may require calling auxiliary functions to copy comments from occurrences in the matched subtree to suitable occurrences in the replacement subtree of the rewrite.
  • C++ code does use macros. When applying refactorings, the macros in general should be preserved. To allow for the use of unstructured macros, the C++ front-end of DMS expands macro calls during preprocessing but keeps track of sufficient information to regenerate these when producing output files from the internal syntax tree representation. Where macro expansions overlap with code changes, in general, the expansions need to be printed instead.
  • C++ code to be refactored may contain preprocessing conditionals for configuration management. The C++ front-end of DMS allows structured and frequently used unstructured preprocessing directives in many places in its syntax trees. Name resolution currently only occurs in a single configuration, thus any transformation applied will only be applied to branches of the current configuration of interest. If refactorings are needed in configuration-specific code, the tool may need to be applied multiple times for the configurations of interest, and the results be merged afterwards. We plan to add direct support for multi-configuration analysis in some future version of DMS.
  • To facilitate comparing the original code base and the refactored code base, the changes in source code should be minimized. This requires that the formatter preserve the position of language elements to the utmost degree. DMS-based formatters come close to this ideal and can print most language elements at their original position, or shift them consistently as needed after intermediate code changes. The refactoring rules have been designed to apply only when a refactoring change is actually needed; e.g. curly brackets are added to subsidiary statements of if, for, while, etc. only if they are needed to add assignments for variables for the new queried interfaces.

About Semantic Designs

Since 1995, Semantic Designs, Inc., (SD) a privately-held corporation headquartered in Austin, Texas, has provided world-class analysis, enhancement and transformation of large and complex software systems. Solutions include Tools, Bundles and Suites, Services for Enterprises and ISV's, and System Integrator partnering for dozens of computer languages, databases and schema.

SD is staffed by highly trained computer scientists with decades of experience focused on automated transformation tools that work with a variety of computer languages and environments. SD's success comes from a deep understanding of software engineering and compiler technology with decades of engineering investment in a scalable transformation platform called DMS®. Semantic Designs' DMS Software Reengineering Toolkit and methodologies have been proven, supporting custom reengineering tasks since 1995. Our experience and technology can tackle the toughest application analysis and re-engineering problems that are not practical with conventional development teams or other vendors.

For more information: [email protected]    Follow us at Twitter: @SemanticDesigns
