Limiting Test Execution Based on Changed Files and Package Dependencies: Tools and Recommendations?

This message was imported from the Ruby/Rails Modularity Slack server. Find more info in the import thread.

Are there any tools to limit test execution based on changed files and package dependencies? (something like https://github.com/shageman/cobratest)

Message originally sent by slack user U71TN2WF04X

I remember @AlexEvanczuk posted a script that could do exactly that

@AlexEvanczuk Tell me more!

I can’t find it :sweat_smile: But basically the algorithm is:

  1. Get the packages (set A) for the changed files (there are APIs in Packs to do this)
  2. Use any graph traversal algorithm to get affected packages.
    a. Namely: get the packages (set B) that depend on those packages; their tests need to run. Then get the packages that depend on set B; their tests need to run too. Continue recursively until no new dependent packages are found.
  3. Run tests for all those packages.
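The steps above amount to a breadth-first search over the reversed dependency graph. Here is a minimal Ruby sketch on a plain hash. It assumes you have already loaded each pack's declared dependencies (from its `package.yml`) into `{ pack_name => [dependency_names] }`; `affected_packs` and the pack names are hypothetical.

```ruby
require 'set'

# dependencies: pack name => names of the packs it depends on (hypothetical input)
# changed:      names of the packs that contain changed files (set A)
def affected_packs(dependencies, changed)
  # Invert the graph: pack => packs that depend on it
  dependents = Hash.new { |hash, key| hash[key] = [] }
  dependencies.each do |pack, deps|
    deps.each { |dep| dependents[dep] << pack }
  end

  # Breadth-first walk from set A to set B, to whatever depends on B, and so on.
  # `visited` keeps us from looping forever if the graph has a cycle.
  visited = Set.new(changed)
  queue = changed.dup
  until queue.empty?
    pack = queue.shift
    dependents[pack].each do |dependent|
      next if visited.include?(dependent)

      visited << dependent
      queue << dependent
    end
  end
  visited.to_a
end

deps = {
  'packs/billing' => ['packs/users'],
  'packs/invoices' => ['packs/billing'],
  'packs/reports' => [],
  'packs/users' => []
}
affected_packs(deps, ['packs/users'])
# => ["packs/users", "packs/billing", "packs/invoices"]
```

`packs/reports` is left out because nothing connects it to `packs/users`, which is exactly the saving this approach is after.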

Message originally sent by slack user U71TN2WF04X

Found it:

require 'set'
require 'parse_packwerk' # Gusto open sourced this to make it easier to query packwerk YML files <https://github.com/rubyatscale/parse_packwerk>

files_that_changed = ['packs/my_pack/path/to/file.rb', 'path/to/other/file.rb'] # getting this list will be custom to your setup
# Work with pack names so lookups and de-duplication don't depend on object identity
packs_that_changed = files_that_changed.map { |f| ParsePackwerk.package_from_path(f).name }.uniq

class Traverser
  # include_violations: also follow dependency violations recorded in package_todo.yml.
  # (The original snippet checked a setup-specific `draft_build` flag at this point.)
  def initialize(include_violations: true)
    @include_violations = include_violations
    # Reverse edges: pack name => names of the packs that depend on it
    @incoming_dependency_list = Hash.new { |hash, key| hash[key] = [] }
    @incoming_violation_list = Hash.new { |hash, key| hash[key] = [] }

    ParsePackwerk.all.each do |ancestor|
      ancestor.dependencies.each do |descendant_name|
        @incoming_dependency_list[descendant_name] << ancestor.name
      end

      ancestor.violations.select(&:dependency?).map(&:to_package_name).uniq.each do |descendant_name|
        @incoming_violation_list[descendant_name] << ancestor.name
      end
    end
  end

  # Returns the names of all packs that transitively depend on pack_name.
  # `visited` guards against cycles (violations can introduce them).
  def get_ancestors(pack_name, visited = Set.new)
    return [] if visited.include?(pack_name)
    visited << pack_name

    direct = @incoming_dependency_list.fetch(pack_name, [])
    direct += @incoming_violation_list.fetch(pack_name, []) if @include_violations

    direct.uniq.flat_map { |ancestor| [ancestor] + get_ancestors(ancestor, visited) }.uniq
  end
end

traverser = Traverser.new(include_violations: true) # e.g. false for draft builds

packs_to_test = packs_that_changed.flat_map { |pack| [pack] + traverser.get_ancestors(pack) }.uniq

run_tests_for(packs_to_test) # this will also be custom to your setup
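The two setup-specific pieces left as comments in the snippet (the list of changed files and `run_tests_for`) could look roughly like this. It's a minimal sketch assuming a git checkout with an `origin/main` base branch, RSpec, and specs living in `<pack>/spec` (`spec/` for the root pack `.`); `changed_files` and `spec_dirs_for` are hypothetical helper names, not part of any gem.

```ruby
require 'open3'

# Hypothetical helper: files changed on this branch relative to its merge base with `base`.
def changed_files(base: 'origin/main')
  out, status = Open3.capture2('git', 'diff', '--name-only', "#{base}...HEAD")
  raise "git diff failed against #{base}" unless status.success?

  out.lines.map(&:chomp).reject(&:empty?)
end

# Hypothetical helper: map pack names to spec directories that actually exist under `root`.
# Assumes each pack keeps its specs in <pack>/spec and the root pack (".") uses spec/.
def spec_dirs_for(pack_names, root: Dir.pwd)
  pack_names
    .map { |name| name == '.' ? 'spec' : File.join(name, 'spec') }
    .select { |dir| Dir.exist?(File.join(root, dir)) }
end

# Runs RSpec only for the selected packs; returns true when there is nothing to run.
def run_tests_for(pack_names)
  dirs = spec_dirs_for(pack_names)
  return true if dirs.empty?

  system('bundle', 'exec', 'rspec', *dirs)
end
```

With these in place, `files_that_changed = changed_files` replaces the hard-coded list.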

Message originally sent by slack user U7321VTD2MQ

I just wrote something to facilitate this using graphwerk before someone referred me to this thread.

class DependencyService
  ROOT_NODE_ID = 'Application'

  def initialize(pack_names:, exclude_todos: false)
    @pack_names = pack_names
    @graph = Graphwerk::Builders::Graph.new(
      Packwerk::PackageSet.load_all_from(Rails.root.to_s),
      options: { exclude_todos: }
    ).build
  end

  def all_dependents
    pack_names.flat_map { dependents(node(_1)) }.uniq
  end

  private

  attr_reader :graph, :pack_names

  def node(name)
    graph.find_node(name)
  end

  def dependents(node, visited_nodes = [])
    return [] if node.id == ROOT_NODE_ID || visited_nodes.include?(node.id)

    visited_nodes << node.id
    ([node.id] + node.incidents.map { dependents(_1, visited_nodes) }).flatten.uniq
  end
end

Message originally sent by slack user U7321VTD2MQ

I think I like parse_packwerk better though

Message originally sent by slack user U7321VTD2MQ

graphwerk still needs to be updated to work with the newer version of packwerk too, so I can ditch that fork by switching. Thanks!

<@U71TN2WF04X> @AlexEvanczuk Is this a proof of concept, or is it in use somewhere? I’d like to talk with the teams that are using it, if possible.

Message originally sent by slack user U71TN2WF04X

At my company we are not using it.
We are still moving files around, defining boundaries, and thinking about public API implementations. If we used it now, we would probably still end up testing all of our code, since it has so many violations.

We used it at Gusto for a while. Eventually, we migrated to a different implementation of the same idea that used code coverage (i.e., which tests execute which code) to carry out the same logic.
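The coverage-based variant swaps the pack graph for a map from each test file to the source files it executed. A minimal sketch, assuming such a map has already been collected (for example with Ruby's standard `Coverage` module during a full run); the map format and `select_tests` are hypothetical, not Gusto's or Shopify's actual implementation.

```ruby
require 'set'

# coverage_map:  test file => source files it executed (hypothetical format)
# changed_files: paths changed on the branch
def select_tests(coverage_map, changed_files)
  changed = changed_files.to_set
  coverage_map.select do |test_file, source_files|
    # Run a test if it was edited itself or if it executed any changed file
    changed.include?(test_file) || source_files.any? { |path| changed.include?(path) }
  end.keys
end

coverage = {
  'spec/billing_spec.rb' => ['app/billing.rb', 'app/users.rb'],
  'spec/reports_spec.rb' => ['app/reports.rb']
}
select_tests(coverage, ['app/users.rb'])
# => ["spec/billing_spec.rb"]
```

Files that no test executes (new files, config, YAML) select nothing here, so a real setup needs a fallback for them, such as running everything.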

I believe that while we were using this, we always ran all tests on the main branch.

Message originally sent by slack user U70TIGAX94P

I’m curious: we had planned this at Shopify but never got to it because package isolation wasn’t good enough.

Plus, we implemented a coverage-based approach that pretty much eliminated the need, at least for performance purposes.

My assumption has always been that package-based test selection only works well if packages are completely isolated (meaning, at a minimum, no package_todo.yml files and all dependencies enforced).

Have any of you done it with packages that aren’t fully isolated?
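The isolation bar described in this message can be checked mechanically. A minimal sketch, assuming the standard packwerk layout (a `package.yml` per pack, with any recorded violations in a sibling `package_todo.yml`); `non_isolated_packs` is a hypothetical helper, and it treats `enforce_dependencies` values of `true` or `strict` as enforced.

```ruby
require 'yaml'

# Lists packs (relative to root, "." for the root pack) that fall short of full isolation:
# dependencies not enforced, or a package_todo.yml with recorded violations present.
def non_isolated_packs(root = Dir.pwd)
  Dir.glob(File.join(root, '**', 'package.yml')).filter_map do |path|
    dir = File.dirname(path)
    config = YAML.safe_load(File.read(path)) || {}
    enforced = [true, 'strict'].include?(config['enforce_dependencies'])
    has_todo = File.exist?(File.join(dir, 'package_todo.yml'))
    next if enforced && !has_todo

    dir == root ? '.' : dir.delete_prefix("#{root}/")
  end.sort
end
```

An empty result would suggest the dependency graph can be trusted for test selection; a long list points toward the coverage-based route instead.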

So at Gusto we did a combination:
• For draft PRs, we ran tests based only on declared (acyclic) dependencies.
• For non-draft PRs, we also ran tests based on violations (because of the entanglement, this mostly meant all tests except for a handful of more isolated packs).
• For the main branch, we ran all tests.
We later switched to the same thing, a coverage-based approach, also mostly for performance reasons.
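The draft / non-draft / main policy in the list above can be sketched as one function. This is an illustration under assumptions, not Gusto's code: the inputs are hypothetical hashes of pack name => names of packs it points to (`declared` from `package.yml`, which must list every pack as a key; `violations` from `package_todo.yml`), and `build_kind` is `:draft`, `:ready` or `:main`, however your CI detects that.

```ruby
require 'set'

def packs_to_test(declared:, violations:, changed:, build_kind:)
  return declared.keys if build_kind == :main # main branch: run everything

  # Reverse edges (pack => packs that point to it); violations only count
  # for non-draft builds.
  dependents = Hash.new { |hash, key| hash[key] = [] }
  declared.each do |pack, deps|
    targets = deps
    targets += violations.fetch(pack, []) unless build_kind == :draft
    targets.each { |target| dependents[target] << pack }
  end

  # Walk from the changed packs to everything that transitively depends on them.
  seen = Set.new(changed)
  stack = changed.dup
  until stack.empty?
    dependents[stack.pop].each do |dependent|
      stack.push(dependent) if seen.add?(dependent) # add? returns nil if already seen
    end
  end
  seen.to_a
end
```

If `packs/admin` only reaches `packs/users` through a recorded violation, a change to `packs/users` skips the admin tests on a draft PR but includes them once the PR is marked ready.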

I’d generally agree that dependency-graph-based testing makes more sense when packages are properly isolated.