Exploring Data Visualization for Modularization Efforts: A Shareable Analysis

This message was imported from the Ruby/Rails Modularity Slack server. Find more info in the import thread.

Message originally sent by slack user U7213XMGS3H

Hey, I did some exploratory data visualization on our modularization effort and thought I’d share

Message originally sent by slack user U71268WT64J

This is pretty useful, I think. Do you have some code to generate it?

Thanks for sharing!

Message originally sent by slack user U71268WT64J

It reminds me of the “Gapminder” graphs. It would be interesting to animate it over time the way they do.
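One way to get there is to tag each snapshot's rows with its date and concatenate them. This assumes you can regenerate packs.json for several past dates, for example by running the rake task against older commits. It is a hypothetical sketch: build_time_series is not part of any gem, and the sample values are made up.

```ruby
require "json"

# Hypothetical helper: merge several packs.json snapshots, one per date,
# into a single list of rows tagged with a "date" field that a chart can
# filter or animate on.
def build_time_series(snapshots)
  snapshots.flat_map do |date, packs|
    packs.map { |pack| pack.merge("date" => date) }
  end
end

# Made-up sample snapshots, just to show the shape of the output.
snapshots = {
  "2023-01-01" => [{ "name" => "packs/a", "violations" => { "inbound" => { "total" => 12 } } }],
  "2023-02-01" => [{ "name" => "packs/a", "violations" => { "inbound" => { "total" => 7 } } }],
}

rows = build_time_series(snapshots)
puts JSON.pretty_generate(rows)
```

The chart could then use the "date" field to drive a filter or a slider.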

Message originally sent by slack user U7213XMGS3H

There are two pieces of code: a rake task to generate some json, and a vega-lite snippet… I’ll see if I’m allowed to share!

Message originally sent by slack user U7213XMGS3H

The packs.json file is mostly stuff you can compute with the parse_packwerk gem

Message originally sent by slack user U7213XMGS3H

OK, here’s some example code. I’ve modified it slightly to remove our domain specific stuff.

We have a rake task called artifacts:generate_packwerk_json

require "fileutils"
require "json"

namespace :artifacts do
  task generate_packwerk_json: :environment do
    require_relative "./pack_stats"

    packs = ParsePackwerk.all
    PackStats.setup(packs: packs)

    fields = {
      name: proc { |pack|
        pack.name
      },
      url: proc { |pack|
        "https://github.com/your_org/your_repo/tree/main/#{pack.name}"
      },
      layer: proc { |pack|
        pack.config["layer"]
      },
      summary: proc { |pack|
        pack.metadata["summary"]
      },
      dependencies: proc { |pack|
        pack.dependencies
      },
      owner: proc { |pack|
        pack.metadata["owner"]
      },
      slack: proc { |pack|
        pack.metadata["slack_channels"]
      },
      private_file_count: proc { |pack|
        stats = PackStats.all[pack.name]
        stats.private_files.size
      },
      public_file_count: proc { |pack|
        stats = PackStats.all[pack.name]
        stats.public_files.size
      },
      public_files: proc { |pack|
        stats = PackStats.all[pack.name]
        stats.public_files
      },
      violations: proc { |pack|
        stats = PackStats.all[pack.name]
        {
          outbound: {
            total: stats.total_outbound,
            **stats.outbound,
          },
          inbound: {
            total: stats.total_inbound,
            **stats.inbound,
          },
        }
      },
    }

    # Build one hash per pack by calling every field's proc on it.
    results = packs.map { |pack|
      fields.transform_values { |compute| compute.call(pack) }
    }

    results = results.sort_by { |pack| pack[:name] }

    output = Rails.root.join("packs/packs.json")
    FileUtils.mkdir_p(output.dirname)
    output.write(JSON.pretty_generate(results))
  end
end

Message originally sent by slack user U7213XMGS3H

Then we have a class it depends on called PackStats

class PackStats
  attr_reader :outbound, :inbound

  class << self
    attr_accessor :all
  end

  def self.setup(packs:)
    self.all = packs.to_h { |pack|
      [pack.name, PackStats.new(pack: pack)]
    }
    all.each_value(&:collect_stats!)
  end

  def initialize(pack:)
    @pack = pack
    # inbound = violations in other packs accessing this pack, counted by type
    @inbound = Hash.new(0)
    # outbound = violations in this pack in how it accesses others, counted by type
    @outbound = Hash.new(0)
  end

  def package_todo
    @package_todo ||= ParsePackwerk::PackageTodo.for(@pack)
  end

  def total_inbound
    @inbound.values.sum
  end

  def total_outbound
    @outbound.values.sum
  end

  # Ruby files under app/ that are not under the pack's public path
  def private_files
    @private_files ||= Dir.glob("#{@pack.name}/app/**/*.rb") - public_files
  end

  def public_files
    @public_files ||= Dir.glob("#{@pack.name}/#{@pack.public_path}/**/*.rb")
  end

  # Each package_todo entry counts once as outbound for this pack and
  # once as inbound for the pack it points at.
  def collect_stats!
    package_todo.violations.each { |violation|
      @outbound[violation.type] += 1

      # A package_todo.yml can still reference a pack that no longer exists.
      other_stats = PackStats.all[violation.to_package_name]
      other_stats.inbound[violation.type] += 1 if other_stats
    }
  end
end

Message originally sent by slack user U7213XMGS3H

Finally we have a vega-lite json file that looks something like this

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "mark": "circle",
  "width": 850,
  "height": 550,
  "transform": [
    {
      "calculate": "datum.public_files.length+datum.private_file_count",
      "as": "file_count"
    }
  ],
  "encoding": {
    "x": {
      "field": "violations.inbound.total",
      "type": "quantitative",
      "scale": { "type": "symlog" }
    },
    "y": {
      "field": "violations.outbound.total",
      "type": "quantitative",
      "scale": { "type": "symlog" }
    },
    "tooltip": [
      { "field": "name", "type": "nominal" },
      { "field": "file_count", "type": "quantitative" },
      {
        "field": "public_files.length",
        "title": "public_files",
        "type": "quantitative"
      },
      {
        "field": "private_file_count",
        "title": "private_files",
        "type": "quantitative"
      },
      {
        "field": "violations.outbound.total",
        "title": "outbound total",
        "type": "quantitative"
      },
      {
        "field": "violations.outbound.dependency",
        "title": "outbound dependency",
        "type": "quantitative"
      },
      {
        "field": "violations.outbound.architecture",
        "title": "outbound architecture",
        "type": "quantitative"
      },
      {
        "field": "violations.outbound.privacy",
        "title": "outbound privacy",
        "type": "quantitative"
      },
      {
        "field": "violations.inbound.total",
        "title": "inbound total",
        "type": "quantitative"
      },
      {
        "field": "violations.inbound.dependency",
        "title": "inbound dependency",
        "type": "quantitative"
      },
      {
        "field": "violations.inbound.architecture",
        "title": "inbound architecture",
        "type": "quantitative"
      },
      {
        "field": "violations.inbound.privacy",
        "title": "inbound privacy",
        "type": "quantitative"
      }
    ],
    "color": { "field": "layer", "type": "nominal" },
    "size": {
      "field": "file_count",
      "type": "quantitative",
      "scale": { "type": "linear", "rangeMax": 10000 },
      "legend": null
    }
  },
  "data": { "values": [] }
}

You have to replace the empty "values" array under the data key with the contents of packs.json.
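If you would rather not paste the data by hand, a small script can inline it. This is a sketch under the assumption that the spec and packs.json sit in separate files; embed_data and the file names in the comment are hypothetical.

```ruby
require "json"

# Return a copy of a Vega-Lite spec with the pack records inlined as
# "data": { "values": [...] }. The original spec hash is left untouched.
def embed_data(spec, packs)
  spec.merge("data" => { "values" => packs })
end

# Command-line use (file names are placeholders):
#   ruby embed_data.rb chart.vl.json packs/packs.json chart.with_data.vl.json
if __FILE__ == $PROGRAM_NAME && ARGV.size == 3
  spec_path, packs_path, out_path = ARGV
  spec = JSON.parse(File.read(spec_path))
  packs = JSON.parse(File.read(packs_path))
  File.write(out_path, JSON.pretty_generate(embed_data(spec, packs)))
end
```

Using merge instead of mutating the spec keeps the template reusable for the next run of the rake task.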

Message originally sent by slack user U7213XMGS3H

<@U71268WT64J> lmk if you have any trouble with it

Message originally sent by slack user U70VMMV37TJ

a) I like your visualization and am noodling on how to make it actionable: what’s the next step?

b) I noticed that your X and Y axes seem pretty symmetrical, and wondered if we could combine them into the total number of violations and then put something else on the Y axis.

Your graph feels like mine; it’s much easier to get the < 20 packs to the origin. :smiley:

Message originally sent by slack user U7213XMGS3H

yeah i think there are a number of alternative graphs we could do… i was interested in particular violation types because we are concerned most with architectural, then dependency, and least with privacy at the current time
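If you want that ordering reflected in a single number per pack, one option is a weighted score. This sketch assumes the packs.json shape from the rake task above, read back with JSON.parse (so keys are strings); the weights are arbitrary placeholders, not values anyone in the thread proposed.

```ruby
# Placeholder weights: architecture first, then dependency, then privacy.
WEIGHTS = { "architecture" => 3, "dependency" => 2, "privacy" => 1 }.freeze

# Weighted violation score for one pack record from packs.json.
# Violation types a pack doesn't have count as zero.
def weighted_score(pack, direction: "outbound")
  counts = pack.dig("violations", direction) || {}
  WEIGHTS.sum { |type, weight| weight * counts.fetch(type, 0) }
end

pack = {
  "name" => "packs/a",
  "violations" => {
    "outbound" => { "total" => 6, "architecture" => 1, "dependency" => 2, "privacy" => 3 },
  },
}
puts weighted_score(pack) # 3*1 + 2*2 + 1*3 = 10
```

The score could replace the raw totals on either axis without changing anything else in the chart.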

Message originally sent by slack user U71268WT64J

Thanks so much for spending the time <@U7213XMGS3H> - I’ll definitely give it a whirl

Message originally sent by slack user U7213XMGS3H

if you can, post your results! it’d be interesting to see how different folks’ packs look

Message originally sent by slack user U70TIGAX94P

Have you thought about normalizing violations? Something like violations per LOC, for example

IMO right now you’re essentially graphing “amount of work left”, whereas normalization (if it works) could display something more like “quality” or “modularization progress”.
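packs.json doesn't carry lines of code, but it does carry file counts, so a rough first pass could normalize by files instead. This is a sketch under that substitution (violations per file as a stand-in for violations per LOC); a LOC field would have to be added to the rake task.

```ruby
# Inbound + outbound violations per file, a rough proxy for "violations per LOC".
# Packs with no files return nil instead of dividing by zero.
def violations_per_file(pack)
  files = pack["public_file_count"].to_i + pack["private_file_count"].to_i
  return nil if files.zero?

  total = pack.dig("violations", "outbound", "total").to_i +
          pack.dig("violations", "inbound", "total").to_i
  total.fdiv(files)
end

pack = {
  "public_file_count" => 2,
  "private_file_count" => 8,
  "violations" => { "outbound" => { "total" => 5 }, "inbound" => { "total" => 10 } },
}
puts violations_per_file(pack) # 15 violations / 10 files = 1.5
```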

Another interesting thing to graph may be churn. I’m assuming incoming dependency violations matter less on low-churn packages, while outgoing dependency violations may matter more on low churn packages.

Outgoing dependencies mean the package inherits instability: it may have to change when its dependencies change. That may be more difficult on a low-churn (hence rarely touched) package.
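Churn isn't in packs.json, but git history gives a rough version: the number of commits that touched each pack. This sketch assumes pack names are repo-relative paths like "packs/foo" (the root pack "." won't match) and that it runs inside a git checkout; both helpers are hypothetical.

```ruby
require "open3"

# Count, for each pack, how many commits touched at least one file under it.
# `log_output` is expected to look like the output of
#   git log --name-only --format=__commit__
# i.e. a "__commit__" marker line followed by the files in that commit.
def churn_by_pack(pack_names, log_output)
  counts = Hash.new(0)
  log_output.split(/^__commit__$/).each do |commit|
    files = commit.lines.map(&:strip).reject(&:empty?)
    pack_names.each do |name|
      counts[name] += 1 if files.any? { |file| file.start_with?("#{name}/") }
    end
  end
  counts
end

# Runs git in the current directory; `since` is any date string git accepts.
def git_log_since(since)
  out, status = Open3.capture2("git", "log", "--since=#{since}", "--name-only", "--format=__commit__")
  raise "git log failed" unless status.success?

  out
end
```

Usage might look like churn_by_pack(ParsePackwerk.all.map(&:name), git_log_since("6 months ago")). Matching on "#{name}/" keeps "packs/a" from also counting files in "packs/ab".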

Reminds me of the concept of dependency-induced stability that Robert Martin has written about as the Stable Dependencies Principle. https://web.archive.org/web/20220121055640/butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod
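Martin's principle comes with an instability metric, I = Ce / (Ca + Ce), and the rule that a package should depend only on packages more stable (lower I) than itself. Here is a sketch computing it from the declared "dependencies" lists in packs.json; note that declared dependencies are not the same as actual references, so this is an approximation.

```ruby
# Instability I = Ce / (Ca + Ce) from declared dependencies:
#   Ce (efferent) = number of packs this pack depends on
#   Ca (afferent) = number of packs that depend on this pack
# I runs from 0.0 (stable) to 1.0 (unstable); isolated packs get nil.
def instability(packs)
  afferent = Hash.new(0)
  packs.each { |pack| Array(pack["dependencies"]).each { |dep| afferent[dep] += 1 } }

  packs.to_h do |pack|
    ce = Array(pack["dependencies"]).size
    ca = afferent[pack["name"]]
    [pack["name"], (ca + ce).zero? ? nil : ce.fdiv(ca + ce)]
  end
end

# Stable Dependencies Principle check: list [from, to] edges where a pack
# depends on a pack that is less stable (higher I) than itself.
def sdp_violations(packs)
  i = instability(packs)
  packs.flat_map do |pack|
    from = pack["name"]
    Array(pack["dependencies"])
      .select { |dep| i[dep] && i[from] && i[dep] > i[from] }
      .map { |dep| [from, dep] }
  end
end
```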

Message originally sent by slack user U70TIGAX94P

This is data that is (sadly) difficult to get from packwerk, but a comparison of “references within the package” vs “references across packages” can give you an actual network modularity metric that should tell you one or two things about coupling and cohesion.
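One common network modularity measure is Newman's Q: for each pack, the fraction of edges inside it minus the fraction expected by chance. This sketch assumes a hypothetical edge list of references between constants (which packwerk doesn't export today) plus a map from each constant to its owning pack, and treats references as undirected.

```ruby
# Newman's modularity for an undirected graph partitioned into packs:
#   Q = sum over packs c of [ L_c / m - (D_c / 2m)^2 ]
# m = number of edges, L_c = edges with both ends in pack c,
# D_c = total degree of pack c's nodes. Higher Q = more references stay inside packs.
def modularity(edges, pack_of)
  m = edges.size
  return 0.0 if m.zero?

  internal = Hash.new(0)
  degree = Hash.new(0)
  edges.each do |from, to|
    a = pack_of.fetch(from)
    b = pack_of.fetch(to)
    internal[a] += 1 if a == b
    degree[a] += 1
    degree[b] += 1
  end

  degree.keys.sum { |c| internal[c].fdiv(m) - (degree[c].fdiv(2 * m))**2 }
end
```

Putting every constant in a single pack gives Q = 0, which is a handy sanity check: Q only rewards partitions that beat chance.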

Message originally sent by slack user U7213XMGS3H

that’s a great idea, i bet we could totally do that with this json format

Message originally sent by slack user U7213XMGS3H

Also, if we published a standard json format, we could build tooling around it that other people could use

Message originally sent by slack user U7213XMGS3H

For example: a set of rake tasks that produce pack-json (with a json spec) and then a series of vega-lite-based visualization tools
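As a starting point for such a spec, here is a hypothetical minimal check of one record's keys and types, based on what the rake task above emits. It is not a published standard; a real version would more likely be written as a JSON Schema document.

```ruby
# Hypothetical minimal "pack-json" spec: required keys and the Ruby classes
# their values should have after JSON.parse.
PACK_JSON_SPEC = {
  "name" => String,
  "layer" => [String, NilClass],
  "dependencies" => Array,
  "private_file_count" => Integer,
  "public_file_count" => Integer,
  "violations" => Hash,
}.freeze

# Returns a list of human-readable problems for one pack record (empty if valid).
def validate_pack(record)
  PACK_JSON_SPEC.flat_map do |key, types|
    allowed = Array(types)
    if !record.key?(key)
      ["missing key #{key}"]
    elsif allowed.none? { |t| record[key].is_a?(t) }
      ["#{key} should be #{allowed.join(' or ')}, got #{record[key].class}"]
    else
      []
    end
  end
end
```

Running it over JSON.parse(File.read("packs/packs.json")) would flag records that drift from the agreed shape before any chart consumes them.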

Message originally sent by slack user U70TIGAX94P

Parse-Packwerk is reverse engineered and thus will always miss some stuff. IMO packwerk should have a way to dump all references as an edge list.
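Until something like that exists, the closest thing packs.json offers is the declared dependency graph. This sketch writes it as a plain "from,to" edge list; it assumes pack names contain no commas, and it shows declared dependencies, not the actual references packwerk finds.

```ruby
# One [from, to] pair per declared dependency in packs.json.
def dependency_edges(packs)
  packs.flat_map do |pack|
    Array(pack["dependencies"]).map { |dep| [pack["name"], dep] }
  end
end

# Render the pairs as a minimal "from,to" edge list with a header row.
def to_edge_list(edges)
  (["from,to"] + edges.map { |from, to| "#{from},#{to}" }).join("\n") + "\n"
end

packs = [
  { "name" => "packs/a", "dependencies" => ["packs/b", "packs/c"] },
  { "name" => "packs/b", "dependencies" => [] },
]
print to_edge_list(dependency_edges(packs))
```

The same two-column format would work unchanged if packwerk ever dumped real constant-level references.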

Message originally sent by slack user U7213XMGS3H

that’d be nice