Rails Modular Monolith: Bulk Import with Validations - Best Practices?

gpassero · November 8, 2022, 1:55am

This message was imported from the Ruby/Rails Modularity Slack server. Find more info in the import thread.

Another rails modular monolith question: bulk import (eg from CSV).
My validations are on the model (business logic), behind an API.
I’ve added a method to create a record, and pass back validation error messages. No problem there.
For scalable bulk import, Rails provides insert_all. Combined with batched record parsing (eg https://github.com/tilo/smarter_csv), that gives a fast import without the memory footprint exploding.
However, insert_all doesn’t perform any validations.
How are you doing your bulk data import with validations?
Seems like I either need to move my validation rules to the API itself, so clients can do the checks there, or I have to import records one at a time, to get Rails model validation.
What’s the obvious best practice that I missed here? Background job + batched record parsing + Rails model creation one-at-a-time with validation?
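One middle ground between the two options is to validate each row in memory and then write every batch with a single bulk call. The sketch below is only an illustration in plain Ruby: `ThingStandIn`, `import_in_batches` and the "name must be present" rule are hypothetical names made up for this example. In a Rails app the validity check would be `Thing.new(attrs).valid?` and the `bulk_insert` callable would wrap `Thing.insert_all(rows)`.

```ruby
# Stand-in for an ActiveRecord model so the sketch runs without Rails (assumption).
class ThingStandIn
  attr_reader :attrs, :errors

  def initialize(attrs)
    @attrs = attrs
    @errors = []
  end

  # Hypothetical rule: name must be present.
  def valid?
    @errors = []
    @errors << "name can't be blank" if attrs[:name].to_s.strip.empty?
    @errors.empty?
  end
end

# Validates rows one by one in memory, but writes each batch with one bulk call.
# `bulk_insert` is any callable that takes an array of attribute hashes.
def import_in_batches(rows, batch_size:, bulk_insert:)
  rejected = []
  rows.each_slice(batch_size).with_index do |batch, batch_index|
    valid_rows = []
    batch.each_with_index do |attrs, offset|
      record = ThingStandIn.new(attrs)
      if record.valid?
        valid_rows << attrs
      else
        # Keep the 1-based row number so the user can find the bad line.
        rejected << { row: batch_index * batch_size + offset + 1, errors: record.errors }
      end
    end
    # Skip the bulk call when every row in the batch was rejected.
    bulk_insert.call(valid_rows) unless valid_rows.empty?
  end
  rejected
end

inserted = []
rejected = import_in_batches(
  [{ name: "a" }, { name: "" }, { name: "c" }],
  batch_size: 2,
  bulk_insert: ->(rows) { inserted.concat(rows) }
)
```

Two caveats with this approach: validations that query the database, such as uniqueness, will not see the other rows of the same batch, and insert_all skips callbacks as well as validations.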

system · November 8, 2022, 9:36am

Message originally sent by slack user U715EVJUTD0

The activerecord-import gem supports running validations as part of the import, if that helps, as well as batching.

But I’d also be keen to know what the best practice is here

system · November 8, 2022, 10:29am

Message originally sent by slack user U715TW4PT5O

We do all our business logic in “Operations” / Service Objects. We used to use Trailblazer, but found it kind of obscured too much, so we have a much simpler PORO structure now. So we have both “thin controllers” and “thin models”.
Then it’s easy to reuse those service objects everywhere: REST endpoints, GraphQL mutations, and bulk imports all trigger that single point of business logic.

In bulk jobs, we aggregate the errors and notify the user of the failed records when done.

So a rest endpoint might be roughly like

def create
  @result = CreateAThing.run(params)
end

And the bulk import would be used something like

def process(csv)
  @results = []
  csv.each do |line|
    params = line_to_params(line)
    @results << CreateAThing.run(params)
  end
end
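The "aggregate the errors and notify the user" step could look roughly like the following. This is a sketch under assumptions, not our actual code: `OperationResult`, `import_with_summary` and the email rule are hypothetical, and the lambda stands in for something like CreateAThing.run.

```ruby
# Hypothetical result type; any object responding to success? and errors would do.
OperationResult = Struct.new(:success, :errors, keyword_init: true) do
  def success?
    success
  end
end

# Runs `operation` for every row and returns a summary keyed by 1-based row number.
def import_with_summary(rows, operation)
  failures = {}
  succeeded = 0
  rows.each.with_index(1) do |params, row_number|
    result = operation.call(params)
    if result.success?
      succeeded += 1
    else
      failures[row_number] = result.errors
    end
  end
  { succeeded: succeeded, failed: failures.size, failures: failures }
end

# Stand-in for a real operation: rejects rows without an email (made-up rule).
operation = lambda do |params|
  if params[:email].to_s.empty?
    OperationResult.new(success: false, errors: ["email is required"])
  else
    OperationResult.new(success: true, errors: [])
  end
end

summary = import_with_summary([{ email: "a@example.com" }, { email: nil }], operation)
```

Keying the failures by row number means the notification can point the user at the exact lines of the file that need fixing.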

gpassero · November 8, 2022, 3:09pm

<@U715EVJUTD0> I saw that, and was intrigued, but thought it was made for a time before Rails had insert_all.

<@U715TW4PT5O> Thanks for sharing your approach! What happens within the CreateAThing, though? Where are you performing data validations?

system · November 8, 2022, 3:32pm

Message originally sent by slack user U715TW4PT5O

We have a standard Result class that every operation returns, and so validation is left to the individual service object. For complex stuff we use Reform, since we used that within Trailblazer and it works nicely to separate validation logic from models. But we sometimes also just use the ActiveModel validations as normal or, for simple validations, just ad-hoc code.

The Result is really simple:

class Result
  attr_reader :errors, :object

  alias model object

  def initialize(success:, errors:, object:)
    raise 'Success must be true or false' unless success.is_a?(TrueClass) || success.is_a?(FalseClass)

    @success = success
    @errors = errors
    @object = object
  end

  def success?
    @success
  end

  def failed?
    !success?
  end
end
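For the ad-hoc case, an operation might look something like the sketch below. `RenameAThing`, its length limit and the plain hash used in place of a model are all hypothetical, picked only to show the shape of an operation that validates inline and returns a Result.

```ruby
# Trimmed copy of the Result class above, so this snippet runs on its own.
class Result
  attr_reader :errors, :object

  def initialize(success:, errors:, object:)
    @success = success
    @errors = errors
    @object = object
  end

  def success?
    @success
  end
end

# Hypothetical operation with ad-hoc validation: no model or form object involved.
class RenameAThing
  MAX_NAME_LENGTH = 50

  def self.run(thing, new_name)
    name = new_name.to_s.strip
    errors = []
    errors << "name is required" if name.empty?
    errors << "name is too long" if name.length > MAX_NAME_LENGTH
    return Result.new(success: false, errors: errors, object: nil) if errors.any?

    thing[:name] = name
    Result.new(success: true, errors: [], object: thing)
  end
end

ok = RenameAThing.run({ name: "old" }, " new ")
bad = RenameAThing.run({ name: "old" }, "")
```

Callers only ever look at `success?`, `errors` and `object`, so they do not need to know which validation style a given operation uses.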

system · November 8, 2022, 4:49pm

Message originally sent by slack user U715TW4PT5O

@gpassero Given the above, a really simple operation could be:

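class CreateAThing
  # Class-level entry point, so callers can write CreateAThing.run(params)
  def self.run(params)
    thing = Thing.new(params)
    if thing.save
      Result.new(success: true, object: thing, errors: nil)
    else
      # save returned false: pass the model's validation errors back to the caller
      Result.new(success: false, object: nil, errors: thing.errors)
    end
  end
end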

Obviously, in that example it doesn’t really offer a lot of advantages over just using the model, aside from hiding the model from being used directly by other code. So, in Packwerk world we would expose CreateAThing as public, but keep Thing private to enforce boundaries across code.