Best Practices: Java Function Signatures

[Note: most of these examples use the Guava open-source library; I highly recommend it.]

Say I have a Java function that takes a bag of objects, does some processing, and returns another bag of objects. I could naively write the function like this:

public Foo[] process(Foo[] inputs) { ... }

This is obviously terrible, but let's talk about why it's terrible. First, we've constrained the input to be an array, which is an inflexible data type used only rarely in application-level Java. Most callers will have their data in a collection of some sort and will be forced to call .toArray(), which copies every element and wastes CPU and memory. The return value is equally bad: arrays are not first-class collections in Java, so before callers can do any interesting additional processing they will have to convert the result explicitly using, say, Arrays.asList() or Arrays.stream().
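
To make the conversion overhead concrete, here is a minimal sketch of what a collection-based caller ends up writing around the array signature. The Foo class, the pass-through body of process(), and the callFromCollection() helper are all assumptions made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArraySignatureDemo {
    // Hypothetical stand-in for the article's Foo class.
    static class Foo {
        final String name;
        Foo(String name) { this.name = name; }
    }

    // The array-based signature from above. The body is an assumption:
    // it simply returns a copy of its input.
    static Foo[] process(Foo[] inputs) {
        return Arrays.copyOf(inputs, inputs.length);
    }

    // Hypothetical caller that keeps its data in a List.
    static List<Foo> callFromCollection(List<Foo> source) {
        Foo[] asArray = source.toArray(new Foo[0]);    // copy 1: collection -> array
        Foo[] result = process(asArray);
        return new ArrayList<>(Arrays.asList(result)); // copy 2: array -> mutable list
    }

    public static void main(String[] args) {
        List<Foo> out = callFromCollection(Arrays.asList(new Foo("a"), new Foo("b")));
        System.out.println(out.size());
    }
}
```

Neither copy does any useful work; both exist only to satisfy the signature.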

Let's look at a slightly less bad version of this function:

public List<Foo> process(Set<Foo> inputs) { ... }

Now at least we're using Java collections. List<> is a useful collection and might do all of the things the caller is interested in doing with the result (but we'll get to that later).

Let's consider the parameter. The function is asking for a Set<>. Why? Does it actually need a collection of unique, possibly unordered Foo objects? Or does it just stream() them or iterate through them without caring about uniqueness or order? If that's the case, then why are we requiring a Set<>? The answer is probably that we created this as a helper function, the caller happened to keep its data in a Set<>, and we just copied that. But there's no good reason to require a Set<> if a more general type would suffice.

Likewise, let's consider the return value. Our function probably generates a list of objects internally, so just returning a List<> is fine. But List<> doesn't really tell us anything about the kind of list we've generated. Is it mutable? Immutable? The caller might need to know whether it has to make a copy of the list before changing it! So, especially if we're returning an immutable collection, we might want to make that explicit in the return type (for example, Guava's ImmutableList<>) to give our caller a heads-up!
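
Here is a small sketch of the problem using only JDK types. Both methods below declare List<String>, yet only one of the results can be modified. The method names are made up for illustration, and List.of() stands in for any unmodifiable list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListMutabilityDemo {
    // Declared as List<String>; happens to be mutable.
    static List<String> mutableResult() {
        return new ArrayList<>(Arrays.asList("a", "b"));
    }

    // Also declared as List<String>; happens to be unmodifiable (Java 9+).
    static List<String> immutableResult() {
        return List.of("a", "b");
    }

    // Returns true if appending to the list succeeds.
    static boolean canAppend(List<String> list) {
        try {
            list.add("c");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canAppend(mutableResult()));   // prints true
        System.out.println(canAppend(immutableResult())); // prints false
    }
}
```

The caller only finds out at runtime, via an exception. A return type that names the immutability moves that information into the signature.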

One more time, this time even better!

public ImmutableList<Foo> process(
  Collection<Foo> inputs) { ... }

public ImmutableList<Foo> process(
  Iterable<Foo> inputs) { ... }

Both of these are perfectly fine. I tend to prefer passing in Iterable<> since it's more general and allows for lazy-evaluated sources. Both types work with Java's for-each loop, but Iterable<> has no stream() method (and no size()), so getting a Stream<> from an Iterable<> requires the more complex call, here using Guava's Streams class:

Streams.stream(inputs)

Rather than just:

inputs.stream()

So go with whatever you're more comfortable with.
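
If you'd rather not depend on Guava for this one call, the JDK's StreamSupport can do roughly the same job. The sketch below is a JDK-only assumption of how that might look; the class and helper names (streamOf, upperCaseAll, totalLength) are made up for illustration. It also shows a plain for-each loop over an Iterable<> and a lazily evaluated source:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class IterableStreamDemo {
    // JDK-only way to get a sequential Stream from any Iterable.
    static <T> Stream<T> streamOf(Iterable<T> inputs) {
        return StreamSupport.stream(inputs.spliterator(), false);
    }

    // Hypothetical processing step that streams its Iterable parameter.
    static List<String> upperCaseAll(Iterable<String> inputs) {
        return streamOf(inputs)
            .map(String::toUpperCase)
            .collect(Collectors.toList());
    }

    // Hypothetical processing step that uses a plain for-each loop instead.
    static int totalLength(Iterable<String> inputs) {
        int total = 0;
        for (String s : inputs) {
            total += s.length();
        }
        return total;
    }

    public static void main(String[] args) {
        // A lazy source: the elements are produced only when iteration starts.
        Iterable<String> lazy = () -> Stream.of("a", "bb", "ccc").iterator();
        System.out.println(upperCaseAll(lazy));
        System.out.println(totalLength(lazy));
    }
}
```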

There's still one more thing we can do to improve this, however. Note that the above requires the Collection<> or Iterable<> to be parameterized with exactly Foo: Java generics are invariant, so a List<Bar> is not a Collection<Foo> even if Bar extends Foo. But where we might have polymorphic objects, there's no reason the collection couldn't hold a subtype, since the function should be able to handle it all the same. So, let's finish with the best possible version of this function for an arbitrary class Foo (if the element type is a final class such as String or a boxed primitive like Integer, the wildcard buys us nothing and we shouldn't bother):

public ImmutableList<Foo> process(
  Collection<? extends Foo> inputs) { ... }

public ImmutableList<Foo> process(
  Iterable<? extends Foo> inputs) { ... }
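
A quick sketch of what the wildcard buys us. Foo and its subclass Bar are hypothetical, and to keep the example free of dependencies it returns the JDK's List.copyOf() (Java 10+) where the signatures above would return Guava's ImmutableList<>:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

class WildcardDemo {
    // Hypothetical class hierarchy for illustration.
    static class Foo {
        final String name;
        Foo(String name) { this.name = name; }
    }

    static class Bar extends Foo {
        Bar(String name) { super(name); }
    }

    // Accepts only collections declared with exactly Foo as the element type.
    static List<Foo> processExact(Collection<Foo> inputs) {
        return List.copyOf(inputs);
    }

    // Accepts collections of Foo or of any subtype of Foo.
    static List<Foo> process(Collection<? extends Foo> inputs) {
        return List.copyOf(inputs);
    }

    public static void main(String[] args) {
        List<Bar> bars = new ArrayList<>();
        bars.add(new Bar("x"));

        // processExact(bars);  // does not compile: List<Bar> is not a Collection<Foo>
        List<Foo> out = process(bars); // fine with the wildcard
        System.out.println(out.size());
    }
}
```

Without the wildcard, the caller would have to copy its List<Bar> into a List<Foo> first, which is exactly the kind of pointless conversion we set out to avoid.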

And there you have it! The perfect function signature.
Have fun and happy coding!