Tuesday, August 10, 2010

From Control Theory to CQS

The term observable state stems from control theory, which deals with state systems. The observable state is the part of a system's state that can be observed (queried) from the outside world, while the controllable state is the part that can be changed.

Thinking in terms of encapsulation, observability is what lets us find something out about a system. In OO, this means that all observable properties of a type must be available as public (or protected) queries on the object.
Controllability, on the other hand, defines the operations that change a system. Ideally, a system (or object) cannot change at all. In that case, it offers only observable properties and has no changing state (it is immutable).

However, any kind of program needs to modify some state; otherwise, it's just a "black box that gets hotter" (google this). For classical OO, I'd like to add a further definition that is specific to programming: two systems (objects) can be considered equal in a given state if all of their observable properties are also equal (recursively, by this same definition).

Taking these three concepts together (controllability, observability and the equality definition), it's very easy to define referential transparency as well: all observations made with the same parameters on the same observer method in the same system state must yield an equal result. After each control operation, any observer method may have changed its result (but need not have), but it is guaranteed that the result will once again stay constant until the next control operation.
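To make the equality definition and the "constant until the next control operation" guarantee concrete, here is a minimal Java sketch. The class Thermostat and its methods are hypothetical: target() and isHeating(int) are observations, setTarget(int) is the only control operation, and equals compares nothing but the observable state.

```java
import java.util.Objects;

// Hypothetical example: a tiny state system with one control operation.
final class Thermostat {
    private int target; // the controllable (and observable) state

    Thermostat(int target) {
        this.target = target;
    }

    // Observation: no side effects, same result until the next control operation.
    int target() {
        return target;
    }

    // Observation with a parameter: same argument + same state -> same result.
    boolean isHeating(int currentTemperature) {
        return currentTemperature < target;
    }

    // Control operation: changes the state, yields no information.
    void setTarget(int newTarget) {
        this.target = newTarget;
    }

    // Equality by observable properties only: two thermostats are equal
    // in a given state if every observation on them yields an equal result.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Thermostat)) return false;
        return target == ((Thermostat) other).target;
    }

    @Override
    public int hashCode() {
        return Objects.hash(target);
    }
}
```

Note that for a mutable type like this one, equality (and therefore hashCode) can change after every control operation, which is one reason to prefer immutable types as keys in hash-based collections.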

Quite obviously, this only holds as long as the system does not introduce side effects, which would modify the observable state internally without external control. Now we're coming to the next and last two definitions:
A system without malicious side effects and with only observations is a purely functional system.

However, few programming languages model this definition, since referential transparency cannot always be guaranteed. Simply think of the external resources a program needs for input and output: if these were used in queries, then even for the same parameters (connection settings, file names), the same results could never be guaranteed. Therefore, imperative languages allow you, besides writing pure functions, to introduce invisible side effects as well; these are usually the source of countless bugs and hard-to-read code.

And now for the last step: a good approach in these impure languages could be the command/query pattern for building systems. Queries are purely functional, referentially transparent observations of the system; commands change the state of the system.
After a command has been executed, the queries on that object are no longer guaranteed to yield the same results as before. Furthermore, all other objects which directly or indirectly depend on the queries on this object might have changed their state as well.

What's the point behind all these definitions? To show how different concepts from system building can be used to define and write software that is easy to understand and that avoids side effects as far as possible in imperative languages.

In my opinion, the problems of imperative approaches are not that important as long as the command/query pattern is used extensively and commands are reduced to those that are really needed, falling back to immutables in all other cases.

Under these circumstances, and given that the types have low coupling, so that commands on one object cannot influence the queries on another, an imperative system should avoid the common problems with uncontrollable side effects.

Note, however, that I am not yet fully aware of what systems would look like if this concept were applied consistently to a large project. Maybe I should ask Bertrand Meyer, the genius who proposed most of these OO concepts, especially CQS, long ago...

Sunday, August 8, 2010

Simpler Java #8: Control the scope of statefuls

Prelude: This is a follow-up to the command/query, functional methods and immutables articles. It takes all of them together to show a pattern that makes it easy to control the scope of your state.

This is exactly where I wanted to go: I summed up all the information below for one single task, postulating a simple, yet very rigid design principle for all kinds of OO:

All operations on stateful objects must happen in the same scope. No object may execute commands on objects it does not own. Likewise, no object may reveal its inner stateful objects to the outside world; if it wants to, it has to provide copies. No singleton or global object may exist which maintains ANY kind of state.

Let's start with a simple example of what this means: I have a kind of stateful object which I mostly use for random-access reading. Say it's a "DbClient". The client has countless "read" operations, some commands to actually write values (which will change its state), and the close operation. In this particular use case, it must be ensured that all write operations happen within the scope of the holder of the db object, while all "read" operations are freely allowed in any context.

Achieving this aim is simpler than it seems: all you need to do is separate read and write access. I'd recommend a design in which all queries are contained in their own interface, and the write operations (the commands) are likewise contained in their own interface, which extends the read interface. Then, define a class which implements the write interface, and make sure that only the read interface is passed around in all calls, while the write interface is only used internally by the owning class.
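A minimal sketch of that split, assuming a simple key/value store. DbReader, DbWriter, InMemoryDb and DbHolder are hypothetical names for illustration, not a real database API: the holder owns the writable object and only ever hands out the read-only interface.

```java
import java.util.HashMap;
import java.util.Map;

// Queries only: safe to hand out to any context.
interface DbReader {
    String read(String key);
    boolean contains(String key);
}

// Commands extend the queries; only the owner should ever see this type.
interface DbWriter extends DbReader {
    void write(String key, String value);
    void close();
}

// Hypothetical in-memory stand-in for a real database client.
final class InMemoryDb implements DbWriter {
    private final Map<String, String> data = new HashMap<>();
    private boolean closed;

    @Override public String read(String key) { return data.get(key); }
    @Override public boolean contains(String key) { return data.containsKey(key); }

    @Override public void write(String key, String value) {
        if (closed) throw new IllegalStateException("closed");
        data.put(key, value);
    }

    @Override public void close() { closed = true; }
}

// The owner: all commands run inside this scope, everyone else gets a reader.
final class DbHolder {
    private final DbWriter db = new InMemoryDb();

    DbReader reader() {
        return db; // exposed only through the read-only interface
    }

    void store(String key, String value) {
        db.write(key, value);
    }

    void shutdown() {
        db.close();
    }
}
```

A caller could still downcast the returned DbReader to DbWriter; if that matters, return a small read-only wrapper instead of the db object itself.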

Actually, it's quite obvious that this approach will have countless flaws and will never be a well-defined design principle. After all, there will always be situations where this design does not work. However, I guess that:
1. These situations are rare and
2. Possibly these are exactly those situations which WILL prove to be problematic at run time, after all.

I guess that most of the so-called "needed" global stateful objects with write access are in fact just an example of bad design. To be more precise: I still believe that, by rigid use of immutables, it's ALWAYS possible to lock down mutable objects and resources into a more limited scope than the public one.

Trust me: all I want is to avoid the problematic scenario of these "god" objects, which are just maintenance nightmares. At the lowest level, EVERY mutable object has the potential to screw up the advantages of encapsulation if it contains even a single one-line command.

Simpler Java #7: The Query / Command Pattern

Yet another pattern for simpler programming, but this time it's a rather controversial one, I suspect.

The language Eiffel devised the "query / command" pattern for its methods:
Queries are methods with result values (functions, in some other languages). A query can be used to calculate values, but may not introduce modifications to the object.
Commands are methods with side effects (procedures). A command changes the state of an object or resource, but it does NOT provide information about the process (hence the void return type).

This separation, described in detail in many sources, can be very productive for ANY programmer to follow. In Java terms, query / command means the following:
A query may only return new data. It may not execute commands itself, except on temporary objects which it created. In particular, it may not change any kind of global public state.
A command may only modify data, but never yield information. After each command, the results of all queries might have changed. Until the next command, all queries of the object are guaranteed to yield the same results for the same inputs (they are referentially transparent).
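Applied to Java, a classic illustration is a stack whose commands (push, pop) return void and whose queries (top, size, isEmpty) never modify it. CqsStack is a hypothetical class written for this post, not a JDK type; note that it deliberately differs from java.util.Deque.pop(), which removes and returns the top element in a single call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical command/query stack.
final class CqsStack<T> {
    private final List<T> items = new ArrayList<>();

    // Command: changes state, returns nothing.
    void push(T item) {
        items.add(item);
    }

    // Command: removes the top element, returns nothing.
    void pop() {
        if (items.isEmpty()) throw new NoSuchElementException("empty stack");
        items.remove(items.size() - 1);
    }

    // Query: a pure observation; repeated calls yield the same result
    // until the next command.
    T top() {
        if (items.isEmpty()) throw new NoSuchElementException("empty stack");
        return items.get(items.size() - 1);
    }

    int size() {
        return items.size();
    }

    boolean isEmpty() {
        return items.isEmpty();
    }
}
```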

Advantages / Disadvantages
  • Query/command is the clean bridge between the functional and the imperative world. All queries are guaranteed to be pure functions which always yield the same result UNTIL the state changes. This allows for optimizations.
  • State changes (void methods) are easy to spot in source code, since they have no result.
  • The code can be more verbose, since command chaining is not allowed, and streaming is more complicated. Examples of streams are Java's file streams and iterators: both have a kind of "next" method, which yields the next result and then moves forward. In the query/command pattern, this must be separated into two methods (for example, next and get).
  • The former can actually be seen as an advantage: since commands must be issued line by line and cannot be inspected for information, they cannot be confused with queries, AND this pushes towards immutable solutions, which rely solely on queries and can easily be chained.
  • The query/command pattern is a very good model of a real-world concept. One of the disadvantages of functional programming is that mutable state requires more complex concepts such as effect systems. The query/command pattern, on the other hand, describes imperative programming as a valid way to "go to the next state", BUT it does not allow commands to return values, as calculations must be done in (pure) functions.
Following the query/command pattern should be the goal of any good programmer. Really use the "void" type when you want a command with side effects. Command chaining, as in StringBuilder, is a very powerful tool, but I am now less convinced that it is also a valid one. Making the imperative world a bit harder but easier to understand, and the functional one more prominent, seems to be an approach well worth trying.
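The streaming point from the list above can be sketched like this: java.util.Iterator.next() both advances and returns a value, whereas a command/query cursor separates the two. ListCursor with advance(), isAfterLast() and current() is a hypothetical design for illustration, not a JDK interface.

```java
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical cursor following the query/command pattern.
// For brevity it keeps a reference to the given list instead of a copy.
final class ListCursor<T> {
    private final List<T> items;
    private int position; // index of the current element

    ListCursor(List<T> items) {
        this.items = items;
    }

    // Query: is there no current element any more?
    boolean isAfterLast() {
        return position >= items.size();
    }

    // Query: the current element; calling it twice gives the same value.
    T current() {
        if (isAfterLast()) throw new NoSuchElementException();
        return items.get(position);
    }

    // Command: move to the next element, return nothing.
    void advance() {
        if (isAfterLast()) throw new NoSuchElementException();
        position++;
    }

    // Usage: one command per line, queries in between.
    static int sumAll(List<Integer> numbers) {
        ListCursor<Integer> cursor = new ListCursor<>(numbers);
        int sum = 0;
        while (!cursor.isAfterLast()) {
            sum += cursor.current();
            cursor.advance();
        }
        return sum;
    }
}
```

Compared with an Iterator loop, this needs one more line per step, which is exactly the verbosity trade-off described above.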

Simpler Java #6: On Methods, Functional Programming and Mutation Limitation

Note: See the prelude below for some of my motivations behind this.

#1 Use immutable objects
See below for details on them.

#2 Limit the scope of mutable objects
You _must_ use mutable objects in Java for a lot of use cases where an immutable concept would be simpler and better, but the JDK approach is the default "consensus" in the dev community. The most popular example of this is the dreaded java.util Collections Framework: even though immutable collections are way easier to work with, since they can be handed freely to other classes and methods, the JDK chose the mutable route instead, WITHOUT any immutable interfaces at all... As I said right from the beginning, I do not want to tell you how to write the best code FOR YOU, but for the best of everyone. And in the world of Java, everyone expects collections to be mutable (like dates, urgh).

However, not everything is lost as long as you try to follow the following simple rule:

#3 Member functions must only provide access to immutable objects, to prevent second-level changes.
"Second-level" changes are changes to the members of an object; they can make it very hard to make safe assumptions about the object's internal state. Not providing access to mutable members rules out collections, arrays, dates and a lot of other objects as return values of getters. This is a hard rule; there are no bypasses for it.
In general, it prevents one of the common problems: you have an object that holds a list, and you want the outside world to get read access to the list, or maybe you even want it to be able to set a new list. In the latter case, you will trigger some magic action as a side effect. (You shouldn't, but let's say you WANT it for now.)

Do you see the problem? What happens if someone changes your list in place instead of setting a new one? First of all, your magic action won't run.
Secondly, all other assumptions you have made internally about the list may be broken as well. Usually, these will be some precomputed values for performance reasons and the like. And what are they now? Misleading crap. You can program defensively around it, but why would you? It's a lot of work, a lot of unnecessary lines of code (and wasted performance).

#4 How can I provide access to them, after all?
The two possible options are using immutable "wrappers" (which should not have mutators in their signature) or to provide access only to copies of the object.
The latter is the common idiom for arrays and dates, and it's completely OK, but it can cost you performance if done heavily. The former is the preferred idiom; an example of it is the immutable wrappers for collections: Collections.unmodifiableList(...).
In a perfect world, the Java collections would have been designed so that the List interface provided no write operations, and a subinterface called "WriteList" would have been made for them. However, it wasn't, and working with unmodifiable collections will always carry the potential for UnsupportedOperationExceptions on the client side. That problem, however, must be handled by the client code, not by the method that hands out the collection.
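Both idioms can be sketched in a few lines. The Meeting class is hypothetical; it exposes its mutable list only through Collections.unmodifiableList(...) and its java.util.Date only as a fresh copy, so no caller can cause the second-level changes described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

// Hypothetical class holding two mutable members.
final class Meeting {
    private final List<String> attendees = new ArrayList<>();
    private final Date start;

    // Takes a primitive, so no outside reference to a mutable Date ever gets in.
    Meeting(long startMillis) {
        this.start = new Date(startMillis);
    }

    // Command: the only way to change the list, so any "magic action" lives here.
    void addAttendee(String name) {
        attendees.add(name);
    }

    // Read-only view: its mutators throw UnsupportedOperationException.
    List<String> attendees() {
        return Collections.unmodifiableList(attendees);
    }

    // Copy: callers may modify the returned Date without affecting this object.
    Date start() {
        return new Date(start.getTime());
    }
}
```

The unmodifiable list is a live view, so callers still see later changes made through addAttendee; if a snapshot is needed, hand out a copy instead.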

That's it. This simple rule prevents dozens of dangerous combinations which are really hard to track down. But once again: this applies only to the outside world, i.e. to public AND protected scope. On the lower levels AND in method scope, these limitations do not exist. It can be quite common to have lists of mutable objects (which this rule would forbid in public scope) as short-lived objects in one of your methods.
That's OK by all means, since the problems I described usually only arise if you do not have full control over who changes what.

One final word: putting all the classes of your project into the same package in order to bypass any of these encapsulation rules is not allowed. Hm, OK. If you really can build a project in only one package, then I guess it's OK. ;-)

Saturday, August 7, 2010

Simpler Java #5: Use Immutables

Note: I want to show some simple programming concepts which are easy to follow and result in simpler java code. All the while, I will never break with the default java coding style, so even developers not aware of these "techniques" can still easily read your code.

An immutable object is an object whose accessible members are all final (so only getters are available) and are themselves immutable. Modification of the values is usually achieved by copy-on-write: you create a new instance with one or more modified values. You cannot change immutables in place, but you can have variables which hold different immutables during the flow of a method. This is exactly the behavior of the String class, so you already know the concept.
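A minimal immutable value in the sense defined above might look like this. Point, of, withX and withY are hypothetical names; each "modification" returns a new instance, much like String.replace returns a new string instead of changing the old one.

```java
// Hypothetical immutable value: final class, final fields, no setters.
final class Point {
    private final int x;
    private final int y;

    // Hidden constructor; instances come from the factory method.
    private Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    static Point of(int x, int y) {
        return new Point(x, y);
    }

    int x() { return x; }
    int y() { return y; }

    // Copy on write: return a new instance with one value changed.
    Point withX(int newX) { return of(newX, y); }
    Point withY(int newY) { return of(x, newY); }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override public int hashCode() { return 31 * x + y; }

    @Override public String toString() { return "(" + x + ", " + y + ")"; }
}
```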

#1 Immutability counts only for the public interface
It's still allowed to have mutable fields and mutable objects within an immutable. However, they may not be accessible from the outside world in any way that exposes this fact. Even further, the outside world may never be allowed to supply these values, not even on construction! (The caller could change them later and break the immutability.) So having a mutable list with mutable values for some reason is OK, as long as neither can be accessed from the outside in a way that would permit a mutation.
In general: immutability must be guaranteed for public and protected scope only. It's completely OK to make internal modifications at package-private and private level, as long as they are reasonable, of course, and mostly only for performance tricks. Note, however, that such modifications must never change the return values of the object in any way NOR break its thread safety, so this is not a free-for-all!
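To illustrate both points, here is a hypothetical Playlist that keeps a mutable ArrayList internally but never lets a mutable reference in or out: the static factory copies the caller's list, the getter returns a read-only view, and a lazily computed hash code is the kind of private, invisible modification mentioned above. The names are for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable type with mutable internals that never leak.
final class Playlist {
    private final List<String> songs; // internally an ArrayList, never exposed as such
    private int hash;                 // lazily cached; invisible to callers

    private Playlist(List<String> songs) {
        this.songs = songs;
    }

    // The factory copies the input, so the caller cannot mutate us later.
    static Playlist of(List<String> songs) {
        return new Playlist(Collections.unmodifiableList(new ArrayList<>(songs)));
    }

    List<String> songs() {
        return songs; // already a read-only view
    }

    // Copy on write.
    Playlist with(String song) {
        List<String> copy = new ArrayList<>(songs);
        copy.add(song);
        return new Playlist(Collections.unmodifiableList(copy));
    }

    @Override public boolean equals(Object o) {
        return o instanceof Playlist && songs.equals(((Playlist) o).songs);
    }

    // Private modification for performance: it never changes observable results
    // and, like String.hashCode, racing threads just compute the same value.
    @Override public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = songs.hashCode();
            hash = h;
        }
        return h;
    }
}
```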

#2. Immutability allows optimization
See Wikipedia and other sources for details on this. Copy-on-write might seem like a limitation, but especially for small objects it's nearly always worth it, especially since you can reuse objects in multiple contexts, which often saves you from creating an equal instance over and over.

#3. Immutables should not be designed for inheritance.
Simple one. Even though inheritance is possible, you should not allow it unless it is really needed: make the class final or give it only private constructors. The advantages of inheritance are seldom needed, and unless you need them you should not risk someone screwing everything up by introducing a mutable subclass of your immutables. Note that making immutables final is commonplace even in the JDK: String and the primitive wrappers are all final, after all.

#4 Immutable objects should have a hidden constructor and rely on a factory method or a builder instead.
See the Java builder pattern (Effective Java) for details. In general, factory methods or the builder pattern are always preferable, since you sometimes want to decide whether you really need to create a new instance. Immutables can be cached, so with a factory you can introduce a caching technique later if you need it.
One of the bigger flaws in the JDK primitive wrappers is that they all have public constructors, which everyone uses, even though there is a factory method (valueOf) which can skip the creation for small int values and the like.
Learn from this: create the factory method right off the bat, and give it a simple and intuitive name. "of" should always be preferred for its conciseness.
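The following sketch shows what such a factory buys you. Quantity is a hypothetical final class with a private constructor; its of factory returns shared, pre-built instances for the values 0 to 255, similar in spirit to Integer.valueOf, whose Javadoc promises caching for at least -128 to 127. The cache range here is an arbitrary assumption.

```java
// Hypothetical immutable value with a caching factory.
final class Quantity {
    private static final int CACHE_SIZE = 256; // assumed range: 0..255
    private static final Quantity[] CACHE = new Quantity[CACHE_SIZE];

    static {
        for (int i = 0; i < CACHE_SIZE; i++) {
            CACHE[i] = new Quantity(i);
        }
    }

    private final int value;

    // Hidden constructor: only the factory decides whether to create.
    private Quantity(int value) {
        this.value = value;
    }

    static Quantity of(int value) {
        if (value < 0) throw new IllegalArgumentException("negative: " + value);
        return value < CACHE_SIZE ? CACHE[value] : new Quantity(value);
    }

    int value() { return value; }

    Quantity plus(Quantity other) { return of(value + other.value); }

    @Override public boolean equals(Object o) {
        return o instanceof Quantity && ((Quantity) o).value == value;
    }

    @Override public int hashCode() { return value; }
}
```

Because the class is final and the constructor private, callers can neither bypass the cache nor add a mutable subclass; equals still works for uncached values.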

Prelude for Simpler Java #5: Functional coding style in java

This time, I will lay out my concepts for nice and maintainable programming in Java, using a more functional style but retaining the imperative approach, so that it still looks like Java. The driving force behind this is, first of all: I want to write code which is fast, maintainable, testable and easy to understand when you revisit it in 5,000 years.

#On Strongly imperative programming and its problems
That's what all the following parts will be about: it's not that imperative programming is bad (at least not in Java), as some people might suggest. However, this approach does have a lot of problems, which reach far deeper than just the obvious concurrency issues. The real problem tends to be that it is very hard to refactor and redesign strongly imperative code with a lot of side effects into smaller portions. Of course it _is_ possible, but you will not always end up with something that is really "reusable".
Instead, the refactoring will usually result in a lot of imperative methods which are not useful by themselves and have to be executed in a special order for a special use case. That's still good, but it's just not as rewarding as developing something that is REALLY reusable.

#Advantages of imperative approaches
Nearly all languages today are imperative, and I guess nearly all of us (including me) started out in that style. Its main selling point, I think, is that it feels more natural to beginners, who simply need the flexibility of redefining their variables all the time.
Also, imperative programming is simply the logical consequence of our work: we want to introduce side effects, after all. Otherwise, why are we coding? Note: this answer is not really correct. See Haskell, in which even IO and the like are implemented in a very functional way. But I guess it's "correct enough" for all of us for now.
So, implementing workflows and the like in a very imperative way should be ok for now. This code will not be reusable most of the time anyway.

#Advantages of functional approaches
Functional programming has no side effects: every function has input values (which are not mutable) that result in output values, which are then used as the input for the next function. So output values are used for exactly what they are meant for, and inputs are never modified.
This approach relies heavily on closures (function variables) and really shines for 90% of the code we work on, especially collections and data structures. It should be obvious that I want to advocate this style of programming even in Java, where it is not a first-class citizen; using it will still be very rewarding.
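Without closures, this style can still be approximated in Java with small static functions that never modify their arguments and always return new values. The helpers below (PureLists.evens, squares and sum) are hypothetical examples; the point is that the output of one function becomes the input of the next while the original list stays untouched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical pure helpers: inputs are only read, outputs are new objects.
final class PureLists {
    private PureLists() {}

    static List<Integer> evens(List<Integer> in) {
        List<Integer> out = new ArrayList<>();
        for (int n : in) {
            if (n % 2 == 0) out.add(n);
        }
        return Collections.unmodifiableList(out);
    }

    static List<Integer> squares(List<Integer> in) {
        List<Integer> out = new ArrayList<>();
        for (int n : in) {
            out.add(n * n);
        }
        return Collections.unmodifiableList(out);
    }

    static int sum(List<Integer> in) {
        int total = 0;
        for (int n : in) {
            total += n;
        }
        return total;
    }
}
```

Usage: PureLists.sum(PureLists.squares(PureLists.evens(numbers))) yields 20 for the numbers 1 to 5 (2 squared plus 4 squared), and numbers itself is left unchanged.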

#On closures and other advanced functional concepts
Quick answer: I do not advocate using them in Java. It's a pity and a very limiting factor in my design, but I won't use inner classes heavily for all collection operations. They are hard to read when used en masse, and they result in Java code which is not easy to understand for the run-of-the-mill Java developer. I have to admit that we will have to wait for Java 7 for this (or use JEDI).

Tuesday, August 3, 2010

Simpler Java #4: Constructor Rules

After writing tons of Java code and experimenting a lot with the language and the different kinds of constructors, I will try to give some good rules of thumb for defining constructors. Below, I discuss each of them along with some reasons why you should obey them.
  1. Use Constructors only to initialize fields, not to evaluate logic
  2. A constructor should initialize all or none of its fields
  3. Use only the 3 standard constructors: Default, Canonical, Copy
  4. Avoid public constructors for generic classes
  5. Don't throw recoverable (checked) exceptions
  6. Use factory methods in all other cases!

#1 Use Constructors for field initialization
This rule is actually the most important one: do not do anything other than setting the fields. The only other permissible operation is validating the input arguments against null or invalid values.
I would not even allow it to wrap input collections in immutable wrappers or anything like that.
That's either the responsibility of the caller or of the factory method.

Why? Because it keeps constructors easy to understand, scale AND test.

#2 A constructor should set all or none of its fields
If you design an immutable object, then set all values. For mutable ones, use the default constructor. In both cases, it will be easier for your users if they do not have to check which fields have been set and which have not.
This does not take internal "cache" fields into account, which the API contract does not mention.

#3 Use only the standard constructors
There are three common constructors which should help you for most usecases. In general, try to avoid constructor overloading to provide default values for some fields, as it is often done in the JDK. Thats better suited for factory methods.
#3.1 Default constructor.
Contrary to what the Java API tutorials suggest, default constructors are rarely useful. In my opinion, the only possible use cases are mutable objects (which you should try to avoid) and singletons/bootstrap classes (of which you do not need more than one per isolated bundle, and which are completely free of all logic).
A singleton's default constructor should be private and is mostly concerned with bootstrapping. This is the only use case I know of where constructors do more than just set their fields; but since it's private, that does not really matter.
#3.2. Canonical Constructor
This is usually the best choice and the obvious one for immutable objects.
#3.3 Copy Constructor
Since clone has too many problems, implementing copy constructors should usually be a common task for mutable objects. Most of the time, the copy constructor will deconstruct the values and hand them to the canonical constructor.
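The three standard constructors can be sketched together on a hypothetical mutable Address class: a default constructor that sets none of the fields (justified here only because the class is mutable), a canonical constructor that sets all of them and does nothing else, and a copy constructor that deconstructs another instance and delegates to the canonical one.

```java
// Hypothetical mutable class showing the three standard constructors.
final class Address {
    private String street;
    private String city;

    // Default constructor: sets none of the fields.
    Address() {
    }

    // Canonical constructor: sets all fields and nothing else.
    Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    // Copy constructor: deconstructs the other instance and delegates.
    Address(Address other) {
        this(other.street, other.city);
    }

    String street() { return street; }
    String city() { return city; }

    void setStreet(String street) { this.street = street; }
    void setCity(String city) { this.city = city; }
}
```

Because String is immutable, copying the references is enough here; mutable members would need copies of their own.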

#4 Constructors are crap with generics.
This is basically a failure of the Java specification: for static methods, it is sufficient to mention the generic types in the declaration; for constructors, however, they need to be spelled out on BOTH sides.
Example:
List<String> l = new ArrayList<String>();
BUT:
List<String> l = Collections.emptyList(); // return type inference.
Therefore, you should by all means use factory methods for generic types whenever possible.

Why is it like that, you might ask? Because the idiots who defined generics thought that the first (constructor) notation was the one everyone should use, but they quickly found that it made working with generic helper methods needlessly complicated. So they invented a (limited) type inference for that task. Google it for details.
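A minimal generic factory makes the difference visible. Pair and Pair.of are hypothetical; with the factory, the compiler infers the type arguments from the arguments (and, for Collections.emptyList(), from the assignment target), whereas the constructor call needs them spelled out. Since Java 7, the diamond operator, as in new ArrayList<>(), also removes most of this repetition for constructors.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical immutable pair with a generic factory method.
final class Pair<A, B> {
    private final A first;
    private final B second;

    private Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    // Type arguments are inferred from the arguments at the call site.
    static <A, B> Pair<A, B> of(A first, B second) {
        return new Pair<A, B>(first, second);
    }

    A first() { return first; }
    B second() { return second; }

    // Usage: compare the notations (the constructor is private, so only
    // code inside this class can use the first form).
    static String demo() {
        Pair<String, Integer> viaConstructor = new Pair<String, Integer>("answer", 42); // types repeated
        Pair<String, Integer> viaFactory = Pair.of("answer", 42);                        // types inferred
        List<String> none = Collections.emptyList();                                    // inferred from target
        return viaFactory.first() + "=" + viaConstructor.second() + ", " + none.size();
    }
}
```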

#5 Don't throw recoverable exceptions.
Recoverable exceptions occur when something in the "big picture" fails, especially something concerned with the outside world, like I/O. In these cases, you should not even use factory methods: use factory objects, which know how to handle that world AND are testable.

A classic counterexample is the java.io stream classes.
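One way to follow this rule is sketched below: the value class Config stays free of I/O, and a separate, hypothetical ConfigReader factory object deals with the outside world and its checked IOException. The simple "key=value" line format is an assumption for illustration. Because the reader accepts any java.io.Reader, a test can feed it a StringReader instead of a real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable value: its construction can never fail with I/O errors.
final class Config {
    private final Map<String, String> values;

    private Config(Map<String, String> values) {
        this.values = values;
    }

    static Config of(Map<String, String> values) {
        return new Config(Collections.unmodifiableMap(new HashMap<>(values)));
    }

    String get(String key) {
        return values.get(key);
    }
}

// Hypothetical factory object: owns the contact with the outside world.
final class ConfigReader {
    // Reads "key=value" lines and skips lines without '='.
    // The checked exception is part of this class's contract, not Config's.
    // The caller owns (and closes) the given Reader.
    Config read(Reader source) throws IOException {
        Map<String, String> values = new HashMap<>();
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            int eq = line.indexOf('=');
            if (eq > 0) {
                values.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return Config.of(values);
    }
}
```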

#6 Use Factories and Factory methods
All complicated cases should be handled by factory methods instead! Why?
Because:
- They can be named, so you can make your intention clear.
- They offer return type inference, making them better suited for generics
- They can choose which implementation to use, so callers are not bound to a specific implementation. This can be a huge performance advantage, especially for immutables.
- Refactoring the class into an interface later is a bit easier. If Java were able to define static methods on interfaces, it would even be possible without ANY problems at all...
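The "choose the implementation" point can be sketched with a hypothetical IntSeq interface and an IntSeqs factory class (both names are assumptions). Callers only ever see the interface, while the factory returns one shared instance for the empty case and an array-backed object otherwise; the implementation could be swapped later without touching any caller.

```java
import java.util.Arrays;

// Hypothetical read-only sequence type; callers only ever see this interface.
interface IntSeq {
    int size();
    int get(int index);
}

// Hypothetical factory class that picks the implementation.
final class IntSeqs {
    private IntSeqs() {}

    // Implementation 1: the empty sequence, shared by all callers.
    private static final class EmptySeq implements IntSeq {
        @Override public int size() { return 0; }
        @Override public int get(int index) {
            throw new IndexOutOfBoundsException("empty sequence");
        }
    }

    // Implementation 2: backed by a private copy of the values.
    private static final class ArraySeq implements IntSeq {
        private final int[] values;
        ArraySeq(int[] values) { this.values = values; }
        @Override public int size() { return values.length; }
        @Override public int get(int index) { return values[index]; }
    }

    private static final IntSeq EMPTY = new EmptySeq();

    static IntSeq of(int... values) {
        if (values.length == 0) {
            return EMPTY; // no new sequence object for the empty case
        }
        return new ArraySeq(Arrays.copyOf(values, values.length)); // defensive copy
    }
}
```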

Sometimes, you should even use a factory object instead:
- If you have a dependency on anything complicated (like the outside world), a factory object is mandatory
- For checked exceptions in recoverable situations. Static methods should not contain any kind of logical branching.
What's up with all this fuss? Answer: scalability AND simpler testing. When you start testing, you might want to "simply" construct objects through a (package-)private constructor without those crazy rules, and you might want to test the logic separately. Both are much easier if checks and the like are not part of the constructor but of static helpers.
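The testing argument can be illustrated with a hypothetical Percentage class: its package-private constructor only assigns the field, the range check lives in a separate static helper that can be tested on its own, and the of factory combines the two for regular callers.

```java
// Hypothetical value class: constructor assigns, helper checks, factory combines.
final class Percentage {
    private final int value;

    // Package-private: tests may construct any state directly, without checks.
    Percentage(int value) {
        this.value = value;
    }

    // Validation as a separate static helper, testable in isolation.
    static boolean isValid(int value) {
        return value >= 0 && value <= 100;
    }

    // Factory for regular callers: validate, then construct.
    static Percentage of(int value) {
        if (!isValid(value)) {
            throw new IllegalArgumentException("not a percentage: " + value);
        }
        return new Percentage(value);
    }

    int value() { return value; }
}
```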

Also, inheritance is much easier to implement when you have at max 3 well-known constructors at hand.
And, come to think of it, factory methods make your intention much clearer than a plain, meaningless old constructor, at least from time to time.

Factory Method naming patterns
Here's a simple trick if you do not know a suitable name: use "of" in all cases where you would formerly have created a trivial constructor.

MyClass.of(value1, value2) will then yield the new instance. It's short, sexy and comfortable with generics. :-)