Johnnys Draftblog: 2010

Tuesday, 10 August 2010

From Control Theory to CQS

The term observable state stems from control theory, which deals with state systems. The observable state is the part of a system's state that can be observed (queried) from the outside world, while the controllable state is the part that may be changed.

In terms of encapsulation, observability is what makes it possible to find anything out about the system. In OO, this means that all observable properties of a type must be public (or protected) queries on the object.
Controllability, on the other hand, defines the operations on a system. Ideally, a system (or object) cannot change. In that case, it offers only observable properties and is stateless (immutable).

However, any kind of program needs to modify some state. Otherwise, it's just a "black box that gets hotter" (google this). For classical OO, I'd like to add a further definition exclusively for programming: Two systems (objects) can be considered equal at a given state if all of their observable properties are also equal (recursively by this definition).

Taking these three concepts (controllability, observability and the equality definition), it's very easy to define referential transparency as well: All observations made with the same parameters on the same observer method at the same system state must yield an equal result. After each control operation, all observer methods might have changed their result (but do not need to), but a guarantee is given that their results will once again be constant until the next control operation arises.

Quite obviously, this is only true as long as the system does not introduce side effects, which would modify the observable state internally without external control. Now we come to the next and last two definitions:
A system without malicious side effects and with observations only is a functionally pure system.

However, there are few programming languages which model this definition, since referential transparency cannot always be guaranteed: Simply think about the external resources which a program needs for input and output, for example. If these were used in queries, then even for the same parameters (connection settings, file names), the same results could never be guaranteed. Therefore, imperative languages allow you, besides writing pure functions, to introduce invisible side effects as well. Usually these are the source of countless bugs and hard-to-read code.

And now on to the last step: A good approach in these impure languages could be the command and query pattern for system building: Queries are functionally pure and referentially transparent observations of the system; commands change the state of the system.
After a command has been executed, the queries on that object are no longer guaranteed to yield the same results as before. Furthermore, all other objects which directly or indirectly depend on the queries on this object might have changed their state as well.

What's the point behind all these definitions? To show how different concepts in system building can be used to define and program software that is easy to understand and tries to avoid side effects in imperative languages.

In my opinion, the problems of imperative approaches are not that important as long as the command / query pattern is used extensively and commands are reduced to those which are really needed, falling back to immutables in all other cases.

Under these circumstances, and given that the types are loosely coupled, so that commands on one object cannot influence the queries on another, an imperative system should avoid all of the common problems with uncontrollable side effects.

Note, however, that I am not yet fully aware of what systems would look like if the concept were applied consistently on a large project. Maybe I should ask Bertrand Meyer, the genius who proposed most of these OO concepts, esp. CQS, long ago...

Sunday, 8 August 2010

Simpler Java #8: Control the scope of statefuls

Prelude: This is a follow-up to the command/query, the functional methods and the immutable articles. It takes all of them to show a pattern that makes it easy to control the scope of your state.

This is exactly where I wanted to go: I summed up all the information below for one single task: Postulating a simple, yet very rigid design principle for all kinds of OO:

All stateful object operations (the state changes) must happen in the same scope. No object is allowed to execute commands on objects it does not own. Likewise, no object may reveal its inner stateful objects to the outer world. If it wants to, it has to provide copies. No singleton or global object is allowed to exist which maintains ANY kind of state.

Let's start with a simple example of what this means: I have a kind of stateful object which I mostly use for random-access reading. Say it's a "DbClient". The client has countless "read" operations, some commands to actually write values (which will change its state), and the close operation. In that particular use case, it must be ensured that all write operations run within the scope of the holder of the db object, while all "read" operations are freely allowed in any context.

Achieving this aim is simpler than it seems: All you need to do is separate read access from write access. I would recommend a design in which all queries are contained in their own interface, and likewise the write operations (the commands) are contained in their own interface, which extends the read interface. Then, define a class which implements the write interface and ensure that in all calls the read interface is handed out, while the write interface is only used internally by the owning class.
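The read/write split described above could look roughly like the following sketch. All names here (DbReader, DbWriter, InMemoryDbClient, DbOwner) are hypothetical and only serve the illustration; a real client would of course talk to a database instead of a map.

```java
import java.util.HashMap;
import java.util.Map;

// Query side: safe to hand out to any scope.
interface DbReader {
    String read(String key);
    boolean contains(String key);
}

// Command side: extends the read interface, all additional methods return void.
interface DbWriter extends DbReader {
    void write(String key, String value);
    void close();
}

// Hypothetical in-memory stand-in for the "DbClient" from the text.
final class InMemoryDbClient implements DbWriter {
    private final Map<String, String> data = new HashMap<String, String>();

    public String read(String key) { return data.get(key); }
    public boolean contains(String key) { return data.containsKey(key); }
    public void write(String key, String value) { data.put(key, value); }
    public void close() { data.clear(); }
}

// The owner keeps the writer private and only ever exposes the reader type.
final class DbOwner {
    private final DbWriter db = new InMemoryDbClient();

    // Command on the owner: the only way for the outside to trigger a write.
    void store(String key, String value) { db.write(key, value); }

    // The static type hides the commands from everyone else.
    DbReader reader() { return db; }
}
```

Note that this only protects by static type: a determined caller could still downcast the DbReader to DbWriter. If that matters, the owner would have to hand out a separate read-only wrapper object instead.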

Actually, it's quite obvious that this approach will have countless flaws and will never be a well-defined design principle. After all, there will always be situations where this design does not work. However, I guess that:
1. These situations are rare and
2. Possibly these are exactly those situations which WILL prove to be problematic at run time, after all.

I guess that most of the so-called "needed" global stateful objects with write access are in fact just examples of bad design. To be more precise: I still believe that it's possible, by rigid use of immutables, to ALWAYS lock down mutable objects and resources into a more limited scope than the public one.

Trust me: All I want to avoid is the problematic scenario of these "god" objects, which are just maintenance nightmares. And at the lowest level, EVERY mutable object has the potential to screw up the advantages of encapsulation if it contains even a single-line command.

Simpler Java #7: The Query / Command Pattern

Yet another pattern for simpler programming, but this time it's a rather controversial one, I suspect.

The language Eiffel devised the "query / command" pattern for its methods:
Queries are methods with result values (functions, in some other languages). A query can be used to calculate values, but may not modify the object.
Commands are methods with side effects (procedures). A command will change the state of an object or resource, but it does NOT provide information about the process (hence the void return type).

This separation, described in more detail in a lot of sources, can be very productive for ANY programmer to follow. In terms of Java, query / command would mean the following:
A query may only return data. It is not allowed to use commands itself, except on temporary objects which it created. In particular, it is not allowed to change any kind of global public state.
A command may only modify data, but never yield information. After each command, the results of all queries might have changed. Until the next command, all queries of the object are guaranteed to yield the same results for the same inputs (they are referentially transparent).
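As a minimal sketch of these two rules in Java, consider the following counter. The class and method names are made up for this example; the point is only that the command returns void and the query changes nothing.

```java
// Hypothetical example class; the names are not taken from Eiffel or the JDK.
final class Counter {
    private int count;

    // Command: changes state, returns nothing.
    void increment() { count++; }

    // Query: reports state, changes nothing.
    int value() { return count; }
}

final class CounterDemo {
    public static void main(String[] args) {
        Counter c = new Counter();
        int before = c.value();
        int again = c.value();   // same state, same answer
        c.increment();           // command: the state moves on
        System.out.println(before + " " + again + " " + c.value()); // prints "0 0 1"
    }
}
```

A method like `int incrementAndGet()` would be the typical violation: it is a command and a query at the same time, so calling it twice "just to look" changes the result.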

Advantages / Disadvantages

Query / command is the clean bridge between the functional and the imperative world. All queries are guaranteed to be pure functions which will always yield the same result UNTIL the state changes. This allows for optimizations.
State changes (void methods) are easy to spot in source code, since they have no result.
The code can be more verbose, since command chaining is not allowed, and streaming is more complicated. An example of streaming are Java file streams and iterators. Both have a kind of "next" method, which yields the next result and then moves forward. In the query / command pattern, this must be separated into two methods (for example, next and get).
The former can actually be seen as an advantage: Since commands are required to be given line by line and cannot be inspected for information, they cannot be confused with queries, AND this pushes immutable solutions, which rely solely on queries and can easily be chained.
The query / command pattern is a very good model of a real-world concept. One of the disadvantages of functional programming is that mutable state requires more complex concepts such as effect systems. The query / command pattern, on the other hand, accepts imperative programming as a valid way to "go to the next state", BUT it does not allow commands to return values, as calculation must be done in (pure) functions.
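The iterator split mentioned in the list above (next and get) might be sketched like this. Cursor is a hypothetical class, not part of the JDK, and it reads from the list it was given without copying it.

```java
import java.util.List;

// Hypothetical CQS-style cursor: Iterator.next() split into a command and a query.
final class Cursor<T> {
    private final List<T> items;
    private int index = -1; // positioned before the first element

    Cursor(List<T> items) { this.items = items; }

    // Query: is there another element after the current position?
    boolean hasNext() { return index + 1 < items.size(); }

    // Command: move to the next element, returns nothing.
    void next() {
        if (!hasNext()) throw new IllegalStateException("no more elements");
        index++;
    }

    // Query: the current element; may be called as often as you like.
    T get() {
        if (index < 0) throw new IllegalStateException("call next() first");
        return items.get(index);
    }
}
```

The loop becomes one line longer than with java.util.Iterator (`while (c.hasNext()) { c.next(); use(c.get()); }`), which is exactly the verbosity described above. In return, get() can be called repeatedly without skipping elements.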

Following the query/command pattern should be the goal of any good programmer. Try to really use the "void" type if you want a command with side effects. Command chaining, like in the StringBuilder, is a very powerful tool, but I am now less convinced whether it is also a valid one. Making the imperative world a bit harder, but easier to understand, and the functional one more prominent seems to be an approach which is well worth trying.

Simpler Java #6: On Methods, Functional Programming and Mutation Limitation

Note: See the prelude below for some of my motivations behind this.

#1 Use immutable objects
See below for details on them.

#2 Limit the scope of mutable objects
You _must_ use mutable objects in Java for a lot of use cases where an immutable concept would be simpler and better, because the JDK approach is the default "consensus" of the dev community. The most popular example of this is the dreaded java.util collections framework: Even though immutable collections are way easier to work with, since they can be handed to other classes and methods without limits, the JDK chose to go the mutable route instead, WITHOUT any immutable interfaces at all... As I've said right from the beginning, I do not want to tell you how to write the best code FOR YOU but for the best of everyone. And in the world of Java, everyone expects collections to be mutable, like "dates", urgh.

However, not everything is lost as long as you try to follow the following simple rule:

#3 Member functions must provide access to immutable objects only, to prevent second-level changes.
"Second-level" changes are changes to the members of an object, which can make it very hard to make safe assumptions about the internal state of an object. Not providing access to mutable members rules out collections, arrays, dates and a lot of other objects as possible candidates for getters. This is a hard rule; there are no bypasses for it.
In general, it prevents one of the common problems: You have an object that holds a list, and you want the outer world to get read access to the list, or you maybe even want them to set a new list. In the latter case, you will trigger some magic action as a side effect. (You shouldn't, but just say you WANT it for now.)

Do you see the problem? What happens if someone changes your list in place, instead of setting a new one? First of all, your magic action won't run.
Secondly, all other assumptions you have made internally about the list may be screwed as well. Usually, these will be some precomputed values for performance reasons and the like. And what are they now? Misleading crap. You can program defensively around it, but why would you? It's a lot of work and a lot of unnecessary lines of code (and wasted performance).

#4 How can I then provide access to them, after all?
The two possible options are to use immutable "wrappers" (which should not have mutators in their signature) or to provide access only to copies of the object.
The latter is the common idiom for arrays and dates, and it's completely ok, but it can cost you performance if done heavily. The former is the preferred idiom, and an example of this are the unmodifiable wrappers for collections: Collections.unmodifiableList(...).
In a perfect world, the Java collections would have been designed in such a way that the List interface provided no write operations, and a subinterface called "WriteList" would have been made for this. However, there isn't one, and working with unmodifiable collections will always have the potential for UnsupportedOperationExceptions on the client side. That problem, however, must be handled by the client, not by the provider of the function.
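Both options can be shown side by side in a small sketch. Meeting is a hypothetical class made up for this example; Collections.unmodifiableList and the Date copy via getTime() are the standard JDK idioms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Date;
import java.util.List;

// Hypothetical class illustrating both options from the text.
final class Meeting {
    private final List<String> attendees = new ArrayList<String>();
    private final Date start;

    Meeting(Date start) {
        if (start == null) throw new IllegalArgumentException("start must not be null");
        this.start = new Date(start.getTime()); // copy on the way in
    }

    // Explicit command on the owner instead of exposing the list for writing.
    void addAttendee(String name) { attendees.add(name); }

    // Option 1: read-only wrapper (a view on the internal list, no copy is made).
    List<String> getAttendees() { return Collections.unmodifiableList(attendees); }

    // Option 2: hand out a copy of the mutable Date.
    Date getStart() { return new Date(start.getTime()); }
}
```

A client that tries `meeting.getAttendees().add("Bob")` gets an UnsupportedOperationException, and a client that calls setTime(...) on the returned Date only changes its own copy.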

That's it. This simple rule prevents dozens of dangerous combinations which are really hard to track down. But once again: This applies only to the outer world, i.e. to public AND protected scope. On the lower levels AND in method scope, these limitations do not exist. It can be quite common to have lists of mutable objects (which this rule would forbid in public scope) as short-lived objects in one of your methods.
That's ok by all means, since the problems I described will usually only arise if you do not have full control over who changes what.

One final word: It is not allowed to have all of your classes in the same package for your project to bypass any of the encapsulation rules. Hm ok. If you really can create a project in only one package, then I guess its ok. ;-)

Saturday, 7 August 2010

Simpler Java #5: Use Immutables

Note: I want to show some simple programming concepts which are easy to follow and result in simpler java code. All the while, I will never break with the default java coding style, so even developers not aware of these "techniques" can still easily read your code.

An immutable object is defined as an object whose accessible members are all final (so only getters are available) and are immutable themselves. Modification of the values is usually achieved by copy on write: Create a new instance with one or more modified values. You cannot change them in place, but you can have variables which hold different immutables during the flow of a method. This is exactly the behavior of the String class, so you will know the concept already.
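A minimal sketch of such an immutable with copy on write could look like this. Point and its "with" methods are hypothetical names chosen for the illustration.

```java
// Hypothetical value class; the name and fields are illustrative only.
final class Point {
    private final int x;
    private final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() { return x; }
    int getY() { return y; }

    // "Copy on write": returns a new instance, the receiver stays untouched.
    Point withX(int newX) { return new Point(newX, y); }
    Point withY(int newY) { return new Point(x, newY); }
}
```

Usage mirrors String: `Point p = new Point(1, 2); p = p.withX(5);` rebinds the variable to a new object, while anyone still holding the old instance keeps seeing (1, 2).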

#1 Immutability counts only for the public interface
It's still allowed to have mutable fields and mutable objects within an immutable. However, they may not be accessible from the outside world in any way which exposes this fact. Even further, the outer world may never be allowed to hand in these values, not even on construction! (They could change them later and break the immutability.) So, having a mutable list with mutable values for some reason is ok, as long as neither can be accessed from the outside in a way which would permit a mutation.
In general: Immutability must be guaranteed for public and protected scope only. It's completely ok to perform internal modifications on package-private and private level, as long as they are reasonable, of course, and mostly only ever for performance tricks. Note, however, that these modifications may never change the return values of the object in any way NOR break its thread-safety, so this is not a free pass!

#2. Immutability allows optimization
See wikipedia and other sources for details on this. Copy-On-Write might seem like a limitation, but especially for small objects its nearly always worth it, esp. since you can reuse objects in multiple contexts, which often saves you from creating an equal instance over and over.

#3. Immutability should not be designed for inheritance.
Simple one. Even though inheritance is possible, you should not allow it unless it is really needed. Make the class final or give it a private constructor. The advantages of inheritance are seldom needed, and unless you need them you should not risk screwing everything up if someone introduces a mutable subclass of your immutable. Note that making immutables final is commonplace even in the JDK: Both String and the primitive wrappers are final, after all.

#4 Immutable objects should have a hidden constructor and rely on a factory method or a builder instead.
See the Java builder pattern (Effective Java) for details. In general, factory methods or the builder pattern are always preferable, since you sometimes want to decide whether you really need to create a new instance. Immutables can be cached, so with a factory, you can introduce a caching technique later if you need it.
One of the bigger flaws in the JDK primitive wrappers is that they all have public constructors, which everyone uses, even though there is a factory method (valueOf) which can nearly always skip the creation for small int values and the like.
Learn from this: Create the factory method right off the bat, and give it a simple and intuitive name. "of" should always be preferred for its conciseness.
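To make the caching argument concrete, here is a small sketch of an immutable with a hidden constructor and an "of" factory. Percent, its value range and its cache are assumptions made up for this example; the idea is loosely modeled on what Integer.valueOf does for small values.

```java
// Hypothetical immutable wrapper with a hidden constructor and an "of" factory.
final class Percent {
    // All 101 legal values are created once and reused afterwards.
    private static final Percent[] CACHE = new Percent[101];
    static {
        for (int i = 0; i <= 100; i++) {
            CACHE[i] = new Percent(i);
        }
    }

    private final int value;

    private Percent(int value) { this.value = value; }

    // Factory: callers cannot tell (and need not care) whether an instance is reused.
    static Percent of(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        return CACHE[value];
    }

    int intValue() { return value; }
}
```

Because the constructor is private, the cache could be added, resized or removed later without touching any caller. That is the flexibility the public constructors of the JDK wrappers gave away.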

Prelude for Simpler Java #5: Functional coding style in java

This time, I will lay out my concepts for nice and maintainable programming in Java, using a more functional style, but retaining the imperative approach so that it still looks like Java. The driving force behind this is, first of all, this: I want to write code which is fast, maintainable, testable and easy to understand when you revisit it in 5,000 years.

#On Strongly imperative programming and its problems
That's what all the following parts will be about: It's not that imperative programming is bad (at least not in Java), as some people might suggest. However, this approach does have a lot of problems, which reach far deeper than just the obvious concurrency issues. The real problem tends to be that it is very hard to refactor and redesign strongly imperative code with a lot of side effects into smaller portions. Of course it _is_ possible, but you will not always end up with something that is really "reusable".
Instead, the refactoring will usually result in a lot of imperative methods which are not useful by themselves and have to be executed in a special order for a special use case. That's still good, but it's just not as rewarding as developing something that is REALLY reusable.

#Advantages of imperative approaches
Nearly all languages today are imperative, and I guess nearly all of us (including me) have started in that style. Its main selling point, I think, is that it feels more natural for beginners, who simply need the flexibility of redefining their variables all the time.
Also, imperative programming is simply the logical conclusion of our work: We want to introduce side effects, after all. Otherwise, why are we coding? Note: This answer is not really correct. See Haskell, in which even IO and the like are implemented in a very functional way. But I guess it's "correct enough" for all of us for now.
So, implementing workflows and the like in a very imperative way should be ok for now. This code will not be reusable most of the time anyway.

#Advantages of functional approaches
Functional programming has no side effects, as every function has input values (which are not mutable) which result in output values, which in turn are used as input for the next function. So output values are indeed what they are meant for, and inputs are never modified.
This approach strongly relies on closures (function variables) and really shines for 90% of the code we are working on, esp. collections and data structures. It should be obvious that I want to advocate this style of programming even in Java, where it is not a first-class citizen, but using it will still be very rewarding.

#On closures and other advanced functional concepts
Quick answer: I do not advocate using them in Java. It's a pity and a very limiting factor in my design, but I won't use inner classes heavily for all collection operations. They are hard to read when used en masse, and result in Java code which is not easy to understand for the run-of-the-mill Java developer. We will have to wait for Java 7 for this issue, I have to admit (or use JEDI).

Tuesday, 3 August 2010

Simple Java #4: Constructor Rules

After writing tons of Java code, and experimenting a lot with the language and the different types of constructors, I will try to give some good rules of thumb for defining constructors. Below, I will discuss them along with some information on why you should obey them.

Use Constructors only to initialize fields, not to evaluate logic
A Cons should initialize all or none of its fields
Use only the 3 standard constructors: Default, Canonical, Copy
Avoid public constructors for generic classes
Don't throw recoverable (checked) exceptions
Use factory methods in all other cases!

#1 Use Constructors for field initialization

This rule is actually the most important one: Do not do anything other than setting the fields. The only other permissible operation is validating input arguments against null or invalid values.

I would not even allow it to wrap input collections into immutable wrappers or something like that.

That's either the responsibility of the caller or the factory method.

Why? Because it keeps constructors easy to understand, scale AND test.

#2 A cons should set all or none of its fields

If you design an immutable object, then set all values. For mutable ones, use the default constructor. In both cases, it will be easier for your user if he does not have to check which fields have been set and which not.

This does not take internal "cache" fields into account, which the api contract does not tell about.

#3 Use only the standard constructors

There are three common constructors which should help you in most use cases. In general, try to avoid constructor overloading to provide default values for some fields, as it is often done in the JDK. That's better suited for factory methods.

#3.1 Default constructor.

Contrary to what the Java API tutorials indicate, default constructors are rarely useful. In my opinion, the only possible use cases are mutable objects (which you should try to avoid) and singletons/bootstrap classes (of which you do not need more than one per isolated bundle, completely free of all logic).

A singleton default constructor should be private and mostly concerned with bootstrapping. This is the only use case I know of where constructors will do more than just setting their fields. But since it's private, it does not really matter at all.

#3.2. Canonical Constructor

This is usually the best choice and the obvious one for immutable objects.

#3.3 Copy Constructor

Since clone has too many problems, implementing copy constructors should usually be a common task for mutable objects. Most of the time, the copy constructor will deconstruct the values and hand them to the canonical constructor.
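The three standard constructors could be sketched for a small mutable bean as follows. Size is a hypothetical class; note how the copy constructor only deconstructs the source and delegates to the canonical one, so the validation lives in exactly one place.

```java
// Hypothetical mutable bean showing the three "standard" constructors.
final class Size {
    private int width;
    private int height;

    // Default constructor: sets none of the fields (both stay 0).
    Size() { }

    // Canonical constructor: sets all fields, validation only.
    Size(int width, int height) {
        if (width < 0 || height < 0) {
            throw new IllegalArgumentException("negative size");
        }
        this.width = width;
        this.height = height;
    }

    // Copy constructor: deconstructs the source and delegates to the canonical one.
    Size(Size other) {
        this(other.width, other.height);
    }

    int getWidth() { return width; }
    int getHeight() { return height; }

    void setWidth(int width) { this.width = width; }
    void setHeight(int height) { this.height = height; }
}
```

After `Size copy = new Size(original);`, calling a setter on the copy leaves the original untouched, which is all that clone() was supposed to deliver in the first place.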

#4 Constructors are crap with generics.

This is basically a Java specification failure: For static methods, it is sufficient to mention the generic types in the declaration; for constructors, however, they need to be spelled out on BOTH sides.

Example:
List<String> l = new ArrayList<String>();

BUT:
List<String> l = Collections.emptyList(); // return type inference.

Therefore, you should by all means use factory methods for generic types whenever possible.

Why is it like that, you might ask? Because the idiots who defined generics thought that the first (constructor) notation is the one which everyone should use, but they quickly found that it makes working with generic helper methods needlessly complicated. So they invented a (limited) type inference for that task. Google it for details.
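A generic factory method that profits from this inference might look like the sketch below. Pair is a hypothetical class written for this example, not a JDK type.

```java
// Hypothetical generic pair; "of" lets the compiler infer A and B from the arguments.
final class Pair<A, B> {
    private final A first;
    private final B second;

    // Hidden constructor: the type arguments would have to be repeated at every call site.
    private Pair(A first, B second) {
        this.first = first;
        this.second = second;
    }

    // Generic factory method: the type parameters are declared once, here.
    static <A, B> Pair<A, B> of(A first, B second) {
        return new Pair<A, B>(first, second);
    }

    A getFirst() { return first; }
    B getSecond() { return second; }
}
```

The call site then reads `Pair<String, Integer> p = Pair.of("answer", 42);` instead of `new Pair<String, Integer>("answer", 42)`, so the types are written once instead of twice.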

#5 Don't throw recoverable exceptions.

Recoverable exceptions happen if something in the "big picture" fails, esp. something concerned with the outer world, like I/O. In these cases, you should not even use factory methods: Use factory objects, which know how to handle that world AND are testable.

A classical counterexample are the java-io stream classes.

#6 Use Factories and Factory methods

All complicated cases should be handled by factory methods instead! Why?

Because:

- They can be named, so you can make your intention clear.

- They offer return type inference, making them better suited for generics

- They can choose which implementation to use, so callers do not bind to a specific implementation. This can be a huge performance advantage, esp. for immutables.

- Refactoring the class into an interface later is a bit easier. If java were able to define static methods on interfaces, it would be even without ANY problems at all...

Sometimes, you should even use a factory instead:

- If you have a dependency on anything complicated (like the outer world), a factory is mandatory

- For checked exceptions with recoverable situations. Static methods should not contain any kind of logical branching.

What's all this fuss about? Answer: Scalability AND simpler testing. When you start to test, you might want to "simply" construct objects via a (package-)private constructor without those crazy rules, and you might want to test the logic separately. Both are much easier if checks and the like are not part of the constructor but of static helpers.

Also, inheritance is much easier to implement when you have at max 3 well-known constructors at hand.

And, when thinking about it, factory methods do make your intention much clearer than a plain old meaningless constructor, from time to time.

Factory Method naming patterns
Here's a simple trick if you do not know a suitable name: Use "of" in all cases where you would formerly have created a trivial constructor.

MyClass.of(Value1, Value2) will then yield the new instance. It's short, sexy and comfortable with generics. :-)

Monday, 12 July 2010

Features of a language I would design

If I had the choice and time to define a programming language (which I would never do), which features would I include? This is not just a theoretical question, but also a way to find out which language I would deem "best-designed".

References vs. Values
Java introduced the "objects are refs" concept which most modern languages seem to follow. Combined with the concept that all params are passed "by value", most newbies tend to be a bit confused when using this paradigm.
I would take another, yet very compatible approach:
ALL VARIABLES ARE REFS.
As long as all "value" types (like int) do not allow an in-place change on the first level (lower levels are not my problem), there is no difference between immutable ref types and values. None of them can be changed in place!
That's the idea upon which C# and Scala are built: You cannot change their values, you must assign new values.
Advantage of this scenario: The difference between refs and values does not exist from the design standpoint. Easier for the programmer!

Types
I would deem the following decisions indisputable concerning types:
static typing for maintainability and a good compiler
type inference so that types are only ever given in method signatures (where they are needed anyway for maintenance reasons)
implicit conversions, the antithesis of strong typing, are user-definable as in Scala, so that this powerful yet dangerous approach to integrating libraries simply is fully accessible.

Generics
No types without generics! Typecasts should be avoided at all costs by means of a clean generics syntax. If possible, generics should be reified.

No checked Exceptions
Do it the Go way! Checked exceptions do not exist, but if a method can fail, it must return a value which represents either the kind of error or a successful result. Exceptions are NOT USED for recoverable situations, UNLIKE in Java. If they were, they MUST be checked (as Java did correctly). But using return values in these situations is usually the much, much better way.
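Expressed in Java rather than in the imagined language, the "return a value instead of throwing" idea might be sketched as follows. Result and PortParser are hypothetical names made up for this example; Go itself would return a value and an error side by side instead of a wrapper object.

```java
// Hypothetical result type: either a value or an error message, never both.
final class Result<T> {
    private final T value;
    private final String error;

    private Result(T value, String error) {
        this.value = value;
        this.error = error;
    }

    static <T> Result<T> ok(T value) { return new Result<T>(value, null); }

    static <T> Result<T> fail(String error) {
        if (error == null) throw new IllegalArgumentException("error must not be null");
        return new Result<T>(null, error);
    }

    boolean isOk() { return error == null; }

    // Asking a failed result for its value is a programming error, hence unchecked.
    T getValue() {
        if (!isOk()) throw new IllegalStateException(error);
        return value;
    }

    String getError() { return error; }
}

final class PortParser {
    // The recoverable failure is reported through the return value, not thrown.
    static Result<Integer> parsePort(String text) {
        try {
            int port = Integer.parseInt(text);
            if (port < 1 || port > 65535) {
                return Result.fail("port out of range: " + port);
            }
            return Result.ok(port);
        } catch (NumberFormatException e) {
            return Result.fail("not a number: " + text);
        }
    }
}
```

The caller has to look at isOk() before using the value, which is the same discipline a checked exception would enforce, only expressed through the return type.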

Objectorientation, but less emphasis on inheritance
Object orientation is "good", but not inheritance. I do not know which kind of modeling concept I would prefer: Explicit interfaces, or the Go way with implicit duck-typing interfaces? Also, I am not quite sure whether inheritance should not be in the language after all. However, I would never push it as the first solution (as Java did).

Simple tuples and Closures
Tuples and closures do two very important things: Tuples are multi-type arrays and are mainly used for multiple method return values. They are always designed as value types (immutable). Closures are an integral part of any good language, so I will not discuss their advantages here.

No reflection
I would try to avoid all concepts of reflection in my language, simply because it defeats good design and most concepts can live without it. I would not do this for a language like a JVM one, but I would do it if I were designing a machine-level language.

Pattern matching
I am not sure whether it would be possible to the same extent as in Scala, but pattern matching should be a core element of the language as well, since it's needed so often! Unlike in Scala, however, I would not define "case" classes but "val" classes, which in fact do nearly the same.

Functional approach
Like scala, the approaches would be very functional with only few keywords available, to make it easier to define your own keywords. For the same reason, the rules about brackets should be rather relaxed - but this is not for sure. Lisp shows that a concise, always identical approach has also advantages.

System language
If possible, the language should be powerful, but still suitable as a language for systems programming. Much like Google Go, but with fewer special rules, which make that language seem a bit quirky in my opinion (e.g. pointers are not meant for maps/arrays).

Garbage collection
I need this one. The problem is that without a GC, memory management bleeds into the language. And that's bad, since it beats the entire concept and a lot of performance optimizations into the dust. With a GC, a lot of trivial objects could be allocated on the stack (if used as a system language), since the language does not need to make clear where the objects are allocated. In general, the strategy would be to create small structs on the stack, but this rule is not under the control of the programmer (unlike e.g. in C#). The compiler may try to find the best strategy here!

Method and operator overloading
Operator overloading must in general be allowed, since it must be the theoretical foundation of the basic data types. Method overloading, on the other hand, is hard to judge. I am not sure whether I would really want to make the language needlessly more complicated.

Tuesday, 29 June 2010

Simple Java #3: A conclusion

This time, I just want to wrap up some very useful "rules" one should obey when writing Java:

Methods should not return or handle null, except if they indicate an optional operation

For optional operations (like Map.get(key)), this must be clearly documented. It should be used rarely, since it cannot be checked by the compiler and the programmer must remember to ALWAYS handle null.
Methods which return collections or Strings should return Collections.emptyList / "" rather than null, even for optional values.
Never return null if an error occurs. Throw a checked exception if the error may happen and an unchecked Exception (usually, IllegalStateException) if the error must not happen and the app cannot recover from it.
Set all fields of a bean with initial values in the constructor to prevent them from being null.
Null input parameters should immediately result in an IllegalArgumentException, to prevent these errors from lurking under the surface.
Default values: Sometimes, it is possible to define a natural empty or default value for some classes. In these cases, the empty element can be a static final constant (if it is immutable) attached to the class and should be used in all cases where "null" would normally be used. Use this wherever appropriate; it's way better than using null. Default values can also be used as fallbacks when throwing an exception would not be possible, but some kind of value must be given at a certain point if everything else fails...
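A few of the null rules above can be combined in one small sketch. TagRegistry is a hypothetical class made up for this example; it rejects null inputs immediately and returns an empty list instead of null for unknown keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry illustrating the null rules above.
final class TagRegistry {
    private final Map<String, List<String>> tagsByUser = new HashMap<String, List<String>>();

    // Command: null inputs fail fast with an IllegalArgumentException.
    void setTags(String user, List<String> tags) {
        if (user == null || tags == null) {
            throw new IllegalArgumentException("user and tags must not be null");
        }
        // Store a private, read-only copy so later changes by the caller do not leak in.
        tagsByUser.put(user, Collections.unmodifiableList(new ArrayList<String>(tags)));
    }

    // Query: never returns null; unknown users simply have no tags.
    List<String> tagsOf(String user) {
        if (user == null) {
            throw new IllegalArgumentException("user must not be null");
        }
        List<String> tags = tagsByUser.get(user);
        return tags == null ? Collections.<String>emptyList() : tags;
    }
}
```

Callers can then write `for (String tag : registry.tagsOf(user))` without any null check, which is the whole point of the rule.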

Collections as input and output parameters
Since JDK-Collections are always mutable by their interface, some additional contracts must be defined:

Methods should never modify collections passed in as input parameters. It can be done in private methods, but it shouldn't be done in public ones. Instead of changing the input collection, a new one should be created and returned. The performance impact is negligible, but the method becomes easier to maintain and understand.
Public methods returning collections must always return unmodifiable collections, usually by wrapping the result with Collections.unmodifiableList(...). This makes the method more flexible, as it can return constants like Collections.emptyList(). If a method returns a modifiable list, the result would sometimes be mutable and sometimes not (the EmptyList constant) - that's just asking for trouble.
Beans which have collection properties MUST always wrap the results of their public getters in Collections.unmodifiableList if the list is mutable. Collection properties may only be changed by setters - which replace the collection with another one - and by explicit public mutators like public void add(...) on the bean itself. Changing collection properties in place has countless side effects and is very hard to maintain.
Whenever a collection of exactly one element is needed, create it via Collections.singleton(Object) (or Collections.singletonList(Object) if a List is required).
If mutable collections are used as static fields or in singletons (most common static field: a (caching) Map), then the field must be wrapped via Collections.synchronizedMap(...) (or the matching synchronizedList / synchronizedSet wrapper), or the access must be synchronized by hand. The latter is the better choice, as it allows more fine-grained control over the synchronized blocks. In general, just try to avoid mutable static collections if you do not need them for performance reasons.
Try to use Google Collections. It offers ImmutableList/-Set/-Map, which make it possible to create immutable collections without wrappers - yielding nice performance benefits along the way. These are so important that it's really a pity that the JDK does not support them directly.
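
As a rough illustration of the first three contracts, consider the following sketch. Names and Team are invented for this example, and only plain JDK calls are used:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical utility: never touches its input, always returns a read-only list.
final class Names {
    private Names() {}

    /** Returns a new, unmodifiable list; the input list is left untouched. */
    static List<String> withoutBlanks(List<String> input) {
        if (input == null) {
            throw new IllegalArgumentException("input must not be null");
        }
        List<String> result = new ArrayList<String>();
        for (String s : input) {
            if (s != null && s.trim().length() > 0) {
                result.add(s);
            }
        }
        if (result.isEmpty()) {
            return Collections.emptyList(); // constant, also unmodifiable
        }
        return Collections.unmodifiableList(result);
    }
}

// Hypothetical bean: the list changes only through an explicit mutator.
final class Team {
    private final List<String> members = new ArrayList<String>();

    void addMember(String name) {
        if (name == null) {
            throw new IllegalArgumentException("name must not be null");
        }
        members.add(name);
    }

    /** Read-only view; callers cannot change the bean behind its back. */
    List<String> getMembers() {
        return Collections.unmodifiableList(members);
    }
}
```

Whatever the caller does with the results, an attempt to modify them ends in an UnsupportedOperationException instead of a silent side effect.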

Use Unchecked Exceptions to signal programming errors
If it is possible that your program runs into an unstable / unexpected state, then you must throw a runtime exception indicating that there is a programming error in there.

Throw an IllegalArgumentException if your input is bad, esp. for null input values. Write static convenience methods for null checks, and use them in all setters and constructors where a null-check is needed.
Throw an IllegalStateException if your program is in an invalid state (badly assigned fields, wrong execution order...). This usually replaces the idiom of returning null if something fails.
Throw an UnsupportedOperationException if you cannot support a method derived from an interface, like write operations on immutable objects.
Throw an IndexOutOfBoundsException for crappy index accesses.
Throw a ClassCastException (or let the JVM do that for you in an unsafe cast) if you have the wrong object in your hands.
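
The "static convenience methods for null checks" mentioned above could look roughly like this. The class name Check and the method notNull are assumptions made for this sketch, not an existing API:

```java
// Hypothetical helper class; the name "Check" is an assumption for this sketch.
final class Check {
    private Check() {}

    /** Returns the value unchanged, or throws if it is null. */
    static <T> T notNull(T value, String name) {
        if (value == null) {
            throw new IllegalArgumentException(name + " must not be null");
        }
        return value;
    }
}

// Hypothetical value class using the helper in its constructor.
final class Person {
    private final String name;

    Person(String name) {
        this.name = Check.notNull(name, "name");
    }

    String getName() { return name; }
}
```

Since the helper returns its argument, the check and the assignment fit on one line.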

Of course there are more, but these are the most common... In general, all of them except the IllegalArgumentException indicate a (minor) error in your design: A good design tries to enforce compile-time-safety rather than handling problems later with runtime-exceptions:

IllegalStateExceptions can be avoided by using immutable objects as much as possible (see below), as they do not have states.
UnsupportedOperationExceptions can be prevented by defining "ReadInterfaces" and "WriteInterfaces" separately. In general, the WriteInterfaces then inherit from the ReadInterfaces. Also, try in general to define "thin" interfaces rather than rich ones. The JDK collections library is a perfect antipattern for this.
IndexOutOfBoundsException: Index-based accesses are in general a design problem. You should never rely on positions in your arrays, lists or strings. Instead, create your own classes which handle these problems. Also, you should never iterate over an index within a "for" loop. For Lists, use the iterator (since JDK 1.5: Java's foreach) instead.
ClassCastExceptions, instanceof checks and casts are the bastard stepchildren of the pre-JDK 1.5 era. Starting with JDK 1.5, nearly all of these use cases can be cleanly rewritten using compile-time-checked generics. Also, you should never ever use collections with different kinds of objects in them!
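
A small sketch of the ReadInterface / WriteInterface idea. All type names here are invented for illustration; the point is only that the read-only implementation has no write method it would need to refuse at runtime:

```java
// Hypothetical "thin" read interface.
interface ReadableCounter {
    int get();
}

// Hypothetical write interface; it inherits the read operations.
interface WritableCounter extends ReadableCounter {
    void increment();
}

// Read-only implementation: nothing to throw UnsupportedOperationException from.
final class FixedCounter implements ReadableCounter {
    private final int value;

    FixedCounter(int value) { this.value = value; }

    public int get() { return value; }
}

// Mutable implementation offering both sides.
final class SimpleCounter implements WritableCounter {
    private int value;

    public int get() { return value; }

    public void increment() { value++; }
}

final class CounterReport {
    private CounterReport() {}

    /** Needs read access only, so it asks for the thin interface. */
    static String describe(ReadableCounter counter) {
        return "count=" + counter.get();
    }
}
```

Code that only reads accepts a ReadableCounter, and the compiler keeps it from ever calling increment().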

At its best, a program relies solely on IllegalArgumentExceptions for null-checking and the like to fortify its code base and the maintainability of the code.

Do not swallow checked exceptions - re-throw them unchecked instead!

Everybody tells java programmers that they must not swallow exceptions. Good.
But: Nearly no one tells them what to do instead.
It's no wonder that a lot of programmers rather write empty catch blocks (perhaps with a log statement, if they were in a nice mood) for checked exceptions which they think can never happen... unless the program is really screwed or there is an error in the logic.

The solution, however, is very simple. Wrap them in a runtime exception, like this:

    try {
        Class.forName("horst");
    } catch (ClassNotFoundException e) {
        // have to die if this would ever happen, but need to catch it
        throw new IllegalStateException(e);
        // If you have the apache-commons-lang.jar available, you should prefer using this:
        // throw new UnhandledException(e);
    }

Nice, huh? That way, you tell the world that this is an unrecoverable condition and surely a programming error.
However, whenever you do this, keep the following in mind: Never, ever do this for a condition which may legitimately fail and which does not just indicate a programming error when it is thrown:

If the programmer has made a mistake, a RuntimeException and an app crash are appropriate.
If, however, reading a file failed, or something else from the outer world, you MUST handle that exception somehow.
Of course, you can always start by simply throwing the RuntimeException as above, but that should only be a temporary first solution.

Use Immutables

Always declare your fields final initially, and try to keep this wonderful modifier at all costs, since it helps to make your code understandable.

In general, it's a good idea to distinguish between immutable "value" objects which hold your abstract data, and "state" objects which have only internal fields that are not accessible, but which know how to process these values.
This is a natural functional way to split (immutable) data from logic and makes your programs very fast (as immutables can be shared), easy to maintain and quick to understand at compile-time. Also, this solves tricky problems which can occur when defining hashCode on mutable fields..

Simple rule: Mutators (Setters) are evil.
Another rule. Immutables are not SLOW and not a matter of performance. You can always come up with fast immutable algorithms when needed. Do not sacrifice the maintainability of immutables just for the sake of performance (like the JDK did with its collections..)
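
For illustration, a minimal immutable "value" could look like the following sketch. Money is a made-up class; instead of setters it offers an operation which returns a new instance:

```java
// Hypothetical immutable value: all fields final, no setters.
final class Money {
    private final long cents;
    private final String currency;

    Money(long cents, String currency) {
        if (currency == null) {
            throw new IllegalArgumentException("currency must not be null");
        }
        this.cents = cents;
        this.currency = currency;
    }

    long getCents() { return cents; }
    String getCurrency() { return currency; }

    /** Returns a NEW value; neither this nor other is changed. */
    Money plus(Money other) {
        if (other == null || !currency.equals(other.currency)) {
            throw new IllegalArgumentException("need a non-null amount in " + currency);
        }
        return new Money(cents + other.cents, currency);
    }
}
```

Because nothing can change after construction, instances can be shared freely between objects and threads.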

Builders for Immutables

Sometimes, it is impossible to initialize an immutable class with just the constructor - or it is clumsy and error-prone, since that constructor has a dozen or so parameters, and you do not want to split the object into smaller units.

In these situations, do not define setters as a last resort. Instead, define an inner class called Builder which makes it easier to create objects of that class. By convention, the method names of a builder are those of the properties being set, and every call on a builder returns a reference to the builder itself (to allow method chaining).

Other useful rules are to define no getters on them (builders are write-only), and to add a toString method which builds the object and then outputs it with toString (for testing).

    class Foo {

        public static class Builder {
            private String name;
            private int value;

            public Builder name(String name) {
                this.name = name;
                return this;
            }

            public Builder value(int value) {
                this.value = value;
                return this;
            }

            public Foo build() {
                return new Foo(name, value);
            }

            /** Common toString operation for all builders. */
            public String toString() {
                return build().toString();
            }
        }

        public static Builder builder() {
            return new Builder();
        }

        private final String name;
        private final int value;

        private Foo(String name, int value) {
            this.name = name;
            this.value = value;
        }

        public String getName() {
            return name;
        }

        public int getValue() {
            return value;
        }
    }

Obviously, using the builder pattern is very verbose.

It's definitely not suited for simple beans like the one I have described here, but rather for more complex scenarios where you have to initialize immutable beans with collections. However, it's better to be verbose (Java beans are that anyway) than to write setters which make your code harder to understand and test.
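
For the "immutable bean with a collection" case, a builder might look roughly like this. Playlist is a made-up class, and copying the list in the constructor is my assumption about how to keep the finished object independent of its builder:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical immutable bean with a collection property.
final class Playlist {

    static final class Builder {
        private String title = "";
        private final List<String> tracks = new ArrayList<String>();

        Builder title(String title) {
            this.title = title;
            return this;
        }

        /** Adds one track; can be chained as often as needed. */
        Builder track(String track) {
            tracks.add(track);
            return this;
        }

        Playlist build() {
            return new Playlist(title, tracks);
        }
    }

    static Builder builder() {
        return new Builder();
    }

    private final String title;
    private final List<String> tracks;

    private Playlist(String title, List<String> tracks) {
        if (title == null) {
            throw new IllegalArgumentException("title must not be null");
        }
        this.title = title;
        // Copy, so that later changes to the builder do not leak into this object.
        this.tracks = Collections.unmodifiableList(new ArrayList<String>(tracks));
    }

    String getTitle() { return title; }
    List<String> getTracks() { return tracks; }
}
```

Usage then reads like Playlist.builder().title("Mix").track("a").track("b").build().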

Equals / Hashcode

Do not define equals / hashCode on mutable fields

Never include mutable fields in your hashing / comparison methods, as changing the values in place will screw up all HashSets / Maps in which those values reside.

Also, it's very hard to understand code which compares values with mutable state, as these comparisons could yield unexpected results in a multi-threading scenario.

Use it only for "natural comparison"

Nearly as important: Do not define equals / hashCode for anything other than a "natural" comparison of values in collections. For anything else, write your own named methods for "special" comparisons. There are FEWER applications for hashCode / equals than most programmers think.

Do not write your own custom logic

Rely on IDEs to generate it for you from the fields you select. Everything else is very hard to read. One exception: Commons Lang has HashCodeBuilder and EqualsBuilder, which can both be used for this.

Implement proxies for equals / hashCode, when needed

If you have a value with mutable fields which should be put into a collection... then create an immutable "proxy" class which wraps the mutable instance. This can also be used if you want to use different "equals" implementations for a class in different collections. The exact semantics depend greatly on your use case. The most common idiom is "KEY" classes for maps, which combine different values (strings, ints) to use them as one hash key.
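
Such a "KEY" class could be sketched as follows. CustomerKey and its two fields are made up for this example; equals and hashCode use only final fields:

```java
// Hypothetical immutable key combining a String and an int for use in a Map.
final class CustomerKey {
    private final String country;
    private final int number;

    CustomerKey(String country, int number) {
        if (country == null) {
            throw new IllegalArgumentException("country must not be null");
        }
        this.country = country;
        this.number = number;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof CustomerKey)) {
            return false;
        }
        CustomerKey other = (CustomerKey) obj;
        return number == other.number && country.equals(other.country);
    }

    @Override
    public int hashCode() {
        // Both fields are final, so the hash never changes while the key sits in a map.
        return 31 * country.hashCode() + number;
    }
}
```

Two keys built from the same values find the same map entry, no matter which instance was used for the put.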

Everything said above also applies to "Comparable" classes.

Just as a reminder: In general, you should nearly always prefer a Comparator over implementing Comparable, but for simple immutable values the latter might make sense, too.
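
A tiny sketch of the Comparator variant: ByLength is an invented name, and it sorts Strings by length without touching String's own natural order:

```java
import java.util.Comparator;

// Hypothetical named comparator: a "special" ordering kept outside the value class.
final class ByLength implements Comparator<String> {
    public int compare(String a, String b) {
        int la = a.length();
        int lb = b.length();
        // Explicit comparison instead of subtraction, to stay overflow-safe as a habit.
        return la < lb ? -1 : (la == lb ? 0 : 1);
    }
}
```

It can be passed to Collections.sort(list, new ByLength()) wherever this special order is needed.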

Override toString and use it only for logging

ToString helps the developer (that is you) during a debug session.

Because of that, you should nearly always provide it with all fields of the object. If you do not want to write it by hand, you can use the ToStringBuilder of commons-lang, which provides a toString implementation which is based on reflection.

Never use it for functional parts of your application, except for immutable values like Strings, Integers, ..

For mutable values, the reflective "show-all" mechanism of commons-lang should always be used instead.

Simpler Java #2: Know your RuntimeExceptions

The part that most commonly goes wrong in Java is the exception handling. (Is there someone out there who has never used an empty (or just "log") catch block? Anyone??)

To avoid the common problem, here are some very basic guidelines which should help:

Checked vs. RuntimeExceptions

Use checked exceptions for problems which MAY occur and which are not in scope of the app to handle (io-errors, db-errors). These must be handled in a suitable way, usually you must also inform the user about these problems.
Use unchecked exceptions whenever a problem occurs which cannot be handled in any way. The app might handle it on the highest level (users don't like GUIs which crash into nirvana), but in general the programmer has made a serious mistake here which must be fixed in the code.

Exception Wrapping

Sometimes, you encounter a checked exception for something which you know must never go wrong in your app and which you therefore do not want to handle. Examples are IO errors on stream.close() or checked exceptions on reflection or cloning (the last ones are flaws in the JDK design: ClassNotFoundException should not be a checked exception by its nature, since you can usually never handle it).

In these cases, do this:

    try {
        // something that throws ClassNotFoundException
    } catch (ClassNotFoundException e) {
        throw new RuntimeException(e);
        // throw new org.apache.commons.lang.UnhandledException(e);
    }

This wraps your checked exception in a runtime exception.

Do not return null

I said it in the last post, and I say it again: Null is not an appropriate return value if something went wrong. Use either a checked exception if an outer condition fails OR an unchecked one (more common) if the programmer made a mistake. Never return null just because something went wrong.
Null is reserved for lookups with maps and the like in Java (and even that's crappy), and for nothing else.

Returning null has one good point, though: It's easy to gauge the quality of source code simply by counting how often null is used, and why. ;-)

JDK- Unchecked Exceptions

There are some JDK-Exceptions which should be used in some of the most common idioms in java programming:

IllegalArgumentException: Most common exception ever. Use this for all kinds of invalid inputs, like null values (DON'T use a NoPE here!)

UnsupportedOperationException: Used mostly for interfaces which allow mutation (java.util.List) where the concrete implementation does NOT (unmodifiable List).

IllegalStateException: Throw this one if your objects have a certain order in which their operations must be done, and this order has been broken.

IndexOutOfBoundsException: Very seldom should you implement your own indices. If you do, however, then this is the appropriate exception.

Why you should try to avoid using unchecked exceptions

The last paragraph told you HOW to use unchecked exceptions. Now I tell you the following: The worse your API design, the more you need to deal with these and the more you need to rely on runtime safety rather than compile-time safety. Unchecked exceptions report problems which should not have happened. The better your application design, the more of these can be converted into compile-time problems.

Some tips:

Don't use interfaces which throw UnsupportedOperationExceptions.. Make smaller interfaces instead and use inheritance! (Most common example: ReadInterface, WriteInterface extends ReadInterface). The JDK-Collections seriously got this one wrong!
Avoid Objects with complex state - and thereby the IllegalStateException. Use Immutables (more on this one later.)
Use IllegalArgumentExceptions heavily! IllegalArgs make your program more robust, and they are easy to avoid and should not occur at runtime. Throwing these early can save you a great deal of work compared to searching for the reason why a NoPE or something similar occurred!
Always use the exception if you have to. The last points told you HOW to avoid these situations. If you, however, have a lot of states, you must throw that IllegalStateException or everything gets much worse. However, the sole fact that you have to do this should at least make you think about your API design.
Avoid indexing on collections. No index access -> no IndexOutOfBounds. Seems logical, right? Try to avoid indexing at all costs. Most of the time, indexing can simply be replaced by designing specialised classes (instead of giving each index position in a list or an array its own meaning), and by using the enhanced for loop. Starting with JDK 1.5, Lists should never ever be looped over by their indices - arrays should be avoided anyway, except for lightweight parameters.
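
To illustrate the last tip, here is a small sketch. GridPoint is a made-up class standing in for an int[2] whose positions 0 and 1 would otherwise "mean" x and y, and the loop uses the enhanced for instead of an index:

```java
import java.util.List;

// Hypothetical specialised class replacing an int[2] with positional meaning.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() { return x; }
    int getY() { return y; }
}

final class GridPoints {
    private GridPoints() {}

    /** Sums all x values without a single index access. */
    static int sumOfX(List<GridPoint> points) {
        int sum = 0;
        for (GridPoint p : points) {
            sum += p.getX();
        }
        return sum;
    }
}
```

There is simply no index left which could run out of bounds.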

Sunday, June 27, 2010

Simpler Java #1: No more NoPE's!!

The NullPointerException ("NPE"-> "NoPE" for its fans) is for sure one of the best friends of the JavaDeveloper. Why?

Null is in Java the Contract for "optional" Values, values which do not need to be there. Example: java.util.Map.get(key) returns null if the key was not found.
Some people tend to extend this even further and use null as the default return if the function call fails (rather than throwing an exception).
Variables in beans are not always initialized when the default constructor is used, which results in another NoPE pitfall.
Sometimes, null is also used to say "nothing". The difference between (1) and this one is crucial: Optional values are KNOWN to possibly be absent; "nothing" instead is a pit waiting for a fall. It requires the dev to explicitly test for null every time. This is at its worst when used with strings or collections, which have a "natural" null value: the empty string / collection.

In all those situations, either the caller might forget to check something (1) or the API is just screwed (2-4) and gives you null to force you to nullcheck and clean up its garbage - more often than not, you won't, and your app says (once again): "NoPE"!

What can / should be done in these cases?

For optional values, null is per se OK (since it's a Java developer contract accepted by most people). However, you MUST declare it clearly in the API! And try to do it seldom.
Never return null just because an error occurs. Use _CHECKED_ exceptions! Or throw unchecked exceptions if the problem should never have occurred at all. Never return null just as a fallback: If you do not go the exception way, the program will crash with a _NoPE_ anyway, so returning null does not really improve your situation here..
Try to use "immutable" beans (more on that later), or at least try to have as many values as possible set by the constructor. All other fields with "natural" empty values (strings, collections) should be given their "empty" default. Never null. Again.
Speaking of empty values: Strings and collections may never NEVER NEVER ever be null. There is the default empty value. USE IT! Do not try to make a difference between "nothing" and "empty". The only appropriate answer your program will give you is "NoPE".
MAKE EMPTY CONSTANTS FOR YOUR OWN OBJECTS. And another one. All immutable beans (that is, all "VALUES") should be given an empty default constant implementation which can be used as a default return value rather than null. Decide when you need it. But most of the time a constant holding the "EMPTY" instance of your class WILL be an asset. There are always situations which you cannot really handle but where you do not want to "NoPE" your app into cyberspace.. There the default comes very often to your help.
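
A short sketch of how the first point could look in code. Settings is an invented class; find is the documented optional lookup which may return null, while get takes a fallback and never does:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper around a lookup table.
final class Settings {
    private final Map<String, String> values;

    Settings(Map<String, String> values) {
        if (values == null) {
            throw new IllegalArgumentException("values must not be null");
        }
        // Defensive copy: later changes to the caller's map do not affect us.
        this.values = new HashMap<String, String>(values);
    }

    /** Optional lookup: returns null if the key is not set. Callers MUST handle null. */
    String find(String key) {
        return values.get(key);
    }

    /** Never returns null: falls back to the given default if the key is not set. */
    String get(String key, String fallback) {
        if (fallback == null) {
            throw new IllegalArgumentException("fallback must not be null");
        }
        String value = values.get(key);
        return value != null ? value : fallback;
    }
}
```

The null contract lives in exactly one documented method, and everybody else can use the variant with the default.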

So much for some short thoughts and experience on handling null. Burn this into your memory:
If you get null and the API says so, because it's an optional value, and you NoPE: SHAME ON YOU!
If you get null and no one has told you: SHAME ON THEM!
In the second case, it is ALWAYS the called code that must be fixed. A caller should never nullcheck unless there is a really good DOCUMENTED reason.
Nullchecking in business apps is not "hardening" your code: It just means that you have no clue what's going on in the code you've called... (If it's YOUR code and not some framework, that is.)

Thursday, February 4, 2010

What is a Scalable Language good for

Scala again. This time, I will not try to debate its internals and whether it's better or worse than something else. Rather, my intention is to find out where the use cases of its scalability are.

Introduction
Scala is, in my eyes, a language which is pretty excellent at leaving nearly all power in the hands of the programmer, to use "his" Scala in "his" preferred style. Unlike in most dynamic languages (esp. Perl), this however does not mean that Scala is magic. Far from it! Instead, it just tries to look behind most of the syntax magic provided by those languages and to give a statically typed and logical equivalent for it.
The features which most languages have, but which they do not offer to the user at full power, are: type inference, operator overloading, switch statements, control structures, closures, type parameters, ...

Offering this kind of power to the user seems strange when compared to the concept which led to the restrictions of Java in all these areas: Java contains nearly NONE of them, because Java tried to do only OOP in a standard way with a standard method syntax. No operators, no closures, no confusion. And I have to say: After the mess into which C++ might have led, this was a wise choice by Gosling, which I still think was quite clever at the time.

Java's choice of being simple is a real pain to experienced devs, who tend to "outgrow" the language at some point. But since Java contains all of the main aspects which are considered to be important for good coding, i.e. classes ;-), it is a language which is good to begin with. In other words: There are always new - and cheap - programmers available. Complex languages like C++ usually do NOT have this kind of comfort. They rather tend to scare the developers away. ;-)

So what about Scala? Scala is ultracomplex, far more than C++ will ever be. However, it shines at hiding the complexity. A Scala beginner (coming from Java, I suppose) does not need to know any details about operator overloading, type conversions and the depths of generics, sometimes not even closures!
The interesting part: Even though he does not KNOW them, he will USE these advanced topics rather quickly. Why? Because + on Strings is operator overloading (+ on Lists is as easy to understand), implicits are everywhere behind the scenes, and generics are nothing else than some simple info which must be given to know which elements this list holds.
Type Inference? Scary word. var a = "string"? Ok. Understand. Looks like a dynamic language.
And so on.

If you have missed my point: The brilliance of Scala is that there are dozens of features which novice programmers can just use, but do not need to understand. They can just treat them like language constructs and might be very happy the day they find out how to tailor them to their needs. This GREATLY helps to smooth the long and steady learning curve which would be needed to learn all of the features.
Also, sometimes the design was just very clever. Java still envies C# for properties sometimes. Scala says: Fields are functions out of the box. No property syntax needed. At least no keywords, since we _CAN_ overload the operators, right?

So... in my eyes, Scala really bridges the gap: it keeps the advantages of Java - easy to learn and understand at first glance - while also offering nearly limitless power when needed.

At least in theory. In practice, the combination of implicits and closures can be a real PAIN for any novice programmer. And the interoperability with Java is, contrary to what some sources state, actually worse than Groovy's for beginners. This has three very important reasons:

Groovy is a dynamic language. Since Java is static, Groovy can call Java code in nearly any way, and there are no "conventions" or different semantics to consider.
Scala has its own collections. While this is good, as Scala is not Java and its collections are very different, it really sucks that - in 2.7.7 - no default conversions between the BASE datatypes are enabled. And with "BASE" I mean no conversion for Iterator or at least Iterable. I cannot really understand this, since Iterable is basically a language construct in Java and VERY important to support out of the box for Scala for-loops. And there should be some easy ways to at least explicitly convert between Scala and Java in Predef. This would greatly help Scala beginners, who will usually stick with Java collections or at least use libraries which use them.
Scala's generics suck. Nah, not really. But they DO suck when using them with Java raw types. And believe it or not, this happens more often than not! In short: There are situations which are completely valid in Java but just hideous in Scala, as Scala's generics are VERY strict and precise to prevent type errors. Problems in this corner usually arise when trying to store Scala closures in a raw Java list or something like that. Seems strange but can happen. And it will shred your brain to pieces AND consume a lot of time.

As for me, these two main points are actually the only real problems I faced when learning it. I guess some people will also add closures. Might seem strange, but I am thinking not of people who come to Scala on their own but of those who probably HAVE to (from Java) since their project management wants them to use it. And that leads back to our main question:

When are Scala's capabilities actually good for a project? Where are the advantages?

Scala can be tailored to need. This can cause problems (confusing overloads and imports) but can also greatly improve the readability of code. In general, this is an advantage for 2 reasons:

Framework developers (at least open-sourcerers) tend to spend a great deal of time trying to make their API pleasurable and easy to use for programmers. Since Scala offers more tools, APIs can be learned quicker than in Java, I presume. Consider near-DSL APIs like "specs".
The danger of abusing the language can usually be reduced by a good project manager. Business applications and the like do not really need to hop on the trend of functional languages to create 1-line transactions which are perfect but write-only. (This one was not fair...)
Of course, if the project manager is an idiot or does not re-read his programmers' code and check for conventions, this will lead right into hell. But a bad PM is always a highway to hell, even without java. No real difference.

The power of the language will IMHO help developers to make progress in the art of programming much quicker than simpler languages do. In Java, the cap where there is nothing new to learn is reached rather quickly. Since learning nothing is boring and there are real essentials still missing in Java, this might have led to the trend of some frameworks to include their own non-Java DSLs. And countless XML accents, of course. Happy learning. ;-)
Scala feels like a dynamic language, but it still offers the performance of a static language. Much more important, however, is that it gets by with minimal type annotations. Dynamic language fans might consider even these overhead, but they might actually just reduce documentation overhead: A good Scala source with meaningful names does not need documentation hints telling which params are expected for a function. Dynamic languages usually WILL.
Some features of dynamic languages are impossible for Scala, of course. Others can be emulated via implicit conversions. For most users, I guess the only really relevant missing feature might be the legendarily useful art of duck typing. It's sad, but inner classes and interfaces have always been there to fight this problem.

Scala-Revisited

Currently, I am evaluating Scala again and comparing it to other popular JVM languages, like Groovy, or even Java. My question is: Is there room for such a complicated, yet extremely powerful language like Scala?

Obviously, I am inclined to scream: "Yes, goddammit! Give it to me! My toy". Hm.. Maybe I am exaggerating a bit lately, as most people do when comparing competing products. I REALLY like Scala.

However, as an experienced programmer it is easy to be emotional about one's personal "favorite" language, but there are usually very good reasons why my fav is not everybody else's fav. There are dozens of good languages out there which no one really uses, and one seriously has to ask: why? One example: Eiffel.

I guess one of the main aspects behind this is the momentum which the larger programming communities create (like Java), but that one is easy: If you give a good platform which everybody uses, you will soon have libraries which make the language more popular for everybody and so on and so on... Luckily, the jvm platform, like .NET, defeats this problem: Scala can use Java-Classes without any problems.

But.. That is not the problem of Scala. Lately, I am reflecting more and more about the "complexity" of the language and what it means for the average programmer (like me). Speaking along the lines of Odersky, Java is a cathedral. It hides some features from the user which had been known for abuse and for complexity which "burdens" the programmers. Scala, on the other hand, is a bazaar: It offers nearly everything a programmer could think of as a language construct (even implicits). Effectively, it makes the task of choosing the right tool for the job harder, as there are many more tools to choose from (operator overloading, closures, implicits). Also, one has to be wary that not all programmers in the team might want to use everything the bazaar offers.

The advantage of Java's cathedral is that the complexity is lower; the bazaar's advantage is the flexibility - with the additional weight of deciding which goods from the bazaar are fine and help, and which will prove to be a burden which is better left out.

What about Groovy? Well, I see Groovy as Java's "sidekick" - it resembles its original pretty much but is tailored for the situations where the vision of Java's cathedral is not appropriate to solve a problem elegantly. (And in time..) It's a kind of "special force" or "inquisition" for situations where features are important.

So... my next logical step will be to compare java/groovy vs scala. Not based on features but on the intention to find out which targets they are trying to achieve, and how.