Tuesday, September 18, 2007

Is the DAO pattern still valuable in the time of ORM?

The DAO discussion is surfacing again (previously), and there's one point I think is lacking from the current discussion: Testing.

It's a lot easier to mock or stub out the PersonDao.getPersonByFirstName() method than EntityManager.createNamedQuery() blah blah blah. Don't mock infrastructure. Avoiding database access is crucial to making your tests run as fast as possible.

The DAO tests themselves should access the database, of course, as well as the end-to-end functional tests. But running every test in the system against the database is simply uneconomical.
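To make the testing point concrete, here is a minimal sketch of stubbing a DAO by hand. Only the name PersonDao.getPersonByFirstName() comes from the post; its return type, GreetingService and the sample data are assumptions made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical DAO interface; the return type is an assumption.
interface PersonDao {
    List<String> getPersonByFirstName(String firstName);
}

// Hypothetical business class that depends on the DAO, not on the EntityManager.
class GreetingService {
    private final PersonDao dao;

    GreetingService(PersonDao dao) {
        this.dao = dao;
    }

    String greet(String firstName) {
        List<String> matches = dao.getPersonByFirstName(firstName);
        if (matches.isEmpty()) {
            return "Nobody called " + firstName;
        }
        return "Hello, " + String.join(" and ", matches);
    }
}

class GreetingServiceDemo {
    // The stub is a one-line lambda: no database, no ORM, no mocking library.
    static String greetWithStub(String firstName) {
        PersonDao stub = name -> Arrays.asList(name + " Hansen");
        return new GreetingService(stub).greet(firstName);
    }
}
```

Stubbing EntityManager the same way would mean faking the query object it returns as well, which is where the "blah blah blah" comes from.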

Friday, September 14, 2007

JavaZone 2007: The end

That's it, the conference is over. We can all go home, get some rest and wait for next year (which will probably be even bigger).

Totto, javaBin's president, noted in his foreword to the JavaZone program that his inspiration came when he attended JavaOne back in 1996. It's really quite amazing how far we've come since then. One can only wonder where we'll be in a couple of years more.

If you arrived here from another summary of my JavaZone posts, that summary was adapted from the previous blog posts. The blog posts are a bit unpolished (I wrote them live), but may contain a bit more information than the summary, so you can look there for details.

After a short holiday, I promise regular programming will resume...

Thursday, September 13, 2007

JavaZone 2007: How to build enterprise applications with Java SE

"The original title of this talk was 'How to make sysadmins love you'"

Trygve Laugstøl from Arktekk, a noted open source contributor and a self-described UNIX lover, talked about considering Java SE instead of EE for building applications.

The motivating thought was to make it easy to admin the solutions we make. By embedding services and making the app runnable from a script, it's a lot easier for operations to admin; they don't have to manage an app server.

This is not a criticism of Java EE, but a case for Java SE as an alternative. Java EE 5 is 23 different specs; you have to be a rocket scientist to know it all. It takes a long time to deploy to an app server. Dev and production environments are different.

Much faster development cycle. No app server. Test only the bits you need.

Have a main class which starts the entire application, with getters/setters for command arguments (port, location of conf file, etc.). You can now start your entire application from a unit test. Use commons-cli for command-line argument parsing.
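A rough sketch of such a main class follows. It is not Trygve's code: the class name AppMain and the /ping endpoint are hypothetical, the JDK's built-in com.sun.net.httpserver stands in for whatever services you would actually embed, and plain args parsing stands in for commons-cli.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical application entry point with a setter per command-line argument.
class AppMain {
    private int port = 8080;
    private HttpServer server;

    int getPort() { return port; }
    void setPort(int port) { this.port = port; }

    // Starts the embedded server; port 0 asks the OS for any free port.
    void start() {
        try {
            server = HttpServer.create(new InetSocketAddress(port), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/ping", exchange -> {
            byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }

    // The port actually bound, useful when the configured port was 0.
    int boundPort() { return server.getAddress().getPort(); }

    void stop() { server.stop(0); }

    public static void main(String[] args) {
        AppMain app = new AppMain();
        if (args.length > 0) {
            app.setPort(Integer.parseInt(args[0]));
        }
        app.start();
    }
}
```

A unit test can then do exactly what the startup script does: create the object, set the port, call start(), and call stop() when it is done.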

Trygve thinks using VMware for local development is a waste, but thinks you should still try to develop the app on the same platform it will be deployed on.

A lot of the new and useful technologies in JEE are now in JSE (JPA, JMX, Stax). XFire, Axis, ActiveMQ, Jetty, JNDI work great in JSE. You should embed the features you need. Think UNIX: small, specialized components. You can still include a servlet engine, but this time you control it, it doesn't control you.

Today: Java is the Cobol of the 90's
I'm hoping for: Java is the UNIX for the 90's

Use conventions for file system layout (like in UNIX). Package the app like a standard UNIX app (tarball).

Trygve also recommends using the appassembler plugin to assemble the app and generate startup scripts (fixes classpath, etc.). Use a separate Maven profile for assembly to reduce build time.

Make separate loggers for significant events which operations can use for debugging (connection pool empty, etc.) and pipe them to syslog (UNIX) or the Event log (Windows).
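One way to picture a dedicated operations logger, sketched with java.util.logging. The logger name "operations" and the helper names are assumptions, and the in-memory handler is only a stand-in for a handler that forwards to syslog or the Event log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class OperationsLog {
    // Separate logger, so operations events are not mixed with debug output.
    static final Logger OPS = Logger.getLogger("operations");

    // Stand-in handler that keeps lines in memory; a real setup would
    // forward each record to syslog or the Windows Event log instead.
    static class CapturingHandler extends Handler {
        final List<String> lines = new ArrayList<>();

        @Override
        public void publish(LogRecord record) {
            if (isLoggable(record)) {
                lines.add(record.getLevel() + " " + record.getMessage());
            }
        }

        @Override
        public void flush() { }

        @Override
        public void close() { }
    }

    static CapturingHandler install() {
        CapturingHandler handler = new CapturingHandler();
        OPS.setUseParentHandlers(false); // keep these events out of the default console log
        OPS.addHandler(handler);
        return handler;
    }

    // One method per significant event keeps the wording consistent for operations.
    static void connectionPoolEmpty(String poolName) {
        OPS.log(Level.WARNING, "connection pool empty: " + poolName);
    }
}
```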

Solaris has the Service Management Facility (SMF), which can integrate nicely with this approach. On Linux, you can create an /etc/init.d script.

Update: Apparently I misquoted Trygve's Java-as-UNIX analogy. Fixed now :-)

JavaZone 2007: Experiences on using Spring WebFlow with JSF

Holger Zobel from Accenture, who is leading the architecture group for the Norwegian Pension project, gave us a taste of the experiences from that project with integrating WebFlow and JSF.

Spring WebFlow is a very general framework, which can be used with a number of presentation frameworks (including Swing). Flows define a user dialogue.

The project has 40-50 developers and uses an IBM software stack and Myfaces JSF.

SWF 1.0 had buggy JSF support. Difficult to debug errors in flow XML.

SWF is more powerful than JSF flow, but power can be misused. Should be used only for flows. Free navigation is still difficult.

SpringIDE helps development a lot. Be sure to have good exception handling in your code.

Is it worth using SWF? In the beginning, it created more trouble than it was worth. Will probably use it for the next project, since he now knows how to use it. Can be used for small parts of an app, or introduced incrementally. SWF is not invasive.

The problems they faced were mostly from the JSF integration.

JavaZone 2007: Getting the most out of GC

"GC is automatic memory management; if it was really automatic, we should all really be at the pub instead of sitting here, innit?"
(My crap rendition of Holly's actually funny joke)

Holly Cummins from IBM at Hursley, UK (where they do CICS, MQ and some JVM things, I guess) gave some general info on GC and demoed EVTK (for which she is lead). I think this is the first time I've seen an IBM-er with blue hair :-)

Holly noted that GC can actually be faster for an app than manually using malloc/free. She quoted some numbers where C apps can spend up to 20% of their time in free. GC can also exploit cache locality better by rearranging objects, compaction, etc. Unfortunately, GC can also stuff the cache full of crap when it traverses the heap.

Mean pause time is not a good indicator of performance. Small heaps mean smaller pauses, but can also mean worse performance. Smaller pauses don't necessarily mean better response time; if the app has a heavy workload the response time will be dominated by waiting for the CPU. "Real time is not real fast."

What can verbose GC tell you? New IBM diagnostic tools for the IBM JVM: available as plugins to the IBM Support Assistant. Extensible Verbose Toolkit: verbose GC analysis. Actually handles verbose output from Sun JVM also! Creates reports, graphs, etc. Install the Support Assistant and find EVTK as a plugin.

Holly did a nice demo of EVTK showing how too small a heap can make an app spend too much of its time running GC, throttling app throughput. A second demo showed how to look for a memory leak. EVTK can show how many weak references are cleared, etc.

There was a nice overview of the different GC policies, etc. at the end, but I was too busy following it to make notes :-)

JavaZone 2007: Deploying Maven in the Enterprise

Kristoffer Moum from Arktekk gave us an entertaining overview of how to introduce Maven in a company.

A benefit of Maven is a standardized project structure, which means employees don't have to learn different set-ups.

Start with a company-wide Maven POM with repository references, deployment settings and dependency management.

Set up internal repositories: snapshot, release and third-party repos with Apache WebDAV; set up a Maven proxy which becomes a single entry point to both the internal and the external repos.

Kristoffer uses the Maven Proxy from Codehaus: simple to configure, flexible. Other implementations offer GUI configuration, etc.

All artifacts built by the CI server should be directly deployed to the internal repository (snapshots).

Project experiences

Transitive dependencies: poor third-party POMs add excessive dependencies. One solution is to graph the dependencies and manually exclude the ones you don't need, for instance with the 'tree' goal of the Maven dependency plugin.

Implicit plugin versioning: version is not specified, so the latest and greatest is downloaded and used. This often turns out to be unstable, which may result in some strange behaviour. Solution: try to control plugin versions manually.

Redeploying released artifacts: the new release only reaches those who manually delete the old copy from their local repository. This can create a really strange situation where half of the developers can't build the project. Should be illegal.

Transitive dependencies can give some problems when you get dependencies to different versions of the same artifact. Solution: add direct dependency.

It's not a good idea to introduce all of Maven at once; it's better to introduce it incrementally.

Can be tricky to find the right module granularity. Be pragmatic when dividing projects in modules; too few or too many are not good.

Everything should be doable through the IDE. Going through Maven every time will be slow.

Too many profiles can make the build hard to understand. Don't try to solve every problem with profiles.

Some good tips here. Maven seems to be used more and more in companies.

JavaZone 2007: Apache Tuscany

After a good (heh) night's sleep (double heh), another day of sessions awaits. Guys, two mornings in a row of Norwegian Rammstein covers is just brutal :-)

Simon Laws from IBM UK had a session on Apache Tuscany, an SCA implementation.

SCA is a model for distributed components implemented in any of a range of technologies (Java, BPEL, Ruby, etc.). Components can be integrated using web services, RMI, etc.

An app is assembled with an XML descriptor. It looks really easy to change component distribution. SCA can easily integrate with existing systems and services.

Tuscany also contains an SDO implementation for data binding.

The Tuscany distribution contains a bunch of demos which is probably the best way to start if you want to learn more than this miserable write-up (sorry, Simon, I'm really not doing your session justice).

SCA and SDO have been mostly acronyms to me till now. This session gave a nice, high-level view and rationale, as well as some nice Tuscany demos.

Tuscany blog

Wednesday, September 12, 2007

JavaZone 2007: End of day one

Day one nears the end, at least with respect to sessions.

"Blogging live" about sessions is quite different from normal blogging in that there's more of a deadline: You have to finish one entry before the next session begins, or you'll be backlogged. This means some of the entries are more like notes than any deep thoughts. My mind is split on this issue, but I'll try to continue my experiment tomorrow since I think the average quality of today's entries is OK.

Also, it might not always be perfectly clear what is said by the speaker and what is added by me. I'll make it simple for you and suggest you attribute any nonsense to me, and the rest to the speakers.

The ClubZone starts soon, with four different bars open for the attendees. Here's to hoping it will be easy to get up tomorrow!

JavaZone 2007: Style and Taste in Writing FIT Documents

This session had Steve Freeman (of jMock fame) and Mike Hill tackle common problems with FIT. One of the most interesting sessions today, I think.

Example-driven development

Examples force people to be concrete. The next step is to automate testing your system based on these examples. FIT is one way to do this.

The point of FIT is communicating with the customer. Other tools can be used, such as JUnit with a DSL which is easier for an analyst to understand.

Communication trumps testing

Copy-paste of a procedural FIT table can make it hard to see what the difference between the examples is. It can be converted to a declarative table where each row or column is its own example. It's harder to refactor FIT tests since there's no IDE support, which means you must be more disciplined.

Having a mega-table with lots of input and output columns ("Tangled tables") can be practical for professional testers, but doesn't communicate clearly. It can be broken up into different, smaller tables. Abstractions that developers come up with can feed back into the business language. Can also be easier to change smaller tables. As in XP, why is something a comment, instead of runnable code?

Web tests in FIT: should hide implementation details such as field names, "url" lingo, etc. Use a domain language which makes sense for your business instead.

FIT in practice

When you work on a project, you become expert on that project. Experts have their own common language. Build a language within your team to encourage a common understanding.

Make test failure easy to understand. The tests shouldn't just satisfy the business side, but the developer side as well.

Apply all good code practices to the tests also: refactoring, etc.

Collaboration problems can affect test quality negatively; if you're trying to avoid collaboration and skip refactoring because of it, you've got a problem.

Agile teams which adopt FIT documents as documentation can end up in a waterfall-like passing of FIT documents with no real dialog.

If the analysts are not very keen on FIT, try to publish the test result and the documents so it's easy to access.

Word documents vs. Fitnesse: Depends on the organization and the analyst users.

JavaZone 2007: Using Spring-OSGi in Swing applications

Nils Hofseth Andersen (Capgemini Norway)

The bundle is the main "unit of modularity" in OSGi. A bundle can register services, to be used by other bundles. A bundle is a jar with a special manifest inside. Bundles can have their own lifecycles. Isolation and visibility are achieved through classloaders. OSGi should be very interesting for the people who want to get rid of their app server, as well as thick client (or other standard Java) developers.

Spring-OSGi is a project started by Interface21 to help integration between OSGi and Spring. You get some special OSGi-tags for your Spring config.

Why Swing? If you already have some Swing parts, you don't really want to combine it with SWT in an app (different look 'n' feel, etc.)

Case from Statoil's Java Enterprise Framework. Uses only Swing. Phases of introducing OSGi:

Phase 1: Introduced modules at the code level. Each module has its own Spring config. Merging these contexts caused several problems (naming collisions, startup time, etc.).

Phase 2: ModuleFactories introduced. Each module is a child appcontext of the app. App still has to know all modules.

Phase 3: ModuleContribution introduced as the interface between the app and the module. Each module packaged as an OSGi bundle.

Planning to open source the JEF framework when it's possible.

JavaZone 2007: Simplicity - Lessons learned from enterprise application development

BEKK's own Trond Arve Wasskog has a session on doing things the simple way. He has been an architect together with Johannes Brodwall on BBS' NICS project. NICS is a system for moving sums of money between banks.

Simplicity is not easy to define, but there are some ground rules: Fewer are better than many, etc.

Infrastructure

The project used two-phase commit to synchronize a database and a message queue. They needed load balancing and redundancy. The special setup only worked in special server environments, not on the developer machines.

They solved this by dropping JMS and using the database for messages. This required some glue code for sending and receiving messages, but meant they could drop 2PC. It also meant it was now testable on developer machines. As Johannes noted in his talk, they eventually decided to drop WebSphere itself.

Trond Arve thinks the app server itself is a dated concept, and would prefer to mix-and-match features. He now uses Jetty, Hibernate and Spring in his projects.

Process engines are evil! They never work: You need a lot of new knowledge to develop, deploy, support them. In their engine they would need VB-like scripting on misc steps in the process flow. By removing it, they saved a whole lot of complexity.

Deployment: why is it so hard? As Johannes noted, their new application model makes deployment a breeze.

Operations: work early on with IT operations staff.

Design

They designed their own scheduling system using DDD, but it turned out that only 1 of 5 projects could use it successfully. In the end, they opted for a much simpler solution: An existing scheduler which uses JMX on the application. Easier to test since you no longer have to rely on an EJB timer.
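To illustrate the idea of a job that an external scheduler triggers over JMX, here is a sketch using a standard MBean on the platform MBean server. All names (JmxTriggeredJob, BatchJob, runNow, the ObjectName) are hypothetical and nothing here is taken from the NICS code; the local invoke() call stands in for what a scheduler would do through a remote JMX connection.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxTriggeredJob {
    // Standard MBean convention: a public interface named <ImplClass>MBean.
    public interface BatchJobMBean {
        int runNow();
    }

    // The job itself is a plain object, so a test can also call runNow() directly.
    public static class BatchJob implements BatchJobMBean {
        private final AtomicInteger runs = new AtomicInteger();

        @Override
        public int runNow() {
            // Real work would go here; the sketch only counts invocations.
            return runs.incrementAndGet();
        }
    }

    // Exposes the job as a JMX operation on the platform MBean server.
    static ObjectName register(BatchJob job) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("example.app:type=BatchJob");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name);
            }
            server.registerMBean(job, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    // What the external scheduler would do, shown here in-process.
    static Object trigger(ObjectName name) {
        try {
            return ManagementFactory.getPlatformMBeanServer()
                    .invoke(name, "runNow", new Object[0], new String[0]);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }
}
```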

Documentation

Process vs. product. Documentation in itself has no value, the communication is the value.

Architecture documents tend to only be shared between architects. Trond Arve used an approach where he intentionally didn't update his architecture documents; when people asked if they were up to date, he preferred to have a personal discussion: "What are you trying to achieve?"

Functional testing

User stories should generate functional tests, not unit tests. The spec should mirror the tests. Tools to use: FIT, RSpec, etc. Should be human readable (unlike JUnit :) ).

Instead of a Windows-based automatic test application, they used Fitnesse, which more people knew. It was easier to automate.

Closer integration between testers and developers. No artificial separation.

Technical tests

JUnit test from Hell: A 700-line test case, impossible to understand. When the process engine was removed, the test shrank to a fraction.

No code means no maintenance cost and no bugs.

Development environment

Use a pre-packaged Eclipse version with a number of plugins, etc.

Use VMWare to duplicate complicated environments.

Configuration management

Johannes described the simple application model they now use. It makes it really easy both to deploy and to roll back.

JavaZone 2007: Real-time web

Jonas Jacobi, a Swede with background from Oracle, has a new company called Kaazing which focuses on real-time solutions for the web.

Comet is a software concept that enables web servers to send data to the client program (normally a web browser) without having any need for the client to request it (quoting Wikipedia).

Polling vs streaming: Polling means more control for the client, but also a lot of load on the server. Streaming uses long-lived requests.

Traditional AJAX apps mean polling. The server has no way to notify the client if the client doesn't poll.
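The difference between the two styles can be shown with an in-process analogy. This is not Comet or Bayeux code and involves no HTTP; UpdateChannel and its methods are made-up names, and a BlockingQueue plays the role of the server holding a request open.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

class UpdateChannel {
    private final BlockingQueue<String> updates = new LinkedBlockingQueue<>();

    // Server side: something happened that clients care about.
    void publish(String update) {
        updates.add(update);
    }

    // Polling style: answer immediately, null when there is nothing new.
    // The client has to keep asking ("Are we there yet?").
    String pollOnce() {
        return updates.poll();
    }

    // Long-lived request style: the call is held open until an update
    // arrives or the timeout expires, so one request replaces many polls.
    String awaitUpdate(long timeoutMillis) {
        try {
            return updates.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

With pollOnce() the load on the server grows with how often clients ask; with awaitUpdate() it grows with how many requests are held open at once, which is why the connection limit in browsers matters for Comet.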

Jonas compared polling to a child asking "Are we there yet?" a gazillion times :)

Firewalls mean traditional client-server breaks. AJAX and Comet still work.

Comet means instant response. That's why traders normally use desktop solutions.

Most browsers (including IE and Firefox) will only open two connections to a server, which is a challenge for Comet. This means we need common request handling.

Both Jetty and Glassfish support the Bayeux protocol. Tomcat 6 will support Comet.

They are trying to get support for Comet in the servlet 3.0 spec. Comet is not currently a standard.

Jonas, with Enterprise Comet, considered translating Java source to JavaScript for the client, like in Google Web Toolkit, but there were upgrade issues, etc. They started Project Chai, a VM which dynamically translates Java class files to JavaScript on the fly, which means there are no static JavaScript files. The generated JS is then sent to the client.

Jonas showed a nice online poker demo in Firefox with Comet on the server.

The main benefit of Comet is the ability to scale (unlike traditional approaches to push).

They are considering open source (discussing with Terracotta and their experiences), but they have to please their investors. They don't necessarily want to make money on the Project Chai engine, but whatever they could build on top.

JavaZone 2007: The Sun JVM inside and out

Simon Ritter (Sun) took us through the myriad of options that can be used to change the behaviour of the JVM. From 140-something XX options in Java 1.4 to 532 for the newest Dolphin.

GC

UseTLAB: thread-local allocation buffers

TLABSize: size of the thread-local allocation buffers

UseParNewGC: parallel copy collector

Parallel compacting collector

Incremental CMS

UseParallelGC: parallel scavenge

Ergonomics

According to Simon, the heuristics the JVM uses include checking for 2 or more GBs of memory, 2 or more processors and whether or not the machine is running Windows :)

Thread priority policy

Increasing threading options for Java can, as Simon says, be "very, very good or very, very bad", depending on the other workload on the machine.

My vote for best named option: DontYieldALot. Which on Linux means DontYieldAtAll.

Biased locking

UseBiasedLocking: "lazy" lock/unlock for uncontended locks.

Tiered compilation

TieredCompilation: switching between the client and server JIT compiler. I recommend Alex Miller's Java 7 page for details.

Random things

Consider disabling explicit GC: Even though the Java docs say that calling System.gc() will cause a GC to happen some time in the future, in Sun Java it will actually run a GC every time. Who knew!

AggressiveOpt: More aggressive optimizations, may change between versions.

Use486InstrOnly: Use only old instructions
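The point about explicit GC above can be checked on whatever JVM you run. The sketch below uses the standard java.lang.management beans to count collections around a System.gc() call and to list the options the JVM was started with; the class name ExplicitGcProbe is made up, and the result depends on the JVM and its flags (explicit GC can be switched off with -XX:+DisableExplicitGC), so no particular output is promised.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

class ExplicitGcProbe {
    // Sum of collection counts over all collectors the JVM reports.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report a count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // The options this JVM was started with, e.g. any -XX flags.
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        long before = totalCollections();
        System.gc();
        long after = totalCollections();
        System.out.println("JVM arguments: " + jvmArguments());
        System.out.println("Collections triggered around System.gc(): " + (after - before));
    }
}
```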

DTrace

DTrace "injects instructions into your kernel, which is pretty scary."

Lots of Java probes available.

Proposal for new Java SE 7 functionality: possible to enter your own DTrace probes, which you can then turn on when you need them.

I'm sure I missed over half of the interesting options Simon mentioned, as I was a bit busy with other stuff. It was an interesting session since I've heard of some of these options, but had no idea there were so many. Nice explanations as well.

Simon Ritter's weblog

JavaZone 2007: Rational decisions when choosing methods and tools

"We humans like to analyze, yet we don't trust the result. We trust the result of our gut feeling, but not the process."

(Norwegian talk, any errors in the translation are entirely mine)

Magne Jørgensen is a research scientist at Simula Research. I've seen an earlier session of his on estimation, where he showed how wildly estimates can vary based on things like line spacing in the design document, etc. Interesting stuff.

Magne's method is based on the scientific method, made more practical for the real world. Similarly to his earlier session, it's about what our decisions are influenced by, both important and unimportant things. He's written about "Evidence-based software engineering", where you try to base your decisions on facts backed by evidence. This doesn't sound like rocket science, but it's harder than you think to not be influenced by inconsequential stuff. As humans, we rationalize.

The session started (as usual) with an exercise. Everyone in the session tries to answer some questions, which are used later in the session to illustrate the main points.

Use reliable sources. Google Scholar, not Google.

It's very easy to rely on anecdotal evidence instead of hard facts. If science says a medicine is ineffective, but your neighbour says it worked for him, it's easy to believe in it.

Don't trust reference customers: They are not a statistically valid sample.

Use benchmark tests: Allow the supplier to solve a problem for you. Pilot projects can be misleading, but better than nothing. Team members are often top motivated and highly skilled.

Magne used a Kent Beck paper as an example. He finds that people normally read this as prose, from beginning to end. He suggests that readers should skip e.g. the start of such a paper, which normally is about building relations with the reader and laying the groundwork for the writer's credibility. If a reader skips these parts and skims the rest for arguments and claims he is much more likely to not be influenced by unimportant things.

What's important is to avoid irrelevant information which can influence us, even though we think it doesn't.

JavaZone 2007: Farewell to the application server

"We have turned the app server inside out"

Despite some technical difficulties (both a malfunctioning PC and a projector; Murphy still lives), Johannes confidently took the stage and outlined his talk: Where he works, WebSphere was the chosen app server. However, his department discovered that the act of deploying apps was grinding development to a halt.

Johannes is lead software architect for BBS, an IT company owned by Norwegian banks. His approach seems like a pretty revolutionary way to do things, and to talk banking people into doing it this way must be kinda difficult.

Johannes introduced the concept of continuous deployment: continuously deploying and testing applications. He described the technical environment they used for this: Standard Java, Maven for build and deploy and Jetty, which is light-weight and integrates nicely with Maven.

The application model is really simple: They zip up all the code and the external jars and ship it over to a server. It's completely self-contained. They have their own Main class, which means it's really easy to start from an IDE or wherever.

They didn't dare to upgrade their WebSphere install, since this caused all kinds of old apps to stop working.

Is Jetty ready for deployment? Johannes quoted NetCraft numbers which showed that Jetty is almost as much used as Tomcat.

Bureaucracy regarding open source software. They use pre-approved licenses which have been vetted by the company lawyers. This made introducing Jetty a lot easier.

After ad-libbing his talk for 20 minutes, the demo was suddenly up and running! And there was much rejoicing.

He showed how they use one bash script (install.sh) which would pull down and install an app on a server. As I understand it, apps are fetched from the Maven artifact repository.

When installing apps, they simply change a 'curr' (ent) symlink and restart the app. This is an approach taken from Capistrano in the Rails world.
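The symlink switch is done in a bash script in Johannes' setup; the sketch below only shows the same idea with java.nio.file, assuming one directory per installed version next to the 'curr' link. The class and method names and the version numbers are hypothetical, the delete-then-create step is not atomic, and symlinks need a platform that supports them (so not the Windows case).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CurrentSymlink {
    // Repoints <appRoot>/curr at <appRoot>/<version>. Rolling back is the
    // same call with the previous version number.
    static void pointCurrentAt(Path appRoot, String version) {
        Path target = appRoot.resolve(version);
        Path link = appRoot.resolve("curr");
        try {
            Files.deleteIfExists(link); // removes the link itself, not the directory it points to
            Files.createSymbolicLink(link, target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small demo in a temp directory: install 1.0, then upgrade to 1.1,
    // and report which version directory 'curr' ends up pointing at.
    static String demo() {
        try {
            Path root = Files.createTempDirectory("app");
            Files.createDirectory(root.resolve("1.0"));
            Files.createDirectory(root.resolve("1.1"));
            pointCurrentAt(root, "1.0");
            pointCurrentAt(root, "1.1");
            return Files.readSymbolicLink(root.resolve("curr")).getFileName().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the old version directory is left untouched, the startup script can always launch from 'curr' and never needs to know a version number.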

An app has exactly one configuration file: this says which port to run the server on and which database to use. All other configuration elements are stored in the database.
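A configuration file that small can be read with java.util.Properties. The sketch below is an assumption about what such a file might look like: the keys server.port and db.url, the class AppConfig and the example JDBC URL are all made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the only two settings kept in the file.
class AppConfig {
    final int port;
    final String databaseUrl;

    AppConfig(int port, String databaseUrl) {
        this.port = port;
        this.databaseUrl = databaseUrl;
    }

    // Parses properties-format text; fails fast when a setting is missing.
    static AppConfig parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String port = props.getProperty("server.port");
        String url = props.getProperty("db.url");
        if (port == null || url == null) {
            throw new IllegalArgumentException("server.port and db.url are both required");
        }
        return new AppConfig(Integer.parseInt(port.trim()), url.trim());
    }
}
```

Everything else would then be looked up in the database that db.url points to, as described above.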

Debugging in the IDE is really easy; it's just a main() method.

They use the Maven app assembly plugin for packaging zip files and generating startup scripts. The zip file then contains everything needed to run the application, except the JVM.

cron is used for simple CI. One cron job runs every minute, checks for new versions, and installs and runs them. The entire approach seems really UNIX friendly; it relies on several UNIX features and plays well with them. Johannes later noted that the symlink approach is simply not used on Windows. On Windows they also have to install the app in the root directory so that the classpath won't be too long.

They chose Jetty over other app containers since they were familiar with it and they didn't need the more advanced features of app servers.

Check out Johannes' blog, I think he's written about these concepts there before.

These notes are taken live, so they may be rambling a bit; you'll have to humour me.

Tuesday, September 11, 2007

Blogging from JavaZone 2007

The moment is here: the annual JavaZone 2007 conference is just about to start here in Oslo (Norway) as of Wednesday, 12th of September. I'm volunteering as usual, but I'll try to blog as much as I can: about the sessions, the food and the feel of the conference.

The conference

JavaZone is one of the biggest Java conferences in Europe. This year, over 2000 tickets have been sold, maxing out Oslo Spektrum, the biggest music hall in town.

While a lot of the sessions are in Norwegian, there are a number of international speakers coming, like Jürgen Höller (Interface21), Cameron Purdy (Oracle), Linda DeMichiel (EJB spec lead), etc.

BEKK's own (inimitable) Aslak will give a session, as will Tom Bang, Trond Arve Wasskog, Thomas Heiberg, Anders Østen, Per Mengshoel, Eivind Waaler, Frithjof Frederiksen, Lars Lunde and probably some others I forgot. Quite a lot of sessions, anyway.

The sessions

I'll be filming the sessions in the room Gate 4, so I'll mostly follow the sessions there. One of the sessions I'm excited about is Johannes Brodwall's "Farewell to the application server", about going back to the basics and building apps without a complex Java EE environment.

Magne Jørgensen then has a session on choosing methods, techniques and tools. He is a researcher, and last year he held a very interesting session on estimation.

Trond Arve Wasskog has a session on simplicity, continuing in the vein of Johannes's session.

Also there are sessions on ServiceMix, Spring-OSGi, FIT and Comet.

We are just about to start, so stay tuned!

Monday, September 10, 2007

Benefits of optional type declarations

Ola Bini wonders if optional type declarations have a place in Ruby. He notes that Common Lisp programmers can use compiler directives to assist the compiler. This reminds me of a good, old Knuth quote (and not the one about premature optimization):

For some reason we all (especially me) had a mental block about optimization, namely that we always regarded it as a behind-the-scenes activity, to be done in the machine language, which the programmer isn't supposed to know.

...

The programmer [...] will write his beautifully-structured, but possibly inefficient, program P; then he will interactively specify transformations that make it efficient. Such a system will be much more powerful and reliable than a completely automatic one

--- D. E. Knuth, Structured programming with go to statements

The standard approach to performance problems in Ruby seems to be rewriting modules in C. What if we could have some of the performance gains of C in an easier-to-maintain Ruby dialect? I understand Squeak uses a similar approach, with much of the VM written in a Smalltalk subset. The code can then generate a binary image or be run in emulated mode on an existing Squeak instance.

The human side

Humans can also benefit from declaring types. While omitting type declarations is generally touted as an advantage of dynamic languages, some have commented that it can make code harder to read:

One day I found myself trying to follow some well-written Ruby code. I found the lack of type information on parameters made life difficult - I kept saying to myself 'what exactly do I have here?'

--- Martin Fowler, DynamicTyping

Other languages

It's interesting how the line between dynamically and statically typed languages increasingly blurs. Java now has several proposals for type inference. Python 3000 has a PEP for function annotations. It will be interesting to see how different language implementations continue to influence each other.