Friday, November 13, 2009

Do your Iterators always fail-fast?

The iterators of the Collection implementations in the Java runtime throw a ConcurrentModificationException[1] when they detect that the Collection has been structurally modified, typically by another thread, while a thread is iterating over it. Such iterators are generally called fail-fast iterators.

Looking at the implementation, one can see how this is made possible. The Collection subclass maintains an integer modCount that is incremented on every operation that structurally modifies the collection (like add, remove, clear). The fail-fast iterators also contain an integer field expectedModCount, which is initialized to the modCount when the iterator is created. Later on, at every step of the iteration, the iterator verifies that expectedModCount is the same as the modCount of the collection it is iterating over. A mismatch means that the collection has been modified during the life cycle of the iterator, and a ConcurrentModificationException is thrown. The same happens during the collection serialization process: the modification count prior to writing to the stream should be the same as the count after writing to the stream.
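To make the mechanism concrete, here is a minimal sketch of the same idea. TinyList and FailFastSketch are made-up names for illustration; this is not the JDK source, only the modCount / expectedModCount bookkeeping described above.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

public class FailFastSketch {

    /** A tiny fixed-capacity list of ints (hypothetical, for illustration only). */
    static class TinyList implements Iterable<Integer> {
        private final int[] items = new int[16];
        private int size;
        private int modCount; // bumped on every structural change; plain int, not volatile

        void add(int value) {
            items[size++] = value;
            modCount++;
        }

        @Override
        public Iterator<Integer> iterator() {
            return new Iterator<Integer>() {
                private int cursor;
                // Snapshot of the list's modCount taken when the iterator is created.
                private final int expectedModCount = modCount;

                @Override
                public boolean hasNext() {
                    return cursor < size;
                }

                @Override
                public Integer next() {
                    // A mismatch means the list changed after this iterator was created.
                    if (modCount != expectedModCount) {
                        throw new ConcurrentModificationException();
                    }
                    if (cursor >= size) {
                        throw new NoSuchElementException();
                    }
                    return items[cursor++];
                }
            };
        }
    }

    /** Returns true if a modification made mid-iteration is detected by the iterator. */
    static boolean failsFast() {
        TinyList list = new TinyList();
        list.add(1);
        list.add(2);
        Iterator<Integer> it = list.iterator();
        it.next();
        list.add(3); // structural change after the iterator was created
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails fast: " + failsFast());
    }
}
```

Note that this demo runs on a single thread, so the iterator always sees the latest modCount.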

But is this behavior guaranteed? Are these iterators always fail-fast? I used to think so, until I saw the declaration of the modCount field in the Collection implementations.

The modCount field is not marked as volatile. This means that while one thread makes structural changes to the collection and increments the modCount field, another thread that performs the iteration or serialization might not see the incremented value. It might end up comparing against a stale modCount that is still visible to it. As a result, the fail-fast iterator might not fail at the first opportunity and might fail only at a later point.
The field is non-volatile in Apache Harmony too.

A bug[2] in Sun Java explains this. It says the modCount was intentionally made non-volatile. Making it volatile would have added contention at every collection modification operation because of the memory fencing that’s required.

Also the Javadoc clearly says “fail-fast behavior cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification”.

This behavior would be the same even on the synchronized wrappers of these collections.
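A small sketch of why the wrappers do not change the picture: the iterator returned by a Collections.synchronizedList wrapper is the backing list's own fail-fast iterator, and iteration is not locked unless the caller does it. The class and method names here are made up for illustration; holding the wrapper's lock while iterating is the usage the Collections Javadoc asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

public class SynchronizedWrapperSketch {

    /** Sums the list while holding the wrapper's lock for the whole iteration. */
    static int sumSafely(List<Integer> syncList) {
        int total = 0;
        synchronized (syncList) { // writers going through the wrapper wait until we are done
            for (int value : syncList) {
                total += value;
            }
        }
        return total;
    }

    /** Shows that the wrapper still hands out a fail-fast iterator. */
    static boolean wrapperIteratorStillFailsFast() {
        List<Integer> syncList =
                Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 3)));
        try {
            for (Integer value : syncList) {
                syncList.add(value); // structural change during iteration
            }
            return false;
        } catch (ConcurrentModificationException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> syncList =
                Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 3)));
        System.out.println("sum: " + sumSafely(syncList));
        System.out.println("still fail-fast: " + wrapperIteratorStillFailsFast());
    }
}
```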


Wednesday, July 15, 2009

Before You Say "It's Not Working"

A few days back a couple of interns working in my office came up to me and said "It's not working". They were talking about a part of an application they had built. Their application was invoking an API given out by my team and it was throwing an error. They were invoking one of our Web Services through a client side Java API provided by us. Their app was running on Tomcat and was accessing our product, which runs on another Tomcat server. So they had access to both these servers. After we exchanged a few IM messages we solved the issue - but I am sure that if they face an issue again, they will not immediately ping me and ask "Why is it not working?".

After we figured out the issue, I was thinking of sending a note to the interns on the first step they need to take when they face any such issue, and I ended up writing this.

Check the Logs. Repeat - Check the Logs.

From what I have seen, 99.99% of issues get solved the moment you Check the Logs. And Yes, many people don't do that before they report the issue. In the present case, the WS client in the SDK was throwing an exception and it said "An Error occurred with the client". Seeing only this, they could not make out anything. But that was at the client side. The first thing anyone should do is check the server logs. Server logs in this case would mean the Application logs on the App Server where the service is hosted and the Application server's own logs.

Many times the logs are far too long to deduce anything from. They run into pages. And I have seen that this very fact demotivates any newbie who looks at them. In that case, if the issue can be replicated, I suggest -

  • Shutdown the server and client
  • Backup your logs
  • Cleanup the logs directory
  • Restart the servers and reproduce the issue

Now the logs are much shorter and more comfortable to read. It also helps if you can clearly separate the server start-up logs from the subsequent logging that happened during the issue replication. Now read them line by line and most of the time the issue will be solved. If not, read every line of the logs a few more times. Yes, a few more times.

Still unable to figure out?

Check the logging level

Every logging framework provides various logging levels that you can configure. Most of them have a "debug" logging mode which, when enabled, logs the most granular details. Enable this level of logging on the application and the server. Well written applications log their code flow at the debug level. Hence with this level of logging set, there is a very high chance that you get to the root of the problem.
Even if you are not able to figure out the issue after looking at the debug level logs, you have still helped the person who has to take a look at them. You have reduced the turnaround time. These debug logs will definitely help the API/Service developer who is going to look at your issue. He now has most of the information with him.
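As one example of what "enabling debug" looks like in code, here is a minimal sketch using java.util.logging, where the closest match to "debug" is Level.FINE. I am assuming JUL only for illustration (your application may use a different framework), and the logger name com.example.client is hypothetical.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class DebugLevelSketch {

    /** Raises a logger and a dedicated console handler to FINE. */
    static Logger enableDebug(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.FINE);

        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE); // handlers filter too; the console default is INFO
        logger.addHandler(handler);
        logger.setUseParentHandlers(false); // avoid printing each record twice
        return logger;
    }

    public static void main(String[] args) {
        Logger log = enableDebug("com.example.client"); // hypothetical logger name
        log.fine("marshalling request arguments"); // visible only at FINE or lower
        log.info("request sent");
    }
}
```

The detail that trips people up is that both the logger and its handler have a level; raising only one of them usually shows nothing new.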

Not everything on the Server - There's client too

Most Apps are like this - You write a client program that accesses a service running on a server. The Application that runs on the server also provides client side APIs that do the task of marshalling your arguments and passing them over the wire through SOAP/RMI etc. A whole lot of things happen between the time you invoke the client API and the time the message is sent on the wire (goes out of this VM). The same is the case when you get the return message - from the time the message comes through the wire till the time it's handed over to your client program, in the format it accepts. Typically these are marshalling/unmarshalling tasks carried out by the client side APIs. Now the concern is - what happens if something goes wrong here?

Most times we ignore the logging/trace on the client side. Yes, on the client side too you can configure the logging level. The way you do this depends on the logging framework used by the API you are accessing. Most times you would have to place a logging config properties file in the client classpath. The logging framework used by the client API loads this file and determines the level of logging and the appender/handler to which logs have to be written.
So once you have the client side logs, you would be able to figure out what really was sent from the client, what it received and other info related to marshalling/unmarshalling etc. One suggestion that I have here - In case you have access to both the client and server ensure that both of them have the same system time. With this, you could easily track the flow from client to server and the way back with respect to time.
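Here is a rough sketch of the properties-file approach, again assuming java.util.logging purely for illustration. To keep it self-contained the properties are held in a string instead of a real file on the classpath, and the logger name com.example.client is hypothetical. The mechanism for locating the file differs between frameworks.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.LogManager;
import java.util.logging.Logger;

public class ClientLoggingConfigSketch {

    // Stand-in for a logging properties file placed on the client classpath.
    static final String CONFIG =
            "handlers=java.util.logging.ConsoleHandler\n"
          + "java.util.logging.ConsoleHandler.level=FINE\n"
          + "com.example.client.level=FINE\n"; // hypothetical logger name

    /** Loads the configuration and returns the client logger it configures. */
    static Logger configure() {
        byte[] bytes = CONFIG.getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            LogManager.getLogManager().readConfiguration(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Logger.getLogger("com.example.client");
    }

    public static void main(String[] args) {
        Logger log = configure();
        log.fine("request payload marshalled"); // printed because the level is FINE
    }
}
```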

All of this really helps even when you are debugging an issue in something that you have developed or are developing yourself. It is something you can do before debugging your code by placing breakpoints or, even worse, by adding those System.out statements.

Too many threads?

Imagine the case where you have too many threads in your application that are doing their work simultaneously. The log records might be jumbled together. In such a scenario, to be able to uniquely identify and track the flow of one particular thread, you could have an identifier in the log record that contains the thread name.
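A minimal sketch of such an identifier, assuming java.util.logging: a custom Formatter that prefixes every record with the name of the logging thread. ThreadNameFormatter is a made-up class name. It relies on the handler formatting the record on the same thread that logged it, which is the case for the standard synchronous handlers but not for asynchronous ones.

```java
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;

public class ThreadNameFormatter extends Formatter {

    @Override
    public String format(LogRecord record) {
        // Assumption: format() runs on the thread that created the record.
        return String.format("[%s] %s %s%n",
                Thread.currentThread().getName(),
                record.getLevel(),
                formatMessage(record));
    }

    public static void main(String[] args) {
        Thread.currentThread().setName("worker-7"); // hypothetical thread name
        String line = new ThreadNameFormatter().format(new LogRecord(Level.INFO, "hello"));
        System.out.print(line);
    }
}
```

With this in place you can filter the log file on a single thread name and read that thread's flow in isolation.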

Logging only failures?

A lot of times when your application is running in production, it would not be a very good idea to log everything that every thread does on the server side. You would be interested in logging only the failures. In such a case, what you typically do is accumulate the log records for each thread and, at a later point of time, depending on the success or failure of that thread, decide whether to write its records to the log files or not. Note that these records stay in memory for the whole duration of the operation - and that could be quite expensive.
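The accumulate-then-decide idea can be sketched roughly like this. FailureOnlyLogSketch and its methods are made up for illustration and not tied to any logging framework; the "write" step is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;

public class FailureOnlyLogSketch {

    // One in-memory buffer per thread; this is the memory cost mentioned above.
    private static final ThreadLocal<List<String>> BUFFER =
            ThreadLocal.withInitial(ArrayList::new);

    /** Buffers a message for the current thread instead of writing it immediately. */
    static void log(String message) {
        BUFFER.get().add(message);
    }

    /**
     * Ends the current operation. Returns the buffered messages if it failed
     * (the caller would write these to the log file), or an empty list if it
     * succeeded. The buffer is cleared either way.
     */
    static List<String> finish(boolean failed) {
        List<String> toWrite = failed ? new ArrayList<>(BUFFER.get()) : new ArrayList<>();
        BUFFER.remove(); // important on pooled threads, which are reused across requests
        return toWrite;
    }

    public static void main(String[] args) {
        log("validating input");
        log("calling service");
        System.out.println("on success: " + finish(false));

        log("validating input");
        log("service returned an error");
        System.out.println("on failure: " + finish(true));
    }
}
```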


Most application servers have their own logging framework which logs the tasks performed by the server. In certain logging frameworks you need to tell the framework when to start and stop logging on a thread. So when the server assigns a particular thread from a pool to your request, it does the job of telling the framework to start the logging. So if you explicitly create threads on an application server, these threads might not log any information at all, even though your code has log statements.

Another important thing to note is that not all issues you encounter can be replicated even once more. A lot of times I have seen that the moment you change the logging level or add a new log statement, the issue goes away. Often this is the case with issues related to multi-threaded programs, where even one new instruction can change the code dynamics or timing.

Solved. What should I do?

If a piece of log information really helped you to better understand an issue, you have learned something very important. You too should "Log" in your application. Many times I end up spending more time determining what should be logged and at what levels than the time I take to write the code. After all, logging is an art.

Update: Also appeared at

Thursday, May 21, 2009

"My Location" feature on Non GPS-Enabled devices

These days I often use the routing and "Get Directions" features of Google Maps on my Windows Mobile. And Yes, it works in India too, to a great extent. One feature that always surprises me is "My Location" - auto-detecting my geographic location. Simply because my mobile does not have a GPS receiver.

So how does Google find out my location to a precision of 500 metres (sometimes 5 km when I go for a drive outside the city)? When I used to talk to people about this, the unanimous answer was "Google finds it from your mobile signal".
I too was under the assumption that Google Maps, through your handset, asks your mobile phone operator for the geographical co-ordinates of the tower from which it's catching the signal. Simple huh? But not.

On reading up, I found out that the mobile operator does not say anything about the location coordinates of your cell tower or your device. Google does a trick here.
Each cellular coverage area (remember the hexagon shaped cell) has a number attached to it (some kind of serial number to uniquely identify that hexagon shaped cell). Google then uses a database that maps this cell number to geographical coordinates.

But how does Google get the coordinates that fall under that cell?

To do that, Google uses other GPS-enabled devices that are active in your cell area. It collects data like geographical location and service cell ID from these anonymous devices. In the end, what Google has is an expanding database that maps each service cell ID to the set of coordinates that fall under that cell.
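Just to picture the idea, here is a toy sketch of such a database. This is not how Google actually implements it; every name here is made up, and averaging the reported coordinates is simply my assumption of the simplest possible estimate.

```java
import java.util.HashMap;
import java.util.Map;

public class CellLocationSketch {

    // Per cell ID: running {latitude sum, longitude sum, number of reports}.
    private final Map<Integer, double[]> cells = new HashMap<>();

    /** Records one anonymous report from a GPS-enabled device inside a cell. */
    void report(int cellId, double latitude, double longitude) {
        double[] acc = cells.computeIfAbsent(cellId, id -> new double[3]);
        acc[0] += latitude;
        acc[1] += longitude;
        acc[2] += 1;
    }

    /**
     * Estimates a location for a device that only knows its cell ID.
     * Returns {latitude, longitude}, or null if nobody has reported from that cell.
     */
    double[] estimate(int cellId) {
        double[] acc = cells.get(cellId);
        if (acc == null) {
            return null;
        }
        return new double[] { acc[0] / acc[2], acc[1] / acc[2] };
    }

    public static void main(String[] args) {
        CellLocationSketch db = new CellLocationSketch();
        db.report(42, 10.0, 20.0); // hypothetical cell ID and coordinates
        db.report(42, 12.0, 22.0);
        double[] here = db.estimate(42);
        System.out.println(here[0] + ", " + here[1]);
        System.out.println(db.estimate(7) == null ? "unknown cell" : "known cell");
    }
}
```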

So that's why "My Location" on a non-GPS enabled device is a bigger circle and Google says it's your location approximated to within some specific number of metres. It also explains why Google does not show it at all at some remote places when I travel on the highway outside the city. Looks like nobody has gone there and activated their GPS receiver and "helped" Google :-D

Monday, February 16, 2009

Browser - The new desktop?

Every day as I use Google Talk, I wonder "Why are they not coming up with a new version? It's been almost two years now and every month they add something new to the web based talk." Be it invisible mode, conference chat, video chat or even emoticons - the web gadget has it all.

Whenever I read about a new feature in the web version, I make up my mind to start using the talk gadget or Gmail chat. I log in through the gadget, start chatting with people, do my work in parallel, and at some point when I do an alt+tab to bring the browser to focus, I see my friends irritated - I have not seen their messages for the past 30 minutes :-(
Yes, those notifications are what I miss the most in the web version. If you minimize the browser while a different tab is in focus, you will not be able to see the blinking Google Talk tab when someone messages you.
For me, when it comes to instant messengers, the usability is much higher when it sits on the desktop.

And the need to use Google Talk - all my friends have moved to it. Otherwise I still prefer Yahoo! Messenger. Thank God I am not a Linux user - they don't even have the desktop version on Linux. Sigh.

After seeing Google Chrome, my doubts about Google ever releasing a new desktop GTalk version have gone up steeply. The new home page, the new tab page, the very idea of application shortcuts, the browser task manager etc. look a lot like Google's attempt to make the browser the new home of the PC.

Does Google consider the browser to be the next desktop? Sometimes it looks like it; but remember - Microsoft sells and will continue to.