Tess Ferrandez has written a great post describing the questions she asks when troubleshooting a problem. Her nine questions are not only a good way to create a problem description; they’re also a useful way to define the events that occur in your application, and to write an operations guide that describes those events and what to do when they occur. For example:
“What is happening?”
- What happened? The customer wish list feature was unable to retrieve the number of items in a customer’s wish list because of an unhandled exception: …
- When did the event occur? September 12, 2009
- What application did the event occur in? Web storefront
- Who was using the application when the event occurred? colincatjtleigh
“What did you expect to happen?”
- What was the application doing when the event occurred? Calling the Commerce Server 2007 orders API to retrieve the customer’s wish list basket.
“When is it happening?”
- What was the user doing when the event occurred? Loading a page that displays the quick cart link in the top navigation menu.
- …on what page? /Profile/MyAccount.aspx
- …on what server? EXT-WEB-12
- …in what environment? Production
- …etc? SiteName = “SweetDeal”; BasketId = {…}; BasketName = “WishList”
“When did it start happening?”
“How does this problem affect you?”
- How does the event affect the user or other users of the application? The user won’t see the number of items in their wish list in the quick cart link in the top navigation menu. The page will still load, however.
- How severe is the event? This is an error.
- What priority should be given to investigating and resolving the event relative to other events, and why? This should be medium priority. This may be a data issue affecting only this customer, but it may also be an issue with the orders subsystem that would affect other customers. This may prevent this customer from viewing, adding items to, or removing items from their wish list. This will not prevent this customer from using the core e-commerce functionality.
"What do you think the problem is? and what data are you basing this on?"
- What are some of the causes for the event? Invalid wish list basket data. The orders database being unavailable.
- What has historically caused the event? In most cases, this issue is caused by incorrect wish list basket data from old wish list baskets that were improperly migrated from R4.0 to R4.5.
"What have you tried so far?"
- What are some of the steps that should be taken when investigating the event? Confirm that the orders subsystem is functional. Export the customer’s wish list basket using the BASEXPORT utility. Inspect the exported basket to ensure it contains the correct data for a wish list basket.
- Where would information helpful or necessary to investigating the event be? The server event log. The customer’s wish list basket.
- What are the standard processes for resolving the event? If this is a data issue affecting only this customer, flush the customer’s wish list basket using the BASFLUSH utility.
"What is the expected resolution?"
- How can you determine whether the event has been resolved? The customer can add an item to their wish list, load a subsequent page, and see the number of items in their wish list in the quick cart link.
"Is there anything that would prohibit certain troubleshooting steps or solutions?"
It's easy to log events, but it's much harder to define events that are worth logging and that are useful to whoever has to act on them. By asking these questions as you define events, and by documenting the answers in the events and in an operations guide, you can make it much easier for others to operate and understand your application.
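As a rough illustration, here is a minimal Python sketch of what "documenting the answers in the events" could look like. Everything in it is an assumption made for the example: the class names `EventDefinition` and `EventOccurrence`, the field names, the event name `WishListCountFailed`, and the logger name `storefront` are all hypothetical, and the original storefront would have used its own platform's logging rather than Python. The idea is only that the answers you know in advance (severity, priority, causes, investigation steps) are written once, and the answers that vary (who, where, what context) are filled in when the event occurs.

```python
import json
import logging
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class EventDefinition:
    """Answers that are known when the event is defined (hypothetical shape)."""
    name: str
    severity: str
    priority: str
    user_impact: str
    possible_causes: tuple
    investigation_steps: tuple
    resolved_when: str


@dataclass
class EventOccurrence:
    """Answers that are only known when the event actually occurs."""
    definition: EventDefinition
    what_happened: str
    application: str
    user: str
    context: dict
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def to_record(self) -> dict:
        # Merge the predefined answers with the per-occurrence answers
        # into one flat, JSON-serializable record.
        record = asdict(self.definition)
        record.update(
            what_happened=self.what_happened,
            application=self.application,
            user=self.user,
            context=dict(self.context),
            occurred_at=self.occurred_at.isoformat(),
        )
        return record


# Defined once, alongside the operations guide entry for this event.
WISH_LIST_COUNT_FAILED = EventDefinition(
    name="WishListCountFailed",  # hypothetical event name
    severity="error",
    priority="medium",
    user_impact="The quick cart link will not show the wish list item count; "
                "the page will still load.",
    possible_causes=(
        "Invalid wish list basket data",
        "The orders database being unavailable",
    ),
    investigation_steps=(
        "Confirm that the orders subsystem is functional",
        "Export and inspect the customer's wish list basket",
    ),
    resolved_when="The customer can add an item to their wish list and see "
                  "the item count in the quick cart link.",
)

# Filled in at the point where the exception is caught.
occurrence = EventOccurrence(
    definition=WISH_LIST_COUNT_FAILED,
    what_happened="Unable to retrieve the number of items in the wish list",
    application="Web storefront",
    user="colincatjtleigh",
    context={
        "Page": "/Profile/MyAccount.aspx",
        "Server": "EXT-WEB-12",
        "Environment": "Production",
        "SiteName": "SweetDeal",
        "BasketName": "WishList",
    },
)

logging.getLogger("storefront").error(json.dumps(occurrence.to_record()))
```

Keeping the definition separate from the occurrence means the same text can feed both the log entry and the operations guide, so the two are less likely to drift apart.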
Cheers,
Colin