Nov. 24th, 2011

jennyst: Jenny on a photo of space (Default)
As many of you are already aware, the most recent AO3 deploy did not go as smoothly as we had hoped, and we’ve sometimes had issues with previous major releases. The big items are all fixed now, but it reminded me that I know a few places (both work projects at my day job and Dreamwidth) where we deal with similar issues. Here are a few ideas I have been thinking about, around the principles of managing incidents on an IT service.

Sometimes, when a technical group is trying to deal with a major problem or a code release that's gone wrong, management and task prioritisation are an issue. You have everyone putting out little fires with buckets, when what it actually needs is someone to go, "Wait, guys, this is a pretty big building and it's all on fire. I'm ringing the fire service; they have trucks with big hoses." But to do that, one person has to let go of a bucket in order to pick up the phone.

( The general part )

( The AO3 part )

Profile

Jenny S-T
