Software Fault Tolerance Techniques and Implementation

Product Description
This innovative resource provides the most comprehensive coverage of software fault tolerance techniques to guide professionals through design, operation and performance. It features an in-depth discussion on the advantages and disadvantages of specific techniques, so practitioners can decide which ones are best suited for their work. The book examines key programming techniques such as assertions, checkpointing, and atomic actions, and provides design tips and mode…

1 reply
  1. Dmitry Dvoinikov says:

    Read this book.

    I mean, read this book if you need to get your hands on software fault-tolerance, one of the big things that make the software you write better.

    The book shows you the techniques you could use so that your programs keep working in the presence of failures. In a few words, it’s all about having redundancy in software: different variants of software components doing the same thing, which then somehow reach consensus on what’s right and what’s not.

    The basics are simple; it took me two days in the middle of reading to write a solid class framework in Python covering most (if not all) of the discussed techniques. And you know what? It worked!

    On the other hand, the basics may be simple, but the details are not. There are quite a few interesting details that you wouldn’t guess beforehand.

    The author finds the right balance of explaining the stuff so that it’s simple to understand but detailed, mathematically rigorous and technical enough.

    The author wrote her doctoral thesis on the subject. The funny thing, though, is that the section covering her thesis research area (5.3.2. Two Pass Adjudicators and Multiple Correct Results) explicitly leaves the topic unexplored. It starts like “there are 9 kinds of this stuff, let’s discuss the first” and you sit deeper in your chair, ready for a big thing to come, but after explaining the first flavour it suddenly goes “ok, you’ve had it, the rest can be found in [reference]”, where the reference points to the thesis. This is not bad, because you’ve already got the basic idea; it’s simply amusing.

    The book is too redundant to be read from its first page to the last, and this fact is clearly mentioned in the introduction. After reading about half of it I found myself skipping pages, saying “yeah, yeah, I got it already”. For example, among the, say, 30 techniques discussed, each can be such and such, and then such and such, leaving 4 buckets into which each individual technique can fall. Moreover, these taxonomies correlate, making just about 3 buckets. But each technique’s advantages, performance problems etc. are described in full again and again, so it goes like “A is B and as such it inherits all B’s pros and cons”. Simple, but it’s two pages’ worth of text and a large table. It may be good for reference, but not for thorough reading.

    Same thing with samples: the sample for a simple technique could be a couple of pages of text plus a flowchart diagram plus a table with a state diagram. Very very very very very clear already. But then again, it’s pleasant to have such a clear book for a change.

    Anyhow, if you are a software developer interested in building high quality software (as you should), and you need an introduction to software fault tolerance, this book should not be missed.
    Rating: 5 / 5
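
The redundancy-plus-consensus idea described in the review can be illustrated with a minimal Python sketch of a majority voter over independent variants. This is an illustrative assumption, not code from the book or from the reviewer's framework: the names `run_with_majority_vote`, `NoConsensus`, and the three integer square root variants are hypothetical, and results are assumed to be hashable so they can be counted.

```python
import math
from collections import Counter


class NoConsensus(Exception):
    """Raised when no single result wins a strict majority of the variants."""


def run_with_majority_vote(variants, x):
    """Run every variant on x and return the result a strict majority agrees on.

    Hypothetical helper for illustration only. Results must be hashable.
    """
    results = []
    for variant in variants:
        try:
            results.append(variant(x))
        except Exception:
            # A variant that crashes simply casts no vote.
            continue
    if not results:
        raise NoConsensus("every variant failed")
    value, votes = Counter(results).most_common(1)[0]
    # Require more than half of ALL variants, including the crashed ones.
    if votes * 2 <= len(variants):
        raise NoConsensus(f"best result {value!r} got only {votes} vote(s)")
    return value


def isqrt_loop(n):
    """Variant 1: integer square root by linear search."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


def isqrt_stdlib(n):
    """Variant 2: integer square root from the standard library."""
    return math.isqrt(n)


def isqrt_faulty(n):
    """Variant 3: deliberately wrong, to simulate a software fault."""
    return n // 2


if __name__ == "__main__":
    print(run_with_majority_vote([isqrt_loop, isqrt_stdlib, isqrt_faulty], 49))
```

For the input 49, the two correct variants agree on 7 and outvote the faulty variant's 24. If all three variants disagreed, the voter would raise `NoConsensus` instead of guessing, which is one simple way to decide "what's right and what's not".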
