Recovery-Oriented Computing (ROC) was started as a joint Berkeley/Stanford project to investigate novel techniques for building highly dependable Internet services. Contrary to traditional fault-tolerance approaches in distributed systems, ROC does not assume that failures can be entirely avoided or predicted. Its philosophy is to accept that failures will happen and to be well prepared to recover from them quickly, emphasizing fast recovery over complete failure avoidance.
In large distributed software systems, failures are inevitable, and many recovery techniques do not require knowing the exact cause of a fault or error. A simple way to recover quickly is to restart the affected components of the system: the server, the service, or individual Enterprise JavaBeans (EJBs). The goal of ROC is to create dependable systems and to reach higher levels of fault tolerance and scalability. The two principles of ROC are therefore:
- Microreboot - selective restart of small system parts
- System-Level Undo - reconfiguration by rollback recovery
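
The two principles can be illustrated with a minimal sketch. It assumes a simple component hierarchy (bean inside service inside server) and an in-memory configuration store. All names here (`Component`, `microreboot`, `UndoableConfig`) are hypothetical and chosen for illustration; they are not taken from the ROC project's own implementation.

```python
_MISSING = object()  # sentinel: the key did not exist before a change


class Component:
    """Hypothetical restartable unit (a bean, a service, or a whole server)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.restarts = 0

    def restart(self):
        # A real restart would discard in-memory state; here we only count it.
        self.restarts += 1


def microreboot(component, is_recovered):
    """Restart the smallest unit first; escalate to the parent only if needed.

    Returns the names of the restarted components, in order.
    """
    restarted = []
    current = component
    while current is not None:
        current.restart()
        restarted.append(current.name)
        if is_recovered(current):
            break
        current = current.parent
    return restarted


class UndoableConfig:
    """Configuration store that logs old values so changes can be rolled back."""

    def __init__(self, initial):
        self.values = dict(initial)
        self._undo_log = []

    def set(self, key, value):
        # Record the previous value (or its absence) before overwriting.
        self._undo_log.append((key, self.values.get(key, _MISSING)))
        self.values[key] = value

    def undo(self):
        """Roll back the most recent change; return False if nothing is logged."""
        if not self._undo_log:
            return False
        key, old = self._undo_log.pop()
        if old is _MISSING:
            self.values.pop(key, None)
        else:
            self.values[key] = old
        return True


# Usage: assume the fault clears only once the service level is restarted.
server = Component("server")
service = Component("service", parent=server)
bean = Component("bean", parent=service)
print(microreboot(bean, lambda c: c.name == "service"))
```

In this sketch the restart escalates from the bean to the service and stops there, so the server itself is never restarted. The undo log works the same way at the configuration level: each change can be reverted individually, most recent first, without knowing why the change caused a problem.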