Abstract
Researchers at the University of Central Florida and North
Carolina State University have developed a way to reduce the execution time and
write amplification associated with restoring data from non-volatile main
memory (NVMM). Current crash recovery solutions use logging or checkpointing to
provide failure safety to applications. However, these solutions were designed
for systems with volatile main memory and non-volatile disks, not for NVMM-based
systems. As a result, they incur high execution time and write endurance
overheads when applied to NVMM. Other existing approaches require specific
hardware support or instruction set architecture (ISA) support to recover
NVMM-stored data, and such support is not readily available in most machines
today.
In contrast, the UCF technology provides data writing and backup methods that
make effective use of persistent main memory so that data recovery after a
crash is faster and more accurate. The approach combines two methods:
recomputation and lazy persistency (LP). This combination avoids the large
amounts of energy otherwise spent rewriting lost data. Because the approach
requires no hardware changes, companies can modify their software and run it on
any hardware platform to obtain these recovery benefits.
Technical Details
The UCF invention comprises methods for accelerating program execution on NVMM
while reducing the number of writes to it. The methods include steps for
organizing a set of instructions into multiple regions. At least one of the
regions is a recovery unit, and another is an error checking unit. The recovery
unit contains the written data to be transferred to NVMM, while the error
checking unit summarizes that written data into a single value, such as a
checksum.
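The region organization described above can be illustrated with a small Python
sketch. It assumes a simple loop-based kernel (the hypothetical `kernel`
function), fixed-size regions, and a 32-bit modular sum as the checksum; the
invention's actual region boundaries and checksum function are not specified
here. Each region's output slice stands in for a recovery unit, and its
checksum stands in for the matching error checking unit.

```python
MASK = 0xFFFFFFFF  # keep checksums to 32 bits (assumption)


def checksum(values):
    """Summarize a region's written data into one value (modular-sum stand-in)."""
    total = 0
    for v in values:
        total = (total + v) & MASK
    return total


def kernel(x):
    """Hypothetical per-element computation of a loop-based kernel."""
    return 3 * x + 1


def run_regions(src, region_size):
    """Split the loop over src into fixed-size regions.

    Each region's output slice plays the role of a recovery unit (data bound
    for NVMM), and its checksum plays the role of the error checking unit.
    """
    dst = [0] * len(src)
    checksums = []
    for start in range(0, len(src), region_size):
        end = min(start + region_size, len(src))
        for i in range(start, end):
            dst[i] = kernel(src[i])
        checksums.append(checksum(dst[start:end]))
    return dst, checksums


dst, sums = run_regions(list(range(8)), 4)
print(dst)   # [1, 4, 7, 10, 13, 16, 19, 22]
print(sums)  # [22, 70]
```

Here eight loop iterations become two regions of four, each paired with one
summary value; smaller regions mean less recomputation after a crash but more
checksum values to store.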
One key aspect of the invention relaxes the data consistency requirements of
logging and checkpointing schemes. Instead of keeping data consistent at all
times, it allows data to be in an inconsistent state during some phases of a
program's lifetime and logs only enough state to enable recomputation. When a
failure occurs, the approach recovers to a consistent state by determining which
parts of the computation were incomplete and then recomputing them. Another
aspect is the use of LP, a software persistency method. LP exploits natural
cache evictions to provide persistency without eagerly flushing cache blocks
from the cache to the NVMM. The technique therefore lets caches lazily write
dirty blocks (that is, modified data not yet saved to NVMM) back to the NVMM
through natural evictions. Software error detection mechanisms (checksums)
enable the system to discover persistency failures, that is, data that had not
reached NVMM when the crash occurred. Compared to the state-of-the-art Eager
Persistency technique, LP reduces the execution time and write amplification
overheads from 9 percent and 21 percent to only 1 percent and 3 percent,
respectively.
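The interplay of lazy persistency and recomputation can be modeled with a toy
Python simulation. Everything here is a hypothetical model rather than the
invention's implementation: `LazyPersistentMemory` stands in for a volatile
cache in front of NVMM, `evict` models a natural eviction, `crash` discards
unpersisted data, and the `UNWRITTEN` sentinel marks checksum slots that never
reached NVMM. No flush is issued after any store. After a crash, `recover`
compares each region's persisted data against its persisted checksum and
recomputes only the regions that fail the check.

```python
MASK = 0xFFFFFFFF  # 32-bit checksums (assumption)
UNWRITTEN = -1     # hypothetical sentinel: never equal to a valid checksum


def checksum(values):
    """Modular sum used as a stand-in for the real checksum function."""
    total = 0
    for v in values:
        total = (total + v) & MASK
    return total


def kernel(x):
    """Hypothetical per-element computation of a loop-based kernel."""
    return 3 * x + 1


def bounds(r, region_size, n):
    """Index range [start, end) covered by region r."""
    return r * region_size, min((r + 1) * region_size, n)


class LazyPersistentMemory:
    """Toy model: stores sit in a volatile cache until naturally evicted to NVMM."""

    def __init__(self, data_words, num_regions):
        # NVMM holds the output array followed by one checksum slot per region.
        self.nvmm = [0] * data_words + [UNWRITTEN] * num_regions
        self.cache = {}  # volatile dirty blocks: address -> value

    def store(self, addr, value):
        # Lazy persistency: no flush or fence after the store.
        self.cache[addr] = value

    def evict(self, addr):
        # A natural eviction writes the dirty block back to NVMM.
        if addr in self.cache:
            self.nvmm[addr] = self.cache.pop(addr)

    def crash(self):
        # Power failure: anything still in the cache is lost.
        self.cache.clear()


def execute(mem, src, region_size, regions):
    """Run the given regions, storing outputs and checksums without flushing."""
    n = len(src)
    for r in regions:
        start, end = bounds(r, region_size, n)
        outs = [kernel(src[i]) for i in range(start, end)]
        for i, v in zip(range(start, end), outs):
            mem.store(i, v)
        mem.store(n + r, checksum(outs))


def recover(mem, src, region_size):
    """Recompute every region whose persisted data fails its persisted checksum."""
    n = len(src)
    num_regions = (n + region_size - 1) // region_size
    redone = []
    for r in range(num_regions):
        start, end = bounds(r, region_size, n)
        if checksum(mem.nvmm[start:end]) != mem.nvmm[n + r]:
            outs = [kernel(src[i]) for i in range(start, end)]
            # Recovery persists its results directly (eagerly) in this model.
            mem.nvmm[start:end] = outs
            mem.nvmm[n + r] = checksum(outs)
            redone.append(r)
    return redone


if __name__ == "__main__":
    src = list(range(8))
    mem = LazyPersistentMemory(len(src), 2)
    execute(mem, src, 4, [0, 1])
    # Before the crash, region 0 and its checksum (address 8) happened to be
    # evicted, but only part of region 1 was.
    for addr in [0, 1, 2, 3, 8, 4, 5]:
        mem.evict(addr)
    mem.crash()
    print(recover(mem, src, 4))  # [1]
    print(mem.nvmm[:8])          # [1, 4, 7, 10, 13, 16, 19, 22]
```

In this scenario, region 0 reached NVMM intact and passes its check, while
region 1 lost part of its data and its checksum, so only region 1 is
recomputed. The sentinel matters: without it, a region whose data and checksum
both never persisted could appear consistent because the checksum of the
initial zeros would match an initial zero slot.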
Stage of Development
Prototype available.
Benefit
Provides near-zero execution time and write endurance overheads
Works on any hardware platform without requiring changes to the hardware or ISA
Minimizes additional writes to NVMM, preserving write endurance
Market Application
Emerging NVMs
Software development at the library level to provide failure recovery to existing code
Loop-based kernels used in scientific computing