Technology Profile

Crash Recovery Improvements in Non-Volatile Main Memory (NVMM)

Description

Abstract

Researchers at the University of Central Florida and North Carolina State University have developed a way to reduce the execution time and write amplification associated with restoring data from non-volatile main memory (NVMM). Current crash recovery solutions use logging or checkpointing to provide failure safety to applications. However, these solutions were designed for systems with volatile main memory and non-volatile disks, not for NVM-based systems. As a result, they incur much higher execution time and write endurance overheads. Existing technologies also require specific hardware support or instruction set architecture (ISA) support to recover NVMM-stored data, and such support is not readily available in most machines today.

In comparison, the UCF technology provides unique data writing/backup methods to effectively use persistent main memory so that data recovery after a crash is faster and more accurate. The approach uses two main methods: recomputation and lazy persistency (LP). This combination avoids the need to expend large amounts of energy to rewrite lost data. Companies can rewrite their software and run it on any hardware platform to obtain the system recovery benefits.

Technical Details

The UCF invention comprises methods for accelerating program execution on NVM while at the same time reducing the number of writes. Included are steps for organizing a set of instructions into multiple regions. At least one of the regions is a recovery unit, and another is an error checking unit. The recovery unit includes written data to be transferred to NVMM, while the error checking unit summarizes the written data into a value.
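The region structure described above can be illustrated with a minimal Python sketch. It is not the invention's implementation: the names `Region`, `make_regions`, `summarize`, and `run_region` are hypothetical, CRC32 is an assumed choice for the summarizing value, and plain dictionaries stand in for NVMM.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Region:
    """One recovery unit: a contiguous slice of loop iterations (hypothetical layout)."""
    start: int
    stop: int


def make_regions(n_items: int, region_size: int) -> list[Region]:
    """Split n_items loop iterations into fixed-size regions."""
    return [Region(s, min(s + region_size, n_items))
            for s in range(0, n_items, region_size)]


def summarize(values: list[int]) -> int:
    """Error checking unit: fold the written data into one value.

    CRC32 is an assumption made for this sketch.
    """
    payload = b"".join(v.to_bytes(8, "little", signed=True) for v in values)
    return zlib.crc32(payload)


def run_region(region, inputs, data_store, checksum_store):
    """Execute one region: write its results, then record the summary value."""
    written = []
    for i in range(region.start, region.stop):
        result = inputs[i] * inputs[i]   # stand-in computation
        data_store[i] = result           # plain store, no explicit flush
        written.append(result)
    checksum_store[(region.start, region.stop)] = summarize(written)


inputs = list(range(10))
data_store, checksum_store = {}, {}   # dictionaries standing in for NVMM
regions = make_regions(len(inputs), region_size=4)
for r in regions:
    run_region(r, inputs, data_store, checksum_store)
```

Keeping one summary value per region means a later check only has to compare a region's stored data against a single number to decide whether that region persisted completely.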

One key aspect of the invention relaxes the data consistency requirements of logging and checkpointing schemes. It allows data to be in an inconsistent state during some phases of a program's lifetime and logs only enough state to enable recomputation. When a failure occurs, the approach determines which parts of the computation were incomplete and recomputes them to return to a consistent state.

Another aspect is the use of LP, a software persistency method. LP exploits natural cache evictions to provide persistency without eagerly flushing cache blocks to the NVMM. Dirty blocks (that is, modified data not yet written back) reach the NVMM gradually as the cache evicts them. Software error detection mechanisms (checksums) enable the system to discover persistency failures. Compared to the state-of-the-art Eager Persistency technique, LP reduces the execution time and write amplification overheads from 9 percent and 21 percent to only 1 percent and 3 percent, respectively.
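The interplay of lazy persistency, checksums, and recomputation can be sketched as a toy simulation in Python. Everything here is an assumption made for illustration: the `ToyMemory` class, its FIFO eviction policy, the region size, the cache capacity, and CRC32 as the checksum are hypothetical and do not model real cache hardware or the actual invention.

```python
import zlib
from collections import OrderedDict

REGION_SIZE = 4      # iterations per region (assumed)
CACHE_CAPACITY = 6   # dirty entries the toy cache holds before evicting (assumed)


def compute(x: int) -> int:
    return x * x + 1   # stand-in for the real loop body


def checksum(values) -> int:
    payload = b"".join(v.to_bytes(8, "little") for v in values)
    return zlib.crc32(payload)


class ToyMemory:
    """Toy model: writes land in a volatile cache and reach NVMM only on eviction."""

    def __init__(self):
        self.cache = OrderedDict()
        self.nvmm = {}

    def store(self, key, value):
        self.cache[key] = value
        self.cache.move_to_end(key)
        while len(self.cache) > CACHE_CAPACITY:
            old_key, old_val = self.cache.popitem(last=False)  # natural eviction
            self.nvmm[old_key] = old_val

    def crash(self):
        self.cache.clear()   # dirty blocks that were never evicted are lost


def run(mem, inputs):
    """Normal execution: no flushes, just stores plus one checksum per region."""
    for start in range(0, len(inputs), REGION_SIZE):
        stop = min(start + REGION_SIZE, len(inputs))
        results = [compute(inputs[i]) for i in range(start, stop)]
        for i, r in zip(range(start, stop), results):
            mem.store(("data", i), r)
        mem.store(("sum", start), checksum(results))


def recover(mem, inputs):
    """Recompute every region whose persisted data does not match its checksum."""
    recomputed = []
    for start in range(0, len(inputs), REGION_SIZE):
        stop = min(start + REGION_SIZE, len(inputs))
        data = [mem.nvmm.get(("data", i)) for i in range(start, stop)]
        saved = mem.nvmm.get(("sum", start))
        ok = None not in data and saved is not None and checksum(data) == saved
        if not ok:
            results = [compute(inputs[i]) for i in range(start, stop)]
            for i, r in zip(range(start, stop), results):
                mem.nvmm[("data", i)] = r   # recovery writes straight to NVMM
            mem.nvmm[("sum", start)] = checksum(results)
            recomputed.append(start)
    return recomputed


inputs = list(range(12))
mem = ToyMemory()
run(mem, inputs)
mem.crash()
redone = recover(mem, inputs)
print("regions recomputed (by start index):", redone)
```

In this toy run the first region was fully evicted to NVMM before the crash and passes its checksum, while the two later regions still had data or checksums in the cache and are recomputed. The normal execution path never flushes explicitly; the cost of a failure is paid only at recovery time.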

Stage of Development

Prototype available.

Benefit

Provides near-zero execution time overhead and write endurance overhead

Works on any hardware platform without requiring any changes to the hardware and ISA

Eliminates the need for additional writes to NVMM while maintaining write endurance

Market Application

Emerging NVMs

Software development at the library levels to provide failure recovery to existing code

Loop-based kernels used in scientific computing

Research Terms

Engineering; Information Technology; Physical Sciences; Electronic Equipment and Devices

Researchers

Contact Information

Raju Nagaiah

raju@ucf.edu

(407) 882-0593
