This was fixed in version 12.24 (back in 2021) according to the version history page[0], but the current version still uses "eval" in several places[1]. This seems like an unnecessarily dangerous approach – wouldn't it have been a good idea to fix all the instances in the codebase after the bug was discovered?
> In Perl, eval can be used with a block to trap exceptions which is why it was being used everywhere.
...which your search doesn't seem to exclude (I'm not signing up to Github to find out so I'm just guessing from your URL that you didn't exclude "eval {" or "eval q{")
It does still use eval on occasion and they seem happy with it.
* for their own controlled expressions to customise base parsing behaviour (which could be refactored to use function references instead, but it's not a case of evaluating external data)
* to test if modules can be imported (which you've excluded from your search)
and in one case in CanonRaw.pm, using eval on an expression that matches [0-9./]+ , I'm not sure it looks at external data but it might, and at best you could cause a divide-by-zero error (e.g. 1/0) or syntax error (e.g. ///) if you mess with the data it parses.
Your observation is proper in that it did not exclude the patterns you mentioned, but in this specific case not necessary because all 4 results are shaped the exact same way:
lib/Image/ExifTool/Exif.pm
#### eval Start ($valuePtr, $val)
my $newStart = eval($$subdir{Start});
unless (Image::ExifTool::IsInt($newStart)) {
#### eval Base ($start,$base)
$subdirBase = eval($$subdir{Base}) + $base;
}
lib/Image/ExifTool/MakerNotes.pm
#### eval Start ($valuePtr)
$newStart = eval($$subdir{Start});
}
#### eval Base ($start,$base)
my $baseShift = eval($$subdir{Base});
# shift directory base (note: we may do this again below
#### eval OffsetPt ($valuePtr)
$ifdOffsetPos = eval($$subdir{OffsetPt}) - $dirStart;
}
You're right that excluding "eval {" is not necessary because it only occurs in the already excluded subset, but that search link is missing a lot of matches since the space character in my pattern has turned into a plus sign in yours. Try this: https://sourcegraph.com/search?q=context%3Aglobal+repo%3A%5E...
There are also some cases (excluded in this search) where a charset parameter is passed around and eventually passed to eval in the LoadCharset function, e.g. from the RTF parser through the Recompose method. Not sure if that's always safe.
Catching exceptions from eval-ing untrusted code doesn't strike me as being a big help from security POV, or is there something else special about the block form of eval that helps here?
Indeed! Arbitrary text was smuggled into the eval() statement through a weak input sanitization step:
The second quote was not escaped because in the regex $tok =~ /(\\+)$/ the $ will match the end of a string, but also match before a newline at the end of a string, so the code thinks that the quote is being escaped when it’s escaping the newline.
This is why I hate using regexes, or deciphering other people's code which uses them. The syntax just isn't obvious. We need tooling that removes mental overhead, not adds to it. I've always found plain old procedural parsing code easier to formulate, trace and reason about.
I'm not some some fanatic preaching they should be banned from all languages, but I do feel there are places they're inappropriate and input sanitization is one of them. It's also good practice to centralize this sort of logic in some flavor of escape() function to reduce the whack-a-mole when bugs like this are found.
> "will match the end of a string, but also match before a newline at the end of a string."
Is it just me, or is this a poor description? I understood $ to be an anchor for end of line, not end of string, i.e. "match before a newline anywhere". The emphasis "newline at the end of a string" feels misleading; the exploit works because the matched newline isn't at the end of the string and there is exploit code after it.
> "I've always found plain old procedural parsing code easier to formulate, trace and reason about."
A regex is a domain specific language which lets you express a lot of computation in a small amount of code. At the extreme, you're saying "I can write a regex engine easier than I can write a regex". "I can write assembler easier than I can write C". "I can write a list of a thousand items quicker than I can write range(1000)". It doesn't make sense, and for anything more than trivial cases, I don't believe it.
(I'll agree that there are places they are inappropriate, expressions which aren't clear, and maybe the extra effort of doing it by hand is worth it in security sensitive situations, but I claim it is extra effort and less clear what a pile of ifs/loops/switch is trying to achieve vs "[a-f]{3}(\d)" or whatever).
The meaning of ^ and $ in perl regexes can be altered by modifiers at the end of the regex. A 'multi-line' regex, with a /m at the end, makes them match the start and end of any line.
The code doesn't use /m. So indeed the problem is that $ will match before \n at the end of the string the Perl interpreter is working with, which is not the whole thing containing the payload.
I've just worked through it to understand it; it's a more subtle exploit than I thought. My above comment about $ being end-of-line is wrong, the above comment saying you need the /m modifier for that to happen is correct.
In case anyone cares, the bug matches <backslash> <newline> <quote> then the regex in question matches <backslash> <newline> and the code logic is that one backslash must be escaping the next character from the input, but instead of adding on the next character it always adds a quote - after searching for a quote the next character must be a quote, right? But it wasn't, it was a newline, which was missed because of $ behaviour with newline at the end of a string. That acts to shift the quote one to the left <backslash> <quote> <newline> and now that makes an escaped quote which won't break eval() and the loop carries on and reads in the exploit text up to the real end quote.
----
Details: the Perl code takes this pattern in the input (the quotes are part of the input, in the file data):
"a\
""
six characters, describing a quoted string of four characters: <a> <backslash> <newline> <quote>
The Perl code finds the string starting quote and moves past it, and sets $tok empty to hold the quoted string content. Then it searches for the next quote (not the last, the next):
last Tok unless $$dataPt =~ /"/sg;
This will match at the quote after the newline. Then it substrings from the saved opening quote position to 1 before the found quote. So the substring includes the newline char:
# the closing quote position. (not including the quote).
$tok .= substr($$dataPt, $pos, pos($$dataPt)-1-$pos);
That gets <a> <backslash> <newline> and not the following <quote> <quote>
Then it does the odd-number-of-backslashes test on the substring <a> <backslash> <newline>:
last unless $tok =~ /(\\+)$/ and length($1) & 0x01;
And the regex matches for <backslash> <newline> instead of the intended <backslash> ENDOFSTRING so the code thinks there is an escaped character. It doesn't add the next character from the string into the token, it assumes the escaped character must be a quote and always adds a quote to $tok.
$tok .= '"'; # quote is part of the string
Effectively shifting the input quote to the other side of the newline, from:
"a\
""
to
"a\"
"
and in the exploit case:
"a\
"exploit code"
to
"a\"
exploit code"
Then the inner loop runs again and finds the <exploit code to closing quote> text and adds that on to $tok. Now there's a string with an escaped quote and some exploit code.
Exiftool is one of those open source tools which provides a lot of features just for free. It’s really amazing that people can work on something like this for so long.
I would be more interested in a "what I like about it" sentence over a preemptively pacifying "to each their own" tbh. The tool's own description doesn't compare against exiftool either, besides that it sounds like it does only images and not, e.g., documents
> Exiv2 is a C++ library and a command-line utility to read, write, delete and modify Exif, IPTC, XMP and ICC image metadata.
So how do I know if there is bug bounty available for vulnerabilities in exiftool? Or ghostscript? Or ffmpeg, openssl, gnutls, sox, or any number of other packages I may be using?
For open source libraries, you are usually better off finding a large company that uses it in some exposed way, then submitting it to their bug bounty.
Sometimes you can even collect the bounty multiple times by sending it to multiple companies, so long as the first one doesn't submit the fix before the second even looks at the report...
There is at least one bug bounty available for all open source projects.
I run a bug bounty where if you tell me about the high level details of a bug and it sounds interesting, I will buy you a drink (up to $20, so that's the bounty) in exchange for you telling me all the details.
All applications for my bug bounty program must be in person, feel free to take me up on it!
[0] https://exiftool.org/ancient_history.html#v12.24
[1] https://github.com/search?q=repo%3Aexiftool%2Fexiftool+%2Fev...