On July 18, 2022, Apache Spark published a security bulletin describing a shell command injection vulnerability (CVE-2022-33891). The severity is rated Important. Security researcher Kostya Kortchinsky (Databricks) is credited with reporting the flaw.
Spark is a unified analytics engine for large-scale data processing. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing. Proof-of-concept exploit code for this vulnerability is available on GitHub.
Affected version
Apache Spark versions 3.0.3 and earlier, versions 3.1.1 to 3.1.2, and versions 3.2.0 to 3.2.1
Solution
We recommend that users upgrade Apache Spark to version 3.1.3, 3.2.2, 3.3.0, or later as soon as possible to fix CVE-2022-33891.
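The version ranges above can be checked programmatically when auditing a fleet of clusters. The following is a minimal sketch, not an official tool: the names `parse_version`, `AFFECTED_RANGES`, and `is_affected` are hypothetical, and the ranges are copied literally from the advisory text, so a version the advisory does not list (for example 3.1.0) is reported as not affected.

```python
import re

def parse_version(text):
    """Turn a string such as '3.1.2' into (3, 1, 2); missing parts count as zero."""
    numbers = re.findall(r"\d+", text)[:3]
    if not numbers:
        raise ValueError(f"no version number found in {text!r}")
    parts = [int(n) for n in numbers]
    return tuple(parts + [0] * (3 - len(parts)))

# Inclusive ranges taken from the "Affected version" list in this post.
AFFECTED_RANGES = [
    ((0, 0, 0), (3, 0, 3)),  # 3.0.3 and earlier
    ((3, 1, 1), (3, 1, 2)),  # 3.1.1 to 3.1.2
    ((3, 2, 0), (3, 2, 1)),  # 3.2.0 to 3.2.1
]

def is_affected(version_text):
    """Return True if the version falls inside one of the advisory's ranges."""
    version = parse_version(version_text)
    return any(low <= version <= high for low, high in AFFECTED_RANGES)

if __name__ == "__main__":
    for candidate in ["2.4.8", "3.1.2", "3.1.3", "3.2.1", "3.3.0"]:
        status = "affected" if is_affected(candidate) else "not affected"
        print(candidate, status)
```

Comparing tuples of integers avoids the string-comparison trap where "3.10.0" would sort before "3.2.0".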
Vulnerable component
http://localhost:8080/?doAs=`[command injection here]`
How does it work?
The command injection occurs because Spark checks the group membership of the user passed in the ?doAs parameter by running a raw Linux command.
Commands injected through the ?doAs parameter produce no output on the returned page, so this is a blind OS command injection. Your commands run, but nothing indicates whether they succeeded, or even whether the program you are calling exists on the target.
A value passed in the ?doAs URL parameter triggers a background Linux bash process: Spark builds cmdSeq and runs bash -c with the command line id -Gn followed by the supplied value. A bash process running id -Gn is therefore a good indicator that your server is vulnerable, or that it has already been compromised.
If an attacker sends a reverse shell command, there is a high chance that they will gain access to the Apache Spark server from their own machine.
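private def getUnixGroups(username: String): Set[String] = {
  // Vulnerable version: username is concatenated into a string that bash -c interprets
  val cmdSeq = Seq("bash", "-c", "id -Gn " + username)
  // we need to get rid of the trailing "\n" from the result of command execution
  Utils.executeAndGetOutput(cmdSeq).stripLineEnd.split(" ").toSet
  // Patched version (replaces the two statements above in the linked fix):
  // id is invoked directly and username is passed as a single argument, so no shell parses it
  // Utils.executeAndGetOutput(idPath :: "-Gn" :: username :: Nil).stripLineEnd.split(" ").toSet
}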
Vulnerable source code: https://github.com/apache/spark/pull/36315/files#diff-96652ee6dcef30babdeff0aed66ced6839364ea4b22b7b5fdbedc82eb655eeb5L41
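The difference between the vulnerable and the patched call can be reproduced outside Spark. The sketch below is an illustration under stated assumptions rather than Spark's code: it uses a harmless echo in place of id -Gn, the function names `run_via_shell` and `run_as_argument` are hypothetical, and it expects a POSIX system with echo on the PATH (bash is used when present, otherwise sh).

```python
import shutil
import subprocess

# Use bash when present, as the vulnerable Spark code did; fall back to sh otherwise.
SHELL = shutil.which("bash") or "sh"

def run_via_shell(value):
    """Vulnerable pattern: the value is pasted into a string that a shell interprets."""
    result = subprocess.run(
        [SHELL, "-c", "echo " + value],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")

def run_as_argument(value):
    """Patched pattern: the value is a single argv entry and no shell parses it."""
    result = subprocess.run(
        ["echo", value],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.rstrip("\n")

if __name__ == "__main__":
    payload = "`echo INJECTED`"      # backticks request command substitution
    print(run_via_shell(payload))    # the shell executes the inner command
    print(run_as_argument(payload))  # the backticks are treated as plain text
```

In the first call the shell evaluates the backticks before echo ever runs, which is the same mechanism that lets a ?doAs value execute arbitrary commands. In the second call the value reaches the program untouched.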
Detection & Response:
Splunk:
index=* c-uri="*?doAs=`*"
index=* (Image="*\\bash" AND (CommandLine="*id -Gn*"))
QRadar:
SELECT UTF8(payload) from events where LOGSOURCENAME(logsourceid) ilike '%Linux%' and "Image" ilike '%\bash' and ("Process CommandLine" ilike '%id -Gn%')
SELECT UTF8(payload) from events where "URL" ilike '%?doAs=`%'
Elastic Query:
url.original:*?doAs\=`*
(process.executable:*\\bash AND process.command_line:*id\ \-Gn*)
Carbon Black:
(process_name:*\\bash AND process_cmdline:*id\ \-Gn*)
FireEye:
(process:`*\bash` args:`id -Gn`)
Graylog:
(Image.keyword:*\\bash AND CommandLine.keyword:*id\ \-Gn*)
c-uri.keyword:*?doAs=`*
RSA NetWitness:
(web.page contains '?doAs=`')
((Image contains 'bash') && (CommandLine contains 'id -Gn'))
Logpoint:
(Image="*\\bash" CommandLine IN "*id -Gn*")
c-uri="*?doAs=`*"
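For environments without one of the platforms above, the two indicators behind these queries (a doAs value that starts with a backtick, and a bash process whose command line contains id -Gn) can be approximated in a small script. This is a rough sketch, not a replacement for the queries: the function names and the shape of the input are assumptions, the URI check also looks at the URL-decoded form (which the queries above do not), and the process check compares only the executable's base name so that it works with either path separator.

```python
import re
from urllib.parse import unquote

# "doAs=" directly followed by a backtick, after "?" or "&" in the query string.
DOAS_BACKTICK = re.compile(r"[?&]doAs=`")

def uri_is_suspicious(uri):
    """True if the raw or URL-decoded URI carries doAs= followed by a backtick."""
    return any(DOAS_BACKTICK.search(form) for form in (uri, unquote(uri)))

def process_is_suspicious(image, command_line):
    """True if the executable is named bash and its command line contains 'id -Gn'."""
    base_name = re.split(r"[\\/]", image)[-1]
    return base_name == "bash" and "id -Gn" in command_line

if __name__ == "__main__":
    # Hypothetical log records for illustration only.
    print(uri_is_suspicious("/?doAs=`id`"))
    print(uri_is_suspicious("/?doAs=alice"))
    print(process_is_suspicious("/usr/bin/bash", "bash -c id -Gn `id`"))
```

Field names such as Image, CommandLine, and c-uri differ between log sources, so the values passed to these functions need to be mapped from whatever the local pipeline records.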
Source/References:
https://github.com/apache/spark/pull/36315/files#diff-96652ee6dcef30babdeff0aed66ced6839364ea4b22b7b5fdbedc82eb655eeb5L41
https://github.com/HuskyHacks/cve-2022-33891
https://github.com/W01fh4cker/cve-2022-33891/blob/main/cve_2022_33891_poc.py