文章目录
前言
经常使用pyinstaller将一些写的python程序打包成了各种exe,时间一长,源码丢失,为了恢复一部分源码,只得将先前编译好的exe反编译成py文件。
Pyinstaller在打包过程中会将py文件编译为pyc文件,然后去掉pyc文件开头的16个字节。然后将python解释器、依赖文件和修改后的pyc文件一起,用一种特殊的自解压格式打包起来,形成可执行文件。这16个字节是 Python 字节码文件的一个Magic Number和版本信息,它们用于标识这是一个 Python 字节码文件以及它是由哪个版本的 Python 编译的。
本文以test.exe为例,将其反编译为test.py
反编译过程大致分为以下三步:
1.使用pyinstxtractor.py将exe文件转换成pyc文件
2.给pyc文件添加文件头
3.使用pycdc工具反编译pyc文件,获得源码
1.使用pyinstxtractor.py将exe文件转换成pyc文件
新建一个pyinstxtractor.py(具体代码较长贴在文末)
将test.exe跟pyinstxtractor.py放在同一个目录中,cmd窗体执行如下命令:
python
python pyinstxtractor.py test.exe
出现如下Successfully字眼则表示成功
在该路径下已经生成了一个"test.exe_extracted"的文件夹
在"test.exe_extracted"的文件夹找到test文件(没有后缀名),一般你的程序是xxx.exe,就找xxx文件
将test文件名改为test.pyc
2.给pyc文件添加文件头
这里使用到了一个工具Sublime Text,是一个轻量级、跨平台的文本和源代码编辑器,也可以用其他编辑器。
Sublime Text下载地址:https://download.csdn.net/download/qq_41273999/89396003?spm=1001.2014.3001.5503
将test.pyc用Sublime Text工具打开,此时是以16进制的方式打开的
打开"test.exe_extracted"文件夹下的PYZ-00.pyz_extracted文件夹,还是用Sublime Text随便打开其中一个pyc文件
将xxx.pyc文件的第一行16个字节复制下来,添加到test.pyc文件的第一行,保存,如下图:
3.使用pycdc工具反编译pyc文件,获得源码
有个很好用的库Uncompyle 6可以反编译pyc文件(pip install uncompyle6 安装好后,运行:uncompyle6 test.pyc)
但需要注意的是:Uncompyle 6暂时无法反编译Python 3.9和更高版本产生的pyc文件,所以推荐一个pycdc工具可以将.pyc文件转换为.py,适用于 Python 3.9及更高版本。目前笔者已测试python3.9和最高版本python3.12,可以反编译成功。
获取pycdc工具有两种途径:
(1)可以去Github手动下载pycdc安装包(但程序需要编译):https://github.com/zrax/pycdc
程序的编译需要用到CMake,还比较麻烦。。。
如果想试试的话可以参考文章:《Windows下搭建Cmake编译环境进行C/C++文件的编译》
(2)除此之外可以下载编译好的可执行文件:https://download.csdn.net/download/qq_41273999/89397541
将修改后的test.pyc文件和pycdc.exe放在同目录下
执行命令 pycdc.exe test.pyc>test.py,同目录下生成转换后的test.py文件,该文件就是转换出来的源码
附录
参考文章:《Windows下搭建Cmake编译环境进行C/C++文件的编译》
Sublime Text工具:https://download.csdn.net/download/qq_41273999/89396003?spm=1001.2014.3001.5503
pycdc工具:https://download.csdn.net/download/qq_41273999/89397541
pyinstxtractor.py内容如下:
python
"""
PyInstaller Extractor v1.9 (Supports pyinstaller 3.3, 3.2, 3.1, 3.0, 2.1, 2.0)
Author : Extreme Coders
E-mail : extremecoders(at)hotmail(dot)com
Web : https://0xec.blogspot.com
Date : 29-November-2017
Url : https://sourceforge.net/projects/pyinstallerextractor/
For any suggestions, leave a comment on
https://forum.tuts4you.com/topic/34455-pyinstaller-extractor/
This script extracts a pyinstaller generated executable file.
Pyinstaller installation is not needed. The script has it all.
For best results, it is recommended to run this script in the
same version of python as was used to create the executable.
This is just to prevent unmarshalling errors(if any) while
extracting the PYZ archive.
Usage : Just copy this script to the directory where your exe resides
and run the script with the exe file name as a parameter
C:\path\to\exe\>python pyinstxtractor.py <filename>
$ /path/to/exe/python pyinstxtractor.py <filename>
Licensed under GNU General Public License (GPL) v3.
You are free to modify this source.
CHANGELOG
================================================
Version 1.1 (Jan 28, 2014)
-------------------------------------------------
- First Release
- Supports only pyinstaller 2.0
Version 1.2 (Sept 12, 2015)
-------------------------------------------------
- Added support for pyinstaller 2.1 and 3.0 dev
- Cleaned up code
- Script is now more verbose
- Executable extracted within a dedicated sub-directory
(Support for pyinstaller 3.0 dev is experimental)
Version 1.3 (Dec 12, 2015)
-------------------------------------------------
- Added support for pyinstaller 3.0 final
- Script is compatible with both python 2.x & 3.x (Thanks to Moritz Kroll @ Avira Operations GmbH & Co. KG)
Version 1.4 (Jan 19, 2016)
-------------------------------------------------
- Fixed a bug when writing pyc files >= version 3.3 (Thanks to Daniello Alto: https://github.com/Djamana)
Version 1.5 (March 1, 2016)
-------------------------------------------------
- Added support for pyinstaller 3.1 (Thanks to Berwyn Hoyt for reporting)
Version 1.6 (Sept 5, 2016)
-------------------------------------------------
- Added support for pyinstaller 3.2
- Extractor will use a random name while extracting unnamed files.
- For encrypted pyz archives it will dump the contents as is. Previously, the tool would fail.
Version 1.7 (March 13, 2017)
-------------------------------------------------
- Made the script compatible with python 2.6 (Thanks to Ross for reporting)
Version 1.8 (April 28, 2017)
-------------------------------------------------
- Support for sub-directories in .pyz files (Thanks to Moritz Kroll @ Avira Operations GmbH & Co. KG)
Version 1.9 (November 29, 2017)
-------------------------------------------------
- Added support for pyinstaller 3.3
- Display the scripts which are run at entry (Thanks to Michael Gillespie @ malwarehunterteam for the feature request)
"""
from __future__ import print_function
import os
import struct
import marshal
import zlib
import sys
import imp
import types
from uuid import uuid4 as uniquename
class CTOCEntry:
def __init__(self, position, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name):
self.position = position
self.cmprsdDataSize = cmprsdDataSize
self.uncmprsdDataSize = uncmprsdDataSize
self.cmprsFlag = cmprsFlag
self.typeCmprsData = typeCmprsData
self.name = name
class PyInstArchive:
PYINST20_COOKIE_SIZE = 24 # For pyinstaller 2.0
PYINST21_COOKIE_SIZE = 24 + 64 # For pyinstaller 2.1+
MAGIC = b'MEI\014\013\012\013\016' # Magic number which identifies pyinstaller
def __init__(self, path):
self.filePath = path
def open(self):
try:
self.fPtr = open(self.filePath, 'rb')
self.fileSize = os.stat(self.filePath).st_size
except:
print('[*] Error: Could not open {0}'.format(self.filePath))
return False
return True
def close(self):
try:
self.fPtr.close()
except:
pass
def checkFile(self):
print('[*] Processing {0}'.format(self.filePath))
# Check if it is a 2.0 archive
self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)
magicFromFile = self.fPtr.read(len(self.MAGIC))
if magicFromFile == self.MAGIC:
self.pyinstVer = 20 # pyinstaller 2.0
print('[*] Pyinstaller version: 2.0')
return True
# Check for pyinstaller 2.1+ before bailing out
self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)
magicFromFile = self.fPtr.read(len(self.MAGIC))
if magicFromFile == self.MAGIC:
print('[*] Pyinstaller version: 2.1+')
self.pyinstVer = 21 # pyinstaller 2.1+
return True
print('[*] Error : Unsupported pyinstaller version or not a pyinstaller archive')
return False
def getCArchiveInfo(self):
try:
if self.pyinstVer == 20:
self.fPtr.seek(self.fileSize - self.PYINST20_COOKIE_SIZE, os.SEEK_SET)
# Read CArchive cookie
(magic, lengthofPackage, toc, tocLen, self.pyver) = \
struct.unpack('!8siiii', self.fPtr.read(self.PYINST20_COOKIE_SIZE))
elif self.pyinstVer == 21:
self.fPtr.seek(self.fileSize - self.PYINST21_COOKIE_SIZE, os.SEEK_SET)
# Read CArchive cookie
(magic, lengthofPackage, toc, tocLen, self.pyver, pylibname) = \
struct.unpack('!8siiii64s', self.fPtr.read(self.PYINST21_COOKIE_SIZE))
except:
print('[*] Error : The file is not a pyinstaller archive')
return False
print('[*] Python version: {0}'.format(self.pyver))
# Overlay is the data appended at the end of the PE
self.overlaySize = lengthofPackage
self.overlayPos = self.fileSize - self.overlaySize
self.tableOfContentsPos = self.overlayPos + toc
self.tableOfContentsSize = tocLen
print('[*] Length of package: {0} bytes'.format(self.overlaySize))
return True
def parseTOC(self):
# Go to the table of contents
self.fPtr.seek(self.tableOfContentsPos, os.SEEK_SET)
self.tocList = []
parsedLen = 0
# Parse table of contents
while parsedLen < self.tableOfContentsSize:
(entrySize,) = struct.unpack('!i', self.fPtr.read(4))
nameLen = struct.calcsize('!iiiiBc')
(entryPos, cmprsdDataSize, uncmprsdDataSize, cmprsFlag, typeCmprsData, name) = \
struct.unpack( \
'!iiiBc{0}s'.format(entrySize - nameLen), \
self.fPtr.read(entrySize - 4))
name = name.decode('utf-8').rstrip('\0')
if len(name) == 0:
name = str(uniquename())
print('[!] Warning: Found an unamed file in CArchive. Using random name {0}'.format(name))
self.tocList.append( \
CTOCEntry( \
self.overlayPos + entryPos, \
cmprsdDataSize, \
uncmprsdDataSize, \
cmprsFlag, \
typeCmprsData, \
name \
))
parsedLen += entrySize
print('[*] Found {0} files in CArchive'.format(len(self.tocList)))
def extractFiles(self):
print('[*] Beginning extraction...please standby')
extractionDir = os.path.join(os.getcwd(), os.path.basename(self.filePath) + '_extracted')
if not os.path.exists(extractionDir):
os.mkdir(extractionDir)
os.chdir(extractionDir)
for entry in self.tocList:
basePath = os.path.dirname(entry.name)
if basePath != '':
# Check if path exists, create if not
if not os.path.exists(basePath):
os.makedirs(basePath)
self.fPtr.seek(entry.position, os.SEEK_SET)
data = self.fPtr.read(entry.cmprsdDataSize)
if entry.cmprsFlag == 1:
data = zlib.decompress(data)
# Malware may tamper with the uncompressed size
# Comment out the assertion in such a case
assert len(data) == entry.uncmprsdDataSize # Sanity Check
with open(entry.name, 'wb') as f:
f.write(data)
if entry.typeCmprsData == b's':
print('[+] Possible entry point: {0}'.format(entry.name))
elif entry.typeCmprsData == b'z' or entry.typeCmprsData == b'Z':
self._extractPyz(entry.name)
def _extractPyz(self, name):
dirName = name + '_extracted'
# Create a directory for the contents of the pyz
if not os.path.exists(dirName):
os.mkdir(dirName)
with open(name, 'rb') as f:
pyzMagic = f.read(4)
assert pyzMagic == b'PYZ\0' # Sanity Check
pycHeader = f.read(4) # Python magic value
if imp.get_magic() != pycHeader:
print(
'[!] Warning: The script is running in a different python version than the one used to build the executable')
print(
' Run this script in Python{0} to prevent extraction errors(if any) during unmarshalling'.format(
self.pyver))
(tocPosition,) = struct.unpack('!i', f.read(4))
f.seek(tocPosition, os.SEEK_SET)
try:
toc = marshal.load(f)
except:
print('[!] Unmarshalling FAILED. Cannot extract {0}. Extracting remaining files.'.format(name))
return
print('[*] Found {0} files in PYZ archive'.format(len(toc)))
# From pyinstaller 3.1+ toc is a list of tuples
if type(toc) == list:
toc = dict(toc)
for key in toc.keys():
(ispkg, pos, length) = toc[key]
f.seek(pos, os.SEEK_SET)
fileName = key
try:
# for Python > 3.3 some keys are bytes object some are str object
fileName = key.decode('utf-8')
except:
pass
# Make sure destination directory exists, ensuring we keep inside dirName
destName = os.path.join(dirName, fileName.replace("..", "__"))
destDirName = os.path.dirname(destName)
if not os.path.exists(destDirName):
os.makedirs(destDirName)
try:
data = f.read(length)
data = zlib.decompress(data)
except:
print('[!] Error: Failed to decompress {0}, probably encrypted. Extracting as is.'.format(fileName))
open(destName + '.pyc.encrypted', 'wb').write(data)
continue
with open(destName + '.pyc', 'wb') as pycFile:
pycFile.write(pycHeader) # Write pyc magic
pycFile.write(b'\0' * 4) # Write timestamp
if self.pyver >= 33:
pycFile.write(b'\0' * 4) # Size parameter added in Python 3.3
pycFile.write(data)
def main():
if len(sys.argv) < 2:
print('[*] Usage: pyinstxtractor.py <filename>')
else:
arch = PyInstArchive(sys.argv[1])
if arch.open():
if arch.checkFile():
if arch.getCArchiveInfo():
arch.parseTOC()
arch.extractFiles()
arch.close()
print('[*] Successfully extracted pyinstaller archive: {0}'.format(sys.argv[1]))
print('')
print('You can now use a python decompiler on the pyc files within the extracted directory')
return
arch.close()
if __name__ == '__main__':
main()