Wednesday, October 1, 2014

[Java] Parse JavaScript with jsoup

In an HTML page, I want to read the value of a JavaScript variable. Below is a snippet of the page.
<input id="hidval" value="" type="hidden"> 
<form method="post" style="padding: 0px;margin: 0px;" name="profile" autocomplete="off">
<input name="pqRjnA" id="pqRjnA" value="" type="hidden">
<script type="text/javascript">

Use jsoup + manual parsing

Here's an example of how to get the key with jsoup and some "manual" code:
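import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document doc = ...
Element script = doc.select("script").first(); // Get the first <script> element

Pattern p = Pattern.compile("(?is)key=\"(.+?)\""); // Regex capturing the quoted value after key=
Matcher m = p.matcher(script.html()); // You have to use html() here and NOT text()! text() will drop the 'key' part

while (m.find()) { // braces are required so both printlns run for each match
    System.out.println(m.group());  // the whole match (key="value")
    System.out.println(m.group(1)); // value only
}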
Output (using your html part):


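The regex step above can also be tried on its own, without jsoup. The following is a minimal sketch, not the original author's code: the class name KeyExtractor, the method extractValues, and the sample script text are all hypothetical, since the page's real script body is not shown in full. It also assumes that tolerating whitespace around the = sign is acceptable, which the original pattern does not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: isolates the regex step so it can run without jsoup.
class KeyExtractor {
    // Same idea as the pattern above, but also allows optional whitespace around '='.
    private static final Pattern KEY = Pattern.compile("(?is)key\\s*=\\s*\"(.+?)\"");

    /** Returns every quoted value assigned to `key` in the given script source. */
    static List<String> extractValues(String scriptSource) {
        List<String> values = new ArrayList<>();
        Matcher m = KEY.matcher(scriptSource);
        while (m.find()) {
            values.add(m.group(1)); // group(1) is the text between the quotes
        }
        return values;
    }

    public static void main(String[] args) {
        // Assumed script body for illustration only; in practice pass script.html().
        String scriptSource = "var key=\"abc123\";\nvar other = 1;";
        System.out.println(extractValues(scriptSource)); // prints [abc123]
    }
}
```

In the jsoup version, the string returned by script.html() would be passed to extractValues. Note that the pattern is deliberately loose: it matches any identifier ending in "key", so a stricter page may need a word boundary in front of it.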