{"id":7,"date":"2006-11-05T07:58:10","date_gmt":"2006-11-05T15:58:10","guid":{"rendered":"https:\/\/oroup.com\/blog\/2006\/11\/05\/the-joys-of-screenscraping\/"},"modified":"2006-11-05T07:58:10","modified_gmt":"2006-11-05T15:58:10","slug":"the-joys-of-screenscraping","status":"publish","type":"post","link":"https:\/\/oroup.com\/blog\/2006\/11\/the-joys-of-screenscraping\/","title":{"rendered":"The joys of screenscraping"},"content":{"rendered":"<p>A while back, I ran across a great <a href=\"http:\/\/www.hackdiary.com\/archives\/000041.html\">HackDiary entry<\/a> extolling the virtues of using <a href=\"http:\/\/home.ccil.org\/~cowan\/XML\/tagsoup\/\">TagSoup<\/a> and XPATH to do screenscraping from the web. TagSoup is a library that coerces all the ugly, nasty HTML you find out in the wild into well-formed (although not necessarily valid) XML. While there&#8217;s no guarantee that the results are semantically the same as the input, it lets you use all your nice XML tools like XPATH to extract data. The entry does a great job of showing you how to use TagSoup with <a href=\"http:\/\/xml.apache.org\/xalan-j\/\">Xalan<\/a>. However, the JDK now ships with its own XPATH engine in the javax.xml.xpath package, so it&#8217;s no longer necessary to import the Xalan library. Below is a code sample that uses TagSoup and the default XPATH engine to retrieve the stock price of Google from Google Finance. Note that the whole &#8220;MutableNamespaceContext&#8221; implementation is just a workaround for a missing JDK method, as documented in <a href=\"http:\/\/bugs.sun.com\/bugdatabase\/view_bug.do?bug_id=5101859\">JDK Bug 5101859<\/a>. If that bug gets fixed, the code could be simplified substantially.<\/p>\n<p>The usual disclaimers apply about this all being sample-quality code. All error handling has been punted to keep the example short, but you&#8217;d never really want to do that. Also, I&#8217;m having trouble preserving indentation in this HTML view; I&#8217;ll work on that. 
This code assumes JDK 1.5. Click through for the code itself.<\/p>\n<p>Does this code support the <a href=\"http:\/\/www.oreilly.com\/catalog\/beyondjava\/\">argument<\/a> that Java is WAY too verbose? Absolutely.<br \/>\n<!--more-->   <\/p>\n<style type=\"text\/css\">     <!--code { font-family: Courier New, Courier; font-size: 10pt; margin: 0px; }-->   <\/style>\n<div align=\"left\" class=\"java\">\n<table cellspacing=\"0\" cellpadding=\"3\" border=\"0\" bgcolor=\"#ffffff\">\n<tr>\n<td valign=\"top\" nowrap=\"nowrap\" align=\"left\"><code> <font color=\"#3f5fbf\">\/**<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0Demo\u00a0of\u00a0screenscraping\u00a0using\u00a0TagSoup\u00a0and\u00a0XPATH\u00a0as\u00a0described\u00a0at\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0http:\/\/blog.oroup.com\/2006\/11\/05\/the-joys-of-screenscraping\/<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0This\u00a0example\u00a0class\u00a0downloads\u00a0the\u00a0content\u00a0of\u00a0a\u00a0page\u00a0from\u00a0Google<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0Finance\u00a0and\u00a0parses\u00a0it\u00a0for\u00a0the\u00a0Google\u00a0stock\u00a0price.\u00a0It\u00a0completely<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0omits\u00a0all\u00a0error\u00a0handling\u00a0for\u00a0brevity.\u00a0Also\u00a0a\u00a0lot\u00a0of\u00a0objects<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0should\u00a0be\u00a0cached\u00a0and\u00a0re-used\u00a0if\u00a0you\u00a0were\u00a0really\u00a0going\u00a0to\u00a0call<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0this\u00a0multiple\u00a0times.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font 
color=\"#3f5fbf\">*\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@author\u00a0<\/font><font color=\"#3f5fbf\">Oliver\u00a0Roup\u00a0<<a href=\"mailto:oroup@oroup.com\">oroup@oroup.com<\/a>><\/font><font color=\"#7f7f9f\"><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\/<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">java.io.InputStream;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">java.io.StringWriter;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">java.net.URL;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">java.net.URLConnection;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">java.util.ArrayList;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">java.util.HashMap;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">java.util.Iterator;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">java.util.List;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">java.util.Map;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.namespace.NamespaceContext;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.transform.OutputKeys;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.transform.Result;<\/font><br \/>\n<font 
color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.transform.Source;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.transform.Transformer;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.transform.TransformerFactory;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.transform.dom.DOMResult;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.transform.dom.DOMSource;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.transform.sax.SAXTransformerFactory;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.transform.sax.TransformerHandler;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.transform.stream.StreamResult;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.xpath.XPath;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.xpath.XPathConstants;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">javax.xml.xpath.XPathFactory;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">org.w3c.dom.Node;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">org.w3c.dom.NodeList;<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">org.xml.sax.InputSource;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font 
color=\"#3f7f5f\">\/\/\u00a0The\u00a0Tagsoup\u00a0library.<\/font><br \/>\n<font color=\"#7f0055\"><strong>import\u00a0<\/strong><\/font><font color=\"#000000\">org.ccil.cowan.tagsoup.Parser;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#7f0055\"><strong>public\u00a0class\u00a0<\/strong><\/font><font color=\"#000000\">QueryHtml\u00a0<\/font><font color=\"#000000\">{<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>public\u00a0static\u00a0<\/strong><\/font><font color=\"#7f0055\"><strong>void\u00a0<\/strong><\/font><font color=\"#000000\">main<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">String<\/font><font color=\"#000000\">[]\u00a0<\/font><font color=\"#000000\">args<\/font><font color=\"#000000\">)\u00a0<\/font><font color=\"#7f0055\"><strong>throws\u00a0<\/strong><\/font><font color=\"#000000\">Exception\u00a0<\/font><font color=\"#000000\">{<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0Get\u00a0the\u00a0page\u00a0and\u00a0coerce\u00a0it\u00a0to\u00a0an\u00a0XML\u00a0DOM.\u00a0This\u00a0loads\u00a0the\u00a0whole<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0thing\u00a0into\u00a0memory\u00a0so\u00a0massive\u00a0pages\u00a0should\u00a0be\u00a0cut\u00a0down\u00a0first<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0using\u00a0SAX\u00a0or\u00a0something\u00a0similar.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">Node\u00a0node\u00a0=\u00a0getHtmlUrlNode<\/font><font color=\"#000000\">(<\/font><font color=\"#2a00ff\">\"http:\/\/finance.google.com\/finance?q=GOOG\"<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font 
color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0Create\u00a0a\u00a0mutable\u00a0namespace\u00a0context.\u00a0This\u00a0should\u00a0really\u00a0be\u00a0provided<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0by\u00a0the\u00a0JDK,\u00a0but\u00a0the\u00a0default\u00a0implementation\u00a0does\u00a0not\u00a0allow\u00a0new\u00a0entries<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0to\u00a0be\u00a0added.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">MutableNamespaceContext\u00a0nc\u00a0=\u00a0<\/font><font color=\"#7f0055\"><strong>new\u00a0<\/strong><\/font><font color=\"#000000\">MutableNamespaceContext<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0Set\u00a0the\u00a0prefix\u00a0\"html\"\u00a0to\u00a0correspond\u00a0to\u00a0the\u00a0XHTML\u00a0namespace.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0This\u00a0can\u00a0be\u00a0called\u00a0multiple\u00a0times\u00a0with\u00a0different\u00a0prefixes.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">nc.setNamespace<\/font><font color=\"#000000\">(<\/font><font color=\"#2a00ff\">\"html\"<\/font><font color=\"#000000\">,\u00a0<\/font><font color=\"#2a00ff\">\"http:\/\/www.w3.org\/1999\/xhtml\"<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0This\u00a0is\u00a0the\u00a0query\u00a0we\u00a0run\u00a0against\u00a0the\u00a0DOM\u00a0coerced\u00a0from\u00a0the\u00a0web\u00a0page.<\/font><br \/>\n<font 
color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0If\u00a0the\u00a0HTML\u00a0changes\u00a0in\u00a0a\u00a0relevant\u00a0way,\u00a0it\u00a0will\u00a0break\u00a0this\u00a0query.\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>final\u00a0<\/strong><\/font><font color=\"#000000\">String\u00a0QUERY\u00a0=\u00a0<\/font><font color=\"#2a00ff\">\"\/html:html\/html:body\/html:table[2]\/html:tr\"\u00a0<\/font><font color=\"#000000\">+\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#2a00ff\">\"\/html:td\/html:table\/html:tr[2]\/html:td[2]\/html:div\/html:table\"\u00a0<\/font><font color=\"#000000\">+\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#2a00ff\">\"\/html:tr\/html:td\/html:span\/html:span\/text()\"<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0Run\u00a0the\u00a0xpath\u00a0query\u00a0against\u00a0the\u00a0DOM.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">NodeList\u00a0result\u00a0=\u00a0xPathQuery<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">node,\u00a0QUERY,\u00a0nc<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0Print\u00a0out\u00a0the\u00a0result.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">System.out.println<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">dumpNode<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">result.item<\/font><font color=\"#000000\">(<\/font><font 
color=\"#990000\">0<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">,\u00a0<\/font><font color=\"#7f0055\"><strong>true<\/strong><\/font><font color=\"#000000\">))<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#3f5fbf\">\/**<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@param\u00a0<\/font><font color=\"#3f5fbf\">urlString\u00a0The\u00a0URL\u00a0of\u00a0the\u00a0page\u00a0to\u00a0retrieve<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@return\u00a0<\/font><font color=\"#3f5fbf\">A\u00a0Node\u00a0with\u00a0a\u00a0well\u00a0formed\u00a0XML\u00a0doc\u00a0coerced\u00a0from\u00a0the\u00a0page.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@throws\u00a0<\/font><font color=\"#3f5fbf\">Exception\u00a0if\u00a0something\u00a0goes\u00a0wrong.\u00a0No\u00a0error\u00a0handling\u00a0at\u00a0all<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0for\u00a0brevity.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\/<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>public\u00a0static\u00a0<\/strong><\/font><font color=\"#000000\">Node\u00a0getHtmlUrlNode<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">String\u00a0urlString<\/font><font color=\"#000000\">)\u00a0<\/font><font color=\"#7f0055\"><strong>throws\u00a0<\/strong><\/font><font color=\"#000000\">Exception\u00a0<\/font><font color=\"#000000\">{<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font 
color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">SAXTransformerFactory\u00a0stf\u00a0=\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">SAXTransformerFactory<\/font><font color=\"#000000\">)\u00a0<\/font><font color=\"#000000\">TransformerFactory.newInstance<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">TransformerHandler\u00a0th\u00a0=\u00a0stf.newTransformerHandler<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0This\u00a0dom\u00a0result\u00a0will\u00a0contain\u00a0the\u00a0results\u00a0of\u00a0the\u00a0transformation<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">DOMResult\u00a0dr\u00a0=\u00a0<\/font><font color=\"#7f0055\"><strong>new\u00a0<\/strong><\/font><font color=\"#000000\">DOMResult<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">th.setResult<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">dr<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">Parser\u00a0parser\u00a0=\u00a0<\/font><font color=\"#7f0055\"><strong>new\u00a0<\/strong><\/font><font color=\"#000000\">Parser<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">parser.setContentHandler<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">th<\/font><font 
color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">URL\u00a0url\u00a0=\u00a0<\/font><font color=\"#7f0055\"><strong>new\u00a0<\/strong><\/font><font color=\"#000000\">URL<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">urlString<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">URLConnection\u00a0urlConn\u00a0=\u00a0url.openConnection<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">InputStream\u00a0stream\u00a0=\u00a0urlConn.getInputStream<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#3f7f5f\">\/\/\u00a0This\u00a0is\u00a0where\u00a0the\u00a0magic\u00a0happens\u00a0to\u00a0convert\u00a0HTML\u00a0to\u00a0XML<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">parser.parse<\/font><font color=\"#000000\">(<\/font><font color=\"#7f0055\"><strong>new\u00a0<\/strong><\/font><font color=\"#000000\">InputSource<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">stream<\/font><font color=\"#000000\">))<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>return\u00a0<\/strong><\/font><font color=\"#000000\">dr.getNode<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><br \/>\n<font 
color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#3f5fbf\">\/**<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@param\u00a0<\/font><font color=\"#3f5fbf\">node\u00a0An\u00a0XML\u00a0DOM\u00a0Tree\u00a0for\u00a0query<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@param\u00a0<\/font><font color=\"#3f5fbf\">query\u00a0An\u00a0XPATH\u00a0query\u00a0to\u00a0run\u00a0against\u00a0the\u00a0DOM\u00a0Tree<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@param\u00a0<\/font><font color=\"#3f5fbf\">nc\u00a0The\u00a0namespaceContext\u00a0that\u00a0maps\u00a0prefixes\u00a0to\u00a0XML\u00a0namespace<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@return\u00a0<\/font><font color=\"#3f5fbf\">A\u00a0list\u00a0of\u00a0nodes\u00a0that\u00a0result\u00a0from\u00a0running\u00a0the\u00a0query\u00a0against<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0the\u00a0node.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@throws\u00a0<\/font><font color=\"#3f5fbf\">Exception\u00a0If\u00a0anything\u00a0goes\u00a0wrong.\u00a0No\u00a0error\u00a0handling\u00a0for\u00a0brevity<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\/<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>public\u00a0static\u00a0<\/strong><\/font><font color=\"#000000\">NodeList\u00a0xPathQuery<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">Node\u00a0node,\u00a0String\u00a0query,\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font 
color=\"#000000\">NamespaceContext\u00a0nc<\/font><font color=\"#000000\">)\u00a0<\/font><font color=\"#7f0055\"><strong>throws\u00a0<\/strong><\/font><font color=\"#000000\">Exception\u00a0<\/font><font color=\"#000000\">{<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">XPathFactory\u00a0xpf\u00a0=\u00a0XPathFactory.newInstance<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">XPath\u00a0xpath\u00a0=\u00a0xpf.newXPath<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">xpath.setNamespaceContext<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">nc<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>return\u00a0<\/strong><\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">NodeList<\/font><font color=\"#000000\">)\u00a0<\/font><font color=\"#000000\">xpath.evaluate<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">query,\u00a0node,\u00a0XPathConstants.NODESET<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#3f5fbf\">\/**<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@param\u00a0<\/font><font color=\"#3f5fbf\">node\u00a0A\u00a0node\u00a0to\u00a0be\u00a0dumped\u00a0to\u00a0a\u00a0string<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@param\u00a0<\/font><font 
color=\"#3f5fbf\">omitDeclaration\u00a0A\u00a0boolean\u00a0whether\u00a0to\u00a0omit\u00a0the\u00a0XML\u00a0declaration<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@return\u00a0<\/font><font color=\"#3f5fbf\">A\u00a0string\u00a0representation\u00a0of\u00a0the\u00a0node.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@throws\u00a0<\/font><font color=\"#3f5fbf\">Exception\u00a0If\u00a0anything\u00a0goes\u00a0wrong.\u00a0Error\u00a0handling\u00a0omitted.<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0<\/font><font color=\"#3f5fbf\">*\/<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>public\u00a0static\u00a0<\/strong><\/font><font color=\"#000000\">String\u00a0dumpNode<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">Node\u00a0node,\u00a0<\/font><font color=\"#7f0055\"><strong>boolean\u00a0<\/strong><\/font><font color=\"#000000\">omitDeclaration<\/font><font color=\"#000000\">)\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>throws\u00a0<\/strong><\/font><font color=\"#000000\">Exception\u00a0<\/font><font color=\"#000000\">{<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">Transformer\u00a0xformer\u00a0=\u00a0<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">TransformerFactory.newInstance<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">.newTransformer<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>if\u00a0<\/strong><\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">omitDeclaration<\/font><font 
color=\"#000000\">)\u00a0{<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">xformer.setOutputProperty<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">OutputKeys.OMIT_XML_DECLARATION,\u00a0<\/font><font color=\"#2a00ff\">\"yes\"<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">StringWriter\u00a0sw\u00a0=\u00a0<\/font><font color=\"#7f0055\"><strong>new\u00a0<\/strong><\/font><font color=\"#000000\">StringWriter<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">Result\u00a0result\u00a0=\u00a0<\/font><font color=\"#7f0055\"><strong>new\u00a0<\/strong><\/font><font color=\"#000000\">StreamResult<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">sw<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">Source\u00a0source\u00a0=\u00a0<\/font><font color=\"#7f0055\"><strong>new\u00a0<\/strong><\/font><font color=\"#000000\">DOMSource<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">node<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">xformer.transform<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">source,\u00a0result<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>return\u00a0<\/strong><\/font><font color=\"#000000\">sw.toString<\/font><font color=\"#000000\">()<\/font><font 
color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><br \/>\n<font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#3f5fbf\">\/**\u00a0There\u00a0is\u00a0a\u00a0bug\u00a0in\u00a0the\u00a0JDK\u00a0which\u00a0omits\u00a0the\u00a0setNamespace\u00a0declaration<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0from\u00a0implementations\u00a0of\u00a0NamespaceContext.\u00a0We\u00a0have\u00a0to\u00a0create\u00a0our<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0own\u00a0implementation\u00a0to\u00a0work\u00a0around\u00a0it.\u00a0Documented\u00a0here:<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0http:\/\/bugs.sun.com\/bugdatabase\/view_bug.do?bug_id=5101859<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\u00a0<\/font><font color=\"#7f9fbf\">@author\u00a0<\/font><font color=\"#3f5fbf\">Oliver\u00a0Roup\u00a0&lt;oroup@oroup.com&gt;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0<\/font><font color=\"#3f5fbf\">*\/<\/font><br \/>\n<font color=\"#7f0055\"><strong>class\u00a0<\/strong><\/font><font color=\"#000000\">MutableNamespaceContext\u00a0<\/font><font color=\"#7f0055\"><strong>implements\u00a0<\/strong><\/font><font color=\"#000000\">NamespaceContext\u00a0<\/font><font color=\"#000000\">{<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>private\u00a0<\/strong><\/font><font color=\"#000000\">Map&lt;String,\u00a0String&gt;\u00a0map;<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>public\u00a0<\/strong><\/font><font color=\"#000000\">MutableNamespaceContext<\/font><font color=\"#000000\">()\u00a0{<\/font><br \/>\n<font 
color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">map\u00a0=\u00a0<\/font><font color=\"#7f0055\"><strong>new\u00a0<\/strong><\/font><font color=\"#000000\">HashMap&lt;String,\u00a0String&gt;<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>public\u00a0<\/strong><\/font><font color=\"#7f0055\"><strong>void\u00a0<\/strong><\/font><font color=\"#000000\">setNamespace<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">String\u00a0prefix,\u00a0String\u00a0namespaceURI<\/font><font color=\"#000000\">)\u00a0{<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">map.put<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">prefix,\u00a0namespaceURI<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>public\u00a0<\/strong><\/font><font color=\"#000000\">String\u00a0getNamespaceURI<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">String\u00a0prefix<\/font><font color=\"#000000\">)\u00a0{<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>return\u00a0<\/strong><\/font><font color=\"#000000\">map.get<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">prefix<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>public\u00a0<\/strong><\/font><font 
color=\"#000000\">String\u00a0getPrefix<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">String\u00a0namespaceURI<\/font><font color=\"#000000\">)\u00a0{<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>for\u00a0<\/strong><\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">String\u00a0prefix\u00a0:\u00a0map.keySet<\/font><font color=\"#000000\">())\u00a0{<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>if\u00a0<\/strong><\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">map.get<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">prefix<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">.equals<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">namespaceURI<\/font><font color=\"#000000\">))\u00a0{<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>return\u00a0<\/strong><\/font><font color=\"#000000\">prefix;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>return\u00a0null<\/strong><\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\"><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>public\u00a0<\/strong><\/font><font color=\"#000000\">Iterator\u00a0getPrefixes<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">String\u00a0namespaceURI<\/font><font color=\"#000000\">)\u00a0{<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font 
color=\"#000000\">List&lt;String&gt;\u00a0prefixes\u00a0=\u00a0<\/font><font color=\"#7f0055\"><strong>new\u00a0<\/strong><\/font><font color=\"#000000\">ArrayList&lt;String&gt;<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>for\u00a0<\/strong><\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">String\u00a0prefix\u00a0:\u00a0map.keySet<\/font><font color=\"#000000\">())\u00a0{<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>if\u00a0<\/strong><\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">map.get<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">prefix<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">.equals<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">namespaceURI<\/font><font color=\"#000000\">))\u00a0{<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">prefixes.add<\/font><font color=\"#000000\">(<\/font><font color=\"#000000\">prefix<\/font><font color=\"#000000\">)<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0\u00a0\u00a0<\/font><font color=\"#7f0055\"><strong>return\u00a0<\/strong><\/font><font color=\"#000000\">prefixes.iterator<\/font><font color=\"#000000\">()<\/font><font color=\"#000000\">;<\/font><br \/>\n<font color=\"#ffffff\">\u00a0\u00a0<\/font><font color=\"#000000\">}<\/font><br \/>\n<font 
color=\"#000000\">}\u00a0\u00a0<\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/code><font color=\"#7f7f9f\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\"><font color=\"#ffffff\">         <\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/font><\/td>\n<\/tr>\n<\/table>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>A while back, I ran across a great HackDiary entry extolling the virtues of using TagSoup and XPATH to do screenscraping from the web. 
TagSoup is a library that coerces all the ugly nasty HTML you find out in the&hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-7","post","type-post","status-publish","format-standard","hentry","category-code"],"_links":{"self":[{"href":"https:\/\/oroup.com\/blog\/wp-json\/wp\/v2\/posts\/7","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/oroup.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/oroup.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/oroup.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/oroup.com\/blog\/wp-json\/wp\/v2\/comments?post=7"}],"version-history":[{"count":0,"href":"https:\/\/oroup.com\/blog\/wp-json\/wp\/v2\/posts\/7\/revisions"}],"wp:attachment":[{"href":"https:\/\/oroup.com\/blog\/wp-json\/wp\/v2\/media?parent=7"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/oroup.com\/blog\/wp-json\/wp\/v2\/categories?post=7"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/oroup.com\/blog\/wp-json\/wp\/v2\/tags?post=7"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}