Package org.jsoup.parser
Class Parser
- java.lang.Object
  - org.jsoup.parser.Parser
-
public class Parser extends java.lang.Object
-
Constructor Summary
Constructor | Description
Parser(org.jsoup.parser.TreeBuilder treeBuilder) | Create a new Parser, using the specified TreeBuilder.
-
Method Summary
Modifier and Type | Method | Description
ParseErrorList | getErrors() | Retrieve the parse errors, if any, from the last parse.
org.jsoup.parser.TreeBuilder | getTreeBuilder() | Get the TreeBuilder currently in use.
static Parser | htmlParser() | Create a new HTML parser.
boolean | isContentForTagData(java.lang.String normalName) | An internal method, visible for Element.
boolean | isTrackErrors() | Check if parse error tracking is enabled.
Parser | newInstance() | Creates a new Parser as a deep copy of this, including initializing a new TreeBuilder.
static Document | parse(java.lang.String html, java.lang.String baseUri) | Parse HTML into a Document.
static Document | parseBodyFragment(java.lang.String bodyHtml, java.lang.String baseUri) | Parse a fragment of HTML into the body of a Document.
static java.util.List<Node> | parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri) | Parse a fragment of HTML into a list of nodes.
static java.util.List<Node> | parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri, ParseErrorList errorList) | Parse a fragment of HTML into a list of nodes.
java.util.List<Node> | parseFragmentInput(java.lang.String fragment, Element context, java.lang.String baseUri) |
Document | parseInput(java.io.Reader inputHtml, java.lang.String baseUri) |
Document | parseInput(java.lang.String html, java.lang.String baseUri) |
static java.util.List<Node> | parseXmlFragment(java.lang.String fragmentXml, java.lang.String baseUri) | Parse a fragment of XML into a list of nodes.
ParseSettings | settings() |
Parser | settings(ParseSettings settings) |
Parser | setTrackErrors(int maxErrors) | Enable or disable parse error tracking for the next parse.
Parser | setTreeBuilder(org.jsoup.parser.TreeBuilder treeBuilder) | Update the TreeBuilder used when parsing content.
static java.lang.String | unescapeEntities(java.lang.String string, boolean inAttribute) | Utility method to unescape HTML entities from a string.
static Parser | xmlParser() | Create a new XML parser.
-
Method Detail
-
newInstance
public Parser newInstance()
Creates a new Parser as a deep copy of this, including initializing a new TreeBuilder. Allows independent (multi-threaded) use.
- Returns:
  a copied parser
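As a rough illustration of the multi-threaded use mentioned above, the sketch below configures one Parser as a template and gives each task its own copy via newInstance(). The class name ParserPerThread, the method parseTitles, and the pool size are hypothetical choices for this example, not part of jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.jsoup.parser.Parser;

class ParserPerThread {
    // Hypothetical helper: parses each page on a worker thread and returns the titles in input order.
    static List<String> parseTitles(List<String> pages) {
        Parser template = Parser.htmlParser(); // shared configuration, never used to parse directly
        ExecutorService pool = Executors.newFixedThreadPool(2); // pool size is an arbitrary assumption
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String html : pages) {
                Parser own = template.newInstance(); // independent copy with its own TreeBuilder
                Callable<String> task = () -> own.parseInput(html, "").title();
                futures.add(pool.submit(task));
            }
            List<String> titles = new ArrayList<>();
            for (Future<String> future : futures) {
                titles.add(future.get());
            }
            return titles;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Parsing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> pages = new ArrayList<>();
        pages.add("<title>First</title>");
        pages.add("<title>Second</title>");
        System.out.println(parseTitles(pages));
    }
}
```

Each copy is created before the task is submitted, so no Parser instance is ever shared between two running parses.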
-
parseInput
public Document parseInput(java.lang.String html, java.lang.String baseUri)
-
parseInput
public Document parseInput(java.io.Reader inputHtml, java.lang.String baseUri)
-
parseFragmentInput
public java.util.List<Node> parseFragmentInput(java.lang.String fragment, Element context, java.lang.String baseUri)
-
getTreeBuilder
public org.jsoup.parser.TreeBuilder getTreeBuilder()
Get the TreeBuilder currently in use.
- Returns:
  current TreeBuilder.
-
setTreeBuilder
public Parser setTreeBuilder(org.jsoup.parser.TreeBuilder treeBuilder)
Update the TreeBuilder used when parsing content.
- Parameters:
  treeBuilder - the TreeBuilder to use
- Returns:
  this, for chaining
-
isTrackErrors
public boolean isTrackErrors()
Check if parse error tracking is enabled.
- Returns:
  current track error state.
-
setTrackErrors
public Parser setTrackErrors(int maxErrors)
Enable or disable parse error tracking for the next parse.
- Parameters:
  maxErrors - the maximum number of errors to track. Set to 0 to disable.
- Returns:
  this, for chaining
-
getErrors
public ParseErrorList getErrors()
Retrieve the parse errors, if any, from the last parse.
- Returns:
  list of parse errors, up to the size of the maximum errors tracked.
- See Also:
  setTrackErrors(int)
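A minimal sketch of how setTrackErrors(int) and getErrors() might be combined: enable tracking on a parser, run one parse, then read back the recorded errors. The class name TrackErrorsExample, the helper collectErrors, and the sample markup are assumptions made for illustration.

```java
import org.jsoup.parser.ParseError;
import org.jsoup.parser.ParseErrorList;
import org.jsoup.parser.Parser;

class TrackErrorsExample {
    // Hypothetical helper: parses html with tracking capped at maxErrors and returns what was recorded.
    static ParseErrorList collectErrors(String html, int maxErrors) {
        Parser parser = Parser.htmlParser().setTrackErrors(maxErrors); // 0 disables tracking
        parser.parseInput(html, "https://example.com/");
        return parser.getErrors();
    }

    public static void main(String[] args) {
        // Deliberately sloppy markup; which errors are reported depends on the jsoup version.
        ParseErrorList errors = collectErrors("<p>One</b><p>Two", 5);
        for (ParseError error : errors) {
            System.out.println(error);
        }
    }
}
```

The returned list never holds more than maxErrors entries, so a small cap keeps memory bounded on badly broken input.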
-
settings
public Parser settings(ParseSettings settings)
-
settings
public ParseSettings settings()
-
isContentForTagData
public boolean isContentForTagData(java.lang.String normalName)
An internal method, visible for Element. For an HTML parse, signals that script and style text should be treated as Data Nodes.
-
parse
public static Document parse(java.lang.String html, java.lang.String baseUri)
Parse HTML into a Document.
- Parameters:
  html - HTML to parse
  baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
- Returns:
  parsed Document
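To make the role of baseUri concrete, here is a small sketch that parses a page and resolves a relative link against the base URI. The class name ParseExample, its helper methods, and the example.com URLs are placeholders chosen for this illustration.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

class ParseExample {
    static final String HTML =
        "<title>Docs</title><p><a href=\"guide.html\">Guide</a></p>";

    // Parses the sample page, treating https://example.com/docs/ as its fetch location.
    static Document parseSample() {
        return Parser.parse(HTML, "https://example.com/docs/");
    }

    // Resolves the first link's href against the document's base URI.
    static String firstLinkAbsUrl() {
        return parseSample().select("a").first().absUrl("href");
    }

    public static void main(String[] args) {
        System.out.println(parseSample().title());
        System.out.println(firstLinkAbsUrl());
    }
}
```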
-
parseFragment
public static java.util.List<Node> parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri)
Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context.
- Parameters:
  fragmentHtml - the fragment of HTML to parse
  context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
  baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
- Returns:
  list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.
-
parseFragment
public static java.util.List<Node> parseFragment(java.lang.String fragmentHtml, Element context, java.lang.String baseUri, ParseErrorList errorList)
Parse a fragment of HTML into a list of nodes. The context element, if supplied, supplies parsing context.
- Parameters:
  fragmentHtml - the fragment of HTML to parse
  context - (optional) the element that this HTML fragment is being parsed for (i.e. for inner HTML). This provides stack context (for implicit element creation).
  baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
  errorList - list to add errors to
- Returns:
  list of nodes parsed from the input HTML. Note that the context element, if supplied, is not modified.
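The following sketch shows one way to call parseFragment with a context element, here the body of an empty shell document. The class name FragmentExample, the helper methods, and the sample markup are illustrative assumptions. The fragment is parsed as if it were the body's inner HTML, and the body itself is left untouched.

```java
import java.util.List;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

class FragmentExample {
    // Creates an empty <body> element to act as the parsing context.
    static Element emptyBody() {
        return Document.createShell("https://example.com/").body();
    }

    // Parses the fragment in the context of the given element and returns the resulting nodes.
    static List<Node> parseIn(Element context, String fragmentHtml) {
        return Parser.parseFragment(fragmentHtml, context, "https://example.com/");
    }

    public static void main(String[] args) {
        Element body = emptyBody();
        List<Node> nodes = parseIn(body, "<p>One</p><p>Two</p>");
        for (Node node : nodes) {
            System.out.println(node.nodeName());
        }
        // The context element is not modified by the parse.
        System.out.println(body.childNodeSize());
    }
}
```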
-
parseXmlFragment
public static java.util.List<Node> parseXmlFragment(java.lang.String fragmentXml, java.lang.String baseUri)
Parse a fragment of XML into a list of nodes.
- Parameters:
  fragmentXml - the fragment of XML to parse
  baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
- Returns:
  list of nodes parsed from the input XML.
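A short sketch of parseXmlFragment on two sibling elements. The class name XmlFragmentExample and the sample XML are assumptions. The second element name is deliberately capitalized to show that the XML parser keeps tag case as written.

```java
import java.util.List;

import org.jsoup.nodes.Node;
import org.jsoup.parser.Parser;

class XmlFragmentExample {
    static final String XML = "<item id=\"1\">A</item><Item id=\"2\">B</Item>";

    // Parses the sample XML fragment with an empty base URI.
    static List<Node> parseSample() {
        return Parser.parseXmlFragment(XML, "");
    }

    public static void main(String[] args) {
        for (Node node : parseSample()) {
            System.out.println(node.nodeName() + " -> " + node.outerHtml());
        }
    }
}
```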
-
parseBodyFragment
public static Document parseBodyFragment(java.lang.String bodyHtml, java.lang.String baseUri)
Parse a fragment of HTML into the body of a Document.
- Parameters:
  bodyHtml - fragment of HTML
  baseUri - base URI of document (i.e. original fetch location), for resolving relative URLs.
- Returns:
  Document, with empty head, and HTML parsed into body
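As a sketch of the behaviour described above, the example below parses a snippet with parseBodyFragment and inspects the resulting shell document. The class name BodyFragmentExample and the sample markup are hypothetical.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

class BodyFragmentExample {
    // Wraps the sample snippet in a full document: empty head, snippet inside body.
    static Document parseSample() {
        return Parser.parseBodyFragment("<p>Hello <b>world</b></p>", "https://example.com/");
    }

    public static void main(String[] args) {
        Document doc = parseSample();
        System.out.println(doc.head().childNodeSize()); // number of nodes in the head
        System.out.println(doc.body().text());          // text content of the parsed fragment
    }
}
```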
-
unescapeEntities
public static java.lang.String unescapeEntities(java.lang.String string, boolean inAttribute)
Utility method to unescape HTML entities from a string.
- Parameters:
  string - HTML escaped string
  inAttribute - if the string is to be unescaped in strict mode (as attributes are)
- Returns:
  an unescaped string
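A minimal usage sketch for unescapeEntities, using well-formed entities and non-attribute mode. The class name UnescapeExample and the sample string are illustrative only.

```java
import org.jsoup.parser.Parser;

class UnescapeExample {
    // Unescapes entities as they would be treated in normal text content (inAttribute = false).
    static String unescapeText(String escaped) {
        return Parser.unescapeEntities(escaped, false);
    }

    public static void main(String[] args) {
        System.out.println(unescapeText("Tom &amp; Jerry &lt;3"));
    }
}
```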
-
htmlParser
public static Parser htmlParser()
Create a new HTML parser. This parser treats input as HTML5, and enforces the creation of a normalised document, based on a knowledge of the semantics of the incoming tags.
- Returns:
  a new HTML parser.
-
xmlParser
public static Parser xmlParser()
Create a new XML parser. This parser assumes no knowledge of the incoming tags and does not treat them as HTML; rather, it creates a simple tree directly from the input.
- Returns:
  a new simple XML parser.
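To contrast the two factory methods, this sketch feeds similar input to htmlParser() and xmlParser() and counts body elements in each result. The class name ParserKindsExample and the helper names are assumptions. The HTML parser is expected to normalise the input into a full html/head/body structure, while the XML parser builds only what the input contains.

```java
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

class ParserKindsExample {
    // Parses as HTML5: the document is normalised, so a body element is created.
    static Document asHtml(String input) {
        return Parser.htmlParser().parseInput(input, "");
    }

    // Parses as XML: no implicit elements are added around the input.
    static Document asXml(String input) {
        return Parser.xmlParser().parseInput(input, "");
    }

    public static void main(String[] args) {
        System.out.println(asHtml("<p>Hi").select("body").size());
        System.out.println(asXml("<p>Hi</p>").select("body").size());
        System.out.println(asXml("<p>Hi</p>").outerHtml());
    }
}
```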
-