Extract attributes, text, and HTML from elements
Problem
After parsing a document, and finding some elements, you'll want to get at the data inside those elements.
Solution
- To get the value of an attribute, use the Node.attr(String key) method
- For the text on an element (and its combined children), use Element.text()
- For HTML, use Element.html(), or Node.outerHtml() as appropriate
For example:
String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();
String text = doc.body().text(); // "An example link."
String linkHref = link.attr("href"); // "http://example.com/"
String linkText = link.text(); // "example"
String linkOuterH = link.outerHtml();
// "<a href="http://example.com/"><b>example</b></a>"
String linkInnerH = link.html(); // "<b>example</b>"
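The example above reads data from a single element. In practice a selector often matches several elements, and the same accessors can be applied to each one in a loop. The following is a minimal sketch of that idea, not a definitive implementation; the class name LinkExtractSketch, its helper methods, and the SAMPLE markup are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkExtractSketch {
    // Hypothetical sample markup, invented for this illustration.
    static final String SAMPLE =
        "<p><a href='http://example.com/a'>first</a> and "
        + "<a href='http://example.com/b'><b>second</b> link</a></p>";

    // Collects the href attribute of every link that has one.
    static List<String> hrefs(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            out.add(link.attr("href"));
        }
        return out;
    }

    // Collects the combined text of every link, including text in child elements.
    static List<String> texts(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            out.add(link.text());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(hrefs(SAMPLE));
        System.out.println(texts(SAMPLE));
    }
}
```

Note that text() on the second link includes the text of its child b element, which is what "combined children" means in the Solution list.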
Description
The methods above are the core of the element data access methods. There are additional others:
- Element.id()
- Element.tagName()
- Element.className() and Element.hasClass(String className)
All of these accessor methods have corresponding setter methods to change the data.
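As a rough sketch of the additional accessors and the setter counterparts mentioned above, the example below reads an element's id, tag name, and classes, then changes a link through the setter overloads Element.attr(String, String) and Element.text(String). The class name AccessorSketch, its helper methods, and the HTML constant are hypothetical and exist only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AccessorSketch {
    // Hypothetical sample markup, invented for this illustration.
    static final String HTML =
        "<div id='intro' class='note big'>"
        + "<a href='http://example.com/'>example</a></div>";

    // Summarises an element using the id, tagName, className and hasClass accessors.
    static String describe(Element el) {
        return el.tagName() + "#" + el.id()
            + " classes=[" + el.className() + "]"
            + " note=" + el.hasClass("note");
    }

    // Changes the link in place using the setter overloads of attr() and text().
    static Element rewriteLink(Element link) {
        link.attr("href", "http://example.org/"); // set an attribute value
        link.text("changed");                     // replace the element's text
        return link;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Element div = doc.getElementById("intro");
        System.out.println(describe(div));

        Element link = rewriteLink(doc.select("a").first());
        System.out.println(link.attr("href"));
        System.out.println(link.text());
    }
}
```

The setters return the element they were called on, so calls can be chained if preferred.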
See also
- The reference documentation for Element and the collection Elements class
- Working with URLs
- Finding elements with the CSS selector syntax