Extract attributes, text, and HTML from elements

Problem

After parsing a document and finding some elements, you'll want to get at the data inside those elements.

Solution

To get the value of an attribute, use the Node.attr(String key) method.
For the text of an element (and its combined children), use Element.text().
For HTML, use Element.html() or Node.outerHtml(), as appropriate.

For example:

String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();

String text = doc.body().text(); // "An example link."
String linkHref = link.attr("href"); // "http://example.com/"
String linkText = link.text(); // "example"

String linkOuterH = link.outerHtml();
    // "<a href="http://example.com/"><b>example</b></a>"
String linkInnerH = link.html(); // "<b>example</b>"

Description

The methods above are the core element data access methods. Additional accessors include:

Element.id()
Element.tagName()
Element.className() and Element.hasClass(String className)
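
The accessors listed above can be sketched as follows. This is a minimal illustration, not part of the original recipe: the class name AccessorSketch, the helper firstDiv(), and the sample HTML are all made up for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AccessorSketch {
    // Hypothetical helper: parses a fixed sample snippet and returns its first <div>.
    static Element firstDiv() {
        String html = "<div id='main' class='box wide'>Hello</div>";
        Document doc = Jsoup.parse(html);
        return doc.select("div").first();
    }

    public static void main(String[] args) {
        Element div = firstDiv();

        System.out.println(div.id());             // main
        System.out.println(div.tagName());        // div
        System.out.println(div.className());      // box wide
        System.out.println(div.hasClass("wide")); // true
        System.out.println(div.hasClass("tall")); // false
    }
}
```

Note that className() returns the whole class attribute as one string, while hasClass(String className) tests for a single class name within it.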

All of these accessor methods have corresponding setter methods to change the data.
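
As a rough sketch of what the setters look like in use, the example below changes the link from the Solution section using Element.attr(String, String), Element.html(String), and Element.addClass(String). The class name SetterSketch, the helper editedLink(), and the replacement values are assumptions chosen for illustration only.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SetterSketch {
    // Hypothetical helper: parses the sample paragraph, edits its link, and returns the link.
    static Element editedLink() {
        String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
        Document doc = Jsoup.parse(html);
        Element link = doc.select("a").first();

        link.attr("href", "http://example.org/"); // overwrite the href attribute
        link.html("<i>sample</i>");               // replace the inner HTML
        link.addClass("external");                // add a class name
        return link;
    }

    public static void main(String[] args) {
        Element link = editedLink();

        System.out.println(link.attr("href")); // http://example.org/
        System.out.println(link.html());       // <i>sample</i>
        System.out.println(link.text());       // sample
        System.out.println(link.outerHtml());  // the full, modified <a> element
    }
}
```

Each of these setters returns the element itself, so the calls could also be chained.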
