Load a Document from a URL
Problem
You need to fetch and parse an HTML document from the web, and find data within it (screen scraping).
Solution
Use the Jsoup.connect(String url) method:
Document doc = Jsoup.connect("http://example.com/").get();
String title = doc.title();
Description
The connect(String url) method creates a new Connection, and get() fetches and parses an HTML file. If an error occurs whilst fetching the URL, it will throw an IOException, which you should handle appropriately.
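One way to handle the IOException is sketched below. The class FetchTitleExample and the helper fetchTitle are hypothetical names used only for illustration, and returning an empty Optional on failure is an assumption about what "appropriately" means for your program; logging and rethrowing may suit you better.

```java
import java.io.IOException;
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class FetchTitleExample {
    // Hypothetical helper: returns the page title, or an empty Optional if the fetch fails.
    static Optional<String> fetchTitle(String url) {
        try {
            Document doc = Jsoup.connect(url).get();
            return Optional.of(doc.title());
        } catch (IOException e) {
            // Covers connection failures, timeouts, and HTTP error statuses.
            System.err.println("Could not fetch " + url + ": " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String title = fetchTitle("http://example.com/").orElse("(unavailable)");
        System.out.println(title);
    }
}
```

Catching the exception close to the fetch keeps the rest of the program free of network error handling; callers only need to decide what to do when no title is available.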
The Connection interface is designed for method chaining to build specific requests:
Document doc = Jsoup.connect("http://example.com")
    .data("query", "Java")
    .userAgent("Mozilla")
    .cookie("auth", "token")
    .timeout(3000)
    .post();
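Because each chained call returns the Connection, the request can also be built first and sent later. The sketch below assumes you want to inspect the response status before parsing; ChainedRequestExample and buildRequest are hypothetical names, and the URL and form data are placeholders taken from the example above.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ChainedRequestExample {
    // Hypothetical helper: configures a POST request without sending it.
    static Connection buildRequest(String url) {
        return Jsoup.connect(url)
                .data("query", "Java")
                .userAgent("Mozilla")
                .timeout(3000) // milliseconds
                .method(Connection.Method.POST);
    }

    public static void main(String[] args) throws IOException {
        // execute() sends the request and returns the raw response.
        Connection.Response response = buildRequest("http://example.com").execute();
        System.out.println("Status: " + response.statusCode());

        // parse() turns the response body into a Document.
        Document doc = response.parse();
        System.out.println(doc.title());
    }
}
```

Using execute() followed by parse() is an alternative to calling post() directly when you need the status code or headers as well as the document.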
This method only supports web URLs (http and https protocols); if you need to load from a file, use the parse(File in, String charsetName) method instead.
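A minimal sketch of the file-based alternative follows. It writes a small temporary file so that the example is self-contained; ParseFileExample and titleOf are hypothetical names, and UTF-8 is assumed as the file's encoding.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ParseFileExample {
    // Hypothetical helper: parses a local HTML file and returns its title.
    static String titleOf(File in) throws IOException {
        Document doc = Jsoup.parse(in, "UTF-8");
        return doc.title();
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway HTML file to stand in for a real one.
        File tmp = File.createTempFile("page", ".html");
        tmp.deleteOnExit();
        String html = "<html><head><title>Hello</title></head><body><p>Hi</p></body></html>";
        Files.write(tmp.toPath(), html.getBytes(StandardCharsets.UTF_8));

        System.out.println(titleOf(tmp)); // prints "Hello"
    }
}
```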