Class HtmlToPlainText

java.lang.Object
org.jsoup.examples.HtmlToPlainText

public class HtmlToPlainText extends Object
Converts HTML to plain text. This example program demonstrates how to use jsoup to convert HTML input to lightly formatted plain text. This differs from the general goal of jsoup's .text() methods, which is to extract clean data from a scrape.

Note that this is a fairly simplistic formatter; for real-world use you will want to extend it.

To invoke from the command line, assuming you've downloaded the jsoup jar to your current directory:

java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]

where url is the URL to fetch, and selector is an optional CSS selector.
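
The same command line can also be assembled from Java code, for instance to launch the example program as a child process. The following is a minimal sketch under stated assumptions: the class PlainTextCommand and its build method are hypothetical helpers (not part of jsoup), and the jar is assumed to be named jsoup.jar in the current directory, as above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jsoup: builds the argument list for
// "java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]".
class PlainTextCommand {

    static List<String> build(String jarPath, String url, String selector) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(jarPath);
        cmd.add("org.jsoup.examples.HtmlToPlainText");
        cmd.add(url);
        // The CSS selector is optional; omit it when none is given.
        if (selector != null && !selector.isEmpty()) {
            cmd.add(selector);
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("jsoup.jar", "https://example.com/", "p");
        // Prints: java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText https://example.com/ p
        System.out.println(String.join(" ", cmd));
    }
}
```

The resulting list can be passed to java.lang.ProcessBuilder to run the command. Keeping the arguments as a list rather than a single string avoids shell quoting problems when the selector contains spaces.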
Author:
Jonathan Hedley, jonathan@hedley.net
  • Constructor Summary

    Constructors
    Constructor          Description
    HtmlToPlainText()
  • Method Summary

    Modifier and Type    Method                           Description
    String               getPlainText(Element element)    Formats an Element as plain text.
    static void          main(String... args)

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • HtmlToPlainText

      public HtmlToPlainText()
  • Method Details

    • main

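      public static void main(String... args) throws IOException
      Command-line entry point. Expects the arguments url [selector], where url is the URL to fetch and selector is an optional CSS selector.
      Throws:
      IOException - if fetching the URL fails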
    • getPlainText

      public String getPlainText(Element element)
      Formats an Element as plain text.
      Parameters:
      element - the root element to format
      Returns:
      formatted text
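
getPlainText can also be called directly from code rather than through main. The following is a minimal sketch, assuming the jsoup jar (including the org.jsoup.examples package) is on the classpath; the class PlainTextDemo and its toPlainText helper are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText;
import org.jsoup.nodes.Document;

// Hypothetical demo class, not part of jsoup.
class PlainTextDemo {

    // Parses an HTML string and formats its body as plain text.
    static String toPlainText(String html) {
        Document doc = Jsoup.parse(html);
        return new HtmlToPlainText().getPlainText(doc.body());
    }

    public static void main(String[] args) {
        String html = "<h1>Title</h1><p>First paragraph.</p>"
                + "<ul><li>One</li><li>Two</li></ul>";
        System.out.println(toPlainText(html));
    }
}
```

Any Element can serve as the root, so passing the result of a narrower selection instead of doc.body() would limit the output to that part of the page.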