We are happy to announce the start of our new project: textracto.com. textracto is an HTML web extraction tool that offers a free-to-use API.
Our HTML content extractor extracts plain text from blog posts and articles, which makes it well suited to site scraping. It automatically identifies the main content and removes the surplus “clutter” (boilerplate, templates) around it. The content extractor works best for news articles and blog posts.
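
To give a rough idea of what "identifying the main content" can mean in practice, here is a minimal sketch of one common heuristic: split the page into text blocks, then keep only blocks that are long enough and contain few links. This is an illustration only and not how textracto works internally; the names `BlockCollector` and `extract_main_text`, the tag lists, and the thresholds are all assumptions made for this example.

```python
from html.parser import HTMLParser

# Assumed tag lists for this sketch; a real extractor would tune these.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
BLOCK_TAGS = {"p", "div", "article", "section", "li", "h1", "h2", "h3",
              "td", "blockquote"}


class BlockCollector(HTMLParser):
    """Splits a page into text blocks and counts link characters per block."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (text, link_chars)
        self._text = []
        self._link_chars = 0
        self._skip_depth = 0    # > 0 while inside a skipped element
        self._in_link = 0       # > 0 while inside an <a> element

    def flush(self):
        # Close the current block and normalise its whitespace.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)

    def handle_data(self, data):
        if self._skip_depth:
            return
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def extract_main_text(html, min_words=10, max_link_density=0.3):
    """Keep blocks with enough words and a low share of link text."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()
    kept = [
        text
        for text, link_chars in parser.blocks
        if len(text.split()) >= min_words
        and link_chars / len(text) <= max_link_density
    ]
    return "\n\n".join(kept)
```

Calling `extract_main_text(page_html)` on a typical article page would return the longer paragraphs and drop menus, link-heavy sidebars and footers. The thresholds (`min_words=10`, `max_link_density=0.3`) are arbitrary starting points and would need tuning for real pages.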