Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
Apache Tika 1.1 contains a number of improvements and bug fixes. Details can be found in the changes file:
http://www.apache.org/dist/tika/CHANGES-1.1.txt
Apache Tika is available in source form from the following download page:
http://www.apache.org/dyn/closer.cgi/tika/apache-tika-1.1-src.zip
Apache Tika is also available in binary form, or for use with Maven 2 from the Central Maven Repository:
http://repo1.maven.org/maven2/org/apache/tika/
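
For Maven users, a dependency declaration along the following lines should pull in this release. This is a minimal sketch: the artifact IDs tika-core and tika-parsers are not named in this announcement and are assumed here from the layout of the repository directory above, so check that directory for the modules your project actually needs.

```xml
<!-- Sketch of a pom.xml fragment; artifact IDs are assumptions, see note above. -->
<dependencies>
  <!-- Core API: type detection and the parser interfaces -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-core</artifactId>
    <version>1.1</version>
  </dependency>
  <!-- Parser implementations built on the existing parser libraries -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.1</version>
  </dependency>
</dependencies>
```

Projects that only need type detection can probably omit the second dependency, which brings in the third-party parser libraries transitively.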
During the first 48 hours after the release, it may not yet be available on all mirrors.
When downloading from a mirror site, please remember to verify the downloads using the signatures and keys found on the Apache site:
http://www.apache.org/dist/tika/KEYS
For more information on Apache Tika, visit the project home page:
http://tika.apache.org/