A Poor Journalist's Text Mining Toolkit - pudo.org - Friedrich Lindenberg
How can journalists search and analyze collections of documents on their own computers with simple tools? At last weekend's DataHarvest, we ran a workshop trying to answer that question. This writeup covers using Apache Tika for content extraction and regular expressions in Sublime Text as an advanced search tool.