CRL Reports

There has been a fair amount of discussion of late in information industry circles about text mining. Academic researchers now have access to immense corpora of text openly available on the Web: millions of public-domain books and serials made available by Google, and vast troves of government documents released through “open government” initiatives in the U.S. and U.K. or by third-party actors such as WikiLeaks and the National Security Archive. The growing use of text mining techniques and technologies across many fields of research has implications that libraries are beginning to feel.

Text mining is generally defined as the automated processing of large amounts of digital data or textual content for purposes of information retrieval, extraction, interpretation, and analysis. Researchers now use proprietary and open-source software tools to process and make sense of the oceans of information at their disposal in ways not previously possible. Most text mining involves downloading a fixed body of text and its accompanying metadata to a local host system or platform, then running it through analytical processes that detect patterns, trends, biases, and other phenomena in the underlying content. These phenomena can then form the basis for new observations, visualizations, models, and so forth.¹
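To make the workflow described above a little more concrete, here is a minimal Python sketch of one simple pattern-detection step: measuring how often a term appears, relative to all words, in each year of a small corpus. The sample corpus, its `year` metadata field, and the function names are hypothetical and invented purely for illustration. The software researchers actually use is far more sophisticated, but the basic shape (local text plus metadata in, a measurable trend out) is the same.

```python
import re
from collections import Counter

# Hypothetical sample corpus: each record pairs a metadata field (year)
# with document text, standing in for a locally downloaded collection.
CORPUS = [
    {"year": 1990, "text": "The library catalog lists books and serials."},
    {"year": 1990, "text": "Serials budgets rose as journals grew."},
    {"year": 2010, "text": "Digital text and open data reshape the library."},
    {"year": 2010, "text": "Researchers mine digital text at scale."},
]

# Very simple tokenizer: runs of lowercase ASCII letters only.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def term_share_by_year(corpus, term):
    """Return {year: fraction of that year's tokens equal to term}."""
    totals = Counter()  # total tokens per year
    hits = Counter()    # occurrences of the term per year
    for record in corpus:
        tokens = tokenize(record["text"])
        totals[record["year"]] += len(tokens)
        hits[record["year"]] += tokens.count(term)
    # Guard against years whose documents contain no tokens at all.
    return {
        year: (hits[year] / totals[year] if totals[year] else 0.0)
        for year in sorted(totals)
    }


if __name__ == "__main__":
    shares = term_share_by_year(CORPUS, "digital")
    for year, share in shares.items():
        print(year, round(share, 3))
```

Run as a script, this prints `1990 0.0` and `2010 0.143`, a (toy) rising trend in the use of "digital" that could feed a chart or a further model.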

Publication date: October 1, 2012