What is Text & Data Mining (TDM)?
Posted on September 4, 2018
TDM helps us discover new information, in the form of trends, patterns, and relationships, through the analysis of complex data sets. The overall goal of text and data mining is to extract information from a data set and transform it into an understandable structure for further use. Data mining is a misnomer, because the goal is the extraction of patterns and knowledge, not the extraction (mining) of data itself. Nowadays our data sets are less text-based and increasingly visual, and there has been much progress in the areas of visual data mining and multimedia data mining. For example, here is a word cloud made from this blog post using a web-based program called Wordle:
How do I use mined data for my class or research?
TDM can benefit many areas of research and has been used in the medical, business, and humanities fields, just to name a few! The process has three stages: pre-processing (cleaning and preparing the data), mining (looking for patterns), and results validation (checking that the patterns hold up). No TDM plan? Don't worry; we can help you through this process!
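To make the three stages a little more concrete, here is a minimal Python sketch that counts word frequencies, the kind of tally a word cloud is built on. Everything in it is an assumption made for illustration: the function names (`preprocess`, `mine`, `validate`), the tiny stop-word list, and the split-in-half check are hypothetical and are not a W&M Libraries workflow or a vendor tool.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real projects use a much longer one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for"}


def preprocess(text):
    """Pre-processing: lowercase the text, keep runs of letters, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]


def mine(documents, top_n=10):
    """Mining: count how often each term appears across all documents."""
    counts = Counter()
    for document in documents:
        counts.update(preprocess(document))
    return counts.most_common(top_n)


def validate(documents, top_n=10):
    """Validation (a crude check): mine each half of the collection separately
    and report the share of top terms that the two halves have in common."""
    middle = len(documents) // 2
    first_half = {term for term, _ in mine(documents[:middle], top_n)}
    second_half = {term for term, _ in mine(documents[middle:], top_n)}
    return len(first_half & second_half) / top_n
```

Calling `mine(documents)` on a list of strings returns the most frequent terms with their counts, and `validate(documents)` returns a number between 0 and 1. A low score suggests the top terms come from only part of the collection, which is worth a closer look before drawing conclusions.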
TDM @ W&M Libraries
TDM support at W&M Libraries is mostly text-based at this time and restricted to currently owned and licensed databases. Librarians have curated a list of databases (see below) that allow TDM. Some databases charge for this service and have other restrictions/conditions. Contact your liaison Librarian (find the Librarian for your department here) or e-mail Content Services for a complete list or with any questions.
1. ProQuest Databases
Cost associated, 6-month embargo (plan ahead!), data delivered via hard drive or FTP, and data must be used within license terms and conditions.
- The American Civil War: Letters & Diaries
- American Periodicals Series Online
- British Periodicals
- Colonial State Papers
- Early English Books Online (EEBO)
- Early European Books: Printed Sources to 1700
- Women's Magazine Archive
Cost associated, data delivered via hard drive, and data must be used within license terms and conditions. Contact us for a complete list of databases available.
3. Adam Matthew
No cost associated, and data must be used within license terms and conditions. Contact us for a complete list of databases available.
4. Readex/NewsBank
Cost associated, and data must be used within license terms and conditions. Contact us for a complete list of databases available.