Over the past several years, information technology has driven innovation in fields such as business, education, health, and defense. Every day, organizations in these areas collect large volumes of high-quality data from many different sources. This wealth of data can be a gold mine for organizational management. It is therefore increasingly important to build a central information processing unit that merges the different sources and thereby provides a greater advantage in information processing and decision making. However, analyzing such massive data both promptly and accurately with traditional methods is difficult: the ability to analyze and utilize the data lags far behind the capability to gather and store it. This gap poses new challenges for businesses and researchers seeking to extract useful information.
Consider a very large collection of textual items, such as an encyclopedia or a digital library accumulated from different sources. Browsing such a collection would be much easier if the items were pre-ordered hierarchically according to their contents. Such an ordering requires a similarity measure for pairs of items. One might wish for a measure that compares the meanings of the contents linguistically, but when the text corpus is very large, such linguistic analyses soon become computationally overwhelming. It turns out, however, that descriptive and useful similarity relations between text items are already reflected in the way words are used in them.
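To make the idea of word-usage-based similarity concrete, the following is a minimal sketch in which each item is represented as a bag of words and pairs of items are compared by cosine similarity. The example documents and the whitespace tokenizer are illustrative assumptions, not part of any particular system described here.

import math
from collections import Counter

def bag_of_words(text):
    """Tokenize on whitespace and count word occurrences."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    common = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc1 = bag_of_words("the library catalogues books by subject")
doc2 = bag_of_words("a digital library orders books by their subject")
doc3 = bag_of_words("defense budgets grew again last year")

print(cosine_similarity(doc1, doc2))  # relatively high: shared vocabulary
print(cosine_similarity(doc1, doc3))  # near zero: little word overlap

Even this crude measure, with no linguistic analysis at all, already separates items that share vocabulary from those that do not, which is what makes word-usage statistics attractive for ordering very large corpora.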
Our aim, therefore, is to develop an efficient, user-friendly method and tool that satisfies diverse information needs and tasks related to organizing, visualizing, searching, categorizing, and filtering textual data.