Automatic text summarization

Automatic text summarization is the process of using software to condense a text document into a short, accurate, and fluent summary of its main points. It is a common problem in machine learning and natural language processing.

Humans can understand the meaning of a text and restate its most important information in their own words, so we are generally quite good at writing summaries. However, creating summaries by hand is very time-consuming, which has created a need for automatic summarization. Automatic summarization tools are much faster, and they can also be less biased than human summarizers.

Many methods of text summarization exist today, but they fall into two basic approaches, distinguished by the type of output: extractive and abstractive. An extractive summary selects the most important sentences from the original text and joins them into a brief summary. Abstractive summarization algorithms instead create new phrases and sentences that convey the most useful information from the original text, just as humans do.
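
The extractive approach can be illustrated with a minimal sketch. It assumes a simple word-frequency scoring scheme, which is only one of many ways to decide which sentences are "most important". The function names, the tiny stop-word list, and the naive sentence splitter are all hypothetical choices made for this illustration, not a standard implementation.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
              "that", "on", "for", "are", "as", "with"}


def split_sentences(text):
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase words, with stop words removed.
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def extractive_summary(text, num_sentences=2):
    sentences = split_sentences(text)
    # Count how often each word occurs across the whole document.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document frequency of the sentence's words.
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Rank sentence indices by score, keep the top ones,
    # then restore original order before joining.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)


if __name__ == "__main__":
    sample = ("Cats sleep a lot. Cats and dogs are popular pets. "
              "The weather was nice yesterday. Dogs need daily walks.")
    print(extractive_summary(sample, num_sentences=2))
```

Every sentence in the output is copied verbatim from the input, which is the defining property of an extractive summary. An abstractive system would instead generate new wording and typically requires a trained language model, so it is not sketched here.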

This lecture provides insight into the most common algorithms and tools used for automatic text summarization today, together with the methods used to evaluate automated summaries.