Testing Sentence Compression without Paired Corpora
Sentence compression, “the task of shortening sentences while retaining the original meaning,” has traditionally depended on large corpora that pair each original, verbose sentence with a shorter, pithier one. Master of Data Science students at the Center for Data Science, Thibault Févry and Jason Phang, explore ways to limit the need for paired corpora in their newly published work on sentence compression. To that end, Févry and Phang “apply neural summarization techniques to…sentence compression, focusing on extractive summarization.” Extraction, which attempts to isolate relevant tokens and phrases from a text, differs from an abstractive approach, which aims to compress by paraphrasing.
In their research, Févry and Phang conclude, “a simple denoising auto-encoder, trained on removing and reordering words from a noised input sequence, can learn effective sentence compression.” The researchers trained their model on the Annotated Gigaword dataset. This dataset consists of news headlines, regarded here as “summaries,” and their corresponding main sentences, “references.” Typical supervised models depend on both components, training on references paired with their respective summaries. In contrast, in this paper, the researchers used only the reference sentences in training, providing an “unsupervised” approach in which the model is never shown good summaries.
To do so, Févry and Phang utilize additive noising, whereby extra words are added to a reference sentence and the result is shuffled, producing a noised sequence 40–60% longer than the original. The objective is for the model to recover the original sentence from the noised one; in other words, the original sentence plays the role of the compressed output. In the figure provided in the research paper, a reference sentence undergoes the noising and shuffling process. The model takes in the resulting nonsensical “sentence” and attempts to reproduce the original reference sentence. Notably, although this model underperforms on the standard ROUGE metric when measured against models trained on both reference and summary sentences, it “performs competitively with supervised baseline models in human evaluation.”
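The noising step described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `add_noise`, the `pool` of candidate noise words, and the choice to shuffle individual words (rather than larger units) are all assumptions made for illustration. Only the 40–60% length increase and the pairing of a noised input with the original sentence as target come from the article.

```python
import random


def add_noise(sentence, pool, min_ratio=0.4, max_ratio=0.6, rng=None):
    """Return a noised word list built from `sentence`.

    Hypothetical helper: extra words are drawn from `pool` (assumed here to
    be words taken from other sentences), then the whole sequence is shuffled.
    """
    rng = rng or random.Random()
    words = sentence.split()
    # Pick a length increase somewhere between 40% and 60% of the original.
    ratio = rng.uniform(min_ratio, max_ratio)
    n_extra = max(1, round(len(words) * ratio))
    extra = [rng.choice(pool) for _ in range(n_extra)]
    noised = words + extra
    # Shuffling word by word is an assumption; it destroys the original order.
    rng.shuffle(noised)
    return noised


def make_training_pair(sentence, pool, rng=None):
    """Pair the noised input with the original sentence as the target."""
    return add_noise(sentence, pool, rng=rng), sentence.split()


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy sleeping dog"
    noise_pool = ["market", "stocks", "rain", "minister", "report"]
    source, target = make_training_pair(reference, noise_pool, random.Random(0))
    print("input :", " ".join(source))
    print("target:", " ".join(target))
```

A denoising auto-encoder trained on such pairs only ever sees reference sentences, which is what makes the approach unsupervised: no human-written summary appears in either the input or the target.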
Févry and Phang also experimented with including additional information from Natural Language Processing models, such as InferSent sentence embeddings. While incorporating InferSent embeddings improved the model’s ROUGE scores, it negatively impacted human evaluations. Consistent with previous researchers’ conclusions, Févry and Phang call into question the effectiveness of ROUGE as an evaluation metric. They conclude that their experimental results provide a promising basis that merits further exploration in the task of sentence compression.
By Sabrina de Silva