[1909.03186] On Extractive and Abstractive Neural Document Summarization with Transformer Language Models
While we believe that this work is a step forward towards generating more abstractive summaries, it remains an open challenge to develop models that respect the underlying facts of the content being summarized while matching the creative ability of humans to coherently and concisely synthesize summaries.
Abstract: We present a method to produce abstractive summaries of long documents that
exceed several thousand words via neural abstractive summarization. We perform
a simple extractive step before generating a summary, which is then used to
condition the transformer language model on relevant information before being
tasked with generating a summary. We show that this extractive step
significantly improves summarization results. We also show that this approach
produces more abstractive summaries compared to prior work that employs a copy
mechanism while still achieving higher ROUGE scores. Note: The abstract above
was not written by the authors; it was generated by one of the models presented
in this paper.
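The two-step recipe described above (an extractive step whose output conditions a transformer language model) could look roughly like the sketch below. This is a minimal illustration, not the authors' implementation: GPT-2 and a lead-k sentence heuristic stand in for the paper's domain-trained TLM and its pointer-network extractor, and the helper names, the "TL;DR:" separator, and the generation hyperparameters are all assumptions.

```python
# Sketch of the extract-then-condition idea using an off-the-shelf GPT-2 model.
# The paper's extractor is a sentence pointer network and its TLM is trained on
# documents formatted as intro + extract + abstract + rest; this is a stand-in.
from transformers import AutoModelForCausalLM, AutoTokenizer


def extract_sentences(document: str, k: int = 5) -> list[str]:
    """Hypothetical extractive step: keep the first k sentences (lead-k heuristic)."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return [s + "." for s in sentences[:k]]


def summarize(document: str, intro: str) -> str:
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Conditioning context: the introduction followed by the extracted
    # sentences, with a separator marking where the summary should begin.
    context = intro + "\n" + " ".join(extract_sentences(document)) + "\nTL;DR:"
    inputs = tokenizer(context, return_tensors="pt", truncation=True, max_length=900)

    output_ids = model.generate(
        **inputs,
        max_new_tokens=120,       # length budget for the generated summary
        do_sample=True,
        top_k=50,                 # sampling hyperparameters are illustrative
        pad_token_id=tokenizer.eos_token_id,
    )
    # Return only the newly generated continuation (the summary).
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```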
Figure 1: Proposed model for abstractive summarization of a scientific article. An older version of this paper is shown as the reference document. First, a sentence pointer network extracts important sentences from the paper. Next, these sentences, together with the whole scientific article, are arranged in the following order: introduction, extracted sentences, abstract, and the rest of the paper. A transformer language model is trained on articles organized in this format. During inference, the introduction and the extracted sentences are given to the language model as context to generate a summary. In domains such as news and patent documents, the introduction is replaced by the entire document.

Figure 2: n-gram overlaps between the abstracts generated by different models and the input article on the arXiv dataset. We show in detail which part of the input was copied for our TLM conditioned on intro + extract.

Figure 3: t-SNE visualization of the TLM-learned word embeddings. The model appears to partition the space based on the broad paper category in which each word frequently occurs.
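The n-gram overlap analysis referenced in the Figure 2 caption can be approximated in a few lines: the sketch below computes the fraction of a summary's n-grams that also occur in the source article, assuming overlap is counted over whitespace tokens; the paper's exact counting procedure may differ.

```python
# Minimal sketch of an n-gram copy-rate measurement of the kind reported in
# Figure 2: the fraction of n-grams in a generated abstract that also appear
# in the source article. Lower overlap at higher n suggests more abstraction.
def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copy_rate(summary: str, article: str, n: int = 3) -> float:
    summary_ngrams = ngrams(summary.lower().split(), n)
    article_ngrams = ngrams(article.lower().split(), n)
    if not summary_ngrams:
        return 0.0
    return len(summary_ngrams & article_ngrams) / len(summary_ngrams)


print(copy_rate("the model extracts key sentences",
                "the model extracts key sentences from the paper", n=2))
```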
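Likewise, an embedding projection of the kind shown in Figure 3 can be produced by running t-SNE over a language model's input embedding matrix. The sketch below uses GPT-2's token embeddings and scikit-learn's TSNE purely as stand-ins; the paper visualizes the embeddings of its own trained TLM.

```python
# Hypothetical sketch of the Figure 3-style visualization: project a TLM's
# token embedding matrix to 2-D with t-SNE and plot the result.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Token embedding matrix: one row per vocabulary entry.
embeddings = model.get_input_embeddings().weight.detach().numpy()

# Project a subset of the vocabulary to keep the computation fast.
subset = embeddings[:2000]
coords = TSNE(n_components=2, init="pca", perplexity=30).fit_transform(subset)

plt.scatter(coords[:, 0], coords[:, 1], s=2)
plt.title("t-SNE of TLM token embeddings (illustrative)")
plt.show()
```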