Abstract: In this paper, we present a model for generating summaries of text documents\nwith respect to a query. This is known as query-based summarization. We adapt\nan existing dataset of news article summaries for the task and train a\npointer-generator model using this dataset. The generated summaries are\nevaluated by measuring similarity to reference summaries. Our results show that\na neural network summarization model, similar to existing neural network models\nfor abstractive summarization, can be constructed to make use of queries to\nproduce targeted summaries.\n\n "]},{"cell_type":"markdown","metadata":{},"source":["While the overall performance of the model is not sufficient to outperform our extractive baseline, we have shown that it can incorporate a query and use that information to produce more focused summaries."]},{"cell_type":"markdown","metadata":{},"source":["![title](https://a2c.fyi/q/MSzTSWNWZU/f0.png)"]},{"cell_type":"markdown","metadata":{},"source":["Figure 1: Diagram of the GRU architecture. $x_t$ corresponds to the input and $h_t$ to the output of the GRU cell. Boxes with multiple input vectors have their inputs concatenated. A circle signifies an operation, where + is vector addition, · is elementwise multiplication, and "1−" computes the complement probability for the input elements in [0, 1]. 
(Gated Recurrent Units)"]},{"cell_type":"markdown","metadata":{},"source":["$$\n\\newcommand{\\larrowover}[1]{\\stackrel{\\leftarrow}{#1}}\n\\newcommand{\\rarrowover}[1]{\\stackrel{\\rightarrow}{#1}}\n\\newcommand{\\argmax}[1]{\\underset{#1}{\\operatorname{arg}\\,\\operatorname{max}}\\;}\\DeclareMathOperator{\\GRU}{GRU}\n\\begin{align*}\nr_t &= \\sigma(W^r[x_{t}, h_{t-1}] + b^r) \\\\\nz_t &= \\sigma(W^z[x_{t}, h_{t-1}] + b^z)\n\\\\[0.3cm]\nh'_{t} &= \\tanh(W^h[x_{t}, r_t \\odot h_{t-1}] + b^h)\n\\\\\nh_{t} &= z_t \\odot h_{t-1} + (1-z_t) \\odot h'_{t},\n\\end{align*}$$"]},{"cell_type":"code","execution_count":0,"outputs":[],"metadata":{},"source":["import numpy as np\n","def sigmoid(a): return 1.0 / (1.0 + np.exp(-a))\n","# Sketch of one GRU step per the equations above; assumes weight matrices W_r, W_z, W_h,\n","# biases b_r, b_z, b_h, and vectors x_t, h_prev are defined; * is elementwise (odot)\n","r_t = sigmoid(W_r @ np.concatenate([x_t, h_prev]) + b_r)  # reset gate\n","z_t = sigmoid(W_z @ np.concatenate([x_t, h_prev]) + b_z)  # update gate\n","h_cand = np.tanh(W_h @ np.concatenate([x_t, r_t * h_prev]) + b_h)  # candidate h'_t\n","h_t = z_t * h_prev + (1 - z_t) * h_cand  # new hidden state"]},{"cell_type":"markdown","metadata":{},"source":["![title](https://a2c.fyi/q/MSzTSWNWZU/f2.png)"]},{"cell_type":"markdown","metadata":{},"source":["Figure 3: Overview of our model. It illustrates connections between parts of the model at a fixed decoder time step $t$. The bottom part, containing labeled boxes, corresponds to the different RNNs. The top part visualizes the two ways the output word $y_t$ can be selected, through the pointer and generator mechanisms, to the left and right respectively. 
(Model)"]},{"cell_type":"markdown","metadata":{},"source":["$$\n\\newcommand{\\larrowover}[1]{\\stackrel{\\leftarrow}{#1}}\n\\newcommand{\\rarrowover}[1]{\\stackrel{\\rightarrow}{#1}}\n\\newcommand{\\argmax}[1]{\\underset{#1}{\\operatorname{arg}\\,\\operatorname{max}}\\;}\\DeclareMathOperator{\\GRU}{GRU}\n\\begin{align*}\nh_t = \\GRU(h_{t-1}, x_t).\n\\end{align*}$$"]},{"cell_type":"code","execution_count":0,"outputs":[],"metadata":{},"source":["# Recurrence applied at each time step, with the GRU computation expanded below:\n","# h_t = GRU(h_{t-1}, x_t)"]},{"cell_type":"markdown","metadata":{},"source":["$$\n\\newcommand{\\larrowover}[1]{\\stackrel{\\leftarrow}{#1}}\n\\newcommand{\\rarrowover}[1]{\\stackrel{\\rightarrow}{#1}}\n\\newcommand{\\argmax}[1]{\\underset{#1}{\\operatorname{arg}\\,\\operatorname{max}}\\;}\\DeclareMathOperator{\\GRU}{GRU}\n\\begin{align*}\nr_t &= \\sigma(W^r[x_{t}, h_{t-1}] + b^r) \\\\\nz_t &= \\sigma(W^z[x_{t}, h_{t-1}] + b^z)\n\\\\[0.3cm]\nh'_{t} &= \\tanh(W^h[x_{t}, r_t \\odot h_{t-1}] + b^h)\n\\\\\nh_{t} &= z_t \\odot h_{t-1} + (1-z_t) \\odot h'_{t},\n\\end{align*}$$"]},{"cell_type":"markdown","metadata":{},"source":["![title](https://a2c.fyi/q/MSzTSWNWZU/f3.png)"]},{"cell_type":"markdown","metadata":{},"source":["Figure 4: Visualization of the attention distribution as the summary in Table ?? is generated. The words of the document are shown on the horizontal axis, from left to right. Only a limited number of document words are shown. The vertical axis shows the output words, from top to bottom, after the