
Auto summarize records














The final score assesses the similarity between the answers based on the source text and the answers based on the summary. The intuition is that if both the summary and the source text cause the question-answering model to answer the questions in the same way, the summary is factually accurate. If they elicit different answers, then the summary has probably garbled some facts.

By accounting for factual accuracy, QAGS offers a better assessment of summary quality than metrics based on phrasal overlap. But it requires the sequential application of three different deep-learning networks, which is inefficient.

Our approach, which we call QUALS (for question answering with language model score for summarization), reduces the number of models to one, which makes it 55 times as fast as QAGS. That one model is the joint question-and-answer generation (QAGen) model that members of our group presented at last year's ACL. It takes a text as input and generates question-and-answer pairs pertaining to it.

The output of the QAGen model for a given input can be thought of as a huge tree, in which the nodes are words and each edge encodes the likelihood that a particular word will be followed by another word. For a given summary, we search the resulting tree to produce 60 high-probability question-and-answer pairs. Our search algorithm ensures that we explore diverse paths through the tree, in order to generate a variety of candidate questions and answers. Then we throw out all the question-answer pairs whose answers are not sequences of words found in the summary.

Next, we feed the source text on which the summary is based to the QAGen model. We use the resulting tree to calculate the probabilities of the same question-answer pairs we extracted for the summary. When, for the source text, the probability of generating a particular question-answer pair is small compared to the probability for the summary, the QUALS score will be low. Intuitively, the discrepancy suggests that the question-answer pair was plausible given the summary but not the source text, indicating factual inconsistency.

Probabilities per token (words and other standalone symbols) of two different question-answer pairs, based on a summary (blue) and an input document (orange). The large probability differences for the answer in the right-hand example give it a much lower QUALS score (-2.615) than the left-hand example (-0.054).

The QUALS score gives us an efficiently computable measure of a summary's factual accuracy, but using it to train a machine learning model is not straightforward: differences in QUALS score can't simply be back-propagated through the QAGen model to update the summarization model. So in our paper, we propose contrastive learning as a method for using QUALS to train a summarization model.

First, we train a summarization model using the standard approach, which uses maximum-likelihood estimation (MLE) to approximate a phrasal-overlap score. Next, we use the trained model to generate new summaries for all the source texts in the training data and create two different groups of summaries. One group, S+, contains ground truth summaries that have high QUALS scores (indicating factually accurate summaries); the other, S−, contains generated summaries that have low QUALS scores (indicating factually inaccurate summaries). Finally, we retrain the summarization model, using a loss function that encourages it to generate summaries like those in S+ and discourages it from generating summaries like those in S−.

Evaluation

As baselines for the evaluation of our approach, we used two models.

Examples from the human-evaluation study, featuring input texts and summaries produced using both MLE and the ConSeq model, which is trained using QUALS.
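The QAGS comparison step described above — checking whether the source text and the summary lead a question-answering model to the same answers — can be approximated with a token-level F1 over answer strings. This is a minimal sketch under that assumption, not the actual QAGS implementation; the function names are illustrative.

```python
from collections import Counter

def token_f1(pred: str, ref: str) -> float:
    """Token-level F1 between two answer strings (SQuAD-style)."""
    pred_toks, ref_toks = pred.lower().split(), ref.lower().split()
    common = Counter(pred_toks) & Counter(ref_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(ref_toks)
    return 2 * precision * recall / (precision + recall)

def qags_style_score(answers_from_summary, answers_from_source):
    """Average answer agreement across the generated questions."""
    pairs = list(zip(answers_from_summary, answers_from_source))
    return sum(token_f1(a, b) for a, b in pairs) / len(pairs)

# Identical answers score 1.0; a garbled fact ("2019" vs "2020") drags
# the average down.
print(qags_style_score(["the red car", "in 2019"],
                       ["the red car", "in 2020"]))  # → 0.75
```

A low average signals that the summary's claims are not recoverable from the source.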

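The filtering step in QUALS — throwing out question-answer pairs whose answers are not word sequences found in the summary — amounts to a contiguous-token containment check. A sketch with hypothetical helper names, not the paper's code:

```python
def filter_qa_pairs(qa_pairs, summary):
    """Keep only (question, answer) pairs whose answer occurs as a
    contiguous token sequence in the summary."""
    toks = summary.lower().split()

    def in_summary(answer):
        a = answer.lower().split()
        return any(toks[i:i + len(a)] == a
                   for i in range(len(toks) - len(a) + 1))

    return [(q, a) for q, a in qa_pairs if in_summary(a)]

pairs = [("Who won?", "the home team"), ("When?", "last Tuesday")]
kept = filter_qa_pairs(pairs, "The home team won the final match.")
# Only the first pair survives: "last Tuesday" never appears in the summary.
```

Discarding unanchored answers ensures that every retained pair can be scored against both the summary and the source.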

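The QUALS comparison itself comes down to differencing the per-token log-probabilities of the same question-answer pairs under the source and under the summary: pairs that are likely given the summary but unlikely given the source pull the score down. A simplified sketch under that reading; the paper's exact normalization and aggregation may differ.

```python
def avg_logprob(token_logprobs):
    """Mean log-probability across the tokens of one QA pair."""
    return sum(token_logprobs) / len(token_logprobs)

def quals(summary_lps, source_lps):
    """For each QA pair, subtract its average token log-probability under
    the summary from that under the source; average over pairs. A strongly
    negative value means the source does not support QA pairs that the
    summary made plausible."""
    diffs = [avg_logprob(src) - avg_logprob(summ)
             for summ, src in zip(summary_lps, source_lps)]
    return sum(diffs) / len(diffs)

# One QA pair, two tokens each: likely under the summary (log-probs near 0),
# unlikely under the source (strongly negative) — a factual-inconsistency signal.
print(quals([[-0.1, -0.2]], [[-2.0, -3.0]]))  # → -2.35
```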

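The retraining objective — encourage summaries like those in S+ and discourage those in S− — can be sketched as a simple contrastive loss over sequence log-likelihoods. This is an illustrative stand-in, not the paper's exact ConSeq loss.

```python
def contrastive_loss(logp_pos, logp_neg):
    """logp_pos / logp_neg: model log-likelihoods of summaries in S+ / S-.
    Minimizing this raises the likelihood of S+ and lowers that of S-."""
    return -sum(logp_pos) / len(logp_pos) + sum(logp_neg) / len(logp_neg)

# A model that assigns S+ higher likelihood (-1.0 vs -4.0) gets a lower loss
# than one with the preference reversed.
print(contrastive_loss([-1.0], [-4.0]))  # → -3.0
print(contrastive_loss([-4.0], [-1.0]))  # → 3.0
```

In practice the log-likelihoods would come from the summarization model's decoder, and gradients of this loss would drive the retraining step.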











