A Survey on Neural Text Generation and Degeneration
Elham Madjidi, Christopher Crick
Pages - 19 - 35     |    Revised - 30-08-2023     |    Published - 01-10-2023
Volume - 14   Issue - 2    |    Publication Date - October 2023
KEYWORDS
Natural Language Processing, Text Generation, Neural Text Degeneration, Large Language Models, Decoding Techniques.
ABSTRACT
Text generation has been revolutionized by the rise of transformer-based models, bringing significant changes to fields including news, social media, and scientific research. However, the literature lacks a comprehensive review covering the historical evolution, challenges, and potential solutions in this domain. To address this gap, we conduct a thorough survey that provides a comprehensive overview of text generation. We also investigate text degeneration, offering insights and mitigation strategies. Our survey sheds light on the current landscape of neural text generation, identifies forthcoming challenges, and highlights research areas that require further exploration by the academic community.
Miss Elham Madjidi
Department of Computer Science, Oklahoma State University, Stillwater, OK 74078 - United States of America
elham.madjidi@okstate.edu
Professor Christopher Crick
Department of Computer Science, Oklahoma State University, Stillwater, OK 74078 - United States of America