Explainable Topic Continuity in Political Discourse: A Sentence Pair BERT Model Analysis
Juan Francisco Reyes
Pages - 11 - 34     |    Revised - 28-02-2025     |    Published - 30-04-2025
Volume - 15   Issue - 2    |    Publication Date - April 2025
KEYWORDS
Topic Continuity, Text Segmentation, Sentence Pair Modeling, Explainable AI, BERT, Transformers Interpret.
ABSTRACT
This study uses Sentence Pair Modeling (SPM), BERT, and the Transformers Interpret library to analyze topic continuity in political discourse. Topic continuity, which is defined by specific linguistic features, is crucial for understanding political communication. Using a dataset of 2,884 sentence pairs, we fine-tuned TopicContinuityBERT to examine how these linguistic features influence topic continuity across sentences. Our analysis reveals that coreferentiality, lexical cohesion, and transitional cohesion are pivotal in maintaining thematic consistency across sentence pairs. This research enhances our understanding of political rhetoric and improves transparency in natural language processing (NLP) models, offering insights into the dynamics of political discourse.
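As a rough illustration of what a sentence pair input looks like, the sketch below builds pairs from a short passage and lays each pair out in BERT's standard two-segment format, `[CLS] A [SEP] B [SEP]`, with segment ids marking which sentence each token belongs to. Several things here are assumptions and not taken from the study: pairing adjacent sentences, the example sentences, the helper names `make_adjacent_pairs` and `pack_pair`, and the use of whitespace splitting in place of BERT's WordPiece tokenizer.

```python
from typing import List, Tuple


def make_adjacent_pairs(sentences: List[str]) -> List[Tuple[str, str]]:
    """Pair each sentence with the one that follows it (assumed pairing scheme)."""
    return list(zip(sentences, sentences[1:]))


def pack_pair(sentence_a: str, sentence_b: str) -> Tuple[List[str], List[int]]:
    """Lay out a pair in BERT's two-segment format: [CLS] A [SEP] B [SEP].

    Whitespace splitting is a stand-in for WordPiece tokenization.
    """
    tokens_a = sentence_a.split()
    tokens_b = sentence_b.split()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A, and the first [SEP];
    # segment 1 covers sentence B and the final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


# Hypothetical example passage, invented for illustration only.
speech = [
    "We will cut taxes for working families.",
    "These cuts will help them save more.",
    "Our allies abroad face new threats.",
]

pairs = make_adjacent_pairs(speech)
tokens, segment_ids = pack_pair(*pairs[0])
print(tokens)
print(segment_ids)
```

In the first pair, the repeated word "cuts" and the pronoun "them" are the kind of lexical and coreferential links the abstract names as signals of continuity, while the second pair shifts to a new topic. A fine-tuned classifier would receive each packed pair and predict a continuity label for it.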
Mr. Juan Francisco Reyes
Institute of Computer Science, Brandenburgische Technische Universität Cottbus-Senftenberg, Cottbus, 03046 - Germany
pacoreyesp@gmail.com

