A Technique for Discovering Similarities between Texts Based on Extracting Features from the Text

Jihad, Alaa Abdalqahar; Hamad, Mortadha M.

doi:10.37652/juaps.2022.171876

Document Type: Research Paper

Authors

¹ Computer Center, University of Anbar

² College of Computer Sciences and IT, University of Anbar

Abstract

Discovering the similarity between two texts is useful in many applications, and text similarity is a core research topic in dataset, data warehouse, and data mining work. This paper presents a framework that computes the similarity between two input texts using pattern recognition and approximate string matching, with a weight that adjusts the resulting similarity ratio. The comparison does not depend on grammar, synonyms, or word meanings. Preliminary results show that extracting certain features from the texts helps in discovering the similarity between them.
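
The abstract does not state which approximate string matching measure or weighting scheme the framework uses. As a rough illustration only, the sketch below assumes one common choice: the Dice coefficient over character q-grams, blended with a word-overlap score through a single weight. The function names, the default `q = 2`, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace to single spaces."""
    return " ".join(text.lower().split())


def qgrams(text: str, q: int = 2) -> Counter:
    """Return the multiset of character q-grams of the normalized text."""
    s = normalize(text)
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def dice_similarity(a: str, b: str, q: int = 2) -> float:
    """Dice coefficient over q-gram multisets: 2 * |A ∩ B| / (|A| + |B|)."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Both texts are shorter than q, so fall back to exact comparison.
        return float(normalize(a) == normalize(b))
    shared = sum((grams_a & grams_b).values())  # multiset intersection
    return 2.0 * shared / total


def weighted_similarity(a: str, b: str, weight: float = 0.5, q: int = 2) -> float:
    """Blend character-level and word-level scores.

    `weight` is a hypothetical parameter standing in for the weight the
    abstract mentions; the paper's actual weighting is not specified here.
    """
    char_score = dice_similarity(a, b, q)
    words_a, words_b = set(normalize(a).split()), set(normalize(b).split())
    denom = len(words_a) + len(words_b)
    word_score = 2.0 * len(words_a & words_b) / denom if denom else 0.0
    return weight * char_score + (1.0 - weight) * word_score
```

Like the framework described in the abstract, this sketch looks only at surface strings: it uses no grammar, synonyms, or word meanings, so two paraphrases with no shared wording would score near zero.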

Keywords

Main Subjects

Computer

Journal of University of Anbar for Pure Science

Volume 13, Issue 1
April 2019
Page 50-54