CLUSTERED KNOWLEDGE: ENHANCING MULTI-DOCUMENT ARABIC TEXT SUMMARIZATION THROUGH KEYPHRASE EXTRACTION

Hamzah Omar, Center for AI Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Bangi, Selangor, Malaysia

Abstract

Multi-document text summarization is a core task in information retrieval and natural language processing: it distills the essential information from a collection of documents. Arabic presents particular challenges for this task because of its rich morphology and complex syntax. In this paper, we propose an approach called "Clustered Knowledge" to improve multi-document Arabic text summarization. The approach uses keyphrase extraction to identify the essential concepts in the input documents and clusters them, which supports the generation of coherent and informative summaries. We present experimental results on a diverse set of Arabic corpora in which our method outperforms existing summarization techniques. Beyond improving summary quality, "Clustered Knowledge" also contributes to better content organization, making it a useful tool for information retrieval in content-rich Arabic environments.

Keywords

Multi-Document Summarization, Arabic Text Summarization, Keyphrase Extraction

How to Cite

CLUSTERED KNOWLEDGE: ENHANCING MULTI-DOCUMENT ARABIC TEXT SUMMARIZATION THROUGH KEYPHRASE EXTRACTION. (2022). International Journal of Artificial Intelligence, 2(03), 01-05. https://www.academicpublishers.org/journals/index.php/ijai/article/view/55