  • Chitra Dadkhah

    List of Articles Chitra Dadkhah


  • Article

    1 - Designing a Semi-Intelligent Crawler for Creating a Persian Question Answering Corpus Called Popfa
    Journal of Information Systems and Telecommunication (JIST), Issue 2, Spring 2024
    Question answering is an active area of natural language processing in which researchers test their ability to tackle the demanding Turing test. Computer scientists continue to develop and improve question answering systems for many natural languages, especially English. Progress in Persian has been harder, mainly because the language is low-resource and lacks sufficient corpora. This paper therefore presents a Persian question answering text corpus that covers a wide range of topics, including religion, midwifery, and youth marriage, along with the question types commonly encountered in Persian usage. The most important challenge was devising a method for gathering data in Persian and for facilitating and scaling up the gathering process. To address it, SIC (Semi-Intelligent Crawler) is proposed: it crawls Persian websites, gathers their text, and imports it into a database. The outcome of this research is a corpus called Popfa, which stands for POrsesh Pasokh (question answering) in FArsi. The corpus contains more than 53,000 standard questions and answers and has been evaluated with standard approaches. All questions in Popfa are answered by specialists in two general areas: religious and medical questions. Researchers can now use this corpus for research on Persian question answering.