This paper studies the impact of frame semantic enrichment of texts on the task of factoid question answering (QA). In particular, we consider different techniques for answer processing with frame semantics: the level of semantic class identification and role assignment to texts, and the fusion of frame semantic-based answer-processing approaches with other methods used in the Text REtrieval Conference (TREC). We analyze the impact of each of these aspects on the overall performance of a QA system. The TREC 2004 and TREC 2006 factoid question sets were used for the experiments. The results demonstrate that exploiting the frame semantics encapsulated in FrameNet in a shallow semantic parsing process can enhance answer-processing performance in factoid QA systems. This improvement depends on the level of semantic annotation, the frame semantic alignment method, and the method of fusing frame semantic-based answer-processing models with other existing models. A more comprehensively annotated environment, with target predicates of all parts of speech, provides a higher chance of correct factoid answer retrieval when semantic alignment is based on both semantic classes and a relaxed set of semantic roles for answer span identification. Our experiments on techniques for fusing frame semantic-based and entity-based answer-processing models show that merging answer lists with respect to their scores and redundancy by means of a fusion function leads to a more effective overall factoid QA system than the use of either model individually.
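As a rough illustration of merging answer lists by score and redundancy, the following minimal sketch fuses the ranked outputs of two answer-processing models. The abstract does not specify the fusion function, so everything here is an assumption made for illustration: the min-max normalization, the additive redundancy bonus, the `redundancy_weight` parameter, and the example answers and scores are all hypothetical.

```python
from collections import defaultdict


def normalize(scores):
    """Min-max normalize a {answer: score} dict into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {answer: 1.0 for answer in scores}
    return {answer: (s - lo) / (hi - lo) for answer, s in scores.items()}


def fuse(answer_lists, redundancy_weight=0.5):
    """Merge answer lists from several models (illustrative, not the paper's function).

    answer_lists: list of {answer_string: raw_score} dicts, one per model.
    redundancy_weight: hypothetical bonus added for each additional model
        that proposes the same answer.
    Returns (answer, fused_score) pairs, best first.
    """
    total = defaultdict(float)
    votes = defaultdict(int)
    for scores in answer_lists:
        # Collapse case and whitespace variants within one model, keeping the best score.
        canonical = {}
        for answer, s in scores.items():
            key = answer.strip().lower()
            canonical[key] = max(s, canonical.get(key, s))
        # Normalize per model so that differently scaled scores are comparable.
        for key, s in normalize(canonical).items():
            total[key] += s
            votes[key] += 1
    fused = {a: total[a] + redundancy_weight * (votes[a] - 1) for a in total}
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical candidate answers from a frame-based and an entity-based model.
frame_based = {"Canberra": 0.9, "Sydney": 0.4}
entity_based = {"canberra": 12.0, "Melbourne": 15.0, "Sydney": 3.0}
ranked = fuse([frame_based, entity_based])
```

In this toy case the answer proposed with a high score by both models ends up ranked above an answer that only one model scored highest, which is the intuition behind combining score and redundancy.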
This study empirically evaluates the effectiveness of different feature types for classifying the first language of an author. In particular, it examines the utility of psycholinguistic features, extracted by the Linguistic Inquiry and Word Count (LIWC) tool, that have not previously been applied to the task of author profiling. Because LIWC was developed in the psycholinguistic field rather than the computational linguistics field, it was hypothesized that it would be effective both as a single-type feature set, because of its psycholinguistic basis, and in combination with other feature sets, because it should be sufficiently different to add insight rather than redundancy. LIWC features were found to be competitive with previously used feature types in identifying the first language of an author, and combined feature sets that included LIWC features consistently showed better accuracy rates and average F measures than the same feature sets without the LIWC features. As a secondary issue, this study also examined how effectively first language classification scaled up to a larger number of possible languages. The classification scheme was found to scale up effectively to the entire 16-language collection from the International Corpus of Learner English, when compared with results achieved on just 5 languages in previous research. 2012 ASIS&T.
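For readers unfamiliar with the two reported metrics, the sketch below computes an accuracy rate and an average F measure for a multi-class first language classifier. The abstract does not say how the F measure is averaged, so macro-averaging over languages is an assumption here, and the language codes and predictions are invented purely for illustration.

```python
def accuracy(gold, predicted):
    """Fraction of documents whose predicted first language matches the gold label."""
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def macro_f_measure(gold, predicted):
    """Unweighted mean of per-language F1 scores (assumed averaging scheme)."""
    labels = sorted(set(gold) | set(predicted))
    f_scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f_scores.append(2 * precision * recall / (precision + recall))
        else:
            f_scores.append(0.0)
    return sum(f_scores) / len(f_scores)


# Hypothetical gold and predicted first languages for six learner essays.
gold = ["fr", "fr", "de", "de", "es", "es"]
predicted = ["fr", "de", "de", "de", "es", "fr"]
acc = accuracy(gold, predicted)
avg_f = macro_f_measure(gold, predicted)
```

Macro-averaging gives each language equal weight regardless of how many essays it contributes, which matters when comparing results across collections with different numbers of languages.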