Establishing phishing provenance using orthographic features
- Authors: Liping, Ma , Yearwood, John , Watters, Paul
- Date: 2009
- Type: Text , Conference paper
- Relation: Paper presented at 2009 eCrime Researchers Summit, eCRIME '09, Tacoma, Washington : 20th-21st October 2009
- Full Text:
- Description: After phishing message detection, determining the provenance of phishing messages and websites is the second step in tracing cybercriminals. In this paper, we present a novel method to cluster phishing emails automatically using orthographic features. In particular, we develop an algorithm to cluster documents and remove redundant features at the same time. After collecting all the possible features based on observation, we adapt the modified global k-means method repeatedly, and generate the objective function values over a range of tolerance values across different subsets of features. Finally, we identify the appropriate clusters by studying the distribution of the objective function values. Experimental evaluation of a large number of computations demonstrates that our clustering and feature selection techniques are highly effective and achieve reliable results.
The impact of frame semantic annotation levels, frame-alignment techniques, and fusion methods on factoid answer processing
- Authors: Ofoghi, Bahadorreza , Yearwood, John , Liping, Ma
- Date: 2009
- Type: Text , Journal article
- Relation: Journal of the American Society for Information Science and Technology Vol. 60, no. 2 (2009), p. 247-263
- Full Text: false
- Description: The impact of frame semantic enrichment of texts on the task of factoid question answering (QA) is studied in this paper. In particular, we consider different techniques for answer processing with frame semantics: the level of semantic class identification and role assignment to texts, and the fusion of frame semantic-based answer-processing approaches with other methods used in the Text REtrieval Conference (TREC). The impact of each of these aspects on the overall performance of a QA system is analyzed in this paper. The TREC 2004 and TREC 2006 factoid question sets were used for the experiments. These demonstrate that the exploitation of encapsulated frame semantics in FrameNet in a shallow semantic parsing process can enhance answer-processing performance in factoid QA systems. This improvement is dependent on the level of semantic annotation, the frame semantic alignment method, and the method of fusing frame semantic-based answer-processing models with other existing models. A more comprehensively annotated environment with all different part-of-speech target predicates provides a higher chance of correct factoid answer retrieval where semantic alignment is based on both semantic classes and a relaxed set of semantic roles for answer span identification. Our experiments on fusion techniques of frame semantic-based and entity-based answer-processing models show that merging answer lists with respect to their scores and redundancy by exploiting a fusion function leads to a more effective overall factoid QA system compared to the use of individual models.