Optimality conditions via weak subdifferentials in reflexive Banach spaces

- Hassani, Sara, Mammadov, Musa, Jamshidi, Mina

**Authors:**Hassani, Sara , Mammadov, Musa , Jamshidi, Mina**Date:**2017**Type:**Text , Journal article**Relation:**Turkish Journal of Mathematics Vol. 41, no. 1 (2017), p. 1-8**Full Text:****Reviewed:****Description:**In this paper the relation between the weak subdifferentials and the directional derivatives, as well as optimality conditions for nonconvex optimization problems in reflexive Banach spaces, are investigated. It partly generalizes several related results obtained for finite dimensional spaces. © Tübitak.


A generalization of a theorem of Arrow, Barankin and Blackwell to a nonconvex case

- Kasimbeyli, Nergiz, Kasimbeyli, Refail, Mammadov, Musa

**Authors:**Kasimbeyli, Nergiz , Kasimbeyli, Refail , Mammadov, Musa**Date:**2016**Type:**Text , Journal article**Relation:**Optimization Vol. 65, no. 5 (May 2016), p. 937-945**Full Text:****Reviewed:****Description:**The paper presents a generalization of a known density theorem of Arrow, Barankin, and Blackwell for properly efficient points defined as support points of sets with respect to monotonically increasing sublinear functions. This result is shown to hold for nonconvex sets of a partially ordered reflexive Banach space.


Predicting and controlling the dynamics of infectious diseases

- Evans, Robin, Mammadov, Musa

**Authors:**Evans, Robin , Mammadov, Musa**Date:**2015**Type:**Text , Conference proceedings**Relation:**54th IEEE Conference on Decision and Control, CDC 2015; Osaka, Japan; 15th-18th December 2015; Published in Proceedings of the IEEE Conference on Decision and Control; p. 5378-5383**Full Text:****Description:**This paper introduces a new optimal control model to describe and control the dynamics of infectious diseases. In the present model, the average time to isolation (i.e. hospitalization) of the infectious population is the main time-dependent parameter that defines the spread of infection. All the preventive measures aim to decrease the average time to isolation under given constraints. The suggested model allows one to generate a small number of possible future scenarios and to determine the corresponding trajectories of the infected population in different regions. This information is then used to find an optimal distribution of bed capacities across countries/regions according to each scenario. © 2015 IEEE.


Structure learning of Bayesian Networks using global optimization with applications in data classification

- Taheri, Sona, Mammadov, Musa

**Authors:**Taheri, Sona , Mammadov, Musa**Date:**2014**Type:**Text , Journal article**Relation:**Optimization Letters Vol. 9, no. 5 (2014), p. 931-948**Full Text:****Reviewed:****Description:**Bayesian Networks are increasingly popular methods of modeling uncertainty in artificial intelligence and machine learning. A Bayesian Network consists of a directed acyclic graph in which each node represents a variable and each arc represents a probabilistic dependency between two variables. Constructing a Bayesian Network from data is a learning process that consists of two steps: learning structure and learning parameter. Learning a network structure from data is the most difficult task in this process. This paper presents a new algorithm for constructing an optimal structure for Bayesian Networks based on optimization. The algorithm has two major parts. First, we define an optimization model to find better network graphs. Then, we apply an optimization approach, the first of its kind in the literature, for removing possible cycles from the directed graphs obtained in the first part. The main advantage of the proposed method is that the maximal number of parents for variables is not fixed a priori but is determined during the optimization procedure. It also considers all networks, including cyclic ones, and then chooses the best structure by applying a global optimization method. To show the efficiency of the algorithm, several closely related algorithms, including the unrestricted dependency Bayesian Network algorithm, as well as the benchmark algorithms SVM and C4.5, are employed for comparison. We apply these algorithms to data classification; data sets are taken from the UCI machine learning repository and LIBSVM. © 2014, Springer-Verlag Berlin Heidelberg.


Turnpike theorem for an infinite horizon optimal control problem with time delay

**Authors:**Mammadov, Musa**Date:**2014**Type:**Text , Journal article**Relation:**SIAM Journal on Control and Optimization Vol. 52, no. 1 (2014), p. 420-438**Full Text:****Reviewed:****Description:**An optimal control problem for systems described by a special class of nonlinear differential equations with time delay is considered. The cost functional adopted could be considered as an analogue of the terminal functional defined over an infinite time horizon. The existence of optimal solutions as well as the asymptotic stability of optimal trajectories (that is, the turnpike property) are established under some quite mild restrictions on the nonlinearities of the functions involved in the description of the problem. Such mild restrictions on the nonlinearities allowed us to apply these results to a blood cell production model. © 2014 Society for Industrial and Applied Mathematics.


A new auxiliary function method for general constrained global optimization

- Wu, Zhiyou, Bai, Fusheng, Yang, Yongjian, Mammadov, Musa

**Authors:**Wu, Zhiyou , Bai, Fusheng , Yang, Yongjian , Mammadov, Musa**Date:**2013**Type:**Text , Journal article**Relation:**Optimization Vol. 62, no. 2 (2013), p. 193-210**Full Text:****Reviewed:****Description:**In this article, we first propose a method to obtain an approximate feasible point for general constrained global optimization problems (with both inequality and equality constraints). Then we propose an auxiliary function method to obtain a global minimizer or an approximate global minimizer with a required precision for general global optimization problems by locally solving some unconstrained programming problems. Some numerical examples are reported to demonstrate the efficiency of the present optimization method. © 2013 Taylor & Francis.**Description:**2003011103


Attribute weighted Naive Bayes classifier using a local optimization

- Taheri, Sona, Yearwood, John, Mammadov, Musa, Seifollahi, Sattar

**Authors:**Taheri, Sona , Yearwood, John , Mammadov, Musa , Seifollahi, Sattar**Date:**2013**Type:**Text , Journal article**Relation:**Neural Computing & Applications Vol. , no. (2013), p. 1**Full Text:****Reviewed:****Description:**The Naive Bayes classifier is a popular classification technique for data mining and machine learning. It has been shown to be very effective on a variety of data classification problems. However, the strong assumption that all attributes are conditionally independent given the class is often violated in real-world applications. Numerous methods have been proposed to improve the performance of the Naive Bayes classifier by alleviating the attribute independence assumption. However, violation of the independence assumption can increase the expected error. Another alternative is assigning weights to attributes. In this paper, we propose a novel attribute weighted Naive Bayes classifier that assigns weights to the conditional probabilities. An objective function based on the structure of the Naive Bayes classifier and the attribute weights is modeled and taken into account. The optimal weights are determined by a local optimization method using the quasisecant method. In the proposed approach, the Naive Bayes classifier is taken as a starting point. We report the results of numerical experiments on several real-world data sets in binary classification, which show the efficiency of the proposed method.

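
The weighted decision rule sketched in the abstract can be illustrated in a few lines. This is a minimal sketch of weighting the conditional probabilities only; the objective function and quasisecant optimization the paper uses to learn the weights are not reproduced, and the function name, toy priors, and conditional probabilities below are all hypothetical:

```python
import math

# Minimal sketch of an attribute-weighted Naive Bayes decision rule:
# each attribute's log-likelihood is scaled by a weight w_i, so setting
# every w_i = 1 recovers the standard Naive Bayes classifier.
# The probabilities below are toy values, not learned from data.

def weighted_nb_predict(x, class_priors, cond_probs, weights):
    """Return the class maximizing log P(c) + sum_i w_i * log P(x_i | c)."""
    best_class, best_score = None, -math.inf
    for c, prior in class_priors.items():
        score = math.log(prior)
        for i, value in enumerate(x):
            score += weights[i] * math.log(cond_probs[c][i][value])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

priors = {0: 0.5, 1: 0.5}
cond = {  # cond[c][i][v] = P(attribute i takes value v | class c)
    0: [{0: 0.9, 1: 0.1}, {0: 0.6, 1: 0.4}],
    1: [{0: 0.2, 1: 0.8}, {0: 0.5, 1: 0.5}],
}
print(weighted_nb_predict([1, 0], priors, cond, weights=[1.0, 1.0]))  # → 1
```

With unit weights this is exactly Naive Bayes; the paper's contribution is choosing non-unit weights by local optimization.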

Globally convergent algorithms for solving unconstrained optimization problems

- Taheri, Sona, Mammadov, Musa, Seifollahi, Sattar

**Authors:**Taheri, Sona , Mammadov, Musa , Seifollahi, Sattar**Date:**2013**Type:**Text , Journal article**Relation:**Optimization Vol. , no. (2013), p. 1-15**Full Text:****Reviewed:****Description:**New algorithms for solving unconstrained optimization problems are presented based on the idea of combining two types of descent directions: the direction of the anti-gradient and either the Newton or quasi-Newton direction. The use of the latter directions allows one to improve the convergence rate. Global and superlinear convergence properties of these algorithms are established. Numerical experiments using some unconstrained test problems are reported, and the proposed algorithms are compared with some existing similar methods. This comparison demonstrates the efficiency of the proposed combined methods.

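
The combination idea can be sketched as follows. This is an illustrative scheme, not the paper's algorithm: here the Newton direction is taken when it is available and is a descent direction, with the anti-gradient as fallback, under a standard Armijo backtracking line search; the switching rule and all constants are assumptions:

```python
import numpy as np

# Sketch of combining the anti-gradient with the Newton direction:
# take the Newton step when the Hessian system is solvable and the
# resulting direction is a descent direction, otherwise fall back to
# steepest descent; step lengths come from Armijo backtracking.

def combined_descent(f, grad, hess, x0, tol=1e-8, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        try:
            d = -np.linalg.solve(hess(x), g)    # Newton direction
            if g @ d >= 0:                      # not a descent direction
                d = -g
        except np.linalg.LinAlgError:
            d = -g                              # anti-gradient fallback
        t = 1.0                                 # Armijo backtracking
        while f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
            t *= 0.5
        x = x + t * d
    return x

# Minimize the convex quadratic f(x) = x0^2 + 2*x1^2; the minimizer is (0, 0).
f = lambda x: x[0]**2 + 2 * x[1]**2
grad = lambda x: np.array([2 * x[0], 4 * x[1]])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])
print(combined_descent(f, grad, hess, [3.0, -2.0]))
```

On a quadratic the Newton branch converges in one step; the anti-gradient branch is what supplies the global convergence safeguard on harder problems.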

Learning the naive bayes classifier with optimization models

- Taheri, Sona, Mammadov, Musa

**Authors:**Taheri, Sona , Mammadov, Musa**Date:**2013**Type:**Text , Journal article**Relation:**International Journal of Applied Mathematics and Computer Science Vol. 23, no. 4 (2013), p. 787-795**Full Text:****Reviewed:****Description:**Naive Bayes is among the simplest probabilistic classifiers. It often performs surprisingly well in many real world applications, despite the strong assumption that all features are conditionally independent given the class. In the learning process of this classifier with the known structure, class probabilities and conditional probabilities are calculated using training data, and then values of these probabilities are used to classify new observations. In this paper, we introduce three novel optimization models for the naive Bayes classifier where both class probabilities and conditional probabilities are considered as variables. The values of these variables are found by solving the corresponding optimization problems. Numerical experiments are conducted on several real world binary classification data sets, where continuous features are discretized by applying three different methods. The performances of these models are compared with the naive Bayes classifier, tree augmented naive Bayes, the SVM, C4.5 and the nearest neighbor classifier. The obtained results demonstrate that the proposed models can significantly improve the performance of the naive Bayes classifier, yet at the same time maintain its simple structure.


A new method for solving linear ill-posed problems

- Zhang, Jianjun, Mammadov, Musa

**Authors:**Zhang, Jianjun , Mammadov, Musa**Date:**2012**Type:**Text , Journal article**Relation:**Applied Mathematics and Computation Vol. 218, no. 20 (2012), p.10180-10187**Full Text:****Reviewed:****Description:**In this paper, we propose a new method for solving large-scale ill-posed problems. This method is based on the Karush-Kuhn-Tucker conditions, the Fischer-Burmeister function and the discrepancy principle. The main difference from the majority of existing methods for solving ill-posed problems is that we do not need to choose a regularization parameter in advance. Experimental results show that the proposed method is effective and promising for many practical problems. © 2012.

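
For context, the Fischer-Burmeister function mentioned above is easy to illustrate directly: it vanishes exactly when its arguments satisfy complementarity (a ≥ 0, b ≥ 0, ab = 0), which is what allows KKT systems to be rewritten as plain equations. How the paper combines this with the discrepancy principle is not reproduced here:

```python
import math

# Fischer-Burmeister NCP function: phi(a, b) = sqrt(a^2 + b^2) - a - b.
# phi(a, b) = 0  if and only if  a >= 0, b >= 0 and a * b = 0,
# so complementarity conditions become a root-finding problem.

def fischer_burmeister(a, b):
    return math.hypot(a, b) - a - b

print(fischer_burmeister(0.0, 3.0))  # → 0.0 (on the complementarity set)
print(fischer_burmeister(2.0, 0.0))  # → 0.0 (on the complementarity set)
print(fischer_burmeister(1.0, 1.0))  # nonzero: a*b != 0
```

The function is nonsmooth only at the origin, which is why Newton-type methods (as used in the paper) apply to the reformulated system.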

Solving systems of nonlinear equations using a globally convergent optimization algorithm

- Taheri, Sona, Mammadov, Musa

**Authors:**Taheri, Sona , Mammadov, Musa**Date:**2012**Type:**Text , Journal article**Relation:**Global Journal of Technology & Optimization Vol. 3, no. (2012), p. 132-138**Full Text:****Reviewed:****Description:**Solving systems of nonlinear equations is a relatively complicated problem for which a number of different approaches have been presented. In this paper, a new algorithm is proposed for the solutions of systems of nonlinear equations. This algorithm uses a combination of the gradient and the Newton’s methods. A novel dynamic combinatory is developed to determine the contribution of the methods in the combination. Also, by using some parameters in the proposed algorithm, this contribution is adjusted. We use the gradient method due to its global convergence property, and the Newton’s method to speed up the convergence rate. We consider two different combinations. In the first one, a step length is determined only along the gradient direction. The second one is finding a step length along both the gradient and the Newton’s directions. The performance of the proposed algorithm in comparison to the Newton’s method, the gradient method and an existing combination method is explored on several well known test problems in solving systems of nonlinear equations. The numerical results provide evidence that the proposed combination algorithm is generally more robust and efficient than other mentioned methods on someimportant and difficult problems.

Structure learning of Bayesian networks using a new unrestricted dependency algorithm

- Taheri, Sona, Mammadov, Musa

**Authors:**Taheri, Sona , Mammadov, Musa**Date:**2012**Type:**Text , Conference proceedings**Full Text:****Description:**Bayesian Networks have received extensive attention in data mining due to their efficiency and reasonable predictive accuracy. A Bayesian Network is a directed acyclic graph in which each node represents a variable and each arc a probabilistic dependency between two variables. Constructing a Bayesian Network from data is a learning process divided into two steps: learning structure and learning parameter. In many domains, the structure is not known a priori and must be inferred from data. This paper presents an iterative unrestricted dependency algorithm for learning the structure of Bayesian Networks for binary classification problems. Numerical experiments are conducted on several real-world data sets, where continuous features are discretized by applying two different methods. The performance of the proposed algorithm is compared with the Naive Bayes, the Tree Augmented Naive Bayes, and the k


The effect of regularization on drug-reaction relationships

- Mammadov, Musa, Zhao, L., Zhang, Jianjun

**Authors:**Mammadov, Musa , Zhao, L. , Zhang, Jianjun**Date:**2012**Type:**Text , Journal article**Relation:**Optimization Vol. 61, no. 4 (2012), p. 405-422**Full Text:****Reviewed:****Description:**The least-squares method is a standard approach used in data fitting that has important applications in many areas in science and engineering, including many finance problems. When the problem under consideration involves large-scale sparse matrices, regularization methods are used to obtain more stable solutions by relaxing the data fitting. In this article, a new regularization algorithm is introduced based on the Karush-Kuhn-Tucker conditions and the Fischer-Burmeister function. The Newton method is used for solving the corresponding systems of equations. The advantages of the proposed method have been demonstrated in the establishment of drug-reaction relationships based on the Australian Adverse Drug Reaction Advisory Committee database. © 2012 Copyright Taylor and Francis Group, LLC.


Global asymptotic stability in a class of nonlinear differential delay equations

- Ivanov, Anatoli, Mammadov, Musa

**Authors:**Ivanov, Anatoli , Mammadov, Musa**Date:**2011**Type:**Text , Journal article**Relation:**Discrete and Continuous Dynamical Systems Vol. 2011, no. Supplement 2011 (2011), p.**Full Text:****Reviewed:****Description:**An essentially nonlinear differential equation with delay serving as a mathematical model of several applied problems is considered. Sufficient conditions for the global asymptotic stability of a unique equilibrium are derived. An application to a physiological model by M.C. Mackey is treated in detail.**Description:**2003009358

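
As an illustration of the class of equations involved, the well-known Mackey blood cell production (hematopoiesis) model has the form below; whether this is the exact variant analyzed in the paper is an assumption, but it is the standard prototype of a delayed nonlinear production term balanced against linear decay:

```latex
% Mackey blood cell production model: delayed nonlinear production minus
% linear decay, a prototype of the class x'(t) = -a x(t) + f(x(t - tau)).
\begin{equation}
  \dot{x}(t) \;=\; \frac{\beta\, x(t-\tau)}{1 + x(t-\tau)^{n}} \;-\; \gamma\, x(t),
  \qquad \beta,\ \gamma,\ \tau,\ n > 0 .
\end{equation}
```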

Optimality conditions in nonconvex optimization via weak subdifferentials

- Kasimbeyli, Refail, Mammadov, Musa

**Authors:**Kasimbeyli, Refail , Mammadov, Musa**Date:**2011**Type:**Text , Journal article**Relation:**Nonlinear Analysis, Theory, Methods and Applications Vol. 74, no. 7 (2011), p. 2534-2547**Full Text:****Reviewed:****Description:**In this paper we study optimality conditions for optimization problems described by a special class of directionally differentiable functions. The well-known necessary and sufficient optimality condition of nonsmooth convex optimization, given in the form of a variational inequality, is generalized to the nonconvex case by using the notion of weak subdifferentials. The equivalent formulation of this condition in terms of weak subdifferentials and augmented normal cones is also presented. © 2011 Elsevier Ltd. All rights reserved.

A new supervised term ranking method for text categorization

- Mammadov, Musa, Yearwood, John, Zhao, Lei

**Authors:**Mammadov, Musa , Yearwood, John , Zhao, Lei**Date:**2010**Type:**Text , Conference paper**Relation:**Paper presented at 23rd Australasian Joint Conference on Artificial Intelligence, AI 2010 Vol. 6464 LNAI, p. 102-111**Full Text:****Reviewed:****Description:**In text categorization, different supervised term weighting methods have been applied to improve classification performance by weighting terms with respect to different categories, for example, Information Gain, the χ² statistic, and Odds Ratio. The literature offers three term ranking methods to summarize term weights of different categories for multi-class text categorization: the Summation, Average, and Maximum methods. In this paper we present a new term ranking method to summarize term weights, i.e. Maximum Gap. Using two different methods, information gain and the χ² statistic, we set up controlled experiments for different term ranking methods. The Reuters-21578 text corpus is used as the dataset. Two popular classification algorithms, SVM and BoosTexter, are adopted to evaluate the performance of different term ranking methods. Experimental results show that the new term ranking method performs better. © 2010 Springer-Verlag.

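
The four summarization rules can be sketched in a few lines. Note the "Maximum Gap" form below is an assumption, interpreted as the gap between the largest and second-largest per-category weight; the paper defines the exact formula:

```python
# Sketch of collapsing one term's per-category weights into a single
# ranking score. Summation, Average and Maximum follow their names;
# "maximum_gap" is a hypothetical reading of the paper's Maximum Gap
# method: the lead of the top category over the runner-up.

def summation(ws):
    return sum(ws)

def average(ws):
    return sum(ws) / len(ws)

def maximum(ws):
    return max(ws)

def maximum_gap(ws):
    top = sorted(ws, reverse=True)
    return top[0] - top[1]

weights = [0.9, 0.2, 0.1]  # one term's weight in each of three categories
print(maximum(weights), maximum_gap(weights))
```

Intuitively, a term scoring high under Maximum Gap is strongly indicative of one category in particular, not merely informative overall.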

From convex to nonconvex: A loss function analysis for binary classification

- Zhao, Lei, Mammadov, Musa, Yearwood, John

**Authors:**Zhao, Lei , Mammadov, Musa , Yearwood, John**Date:**2010**Type:**Text , Conference paper**Relation:**Paper presented at 10th IEEE International Conference on Data Mining Workshops, ICDMW 2010 p. 1281-1288**Full Text:****Reviewed:****Description:**Problems of data classification can be studied in the framework of regularization theory as ill-posed problems. In this framework, loss functions play an important role in the application of regularization theory to classification. In this paper, we review some important convex loss functions, including hinge loss, square loss, modified square loss, exponential loss, and logistic regression loss, as well as some non-convex loss functions, such as sigmoid loss, ø-loss, ramp loss, normalized sigmoid loss, and the loss function of a 2-layer neural network. Based on the analysis of these loss functions, we propose a new differentiable non-convex loss function, called the smoothed 0-1 loss function, which is a natural approximation of the 0-1 loss function. To compare the performance of different loss functions, we propose two classification algorithms for binary classification, one for convex loss functions and the other for non-convex loss functions. A set of experiments is run on several binary datasets from the UCI repository. The results show that the proposed smoothed 0-1 loss function is robust, especially for noisy datasets with many outliers. © 2010 IEEE.

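
The abstract does not reproduce the paper's exact definition of the smoothed 0-1 loss; one natural differentiable approximation of the 0-1 loss on the margin y·f(x) is a sharpened sigmoid, sketched here next to the convex hinge loss for contrast. The steepness parameter k and the exact functional form are assumptions for illustration only.

```python
import math

def smoothed_01_loss(margin, k=10.0):
    """Differentiable approximation of the 0-1 loss on margin y*f(x).

    As k grows the sigmoid sharpens toward the step-shaped 0-1 loss:
    close to 1 for confidently wrong predictions (margin << 0), close
    to 0 for confidently correct ones (margin >> 0), 0.5 at the
    decision boundary.
    """
    return 1.0 / (1.0 + math.exp(k * margin))

def hinge_loss(margin):
    """Convex hinge loss max(0, 1 - margin), used by SVMs."""
    return max(0.0, 1.0 - margin)
```

Because a sigmoid-style smoothed loss saturates at 1, a single mislabeled outlier contributes a bounded amount to the objective, whereas the hinge loss penalizes it linearly in the margin; this boundedness is the intuition behind the robustness claim in the abstract.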
Improving Naive Bayes classifier using conditional probabilities

- Taheri, Sona, Mammadov, Musa, Bagirov, Adil

**Authors:**Taheri, Sona , Mammadov, Musa , Bagirov, Adil**Date:**2010**Type:**Text , Conference proceedings**Full Text:****Description:**The Naive Bayes classifier is the simplest of the Bayesian Network classifiers and has been shown to be very efficient on a variety of data classification problems. However, its strong assumption that all features are conditionally independent given the class is often violated in many real-world applications. Therefore, improving the Naive Bayes classifier by relaxing the feature independence assumption has attracted much attention. In this paper, we develop a new version of the Naive Bayes classifier that does not assume independence of features. The proposed algorithm approximates the interactions between features by using conditional probabilities. We present results of numerical experiments on several real-world datasets, where continuous features are discretized by applying two different methods. These results demonstrate that the proposed algorithm significantly improves the performance of the Naive Bayes classifier while maintaining its robustness. © 2011, Australian Computer Society, Inc.**Description:**2003009505

Investment decision model via an improved BP neural network

- Shen, Jihong, Zhang, Canxin, Lian, Chunbo, Hu, Hao, Mammadov, Musa

**Authors:**Shen, Jihong , Zhang, Canxin , Lian, Chunbo , Hu, Hao , Mammadov, Musa**Date:**2010**Type:**Text , Conference paper**Relation:**Paper presented at 2010 IEEE International Conference on Information and Automation, ICIA 2010, Harbin, Heilongjiang, 20th-23rd June 2010 p. 2092-2096**Full Text:****Description:**In macro investment, an investment decision model is established using an improved back propagation (BP) artificial neural network (ANN). In this paper, the relations between the elements of investment and the output of products are determined, and the optimal distribution of investment is then obtained by adjusting the distributions rationally. By using nonlinear utility functions to improve the architecture of the artificial neural network, the model can reflect the highly nonlinear mapping relations among the elements of investment, and it can be widely applied to investment problems. ©2010 IEEE.

Profiling phishing emails based on hyperlink information

- Yearwood, John, Mammadov, Musa, Banerjee, Arunava

**Authors:**Yearwood, John , Mammadov, Musa , Banerjee, Arunava**Date:**2010**Type:**Text , Conference paper**Relation:**Paper presented at 2010 International Conference on Advances in Social Network Analysis and Mining, ASONAM 2010, Odense : 9th-11th August 2010 p. 120-127**Full Text:****Description:**In this paper, a novel method for profiling phishing activity from an analysis of phishing emails is proposed. Profiling is useful in determining the activity of an individual phisher or a particular group of phishers. Work in the area of phishing is usually aimed at detection of phishing emails; in this paper, we concentrate on profiling as distinct from detection. We formulate the profiling problem as a multi-label classification problem, using the hyperlinks in the phishing emails as features and the structural properties of the emails, along with whois (i.e. DNS) information on the hyperlinks, as profile classes. Further, we generate profiles based on classifier predictions, so that classes become elements of profiles. We employ a boosting algorithm (AdaBoost) as well as SVM to generate multi-label class predictions on three different datasets created from hyperlink information in phishing emails. These predictions are then used to generate complete profiles of the emails. Results show that profiling can be done with quite high accuracy using hyperlink information. © 2010 Crown Copyright.
