From convex to nonconvex: A loss function analysis for binary classification

- **Authors:** Zhao, Lei; Mammadov, Musa; Yearwood, John
- **Date:** 2010
- **Type:** Text, Conference paper
- **Relation:** Paper presented at the 10th IEEE International Conference on Data Mining Workshops (ICDMW 2010), pp. 1281-1288
- **Description:** Problems of data classification can be studied in the framework of regularization theory as ill-posed problems. In this framework, loss functions play an important role in the application of regularization theory to classification. In this paper, we review some important convex loss functions, including hinge loss, square loss, modified square loss, exponential loss, and logistic regression loss, as well as some non-convex loss functions, such as sigmoid loss, ø-loss, ramp loss, normalized sigmoid loss, and the loss function of a two-layer neural network. Based on the analysis of these loss functions, we propose a new differentiable non-convex loss function, called the smoothed 0-1 loss function, which is a natural approximation of the 0-1 loss function. To compare the performance of different loss functions, we propose two binary classification algorithms, one for convex loss functions and the other for non-convex loss functions. A set of experiments is conducted on several binary data sets from the UCI repository. The results show that the proposed smoothed 0-1 loss function is robust, especially on noisy data sets with many outliers. © 2010 IEEE.
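
The abstract names several margin-based losses without giving their formulas. The sketch below uses common textbook forms, written in terms of the margin m = y * f(x) with labels y in {-1, +1}; these parameterizations are assumptions and may differ from the ones in the paper. The remaining losses in the list and the proposed smoothed 0-1 loss are omitted because the abstract does not define them.

```python
import numpy as np

# Common textbook forms of margin-based losses (assumed, not taken from the paper).
# Each function takes the margin m = y * f(x), with labels y in {-1, +1}.

def zero_one(m):
    return (m <= 0).astype(float)          # 1 for a misclassified (or boundary) point

def hinge(m):
    return np.maximum(0.0, 1.0 - m)        # convex, unbounded as m -> -inf

def square(m):
    return (1.0 - m) ** 2                  # convex, also penalizes m > 1

def modified_square(m):
    return np.maximum(0.0, 1.0 - m) ** 2   # squared hinge

def exponential(m):
    return np.exp(-m)                      # convex, grows exponentially for m < 0

def logistic(m):
    return np.logaddexp(0.0, -m)           # log(1 + exp(-m)), computed stably

def sigmoid_loss(m):
    return 0.5 * (1.0 - np.tanh(m / 2.0))  # equals 1 / (1 + exp(m)); bounded in (0, 1)

def ramp(m):
    return np.minimum(1.0, hinge(m))       # hinge truncated at 1; bounded, non-convex

LOSSES = {
    "0-1": zero_one, "hinge": hinge, "square": square,
    "modified square": modified_square, "exponential": exponential,
    "logistic": logistic, "sigmoid": sigmoid_loss, "ramp": ramp,
}

if __name__ == "__main__":
    margins = np.array([-10.0, -1.0, 0.0, 1.0, 2.0])  # m = -10 mimics a gross outlier
    for name, loss in LOSSES.items():
        print(f"{name:>16}: {np.round(loss(margins), 3)}")
```

At the outlier margin m = -10, the hinge, square, and exponential losses reach 11, 121, and about 22026, while the sigmoid and ramp losses stay at or below 1. This bounded behavior is one intuition for why non-convex losses can be less sensitive to outliers.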


The effect of regularization on drug-reaction relationships

- **Authors:** Mammadov, Musa; Zhao, L.; Zhang, Jianjun
- **Date:** 2012
- **Type:** Text, Journal article
- **Relation:** Optimization, Vol. 61, no. 4 (2012), pp. 405-422
- **Description:** The least-squares method is a standard approach to data fitting, with important applications in many areas of science and engineering, including many finance problems. When the problem under consideration involves large-scale sparse matrices, regularization methods are used to obtain more stable solutions by relaxing the data fitting. In this article, a new regularization algorithm is introduced based on the Karush-Kuhn-Tucker conditions and the Fischer-Burmeister function. Newton's method is used to solve the corresponding systems of equations. The advantages of the proposed method are demonstrated by establishing drug-reaction relationships from the Australian Adverse Drug Reaction Advisory Committee database. © 2012 Copyright Taylor and Francis Group, LLC.
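
The abstract does not spell out the algorithm. As a generic illustration of the technique it names, the sketch below takes a hypothetical nonnegativity-constrained, Tikhonov-regularized least-squares problem, minimize 0.5*||Ax - y||^2 + 0.5*lam*||x||^2 subject to x >= 0. It writes the KKT conditions (x >= 0, g >= 0, x_i * g_i = 0, where g is the gradient), rewrites them as the equations phi(x_i, g_i) = 0 with the Fischer-Burmeister function phi(a, b) = sqrt(a^2 + b^2) - a - b, and solves that system with a damped semismooth Newton method. The problem formulation, the parameter `lam`, and the function names are assumptions for illustration, not the authors' method.

```python
import numpy as np


def fischer_burmeister(a, b):
    """phi(a, b) = sqrt(a^2 + b^2) - a - b; phi = 0 iff a >= 0, b >= 0 and a * b = 0."""
    return np.sqrt(a * a + b * b) - a - b


def nnls_fb_newton(A, y, lam=0.0, tol=1e-10, max_iter=100):
    """Hypothetical example: minimize 0.5*||A x - y||^2 + 0.5*lam*||x||^2 s.t. x >= 0.

    KKT conditions: x >= 0, g = H x - c >= 0, x * g = 0 (componentwise), rewritten
    as F(x) = phi(x, g(x)) = 0 and solved by a damped semismooth Newton method.
    """
    n = A.shape[1]
    H = A.T @ A + lam * np.eye(n)   # Hessian of the objective
    c = A.T @ y
    x = np.ones(n)                  # strictly positive starting point

    def residual(z):
        return fischer_burmeister(z, H @ z - c)

    for _ in range(max_iter):
        g = H @ x - c
        F = fischer_burmeister(x, g)
        merit = 0.5 * F @ F
        if np.sqrt(2.0 * merit) < tol:  # ||F|| small: KKT conditions hold
            break
        # One element of the generalized Jacobian of F; at (x_i, g_i) = (0, 0)
        # we pick the limiting direction (1, 1) / sqrt(2).
        r = np.sqrt(x * x + g * g)
        nz = r > 1e-14
        r_safe = np.where(nz, r, 1.0)
        da = np.where(nz, x / r_safe - 1.0, 1.0 / np.sqrt(2.0) - 1.0)
        db = np.where(nz, g / r_safe - 1.0, 1.0 / np.sqrt(2.0) - 1.0)
        J = np.diag(da) + db[:, None] * H
        d = np.linalg.solve(J, -F)      # Newton direction: J d = -F
        # Armijo backtracking on the merit 0.5*||F||^2; because J d = -F,
        # its directional derivative along d is -2 * merit.
        t = 1.0
        while t > 1e-12:
            F_new = residual(x + t * d)
            if 0.5 * F_new @ F_new <= (1.0 - 2e-4 * t) * merit:
                break
            t *= 0.5
        x = x + t * d
    return x
```

For example, with A the 2x2 identity and y = (1, -2), the unconstrained least-squares solution (1, -2) violates x >= 0, and `nnls_fb_newton(np.eye(2), np.array([1.0, -2.0]))` returns approximately (1, 0).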


Structure learning of Bayesian Networks using global optimization with applications in data classification

- **Authors:** Taheri, Sona; Mammadov, Musa
- **Date:** 2014
- **Type:** Text, Journal article
- **Relation:** Optimization Letters, Vol. 9, no. 5 (2014), pp. 931-948
- **Description:** Bayesian Networks are increasingly popular methods for modeling uncertainty in artificial intelligence and machine learning. A Bayesian Network consists of a directed acyclic graph in which each node represents a variable and each arc represents a probabilistic dependency between two variables. Constructing a Bayesian Network from data is a learning process that consists of two steps: structure learning and parameter learning. Learning a network structure from data is the most difficult task in this process. This paper presents a new algorithm, based on optimization, for constructing an optimal structure for Bayesian Networks. The algorithm has two major parts. First, we define an optimization model to find better network graphs. Then, we apply an optimization approach, the first of its kind in the literature, to remove possible cycles from the directed graphs obtained in the first part. The main advantage of the proposed method is that the maximal number of parents for each variable is not fixed a priori; it is determined during the optimization procedure. The method also considers all networks, including cyclic ones, and then chooses the best structure by applying a global optimization method. To show the efficiency of the algorithm, several closely related algorithms, including the unrestricted dependency Bayesian Network algorithm, as well as the benchmark algorithms SVM and C4.5, are employed for comparison. We apply these algorithms to data classification, using data sets from the UCI machine learning repository and LIBSVM. © 2014, Springer-Verlag Berlin Heidelberg.
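
The abstract does not detail the cycle-removal step. To make the problem concrete, the sketch below swaps in a much simpler greedy heuristic: given candidate arcs with hypothetical dependency scores (higher means stronger), it repeatedly finds a directed cycle by depth-first search and deletes the weakest arc on it until the graph is acyclic, as a Bayesian Network requires. Unlike the global optimization approach described in the paper, this greedy rule gives no optimality guarantee; the function names and the scores are assumptions.

```python
from collections.abc import Hashable
from typing import Dict, List, Optional, Tuple

Arc = Tuple[Hashable, Hashable]


def find_cycle(nodes, arcs) -> Optional[List[Arc]]:
    """Return the arcs of one directed cycle, or None if the graph is acyclic (a DAG)."""
    children = {v: [] for v in nodes}
    for u, v in arcs:
        children[u].append(v)
    WHITE, GREY, BLACK = 0, 1, 2           # unvisited / on the DFS stack / finished
    color = {v: WHITE for v in nodes}
    parent = {}

    def dfs(u):
        color[u] = GREY
        for w in children[u]:
            if color[w] == GREY:            # back arc u -> w closes a cycle
                cycle, x = [(u, w)], u
                while x != w:               # walk the DFS tree back up to w
                    cycle.append((parent[x], x))
                    x = parent[x]
                return cycle
            if color[w] == WHITE:
                parent[w] = u
                found = dfs(w)
                if found is not None:
                    return found
        color[u] = BLACK
        return None

    for v in nodes:
        if color[v] == WHITE:
            found = dfs(v)
            if found is not None:
                return found
    return None


def remove_cycles_greedy(nodes, scores: Dict[Arc, float]) -> Dict[Arc, float]:
    """Hypothetical heuristic: delete the lowest-scoring arc of each cycle until acyclic."""
    arcs = dict(scores)
    while True:
        cycle = find_cycle(nodes, arcs)
        if cycle is None:
            return arcs
        weakest = min(cycle, key=lambda a: arcs[a])
        del arcs[weakest]


if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    scores = {("A", "B"): 0.9, ("B", "C"): 0.8, ("C", "A"): 0.1, ("A", "C"): 0.5}
    print(sorted(remove_cycles_greedy(nodes, scores)))
```

In the usage example, the arc C -> A lies on both cycles (A -> B -> C -> A and A -> C -> A) and has the lowest score, so deleting it leaves the acyclic arcs A -> B, A -> C, and B -> C.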
