Deep Matrix Factorization for Trust-Aware Recommendation in Social Networks

Recent years have witnessed remarkable information overload in online social networks, and social network based approaches for recommender systems have been widely studied. The trust information among users in social networks is an important factor for improving recommendation performance. Many successful recommendation tasks are treated as matrix factorization problems. However, the prediction performance of matrix factorization based methods largely depends on the initialization of the user and item matrices. To address this challenge, we develop a novel trust-aware approach based on deep learning to alleviate the initialization dependence. First, we propose two deep matrix factorization (DMF) techniques, i.e., linear DMF and non-linear DMF, to extract features from the user-item rating matrix for improving the initialization accuracy. The trust relationship is integrated into the DMF model according to the preference similarity and the derivations of users on items. Second, we exploit the deep marginalized Denoising Autoencoder (Deep-MDAE) to extract the latent representation in the hidden layer from the trust relationship matrix to approximate the user factor matrix factorized from the user-item rating matrix. The community regularization is integrated in the joint optimization function to take neighbours' effects into consideration. The results of DMF are applied to initialize the updating variables of Deep-MDAE in order to further improve the recommendation performance. Finally, we validate that the proposed approach outperforms state-of-the-art baselines for recommendation, especially for the cold-start users.


I. INTRODUCTION
Due to the rapidly growing amount of information and the explosive appearance of new services on the web, information overload prevents users from conveniently obtaining useful information [1]-[6]. Helping overwhelmed users to select the part of online information that they are interested in is becoming an unprecedentedly important task, and both academia and industry pay much attention to this problem. To satisfy this requirement, recommender systems have emerged as an effective mechanism for recommending to customers the items or persons that they may be interested in [7]. At present, there are many popular recommender systems, such as item recommendation in Amazon, music recommendation in Last.fm, movie recommendation in Netflix and friend recommendation in LinkedIn [8], [9].

A. Motivation
Although recommender systems have been widely studied and used in both academic and industrial fields, some important problems still exist. First, only a small proportion of items have been rated. The users' ratings on items reflect the users' interests, and the users' historical rating data can generally be formalized as the user-item rating matrix, on which collaborative filtering-based recommender systems rely. The host information systems of recommender systems provide a large amount of information, and thus the dimension of the user-item rating matrix is generally very high. However, each user visits only a relatively small number of items and rates even fewer, so a large number of ratings are missing from the user-item rating matrix. As illustrated in [10], the available ratings for implementing recommendation are very rare. The classical user-based collaborative filtering approaches exploit the ratings of the target user's neighbors to predict his/her rating. When the user-item rating matrix is extremely sparse, the target user's neighbors can hardly be identified, and the recommendation coverage deteriorates dramatically. The classical item-based collaborative filtering approaches exploit the target user's ratings that already exist in the user-item rating matrix to achieve the recommendation. Sparsity is in essence missing information in the user-item rating matrix: when the users' historical rating data is too rare, the neighboring items of the items that the target user has already visited cannot be identified. The recommendation accuracy then deteriorates dramatically, and the item-based approaches may even fail to recommend items to target users. The recommendation quality of collaborative filtering approaches cannot be guaranteed without sufficient ratings assigned to items by users.
Second, the cold-start challenge exists in recommender systems, covering both cold-start users and cold-start items. When new users appear in the system, they have not rated any items, so no ratings are available for them. From this perspective, the content-based collaborative filtering approaches do not work since the user profile cannot be constructed in recommender systems. For the user-based collaborative filtering approaches, the similar neighbors of the target user cannot be identified. For the item-based collaborative filtering approaches, the item similarity can be determined, but the new users have not rated any items, and thus the prediction cannot be completed. When new items appear in the system, the content-based collaborative filtering approaches can construct the model of the new items without ratings assigned by users. For collaborative filtering approaches, however, the similarity calculation and the rating predictions cannot be completed when the items' ratings are lacking. The cold-start problem exists during the whole life cycle of recommender systems, especially in the initial stage of their construction. New users and items are difficult to investigate without any prior information. The root of the cold-start problem is the excessive dependence on the rating data. Thus other information about users or items should be considered to design or improve the recommender systems.
Third, the trust relationship [11], [12] among users has not been considered in traditional recommender systems. In real applications, the comments of a user's friends affect the user's decisions about which items he/she is interested in more strongly than the comments of ordinary users. Therefore, the traditional recommender systems cannot provide sufficiently accurate and reliable predictions since they only consider the user-item rating matrix. Thus new approaches and techniques are needed to address these problems.
In this paper, the trust relationship is integrated to solve the problems mentioned above. With the help of the trust relationship, the effects of both the cold-start problem and the data sparsity can be relieved. Although the neighbours of the cold-start users cannot be known accurately, the users' preferences can be inferred via the friends-based social networks.

B. Contributions
In this paper, we propose several matrix factorization-based approaches for recommendation that exploit the trust-aware relationship in social networks. Since the initialization is very important for matrix factorization-based approaches, we propose an initialization method based on deep learning, where Deep Matrix Factorization (DMF) is used for pretraining the initial feature matrices of our learning model. We propose a DMF-based model that integrates and exploits the trust relationship to overcome data sparsity. Finally, a series of experiments on user data extracted from the online social networks Epinions and Flixster verify the superiority of our proposed methods. The experimental results show that our proposed methods have higher recommendation accuracy than the baseline methods. In conclusion, the main contributions of this paper are listed as follows:
1) The deep learning techniques are integrated with the social trust relationship to improve the recommendation performance.
2) The Linear Representation DMF (LRDMF) and Non-Linear Representation DMF (NLRDMF) are adopted to improve the initialization accuracy of matrix factorization. The user preferences and the derivations of users on items are taken into consideration as the social trust relationship, and they have been integrated in the DMF for overcoming the data sparsity issue.
3) We propose a joint optimization function to enforce the user factor matrix as close as possible to the latent representation of the trust relationship in the hidden layers of the deep Marginalized Denoising Autoencoder (Deep-MDAE). In addition, we integrate the community regularization in the joint optimization function to take the neighbours' effects into consideration. The two factorized matrices obtained from DMF are utilized for initializing the updating variables of Deep-MDAE.
4) The data sets are extracted from Epinions and Flixster. The recommendation results show that our DMF trust-based methods obtain higher recommendation accuracy than other methods, especially in the case of sparse data and cold-start users.

C. Organization of the Paper
This paper is organized as follows. The related work is discussed in Section 2. The problem formulation is given in Section 3. The proposed DMF algorithm is analyzed in Section 4. The proposed social trust based DMF method is presented in Section 5. The experiments are shown and analyzed in Section 6. The conclusions are drawn in Section 7.

D. Notations
In this paper, the operators $(\cdot)^{\dagger}$ and $(\cdot)^{T}$ stand for the pseudoinverse and the transpose of a matrix, respectively. $\mathbf{I}_M$ stands for an $M \times M$ identity matrix. $\|\cdot\|_F$ denotes the Frobenius norm. $\odot$ stands for the Hadamard product. $\mathbb{E}$ stands for the expectation operator, and $\mathrm{tr}$ stands for the trace operator.

II. RELATED WORK
Collaborative filtering has been widely used in recommender systems [13]. However, the collaborative filtering-based approaches suffer from sparse data and the cold-start problem [8]. Many frameworks have been proposed by researchers for solving these problems. The interpolation technique has been applied to fill the missing entries in the user-item rating matrix [14] for overcoming the problem of sparse data. The memory-based collaborative filtering approaches are also known as the neighbour-based collaborative filtering approaches. They use the user-item rating matrix directly to generate recommendations. First, various similarity metrics are adopted to calculate the similarity among different users or items. Then the neighbours of the active user or the target item can be found. The similarity-weighted sum of the neighbours' ratings is taken as the predicted rating. The typical memory-based collaborative filtering approaches are divided into user-based and item-based collaborative filtering approaches. However, the memory-based collaborative filtering approaches do not scale well to commercial recommender systems. In contrast, the model-based collaborative filtering approaches provide a scalable solution for the scenario with relatively sufficient data.
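The neighbour-based prediction described above can be sketched as follows. This is a minimal illustration rather than the method of any cited work: the function name `predict_user_based`, the choice of cosine similarity, and the neighbourhood size `k` are assumptions made for the example.

```python
import numpy as np

def predict_user_based(R, mask, m, n, k=2):
    """Predict R[m, n] as a similarity-weighted average of neighbours' ratings.

    R    : (M, N) rating matrix, arbitrary values where mask is False
    mask : (M, N) boolean matrix, True where a rating is observed
    k    : number of neighbours to use (hypothetical default)
    """
    Rz = R * mask                                    # treat missing ratings as 0
    norms = np.linalg.norm(Rz, axis=1)
    sims = Rz @ Rz[m] / (norms * norms[m] + 1e-12)   # cosine similarity to user m
    # Only users other than m who have rated item n can act as neighbours.
    candidates = np.where(mask[:, n] > 0)[0]
    candidates = candidates[candidates != m]
    if candidates.size == 0:
        return None
    top = candidates[np.argsort(sims[candidates])[::-1][:k]]
    w = sims[top]
    if w.sum() <= 0:
        return None                                  # no usable neighbour
    return float(w @ R[top, n] / w.sum())
```

A user without any rating has zero similarity to every other user, so the function returns `None`; this is the cold-start failure discussed in Section I-A.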
Model-based collaborative filtering approaches utilize machine learning techniques to train a prediction model. Users' behaviours are depicted by the prediction model, and the model-based collaborative filtering approaches then predict the ratings that users assign to items via the learned model. However, the entries in the user-item rating matrix are not all used. Typical model-based collaborative filtering approaches include Bayesian network-based approaches, clustering model-based approaches, latent semantic model-based approaches, restricted Boltzmann machine-based approaches and association rules-based approaches.
Matrix factorization is also a widely used method based on collaborative filtering [17]. In practice, directly factorizing the rating matrix yields low recommendation accuracy in most cases. Deep learning [29] has been applied to improve the performance of matrix factorization, as well as to find an appropriate low-dimensional representation of the matrices and to partly relieve the data sparsity and cold-start problems. Trust information has been derived from [8] recently, and trust-aware recommendation has become a developing area for enhancing the recommendation performance of learning based methods.

A. Matrix Factorization Based Methods
Matrix factorization-based recommendation approaches [18] have been widely adopted by researchers. The main reason is that the matrix factorization technique can effectively deal with the user-item rating matrix with large scale. The matrix factorization technique assumes that users' behaviors are influenced by only a few implicit factors. Matrix factorization simultaneously maps the feature vectors of users and items into low dimension hidden feature space, in which the inner product between users and items can be directly calculated.
Constrained non-negative matrix factorisation (CNMF) incorporates additional constraints as regularization of the error function of the primal problem [19]. However, CNMF has an apparent limitation: it considers the problem only in the global scope and gives little consideration to local or pairwise situations. To solve this problem, a method named graph regularized non-negative matrix factorization (GNMF) has been proposed in [20]. The geometrical information is represented by constructing a nearest neighbor graph, and the graph structure is incorporated into a new factorization objective function.
Recently, Relative Pairwise Relationships Non-negative Matrix Factorisation (RPR-NMF) has been proposed in [21]. The penalties imposed on relative pairwise relationships can be written as triplets. By adjusting the conditions of the factors, RPR-NMF can conveniently be applied to more recommendation problems.
Besides, there are various other forms of matrix factorisation methods, such as non-negative matrix factorization (NMF), SVD++, Bayesian probabilistic matrix factorization (BPMF), probabilistic matrix factorization (PMF) and maximum-margin matrix factorization (MMMF) [22]. NMF constrains every latent feature vector to be non-negative. PMF uses probabilistic graphical models with Gaussian noise to represent the latent feature vectors of users and items. BPMF places Gaussian-Wishart priors on the hyperparameters of users and items, and performs approximate inference with the Markov chain Monte Carlo method. SVD++ generates recommendations based on both the explicit and the implicit effects of ratings.

B. Deep Learning Based Methods
Matrix factorization is an ideal way to integrate trust-aware recommendations. In fact, factorizing user-item rating matrix directly in a reasonable way is almost impossible. Because of the complex connections among users and items, we need a more effective approach to capture these connections. Based on deep learning, some architectural paradigms, including multilayer perceptron (MLP), autoencoder, recurrent neural network (RNN), convolutional neural network (CNN), restricted Boltzmann machine (RBM), neural autoregressive distribution estimation (NADE), adversarial networks, attentional models and deep reinforcement learning (DRL) have been proposed in [23].
Compared with traditional algorithms such as matrix factorization, the merits of deep learning based methods consist of three aspects.
1) They can deal with non-linear mappings efficiently, which captures complex interactions among users and items.
2) They are useful for learning the underlying factors, which makes it convenient to extract key information from massive data.
3) They improve sequence modeling.
Among all deep learning based recommendation methods, MLP is a simple but powerful idea to achieve desirable accuracy by approximating the objective function. Neural network matrix factorization (NNMF) [24] replaces the inner product of the latent vectors with a function that is learned from data and modeled as a feed-forward neural network. Neural collaborative filtering (NCF) [25] has recently become a useful tool in recommendation systems, and it generalizes traditional matrix factorization. Researchers have trained this network by adopting the weighted square loss or the binary cross-entropy loss function.
Autoencoder is also a common technique in deep learning. Among the various variants of autoencoder, denoising autoencoder is the most studied one. Many researchers consider collaborative filtering from autoencoder aspect. User or item factor matrices are set as input [26], and they are desired to recover in the output layer. The algorithm proposed in [27] has extended AutoRec proposed in [26] by denoising and has used side information to strengthen the robustness and refine the two difficulties we mentioned before. The autoencoding variational matrix factorization and graph convolutional matrix factorization autoencoder approaches have been proposed in [28]. A hierarchical Bayesian model has been proposed in [29], which runs the deep representation learning for the content information and collaborative filtering for the user-item rating matrix jointly. The recommendation performance has been improved significantly when deep learning is embedded in the recommendation systems [30].
Besides, CNN [31] and RNN [32] also achieve excellent performance in recommendation systems such as sequential recommendation. The DMF has been used for clustering by learning hidden representations in [33]. In addition, some works such as [34], [35] have tended to use both explicit and implicit information, or a complete/missing data model, to create a joint model for prediction with a new loss function. They obtain excellent recommendation results based on DMF. However, the trust relationship is not considered in the approaches mentioned above.

C. Trust-Aware Methods
Although the matrix factorization methods mentioned above can effectively deal with large-scale user-item rating matrices, they cannot effectively solve the cold-start problem because of the intrinsic sparsity of the user-item rating matrix [36]. In recent years, some matrix factorization-based recommendation algorithms have addressed the cold-start problem by integrating additional sources of information. For example, users' tag information is integrated in a matrix factorization framework to improve the recommendation performance [37]. Based on the common interests among friends, users tend to accept recommendations from their friends. Many researchers have improved the recommendation quality by integrating the information of social networks in a matrix factorization framework. The typical recommendation approaches based on social networks include social trust ensemble (STE) [38], SocialMF [39], TidalTrust [41] and MoleTrust [42]. SDAE proposed in [43] gives a joint objective function enforcing the latent representations of social relationships and users to be as close as possible in the hidden layer of the marginalized DAE. A deep learning based matrix factorization scheme for trust-aware recommendation has been proposed in [44]. Deep learning has been applied to initialize the input, and a social trust ensemble learning model involving the influence of both friends and communities has been adopted. Due to the efficiency of the autoencoder, DAE is selected in [45] as the main method to solve the two problems stated above with trust information. Users are described by not only the rating information but also the explicit trust information, and this method is named TDAE. The authors also extract the implicit trust information to boost the performance, and the improved algorithm is called TDAE++. Besides, the trust relationships are also added to the input and output layers of the autoencoder to map the non-linear relationships. A remaining question is how to determine whether a neighbor is trustworthy. Some researchers have proposed models for neighbor selection, such as the availability evaluation module and the trust evaluation module in [46].

D. Similarity Metrics
Various similarity metrics have been applied in different scenarios, e.g., the user similarity metrics for friend recommendation in online social networks (OSNs) [47], [48], and the topological similarity metrics for link prediction and community detection [49]. The local similarity metrics utilize the local topological information to measure the similarity between nodes in networks, such as common neighbors (CN), Adamic-Adar (AA), resource allocation (RA) and preferential attachment (PA) [50]. The local path (LP) [51] index has been designed based on the path information. A widely used metric based on the structural similarity in networks is the local random walk [52]. Random walk can quantify the relevance between nodes, and it is usually implemented for link prediction and recommendation tasks. In addition, a series of similarity indices including Jaccard similarity, cosine similarity, and the Pearson correlation coefficient are used for measuring the interest similarity between users in OSNs [53]. Recently, network embedding techniques such as DeepWalk and Node2vec [54] have attracted lots of attention. They learn a low-dimensional vector representation of each node in a network [55], and then compute the similarity between vectors via different similarity metrics for the recommendation and prediction tasks.
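For concreteness, the local topological indices and the Pearson correlation coefficient named above can be computed as in the following sketch. The standard textbook definitions are used; the helper names `local_similarities` and `pearson`, and the dictionary-based graph and rating representations, are assumptions made for this illustration.

```python
import math

def local_similarities(adj, x, y):
    """Local indices between nodes x and y; adj maps a node to its neighbour set."""
    nx, ny = adj[x], adj[y]
    common = nx & ny
    union = nx | ny
    return {
        "CN": len(common),                                    # common neighbours
        "Jaccard": len(common) / len(union) if union else 0.0,
        # Adamic-Adar: common neighbours weighted by inverse log-degree
        "AA": sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1),
        # Resource allocation: common neighbours weighted by inverse degree
        "RA": sum(1.0 / len(adj[z]) for z in common),
        "PA": len(nx) * len(ny),                              # preferential attachment
    }

def pearson(ru, rv):
    """Pearson correlation over co-rated items; ru, rv map item -> rating."""
    common = sorted(set(ru) & set(rv))
    if len(common) < 2:
        return 0.0
    mu = sum(ru[i] for i in common) / len(common)
    mv = sum(rv[i] for i in common) / len(common)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = math.sqrt(sum((ru[i] - mu) ** 2 for i in common)
                    * sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den > 0 else 0.0
```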
In recent years, deep learning techniques can extract latent features and representations from the user-item rating matrix, which has been proved as an efficient method to improve the recommendation accuracy. Trust relationship matrix has been used for collecting the persons that the user trusts when this mechanism has been integrated in the recommender systems. The trust relationship matrix can be used for deducing the user preferences from his/her trust persons. This mechanism is extremely effective for the cold-start users, who have little information to deduce their preferences. Thus how to integrate the deep learning technique with the trust relationship becomes an important problem to solve. The matrix factorization technique is one of the important techniques for recommending the items to the target users in recommender systems. DMF has been used for improving item recommendation accuracy for target users to enhance the latent representations in hidden layers [34]. However, the trust relationship has not been used to further improve the recommendation performance. The autoencoder has been used for item recommendation [44], [56], which exploits the user-item rating matrix and the trust relationship matrix, respectively. However, none of them exploits both the deep structure for learning the model parameters of user-item rating matrix and the trust relationship matrix jointly. In this paper, we amalgamate the DMF of the user-item rating matrix into the autoencoder of the trust relationship matrix to achieve better recommendation performance in recommender systems.

III. PROBLEM FORMULATION
In recommender systems, we assume that $U = \{u_1, u_2, \ldots, u_M\}$ represents the user set and $I = \{i_1, i_2, \ldots, i_N\}$ represents the item set. The ratings assigned by users on items are represented as a user-item rating matrix $\mathbf{R} \in \mathbb{R}^{M \times N}$, where $\mathbf{R}_{m,n}$ is the rating of item $n$ assigned by user $m$. Ratings are often represented as integers between 1 and 5. We normalize the ratings to map them into the interval $[0, 1]$. In a rating network of users and items, each user $m$ has a set of neighbors $N_m$, and $t_{m,v}$ represents the social trust value that user $m$ assigns to user $v$ in the range $[0, 1]$. A value of zero means that no trust relationship exists, and a value of one denotes full trust. In binary trust networks, the trust values among users are represented as a trust relationship matrix $\mathbf{T} \in \mathbb{R}^{M \times M}$. $\mathbf{T}_{m,v}$ represents the social relationship from user $m$ to user $v$, and $\mathbf{T}$ is asymmetric.
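A minimal sketch of how these inputs might be prepared is given below. The linear map $(r - 1)/(5 - 1)$ is one common way of normalizing ratings into $[0, 1]$ and is an assumption here, since the text does not state the exact mapping; the helper names and the edge-list format are also hypothetical.

```python
import numpy as np

def normalize_ratings(R, mask, r_min=1.0, r_max=5.0):
    """Map observed ratings from [r_min, r_max] to [0, 1] (assumed linear map).

    The lowest rating maps to 0, so the boolean mask must be kept to tell
    observed ratings apart from missing entries.
    """
    return np.where(mask, (R - r_min) / (r_max - r_min), 0.0)

def build_trust_matrix(M, edges):
    """Binary trust matrix: T[m, v] = 1 if user m trusts user v, else 0."""
    T = np.zeros((M, M))
    for m, v in edges:
        T[m, v] = 1.0
    return T
```

Because trust is directed, `build_trust_matrix(3, [(0, 1)])` sets only the entry from user 0 to user 1, so the resulting matrix is asymmetric.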
The rating network consists of nodes and edges. Users and items are represented as nodes in a network. The edges between users represent their trust relationships, and the edge weights between users and items denote the ratings on the items assigned by the users. An example of social rating network is shown in Fig. 1(a), and the corresponding user-item rating matrix is shown in Fig. 1(b). It can be seen from the rating matrix that only part of the user-item rating matrix can be used for recommendation, and the other ratings are not known.
Therefore, the trust-aware recommendation task is described as follows: given a user $m$ and an item $n$, we aim to predict the rating on item $n$ from user $m$ by using the user-item rating matrix $\mathbf{R}$ and the trust relationship matrix $\mathbf{T}$.
Next, we will introduce the basic matrix factorization approach from a probability perspective. It should be noted that the basic matrix factorization approach only uses the known part of the user-item rating matrix $\mathbf{R}$ to predict the unknown part of $\mathbf{R}$.
Matrix factorization is an efficient model for predicting missing values in a given matrix. This problem is also known as matrix completion [15], which has attracted increasing attention from researchers in the field of recommender systems [16]. The matrix factorization model represents both users and items in a low-dimensional latent feature space, i.e., the user-item rating matrix is modeled as the product of a low-rank user matrix and a low-rank item matrix. The premise of the matrix factorization technique is that the user-item interactions are influenced by a few key features, and each feature influences a user's interactive experience [17]. The trust-aware relationship is not used for predicting missing values in the user-item rating matrix.
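The probabilistic linear model with Gaussian observation noise is adopted. The conditional distribution over the observed ratings is defined as
$$p(\mathbf{R} \mid \mathbf{P}, \mathbf{Q}, \sigma^2) = \prod_{m=1}^{M} \prod_{n=1}^{N} \left[ \mathcal{N}\left(\mathbf{R}_{m,n} \mid \mathbf{P}_m^T \mathbf{Q}_n, \sigma^2\right) \right]^{I_{mn}} \tag{1}$$
where $\mathbf{P}_m$ and $\mathbf{Q}_n$ are the latent feature vectors of user $m$ and item $n$, $\mathcal{N}(x \mid \mu, \sigma^2)$ is the probability density function of the Gaussian distribution with mean $\mu$ and variance $\sigma^2$, and $I_{mn}$ is the indicator function, which is equal to 1 if user $m$ has rated item $n$ and 0 otherwise. We assume that the user and item feature vectors have zero-mean spherical Gaussian priors. The objective of matrix factorization is then to maximize the posterior distribution over the user and item features, i.e., to learn the latent variables by minimizing
$$\mathcal{L} = \frac{1}{2} \sum_{m=1}^{M} \sum_{n=1}^{N} I_{mn} \left(\mathbf{R}_{m,n} - \mathbf{P}_m^T \mathbf{Q}_n\right)^2 + \frac{\lambda_P}{2} \|\mathbf{P}\|_F^2 + \frac{\lambda_Q}{2} \|\mathbf{Q}\|_F^2 \tag{2}$$
where $\lambda_P$ and $\lambda_Q$ weight the regularization terms that avoid overfitting of our model, and $\|\cdot\|_F^2$ is the squared Frobenius norm. We initialize $\mathbf{P}$ and $\mathbf{Q}$ randomly. Then we can perform the stochastic gradient descent technique [18] on $\mathbf{P}$ and $\mathbf{Q}$ to minimize the objective function given by (2). The update formulation is given as follows
$$\mathbf{P}_m \leftarrow \mathbf{P}_m - \gamma_1 \frac{\partial \mathcal{L}}{\partial \mathbf{P}_m}, \qquad \mathbf{Q}_n \leftarrow \mathbf{Q}_n - \gamma_2 \frac{\partial \mathcal{L}}{\partial \mathbf{Q}_n} \tag{3}$$
where $\gamma_1 > 0$ and $\gamma_2 > 0$ are learning rates. A probabilistic foundation for regularizing the learned variables is given in [57], and some recent recommendation approaches have adopted this form for item recommendation in social networks [38]-[40].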
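The basic matrix factorization procedure described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: a squared-error objective with Frobenius-norm regularization is used, the regularization gradient is applied per observed sample (a common simplification), and the names `pmf_sgd` and `objective` as well as all default hyperparameter values are hypothetical.

```python
import numpy as np

def objective(R, I, P, Q, lam_p, lam_q):
    """Regularized squared error over the observed entries (I == 1)."""
    E = I * (R - P.T @ Q)
    return 0.5 * np.sum(E ** 2) + 0.5 * lam_p * np.sum(P ** 2) + 0.5 * lam_q * np.sum(Q ** 2)

def pmf_sgd(R, I, K=2, lam_p=0.01, lam_q=0.01, gamma1=0.05, gamma2=0.05,
            epochs=300, seed=0):
    """Stochastic gradient descent on P (K x M) and Q (K x N), random init."""
    rng = np.random.default_rng(seed)
    M, N = R.shape
    P = 0.1 * rng.standard_normal((K, M))
    Q = 0.1 * rng.standard_normal((K, N))
    obs = np.argwhere(I > 0)                 # indices (m, n) of observed ratings
    for _ in range(epochs):
        rng.shuffle(obs)                     # visit observed ratings in random order
        for m, n in obs:
            err = R[m, n] - P[:, m] @ Q[:, n]
            grad_p = -err * Q[:, n] + lam_p * P[:, m]
            grad_q = -err * P[:, m] + lam_q * Q[:, n]
            P[:, m] -= gamma1 * grad_p       # learning rate gamma1 for users
            Q[:, n] -= gamma2 * grad_q       # learning rate gamma2 for items
    return P, Q
```

The predicted rating of item $n$ by user $m$ is then `P[:, m] @ Q[:, n]`. Since the result depends on the random initial `P` and `Q`, different seeds generally give different factors, which is the initialization dependence that motivates the pretraining in Section IV.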

IV. DEEP MATRIX FACTORIZATION FOR RECOMMENDATION SYSTEMS
As shown in Fig. 1(b), the matrix that we need to factorize is the user-item rating matrix, whose entries are assigned by the users (the corresponding column) to the items (the corresponding row). The two factorized matrices correspond to the users and items, and are called the latent user and item feature matrices, respectively. Learning models by matrix factorization is a mature approach for solving the recommendation problem when only part of the user-item rating matrix can be used. In general, $\mathbf{P} \in \mathbb{R}^{K \times M}$ and $\mathbf{Q} \in \mathbb{R}^{K \times N}$ are the latent user and item feature matrices, and the column vectors $\mathbf{P}_m \in \mathbb{R}^{K}$ and $\mathbf{Q}_n \in \mathbb{R}^{K}$ represent the user-specific and item-specific latent feature vectors, respectively. Hence, we divide the user-item rating matrix into two sub-matrices $\mathbf{P}$ and $\mathbf{Q}$ with the constraints of $K$-dimensional features:
$$\mathbf{R} \approx \mathbf{P}^T \mathbf{Q} \tag{4}$$
NMF requires that the factors $S = P^T$ (we define $S = P^T$ for the simplicity of the derivation in the following paragraphs) and $Q$ be non-negative. Semi-NMF, proposed in [58], extends the applicability of NMF. It is an NMF variant in which the user-item rating matrix $R$ and the first factor $S$ may contain both positive and negative entries, whereas the second factor $Q$ must be non-negative.

A. Linear Representations
In order to solve the recommendation problem, a Semi-NMF framework exploiting deep learning, named Deep Semi-NMF, is proposed, inspired by [57]. Based on the unsupervised learning pattern, the matrix can be factorized into multiple factors. The differences between the traditional Semi-NMF and the proposed Deep Semi-NMF frameworks are shown in Fig. 2. As shown in Fig. 2(a), there is a linear transformation between the new representation $Q$ and the original user-item rating matrix $R$ deriving from Semi-NMF. As demonstrated in Fig. 2(b), multiple hidden representations of the identical hierarchy are learned by Deep Semi-NMF, which uncovers the final low-dimensional representation of the original user-item rating matrix. The cost function can be written as
$$C_{deep} = \frac{1}{2} \left\| R - S_1 S_2 \cdots S_L Q_L \right\|_F^2 \tag{5}$$
where $L$ is the number of layers. In general, training a deep neural network takes a long time. The factor matrices $S_l$ and $Q_l$ in the proposed Deep Semi-NMF framework need to be approximated in an accelerated way, and thus we pre-train each layer of this neural network. Then the initial approximation of the factor matrices $S_l$ and $Q_l$ can be achieved. As reported in [59], pre-training deep autoencoder networks reduces the training time greatly. For example, the original user-item rating matrix can be decomposed into
$$R \approx S_1 Q_1$$
In a similar way, the feature matrix can be decomposed into
$$Q_1 \approx S_2 Q_2$$
Thus we pre-train all the layers. Afterwards, the weights of each layer, i.e., $S_l$ and $Q_l$, can be fine-tuned by exploiting alternating minimization, and then the reconstruction error of the model can be reduced dramatically.
1) Updating Step for Weight Matrix S: We fix the remaining weights for the $l$th layer and choose $S_l$ to minimize the cost function (5), i.e., the partial derivative with respect to $S_l$ is set to zero, $\partial C_{deep} / \partial S_l = 0$. Thus the update for $S_l$ can be expressed as
$$S_l = F^{\dagger} R \tilde{Q}_l^{\dagger}$$
where $F = S_1 S_2 \cdots S_{l-1}$, and $\tilde{Q}_l$ is the inference of the $l$th layer's feature matrix.
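The weight update can be illustrated with a short sketch. It assumes the closed-form least-squares solution $S_l = F^{\dagger} R \tilde{Q}_l^{\dagger}$, which follows from setting the derivative of $\frac{1}{2}\|R - F S_l \tilde{Q}_l\|_F^2$ to zero; the function name `update_weight`, the zero-based layer index, and the convention that the caller supplies $\tilde{Q}_l$ are assumptions made for the example.

```python
import numpy as np

def update_weight(R, S_list, Q_tilde, l):
    """Least-squares update of the l-th weight matrix (l is zero-based).

    S_list  : current weight matrices [S_1, ..., S_L]
    Q_tilde : current estimate of the l-th layer's feature matrix
    Returns pinv(F) @ R @ pinv(Q_tilde), with F the product of the weights
    that precede layer l (the identity for the first layer).
    """
    F = np.eye(R.shape[0])
    for S in S_list[:l]:
        F = F @ S
    return np.linalg.pinv(F) @ R @ np.linalg.pinv(Q_tilde)
```

When $F$ has full column rank and $\tilde{Q}_l$ has full row rank, the returned matrix makes the gradient $F^T (F S_l \tilde{Q}_l - R) \tilde{Q}_l^T$ vanish, which is the stationarity condition stated above.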

2) Updating Step for the Feature Matrix Q: Since the non-negativity of $Q_l$ needs to be enforced, the feature matrix $Q_l$ can be updated in a similar way as given in [58]. The feature matrix $Q_l$ can be formulated as (8).
We set $\eta$ to be a small number to prevent (8) from being zero. $A^{pos}$ denotes the matrix $A$ with its negative elements replaced by 0, and $A^{neg}$ denotes the matrix $A$ with its positive elements replaced by 0:
$$A^{pos}_{ij} = \frac{|A_{ij}| + A_{ij}}{2}, \qquad A^{neg}_{ij} = \frac{|A_{ij}| - A_{ij}}{2}$$
At first, the Semi-NMF algorithm [58] is used to approximate the factors greedily. Then the factors are fine-tuned until the convergence criterion is reached. In this paper, the maximum iteration number is fixed at 1000. If the difference between the previous update and the current update is smaller than a threshold of $10^{-6}$, we stop the iteration. Thus we can train a Deep Semi-NMF model as described above. The pseudo code of the suggested training algorithm is summarized in Algorithm 1.
Thus the linear representation of Deep Semi-NMF (LRDMF) can be written as
$$R \approx S_1 S_2 \cdots S_L Q_L$$
The Semi-NMF model is actually the special case of the Deep Semi-NMF model with a single layer. The cost function can be written as
$$C_{semi} = \frac{1}{2} \left\| R - S Q \right\|_F^2$$
subject to $Q \ge 0$. The pseudo code of the Semi-NMF model with a single layer is summarized in Algorithm 2.
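A single-layer Semi-NMF and the greedy layer-wise pretraining can be sketched as below. Several points are assumptions rather than the authors' algorithm: the multiplicative rule with a square root is the standard Semi-NMF update of [58] written for $R \approx SQ$; the constant `eta` is assumed to act as a small guard in the denominator, since the exact place where $\eta$ enters is not recoverable from the text; the fine-tuning stage is omitted; and all function names and defaults are hypothetical.

```python
import numpy as np

def pos(A):
    """Positive part: negative elements replaced by 0."""
    return (np.abs(A) + A) / 2.0

def neg(A):
    """Negative part (as magnitudes): positive elements replaced by 0."""
    return (np.abs(A) - A) / 2.0

def semi_nmf(R, K, iters=200, eta=1e-9, seed=0):
    """Approximate R (M x N) by S @ Q with Q >= 0; S is unconstrained."""
    rng = np.random.default_rng(seed)
    Q = rng.random((K, R.shape[1])) + 0.1          # non-negative initialization
    for _ in range(iters):
        S = R @ np.linalg.pinv(Q)                  # least-squares weight update
        A, B = S.T @ R, S.T @ S
        # Multiplicative update keeps Q non-negative (assumed form, see lead-in).
        Q = Q * np.sqrt((pos(A) + neg(B) @ Q) / (neg(A) + pos(B) @ Q + eta))
    S = R @ np.linalg.pinv(Q)
    return S, Q

def pretrain_deep_semi_nmf(R, layer_sizes, **kwargs):
    """Greedy pretraining: R ~ S_1 Q_1, then Q_1 ~ S_2 Q_2, and so on."""
    S_list, H = [], R
    for K in layer_sizes:
        S, H = semi_nmf(H, K, **kwargs)
        S_list.append(S)
    return S_list, H                               # H is the last feature matrix Q_L

def reconstruct(S_list, Q_L):
    """Linear reconstruction S_1 S_2 ... S_L Q_L."""
    out = Q_L
    for S in reversed(S_list):
        out = S @ out
    return out
```

For an 8 x 6 rating matrix, `pretrain_deep_semi_nmf(R, [4, 2])` returns weights of shapes 8 x 4 and 4 x 2 and a 2 x 6 non-negative feature matrix, which can then serve as the starting point for fine-tuning.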

B. Non-Linear Representations
According to neurophysiology, when a person looks at an image, the human visual system processes it automatically in a hierarchical and non-linear way. Specifically, the corresponding neurons in the brain process the complex image features sequentially [60]. The authors in [61] exploit an adaptable non-linear image representation algorithm to reduce the statistical and perceptual redundancy of the representation elements for image processing. Inspired by this way of image processing, we introduce the non-linear representation into the Deep Semi-NMF model for recommendation.
In the previous section, the original user-item rating matrix has been decomposed in a linear way. However, the non-linearity of the latent attributes is ignored in that model, and the linear representation cannot account for non-linear relationships efficiently. As a consequence, non-linear functions should be introduced between the layers in order to extract features for each latent attribute.
We utilize a non-linear function $g(\cdot)$ between every implicit representation $(Q_1, \ldots, Q_{L-1})$ for approximating the non-linear manifolds on which the user-item rating matrix $R$ lies [33]. In other words, the Deep Semi-NMF model has an enhanced explainability by using a non-linear squashing function. Therefore, we can reconstruct the original user-item rating matrix in an explicit way. This has been proved in [62] for multilayer feedforward network structures: if sufficiently many hidden units are provided, any function of interest can be approximated by arbitrary squashing functions to any desired accuracy. Deep Semi-NMF is just an instance of a multi-layer feedforward network.
It is straightforward to introduce non-linearity into the Deep Semi-NMF model, and thus the $l$th feature matrix $Q_l$ can be modified as The cost function of the model is rewritten as
Remark 1: It should be noted that the model (13) is more general than model (5). The feature vectors of users and items in model (5) are assumed to have zero-mean spherical Gaussian priors. However, the model (13) does not have this constraint.
By using the chain rule, the derivative with respect to the $l$th feature layer can be described as Therefore, the derivative with respect to the first feature layer $Q_1$ is concordant with the model of one layer Similarly, the derivative with respect to the weight matrices $S_l$ can be expressed as
Algorithm 2: Semi-NMF with a single layer. Input: $R \in \mathbb{R}^{M \times N}$, the number of components $K$. Output: weight matrix $S \in \mathbb{R}^{M \times K}$ and feature matrix $Q$.
By using these derivatives, the cost function can be minimized with respect to the weights of the model by gradient descent optimization using Nesterov's optimal gradient [63]. Based on the non-linear representation of Deep Semi-NMF (NLRDMF), the original user-item rating matrix can be written as

C. Stochastic Optimization
Unfortunately, Semi-NMF and NMF are difficult to compute for large datasets, since the computational complexity of these algorithms grows quadratically with the number of items $n$. In addition, the whole training dataset is required to reside in main memory. In recent years, stochastic optimization techniques have been proposed to mitigate these two problems. In each iteration, only a small portion of the dataset is processed. Thus several iterations are required to process the whole dataset, and this method is known as mini-batch [64]. The number of mini-batches is set to be $H$. The cost function of the stochastic Deep Semi-NMF can be expressed as subject to $\forall h$, $Q_h \geq 0$, where $R_{[h]}$ is a subset of the training set, i.e., a small batch of the training set. $R_{[h]}$ contains $b = n/H$ examples. Stochastic optimization techniques such as Adam and SGD [65] are adopted for updating all the parameters in the Deep Semi-NMF model. This is an approximation of the optimization over the whole training set, but the stochastic optimization techniques remain effective even for small batch sizes (32 samples).
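The partition of the $n$ items into $H$ mini-batches of about $b = n/H$ examples can be sketched as follows. This is only an illustration of the batching step: the helper name `mini_batches` and the choice of consecutive, unshuffled index ranges are assumptions, and the per-batch parameter update is left out.

```python
def mini_batches(n, H):
    """Split item indices 0..n-1 into at most H consecutive batches of about n/H items."""
    b = -(-n // H)  # ceil(n / H), the batch size
    return [list(range(start, min(start + b, n))) for start in range(0, n, b)]

# Example: 64 items in H = 2 mini-batches gives a batch size of 32.
batches = mini_batches(64, 2)
print([len(batch) for batch in batches])
```

Each batch of indices would select the columns of $R$ that form $R_{[h]}$ for one stochastic update.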

V. SOCIAL TRUST ENSEMBLE
Social trust networks affect users' strategic decisions when they select items. In this section, we analyze this phenomenon and propose an extended Deep Semi-NMF model based on users' trusted friends. First, we integrate the trust degree to improve the recommendation accuracy. Then, we exploit the autoencoder to extract the latent representation in the trust relationship matrix $T \in \mathbb{R}^{M \times M}$ to further enhance the recommendation accuracy.

A. Trust Degree Ensemble
As mentioned in Section III, a trust relationship matrix $T \in \mathbb{R}^{M \times M}$ is used for representing the trust values among users. Users tend to allocate scores to their trusted friends in most social networks, and these scores correspond to the trust degrees between user-friend pairs. However, there are no explicit trust values measuring the trust degree in existing online social networks. Consequently, it is important to construct a model that measures the trust degree in social networks without trust values.
Generally speaking, most users tend to trust their friends, as well as the recommendations provided by their friends. However, target users may not be well satisfied with the recommendations from their trusted friends, since differences exist between them, including their potential interests, preferences or habits. In this case, we take the trust degree into consideration, which contains the similarity based and the social based trust degrees. The trust degree $trust(v_i, v_j)$ assigned to $v_i$ from $v_j$ is calculated as where $T^I_{v_i, v_j}$ and $T^O_{v_i, v_j}$ stand for the similarity based and the social based trust degrees, respectively, and $b$ is a weight coefficient.
The social behavior in a social network is usually modeled as the trust relationship, such as the followers of a person and the message forwarding in microblogs. Thus the similarity based trust degree $T^I_{v_i, v_j}$ can be modelled as where $sim(v_i, v_j)$ is the similarity between user $v_i$ and user $v_j$, which is calculated via the cosine similarity between their corresponding vectors $P_{v_i}$ and $P_{v_j}$. $trust(v_i, v_j)$ is the trust value assigned to $v_j$ from $v_i$. If the trust value is given, we set $trust(v_i, v_j) = 1$, i.e., it represents the existence of a social relationship from $v_i$ to $v_j$. Otherwise, $trust(v_i, v_j) = 0$. $S_{v_i}$ is the set of users connected by $v_i$. The social similarity based trust degree between two nodes measures the effect of local network topologies. When two nodes in a social network have two or more overlapping neighbours, they tend to have community similarity, which is a higher level of node similarity. The adjacent node sets of nodes $v_i$ and $v_j$ are defined as $N(v_i)$ and $N(v_j)$, respectively. The social similarity based trust degree is defined as where $D(t)$ represents the degree of node $t$.
Remark 2: It should be noted that (21) and (22) are only applicable to the scenario of two users who connect with each other directly. For two users who do not connect directly, the trust degree is calculated by using multiplication as the trust propagation operator. In addition, we only consider the shortest path if multiple trust propagation paths exist in a network.
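The propagation rule for users who are not directly connected can be sketched as below: direct trust degrees are multiplied along the shortest path between the two users. The function name `propagated_trust`, the dictionary representation of the trust network, the tie-breaking between equally short paths (the first one found by breadth-first search) and the value 0 for unreachable users are all assumptions made for illustration.

```python
from collections import deque

def propagated_trust(trust, src, dst):
    """Multiply direct trust degrees along a shortest (fewest-hop) path from src to dst.

    trust maps directed pairs (u, v) to a direct trust degree in [0, 1].
    """
    if src == dst:
        return 1.0
    adjacency = {}
    for (u, v), degree in trust.items():
        adjacency.setdefault(u, []).append((v, degree))
    product = {src: 1.0}            # trust product along the BFS path to each user
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v, degree in adjacency.get(u, []):
            if v not in product:    # first visit corresponds to the fewest hops
                product[v] = product[u] * degree
                if v == dst:
                    return product[v]
                queue.append(v)
    return 0.0                      # no propagation path (assumed to mean no trust)

# Hypothetical network: the two-hop path a -> b -> c is preferred over a -> d -> e -> c.
network = {("a", "b"): 0.8, ("b", "c"): 0.5,
           ("a", "d"): 0.9, ("d", "e"): 0.9, ("e", "c"): 0.9}
print(propagated_trust(network, "a", "c"))
```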
We integrate the trust degree into the Deep Semi-NMF model. In the representation of Deep Semi-NMF approaches, the user's preference is the only factor which determines the estimated rating assigned to an item, and it can be described as $\hat{R}_{m,n} = P_m^T Q_n$. The recommendations from the target users' trusted friends should also be taken into consideration. Additionally, we distinguish the favors of user $m$'s trusted friends from those of user $m$, instead of using the favors of user $m$ directly. Namely, after obtaining user $m$'s favors on item $n$, i.e., $P_m^T Q_n$, we adjust it by the biases from his/her trusted friends' favors. As a consequence, the difference of the estimated ratings between him/her and his/her trusted friends is expressed as where $D_{m,v}$ indicates the average bias on the rating between $m$ and $v$. For example, there are two users $m$ and $v$, and they have different rating assignment behaviors. User $m$ is generous and usually assigns high scores to the items, while user $v$ is critical and usually assigns low scores to the items. We assume the ratings for item $n$ from these two users are 5 and 3, respectively. They have different preferences on item $n$ at first glance. However, if the deviations of the ratings and their rating assignment behaviors are analyzed, the score 3 is almost the highest score allocated by $v$ in his/her rating assignment history. From this perspective, we can conclude that both $m$ and $v$ have preferences for item $n$.
Additionally, we need to consider the biases of users on items. For instance, suppose we predict the rating assigned to item $n$ by user $m$, and we know that the average rating over all items assigned by $m$ is 3, while $m$ assigns the rating 2.5 to item $n$, which is 0.5 lower than the average rating. Moreover, user $m$ is critical and usually rates 0.4 lower than the mean. Finally, the estimated rating of $n$ from $m$ would be 2.1 (2.5 - 0.4). Therefore, we extend (23) with the biases of users and items as follows where the parameters $bias_m$ and $bias_n$ indicate the biases of user $m$ and item $n$, respectively. Thus based on (24), the cost function for LRDMF integrated with the trust degree (LRDMF-TD) can be rewritten as where $\hat{R}^{LR}_{mn}$ denotes the estimated rating on item $n$ assigned by user $m$ via LRDMF-TD. In order to prevent over-fitting and reduce the complexity of the model, $\lambda_P = \lambda_Q$ is set in our experiments. It can be seen that all users are considered in the model for minimizing the whole difference when the social trust relationships are considered. The global minimum of $L$ cannot be achieved because of the inherent inner structure [66] of the matrix factorization model. Fortunately, based on the gradient descent on $P_m$ and $Q_n$ for user $m$ and item $n$, the local minimum of the cost function can be expressed as The cost function for NLRDMF integrated with the trust degree (NLRDMF-TD) can be rewritten as where $\hat{R}^{NLR}_{mn}$ denotes the estimated rating on item $n$ assigned by user $m$ by NLRDMF-TD. The derivation of the gradient descent on $P_m$ and $Q_n$ for NLRDMF-TD is similar to (26) and (27).

B. Autoencoder Ensemble
In this subsection, we exploit a variant of the autoencoder, the denoising autoencoder (DAE), to extract the latent representation in the hidden layer of this network. The latent representation of the social relationship can be enhanced from the trust relationship matrix $T \in \mathbb{R}^{M \times M}$ based on DAE. Then the latent representation of the users can be approximated as closely as possible by the learned latent representation based on DAE.
1) Marginalized Denoising Autoencoders: An autoencoder is a kind of neural network with a hidden layer in its interior that, after training, attempts to copy its input to its output. A basic autoencoder consists of two components. An encoder is expressed as an activation function $d(\cdot)$ mapping the input data $T$ into a hidden layer, and the representation of this hidden layer is $d(T)$. A decoder is expressed as a deactivation function $f(\cdot)$ mapping the hidden representation back into the output, and the representation of this reconstructed version of $T$ is $f(d(T))$. In order to learn the most significant features of the data distribution from the input trust relationship matrix $T$, the activation and deactivation functions, such as the sigmoid and identity functions, should be selected properly. In this section, the identity and the sigmoid functions are selected as the activation and deactivation functions, respectively.
A DAE is a variant of the autoencoder whose input is the corrupted data and whose output, after training, is the original, uncorrupted data. Randomly generated artificial noise, such as binary masking noise or Gaussian noise, can be injected into the input data with a probability $p$. From a deep learning perspective, multiple DAEs can be stacked sequentially [26]. The input of the $l$th DAE is the output of the $(l-1)$th DAE, which performs as a hidden representation. Therefore, the DAEs contained in the stacked DAEs (SDAE) have to be trained in an iterative way. This layer-by-layer operation has a high computational burden due to the learning of the model parameters in each layer [56]. In order to overcome the defects of SDAE, a modified version, SDAE with marginalized corruption, is proposed in [26]. In contrast to the two-level encoder and decoder in SDAE, we utilize a weight matrix $W$ to map the corrupted input $\tilde{T}_i$ into the reconstructed output by minimizing the squared loss function $\frac{1}{2I} \sum_{i=1}^{I} \| T_i - W \tilde{T}_i \|^2$, where $I$ is the number of samples in the input data, $W \in \mathbb{R}^{M \times M}$ is the mapping consisting of the reconstruction weights, and $\tilde{T}_i$ is the corrupted version of the original, uncorrupted $T_i$. $c$ passes with different corruptions are made over the input data to reduce the variance. Then the $c$-times repeated version of $T$ can be represented as $\bar{T} = [T, T, \ldots, T] \in \mathbb{R}^{M \times cM}$. The corrupted version of $\bar{T}$ can be defined as $\tilde{T} \in \mathbb{R}^{M \times cM}$. We can rewrite (31) as When $c$ approaches positive infinity, there are infinitely many copies of the corrupted input data. The mapping weight matrix $W$ has the following closed-form solution $W = E[U] E[V]^{-1}$, where $U = \bar{T} \tilde{T}^T$ and $V = \tilde{T} \tilde{T}^T$. Thus we do not have to solve a highly non-convex optimization problem with an iterative procedure to learn the model parameters of each layer. Because the weight matrix $W$ is obtained in closed form, the computational complexity of training a marginalized DAE is reduced significantly.
From a deep learning perspective, multiple marginalized DAEs can be stacked sequentially [26]. The input of the $l$th marginalized DAE is the output of the $(l-1)$th marginalized DAE, which performs as a hidden representation.
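As a sketch of why the marginalized form avoids iterative training, the closed-form solution $W = E[U]E[V]^{-1}$ can be evaluated directly when the corruption is binary masking noise with probability $p$. The expectations below follow the standard marginalized DAE derivation for masking noise, which is assumed here rather than taken from this paper; the function name `marginalized_dae_weights`, the small `ridge` term and the treatment of the columns of $T$ as samples are illustrative assumptions.

```python
import numpy as np

def marginalized_dae_weights(T, p, ridge=1e-5):
    """Closed-form W = E[U] E[V]^{-1} under masking noise with probability p.

    T is an M x I matrix whose columns are the uncorrupted samples (assumption).
    """
    M = T.shape[0]
    q = np.full(M, 1.0 - p)                 # probability that a feature survives
    G = T @ T.T                             # scatter matrix of the clean input
    EU = G * q[np.newaxis, :]               # E[T T~^T]: column j is scaled by q_j
    EV = G * np.outer(q, q)                 # E[T~ T~^T] for off-diagonal entries
    np.fill_diagonal(EV, q * np.diag(G))    # diagonal entries are scaled by q_i only
    EV = EV + ridge * np.eye(M)             # small ridge for numerical stability
    # Solve W EV = EU, i.e. EV^T W^T = EU^T, without forming an explicit inverse.
    return np.linalg.solve(EV.T, EU.T).T
```

With $p = 0$ there is no corruption, and the solution reduces to (approximately) the identity mapping, which is a convenient sanity check.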
2) Latent Representation of Trust Relationship Considering User Preferences: In order to integrate the social relationship with the user preferences, we propose a framework that integrates the Deep Semi-NMF with the marginalized SDAE to further improve the recommendation accuracy. The framework is shown in Fig. 3. The trusted friends of user $m$ can be represented as $T_m \in \mathbb{R}^M$, an $M$-dimensional binary vector corresponding to the $m$th row of $T$. In order to compute the corrupted version $\tilde{T} \in \mathbb{R}^{M \times cM}$ of the trust relationship matrix $\bar{T} \in \mathbb{R}^{M \times cM}$, the generated binary masking noise is injected with a probability $p$ based on the deep learning strategy of DAE [56]. Generally, $T$ is sparse, and only the non-zero values of $T$ representing the trust relationship are corrupted by the artificial noise.
We take both the weight matrix $W$ and the user factor matrix $S$ into consideration for improving the recommendation accuracy. The latent representation of the trust relationship, which should be as close as possible to the user factor matrix $S \in \mathbb{R}^{M \times K}$, can be enhanced from the trust relationship matrix $T \in \mathbb{R}^{M \times M}$ based on the DAE. The orange double arrow indicates the coupling of the trust relationship represented by $W$ and the user preference represented by $S$.
3) Latent Representation of Trust Relationship: We map an $M$-dimensional binary vector $T_m \in \mathbb{R}^M$ into a latent space represented as $T'_m = S^T T_m \in \mathbb{R}^K$, $m = 1, 2, \ldots, M$, where $S \in \mathbb{R}^{M \times K}$ is the user factor matrix, and $K$ is the number of latent factors. Different from the similarity measurement in (21), the user similarity can be characterized by the inner product. Thus the similarity in the latent space can be expressed as the inner product of the latent representations of two users in the users' latent space In order to integrate the weight matrix $W$ and the user factor matrix $S$, (34) can be formulated in a matrix form as where $W \in \mathbb{R}^{M \times M}$ is the weight matrix mapping the trust relationship matrix $T$ into the hidden layer, and the latent representation of the trust relationship matrix $T$ in the hidden layer corresponds to the product $WT \in \mathbb{R}^{M \times M}$. In order to learn the weight matrix $W$ based on the corrupted version $\tilde{T} \in \mathbb{R}^{M \times cM}$, we minimize the objective function as where the $c$-times repeated version of $T$ is expressed as $\bar{T} \in \mathbb{R}^{M \times cM}$, and the corrupted version of $\bar{T}$ is expressed as $\tilde{T} \in \mathbb{R}^{M \times cM}$.
4) Community Regularization: Intuitively, a community refers to a dense group in a network. The nodes within each community are closely connected, but the connections among different communities are sparse. In social networks, the users who share the same opinions or interests tend to form a community [67]. This means that the opinion of one user can be affected by the opinions of other users in the same community. Thus we need to integrate the community effect into the objective function to improve the recommendation performance. We first introduce some trust based parameters used in the community detection algorithm.
Trust Potential. As we know, the trust degree of two users in a social network decreases as the distance between the two users increases. Thus we need to define a parameter to measure the inter-node trust degree objectively. Given a social network $G(V, E)$, user $v_i \in V$ is randomly selected from this network. We adopt $U(v_i) = \{v_1, v_2, \ldots, v_n\}$ to denote the users closely connected with $v_i$ in this network. The trust potential of user $v_i$ at user $v_j$ is defined by the Gaussian potential function as [68] $p_{v_i, v_j} = \exp(\cdots)$, where the user interaction range is controlled by the parameter $s$, and $s$ is determined by the network details. The trust potential for user $v_i$ is expressed as
Local High-Potential User. A community usually consists of a cluster center and its neighbours. In order to detect a community in a social network, we need to identify the high-potential users in the local network as the initial cluster centers. Let the adjacent users of user $v$ be $N(v) = \{u_1, u_2, \ldots, u_n\}$. User $v$ is a local high-potential user if it satisfies $p(v) \geq \max\{p(v, u_1), p(v, u_2), \ldots, p(v, u_n)\}$.
Trust Condensation. We define the trust condensation to identify whether a cluster has a good structure. Given a cluster $C_i$ and its center node $u_i \in C_i$, the trust condensation of $C_i$ is defined as [68] $CT(C_i, u_i) =$ where $\underline{C_i}$ and $\overline{C_i}$ stand for the lower approximation set and the upper approximation set of cluster $C_i$, respectively. The weights for the lower approximation set and the upper approximation set of cluster $C_i$ are defined as $w_{low}$ and $w_{up}$, respectively, with $w_{low} + w_{up} = 1$. The trust potential of center user $u_i$ on user $v_i$ is denoted as $p(u_i, v_i)$.
For all $v_i \in V$, the potential difference $s$ in $C_i$ and $C_l$ is defined as $s = p(v_i, C_l) - p(v_i, C_j)$. If $s \leq b$, i.e., the potentials of $v_i$ in two clusters are similar, we assign $v_i$ to the upper approximation set of the intersection of $C_i$ and $C_l$; otherwise, to the lower approximation set of $C_l$. Based on the definition of trust condensation, we can update the cluster as
Overlapping Clusters. Given two clusters $C_i$ and $C_j$, their overlapping clustering degree is defined as [68] $Over(C_i, C_j) =$ where $\min(|C_i|, |C_j|)$ gives the size of the smaller cluster of $C_i$ and $C_j$. The range of $Over(C_i, C_j)$ falls into [0, 1]. The overlapping community detection algorithm considering the trust-based characteristics can be summarized as follows.
1) The trust degree between different users and the trust potential of each user in a social network are computed via (20) and (38), respectively.
2) The high trust potentials of the users in a social network can be identified via (39).
3) According to the trust potentials, the users in the network can be classified and placed into the clustering upper approximation and the clustering lower approximation sets, respectively, by exploiting K-medoids clustering. The clustering center can be updated based on (41) after computing the clustering upper approximation and the clustering lower approximation sets. The classification terminates when the clustering centers reach a stable state.
4) Different clusters can be merged if most of the users in different clusters are overlapping.
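The merging of overlapping clusters in the last step can be sketched as follows. The sketch assumes that the overlapping degree is the size of the intersection divided by the size of the smaller cluster, which is consistent with the stated denominator and the range [0, 1] but is not confirmed by the text; the function names and the use of a greater-than-or-equal comparison against the threshold are also assumptions.

```python
def overlap_degree(ci, cj):
    """Assumed overlapping degree: |ci & cj| / min(|ci|, |cj|), a value in [0, 1]."""
    return len(ci & cj) / min(len(ci), len(cj))

def merge_overlapping(clusters, threshold=0.75):
    """Repeatedly merge any two non-empty clusters whose overlap reaches the threshold."""
    clusters = [set(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if overlap_degree(clusters[i], clusters[j]) >= threshold:
                    clusters[i] |= clusters.pop(j)   # absorb cluster j into cluster i
                    merged = True
                    break
            if merged:
                break
    return clusters

# Hypothetical clusters: the first two share 3 of 3 users and are merged.
print(merge_overlapping([{1, 2, 3, 4}, {2, 3, 4}, {7, 8}]))
```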
As we know, the users in the same community tend to share similar preferences on items with their trusted friends, who are usually regarded as the neighbours of the target user. We can utilize the meaningful information of these neighbours to improve the prediction accuracy. The neighbours of user $u$ can be defined as $N(u) = \{v \mid v \in C \wedge u \in C, u \neq v\}$, where $C$ is the community that contains user $u$. It should be noted that multiple communities can contain user $u$, and thus all the communities containing $u$ should be taken into consideration. The behavior of the given user $u$ would be affected by his/her neighbours $N(u)$ because of the community effect. This means that the behavior difference between the given user and his/her neighbours should be minor. This phenomenon can be expressed in a mathematical form by minimizing the following formulation: The above equation can be utilized to minimize the difference between the preference of a user and the average preference of his/her neighbourhood. It means that a user's preference should be similar to the general preferences of all neighbours $N(u)$.
5) Parameters Training: By integrating the matrix factorization technique and the community effect with (36), we have the joint objective function as follows where $m$ is the regularization parameter, and the coefficients of the Frobenius norms of $S$ and $Q$ are utilized for controlling the overfitting problem of the model parameters. The third term couples the trust relationship with the user preferences as mentioned in the former subsection.
Since the optimization function in (45) is a non-convex problem involving the matrices $W$, $S$ and $Q$, we utilize a suboptimal strategy to solve this problem. In each iteration, we fix two variables and update the third one as an alternating suboptimal strategy.
By discarding the terms of (45) that are irrelevant to $W$, we can reformulate the objective function by only considering $W$ and fixing $S$ and $Q$, which leads to the following optimization problem According to (33), $W$ has the closed-form solution $W = E[U] E[V]^{-1}$, where $U \in \mathbb{R}^{M \times M}$ and $V \in \mathbb{R}^{M \times M}$. They can be updated using the equations as follows Then, by discarding the terms of (45) that are irrelevant to the matrices $S$ and $Q$, the objective function (45) can be rewritten as By taking the partial derivatives of (48) with respect to the matrices $S$ and $Q$, we have Then, we can update the model parameters $S$ and $Q$ based on the classical gradient descent method, which is expressed as where $t$ stands for the $t$th iteration, and $h_1$ and $h_2$ stand for the learning rates. The maximum number of iterations is fixed at 1000. The termination rule is that the difference between two adjacent iterations satisfies $|L^{(t+1)} - L^{(t)}| / L^{(t)} \leq 10^{-5}$, where $L^{(t)}$ is the value of (48) in the $t$th iteration.
6) DMF-Based Initialization for Marginalized DAEs: From a deep learning perspective, multiple marginalized DAEs as shown in Fig. 3 can be stacked sequentially, which is called Deep-MDAEs. The input of the $l$th marginalized DAE is the output of the $(l-1)$th marginalized DAE, which performs as a hidden representation. If there are $L$ layers in the Deep-MDAEs, the deepest layer will be the $(L+1)/2$th layer. In different hidden layers, we have different latent representations for the trust relationships. Thus the latent representations of the deepest layer in the Deep-MDAEs should be as close as possible to the user factor matrix $S$. As shown in (33), the weight matrix has the closed-form expression $W = E[U] E[V]^{-1}$, which is updated based on (47). However, in this paper, we update $U$ by using $U = \bar{T} \tilde{T}^T$ until we reach the $(L+1)/2$th layer, i.e., the deepest layer. In the $(L+1)/2$th layer, we update $U$ by using the form in (47), which adds the coupling term in $S S^T$ to $\bar{T} \tilde{T}^T$, where $S$ is updated based on the LRDMF and NLRDMF methods in (10) and (18), respectively.
Then, the model parameters $S$ and $Q$ can be updated based on (50). The final recommendation matrix can be calculated by $\hat{R} = \hat{S} Q$. The Deep-MDAEs initialized by LRDMF and NLRDMF without the community effect are called LRDMF-DMDAE and NLRDMF-DMDAE, respectively. The Deep-MDAE initialized by NLRDMF with the community effect is called NLRDMF-DMDAECE.
The steps of DMF-DMDAECE are summarized as follows:
1) We perform the deep matrix factorization for the known part of the user-item rating matrix $R$ to obtain better initializations of the latent user and item feature matrices;
2) We calculate the trust degree based on (20), including the similarity based and the social based trust degrees;
3) The DMF-TD algorithms can be achieved by optimizing (25) and (28);
4) We perform the overlapping community detection algorithm to detect the communities in a trust relationship network;
5) We formulate a new joint objective function (45) by taking the trust information and the community effect into consideration;
6) We construct the marginalized DAEs to optimize the objective function (45);
7) We utilize the results of the deep matrix factorization to initialize the marginalized DAEs;
8) We obtain the final result by training the marginalized DAEs.

VI. EXPERIMENTS
In the field of trust-aware recommendation, publicly available and suitable datasets are rare. We mainly adopt the following two datasets:
1) Epinions: Epinions is freely available [42] and is composed of 49 290 users and 139 738 items. The numbers of ratings and trust relationships contained in Epinions are 664 824 and 487 181, respectively. The rating scale is from 1 to 5. We build a social trust network by using these records. Each user in Epinions keeps a trust relationship with others. In addition, the density of the user-item rating matrix is less than 0.01%.
2) Flixster: This is a social network allowing users to assign scores for movies [39]. It consists of 1 049 445 users who have rated 492 359 different items. The total number of ratings is 8 238 597. The total number of trust relationships is 26 771 123. The density of the rating matrix is lower than 0.0016%.
The rating matrices extracted from Epinions and Flixster are both sparse. By comparison, the density of Movielens, which consists of 6040 users, 3900 movies, and 1 000 209 ratings, is 4.25%, and the density of Eachmovie, which consists of 74 424 users, 1648 movies, and 2 811 983 ratings, is 2.29% [38]. Therefore, Epinions and Flixster are both ideal sources for our trust-aware recommendations.
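The quoted densities can be checked directly from the dataset sizes, taking the density as the number of ratings divided by the number of user-item pairs. The short sketch below uses only the figures stated above; the helper name `density_percent` is illustrative.

```python
def density_percent(num_ratings, num_users, num_items):
    """Share of user-item pairs that carry a rating, in percent."""
    return 100.0 * num_ratings / (num_users * num_items)

# (ratings, users, items) as reported in the text
datasets = {
    "Epinions": (664_824, 49_290, 139_738),
    "Flixster": (8_238_597, 1_049_445, 492_359),
    "Movielens": (1_000_209, 6_040, 3_900),
    "Eachmovie": (2_811_983, 74_424, 1_648),
}

for name, (ratings, users, items) in datasets.items():
    print(f"{name}: {density_percent(ratings, users, items):.4f}%")
```

The two trust-aware datasets come out two to three orders of magnitude sparser than Movielens and Eachmovie, which matches the bounds of 0.01% and 0.0016% given above.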
We use three metrics, the Root Mean Square Error (RMSE), the precision and the F-Measure, to measure the performance of our proposed methods, i.e., LRDMF-TD, NLRDMF-TD, LRDMF-DMDAE, NLRDMF-DMDAE and NLRDMF-DMDAECE, in comparison with other state-of-the-art recommendation methods. The metric RMSE for measuring the error in recommendation is defined as $RMSE = \sqrt{\sum_{(m,n)} (R_{m,n} - \hat{R}_{m,n})^2 / T_{test}}$, where $R_{m,n}$ denotes the rating assigned to item $n$ by user $m$, $\hat{R}_{m,n}$ denotes the predicted rating assigned to item $n$ by user $m$ via a method, and $T_{test}$ denotes the number of tested ratings. Meanwhile, most recommendation approaches cannot predict all the ratings in the test data when the data are highly sparse. Therefore, the metric coverage rate can be adopted to measure the proportion of $\langle user, item \rangle$ pairs whose values can be predicted, $coverage = S / T_{test}$, where $S$ represents the number of ratings being predicted, and $T_{test}$ represents the number of ratings being tested. Moreover, we integrate RMSE and coverage to form a full metric following the F-Measure's example. Therefore, the RMSE has to be converted into the metric of precision, whose value is distributed in the range of [0, 1]. We formulate the precision as $precision = 1 - RMSE / 4$. It can be inferred from this equation that the maximum possible error is 4, since all rating values are between 1 and 5. The definition of F-Measure is given as $F\text{-}Measure = 2 \times precision \times coverage / (precision + coverage)$.
The initializations of most existing matrix factorization methods are straightforward, i.e., $P$ and $Q$ are initialized as dense matrices consisting of random numbers. We propose specific initialization methods in this paper and compare them with random initialization, K-means initialization, normalized-cut (Ncut) initialization [69], SVD-based initialization [70] and autoencoder initialization [44] in order to verify the superiority of initialization with the LRDMF and NLRDMF based approaches. Moreover, we remove the community detection algorithm in [44] for a fair comparison, and the resulting method is called Auto-TD.
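The evaluation metrics can be sketched in a few lines. The sketch assumes that the precision is obtained by scaling the RMSE with the maximum possible error of 4 and that the F-Measure is the harmonic mean of precision and coverage, following the usual F-Measure form; the function names are illustrative, and the RMSE is taken over the pairs for which a prediction is available.

```python
import math

def rmse(actual, predicted):
    """Root mean square error over paired lists of true and predicted ratings."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

def precision(rmse_value, max_error=4.0):
    """Map RMSE into [0, 1]; ratings lie in 1..5, so the maximum error is 4."""
    return 1.0 - rmse_value / max_error

def coverage(num_predicted, num_tested):
    """Proportion of tested <user, item> pairs for which a rating can be predicted."""
    return num_predicted / num_tested

def f_measure(prec, cov):
    """Harmonic mean of precision and coverage (assumed F-Measure form)."""
    return 2.0 * prec * cov / (prec + cov)

# Hypothetical example: three test ratings and their predictions.
error = rmse([5, 3, 4], [4, 3, 2])
print(error, precision(error), f_measure(precision(error), coverage(3, 3)))
```

A method with a low RMSE but low coverage is penalized by the F-Measure, which is the reason for combining the two quantities.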
The dimension of the feature matrices is set to be $K = 80$. The DMF model consists of two representation layers (1260-625), and the scaled hyperbolic tangent $stanh(x) = a \tanh(bx)$ with $a = 0.7159$ and $b = 2/3$ is used as the non-linearity function. For the DMDAE based methods, the regularization parameter is 0.1, and the number of stacked MDAEs is 10. For community detection, we use 2-trust-cliques. For the Epinions and Flixster datasets, the regularization parameters are set as $m = 10$ and $m = 5$, respectively. We set the parameter $s$ to 1.886, the clustering overlapping threshold to 0.75, $w_{up}$ to 0.1 and the weight parameter $b = 0.6$. The percentage of the input data corrupted by the binary masking noise is 50%.
It can be seen from Fig. 4 that the RMSEs of LRDMF and NLRDMF are much smaller than the RMSEs of the other approaches. In particular, the RMSEs of the DMF based approaches are smaller than those of the Auto-TD approach proposed in [44]. This is because LRDMF and NLRDMF extract more abstract features from the original space compared with the other approaches, and the initialized latent feature matrices learned by LRDMF and NLRDMF bring (25) and (28) closer to the global minimum. The RMSE of NLRDMF is smaller than that of LRDMF because of the non-linearity learned by NLRDMF. The RMSEs of the trust DAE based methods are smaller than those of the trust degree based methods. This is because the deep structure of the DAE based methods enforces the latent representation of the trust relationship to be as close as possible to the user factor matrix $S$. NLRDMF-DMDAECE performs best of all since we take the community effect into consideration.
In order to find the best dimension $K$, the RMSEs versus various dimensions for NLRDMF-TD and NLRDMF-DMDAE are depicted in Fig. 5. $\lambda_P$ and $\lambda_Q$ are fixed at 0.1. It can be seen that there is a turning point around 80 for Epinions, and there is a turning point around 70 for Flixster. The main reason is that a relatively larger dimension can improve the prediction accuracy. However, when the dimension is too large, overfitting may occur, which degrades the prediction accuracy.
In order to give an intuitive picture of the relationship between the value of $\lambda_P = \lambda_Q$ and the best dimension $K$, we plot the results for $\lambda_P = \lambda_Q = 0.2$ and $0.1$ over varying dimension $K$. It can be seen from Fig. 5 that when $\lambda_P = \lambda_Q = 0.1$, our proposed framework NLRDMF-DMDAECE performs best of all.
In our model, the LRDMF and NLRDMF approaches are used to reduce the dimension and extract features from the user-item rating matrix. We compare these two approaches not only with classical algorithms such as principal component analysis (PCA) [71] and locally linear embedding (LLE) [72] but also with other NMF variants such as multi-layer NMF [73] and NeNMF [74] for pretraining in order to validate the effectiveness of feature extraction. It can be seen from Fig. 6 that compared with the classical approaches, the four NMF variants can improve the prediction accuracy greatly. The RMSEs of the LRDMF-TD and NLRDMF-TD approaches are smaller than those of the multi-layer NMF and NeNMF approaches. This is because the LRDMF-TD and NLRDMF-TD approaches extract better features compared with the multi-layer NMF and NeNMF approaches. The RMSEs of the LRDMF-DMDAE and NLRDMF-DMDAE methods are smaller than those of the LRDMF-TD and NLRDMF-TD methods. This is because the deep structure of the LRDMF-DMDAE and NLRDMF-DMDAE methods enforces the latent representation of the trust relationship to be as close as possible to the user factor matrix $S$. More latent representations in the hidden layers have been learned compared with the trust degree. The RMSE of NLRDMF-DMDAECE is the smallest of all because it takes both the trust information and the community effect into consideration.
We compare our proposed approach with the following baseline methods to show the superiority of our proposed methods. It should be noted that the NLRDMF approach is used for the initialization of DMFTrust since it has a better prediction accuracy than LRDMF.
1) UserCF: This is a typical user-based collaborative filtering method, which utilizes the users' similarity for predicting missing values.
2) ItemCF: This is a typical item-based collaborative filtering method, which captures the items' similarity for predicting missing values.
3) TidalTrust: A trust inference algorithm is used for recommendation [41].
4) MoleTrust: This algorithm can promote trust in social networks, and the trust weight corresponds to the similarity weight [42].
5) BMF: This is the basic matrix factorization method proposed in [17], and the trust social network is not considered.
6) STE: This method combines users' preferences with their trusted friends' favors [38].
7) SocialMF: This method fuses the trust propagation in recommendation systems [39]. The parameter is set to be 5, which provides the best recommendation performance in this experiment.
8) CDL: This is a deep learning-based method proposed in [29]. However, the content information and the trust network are not used.
9) NLRDMF: The Non-Linear Representation DMF part of NLRDMF-TD is considered, while the Trust Degree (TD) is not considered.
10) TD: The Trust Degree (TD) part of NLRDMF-TD is considered, while the Non-Linear Representation DMF is not considered.
11) NLRDMF-TD: The proposed NLRDMF-TD method that considers both the Non-Linear Representation DMF and the Trust Degree (TD).
14) NLRDMF-DMDAECE: The proposed NLRDMF-DMDAECE method that considers the community effect.
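To make the simplest of these baselines concrete, the following is a minimal sketch of a generic user-based collaborative filtering predictor of the kind UserCF refers to. It is not the exact implementation used in the experiments: the cosine similarity over co-rated items, the mean-centred weighted average, the convention that 0 marks a missing rating, and the name `predict_user_cf` are all assumptions made for illustration.

```python
import numpy as np

def predict_user_cf(R, user, item):
    """Predict R[user, item] with user-based CF (assumes 0 = missing rating)."""
    R = np.asarray(R, dtype=float)
    rated = R > 0
    # Fall back to the global mean of observed ratings for a user with no ratings.
    if not rated[user].any():
        return float(R[rated].mean())
    user_mean = R[user][rated[user]].mean()
    num, den = 0.0, 0.0
    for v in range(R.shape[0]):
        if v == user or not rated[v, item]:
            continue  # only neighbours who rated the target item contribute
        common = rated[user] & rated[v]
        if common.sum() < 2:
            continue  # too few co-rated items for a meaningful similarity
        a, b = R[user][common], R[v][common]
        norm = np.linalg.norm(a) * np.linalg.norm(b)
        sim = float(a @ b / norm) if norm > 0 else 0.0
        v_mean = R[v][rated[v]].mean()
        num += sim * (R[v, item] - v_mean)
        den += abs(sim)
    # With no usable neighbour, return the user's own mean rating.
    return float(user_mean if den == 0 else user_mean + num / den)

# Toy rating matrix (hypothetical values): 3 users, 3 items.
R = np.array([[5, 3, 0],
              [5, 3, 4],
              [1, 5, 2]])
print(predict_user_cf(R, user=0, item=2))
```

Because the prediction depends entirely on neighbours who have rated the same items, a sketch like this degrades quickly when the rating matrix is sparse.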
In this section, we mainly analyze the comparison results with different approaches and datasets. Specifically, we compare the above methods with our proposed methods on the Epinions and Flixster datasets, which have different data sparsity. In terms of the parameters, the dimension of the latent feature matrix is fixed at K = 80 for the Epinions dataset and K = 70 for the Flixster dataset. We can see from Table I that NLRDMF-TD and NLRDMF-DMDAE outperform the other methods, and that the STE and SocialMF methods outperform the BMF method, which only adopts the user-item rating matrix for recommendation. Besides, the TidalTrust and MoleTrust methods are superior to the BMF method. The performance of the collaborative filtering-based approaches is not good enough. They rely on the trusted friends' comments and are thus not suitable for sparse data, so UserCF and ItemCF cannot work well in this situation. The TD method performs worse than the traditional UserCF and ItemCF methods. NLRDMF-DMDAECE performs best of all.
Experiments have been conducted to analyze the independent contributions of the two steps, i.e., DMF and trust degree, and the results are given in Table I and Table II. It can be seen that the performance of the trust degree (TD) based method is worse than that of the traditional UserCF and ItemCF methods. NLRDMF performs better than the recently developed deep learning-based method, and outperforms the other traditional matrix factorization-based methods as well. The coverage of NLRDMF is five times that of TD, and the F-measure of NLRDMF is three times that of TD. Thus we can conclude that "deep" is more important and beneficial than trust.
We regard cold-start users as those who have assigned ratings fewer than 5 times [42]. How to recommend items to cold-start users is still a challenge for recommender systems. In the Epinions dataset, most users are cold-start users, so it is meaningful to recommend items to cold-start users effectively. In order to validate the performance superiority of our proposed methods over the other methods, the cold-start users are picked out from the two datasets and the recommendation performance of the methods is compared on them. The final comparison results are shown in Table II. Our proposed methods outperform the other methods for cold-start users. This indicates that our proposed methods are more suitable for dealing with cold-start users. We can also see from Table I and Table II that the improvement of our proposed methods for cold-start users is higher than that for all users.
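The selection of cold-start users described above can be sketched as follows. This is a minimal illustration, assuming a dense user-item rating matrix in which 0 marks a missing rating; the function name `cold_start_users` and the toy matrix are hypothetical, while the threshold of 5 ratings follows the definition in the text.

```python
import numpy as np

def cold_start_users(R, threshold=5):
    """Return indices of users with fewer than `threshold` ratings (0 = missing)."""
    R = np.asarray(R)
    counts = np.count_nonzero(R, axis=1)  # number of rated items per user
    return np.flatnonzero(counts < threshold)

# Toy rating matrix (hypothetical values): 3 users, 6 items.
R = np.array([
    [5, 4, 3, 2, 1, 5],  # 6 ratings: not a cold-start user
    [0, 4, 0, 0, 2, 0],  # 2 ratings: cold-start user
    [0, 0, 0, 0, 0, 0],  # no ratings: also counted as cold-start here
])
print(cold_start_users(R))  # [1 2]
```

Evaluating only on the rows returned by such a filter yields a cold-start comparison of the kind reported in Table II.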

VII. CONCLUSION
In this paper, a novel trust-based deep learning method that integrates the community effect is proposed for recommendation. Since matrix factorization-based approaches rely heavily on the initialization of the latent feature matrices, a novel deep architecture for matrix factorization named DMF is proposed. This method can extract better features from the original space to improve the initialization accuracy. The DMF is then integrated into the Deep-MDAE to extract the trust relationship. By taking the community effect into consideration, the recommendation accuracy is further improved. Our experimental results verify that our proposed methods outperform the baseline methods. In future work, we will integrate the DMF technique into collaborative filtering approaches to further improve the recommendation performance.
He has authored or coauthored two books and more than 300 scientific papers in international journals and conferences. His research interests include data science, social computing, and systems engineering. He is a Senior Member of ACM.
Xiangjie Kong (Senior Member, IEEE) received the B.Sc. and Ph.D. degrees from Zhejiang University, Hangzhou, China. He is currently a Full Professor with the College of Computer Science and Technology, Zhejiang University of Technology. He has authored or coauthored more than 130 scientific papers in international journals and conferences. His research interests include network science, mobile computing, and computational social science. He is a Senior Member of CCF and a member of ACM.
Ching-Hsien Hsu (Senior Member, IEEE) is the Chair Professor and the Dean of the College of Information and Electrical Engineering, Asia University, Taiwan. His research interests include cloud computing, parallel and distributed systems, big data analytics, artificial intelligence, and smart medical. He was the recipient of seven Talent awards from the Ministry of Science and Technology, Ministry of Education, Taiwan. He is a Fellow of the Institution of Engineering and Technology and the Chair of IEEE Technical Committee on Cloud Computing (TCCLD).
Runhe Huang received the Ph.D. degree in computer science and mathematics from the University of the West of England, Bristol, U.K., in 1993, and is currently a Full Professor with the Faculty of Computer and Information Sciences, Hosei University, Japan. Her research interests include multi-agent systems, computational intelligence computing, ubiquitous intelligence computing, and big data.
Jianhua Ma received the B.S. and M.S. degrees from the National University of Defense Technology, Changsha, China, in 1982 and 1985, respectively, and the Ph.D. degree from Xidian University, Xi'an, China, in 1990. He is a Professor with the Faculty of Computer and Information Sciences, Hosei University, Tokyo, Japan. He has authored or coauthored more than 200 papers and edited more than 20 books/proceedings and more than 20 journal special issues. His research interests include multimedia, networks, ubiquitous computing, social computing, and cyber intelligence.