To learn the generator's distribution $p_g$ over data $x$, the generator builds a mapping function $G(z; \theta_g)$ from a prior noise distribution $p_z(z)$ to data space. The discriminator $D(x; \theta_d)$ outputs a single scalar representing the probability that $x$ came from the training data rather than from $p_g$.
G and D are trained simultaneously: we adjust the parameters of G to minimize $\log(1 - D(G(z)))$ and the parameters of D to maximize $\log D(x)$, as if they were following a two-player minimax game with value function $V(G, D)$:
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]. \tag{1}$$
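The value function in Eq. (1) can be estimated over a batch. Below is a minimal stdlib-only sketch, where `d_real` and `d_fake` are hypothetical discriminator outputs (probabilities in $(0, 1)$) on real and generated samples; the function name is illustrative, not from the paper.

```python
import math

def value_function(d_real, d_fake):
    # E_x[log D(x)]: D wants this large (confident on real data)
    term_real = sum(math.log(p) for p in d_real) / len(d_real)
    # E_z[log(1 - D(G(z)))]: D wants this large, G wants it small
    term_fake = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return term_real + term_fake

# A strong discriminator drives V toward 0; at chance (D outputs 0.5
# everywhere) V equals 2*log(0.5), and a fooled D drives V toward -inf.
v_strong = value_function([0.9, 0.8, 0.95], [0.1, 0.2, 0.05])
v_chance = value_function([0.5, 0.5], [0.5, 0.5])
```

This makes the opposing objectives concrete: D ascends $V$ while G descends the second term.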
3.2 Conditional Adversarial Nets
Generative adversarial nets can be extended to a conditional model if both the generator and discriminator are conditioned on some extra information $y$. Here $y$ could be any kind of auxiliary information, such as class labels or data from other modalities. We can perform the conditioning by feeding $y$ into both the discriminator and the generator as an additional input layer.
In the generator, the prior input noise $p_z(z)$ and $y$ are combined in a joint hidden representation, and the adversarial training framework allows considerable flexibility in how this hidden representation is composed. [1]
In the discriminator, $x$ and $y$ are presented as inputs to a discriminative function (embodied again by an MLP in this case).
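One simple way to realize the "additional input layer" described above is to append $y$ (e.g. a one-hot class label) to the network's usual input vector. The sketch below assumes flat vectors and ten MNIST classes; `one_hot` and `with_condition` are hypothetical helpers, not names from the paper.

```python
def one_hot(label, num_classes=10):
    """Encode an integer class label as a one-hot vector."""
    v = [0.0] * num_classes
    v[label] = 1.0
    return v

def with_condition(inp, y):
    """Append the condition y to an input vector (z for G, x for D)."""
    return list(inp) + list(y)

z = [0.3, -0.7, 0.1]                     # a (tiny) noise sample
g_input = with_condition(z, one_hot(3))  # what G(z|y) would receive
```

The same `with_condition` step would apply to the discriminator's input $x$, giving it access to $y$ when judging real versus generated samples.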
The objective function of the two-player minimax game would then be as in Eq. (2):
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x|y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|y)))]. \tag{2}$$
In the generator net, a noise prior $z$ with dimensionality 100 was drawn from a uniform distribution within the unit hypercube. Both $z$ and $y$ are mapped to hidden layers with rectified linear unit (ReLU) activations [4, 11], with layer sizes of 200 and 1000 respectively, before both being mapped to a second, combined hidden ReLU layer of dimensionality 1200. We then have a final sigmoid unit layer as our output for generating the 784-dimensional MNIST samples.
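The layer sizes above can be traced with a shape-level numpy sketch. The weights are untrained placeholders (only the dimensions follow the text), and the 10-dimensional one-hot $y$ is an assumption corresponding to MNIST's ten classes.

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda a: np.maximum(a, 0.0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Untrained placeholder weights; only the shapes follow the text.
W_z = rng.normal(0, 0.01, (100, 200))    # z: 100 -> 200 ReLU units
W_y = rng.normal(0, 0.01, (10, 1000))    # y: 10  -> 1000 ReLU units
W_h = rng.normal(0, 0.01, (1200, 1200))  # combined 1200 -> 1200 ReLU
W_o = rng.normal(0, 0.01, (1200, 784))   # 1200 -> 784 sigmoid outputs

def generator(z, y):
    h = np.concatenate([relu(z @ W_z), relu(y @ W_y)])  # (1200,)
    return sigmoid(relu(h @ W_h) @ W_o)                 # (784,)

z = rng.uniform(0.0, 1.0, 100)  # noise from the unit hypercube
y = np.eye(10)[3]               # one-hot label "3"
sample = generator(z, y)        # 784-dimensional MNIST-sized output
```

Biases are omitted for brevity; a trained model would of course learn both weights and biases rather than use random values.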
The discriminator maps $x$ to a maxout [6] layer with 240 units and 5 pieces, and $y$ to a maxout layer with 50 units and 5 pieces. Both of these hidden layers are mapped to a joint maxout layer with 240 units and 4 pieces before being fed to the sigmoid layer. (The precise architecture of the discriminator is not critical as long as it has sufficient capacity; we have found that maxout units are typically well suited to the task.)
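The discriminator's dimensions can likewise be sketched with placeholder weights. A maxout layer (Goodfellow et al., 2013b) computes each output unit as the max over several independent linear projections; again, the shapes follow the text but the weights are random, not trained.

```python
import numpy as np

rng = np.random.default_rng(1)

def maxout(x, W):
    # W has shape (pieces, in_dim, units); take the max over pieces.
    return np.einsum('d,pdu->pu', x, W).max(axis=0)

# Untrained placeholder weights with the shapes from the text.
W_x = rng.normal(0, 0.01, (5, 784, 240))  # x -> 240 units, 5 pieces
W_y = rng.normal(0, 0.01, (5, 10, 50))    # y -> 50 units, 5 pieces
W_j = rng.normal(0, 0.01, (4, 290, 240))  # joint: 240 units, 4 pieces
w_o = rng.normal(0, 0.01, 240)            # final sigmoid unit

def discriminator(x, y):
    h = np.concatenate([maxout(x, W_x), maxout(y, W_y)])  # (290,)
    logit = maxout(h, W_j) @ w_o
    return 1.0 / (1.0 + np.exp(-logit))   # P(x came from the data | y)

p = discriminator(rng.uniform(0.0, 1.0, 784), np.eye(10)[3])
```

With random weights the output hovers near 0.5; training would push it toward 1 on real pairs and 0 on generated ones.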
Bengio, Y., Mesnil, G., Dauphin, Y., and Rifai, S. (2013). Better mixing via deep representations. In ICML'2013.
Bengio, Y., Thibodeau-Laufer, E., Alain, G., and Yosinski, J. (2014). Deep generative stochastic networks trainable by backprop. In Proceedings of the 30th International Conference on Machine Learning (ICML'14).
Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., Mikolov, T., et al. (2013). Devise: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems, pages 2121--2129.
Glorot, X., Bordes, A., and Bengio, Y. (2011). Deep sparse rectifier neural networks. In International Conference on Artificial Intelligence and Statistics, pages 315--323.
Goodfellow, I., Mirza, M., Courville, A., and Bengio, Y. (2013a). Multi-prediction deep Boltzmann machines. In Advances in Neural Information Processing Systems, pages 548--556.
Goodfellow, I. J., Warde-Farley, D., Mirza, M., Courville, A., and Bengio, Y. (2013b). Maxout networks. In ICML'2013.
Goodfellow, I. J., Warde-Farley, D., Lamblin, P., Dumoulin, V., Mirza, M., Pascanu, R., Bergstra, J., Bastien, F., and Bengio, Y. (2013c). Pylearn2: a machine learning research library. arXiv preprint arXiv:1308.4214.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In NIPS'2014.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. Technical report, arXiv:1207.0580.
Huiskes, M. J. and Lew, M. S. (2008). The mir flickr retrieval evaluation. In MIR '08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval, New York, NY, USA. ACM.
Jarrett, K., Kavukcuoglu, K., Ranzato, M., and LeCun, Y. (2009). What is the best multi-stage architecture for object recognition? In ICCV'09.
Kiros, R., Zemel, R., and Salakhutdinov, R. (2013). Multimodal neural language models. In Proc. NIPS Deep Learning Workshop.
Krizhevsky, A., Sutskever, I., and Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25 (NIPS'2012).
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient estimation of word representations in vector space. In International Conference on Learning Representations: Workshops Track.
Russakovsky, O. and Fei-Fei, L. (2010). Attribute learning in large-scale datasets. In European Conference of Computer Vision (ECCV), International Workshop on Parts and Attributes, Crete, Greece.
Srivastava, N. and Salakhutdinov, R. (2012). Multimodal learning with deep Boltzmann machines. In NIPS'2012.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2014). Going deeper with convolutions. arXiv preprint arXiv:1409.4842.