Neural Architecture Transfer

Abstract --- Neural architecture search (NAS) has emerged as a promising avenue for automatically designing task-specific neural networks. Existing NAS approaches require one complete search for each deployment specification of hardware or objective. This is a computationally impractical endeavor given the potentially large number of application scenarios. In this paper, we propose Neural Architecture Transfer (NAT) to overcome this limitation. NAT is designed to efficiently generate task-specific custom models that are competitive under multiple conflicting objectives. To realize this goal, we learn task-specific supernets from which specialized subnets can be sampled without any additional training. The key to our approach is an integrated online transfer learning and many-objective evolutionary search procedure. A pre-trained supernet is iteratively adapted while simultaneously searching for task-specific subnets.
We demonstrate the efficacy of NAT on 11 benchmark image classification tasks ranging from large-scale multi-class to small-scale fine-grained datasets. In all cases, including ImageNet, NATNets improve upon the state-of-the-art under mobile settings (≤ 600M Multiply-Adds). Surprisingly, small-scale fine-grained datasets benefit the most from NAT. At the same time, the architecture search and transfer is orders of magnitude more efficient than existing NAS methods. Overall, experimental evaluation indicates that, across diverse image classification tasks and computational objectives, NAT is an appreciably more effective alternative to conventional transfer learning, i.e., fine-tuning the weights of an existing network architecture learned on standard datasets. Code is available at
Index Terms --- Convolutional Neural Networks, Neural Architecture Search, AutoML, Transfer Learning, Evolutionary Algorithms.
1 INTRODUCTION
Image classification is a fundamental task in computer vision, where given a dataset and, possibly, multiple objectives to optimize, one seeks to learn a model to classify images. Solutions to address this problem fall into two categories: (a) Sufficient Data: a custom convolutional neural network architecture is designed and its parameters are trained from scratch using variants of stochastic gradient descent; and (b) Insufficient Data: an existing architecture designed on a large-scale dataset, such as ImageNet [1], along with its pre-trained weights (e.g., VGG [2], ResNet [3]), is fine-tuned for the task at hand. These two approaches have emerged as the mainstays of present-day computer vision.
Success of the aforementioned approaches is primarily attributed to architectural advances in convolutional neural networks. Initial efforts at designing neural architectures relied on human ingenuity. Steady advances by skilled practitioners have resulted in designs, such as AlexNet [4], VGG [2], GoogLeNet [5], ResNet [3], DenseNet [6] and many more, which have led to performance gains on the ImageNet Large Scale Visual Recognition Challenge [1]. In most other cases, a recent large-scale study [7] has shown that, across many tasks, transfer learning by fine-tuning ImageNet pre-trained networks outperforms networks that are trained from scratch on the same data.

Moving beyond manually designed network architectures, Neural Architecture Search (NAS) [8] seeks to automate this process and find not only good architectures, but also their associated weights for a given image classification task. This goal has led to notable improvements in convolutional neural network architectures on standard image classification benchmarks, such as ImageNet, CIFAR-10 [9] and CIFAR-100 [9], in terms of predictive performance, computational complexity and model size. However, apart from transfer learning by fine-tuning the *weights*, current NAS approaches have failed to deliver new models for both *weights* and *topology* on custom non-standard datasets. The key barrier to realizing the full potential of NAS is the large data and computational requirements for employing existing NAS algorithms on new tasks.

In this paper, we introduce *Neural Architecture Transfer* (NAT) to breach this barrier. Given an image classification task, NAT obtains custom neural networks (both *topology* and *weights*), optimized for possibly many conflicting objectives, and does so without the steep computational burden of running NAS for each new task from scratch. A single run of NAT efficiently obtains multiple custom neural networks spanning the entire trade-off front of objectives.

Our solution builds upon the concept of a supernet [10] which comprises many subnets. All subnets are trained simultaneously through weight sharing, and can be sampled very efficiently. This procedure decouples the network training and search phases of NAS. A many-objective search can then be employed on top of the supernet to find all network architectures that provide the best trade-off among the objectives. However, training such supernets for each task from scratch is very computationally and data intensive. The key idea of NAT is to leverage an existing supernet and efficiently transfer it into a task-specific supernet, whilst simultaneously searching for architectures that offer the best trade-off between the objectives of interest. Therefore, unlike standard supernet-based NAS, we combine supernet transfer learning with the search process. At the conclusion of this process, NAT returns (i) subnets that span the entire objective trade-off front, and (ii) a task-specific supernet. The latter can then be utilized for all future deployment-specific NAS, i.e., new and different hardware or objectives, without any additional training. The core of NAT's efficiency lies in adapting only those subnets of the supernet that will lie on the efficient trade-off front of the new dataset, instead of all possible subnets. But the structure of these subnets is unknown before adaptation.

![](https://i-blog.csdnimg.cn/direct/2d5c6f75709e4ff29fcc1b81ec827fdf.png)

Fig. 1: Overview: Given a dataset and objectives to optimize, NAT designs custom architectures spanning the objective trade-off front. NAT comprises two main components, supernet adaptation and evolutionary search, that are iteratively executed. NAT also uses an online accuracy predictor model to improve its computational efficiency.
We resolve this "chicken-and-egg" problem by adopting an online procedure that alternates between the two primary stages of NAT: (a) *supernet adaptation* of subnets that are at the current trade-off front, and (b) *evolutionary search* for subnets that span the many-objective trade-off front. A pictorial overview of the entire NAT method is shown in Fig. 1.

In the *adaptation* stage, we first construct a layer-wise empirical distribution from the promising subnets returned by the evolutionary search. Then, subnets sampled from this distribution are fine-tuned. In the *search* stage, to improve the efficiency of the search, we adopt a surrogate model to quickly predict the objectives of any sampled subnet without a full-blown and costly evaluation. Furthermore, the predictor model itself is also learned online from previously evaluated subnets. We alternate between these two stages until our computational budget is exhausted.
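As an illustration of the adaptation stage, the following is a minimal sketch of how a layer-wise empirical distribution could be constructed from promising subnets and then sampled to select subnets for fine-tuning. The encoding length, option counts, and function names are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def build_layerwise_distribution(subnets, num_options):
    """Estimate, independently for each position of the integer encoding, the
    empirical frequency of every option among the promising subnets."""
    subnets = np.asarray(subnets)                     # shape: (num_subnets, encoding_length)
    num_positions = subnets.shape[1]
    dist = np.ones((num_positions, num_options))      # add-one smoothing so no option has zero mass
    for pos in range(num_positions):
        values, counts = np.unique(subnets[:, pos], return_counts=True)
        dist[pos, values] += counts
    return dist / dist.sum(axis=1, keepdims=True)     # normalize each position to a distribution

def sample_subnet(dist, rng):
    """Draw one subnet by sampling each position of the encoding independently."""
    return np.array([rng.choice(len(p), p=p) for p in dist])

# Example with hypothetical 22-integer encodings and 10 options per position.
rng = np.random.default_rng(0)
promising = rng.integers(0, 10, size=(40, 22))        # stand-in for subnets at the current trade-off front
dist = build_layerwise_distribution(promising, num_options=10)
subnets_to_finetune = [sample_subnet(dist, rng) for _ in range(8)]
```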

The key contributions of this paper are:

-- We introduce *Neural Architecture Transfer* as a NAS-powered alternative to fine-tuning-based transfer learning. NAT is powered by a simple, yet highly effective, online supernet fine-tuning procedure and an online accuracy-predicting surrogate model.

-- We demonstrate the scalability and practicality of NAT on multiple datasets corresponding to different scenarios: large-scale multi-class (ImageNet [1], CINIC-10 [12]), medium-scale multi-class (CIFAR-10, CIFAR-100 [9]), small-scale multi-class (STL-10 [13]), large-scale fine-grained (Food-101 [14]), medium-scale fine-grained (Stanford Cars [15], FGVC Aircraft [16]) and small-scale fine-grained (DTD [17], Oxford-IIIT Pets [18], Oxford Flowers102 [19]) datasets.

-- Under mobile settings (≤ 600M MAdds), NATNets lead to state-of-the-art performance across all these tasks. For instance, on ImageNet, NATNet achieves a Top-1 accuracy of 80.5% at 600M MAdds.

2 RELATED WORK

Recent years have witnessed growing interest in neural architecture search. The promise of being able to automatically search for task-dependent network architectures is particularly appealing as deep neural networks are widely deployed in diverse applications and computational environments. Early methods [33], [34] made efforts to simultaneously evolve the topology of neural networks along with their weights and hyperparameters. These methods perform competitively with hand-crafted networks on simple control tasks with shallow fully connected networks. Recent efforts [35] primarily focus on designing deep convolutional neural network architectures.
The development of NAS largely happened in two phases. Starting from NASNet [8], the focus of the first wave of methods was primarily on improving the predictive accuracy of CNNs, including Block-QNN [36], Hierarchical NAS [37], AmoebaNet [38], etc. These methods relied on Reinforcement Learning (RL) or Evolutionary Algorithms (EA) to search for an optimal modular structure that is repeatedly stacked together to form a network architecture. The search was typically carried out on relatively small-scale datasets (e.g., CIFAR-10/100 [9]), following which the best architectures were transferred to ImageNet for validation. A steady stream of improvements over the state-of-the-art on numerous datasets was reported. The focus of the second wave of NAS methods was on improving the search efficiency.

A few methods have also been proposed to adapt NAS to other scenarios. These include meta-learning based approaches [39], [40] with application to few-shot learning tasks. XferNAS [41] and EAT-NAS [42] illustrate how architectures can be transferred between similar datasets or from smaller to larger datasets. Some approaches [43], [44] proposed RL-based NAS methods that search on multiple tasks during training and transfer the learned search strategy, as opposed to the searched networks, to new tasks at inference. Next, we provide short overviews of the methods that are closely related to the technical approach in this paper. Table 1 provides a comparative overview of NAT and existing NAS approaches.

TABLE 1: Comparison of NAT and existing NAS methods. † indicates methods that scalarize multiple objectives into one composite objective or as an additional constraint; see text for details.
Performance Prediction: Evaluating the performance of an architecture requires a computationally intensive process of iteratively optimizing model weights. To alleviate this computational burden, regression models have been learned to predict an architecture's performance without actually training it. Baker et al. [45] use a radial basis function to estimate the final accuracy of an architecture from its accuracy in the first 25% of training iterations. PNAS [23] uses a multilayer perceptron (MLP) and a recurrent neural network to estimate the expected improvement in accuracy if the current modular structure (which is later stacked together to form a network) is expanded with a new branch. Conceptually, both of these methods seek to learn a prediction model that extrapolates (rather than interpolates), resulting in poor correlation between predicted and true performance. OnceForAll [31] also uses an MLP to predict accuracy from the architecture encoding. However, the model is trained offline for the entire search space, thereby requiring a large number of samples for learning (16K samples → 2 GPU-days just for constructing the surrogate model). Instead of using uniformly sampled architectures to train the prediction model to approximate the entire landscape, ChamNet [29] trains many architectures through full SGD and selects only 300 samples of high accuracy with diverse efficiency (multiply-adds, latency, energy) to train a prediction model offline. In contrast, NAT learns a prediction model in an online fashion, only on the samples at the current trade-off front, as we explore the search space. Such an approach only needs to interpolate over a much smaller space of architectures constituting the current trade-off front. Consequently, this procedure significantly improves both the accuracy and the sample complexity of constructing the prediction model.

Weight Sharing: Approaches in this category involve training a *supernet* that contains all searchable architectures as its subnets. They can be broadly classified into two categories depending on whether the supernet training is coupled with the architecture search or decoupled into a two-stage process. Approaches of the former kind [24], [26], [46] are computationally efficient but return sub-optimal models; numerous studies [47], [48], [49] allude to weak correlation between performance at the search and final evaluation stages. Methods of the latter kind [10], [31], [50] use the performance of subnets (obtained by sampling the trained supernet) as a metric to select architectures during search. However, training a supernet beforehand for each new task is computationally prohibitive. In this work, we take an integrated approach where we train a supernet on a large-scale dataset (e.g., ImageNet) once and couple it with our architecture search to quickly adapt it to a new task. An elaborated discussion connecting our method to existing approaches is provided in Section A.

Multi-Objective NAS: Methods that consider multiple objectives for designing hardware-specific models have also been developed. The objectives are optimized either through (i) scalarization, or (ii) Pareto-based solutions. The former include ProxylessNAS [26], MnasNet [27], ChamNet [29], MobileNetV3 [22], and FBNetV2 [32], which use a scalarized objective or an additional constraint to encourage high accuracy and penalize compute inefficiency at the same time, e.g., maximize Acc × (Latency/Target)^(−0.07). Conceptually, the search of architectures is still guided by a single objective and only one architecture is obtained per search. Empirically, multiple runs with different weightings of the objectives are needed to find an architecture with the desired trade-off, or multiple architectures with different complexities. Methods in the latter category include [25], [51], [52], [53], [54], and aim to approximate the entire Pareto-efficient frontier simultaneously, i.e., multiple architectures with different complexities are obtained in a single run. These approaches rely on heuristics (e.g., EA) to efficiently navigate the search space, allowing practitioners to visualize the trade-off between the objectives and to choose a suitable network *a posteriori* to the search. NAT falls into the latter category and uses an accuracy prediction model and weight sharing for efficient architecture transfer to new tasks.
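To make the distinction concrete, the short sketch below contrasts a MnasNet-style composite reward of the Acc × (Latency/Target)^(−0.07) form above with keeping the objectives separate for Pareto-based selection; the accuracies, latencies and target are illustrative numbers only.

```python
def scalarized_reward(acc, latency_ms, target_ms=80.0, w=-0.07):
    """MnasNet-style composite objective: maximize Acc * (Latency/Target)^w."""
    return acc * (latency_ms / target_ms) ** w

# Two hypothetical candidate subnets: (top-1 accuracy, latency in ms).
candidates = {"subnet_a": (0.76, 60.0), "subnet_b": (0.78, 95.0)}

# Scalarization collapses the trade-off into a single number; changing the
# target or exponent (i.e., re-running the search) changes which subnet wins.
ranked = sorted(candidates, key=lambda k: scalarized_reward(*candidates[k]), reverse=True)

# A Pareto-based search instead keeps the objectives separate (here both
# minimized) and returns every non-dominated candidate in a single run.
objectives = {k: (1.0 - acc, lat) for k, (acc, lat) in candidates.items()}
```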
3 PROPOSED APPROACH

*Neural Architecture Transfer* consists of three main components: an accuracy predictor, an evolutionary search routine, and a supernet. NAT starts with an archive A of architectures (subnets) created by uniformly sampling from our search space. We evaluate the performance f_i of each subnet a_i using weights inherited from the supernet. The accuracy predictor is then constructed from the (a_i, f_i) pairs and (jointly with any additional objectives provided by the user) drives the subsequent many-objective evolutionary search towards optimal architectures. Promising architectures at the conclusion of the evolutionary process are added to the archive A. The (partial) weights of the supernet corresponding to the top-ranked subnets in the archive are then fine-tuned. NAT repeats this process for a pre-specified number of iterations. At the conclusion, we output both the archive and the task-specific supernet. Networks that offer the best trade-off among the objectives can be post-selected from the archive. Detailed descriptions of each component of NAT are provided in the following subsections. Figure 1 and Algorithm 1 provide an overview of our entire approach.
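The following is a minimal sketch of this outer loop. The supernet, search space, and helper callables are assumed interfaces passed in as arguments, not the exact signatures of Algorithm 1.

```python
import numpy as np

def nat_loop(supernet, search_space, fit_predictor, evo_search, select_top,
             extra_objectives, iterations=30, init_samples=100, seed=0):
    """Sketch of NAT's outer loop (cf. Fig. 1 / Algorithm 1): alternate
    surrogate-assisted many-objective search with supernet adaptation.

    Assumed interfaces (hypothetical, for illustration):
      supernet.evaluate(subnet)        -> accuracy with inherited weights
      supernet.finetune(subnets)       -> adapts only the weights those subnets use
      search_space.sample_uniform(rng) -> a random encoded subnet
    """
    rng = np.random.default_rng(seed)

    # Archive of (subnet, accuracy) pairs, seeded by uniform sampling.
    archive = [search_space.sample_uniform(rng) for _ in range(init_samples)]
    accuracy = [supernet.evaluate(a) for a in archive]

    for _ in range(iterations):
        # (1) Fit the accuracy predictor online, only on archive members.
        predictor = fit_predictor(archive, accuracy)

        # (2) Many-objective evolutionary search driven by the predictor and
        #     any additional objectives (e.g., #MAdds, latency).
        candidates = evo_search(predictor, extra_objectives, search_space, rng)

        # (3) Evaluate promising candidates with inherited weights; archive them.
        archive += list(candidates)
        accuracy += [supernet.evaluate(a) for a in candidates]

        # (4) Adapt the supernet weights used by the current trade-off front.
        supernet.finetune(select_top(archive, accuracy, extra_objectives))

    return archive, supernet  # subnets spanning the front + task-specific supernet
```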
![](https://i-blog.csdnimg.cn/direct/f7f34ae6c6204b7881430a5e7161b9b3.png)

![](https://i-blog.csdnimg.cn/direct/d989edfb2a7d48be9ad3e0dc56052592.png)

Fig. 2: The architectures in our search space are variants of the MobileNetV2 family of models [22], [27], [28], [56]. (a) Each network consists of five stages, each stage has two to four layers, and each layer is an inverted residual bottleneck block. The search space includes the input image resolution (R), the width multiplier (W), the number of layers in each stage, and, for each layer, the expansion ratio (E, i.e., the # of output channels relative to input channels) of the first 1 × 1 convolution and the kernel size (K) of the depth-wise separable convolution. (b) Networks are represented as 22-integer strings, where the first two integers correspond to resolution and width multiplier, and the rest correspond to the layers. Each value indicates a choice, e.g., the third integer (L1) taking a value of "1" corresponds to using an expansion ratio of 3 and a kernel size of 3 in layer 1 of stage 1.

![](https://i-blog.csdnimg.cn/direct/6e096eebbfd4484b9b33c823c8be19ad.png)

![](https://i-blog.csdnimg.cn/direct/cbec238d1db7468ca0828182e61f65d8.png)

**3.2 Search Space and Encoding**

The search for optimal network architectures can be performed over many different search spaces, and the generality of the chosen search space has a major influence on the quality of results that are attainable. We adopt a modular design for the overall structure of the *network*, consisting of a stem, multiple stages and a tail (see Fig. 2a). The *stem* and *tail* are common to all networks and are not searched. Each *stage* in turn comprises multiple layers, and each *layer* itself is an inverted residual bottleneck structure [56].

- Network: We search over the input image resolution and the width multiplier (a factor that scales the # of output channels of each layer uniformly [57]). Following previous work [27], [28], [31], we segment the CNN architecture into five sequentially connected stages, which gradually reduce the feature map size and increase the number of channels (Fig. 2a *Left*).

- Stage: We search over the number of layers, where only the first layer uses stride 2 if the feature map size decreases, and each stage is allowed a minimum of two and a maximum of four layers (Fig. 2a *Middle*).

- Layer: We search over the expansion ratio (between the # of output and input channels) of the first 1 × 1 convolution and the kernel size of the depth-wise separable convolution (Fig. 2a *Right*).

![](https://i-blog.csdnimg.cn/direct/3b9c146594f84ee78257f5b22292c401.png)

Fig. 3: Top Path: a typical process of evaluating an architecture in NAS algorithms. Bottom Path: the accuracy predictor aims to bypass the time-consuming components of evaluating a network's performance by directly regressing its accuracy f from a (the architecture in the encoded space).

Overall, we search over four primary hyperparameters of CNNs, i.e., the depth (# of layers), the width (# of channels), the kernel size, and the input resolution. The resulting volume of our search space is approximately 3.5 × 10^19 for each combination of image resolution and width multiplier. To encode these architectural choices, we use an integer string of length 22, as shown in Fig. 2b. The first two values represent the input image resolution and the width multiplier, respectively. The remaining 20 values denote the expansion ratio and kernel size settings for each of the 20 layers. The available options for the expansion ratio and kernel size are [3, 4, 6] and [3, 5, 7], respectively. It is worth noting that we sort the layer settings in ascending #MAdds order, which is beneficial to the mutation operator used in our evolutionary search algorithm.
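As a concrete illustration of this encoding, the sketch below decodes a 22-integer string into per-layer settings. The specific resolution and width-multiplier choices, the mapping from integer values to (expansion ratio, kernel size) pairs, and the use of a value of 0 to drop the optional third and fourth layer of a stage are assumptions for illustration; the text above only specifies the option sets [3, 4, 6] and [3, 5, 7] and that settings are sorted by #MAdds.

```python
# Hypothetical option tables; the real ordering is by ascending #MAdds.
RESOLUTIONS = [192, 224, 256]          # assumed choices for the first integer
WIDTH_MULTS = [1.0, 1.2]               # assumed choices for the second integer
EXPAND = [3, 4, 6]
KERNEL = [3, 5, 7]
# Value v in 1..9 -> (expansion, kernel); value 0 (assumed) -> layer is skipped.
LAYER_OPTIONS = [(e, k) for e in EXPAND for k in KERNEL]
# Consistency check: with 9 options for the two mandatory layers of a stage and
# 10 (incl. skip) for the two optional ones, (9 * 9 * 10 * 10) ** 5 ≈ 3.5e19,
# which matches the reported search-space size.

def decode(encoding):
    """Turn a 22-integer string into (resolution, width multiplier, layers per stage)."""
    assert len(encoding) == 22
    resolution = RESOLUTIONS[encoding[0]]
    width_mult = WIDTH_MULTS[encoding[1]]
    stages = []
    for s in range(5):                              # five stages, four layer slots each
        layers = []
        for v in encoding[2 + 4 * s: 2 + 4 * (s + 1)]:
            if v == 0:
                continue                            # optional layer not used
            expansion, kernel = LAYER_OPTIONS[v - 1]
            layers.append({"expansion": expansion, "kernel": kernel})
        stages.append(layers)
    return resolution, width_mult, stages

# Example: value 1 in the third integer -> expansion 3, kernel 3 in layer 1 of stage 1.
print(decode([1, 0] + [1, 2, 0, 0] * 5))
```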
**3.3 Accuracy Predictor**

The main computational bottleneck of NAS arises from the nested nature of the bi-level optimization problem. The inner optimization requires the weights of a subnet to be thoroughly learned before its performance can be evaluated. Weight-sharing methods [31], [46], [50] allow sampled subnets to inherit weights among themselves or from a supernet, avoiding the time-consuming process (typically requiring hours) of learning weights through SGD. However, weight sharing alone still requires inference on validation data (typically requiring minutes) to assess performance. Therefore, simply having to evaluate the subnets can still render the overall process computationally prohibitive for methods [8], [27], [38] that sample thousands of architectures during search.

To mitigate the computational burden of fully evaluating the subnets, we adopt a surrogate accuracy predictor that regresses the performance of a sampled subnet without performing training or inference. By learning a functional relation between the integer strings (subnets in the encoded space) and the corresponding performance, this approach decouples the evaluation of an architecture from data processing (including both SGD and inference). Consequently, the evaluation time reduces from hours or minutes to seconds. We illustrate this concept in Fig. 3. The effectiveness of this idea, however, is critically dependent on the quality of the surrogate model. Below we identify three desired properties of such a model:

1) Reliable prediction: high rank-order correlation between predicted and true performance.

2) Consistent prediction: the quality of the prediction should be consistent across different datasets.

3) Sample efficiency: minimizing the number of training examples necessary to construct an accurate predictor model, since each training sample requires a costly training and evaluation of a subnet.

![](https://i-blog.csdnimg.cn/direct/2893a7a90e3f4488b3fcf814cd46883a.png)

Fig. 4: Accuracy predictor performance as a function of training samples. For each model, we show the mean and standard deviation of the Spearman rank correlation on 11 datasets (Table 3). The size of the RBF ensemble is 500.

Current approaches [23], [29], [31] that use surrogate-based accuracy predictors, however, do not satisfy properties (1) and (3) simultaneously. For instance, PNAS [23] uses 1,160 subnets to build the surrogate but only achieves a rank-order correlation of 0.476. Similarly, OnceForAll [31] uses 16,000 subnets to build the surrogate. The poor sample complexity and rank-order correlation of these approaches is due to the offline learning of the surrogate model: instead of focusing on models that are at the trade-off front of the objectives, these surrogate models are built for the entire search space. Consequently, these methods require a significantly larger and more complex surrogate model.

We overcome the aforementioned limitation by restricting the surrogate model to the portion of the search space that constitutes the current objective trade-off front. Such a solution significantly reduces the sample complexity of the surrogate and increases the reliability of its predictions. We consider four low-complexity predictors, namely, Gaussian Process (GP) [29], Radial Basis Function (RBF) [45], Multilayer Perceptron (MLP) [23], and Decision Tree (DT) [58]. Empirically, we observe that RBFs are consistently better than the other three models when the # of training samples exceeds 100. To further improve the RBF's performance, especially under a high sample-efficiency regime, we construct an ensemble of RBF models. As outlined in Algorithm 2, each RBF model is constructed with a subset of samples and features randomly selected from the training instances. The correlation between the predicted and true accuracy of an ensemble of 500 RBF models outperforms that of all the other models (see Fig. 4).
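A minimal sketch of such an ensemble is given below, assuming Gaussian RBF interpolants fitted on random subsets of samples and encoding positions. The subset fractions, kernel width, and regularization are illustrative choices, not those of Algorithm 2.

```python
import numpy as np

class RBFEnsemble:
    """Ensemble of Gaussian RBF regressors, each fit on a random subset of
    training samples and a random subset of encoding positions (features)."""

    def __init__(self, n_models=500, sample_frac=0.8, feature_frac=0.8,
                 gamma=0.1, ridge=1e-6, seed=0):
        self.n_models, self.sample_frac, self.feature_frac = n_models, sample_frac, feature_frac
        self.gamma, self.ridge, self.rng = gamma, ridge, np.random.default_rng(seed)
        self.models = []

    def _kernel(self, A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-self.gamma * d2)

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y, float)
        n, d = X.shape
        for _ in range(self.n_models):
            rows = self.rng.choice(n, max(2, int(self.sample_frac * n)), replace=False)
            cols = self.rng.choice(d, max(1, int(self.feature_frac * d)), replace=False)
            C = X[np.ix_(rows, cols)]                      # RBF centers
            K = self._kernel(C, C) + self.ridge * np.eye(len(rows))
            w = np.linalg.solve(K, y[rows])                # interpolation weights
            self.models.append((cols, C, w))
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        preds = [self._kernel(X[:, cols], C) @ w for cols, C, w in self.models]
        return np.mean(preds, axis=0)                      # ensemble average

# Usage: encoded subnets (22 integers) -> accuracy measured with inherited weights.
rng = np.random.default_rng(1)
archive = rng.integers(0, 10, size=(150, 22))
acc = rng.uniform(0.6, 0.8, size=150)                      # stand-in for measured accuracies
predictor = RBFEnsemble(n_models=50).fit(archive, acc)
predicted = predictor.predict(archive[:5])
```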
![](https://i-blog.csdnimg.cn/direct/6076af184fbc441faf0fabfbf56b4dfe.png)

![](https://i-blog.csdnimg.cn/direct/3f936a849a70495ea2b8fade3294abec.png)

Fig. 5: (a) Crossover operator: new offspring architectures are created by recombining integers from two parent architectures. The probability of choosing from either one of the parents is equal. (b) Mutation operator: histograms showing the probabilities of mutated values, with the current value at 5, under different settings of the hyperparameter η_m.

![](https://i-blog.csdnimg.cn/direct/f3414289012b4b458df95dbe3ac3c8c4.png)

For mutation, the polynomial mutation (PM) operator is applied to each integer of the encoding. The PM operator inherits the *parent-centric* convention, in which the offspring are intentionally created around the parents. The centricity is controlled via an index hyperparameter η_m: high values of η_m tend to create mutated offspring close to the parent, while low values encourage mutated offspring to be further away from the parent architecture. See Fig. 5b for a visualization of the effect of η_m. It is worth noting that the PM operator was originally proposed for continuous optimization, where distances between variable values are naturally defined. In contrast, in the context of our encoding, the variables are categorical in nature, each indicating a particular layer hyperparameter. We therefore sort the searchable layer settings in ascending #MAdds order, such that η_m now controls the difference in #MAdds between the parent and the mutated offspring.

We apply PM to every member of the offspring population (created through crossover). We then merge the mutated offspring population with the parent population and select the top half using the many-objective selection operator described in Algorithm 4. This procedure creates the parent population for the next generation. We repeat this overall process for a pre-specified number of generations and output the parent population at the conclusion of the evolution.
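The sketch below illustrates one offspring-generation step in this spirit: uniform crossover between two parent encodings, followed by an integer-valued, parent-centric mutation. The mutation shown is a simplified, discretized stand-in for the polynomial mutation operator (it perturbs each integer with a spread controlled by eta_m), not the exact formulation.

```python
import numpy as np

def uniform_crossover(parent_a, parent_b, rng):
    """Each position of the offspring is copied from either parent with equal probability."""
    mask = rng.random(len(parent_a)) < 0.5
    return np.where(mask, parent_a, parent_b)

def parent_centric_mutation(encoding, num_options, eta_m, p_mut, rng):
    """Integer mutation in the spirit of polynomial mutation: a large eta_m keeps
    the mutated value close to the parent value (adjacent options differ little in
    #MAdds because options are sorted), a small eta_m allows larger jumps."""
    child = encoding.copy()
    for i in range(len(child)):
        if rng.random() > p_mut:
            continue
        # Spread shrinks as eta_m grows (illustrative rule, not the PM distribution).
        max_step = max(1, int(round((num_options - 1) / (eta_m + 1))))
        step = rng.integers(1, max_step + 1) * rng.choice([-1, 1])
        child[i] = int(np.clip(child[i] + step, 0, num_options - 1))
    return child

rng = np.random.default_rng(0)
p1 = rng.integers(0, 10, size=22)        # two parent subnets in the encoded space
p2 = rng.integers(0, 10, size=22)
offspring = uniform_crossover(p1, p2, rng)
mutated = parent_centric_mutation(offspring, num_options=10, eta_m=20, p_mut=1 / 22, rng=rng)
```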
**3.5 Many-Objective Selection**

In addition to high predictive accuracy, real-world applications demand that NAS algorithms simultaneously balance a few other conflicting objectives that are specific to the deployment scenarios. For instance, mobile or embedded devices often have restrictions in terms of model size, multiply-adds, latency, power consumption, and memory footprint. With no prior assumption on the correlation among these objectives, a selection procedure that is scalable to the number of objectives is required to drive the search towards the high-dimensional Pareto front. In this work, we adopt the reference-point-guided selection originally proposed in NSGA-III [11], which has been shown to be effective in handling problems with a large number of objectives.

![](https://i-blog.csdnimg.cn/direct/09bff2f22303454688baa458cb4f9c7a.png)

![](https://i-blog.csdnimg.cn/direct/6f2856d96ef94869b13a4769bac23319.png)

![](https://i-blog.csdnimg.cn/direct/56601f56c8c145d28077f934be7a05cd.png)

![](https://i-blog.csdnimg.cn/direct/560fd00c606846f58e0c3026c77e2b82.png)

A solution a_i is said to dominate a solution a_j if it is no worse than a_j in every objective and strictly better in at least one; a_i is non-dominated if these conditions hold against all other solutions a_j (with j ≠ i) in the entire search space of a. With this definition, we can sort solutions into different ranks of domination, where solutions within the same rank are non-dominated with respect to each other, and for any solution in a higher rank there exists at least one solution in a lower rank that dominates it. Thus, a lower-ranked set is lexicographically better than a higher-ranked set. This process is referred to as *non-dominated sorting*, and it is the first step of the selection process. During the many-objective selection, the lower-ranked sets are chosen one at a time until no more complete sets can be included while maintaining the population size. The final accepted set may have to be *split* so that only a part of it is chosen. For this purpose, we choose the most diverse subset based on a diversity-maintaining mechanism: we first create a set of reference directions from a set of reference points uniformly distributed (in (m − 1)-dimensional space) on the unit simplex using the Das-and-Dennis method [61]. Each solution is then associated with a reference direction based on the orthogonal distance of the solution from that direction. Finally, for every reference direction, we choose the closest associated solution in a systematic manner by adaptively computing a niche count ρ, so that every reference direction gets an equal opportunity to choose a representative closest solution for the selected population. The domination and diversity-preserving procedures are easily scalable to any number of objectives and, importantly, are free from any user-defined hyperparameter. See Algorithm 4 for the pseudocode and Fig. 6 for a graphical illustration. A more elaborate discussion on the necessity of the reference-point-based selection is provided in Section B.

**3.6 Supernet Adaptation**

Instead of training every architecture sampled during search from scratch, NAS with weight sharing [24], [46] inherits weights from previously-trained networks or from a supernet. Directly inheriting the weights obviates the need to optimize them from scratch and speeds up the search from thousands of GPU-days to only a few. In this work, we focus on the supernet approach [10], [31]. It involves first training a large network model (in which searchable architectures become subnets) prior to the search. The performance of the subnets, evaluated with the inherited weights, is then used to guide the selection of architectures during the search. The key to the success of this approach is that the performance of a subnet with the inherited weights be highly correlated with the performance of the same subnet when thoroughly trained from scratch. Satisfying this desideratum requires the supernet weights to be learned in such a way that *all* subnets are optimized *simultaneously*.

Existing methods [30], [53] attempt to achieve the above goal by imposing *fairness* in training the supernet, where the probability of training any particular subnet on each batch of data is uniform in expectation. However, we argue that simultaneously training all the subnets in the search space is practically not feasible and, more importantly, not necessary. Firstly, it is evident from existing NAS approaches [26], [62] that different objectives (#Params, #MAdds, latency on different hardware, etc.) require different network architectures.

![](https://i-blog.csdnimg.cn/direct/b94b4d08ebb2431ea963485a648dfb3b.png)
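To make the selection operator of Section 3.5 concrete, here is a minimal sketch of survivor selection via non-dominated sorting followed by reference-direction niching. It is a simplified stand-in for Algorithm 4 and NSGA-III: the objectives are assumed to be minimized and already normalized, and the niching simply fills the least-crowded reference directions first.

```python
import numpy as np

def non_dominated_sort(F):
    """Split solutions (rows of F, all objectives minimized) into domination ranks."""
    def dominates(i, j):
        return np.all(F[i] <= F[j]) and np.any(F[i] < F[j])
    remaining, ranks = set(range(len(F))), []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(j, i) for j in remaining if j != i)]
        ranks.append(front)
        remaining -= set(front)
    return ranks

def das_dennis(n_partitions, m):
    """Das-and-Dennis reference points: all (k_1,...,k_m)/n_partitions with sum(k) = n_partitions."""
    if m == 1:
        return np.array([[1.0]])
    if n_partitions == 0:
        return np.zeros((1, m))
    points = []
    for k in range(n_partitions + 1):
        frac = k / n_partitions
        for rest in das_dennis(n_partitions - k, m - 1):
            points.append(np.concatenate(([frac], rest * (1.0 - frac))))
    return np.array(points)

def reference_point_survival(F, n_survive, n_partitions=12):
    """Keep whole fronts while they fit; split the last front by reference-direction niching."""
    F = np.asarray(F, float)
    survivors, last = [], []
    for front in non_dominated_sort(F):
        if len(survivors) + len(front) <= n_survive:
            survivors += front
        else:
            last = front
            break
    if len(survivors) < n_survive and last:
        refs = das_dennis(n_partitions, F.shape[1])
        refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)

        def assoc(f):
            # Closest reference direction by orthogonal distance to its line.
            proj = refs @ f
            d = np.linalg.norm(f - proj[:, None] * refs, axis=1)
            return int(np.argmin(d)), float(d.min())

        niche = np.zeros(len(refs), int)          # niche count rho per direction
        for i in survivors:
            niche[assoc(F[i])[0]] += 1
        pool = list(last)
        while len(survivors) < n_survive and pool:
            # Prefer the candidate attached to the least-crowded reference direction.
            i = min(pool, key=lambda i: (niche[assoc(F[i])[0]], assoc(F[i])[1]))
            survivors.append(i)
            niche[assoc(F[i])[0]] += 1
            pool.remove(i)
    return survivors

# Usage: two minimized objectives, e.g., (1 - predicted accuracy, #MAdds in millions).
rng = np.random.default_rng(0)
F = np.column_stack([rng.uniform(0.2, 0.4, 40), rng.uniform(200, 600, 40)])
F = (F - F.min(0)) / (F.max(0) - F.min(0) + 1e-12)   # crude normalization for the sketch
kept = reference_point_survival(F, n_survive=20)
```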
